Submitted by netw0rkf10w in MachineLearning
For ImageNet classification, there are two common ways of normalizing the input images:
- Normalize to [-1, 1] using an affine transformation: 2*(x/255) - 1.
- Normalize using the ImageNet mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225).
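For concreteness, here is a minimal NumPy sketch of the two options (the formulas are the ones above; the array shapes and variable names are just illustrative, not from any particular codebase):

```python
import numpy as np

# A toy uint8 image; shape and values are illustrative only.
x = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Option 1 (common in TensorFlow/JAX code): affine map to [-1, 1].
x_tf = 2.0 * (x.astype(np.float32) / 255.0) - 1.0

# Option 2 (common in PyTorch code): scale to [0, 1], then standardize
# each channel with the ImageNet statistics.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x_torch = (x.astype(np.float32) / 255.0 - mean) / std
```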
I observe that the first one is more common in TensorFlow codebases (including JAX models with TensorFlow data processing, e.g. the official Vision Transformers code), whereas the second is ubiquitous in PyTorch codebases.
I tried to find empirical comparisons of the two, but there don't seem to be any.
Which one is better in your opinion? I would guess the performance shouldn't differ much, but it would still be interesting to hear about your experience.
melgor89 wrote
From my experience, they are equivalent nowadays, especially since we now use BatchNorm or LayerNorm. Both of those layers also normalize with a mean and std, which makes it largely irrelevant which input normalization you use. So I prefer the TensorFlow approach, as it is the simpler one.
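One quick way to see the commenter's point: a BatchNorm layer standardizes each channel from batch statistics, so any per-channel affine preprocessing upstream is normalized away. The toy check below puts a BatchNorm directly on the input, which is a simplification (real networks apply it after a convolution, where the same absorption argument applies through the conv's linearity); it is a sketch of the mechanism, not of an actual architecture.

```python
import torch
import torch.nn as nn

# BatchNorm computes (x - mean_c) / sqrt(var_c + eps) per channel, so an
# affine remap of the input changes only the statistics it estimates,
# not (up to the tiny eps term) the normalized output.
bn = nn.BatchNorm2d(3, affine=False)
bn.train()  # use batch statistics, as during training

x01 = torch.rand(32, 3, 224, 224)  # images already scaled to [0, 1]
x11 = 2 * x01 - 1                  # the same images mapped to [-1, 1]

out01 = bn(x01)
out11 = bn(x11)

# The two outputs match almost exactly: the affine map is absorbed.
print(torch.allclose(out01, out11, atol=1e-3))  # True
```

The same argument covers the ImageNet mean/std option, since it is also a per-channel affine map of the pixels.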