Submitted by netw0rkf10w t3_10rtis6 in MachineLearning

For ImageNet classification, there are two common ways of normalizing the input images:

- Normalize to [-1, 1] using an affine transformation (2*(x/255) - 1).

- Normalize using ImageNet mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225).
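For concreteness, the two schemes look like this in PyTorch (a minimal sketch; the function names are mine):

```python
import torch

def scale_minus1_1(x):
    # x: uint8 tensor in [0, 255] -> float tensor in [-1, 1]
    return 2.0 * (x.float() / 255.0) - 1.0

# Per-channel ImageNet statistics (RGB order)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225])

def imagenet_normalize(x):
    # x: float tensor in [0, 1], shape (C, H, W)
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
```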

I observe that the first one is more common in TensorFlow codebases (including Jax models with TensorFlow data processing, e.g. the official Vision Transformers code), whereas the second is ubiquitous in PyTorch codebases.

I tried to find empirical comparisons of the two, but there doesn't seem to be any.

Which one is better in your opinion? I guess the performance shouldn't be too different, but still it's interesting to hear your experience.

2

Comments


melgor89 t1_j6xufba wrote

From my experience, they are equivalent nowadays, especially since models now use BatchNorm or LayerNorm. Those normalization layers also subtract a mean and divide by a std, which makes it largely irrelevant which input normalization you use. So I prefer the TensorFlow approach, as it is the simpler one.
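A quick sanity check of that point (my own PyTorch sketch): a fresh BatchNorm in training mode standardizes over the batch, so a fixed affine transform of the input cancels out, up to the layer's eps:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(8, 3, 4, 4)   # images already scaled to [0, 1]

bn_a = nn.BatchNorm2d(3)     # fed [-1, 1] inputs
bn_b = nn.BatchNorm2d(3)     # fed [0, 1] inputs
y_a = bn_a(2 * x - 1)
y_b = bn_b(x)

# Standardizing removes the fixed shift/scale, so the outputs match
# up to a small eps-dependent error.
print(torch.allclose(y_a, y_b, atol=1e-3))
```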

3

netw0rkf10w OP t1_j6z0oia wrote

So no noticeable difference in performance in your experiments?

1

puppet_pals t1_j6ygho0 wrote

ImageNet normalization is an artifact of the era of feature engineering. In the modern era you shouldn’t use it. It’s unintuitive and overfits the research dataset.

1

nicholsz t1_j6yniui wrote

With data augmentation techniques (especially contrast or luminance randomization), normalizing would end up being a no-op in the end, right?

2

netw0rkf10w OP t1_j6z15t0 wrote

I think normalization will be here to stay (maybe not the ImageNet one though), as it usually speeds up training.

1

nicholsz t1_j6z1jgm wrote

Oh, I meant fitting to the statistics of ImageNet / the training dataset. There's always got to be some kind of normalization.

1

puppet_pals t1_j701uqt wrote

>I think normalization will be here to stay (maybe not the ImageNet one though), as it usually speeds up training.

the reality is you are tied to the normalization scheme of whatever you are transfer learning from (assuming you are transfer learning). Framework authors and people publishing weights should make normalization as easy as possible, typically via a 1/255.0 rescaling operation (or x/127.5 - 1; I'm indifferent, though I opt for 1/255 personally).
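For instance, published weights could bake the rescaling into the forward pass so users feed raw [0, 255] pixels directly (a hypothetical wrapper I'm sketching here, not from any particular framework):

```python
import torch
import torch.nn as nn

class WithRescaling(nn.Module):
    """Wrap a backbone so it accepts raw pixel values in [0, 255]."""

    def __init__(self, backbone, scale=1 / 255.0, offset=0.0):
        super().__init__()
        self.backbone = backbone
        self.scale = scale
        self.offset = offset

    def forward(self, x):
        # scale=1/255 -> [0, 1]; scale=1/127.5, offset=-1 -> [-1, 1]
        return self.backbone(x * self.scale + self.offset)

model = WithRescaling(nn.Identity())  # identity backbone just for illustration
out = model(torch.full((1, 3, 2, 2), 255.0))
```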

1

netw0rkf10w OP t1_j6zb957 wrote

If I remember correctly, it was first used in AlexNet, which started the deep learning era though. I agree that it doesn't make much sense nowadays, but it's still used everywhere :\

1

MadScientist-1214 t1_j6yj0v6 wrote

Some models actually just use [0, 1] normalization (divide by 255). Some normalization is necessary, but [0, 1] is enough. On real world datasets, computing the specific mean/std never gave me better results.

1

netw0rkf10w OP t1_j6zbfz4 wrote

Indeed. Maybe we have a new battle between [-1, 1] and [0, 1] lol.

1

CyberDainz t1_j715ayh wrote

use trainable normalization

    # Learnable per-channel affine at the input and output of the network.
    # Initialize gamma to 1 and beta to 0 so they start as the identity
    # (the original torch.Tensor(in_ch,) left the parameters uninitialized).
    self._in_beta = nn.Parameter(torch.zeros(in_ch), requires_grad=True)
    self._in_gamma = nn.Parameter(torch.ones(in_ch), requires_grad=True)
    ...
    self._out_gamma = nn.Parameter(torch.ones(out_ch), requires_grad=True)
    self._out_beta = nn.Parameter(torch.zeros(out_ch), requires_grad=True)

    ...

    # broadcast the (C,) parameters over (N, C, H, W)
    x = x + self._in_beta[None, :, None, None]
    x = x * self._in_gamma[None, :, None, None]
    ...
    x = x * self._out_gamma[None, :, None, None]
    x = x + self._out_beta[None, :, None, None]

1