Submitted by netw0rkf10w in MachineLearning
For ImageNet classification, there are two common ways of normalizing the input images:
- Normalize to [-1, 1] using an affine transformation: 2*(x/255) - 1.
- Normalize using the ImageNet mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225).
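For concreteness, here is a minimal NumPy sketch of the two options (the formulas are the ones above; the array shapes and variable names are just illustrative, not from any particular codebase):

```python
import numpy as np

# A toy uint8 image; shape and values are illustrative only.
x = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Option 1 (common in TensorFlow/JAX code): affine map to [-1, 1].
x_tf = 2.0 * (x.astype(np.float32) / 255.0) - 1.0

# Option 2 (common in PyTorch code): scale to [0, 1], then standardize
# each channel with the ImageNet statistics.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x_torch = (x.astype(np.float32) / 255.0 - mean) / std
```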
I observe that the first one is more common in TensorFlow codebases (including JAX models with TensorFlow data processing, e.g. the official Vision Transformers code), whereas the second is ubiquitous in PyTorch codebases.
I tried to find empirical comparisons of the two, but there don't seem to be any.
Which one is better in your opinion? I would guess the performance shouldn't differ much, but it would still be interesting to hear about your experience.
melgor89 wrote
From my experience, they are equivalent nowadays, especially since we now use BatchNorm or LayerNorm. Both of those layers also normalize with a mean and std, which makes it largely irrelevant which input normalization you use. So I prefer the TensorFlow approach, as it is the simpler one.
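One quick way to see the commenter's point: a BatchNorm layer standardizes each channel from batch statistics, so any per-channel affine preprocessing upstream is normalized away. The toy check below puts a BatchNorm directly on the input, which is a simplification (real networks apply it after a convolution, where the same absorption argument applies through the conv's linearity); it is a sketch of the mechanism, not of an actual architecture.

```python
import torch
import torch.nn as nn

# BatchNorm computes (x - mean_c) / sqrt(var_c + eps) per channel, so an
# affine remap of the input changes only the statistics it estimates,
# not (up to the tiny eps term) the normalized output.
bn = nn.BatchNorm2d(3, affine=False)
bn.train()  # use batch statistics, as during training

x01 = torch.rand(32, 3, 224, 224)  # images already scaled to [0, 1]
x11 = 2 * x01 - 1                  # the same images mapped to [-1, 1]

out01 = bn(x01)
out11 = bn(x11)

# The two outputs match almost exactly: the affine map is absorbed.
print(torch.allclose(out01, out11, atol=1e-3))  # True
```

The same argument covers the ImageNet mean/std option, since it is also a per-channel affine map of the pixels.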