Submitted by Apprehensive_Air8919 t3_11dfgfm in deeplearning

I'm currently working with the transformer architecture for depth estimation. My dataset is 6,700 images of dimensions 3x256x256. I've run into a weird thing: my validation loss suddenly falls a lot around epochs 30-40, while my training loss barely moves. I can't figure out why this is happening. Hope you can help me! I use Adam with lr = 0.000001.

batch_size = 32

batch_size = 16
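For context, here is a minimal sketch of the training configuration described above, assuming a standard PyTorch setup. The tensors, loaders, and stand-in model below are placeholders chosen to make the sketch runnable; the actual vision transformer is the one linked below.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for the data: 3x256x256 RGB inputs, 1x256x256 depth targets
images = torch.randn(64, 3, 256, 256)
depths = torch.randn(64, 1, 256, 256)
dataset = TensorDataset(images, depths)

# Placeholder module so the sketch runs; the real model is the ViT linked below
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

train_loader = DataLoader(dataset, batch_size=32, shuffle=True)   # training batch size
val_loader = DataLoader(dataset, batch_size=16, shuffle=False)    # validation batch size

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.000001)     # the very low lr mentioned above
```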

The code for the vision transformer is here.

https://stackoverflow.com/questions/75582628/why-does-my-validation-loss-suddenly-fall-dramatically-while-my-training-loss-do

1

Comments


trajo123 t1_ja8cyw2 wrote

How is your loss defined? How is your validation set created? Does it happen for any test/validation split?

2

Apprehensive_Air8919 OP t1_ja96vdu wrote

nn.MSELoss(). I used sklearn's train_test_split() with test_size = 0.2. It is consistent across every split I've seen. The weird thing is that it only happens when I run a very low lr.
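A rough illustration of the split and loss described here, using an index-based split over the 6,700 samples (hypothetical variable names, not the actual code):

```python
import numpy as np
from sklearn.model_selection import train_test_split
import torch.nn as nn

# Split indices rather than the images themselves
indices = np.arange(6700)
train_idx, val_idx = train_test_split(indices, test_size=0.2, random_state=0)

criterion = nn.MSELoss()                     # pixel-wise MSE on the predicted depth map
print(len(train_idx), len(val_idx))          # 5360 train / 1340 validation
```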

1

trajo123 t1_ja9aghn wrote

Very strange.

Are you sure your dataset is shuffled before the split? Have you tried different random seeds or different split ratios?

Or maybe there's a bug in how you calculate the loss, but that should affect the training set as well...

So my best guess is that you either don't have your data shuffled and the validation samples are "easier", or it's something more trivial, like a bug in the plotting code. Or maybe that's the point where your model becomes self-aware :)
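One way to act on these suggestions is to repeat the split with several seeds and ratios and check whether the gap persists. A quick sketch, assuming the same index-based split as above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(6700)
for seed in (0, 1, 2):
    for test_size in (0.2, 0.5):
        # shuffle=True is the default, so each seed produces a different random split
        train_idx, val_idx = train_test_split(indices, test_size=test_size,
                                              shuffle=True, random_state=seed)
        # retrain / re-evaluate with this split and compare the loss curves
        print(seed, test_size, len(train_idx), len(val_idx))
```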

1

Apprehensive_Air8919 OP t1_jacst55 wrote

omg... I think I found the bug. I had used the depth estimation (target) image as the input to the model in the validation loop...
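In other words, the ground-truth depth map was apparently passed to the model instead of the RGB image during validation. A simplified sketch of what such a bug looks like and the one-line fix, using dummy stand-ins rather than the actual objects:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins so the sketch runs; the real model/loader come from the training script
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
criterion = nn.MSELoss()
val_loader = DataLoader(TensorDataset(torch.randn(8, 3, 256, 256),
                                      torch.randn(8, 1, 256, 256)), batch_size=4)

model.eval()
val_loss = 0.0
with torch.no_grad():
    for image, depth in val_loader:
        # Bug described above: the depth target was fed to the model
        # pred = model(depth)
        pred = model(image)               # fix: the model should only see the RGB input
        val_loss += criterion(pred, depth).item()
print(val_loss / len(val_loader))
```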

2

Apprehensive_Air8919 OP t1_jackmpu wrote

I just did a run with test_size = 0.5. The same thing happened. Wtf is going on :/

1

alam-ai t1_ja8czf5 wrote

Maybe it only applies dropout regularization during training and not during validation? Without dropout the model does better, sort of like how you see better with both eyes open than with either eye individually.

Also, you could just swap your training and validation sets and rerun the experiment to check that the actual data in the splits isn't somehow the issue.
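For reference, dropout is only switched off when the module is explicitly put in eval mode, so training and validation forward passes behave differently. A toy sketch, assuming PyTorch and a hypothetical module:

```python
import torch
import torch.nn as nn

# Toy module with dropout, just to show the train/eval difference
model = nn.Sequential(nn.Linear(16, 16), nn.Dropout(p=0.1), nn.Linear(16, 1))
x = torch.randn(4, 16)

model.train()             # dropout active: repeated forward passes on the same input differ
out_train = model(x)

model.eval()              # dropout disabled: deterministic forward pass, as in validation
with torch.no_grad():
    out_val = model(x)
```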

2

Apprehensive_Air8919 OP t1_ja94rat wrote

Good analogy! Yes, I use model.eval(), so dropout is disabled during the forward pass on the validation set.

1

yannbouteiller t1_ja8cd7n wrote

That is pretty strange indeed. Perhaps this would be a magical effect of dropout ?

1

Oceanboi t1_ja8egtc wrote

It could be too much dropout. But also, how large is your test set relative to your training set, and are you leaking any information from one into the other?

1

Apprehensive_Air8919 OP t1_ja8mxcj wrote

The test split is 20% of the data and dropout is 10%. I'm not sure how I could be leaking information.

1