Submitted by Apprehensive_Air8919 t3_11dfgfm in deeplearning

I'm currently working with the transformer architecture for depth estimation. My dataset is 6,700 images of dimensions 3x256x256. I've run into a weird thing: my validation loss suddenly falls a lot around epochs 30-40, while my training loss barely moves. I can't figure out why this is happening. Hope you can help me! I use Adam with lr = 0.000001.

batch_size = 32

batch_size = 16
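For context, here is a minimal sketch of the training configuration described above, assuming a standard PyTorch setup. The tensors, loaders, and stand-in model below are placeholders chosen to make the sketch runnable; the actual vision transformer is the one linked below.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for the data: 3x256x256 RGB inputs, 1x256x256 depth targets
images = torch.randn(64, 3, 256, 256)
depths = torch.randn(64, 1, 256, 256)
dataset = TensorDataset(images, depths)

# Placeholder module so the sketch runs; the real model is the ViT linked below
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

train_loader = DataLoader(dataset, batch_size=32, shuffle=True)   # training batch size
val_loader = DataLoader(dataset, batch_size=16, shuffle=False)    # validation batch size

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.000001)     # the very low lr mentioned above
```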

The code for the vision transformer is here.

https://stackoverflow.com/questions/75582628/why-does-my-validation-loss-suddenly-fall-dramatically-while-my-training-loss-do

1

Comments


trajo123 t1_ja8cyw2 wrote

How is your loss defined? How is your validation set created? Does it happen for any test/validation split?

2

Apprehensive_Air8919 OP t1_ja96vdu wrote

nn.MSELoss(). I used sklearn's train_test_split() with test_size = 0.2. It is consistent across every split I've seen. The weird thing is that it only happens when I run a very low lr.
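A rough illustration of the split and loss described here, using an index-based split over the 6,700 samples (hypothetical variable names, not the actual code):

```python
import numpy as np
from sklearn.model_selection import train_test_split
import torch.nn as nn

# Split indices rather than the images themselves
indices = np.arange(6700)
train_idx, val_idx = train_test_split(indices, test_size=0.2, random_state=0)

criterion = nn.MSELoss()                     # pixel-wise MSE on the predicted depth map
print(len(train_idx), len(val_idx))          # 5360 train / 1340 validation
```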

1

trajo123 t1_ja9aghn wrote

Very strange.

Are you sure your dataset is shuffled before the split? Have you tried different random seeds or different split ratios?

Or maybe there's a bug in how you calculate the loss, but that should affect the training set as well...

So my best guess is that you either don't have your data shuffled and the validation samples are "easier", or it's something more trivial, like a bug in the plotting code. Or maybe that's the point where your model becomes self-aware :)
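One way to act on these suggestions is to repeat the split with several seeds and ratios and check whether the gap persists. A quick sketch, assuming the same index-based split as above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(6700)
for seed in (0, 1, 2):
    for test_size in (0.2, 0.5):
        # shuffle=True is the default, so each seed produces a different random split
        train_idx, val_idx = train_test_split(indices, test_size=test_size,
                                              shuffle=True, random_state=seed)
        # retrain / re-evaluate with this split and compare the loss curves
        print(seed, test_size, len(train_idx), len(val_idx))
```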

1

Apprehensive_Air8919 OP t1_jacst55 wrote

omg... I think I found the bug. I had used the depth estimation (target) image as the input to the model in the validation loop...
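In other words, the ground-truth depth map was apparently passed to the model instead of the RGB image during validation. A simplified sketch of what such a bug looks like and the one-line fix, using dummy stand-ins rather than the actual objects:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins so the sketch runs; the real model/loader come from the training script
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
criterion = nn.MSELoss()
val_loader = DataLoader(TensorDataset(torch.randn(8, 3, 256, 256),
                                      torch.randn(8, 1, 256, 256)), batch_size=4)

model.eval()
val_loss = 0.0
with torch.no_grad():
    for image, depth in val_loader:
        # Bug described above: the depth target was fed to the model
        # pred = model(depth)
        pred = model(image)               # fix: the model should only see the RGB input
        val_loss += criterion(pred, depth).item()
print(val_loss / len(val_loader))
```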

2

Apprehensive_Air8919 OP t1_jackmpu wrote

I just did a run with test_size = 0.5. The same thing happened. Wtf is going on :/

1

alam-ai t1_ja8czf5 wrote

Maybe it only applies dropout regularization during training and not during validation? Without dropout the model does better, sort of like how you see better with both eyes open than with either eye individually.

Also, you could just swap your training and validation sets and rerun the experiment to check that the actual data in the splits isn't somehow the issue.
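For reference, dropout is only switched off when the module is explicitly put in eval mode, so training and validation forward passes behave differently. A toy sketch, assuming PyTorch and a hypothetical module:

```python
import torch
import torch.nn as nn

# Toy module with dropout, just to show the train/eval difference
model = nn.Sequential(nn.Linear(16, 16), nn.Dropout(p=0.1), nn.Linear(16, 1))
x = torch.randn(4, 16)

model.train()             # dropout active: repeated forward passes on the same input differ
out_train = model(x)

model.eval()              # dropout disabled: deterministic forward pass, as in validation
with torch.no_grad():
    out_val = model(x)
```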

2

Apprehensive_Air8919 OP t1_ja94rat wrote

Good analogy! Yes, I use model.eval(), so dropout is disabled during the forward pass on the validation set.

1

yannbouteiller t1_ja8cd7n wrote

That is pretty strange indeed. Perhaps this would be a magical effect of dropout ?

1

Oceanboi t1_ja8egtc wrote

It could be too much dropout. But also, how large is your test set relative to your training set, and are you leaking any information from one into the other?

1

Apprehensive_Air8919 OP t1_ja8mxcj wrote

The test split is 20% of the data and dropout is 10%. I'm not sure how I could be leaking information.

1