Submitted by Quiet-Investment-734 t3_10ek5oh in deeplearning
trajo123 t1_j4rzvz3 wrote
What do you mean by "dynamically changing the structure"? Do the previous classes remain?
One solution is to treat this as a transfer learning problem: whenever the set of classes changes, drop the last layer and re-train the network with a new head for the new set of classes.
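A minimal sketch of that approach in PyTorch (the ResNet-18 backbone, class count, and learning rate are just illustrative placeholders, not details from the thread):

```python
import torch
import torch.nn as nn
from torchvision import models

N_new = 11  # hypothetical new number of classes

model = models.resnet18()  # stand-in for your already-trained network
model.fc = nn.Linear(model.fc.in_features, N_new)  # replace the final layer

# Optionally freeze everything except the new head (pure transfer learning):
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc.")

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
# ...then run the usual training loop on data labelled with the new classes.
```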
ed3203 t1_j4sim4z wrote
Depending on the size of the training data and the network, it may be better to just retrain the whole thing from scratch.
BrotherAmazing t1_j4tjnj1 wrote
I would like to see what happens if you train an N-class classifier with a final FC output layer of size (N+M) x 1, where you simply pretend there are M "unknown" classes with no training examples. Those M components are always 0 for your initial training set, and you always make predictions by re-normalizing/conditioning on the fact that those elements are 0.
Now you add a new class using the spare "capacity" in that last layer and resume training from where you left off, without modifying the architecture. Some data now have non-zero labels for the (N+1)st class, and you re-normalize predictions by conditioning on only the last M-1 classes being 0 instead of M.
Then see how training that starts from this pre-trained N-class network progresses toward becoming an (N+1)-class classifier, compared to the baseline of just starting over from scratch, and whether it saves compute time on certain problems while being just as accurate in the end (or not!).
IDK how practical or important this would really be (probably not much!) even if it did lead to computational savings, but would be a fun little nerdy study.
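A rough sketch of what that reserved-capacity head could look like in PyTorch (all sizes and names here are made up for illustration): masking the unused logits to -inf before the softmax is equivalent to re-normalizing the prediction conditioned on those classes having probability 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, M = 10, 5                    # illustrative: 10 real classes, 5 spare slots
head = nn.Linear(512, N + M)    # final FC layer with M reserved outputs
n_active = N                    # bump to N + 1 when a new class is added

def masked_log_probs(features, n_active):
    logits = head(features)
    mask = torch.full_like(logits, float("-inf"))
    mask[:, :n_active] = 0.0    # keep active classes, zero out the rest
    return F.log_softmax(logits + mask, dim=-1)

# Training on the original N classes:
#   loss = F.nll_loss(masked_log_probs(feats, N), targets)
# After adding a class, set n_active = N + 1 and keep training from the
# current weights; compare against retraining from scratch.
```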
Quiet-Investment-734 OP t1_j4w3eby wrote
This is exactly what I was thinking of doing, but I wanted to know if there are any more efficient methods of achieving the same thing.
WinterExtreme9316 t1_j4skh3m wrote
Why? If you're just adding a category, why not use what you've got and just train the last layer? You mean in case the new category has some unique low-level feature that the early layers of the network need to extract?
ed3203 t1_j4slyft wrote
Yes, you may arrive at a different local minimum which could be more performant; you give the model more freedom to explore. OP gave no context. If it's a huge transformer model, for instance, that would be impractical to retrain, then sure, use the model as is with a different final classification layer.
WinterExtreme9316 t1_j4sl2cg wrote
Also, since the last layer is likely a dense/fully connected layer, you should be able to start out with the original weights (padded for the new output depth).
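A small sketch of that padding idea in PyTorch (layer sizes are placeholders): copy the trained rows into a wider head so the old classes start from their learned weights, while only the new row keeps a fresh init.

```python
import torch
import torch.nn as nn

old_head = nn.Linear(512, 10)   # trained N-class layer (N = 10 here)
new_head = nn.Linear(512, 11)   # new layer with one extra output

with torch.no_grad():
    new_head.weight[:10] = old_head.weight  # copy old weights
    new_head.bias[:10] = old_head.bias      # copy old biases
    # row 10 keeps its default random init for the new class

# swap new_head into the model and resume training
```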
Quiet-Investment-734 OP t1_j4w3ujs wrote
Yes, the previous classes do remain.
The transfer learning solution seems like a decent option; I'll try it out. Thank you!