derpderp3200 (OP) wrote:
Reply to comment by magpiesonskates in [D] Has any research been done to counteract the fact that each training datapoint "pulls the model in a different direction", partly undoing learning until shared features emerge? by derpderp3200
Are there any articles or papers benchmarking this, or exploring more elaborate solutions than just batching?
HateRedditCantQuitit wrote:
The whole SGD & optimizer field is kinda about this. For a small example, think about how momentum interacts with the problem you're describing.
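Here is a toy illustration of that interaction. It is a minimal sketch under stated assumptions, and none of it comes from the thread. There is one scalar weight and two datapoints with targets +1 and -1. The learning rate is 0.1. Momentum is written in exponential-moving-average form (v = beta*v + (1 - beta)*g) so that its step size matches plain SGD. The helper names `train` and `oscillation` are hypothetical. Plain per-sample SGD keeps getting dragged back and forth between the two targets. The momentum buffer averages the opposing gradients before they reach the weight.

```python
# Toy model (an assumption for illustration, not from the thread): one scalar
# weight w and two datapoints whose per-sample losses 0.5*(w - c)**2 pull w
# toward c = +1 and c = -1. The shared optimum is w = 0, but every single-sample
# step drags w toward that sample's own target.

def train(targets, steps, lr, beta=None):
    """Cycle through targets one sample at a time; return w after each step.

    beta=None  -> plain per-sample SGD.
    beta given -> SGD with EMA-style momentum: v = beta*v + (1 - beta)*g.
    """
    w, v = 0.0, 0.0
    history = []
    for t in range(steps):
        c = targets[t % len(targets)]
        g = w - c  # gradient of 0.5*(w - c)**2 with respect to w
        if beta is None:
            w -= lr * g
        else:
            v = beta * v + (1 - beta) * g  # running average cancels sign flips
            w -= lr * v
        history.append(w)
    return history


def oscillation(history, tail=10):
    """Largest |w| over the last few steps, i.e. how far w keeps swinging from 0."""
    return max(abs(w) for w in history[-tail:])


targets = [1.0, -1.0]
amp_plain = oscillation(train(targets, steps=2000, lr=0.1))
amp_momentum = oscillation(train(targets, steps=2000, lr=0.1, beta=0.9))
print(f"plain SGD swing:      {amp_plain:.5f}")
print(f"SGD + momentum swing: {amp_momentum:.5f}")
```

With these settings, plain SGD settles into a 2-cycle between about ±1/19 (roughly 0.053). The momentum run swings only about ±1/379 (roughly 0.003), because the moving average cancels most of the alternating per-sample pulls. Averaging both samples' gradients in one batch gives g = w exactly, so w converges to 0 with no swing at all. Momentum can be read as a softer, time-smeared version of that averaging. A 1-D quadratic is of course far from a real network, so treat this only as intuition.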