derpderp3200 OP t1_j1ufkob wrote on December 27, 2022 at 2:06 PM

Are there any articles or papers benchmarking this, or exploring more elaborate solutions than just batching?

HateRedditCantQuitit t1_j1v0fto wrote on December 27, 2022 at 4:41 PM

The whole SGD and optimizer field is kind of this. As a small example, think about how momentum interacts with the problem you're describing.
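
To make the momentum part concrete, here's a minimal sketch, assuming the classic heavy-ball update (`v = beta * v + g`, then `w = w - lr * v`). The function name `momentum_steps` and the toy gradient sequences are made up for illustration, and I'm assuming the "problem" is roughly gradients from different batches pulling in different directions. All it shows is that with momentum each step depends on a decayed sum of past gradients rather than on the current batch alone.

```python
def momentum_steps(grads, lr=0.1, beta=0.9, w=0.0):
    """Apply heavy-ball momentum updates to a scalar weight.

    grads: sequence of scalar gradients, one per (hypothetical) batch.
    Returns the weight after each step.
    """
    v = 0.0
    history = []
    for g in grads:
        v = beta * v + g   # velocity: decayed running sum of past gradients
        w = w - lr * v     # step along the velocity, not the raw gradient
        history.append(w)
    return history


# Hypothetical toy gradients: consecutive batches disagree in sign.
conflicting = [1.0, -1.0, 1.0, -1.0]
# Hypothetical toy gradients: consecutive batches agree.
consistent = [1.0, 1.0, 1.0, 1.0]

for name, grads in [("conflicting", conflicting), ("consistent", consistent)]:
    plain = momentum_steps(grads, beta=0.0)  # beta=0 reduces to plain SGD
    heavy = momentum_steps(grads, beta=0.9)
    print(name, "plain:", plain, "momentum:", heavy)
```

In this toy run, agreeing gradients compound under momentum (the weight travels further than with plain SGD), while disagreeing ones partly cancel inside the velocity. So what one batch does to the weights is no longer independent of the batches before it.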