magpiesonskates t1_j1u1oag wrote on December 27, 2022 at 11:38 AM
Reply to [D] Has any research been done to counteract the fact that each training datapoint "pulls the model in a different direction", partly undoing learning until shared features emerge? by derpderp3200
This is only true if you use a batch size of 1. With randomly sampled batches, each update is the average of the per-example gradients, so the effect you speak of should largely average out.
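To make the averaging point concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the toy data (y = 3x + noise), the single-weight model, and the helper names `example_grad`, `batch_grad`, and `gradient_spread` are all hypothetical. It measures how much the gradient estimate varies from one random batch to the next for batch sizes 1 and 32.

```python
import random
import statistics

random.seed(0)

# Toy data (hypothetical): y = 3*x + noise, fitted with a single weight w.
TRUE_W = 3.0
data = []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    data.append((x, TRUE_W * x + random.gauss(0.0, 1.0)))

def example_grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x

def batch_grad(w, batch):
    """Mini-batch gradient: the mean of the per-example gradients."""
    return sum(example_grad(w, x, y) for x, y in batch) / len(batch)

def gradient_spread(w, batch_size, trials=500):
    """Standard deviation of the batch gradient over randomly sampled batches."""
    grads = [batch_grad(w, random.sample(data, batch_size)) for _ in range(trials)]
    return statistics.stdev(grads)

w = 0.0  # an arbitrary starting weight, chosen only for illustration
full_grad = batch_grad(w, data)  # gradient over the whole dataset
spread = {b: gradient_spread(w, b) for b in (1, 32)}

print("full-dataset gradient:", full_grad)
print("spread of the gradient estimate by batch size:", spread)
```

In this setup, single examples give gradients that scatter widely around the full-dataset gradient, while batches of 32 should land much closer to it. That is the sense in which the individual "pulls" cancel and the shared direction is what remains.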