blarg7459 t1_jbetts9 wrote

Doesn't that mean that if you include inference costs, and the model will be used extensively, you may actually get much better bang for your buck by training on far more tokens than is Chinchilla-optimal?
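
Here is a rough sketch of that tradeoff, not a definitive calculation. It assumes the parametric loss fit reported in the Chinchilla paper (constants approximate), the common rules of thumb of about 6·N·D FLOPs for training and about 2·N FLOPs per inference token, and an arbitrary target loss of 2.0. The function names `tokens_needed` and `cheapest_model` are made up for illustration.

```python
# Approximate constants of the parametric fit L(N, D) = E + A/N^alpha + B/D^beta
# from the Chinchilla paper (Hoffmann et al., 2022). Treat them as rough values.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def tokens_needed(n_params, target_loss):
    """Training tokens D a model with n_params needs to reach target_loss.

    Returns None if the model is too small to ever reach that loss.
    """
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return None
    return (B / gap) ** (1 / BETA)


def cheapest_model(target_loss, inference_tokens):
    """Scan model sizes from 1e8 to 1e12 parameters and return the
    (total_flops, n_params, train_tokens) with the lowest lifetime cost.

    Assumed cost model: 6*N*D FLOPs to train, 2*N FLOPs per served token.
    """
    best = None
    for i in range(401):
        n = 10 ** (8 + i * 0.01)
        d = tokens_needed(n, target_loss)
        if d is None:
            continue
        cost = 6 * n * d + 2 * n * inference_tokens
        if best is None or cost < best[0]:
            best = (cost, n, d)
    return best


if __name__ == "__main__":
    # The inference volumes below are arbitrary illustrative values.
    for served in (0, 1e12, 1e13):
        cost, n, d = cheapest_model(2.0, served)
        print(f"served={served:.0e}  params={n:.2e}  "
              f"train_tokens={d:.2e}  total_flops={cost:.2e}")
```

Under these assumptions, the more tokens you expect to serve, the smaller the cheapest model gets and the more tokens it has to be trained on to hit the same loss, which is the "train past Chinchilla-optimal" effect.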
