
adt t1_jbbzba8 wrote

There are a few that 'feel' that way. Try Megatron-11B (~200:1 tokens per parameter), based on RoBERTa (~6,198:1). Wayyyyy ahead of its time, and I've matched it against much larger models in some testing.

https://app.inferkit.com/demo

Here's the full table of Chinchilla-aligned comparisons:

https://lifearchitect.ai/models-table/


whata_wonderful_day t1_jbcxdwf wrote

Nice! How did you get access to Megatron-11B? I can't find it online anywhere.


__Maximum__ OP t1_jbdqy5c wrote

Thanks for the links. Looks like RoBERTa didn't gain much from the additional training, only minor improvements, but yeah, it was a tiny model. How was this not a good lesson? Why did people need Chinchilla? Maybe it's just that gathering lots of data comes cheap, so people collect as much as possible even though they know they'll do at most one epoch over it.
