bjergerk1ng t1_jakszgr wrote

Is it possible that they also switched from the non-Chinchilla-optimal davinci to a Chinchilla-optimal ChatGPT? That model would be at least 4x smaller.
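
A rough back-of-envelope check of the size claim, as a sketch under assumptions the thread does not state: davinci is taken to be the 175B-parameter GPT-3 trained on about 300B tokens, training compute is approximated as C ≈ 6·N·D FLOPs, and "Chinchilla-optimal" is read as roughly 20 training tokens per parameter. Holding davinci's compute budget fixed and solving for the compute-optimal parameter count:

```python
import math

# Assumed figures (not from the thread): GPT-3 davinci at 175B parameters,
# trained on roughly 300B tokens.
DAVINCI_PARAMS = 175e9
DAVINCI_TOKENS = 300e9

# Rule-of-thumb assumptions: training compute C ~= 6 * N * D FLOPs, and a
# compute-optimal ("Chinchilla-optimal") run uses ~20 tokens per parameter.
FLOPS_PER_PARAM_TOKEN = 6
TOKENS_PER_PARAM = 20


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs using C ~= 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def chinchilla_optimal_params(flops: float, tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Solve C = 6 * N * (r * N) for N, the compute-optimal parameter count."""
    return math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))


budget = training_flops(DAVINCI_PARAMS, DAVINCI_TOKENS)
optimal_n = chinchilla_optimal_params(budget)
ratio = DAVINCI_PARAMS / optimal_n

print(f"davinci budget: {budget:.2e} FLOPs")
print(f"compute-optimal size: {optimal_n / 1e9:.0f}B params on "
      f"{TOKENS_PER_PARAM * optimal_n / 1e12:.2f}T tokens")
print(f"size ratio: {ratio:.1f}x")
```

Under these assumptions the script prints a budget of about 3.15e+23 FLOPs, a compute-optimal size of about 51B parameters on about 1.02T tokens, and a ratio of about 3.4x. That is a bit under 4x, so the "at least 4x" figure may rest on different assumptions, for example the 4x gap between Gopher (280B) and Chinchilla (70B), or a model trained on more tokens than the compute-optimal amount.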

LetterRip t1_jal4y8i wrote

Certainly, that is also a possibility. Or they might have used teacher-student distillation.
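
For readers unfamiliar with the term, here is a minimal sketch of one common form of teacher-student distillation, the softened-softmax loss of Hinton et al.: the smaller student is trained to match the larger teacher's output distribution at a raised temperature T, with the KL divergence scaled by T² to keep gradient magnitudes comparable across temperatures. The thread does not say whether or how OpenAI used distillation; the function names, the temperature of 2.0, and the example logits are illustrative assumptions. In practice this term is usually mixed with an ordinary cross-entropy loss on the ground-truth labels.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    Hypothetical helper for illustration; the temperature default is an assumption.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, teacher))          # identical logits -> 0.0
print(distillation_loss([1.0, 1.0, 1.0], teacher))  # mismatched logits -> positive loss
```

The loss is zero only when the student reproduces the teacher's softened distribution, which is what lets a smaller model inherit the teacher's behavior rather than just its top-1 answers.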

[deleted] t1_jamt0wc wrote

[deleted]

Pikalima t1_janc14v wrote

I’d say we need an /r/VXJunkies equivalent for statistical learning theory, but the real deal is close enough.