bjergerk1ng t1_jakszgr wrote on March 2, 2023 at 3:20 AM

Is it possible that they also switched from the non-Chinchilla-optimal davinci to a Chinchilla-optimal model for ChatGPT? That would be at least 4x smaller.
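A rough back-of-the-envelope version of the size argument follows. It rests on several assumptions. First, "davinci" is taken to be the original 175B GPT-3 model trained on ~300B tokens. Second, training compute uses the usual C ≈ 6·N·D approximation. Third, it applies the Chinchilla rule of thumb of ~20 training tokens per parameter. Finally, it supposes, hypothetically, that ChatGPT's base model got the same compute budget as davinci. OpenAI has not published any of this for ChatGPT.

```python
import math

# Assumption: davinci = the 175B GPT-3 model, trained on ~300B tokens.
DAVINCI_PARAMS = 175e9
DAVINCI_TOKENS = 300e9

# Assumption: Chinchilla rule of thumb, ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: C ~= 6 * N * D FLOPs."""
    return 6 * params * tokens


def chinchilla_optimal_params(flops: float, tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Solve C = 6 * N * (k * N) for N, giving N = sqrt(C / (6 * k))."""
    return math.sqrt(flops / (6 * tokens_per_param))


# Hypothetical: spend davinci's training budget on a compute-optimal model instead.
budget = training_flops(DAVINCI_PARAMS, DAVINCI_TOKENS)
n_opt = chinchilla_optimal_params(budget)
d_opt = TOKENS_PER_PARAM * n_opt
ratio = DAVINCI_PARAMS / n_opt

print(f"compute budget: {budget:.2e} FLOPs")
print(f"compute-optimal size: {n_opt / 1e9:.0f}B params on {d_opt / 1e9:.0f}B tokens")
print(f"size ratio vs davinci: {ratio:.1f}x")
```

Under these assumptions the compute-matched optimum comes out around 51B parameters trained on roughly 1T tokens, about 3.4x smaller than davinci. That is the same order as the 4x figure. For comparison, Chinchilla (70B) was itself 4x smaller than Gopher (280B) at the same compute budget. A different training budget for ChatGPT would shift these numbers.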

LetterRip t1_jal4y8i wrote on March 2, 2023 at 5:05 AM

Certainly, that is also a possibility. Or they might have used teacher-student distillation.
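For readers unfamiliar with the term, here is a minimal sketch of the classic teacher-student distillation loss (the soft-target formulation from Hinton et al.). It is not a claim about what OpenAI actually did. The student is trained to match the teacher's temperature-softened output distribution, mixed with ordinary cross-entropy on the true labels. For a language model the same loss would be applied per token over the vocabulary. The function names, `temperature=2.0` and `alpha=0.5` are illustrative choices, not values from any real system.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis (max-subtracted for stability)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Illustrative soft-target distillation loss (hypothetical helper).

    alpha weights KL(teacher || student) on softened distributions, scaled by
    T^2 to keep gradient magnitudes comparable; (1 - alpha) weights hard-label
    cross-entropy at temperature 1.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence per example, averaged over the batch.
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean()
    # Cross-entropy of the student's unsoftened predictions against the true labels.
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return alpha * temperature**2 * kl + (1 - alpha) * ce


# Toy usage: a batch of 4 examples over 10 classes with random logits.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 10))
teacher = rng.normal(size=(4, 10))
labels = np.array([1, 3, 5, 7])
print(distillation_loss(student, teacher, labels))
```

In practice the teacher is a large trained model and the student is a smaller one. Only the student's parameters are updated to minimize this loss.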

[deleted]

I’d say we need an /r/VXJunkies equivalent for statistical learning theory, but the real deal is close enough.

[deleted]