Viewing a single comment thread. View all comments

gamerx88 t1_jdmndip wrote

Food for thought. Is this really surprising considering that the InstructGPT paper in early 2022, already showed how even a 1.3B model after RLHF could beat a much larger 175B model?

I guess what this shows is that it's the data that matters rather than SFT vs RLHF. Wondering if any ablation studies have been done here.

2