Submitted by austintackaberry t3_120usfk in MachineLearning
gamerx88 t1_jdmndip wrote
Food for thought. Is this really surprising, considering that the InstructGPT paper in early 2022 already showed that even a 1.3B model after RLHF could beat a much larger 175B model?
I guess what this shows is that it's the data that matters, rather than SFT vs. RLHF. I'm wondering whether any ablation studies have been done here.