_Arsenie_Boca_ t1_j72g4g4 wrote on February 3, 2023 at 4:23 PM

Reply to comment by alpha-meta in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta

Im not sure if they vary the sampling hyperparemeters. The point is that langauge modelling objectives are to some degree ill-posed because we calculate the loss on intermediate results rather than the final output that we care about.