EthansWay007 t1_j1w05nk wrote on December 27, 2022 at 8:37 PM

Reply to comment by ktpr in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_

I’m curious, how do they use the data from people asking it questions to improve it? Does it flag questions it couldn’t answer so the team can update it?

Nextil t1_j1zqxp9 wrote on December 28, 2022 at 4:47 PM

You can rate the responses up or down and provide an "ideal" response.

[deleted] t1_j2305i1 wrote on December 29, 2022 at 7:27 AM

[deleted]

gelukuMLG t1_j23znll wrote on December 29, 2022 at 2:24 PM

I think it saves the highly rated responses and feeds them into a dataset, then uses reinforcement learning, giving those responses a positive reward.
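
To make that idea concrete, here is a rough sketch of how thumbs up/down ratings could be turned into a reward dataset and used for a REINFORCE-style update. This is a toy illustration under my own assumptions, not how OpenAI actually does it: the `Feedback` record, `to_reward_dataset`, and `reinforce_step` are hypothetical names, the +1/-1 rewards are made up, and the "policy" is just a table of scores over two fixed candidate answers rather than a language model. As far as has been publicly described, RLHF trains a separate reward model on human preference comparisons rather than rewarding rated responses directly.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Hypothetical record of one user rating (not a real API)."""
    prompt: str
    response: str
    thumbs_up: bool
    ideal_response: Optional[str] = None  # optional user-written "ideal" answer


def to_reward_dataset(feedback_log):
    """Turn raw ratings into (prompt, response, reward) triples."""
    dataset = []
    for fb in feedback_log:
        reward = 1.0 if fb.thumbs_up else -1.0  # assumed reward values
        dataset.append((fb.prompt, fb.response, reward))
        # Assumption: a user-supplied "ideal" response counts as a positive example.
        if fb.ideal_response:
            dataset.append((fb.prompt, fb.ideal_response, 1.0))
    return dataset


def softmax(logits):
    """Convert a dict of scores into a dict of probabilities."""
    top = max(logits.values())
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def reinforce_step(logits, response, reward, lr=0.5):
    """One REINFORCE-style update on a toy policy over fixed candidate responses.

    The gradient of log pi(response) with respect to logit k is
    1[k == response] - pi(k), scaled here by the reward and learning rate.
    """
    probs = softmax(logits)
    return {
        k: v + lr * reward * ((1.0 if k == response else 0.0) - probs[k])
        for k, v in logits.items()
    }


# Toy run: one upvoted answer, one downvoted answer with a suggested fix.
log = [
    Feedback("2+2?", "4", thumbs_up=True),
    Feedback("2+2?", "5", thumbs_up=False, ideal_response="4"),
]
dataset = to_reward_dataset(log)

logits = {"4": 0.0, "5": 0.0}  # toy policy starts out indifferent
for _prompt, response, reward in dataset:
    logits = reinforce_step(logits, response, reward)
probs = softmax(logits)
```

After the three updates the toy policy puts more probability on the upvoted answer than on the downvoted one, which is the "positive reward" effect described above in miniature.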