Submitted by _underlines_ t3_zstequ in MachineLearning
EthansWay007 t1_j1w05nk wrote
Reply to comment by ktpr in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_
I’m curious: how do they use the data from people asking it questions to improve it? Does it flag questions it couldn’t answer so the team can update it?
Nextil t1_j1zqxp9 wrote
You can rate the responses up or down and provide an "ideal" response.
gelukuMLG t1_j23znll wrote
I think it saves the highly rated responses, feeds them into a dataset, and then uses reinforcement learning, giving those responses a positive reward.
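To make that guess concrete, here is a minimal Python sketch of how rated responses could be turned into training data. Everything in it is an assumption for illustration: the `Feedback` record, the `build_datasets` helper, and the +1/-1 reward values are hypothetical, and none of this is confirmed to be how ChatGPT's feedback is actually processed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """One hypothetical user rating of a model response."""
    prompt: str
    response: str
    rating: int  # +1 for thumbs up, -1 for thumbs down (assumed encoding)
    ideal_response: Optional[str] = None  # user-supplied "ideal" answer, if any


def build_datasets(feedback):
    """Split feedback into reward-labelled examples and demonstrations."""
    reward_examples = []  # (prompt, response, reward) for an RL-style signal
    demonstrations = []   # (prompt, ideal_response) for supervised fine-tuning
    for fb in feedback:
        reward_examples.append((fb.prompt, fb.response, float(fb.rating)))
        if fb.ideal_response:
            demonstrations.append((fb.prompt, fb.ideal_response))
    return reward_examples, demonstrations


logs = [
    Feedback("What is 2+2?", "4", +1),
    Feedback("Capital of France?", "Lyon", -1, ideal_response="Paris"),
]
reward_examples, demonstrations = build_datasets(logs)
```

In this sketch the upvoted answer ends up with a positive reward, the downvoted one with a negative reward, and the user's "ideal" response is kept separately as a demonstration.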