
sassydodo t1_jebs78j wrote

Something tells me it's not ChatGPT data. It's just a very large dataset, and it just so happens that we as humanity don't have any alternative data to train on.

It's the same as saying that someone who builds wind turbines built them on another company's wind.

37

ttocs89 t1_jedn7p1 wrote

One of the current methods for training competing models is to have ChatGPT literally create prompt -> completion data sets. That's what was used for https://github.com/hpcaitech/ColossalAI: a model based on the LLaMA weights released by Facebook, then fine-tuned on ChatGPT (GPT-3.5) prompts + completions. So yes, there is a good chance that Google is literally using ChatGPT in the training loop.
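
For anyone wondering what a "prompt -> completion data set" looks like in practice, here is a rough sketch. It is not what ColossalAI or anyone else actually runs: `teacher_stub` is a made-up placeholder standing in for the call to ChatGPT, and the `prompt`/`completion` field names and the JSONL layout are assumptions chosen for illustration.

```python
import json
from typing import Callable, Dict, Iterable, List


def teacher_stub(prompt: str) -> str:
    """Hypothetical stand-in for a teacher model call (e.g. ChatGPT).

    A real pipeline would send the prompt to the teacher model here;
    this stub just echoes it so the example runs offline.
    """
    return f"[teacher answer to: {prompt}]"


def build_dataset(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Ask the teacher to complete each prompt and collect the pairs."""
    records = []
    for prompt in prompts:
        completion = teacher(prompt)
        records.append({"prompt": prompt, "completion": completion})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize the pairs as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Illustrative seed prompts (assumed, not taken from any real data set).
seed_prompts = [
    "Explain what a wind turbine does.",
    "Summarize how fine-tuning works.",
]

dataset = build_dataset(seed_prompts, teacher_stub)
print(to_jsonl(dataset))
```

The resulting file of pairs is what the student model (the LLaMA-based one, in the case above) would then be fine-tuned on, so the teacher's outputs end up as the training targets.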

5