Submitted by _underlines_ t3_zstequ in MachineLearning
londons_explorer t1_j1a3zrf wrote
I've got a feeling ChatGPT benefits massively from its human-curated fine-tuning feedback loop.
That's hard to reproduce without tens of thousands of man-hours upvoting/downvoting/editing the bot's responses.
satireplusplus t1_j1afqub wrote
This ^^
Compared to GPT3, ChatGPT is a huge step up. There is basically an entire new reward network, as large as the LM, that is able to judge the quality of the answers. See https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagram.svg
That said, I'd welcome a community effort to build an open source version of this.
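The reward-model training step shown in that diagram can be sketched with the pairwise ranking loss described in the InstructGPT paper. This is my own toy illustration, not OpenAI's code: the network scores two answers to the same prompt, and the loss pushes the human-preferred answer's score above the rejected one.

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss for a reward model:
    -log(sigmoid(r_chosen - r_rejected)).
    Minimizing it widens the score gap in favor of the
    human-preferred answer."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the preferred answer's score pulls ahead,
# and sits at log(2) when the model can't tell the two apart.
small_margin = reward_model_loss(1.0, 0.9)
large_margin = reward_model_loss(5.0, 0.9)
```

In the real system the scores come from a scalar head on top of a large transformer; the loss itself is this simple.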
sanman t1_j1b9tun wrote
Do we know when ChatGPT itself will cease to be free, or cease to be available to the general public? I kind of like using this thing - I find it really convenient, so I'd like to know when I'm going to lose access to it.
amhotw t1_j1bnmw5 wrote
I mean, it's pretty cheap. You probably couldn't spend more than $10/month if it's priced similarly to GPT-3.
ktpr t1_j1cb1nd wrote
I suspect they’ll move towards paid tiers when the popularity goes down. Right now they’re getting a ton of interesting and rich data for free from going viral. But when that eventually fades they’ll want to continue generating some kind of value from it.
EthansWay007 t1_j1w05nk wrote
I’m curious, how do they use the data from it being asked questions to improve it? Does it flag questions it couldn’t answer so the team can update it?
Nextil t1_j1zqxp9 wrote
You can rate the responses up or down and provide an "ideal" response.
[deleted] t1_j2305i1 wrote
[deleted]
gelukuMLG t1_j23znll wrote
I think it saves the highly rated responses into a dataset, then uses reinforcement learning, assigning a positive reward to them.
ibraheemMmoosa t1_j1bb29r wrote
Only the gods at OpenAI can know the answer to that.
f10101 t1_j1cm39r wrote
Step 1 definitely explains why its responses often feel so similar to SEO waffle-farm content. I had been wondering where that aspect was coming from.
macguyversmusic t1_j2cq1m6 wrote
Over 42 different transformers in cascade, I read...
maxToTheJ t1_j1c6ut5 wrote
Yup. The training techniques have got a lot better since that first GPT-3 paper.
pilibitti t1_j1ai82j wrote
it can be crowdsourced once we have something up and running. this stuff will be commoditized eventually.
IWantAGrapeInMyMouth t1_j1b13kx wrote
It really does, but there’s a point in time where OpenAI is going to want to cash in. Virtually all of their outputs could benefit from utilizing reinforcement learning to improve after the initial training, but we’ve seen how GPT-3 and DALL-E 2 ultimately got shipped as a sort of finished product that gets updates like any shipped app might, with costs attached. I don’t see why ChatGPT will be any different after x amount of time, unless Stable Diffusion is really eating their DALL-E 2 profitability and they need to find new ways of monetization that don’t charge the user.
sanman t1_j1bacsy wrote
Well, remember when Youtube was totally free without any ads whatsoever? And of course we all wondered how they were going to continue offering their service for free. Then one day the ads crept in, and we knew.
I'm thinking OpenAI hasn't made this thing free just for generosity. They're using us as free beta-testers to shake down the product for them, so that they can iron out the kinks and bugs. Once that process has run its course, they'll just cut off our access and only allow paying customers to use it.
jrkirby t1_j1bnhkx wrote
Why do you think they'll make us pay, when they could instead use the treasure trove of personal information to sell to advertisers and train the AI to subliminally (or explicitly) advertise to us?
sanman t1_j1bqgvb wrote
I wonder if there'll be a new budding industry for SEO with GPT, just like there is for SEO with Google search? I'm not sure how that would work though, since it might be harder to integrate spam/ads into GPT responses.
KimmiG1 t1_j1bf7bb wrote
I'm curious if they'll keep a free version that sneaks in ads as natural conversation where it fits.
slashtom t1_j1bf92g wrote
Well, they're also getting feedback and the model is only being improved by human interaction. I'd bet they still keep a free tier in order to get access to a broader pool and charge companies/people a subscription fee if they want unlimited access or something.
lucidrage t1_j1c5jxp wrote
Imagine if chatgpt was ad supported... You just invented a new business model!
harharveryfunny t1_j1d5m40 wrote
Yes - not sure if everyone understands this. ChatGPT took GPT-3.5 as a starting point, but then had a reinforcement learning stage on top of that which aligned its output to what humans want from a question-answering chatbot. It's basically the next-generation InstructGPT.
https://arxiv.org/abs/2203.02155
From a quick scan of the Bloomz link, that seems to be just an LLM (i.e. more like GPT-3), not an instruction/human aligned chat-bot. There's a huge qualitative difference.
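For context, the RL stage described in that paper uses PPO. The clipped surrogate objective at its core looks like this (a generic PPO sketch with made-up inputs, not OpenAI's implementation):

```python
import math

def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, eps: float = 0.2) -> float:
    """PPO's clipped surrogate objective for one token. The probability
    ratio between the updated and old policy is clamped to
    [1 - eps, 1 + eps], so a single high reward-model score can't drag
    the language model too far from where it started."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Take the more pessimistic of the raw and clipped estimates.
    return min(ratio * advantage, clipped * advantage)
```

The clipping (plus a KL penalty against the original model in InstructGPT) is what keeps the chatbot fluent instead of collapsing into reward-hacking gibberish.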
the-z t1_j1b8h6i wrote
To be fair, that's roughly how natural minds are trained, too.
meyerhot t1_j1cg6jj wrote
Anyone have any ideas about how they assigned rewards? Somehow take the sum of the prob(logits) from each token in the sentence and multiply that by the reward?
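One standard way to wire that up (a REINFORCE-style sketch of what the commenter describes, not necessarily what OpenAI does - InstructGPT uses PPO) is to weight the response's summed per-token log-probs by a single scalar reward for the whole response:

```python
import math

def reinforce_loss(token_logprobs: list, reward: float) -> float:
    """Policy-gradient estimate for one sampled response: the sequence
    log-prob (sum of per-token log-probs) scaled by the response-level
    reward. Minimizing this raises the probability of well-rewarded
    responses and lowers it for penalized ones."""
    return -sum(token_logprobs) * reward

# Hypothetical per-token log-probs for a 3-token response.
logps = [math.log(0.5), math.log(0.8), math.log(0.9)]
loss_good = reinforce_loss(logps, reward=1.0)   # positive reward
loss_bad = reinforce_loss(logps, reward=-1.0)   # negative reward
```

In practice each token position gets its own advantage estimate rather than one flat reward, but the log-prob-times-reward structure is the core idea.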
maizeq t1_j1cj523 wrote
10s of thousands of hours split across thousands of people does not seem too significant.
x246ab t1_j1utz7f wrote
Very true, but it only needs one good data dump hack