Viewing a single comment thread.

londons_explorer t1_j1a3zrf wrote

I've got a feeling ChatGPT benefits massively from its human-curated fine-tuning feedback loop.

That's hard to reproduce without tens of thousands of man-hours spent upvoting/downvoting/editing the bot's responses.

178

satireplusplus t1_j1afqub wrote

This ^^

Compared to GPT-3, ChatGPT is a huge step up. There is basically an entirely new reward network, as large as the LM itself, that is able to judge the quality of the answers. See https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagram.svg

That said, I'd welcome a community effort to build an open-source version of this.
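For anyone curious what that reward network looks like conceptually, here's a toy sketch of the idea (my own illustration based on the diagram, not OpenAI's actual code): a small transformer scores a tokenized prompt+response pair, and it's trained so that the human-preferred response scores higher than the rejected one.

```python
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    def __init__(self, vocab_size=50257, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.score_head = nn.Linear(d_model, 1)  # hidden state -> scalar quality score

    def forward(self, token_ids):                        # token_ids: (batch, seq_len)
        h = self.encoder(self.embed(token_ids))
        return self.score_head(h[:, -1, :]).squeeze(-1)  # read the score off the last token

# Pairwise preference loss: the human-preferred response should score higher
# than the rejected one (Bradley-Terry style comparison).
def preference_loss(rm, preferred_ids, rejected_ids):
    return -torch.log(torch.sigmoid(rm(preferred_ids) - rm(rejected_ids))).mean()
```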

81

sanman t1_j1b9tun wrote

Do we know when ChatGPT itself will cease to be free, or cease to be available to the general public? I kind of like using this thing - I find it really convenient, so I'd like to know when I'm going to lose access to it.

10

amhotw t1_j1bnmw5 wrote

I mean it is pretty cheap. You probably couldn't spend more than $10/month if it's priced similarly to GPT-3.

11

ktpr t1_j1cb1nd wrote

I suspect they’ll move towards paid tiers when the popularity goes down. Right now they’re getting a ton of interesting and rich data for free from going viral. But when that eventually fades they’ll want to continue generating some kind of value from it.

7

EthansWay007 t1_j1w05nk wrote

I’m curious, how do they use the data from it being asked questions to improve it? Does it flag questions it couldn’t answer so the team can update it?

1

Nextil t1_j1zqxp9 wrote

You can rate the responses up or down and provide an "ideal" response.

2

gelukuMLG t1_j23znll wrote

I think it saves the highly rated responses and feeds them into a dataset, then uses reinforcement learning to give them a positive reward.
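Roughly something like this, I'd guess (just my own toy sketch of that idea, not their actual pipeline): keep only the well-rated responses and weight an ordinary fine-tuning loss by the rating.

```python
import torch
import torch.nn.functional as F

def reward_weighted_finetune_step(lm, optimizer, token_ids, rating, min_rating=4):
    """One toy update: skip low-rated samples, weight the LM loss by the rating.
    Assumes lm(input_ids) returns next-token logits of shape (batch, seq, vocab)."""
    if rating < min_rating:          # only keep highly rated responses
        return
    logits = lm(token_ids[:, :-1])
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          token_ids[:, 1:].reshape(-1))
    loss = rating * nll              # higher rating => stronger positive reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```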

1

f10101 t1_j1cm39r wrote

Step 1 definitely explains why its responses often feel so similar to SEO waffle-farm content. I had been wondering where that aspect was coming from.

3

maxToTheJ t1_j1c6ut5 wrote

Yup. The training techniques have got a lot better since that first GPT-3 paper.

0

pilibitti t1_j1ai82j wrote

it can be crowdsourced once we have something up and running. this stuff will be commoditized eventually.

19

IWantAGrapeInMyMouth t1_j1b13kx wrote

It really does, but there’s a point in time where OpenAI is going to want to cash in. Virtually all of their products could benefit from reinforcement learning to keep improving after the initial training, but we’ve seen how GPT-3 and DALL-E 2 were ultimately shipped as a sort of finished product that gets updates like any shipped app might, with costs attached. I don’t see why ChatGPT will be any different after x amount of time, unless Stable Diffusion is really eating into their DALL-E 2 profitability and they need to find new ways of monetization that don’t charge the user of ChatGPT.

9

sanman t1_j1bacsy wrote

Well, remember when YouTube was totally free without any ads whatsoever? And of course we all wondered how they were going to continue offering their service for free. Then one day the ads crept in, and we knew.

I'm thinking OpenAI hasn't made this thing free out of pure generosity. They're using us as free beta-testers to shake down the product for them, so they can iron out the kinks and bugs. Once that process has run its course, they'll just cut off our access and only allow paying customers to use it.

13

jrkirby t1_j1bnhkx wrote

Why do you think they'll make us pay, when they could instead use the treasure trove of personal information to sell to advertisers and train the AI to subliminally (or explicitly) advertise to us?

6

sanman t1_j1bqgvb wrote

I wonder if there'll be a new budding industry for SEO with GPT, just like there is for SEO with Google search? I'm not sure how that would work though, since it might be harder to integrate spam/ads into GPT responses.

2

KimmiG1 t1_j1bf7bb wrote

I'm curious whether they'll keep a free version that sneaks in ads as natural conversation where it fits.

3

slashtom t1_j1bf92g wrote

Well, they're also getting feedback and the model is only being improved by human interaction. I'd bet they still keep a free tier in order to get access to a broader pool and charge companies/people a subscription fee if they want unlimited access or something.

1

lucidrage t1_j1c5jxp wrote

Imagine if ChatGPT was ad-supported... You just invented a new business model!

1

harharveryfunny t1_j1d5m40 wrote

Yes - not sure if everyone understands this. ChatGPT took GPT-3.5 as a starting point, but then added a reinforcement learning stage on top of that which has aligned its output to what humans want from a question-answering chatbot. It's basically the next-generation InstructGPT.

https://arxiv.org/abs/2203.02155

From a quick scan of the Bloomz link, that seems to be just an LLM (i.e. more like GPT-3), not an instruction-tuned/human-aligned chatbot. There's a huge qualitative difference.
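The RL stage in that paper optimizes the reward-model score while penalizing drift from the supervised model; it has roughly this shape (my paraphrase of the objective, not their exact implementation):

```python
import torch

def shaped_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.02):
    """rm_score: scalar reward-model score for the sampled response.
    policy_logprobs / ref_logprobs: per-token log-probs of that response under
    the RL policy and the frozen supervised (reference) model."""
    kl_estimate = (policy_logprobs - ref_logprobs).sum()  # sample-based KL estimate
    return rm_score - beta * kl_estimate                   # quantity the RL step maximizes
```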

2

the-z t1_j1b8h6i wrote

To be fair, that's roughly how natural minds are trained, too.

1

meyerhot t1_j1cg6jj wrote

Anyone have any ideas about how they assign rewards? Somehow take the sum of the probabilities (from the logits) of each token in the sentence and multiply that by the reward?
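Something like this, maybe (just my guess at the standard REINFORCE-style form, summing log-probs rather than raw probabilities; nothing confirmed by OpenAI):

```python
import torch

def reinforce_loss(token_logprobs, reward, baseline=0.0):
    # token_logprobs: log-probs of the sampled tokens under the policy, shape (seq_len,)
    # Minimizing this pushes up the probability of responses with above-baseline reward.
    return -(reward - baseline) * token_logprobs.sum()
```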

1

maizeq t1_j1cj523 wrote

Tens of thousands of hours split across thousands of people doesn't seem too significant.

1

x246ab t1_j1utz7f wrote

Very true, but it only needs one good data-dump hack.

1