LetterRip t1_jaj1kp3 wrote on March 1, 2023 at 8:04 PM

> I have no idea how OpenAI can make money on this.

Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.

A*.3/3 = 10% of the cost.

Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.

So we are talking it taking about 1% of the resources and a 10x price reduction - they should be 90% more profitable compared to when they introduced GPT-3.

edit - see MS DeepSpeed MII - showing a 40x per token cost reduction for Bloom-176B vs default implementation

https://github.com/microsoft/DeepSpeed-MII

Also there are additional ways to reduce cost not covered above - pruning, graph optimization, teacher student distillation. I think teacher student distillation is extremely likely given reports that it has difficulty with more complex prompts.

Thunderbird120 t1_jajok9y wrote on March 1, 2023 at 10:26 PM

I'm curious which memory efficient transformer variant they've figured out how to leverage at scale. They're obviously using one of them since they're offering models with 32k context but it's not clear which one.

lucidraisin t1_jakb7h4 wrote on March 2, 2023 at 1:08 AM

it is flash attention (Tri Dao et al)

Thunderbird120 t1_jakbyew wrote on March 2, 2023 at 1:14 AM

You're better qualified to know than nearly anyone who posts here, but is flash attention really all that's necessary to make that feasible?

lucidraisin t1_jakdtf7 wrote on March 2, 2023 at 1:27 AM

yes

edit: it was also used to train Llama. there is no reason not to use it at this point, for both training and fine-tuning / inference

fmai t1_jalcs0x wrote on March 2, 2023 at 6:29 AM

AFAIK, flash attention is just a very efficient implementation of attention, so still quadratic in the sequence length. Can this be a sustainable solution for when context windows go to 100s of thousands?

lucidraisin t1_jamtx7b wrote on March 2, 2023 at 3:46 PM

it cannot, the compute still scales quadratically although the memory bottleneck is now gone. however, i see everyone training at 8k or even 16k within two years, which is more than plenty for previously inaccessible problems. for context lengths at the next order of magnitude (say genomics at million basepairs), we will have to see if linear attention (rwkv) pans out, or if recurrent + memory architectures make a comeback.

LetterRip t1_janljeo wrote on March 2, 2023 at 6:49 PM

Ah, I'd not seen the Block Recurrent Transformers paper before, interesting.

visarga t1_jalg9iu wrote on March 2, 2023 at 7:11 AM

I think the main pain point was memory usage.

Dekans t1_jamokhr wrote on March 2, 2023 at 3:10 PM

> We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

...

> FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).

In the paper bold is done using the block-sparse version. The Path-X (16K length) is done using regular FlashAttention.

Hsemar t1_jalp8as wrote on March 2, 2023 at 9:12 AM

but does flash attention help with auto-regressive generation? My understanding was that it prevents materializing the large kv dot product during training. At inference (one token at a time) with kv caching this shouldn't be that relevant right?

[deleted] t1_jarikhz wrote on March 3, 2023 at 3:21 PM

[deleted]

minimaxir OP t1_jajcf4s wrote on March 1, 2023 at 9:10 PM

It's safe to assume that some of those techniques were already used in previous iterations of GPT-3/ChatGPT.

LetterRip t1_jajezib wrote on March 1, 2023 at 9:26 PM

June 11, 2020 is the date of the GPT-3 API was introduced. No int4 support and the Ampere architecture with int8 support had only been introduced weeks prior. So the pricing was set based on float16 architecture.

Memory efficient attention is from a few months ago.

ChatGPT was just introduced a few months ago.

The question was 'how OpenAI' could be making a profit, if they were making a profit on GPT-3 2020 pricing; then they should be making 90% more profit per token on the new pricing.

jinnyjuice t1_jalkbvu wrote on March 2, 2023 at 8:04 AM

How do we know these technical improvements result in 90% extra revenue? I feel I'm missing some link here.

[deleted] t1_jall6xi wrote on March 2, 2023 at 8:16 AM

[deleted]

Smallpaul t1_jam673c wrote on March 2, 2023 at 12:45 PM

I think you are using the word revenue when you mean profit.

LetterRip t1_jani50o wrote on March 2, 2023 at 6:23 PM

We don't know the supply demand curve, so we can't know for sure that the revenue increased.

andreichiffa t1_jajuk03 wrote on March 1, 2023 at 11:07 PM

That, and the fact that OpenAI/MS want to completely dominate LLM market, in the same way Microsoft dominated OS/browser market in the late 90s/early 2000s.

Smallpaul t1_jam6et8 wrote on March 2, 2023 at 12:47 PM

They’ll need a stronger story around lock-in if that’s their strategy. One way would be to add structured and unstructured data storage to the APIs.

bjergerk1ng t1_jakszgr wrote on March 2, 2023 at 3:20 AM

Is it possible that they also switched from non-chinchilla-optimal davinci to chinchilla-optimal chatgpt? That is at least 4x smaller

LetterRip t1_jal4y8i wrote on March 2, 2023 at 5:05 AM

Certainly that is also a possibility. Or they might have done teacher student distillation.

[deleted] t1_jamt0wc wrote on March 2, 2023 at 3:40 PM

[deleted]

Pikalima t1_janc14v wrote on March 2, 2023 at 5:43 PM

I’d say we need an /r/VXJunkies equivalent for statistical learning theory, but the real deal is close enough.

[deleted] t1_jarj0kn wrote on March 3, 2023 at 3:24 PM

[deleted]

cv4u t1_jakzhqj wrote on March 2, 2023 at 4:14 AM

LLMs can quantize to 8 bit or 4 bit?

LetterRip t1_jal4vgs wrote on March 2, 2023 at 5:04 AM

Yep, or a mix between the two.

GLM-130B quantized to int4, OPT and BLOOM int8,

https://arxiv.org/pdf/2210.02414.pdf

Often you'll want to keep the first and last layer as int8 and can do everything else int4. You can quantize based on the layers sensitivity, etc. I also (vaguely) recall a mix of 8bit for weights, and 4bits for biases (or vice versa?),

Here is a survey on quantization methods, for mixed int8/int4 see the section IV. ADVANCED CONCEPTS: QUANTIZATION BELOW 8 BITS

https://arxiv.org/pdf/2103.13630.pdf

Here is a talk on auto48 (automatic mixed int4/int8 quantization)

https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41611/

londons_explorer t1_jam6oyr wrote on March 2, 2023 at 12:49 PM

Aren't biases only a tiny tiny fraction of the total memory usage? Is it even worth trying to quantize them more than weights?

londons_explorer t1_jam6r8g wrote on March 2, 2023 at 12:50 PM

Don't you mean the other way around?

tomd_96 t1_jamp6kt wrote on March 2, 2023 at 3:14 PM

Where was this introduced?

CellWithoutCulture t1_javhjpc wrote on March 4, 2023 at 11:29 AM

I mean... why were they not doing this already? They would have to code it but it seems like low hanging fruit

> memory efficient attention. 10x-20x increase in batch size.

That seems large, which paper has that?

LetterRip t1_javpxbv wrote on March 4, 2023 at 1:07 PM

> I mean... why were they not doing this already? They would have to code it but it seems like low hanging fruit

GPT-3 came out in 2020 (they had their initial price then a modest price drop early on).

Flash attention is June of 2022.

Quantization we've only figured out how to do it fairly lossless recently (especially int4). Tim Dettmers LLM int8 is from August 2022.

https://arxiv.org/abs/2208.07339

> That seems large, which paper has that?

See

https://github.com/HazyResearch/flash-attention/raw/main/assets/flashattn_memory.jpg

>We show memory savings in this graph (note that memory footprint is the same no matter if you use dropout or masking). Memory savings are proportional to sequence length -- since standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length. We see 10X memory savings at sequence length 2K, and 20X at 4K. As a result, FlashAttention can scale to much longer sequence lengths.

https://github.com/HazyResearch/flash-attention

CellWithoutCulture t1_javqw9s wrote on March 4, 2023 at 1:17 PM

Fantastic reply, it's great to see all those concrete advances thst made it intro prod. Thanks for sharing.

harharveryfunny t1_jairuhd wrote on March 1, 2023 at 7:02 PM

It says they've cut their costs by 90%, and are passing that saving onto the user. I'd have to guess that they are making money on this, not just treating it as a loss-leader for other more expensive models.

The way the API works is that you have to send the entire conversation each time, and the tokens you will be billed for include both those you send and the API's response (which you are likely to append to the conversation and send back to them, getting billed again and again as the conversation progresses). By the time you've hit the 4K token limit of this API, there will have been a bunch of back and forth - you'll have paid a lot more than 4K * 0.2c/1K for the conversation. It's easy to imagine chat-based API's becoming very widespread and the billable volume becoming huge. OpenAI are using Microsoft Azure compute, who may see a large spike in usage/profits out of this.

It'll be interesting to see how this pricing, and that of competitors evolves. Interesting to see also some of OpenAI's annual price plans outlined elsewhere such as $800K/yr for their 8K token limit "DV" model (DaVinci 4.0?), and $1.5M/yr for the 32K token limit "DV" model.

luckyj t1_jajaz53 wrote on March 1, 2023 at 9:01 PM

But that (sending the whole or part of the conversation history) is exactly what we had to do with text-davinci if we wanted to give it some type of memory. It's the same thing with a different format, and 10% of the price... And having tested it, it's more like chatgpt (I'm sorry, I'm a language model type of replies), which I'm not very fond of. But the price... Hard to resist. I've just ported my bot to this new model and will play with it for a few days

currentscurrents t1_jajg818 wrote on March 1, 2023 at 9:33 PM

> It says they've cut their costs by 90%

Honestly this seems very possible. The original GPT-3 made very inefficient use of its parameters, and since then people have come up with a lot of ways to optimize LLMs.

visarga t1_jaj4bqs wrote on March 1, 2023 at 8:21 PM

> $1.5M/yr

The inference cost is probably 10% of that.

xGovernor t1_jaksopw wrote on March 2, 2023 at 3:18 AM

Oh boy what I got away with. I have been using hundreds of thousands of tokens, augmenting parameters and only ever spent 20 bucks. I feel pretty lucky.

Im2bored17 t1_jam6y5y wrote on March 2, 2023 at 12:52 PM

$20.00 / ($0.002/ 1k tokens) = 10m tokens. If you only used a few hundred k, you got scammed hard lol

[deleted] t1_jarmz1h wrote on March 3, 2023 at 3:51 PM

[deleted]

[deleted] t1_jap9wft wrote on March 3, 2023 at 1:47 AM

[removed]

xGovernor t1_jasx7r9 wrote on March 3, 2023 at 8:52 PM

You needed the secret api key, included with the plus edition. Prior to Whispers I don't believe you could obtain a secret key. Also gave early access to new features and provides me turbo day one. Also I've used to much more and got turbo to work with my plus subscription.

Had to find a workaround. Don't feel scammed. Plus I've been having too much fun with it.

[deleted] t1_jajmeil wrote on March 1, 2023 at 10:12 PM

[deleted]

[deleted] t1_japa07x wrote on March 3, 2023 at 1:48 AM

[removed]

Thin_Sky t1_jav7a6e wrote on March 4, 2023 at 9:02 AM

Where do you find info on these 8k and 32k token prices? Is this listed on their page or is it leaked from consultations?

harharveryfunny t1_javmsab wrote on March 4, 2023 at 12:34 PM

It's a leak, but seems to be legitimate.

https://twitter.com/transitive_bs/status/1628118163874516992

Thin_Sky t1_jaxmuu6 wrote on March 4, 2023 at 9:24 PM

Thanks!

Educational-Net303 t1_jair4wf wrote on March 1, 2023 at 6:58 PM

Definitely a loss-leader to cut off Claude/bard, electricity alone would cost more than that. Expect a rise in price in 1 or 2 months

JackBlemming t1_jaisvp4 wrote on March 1, 2023 at 7:09 PM

Definitely. This is so they can become entrenched and collect massive amounts of data. It also discourages competition, since they won't be able to compete against these artificially low prices. This is not good for the community. This would be equivalent to opening up a restaurant and giving away food for free, then jacking up prices when the adjacent restaurants go bankrupt. OpenAI are not good guys.

I will rescind my comment and personally apologize if they release ChatGPT code, but we all know that will never happen, unless they have a better product lined up.

jturp-sc t1_jaj45ek wrote on March 1, 2023 at 8:20 PM

The entry costs have always been so high that LLMs as a service was going to be a winner-take-most marketplace.

I think the best hope is to see other major players enter the space either commercially or as FOSS. I think the former is more likely, and I was really hoping that we would see PaLM on GCP or even something crazier like a Meta-Amazon partnership for LLaMa on AWS.

Unfortunately, I don't think any of those orgs will pivot fast enough until some damage is done.

badabummbadabing t1_jajdjmr wrote on March 1, 2023 at 9:17 PM

Honestly, I have become a lot more optimistic regarding the prospect of monopolies in this space.

When we were still in the phase of 'just add even more parameters', the future seemed to be headed that way. With Chinchilla scaling (and looking at results of e.g. LLaMA), things look quite a bit more optimistic. Consider that ChatGPT is reportedly much lighter than GPT3. At some point, the availability of data will be the bottleneck (which is where an early entry into the market can help getting an advantage in terms of collecting said data), whereas compute will become cheaper and cheaper.

The training costs lie in the low millions (10M was the cited number for GPT3), which is a joke compared to the startup costs of many, many industries. So while this won't be something that anyone can train, I think it's more likely that there will be a few big players (rather than a single one) going forward.

I think one big question is whether OpenAI can leverage user interaction for training purposes -- if that is the case, they can gain an advantage that will be much harder to catch up to.

farmingvillein t1_jajw0yj wrote on March 1, 2023 at 11:17 PM

> The training costs lie in the low millions (10M was the cited number for GPT3), which is a joke compared to the startup costs of many, many industries. So while this won't be something that anyone can train, I think it's more likely that there will be a few big players (rather than a single one) going forward.

Yeah, I think there are two big additional unknowns here:

How hard is it to optimize inference costs? If--for sake of argument--for $100M you can drop your inference unit costs by 10x, that could end up being a very large and very hidden barrier to entry.
How much will SOTA LLMs really cost to train in, say, 1-2-3 years? And how much will SOTA matter?

The current generation will, presumably, get cheaper and easier to train.

But if it turns out that, say, multimodal training at scale is critical to leveling up performance across all modes, that could jack up training costs really, really quickly--e.g., think the costs to suck down and train against a large subset of public video. Potentially layer in synthetic data from agents exploring worlds (basically, videogames...), as well.

Now, it could be that the incremental gains to, say, language are not that high--in which case the LLM (at least as these models exist right now) business probably heavily commoditizes over the next few years.

[deleted] t1_japaq3w wrote on March 3, 2023 at 1:53 AM

[removed]

[deleted] t1_japasem wrote on March 3, 2023 at 1:54 AM

[removed]

[deleted] t1_japauo6 wrote on March 3, 2023 at 1:54 AM

[removed]

[deleted] t1_jako73i wrote on March 2, 2023 at 2:43 AM

[removed]

[deleted] t1_japabmm wrote on March 3, 2023 at 1:50 AM

[removed]

Derpy_Snout t1_jajfxrw wrote on March 1, 2023 at 9:32 PM

> This would be equivalent to opening up a restaurant and giving away food for free, then jacking up prices when the adjacent restaurants go bankrupt.

The good old Walmart strategy

VertexMachine t1_jajjq8b wrote on March 1, 2023 at 9:55 PM

Yea, but one thing is not adding up. It's not like I can go to a competitor and get access to similar level of quality API.

Plus if it's a price war... with Google.. that would be stupid. Even with Microsoft's money, Alphabet Inc is not someone you want to go to war on undercutting prices.

Also they updated their polices on using users data, so the data gathering argument doesn't seem valid as well (if you trust them)

Edit: ah, btw. I don't say that there is no ulterior motive here. I don't really trust "Open"AI since the "GPT2-is-to-dangerous-to-release" bs (and corporate restructuring). Just that I don't think is that simple.

farmingvillein t1_jajtmly wrote on March 1, 2023 at 11:01 PM

> Plus if it's a price war... with Google.. that would be stupid

If it is a price war strategy...my guess is that they're not worried about Google.

Or, put another way, if it is Google versus OpenAI, openai is pretty happy about the resulting duopoly. Crushing everyone else in the womb, though, would be valuable.

astrange t1_jajpps3 wrote on March 1, 2023 at 10:34 PM

"They're just gathering data" is literally never true. That kind of data isn't good for anything.

TrueBirch t1_jakosce wrote on March 2, 2023 at 2:48 AM

I worked in adtech. It's often true.

Purplekeyboard t1_jajcnb5 wrote on March 1, 2023 at 9:12 PM

> This is not good for the community.

When GPT-3 first came out and prices were posted, everyone complained about how expensive it was, and that it was prohibitively expensive for a lot of uses. Now it's too cheap? What is the acceptable price range?

JackBlemming t1_jajg4dz wrote on March 1, 2023 at 9:33 PM

It's not about the price, it's about the strategy. Google maps API was dirt cheap so nobody competed, then they cranked up prices 1400% once they had years of advantage and market lock in. That's not ok.

If OpenAI keeps prices stable, nobody will complain, but this is likely a market capturing play. They even said they were losing money on every request, but maybe that's not true anymore.

Beli_Mawrr t1_jajvgax wrote on March 1, 2023 at 11:14 PM

I use the API as a dev. I can say that if Bard works anything like OpenAI, it will be super easy to switch.

[deleted] t1_jajgqsv wrote on March 1, 2023 at 9:37 PM

[deleted]

bmc2 t1_jajjjvd wrote on March 1, 2023 at 9:54 PM

Training based on submitted data is going to be curtailed according to their announcement:

“Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in”

lostmsu t1_jaj0dw2 wrote on March 1, 2023 at 7:56 PM

I would love an electricity estimate for running GPT-3-sized models with optimal configuration.

According to my own estimate, electricity cost for a lifetime (~5y) of a 350W GPU is between $1k-$1.6k. Which means for enterprise-class GPUs electricity is dwarfed by the cost of the GPU itself.

currentscurrents t1_jajfjr5 wrote on March 1, 2023 at 9:29 PM

Problem is we don't actually know how big ChatGPT is.

I strongly doubt they're running the full 175B model, you can prune/distill a lot without affecting performance.

MysteryInc152 t1_jal7d3p wrote on March 2, 2023 at 5:29 AM

Distillation doesn't work for token predicting language models for some reason.

currentscurrents t1_jalajj3 wrote on March 2, 2023 at 6:03 AM

DistillBERT worked though?

MysteryInc152 t1_jalau7e wrote on March 2, 2023 at 6:07 AM

Sorry i meant the really large scale models. Nobody has gotten a gpt-3/chinchilla etc scale model to actually distill properly.

harharveryfunny t1_jaj8bk2 wrote on March 1, 2023 at 8:45 PM

Could you put any numbers to that ?

What are the FLOPS per token inference for a given prompt length (for a given model)?

What do those FLOPS translate to in terms of run time on Azure's GPUs (V100's ?)

What is the GPU power consumption and data center electricity costs ?

Even with these numbers can we really relate this to their $/token pricing scheme ? The pricing page mentions this 90% cost reduction being for the "gpt-3.5-turbo" model vs the earlier davinci-text-3.5 (?) one - do we even know the architectural details to get the FLOPs ?

WarProfessional3278 t1_jaj9nnt wrote on March 1, 2023 at 8:53 PM

Rough estimate: with one 400w gpu and $0.14/hr electricity, you are looking at ~0.00016/sec here. That's the price for running the GPU alone, not accounting server costs etc.

I'm not sure if there are any reliable estimate on FLOPS per token inference, though I will be happy to be proven wrong :)

bmc2 t1_jajj03y wrote on March 1, 2023 at 9:50 PM

They raised $10B. They can afford to eat the costs.

Smallpaul t1_jam6mjl wrote on March 2, 2023 at 12:49 PM

1 of 2 months??? How would that short time achieve the goal against well-funded competitors?

It would need to be multiple years of undercutting and even that might not be enough to lock google out.

WarAndGeese t1_jalq339 wrote on March 2, 2023 at 9:25 AM

Don't let it demotivate competitors. They are making money somehow, and planning to make massive amounts more. Hence the space is ripe for tons of competition, and those other companies would also be on track to make tons of money. Hence, jump in competitors, the market is waiting for you.

Smallpaul t1_jam7abr wrote on March 2, 2023 at 12:55 PM

> Don't let it demotivate competitors. They are making money somehow,

What makes you so confident?

MonstarGaming t1_japbd46 wrote on March 3, 2023 at 1:58 AM

>They are making money somehow

Extremely doubtful. Microsoft went in for $10B at a $29B valuation. We have seen pre-revenue companies IPO for far more than that. Microsoft's $10B deal is probably the only thing keeping them afloat.

>Hence the space is ripe for tons of competition

I think you should look up which big tech companies already offer chatbots. You'll find the space is already very competitive. Sure, they aren't large, generative language models, but they target the B2C market that ChatGPT is attempting to compete in.

[deleted] t1_jak0est wrote on March 1, 2023 at 11:49 PM

[removed]

elsrda t1_jak6drt wrote on March 2, 2023 at 12:32 AM

Indeed, at least not for now.

EDIT: source

[deleted] t1_jak7jf3 wrote on March 2, 2023 at 12:41 AM

[removed]

[deleted] t1_jao2iuo wrote on March 2, 2023 at 8:43 PM

[removed]

[deleted] t1_jap9jyg wrote on March 3, 2023 at 1:44 AM

[removed]

qqYn7PIE57zkf6kn t1_japrx5u wrote on March 3, 2023 at 4:11 AM

What does system message mean?

earslap t1_jb0qamw wrote on March 5, 2023 at 3:32 PM

When you feed messages into the API, there are different "roles" to tag each message ("assistant", "user", "system"). So you provide content and tell it from which "role" the content comes from. The model continues from there using the role "assistant". There is a token limit (limited by the model) so if your context exceeds that (combined token size of all roles), you'll need to inject salient context from the conversation using the appropriate role.

[deleted] t1_jarkcfb wrote on March 3, 2023 at 3:33 PM

[deleted]

jturp-sc t1_jaj2w4j wrote on March 1, 2023 at 8:12 PM

Glad to see them make ChatGPT accessible via API and go back to update their documentation to be more clear on which model is which.

I had an exhausting number of conversations with confused product managers, engineers and marketing managers on "No, we're not using ChatGPT".

ShowerVagina t1_jamiqb4 wrote on March 2, 2023 at 2:28 PM

> I had an exhausting number of conversations with confused product managers, engineers and marketing managers on “No, we’re not using ChatGPT”.

They use your conversations for further training which means if you use it to help you with proprietary code or documentation, you're effectively disclosing that.

---AI--- t1_jamo555 wrote on March 2, 2023 at 3:07 PM

OpenAI updated their page to promise they will stop doing that.

ShowerVagina t1_jamts00 wrote on March 2, 2023 at 3:45 PM

Is that for everyone or just API/Enterprise users?

---AI--- t1_jasgezh wrote on March 3, 2023 at 7:02 PM

I only saw it mentioned in the context of API/Enterprise users.

Timdegreat t1_jaj3gpr wrote on March 1, 2023 at 8:15 PM

Will we be able to generate embeddings using the ChatGPT API?

visarga t1_jaj4lxx wrote on March 1, 2023 at 8:22 PM

Not this time. Still text-embedding-ada-002

NoLifeGamer2 t1_jaj9i1b wrote on March 1, 2023 at 8:52 PM

Gotta love getting those "Model currently busy" errors for only a single request

sebzim4500 t1_jan01xr wrote on March 2, 2023 at 4:26 PM

Would you even want to? Sounds like overkill to me, but maybe I am missing some use case of the embeddings.

Timdegreat t1_jan7sel wrote on March 2, 2023 at 5:16 PM

You can use the embeddings to search through documents. First, create embeddings of your documents. Then create an embedding of your search query. Do a similarity measurement between the document embeddings and the search embedding. Surface the top N documents.

sebzim4500 t1_jan85s7 wrote on March 2, 2023 at 5:18 PM

Yeah, I get that's that embeddings are used for semantic search but would you really want to use a model as big as ChatGPT to compute the embeddings? (Given how cheap and effective Ada is)

Timdegreat t1_jangbi7 wrote on March 2, 2023 at 6:10 PM

You got a point there! I haven't given it too much thought really -- I def need to check out ada.

But wouldn't the ChatGPT embeddings still be better? Given that they're cheap, why not use the better option?

farmingvillein t1_japqcq1 wrote on March 3, 2023 at 3:58 AM

> But wouldn't the ChatGPT embeddings still be better? Given that they're cheap, why not use the better option?

Usually, to get the best embeddings, you need to train them somewhat differently than you do a "normal" LLM. So ChatGPT may not(?) be "best" right now, for that application.

londons_explorer t1_jam8409 wrote on March 2, 2023 at 1:03 PM

It was an interesting business decision to make a blog post announcing two rather different products (ChatGPT API and Whisper) at the same time...

ChatGPT is a best-in-class, or even only-in-class chatbot API... While Whisper is one of many hosted speech to text solutions.

harharveryfunny t1_jamab7m wrote on March 2, 2023 at 1:22 PM

The two pair up very well though - now that there's a natural language API, you could leverage that for speech->text->ChatGPT. From what I've seen of the Whisper demos, it seems to be the best out there by quite a margin. Does anything else perform as well?

fasttosmile t1_janaaex wrote on March 2, 2023 at 5:31 PM

GCP, speechmatics, rev, otter.ai, assemblyai etc. etc. offer similar or better performance, as well as streaming and a much more rich output.

MonstarGaming t1_jap8605 wrote on March 3, 2023 at 1:34 AM

That seems to be the gist of this entire thread. This is the first API most of /r/machinelearning have heard of so it must be best on the market. /s

To your point, there are companies who have been developing speech-to-text for decades. The capability is so unremarkable that most (all?) cloud providers have a speech-to-text offering already and it easily integrates with their other services.

I know this is a hot take, but I don't think OpenAI has a business strategy. They're deploying expensive models that directly compete with entrenched, big tech companies. They can't be thinking they're going to take market share away from GCP, AWS, Azure with technologies that all three offer already, right? Right???

fasttosmile t1_japaes4 wrote on March 3, 2023 at 1:51 AM

To be fair, they are technically very competent and the pricing is very cheap. And their marketing is great.

But yeah dealing with B2B customers (where the money is) and integrating feedback from them is a very different thing than what they've been doing so far. They might be angling to serve as a platform for AI companies that then have to deal with average customers. That way they get to only deal with people who understand the limitations of AI. Could work. Will change the company to be less researchy though.

soobardo t1_japo5w5 wrote on March 3, 2023 at 3:39 AM

Yes, they pair up perfectly. Whisper detects anything I babble to it, english or french and it's surprisingly fast. I've wrapped a loop that:

listens micro -> whisper STT -> chatgpt -> lang detect -> Google TTS -> speaker

With noise/silence detection, it's a complete hands-off experience, like chatting with a real person. Delay is ~ 5s for all calls. "Glueing" the APIs is straightforward and intuitive.

xGovernor t1_jaksctz wrote on March 2, 2023 at 3:15 AM

I've been tinkering with DaVinci but even with turbo/premium using gpt3.5turbo api requires a credit card added to the account. Excited to fool with it, however I typically use 2048-4000 tokens on DaVinci 3.

Lychee7 t1_jalbr7l wrote on March 2, 2023 at 6:17 AM

Criteria for tokens ? Complex, longer the prompt more tokens it'll use ?

Trotskyist t1_jalk4j5 wrote on March 2, 2023 at 8:02 AM

A token is (roughly) 4 characters. Both prompt and result are counted.

iTrooz_ t1_jale5ca wrote on March 2, 2023 at 6:45 AM

I hope the API doesn't have the same restrictions as https://chat.openai.com

Stakbrok t1_jam0bpq wrote on March 2, 2023 at 11:42 AM

You can edit what it replied of course (and then hope it builds off of that and keeps that specific vibe going, which always works in the playground) but damn, they locked it down tight. 😅

Even when you edit the primer/setup into something crazy (you are a grumpy or deranged or whatever assistant) and change some things it said into something crazy, it overrides the custom mood you set for it and goes right back to its ever serious ChatGPT mode. Even sometimes apologizing for saying something out of character (and by that it means the thing you 'made it say' by editing, so it believes it said that)

ShowerVagina t1_jamyp12 wrote on March 2, 2023 at 4:17 PM

I might be in the minority but I strongly believe in unfiltered AI (or a minimal filter, only blocking thing like directions to cool drugs or make weapons). I know they filter it for liability reasons but I wish they didn't.

Sea_Alarm_4725 t1_janmlir wrote on March 2, 2023 at 6:56 PM

I can’t seem to find anywhere what the token limit per request is? With davinci is something like 4k tokens, what about this new chatgpt api?

minimaxir OP t1_jann3ze wrote on March 2, 2023 at 7:00 PM

4k

Bluebotlabs t1_jar58e4 wrote on March 3, 2023 at 1:39 PM

Doesn't the number of tokens increase exponentially with chat history?

minimaxir OP t1_jaru4ch wrote on March 3, 2023 at 4:38 PM

More cumulatively than exponentially but yes.

With the new prices that's not a big deal.

Bluebotlabs t1_jarufrq wrote on March 3, 2023 at 4:40 PM

My mistake, I was confused with the system I was.using for chat history lol

bdambrosio94563 t1_jb2ct4n wrote on March 5, 2023 at 10:16 PM

I've spent the last week exploring gpt-3.5-turbo. Went back to text-davinci. (1) gpt-3.5-turbo is incredibly heavily censored. For example, good luck getting anything medical out of it other than 'consult your local medical professional'. It also is much more reluctant to play a role. (2) As is well documented, it is much more resistant to few-shot training. Since I use it in several roles, including google search information extraction and response-composition, I find it very dissappointing.

Luckily, my use case is as my personal companion / advisor / coach, so my usage is low enough I can afford text-davinci. Sure wish there was a middle-ground, though.

Akbartus t1_jbs0hkp wrote on March 11, 2023 at 6:41 AM

Cannot agree. It is not a deal at all. Such a pricing strategy in the long term is very profitable for its creators. But it does not matter for those who would like to use it, but due to financial situation cannot afford using such APIs for a longer period of time (think about people beyond rich countries). Moreover 1k tokens can be generated in just one small talk in a matter of a few seconds...

peanutbutterjambread t1_jak2p3i wrote on March 2, 2023 at 12:05 AM

Cool

MonstarGaming t1_jakqs01 wrote on March 2, 2023 at 3:03 AM

>I have no idea how OpenAI can make money on this.

Personally, I don't think they can. What is the main use case for chat bots? How many people are going to pay $20/month to talk to a chatbot? I mean, chatbots aren't exactly new... anybody who wanted to chat with one before ChatGPT could have and yet there wasn't an industry for it. Couple that with it not being possible to know whether its answers are fact or fiction and I just don't see the major value proposition.

I'm not overly concerned one way or another, I just don't think the business case is very strong.

Smallpaul t1_jam83rb wrote on March 2, 2023 at 1:03 PM

I guess you haven’t visited any B2C websites in the last 5 years.

But also: there is a world model behind the chatbot which can translate between human languages, between computer languages, can compose marketing copy, summarise text...

MonstarGaming t1_jap3jzc wrote on March 3, 2023 at 12:59 AM

>I guess you haven’t visited any B2C websites in the last 5 years.

I have and that is exactly my point. The main use case is B2C websites, NOT individuals, and there are already very mature products in that space. OpenAI needs to develop a lot of bells, whistles, and integration points with existing technologies (salesforce, service now, etc.) before they can be competitive in that market.

>can translate between human languages

Very valuable, but Google and Microsoft both offer this for free.

>between computer languages

This is niche, but it does seem like an untapped, albeit small, market.

>can compose marketing

Also niche. That being said, would it save time? Marketing materials are highly curated.

>summarise text...

Is this a problem a regular person would pay to have fixed? The maximum input size is 2048 tokens / ~1,500 words / three pages. Assuming an average person pastes in the maximum input, they're summarizing material that would take them 6 minutes to read (Google is saying the average person reads 250 words per minutes). Mind you it isn't saving 6 minutes, they still need to read all of the content ChatGPT produces. Wouldn't the average person just skim the document if they wanted to save time?

To your point, it is clearly a capable technology, but that wasn't my argument. There have been troves of capable technologies that were ultimately unprofitable. While I believe it can be successful in the B2C market, I don't think the value proposition is nearly as strong for individuals.

Anyhow, only time will tell.

[deleted] t1_jap8ttt wrote on March 3, 2023 at 1:39 AM

[removed]

MonstarGaming t1_japjnn4 wrote on March 3, 2023 at 3:02 AM

Nice, nothing demonstrates the Dunning-Kruger effect quite like a string of insults.

For whatever its worth, that argument is exceedingly weak. I'll let you brainstorm on why that might be. I don't have interest in debating with someone who so obviously lacks tact.

Smallpaul t1_jb5rab7 wrote on March 6, 2023 at 5:17 PM

https://www.vox.com/technology/2023/3/6/23624015/silicon-valley-generative-ai-chat-gpt-crypto-hype-trend

caedin8 t1_jakcasg wrote on March 2, 2023 at 1:16 AM

It's exciting to see that ChatGPT's cost is 1/10th that of GPT-3 API, which is a huge advantage for developers who are looking for high-quality language models at an affordable price. OpenAI's commitment to providing top-notch AI tools while keeping costs low is commendable and will undoubtedly attract more developers to the platform. It's clear that ChatGPT is a superior option for developers, and OpenAI's dedication to innovation and affordability is sure to make it a top choice for many in the AI community.

big_ol_tender t1_jakmlmc wrote on March 2, 2023 at 2:32 AM

-totally not chatgpt

GrumpyMcGillicuddy t1_jakqy81 wrote on March 2, 2023 at 3:04 AM

Uhhhh

[deleted] t1_jajwxlq wrote on March 1, 2023 at 11:24 PM

[removed]

Comments