Submitted by minimaxir t3_11fbccz in MachineLearning
https://openai.com/blog/introducing-chatgpt-and-whisper-apis
> It is priced at $0.002 per 1k tokens, which is 10x cheaper than our existing GPT-3.5 models.
This is a massive, massive deal. For context, GPT-3 apps took off over the past few months, before ChatGPT went viral, for two reasons: a) text-davinci-003 was released and was a significant performance increase, and b) the cost was cut from $0.06/1k tokens to $0.02/1k tokens, which made consumer applications feasible without a large upfront cost.
A much better model at 1/10th the cost warps the economics so completely that it may be a better option than in-house finetuned LLMs.
I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.
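To put the price points above side by side, here is a rough sketch. The three per-1k-token prices are the ones quoted in this post; the monthly token volume and the names `PRICE_PER_1K_TOKENS`, `cost_usd`, and `MONTHLY_TOKENS` are made up for illustration and are not real usage data.

```python
# Per-1k-token prices as quoted in the post above (USD).
PRICE_PER_1K_TOKENS = {
    "GPT-3 before the price cut": 0.06,
    "GPT-3 after the price cut": 0.02,
    "ChatGPT API": 0.002,
}

# Hypothetical workload for a small consumer app (an assumption, not a measurement).
MONTHLY_TOKENS = 50_000_000


def cost_usd(tokens: int, price_per_1k: float) -> float:
    """Cost in USD for a number of tokens at a given per-1k-token price."""
    return tokens / 1000 * price_per_1k


for name, price in PRICE_PER_1K_TOKENS.items():
    print(f"{name}: ${cost_usd(MONTHLY_TOKENS, price):,.2f} per month")
```

The same workload ends up 30x cheaper than at the original $0.06 price, which is the gap that changes the comparison against running a finetuned model in-house.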
LetterRip t1_jaj1kp3 wrote
> I have no idea how OpenAI can make money on this.
Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.
With A as the original cost: A * 0.3 / 3 = 0.1 * A, i.e. 10% of the cost.
Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.
So we are talking about roughly 1% of the resources (10% after quantization, then at least another 10x from the larger batches) against a 10x price reduction - they should be 90% more profitable compared to when they introduced GPT-3.
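The back-of-the-envelope estimate above can be written out as a short sketch. All inputs are the rough figures from this comment (70% hardware reduction, 3x speedup, 10x-20x batch size), not measured values, and the function name `estimated_resource_fraction` is just a label for illustration. It also assumes the gains multiply cleanly, which real deployments may not achieve.

```python
def estimated_resource_fraction(
    hardware_reduction: float = 0.70,  # claimed saving from int8/int4 quantization
    speedup: float = 3.0,              # claimed speed increase vs float16
    batch_gain: float = 10.0,          # claimed batch-size increase (10x to 20x)
) -> float:
    """Fraction of the original per-token resources, assuming the gains multiply."""
    hardware_fraction = 1.0 - hardware_reduction       # 0.3 of the hardware
    after_quantization = hardware_fraction / speedup   # A * 0.3 / 3 = 0.1 * A
    return after_quantization / batch_gain


# Price went from $0.02 to $0.002 per 1k tokens, i.e. 10% of the old price.
price_fraction = 0.002 / 0.02

for gain in (10.0, 20.0):
    resources = estimated_resource_fraction(batch_gain=gain)
    # Cost per dollar of revenue, relative to before the optimizations.
    relative_cost_per_revenue = resources / price_fraction
    print(f"batch gain {gain:.0f}x: resources {resources:.3%}, "
          f"cost per revenue dollar {relative_cost_per_revenue:.1%} of before")
```

Under these assumptions the serving cost falls faster than the price does, which is the point of the estimate: the lower price does not have to be a loss leader.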
edit - see MS DeepSpeed MII - showing a 40x per token cost reduction for Bloom-176B vs default implementation
https://github.com/microsoft/DeepSpeed-MII
Also there are additional ways to reduce cost not covered above - pruning, graph optimization, teacher-student distillation. I think teacher-student distillation is extremely likely given reports that the new model has difficulty with more complex prompts.