A lot of it has to do with computational intensity and latency. Converting text to audio and vice versa takes a bit of time, and local and cloud-based solutions face different challenges.

Let's say you want a chatbot to reply to you in audio in real time, with a cloud-based solution.
First you speak to it in audio, and that is sent to a cloud server - this part is relatively fast, and is what already happens with things like Google Home/Alexa. Then it needs to convert your speech to text and run that text through an LLM. The LLM creates a response, and that response needs to be converted back to audio.
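The round trip described above can be sketched as three chained stages. This is a rough illustration only: the function names are hypothetical stand-ins for real cloud APIs (a speech-to-text service, an LLM endpoint, and a text-to-speech service), not any particular vendor's SDK.

```python
# Minimal sketch of the cloud voice-assistant round trip.
# All three stage functions are hypothetical placeholders.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real implementation would call a cloud STT API.
    return "what's the weather like?"

def llm_reply(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API.
    return "I don't have live weather data, sorry!"

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real implementation would call a TTS API.
    return text.encode("utf-8")  # stand-in for audio bytes

def voice_round_trip(audio_in: bytes) -> bytes:
    text = speech_to_text(audio_in)   # 1. transcribe the user's speech
    reply = llm_reply(text)           # 2. generate a text reply
    return text_to_speech(reply)      # 3. synthesize the reply as audio
```

Each stage adds its own latency, which is why the total delay is noticeable even when every individual hop is fast.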
Let's say that, for a solution like ElevenLabs, it takes 2 seconds of compute for every second of audio you want to generate. That means if the reply is going to be 10 seconds long, it takes 20 seconds to generate. That would be too slow.
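The arithmetic behind that estimate, using the illustrative 2:1 compute-to-audio ratio from the comment above:

```python
# Back-of-envelope TTS latency: assume (illustratively) 2 seconds of
# compute per 1 second of generated audio.
TTS_SECONDS_PER_AUDIO_SECOND = 2

def tts_generation_time(reply_audio_seconds: float) -> float:
    # Time to synthesize the whole reply before playback can start.
    return reply_audio_seconds * TTS_SECONDS_PER_AUDIO_SECOND

print(tts_generation_time(10))  # a 10-second reply takes 20 seconds
```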
You might have the opportunity to stream that audio by converting only part of the text to speech before the rest is generated, but these solutions work better when they're given more text to generate all at once... Generating a word at a time would be like talking with A. Period. In. Between. Every. Word.
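The trade-off can be made concrete with the same illustrative 2:1 ratio: chunking the reply (say, by sentence) shortens the time until the *first* audio plays, even though the total generation time is unchanged - and at a ratio above 1:1, playback can still outrun generation between chunks, which is the stutter problem described above. The chunk lengths here are made-up numbers.

```python
# Why streaming helps: playback can begin once the first chunk is
# synthesized, rather than after the entire reply.
RATIO = 2  # illustrative: seconds of compute per second of audio

def time_to_first_audio(chunk_lengths_s):
    # Latency until the listener hears anything at all.
    return chunk_lengths_s[0] * RATIO

whole_reply = [10.0]           # synthesize the full 10s reply at once
by_sentence = [3.0, 4.0, 3.0]  # same reply split into sentence chunks

print(time_to_first_audio(whole_reply))  # 20.0s before any sound
print(time_to_first_audio(by_sentence))  # 6.0s to first sound
```

Total compute is identical either way; only the perceived responsiveness changes.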
Tiamatium t1_j9uzmge wrote
Weeks, maybe months.
The larger problem might be long-term memory, but once we figure that out... Actually no, it is easy to figure it out.
So weeks, maybe months, but you will need wifi. And it will be a bit laggy, as in it will take a noticeable delay to respond. Not long, just noticeable, so that will take a lot of the emotion out of it.
Honestly, this depends on when OpenAI releases the ChatGPT API, because once that's out, it's out. It really is just a quick connection of a voice-to-text API, ChatGPT, and text-to-voice, and that's it.