A lot of it has to do with computational intensity and latency. Converting text to audio and vice versa takes a bit of time, and local and cloud-based solutions face different challenges.

Let's say you want a chatbot to reply to you in audio in real time, with a cloud-based solution.
First you speak to it in audio, and that is sent to a cloud server - this part is relatively fast, and is what already happens with things like Google Home/Alexa. Then it needs to convert your speech to text and run that text through an LLM. The LLM creates a response, and that response needs to be converted back to audio.
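The round trip described above can be sketched as three chained stages. This is a rough illustration only: the function names are hypothetical stand-ins for real cloud APIs (a speech-to-text service, an LLM endpoint, and a text-to-speech service), not any particular vendor's SDK.

```python
# Minimal sketch of the cloud voice-assistant round trip.
# All three stage functions are hypothetical placeholders.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real implementation would call a cloud STT API.
    return "what's the weather like?"

def llm_reply(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API.
    return "I don't have live weather data, sorry!"

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real implementation would call a TTS API.
    return text.encode("utf-8")  # stand-in for audio bytes

def voice_round_trip(audio_in: bytes) -> bytes:
    text = speech_to_text(audio_in)   # 1. transcribe the user's speech
    reply = llm_reply(text)           # 2. generate a text reply
    return text_to_speech(reply)      # 3. synthesize the reply as audio
```

Each stage adds its own latency, which is why the total delay is noticeable even when every individual hop is fast.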
Let's say that, for a solution like ElevenLabs, it takes 2 seconds of compute for every second of audio you want to generate. That means if the reply is going to be 10 seconds long, it takes 20 seconds to generate. That would be too slow.
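The arithmetic behind that estimate, using the illustrative 2:1 compute-to-audio ratio from the comment above:

```python
# Back-of-envelope TTS latency: assume (illustratively) 2 seconds of
# compute per 1 second of generated audio.
TTS_SECONDS_PER_AUDIO_SECOND = 2

def tts_generation_time(reply_audio_seconds: float) -> float:
    # Time to synthesize the whole reply before playback can start.
    return reply_audio_seconds * TTS_SECONDS_PER_AUDIO_SECOND

print(tts_generation_time(10))  # a 10-second reply takes 20 seconds
```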
You might have the opportunity to stream that audio by converting only part of the text to speech before the rest is generated, but these solutions work better when they're given more text to generate all at once... Generating a word at a time would be like talking with A. Period. In. Between. Every. Word.
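The trade-off can be made concrete with the same illustrative 2:1 ratio: chunking the reply (say, by sentence) shortens the time until the *first* audio plays, even though the total generation time is unchanged - and at a ratio above 1:1, playback can still outrun generation between chunks, which is the stutter problem described above. The chunk lengths here are made-up numbers.

```python
# Why streaming helps: playback can begin once the first chunk is
# synthesized, rather than after the entire reply.
RATIO = 2  # illustrative: seconds of compute per second of audio

def time_to_first_audio(chunk_lengths_s):
    # Latency until the listener hears anything at all.
    return chunk_lengths_s[0] * RATIO

whole_reply = [10.0]           # synthesize the full 10s reply at once
by_sentence = [3.0, 4.0, 3.0]  # same reply split into sentence chunks

print(time_to_first_audio(whole_reply))  # 20.0s before any sound
print(time_to_first_audio(by_sentence))  # 6.0s to first sound
```

Total compute is identical either way; only the perceived responsiveness changes.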
Tiamatium t1_j9uzmge wrote
Weeks, maybe months.
The larger problem might be long-term memory, but once we figure that out... Actually no, it is easy to figure it out.
So weeks, maybe months, but you will need wifi. And it will be a bit laggy, as in it will take a noticeable delay to respond. Not long, just noticeable, so that will take a lot of the emotion out of it.
Honestly, this depends on when OpenAI releases the ChatGPT API, because once that's out, it's out. It really is just a quick connection of a voice-to-text API, ChatGPT, and text-to-voice, and that's it.