Comments
NWCoffeenut t1_jdsgb83 wrote
I think a good part of the latency was with the TTS system. The actual text response for the most part came back reasonably quickly.
illathon t1_jdsoud8 wrote
No, most implementations of Whisper are slow.
itsnotlupus t1_jdt280v wrote
Whisper is the speech recognition component.
I don't think he said what he's using for TTS; it might be macOS's built-in thingy.
eggsnomellettes t1_jdt5dxl wrote
They're using ElevenLabs, which isn't local, hence a slow API call
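For context, that remote round trip looks roughly like the sketch below (the endpoint and payload are from memory of ElevenLabs' v1 REST API, so treat the details as assumptions and check their docs). Every spoken reply costs at least one full HTTP round trip, which is where a lot of the latency comes from:

```python
import requests

ELEVEN_API_KEY = "..."        # placeholder API key
VOICE_ID = "your-voice-id"    # placeholder voice ID

def eleven_tts(text: str) -> bytes:
    # One full network round trip per utterance.
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVEN_API_KEY, "Content-Type": "application/json"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content  # audio bytes (MP3)

# open("reply.mp3", "wb").write(eleven_tts("Hello there."))
```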
tortoise888 t1_jdtp8yj wrote
If we eventually get open-source, ElevenLabs-quality models running locally, it's gonna be insane.
Genesis_Fractiliza t1_jdu2f6x wrote
!remind me 1 month
RemindMeBot t1_jdu2nlm wrote
I will be messaging you in 1 month on 2023-04-27 04:55:12 UTC to remind you of this link
ebolathrowawayy t1_jdvfmrk wrote
There's also Tortoise TTS which can be run locally but idk how fast it is.
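For anyone who wants to try it, here's a minimal local sketch based on the project's documented usage (the function names, demo voice, and presets are my recollection and may have changed, so treat them as assumptions):

```python
# pip install tortoise-tts  (plus torch/torchaudio)
import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

tts = TextToSpeech()  # loads the models; this alone can take a while
voice_samples, conditioning_latents = load_voice("tom")  # one of the bundled demo voices

# "ultra_fast"/"fast" trade quality for latency; "high_quality" is much slower without a GPU.
speech = tts.tts_with_preset(
    "Local text to speech, no API call needed.",
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="fast",
)
torchaudio.save("reply.wav", speech.squeeze(0).cpu(), 24000)  # Tortoise outputs 24 kHz audio
```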
stupidcasey t1_jdsff4l wrote
I expect GPT-5 or 6 to be super multimodal, trained on anything and everything we have data for: audio, sure; video, of course; crossword puzzles, hell yeah; Pong, yup; car driving, why not. I think the only thing stopping us is that it takes too long, and we'll have more processing power by then.
pokeuser61 t1_jdskrfs wrote
If you ran this on the hardware that GPT-5 will require, it wouldn't have a delay.
RedditLovingSun t1_jdtn0z9 wrote
From the title bar, it looks like he's using the Whisper API to transcribe his audio into a text query. That has to send an API request with the audio and wait for the text to come back over the internet. I'm sure a local audio-to-text transcriber would be considerably faster.
Edit: nvm, Whisper can be run locally, so he's probably doing that.
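For reference, running Whisper locally with the openai-whisper package is only a few lines; model size is the main quality/latency trade-off:

```python
# pip install openai-whisper  (ffmpeg must be on PATH)
import whisper

model = whisper.load_model("base.en")   # "tiny.en" is faster; "small"/"medium" are more accurate
result = model.transcribe("query.wav")  # decodes the audio file locally, no network round trip
print(result["text"])
```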
itsnotlupus t1_jdt2igm wrote
The model's text output is (or can be) a stream, so it ought to be possible to pipe that text stream into a warmed-up TTS system and start getting audio before the text is fully generated.
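A minimal sketch of that idea, where `token_stream` stands in for the model's streamed tokens and `speak()` for whatever warmed-up TTS call you have (both are placeholders, not any particular API):

```python
import re

def stream_to_tts(token_stream, speak):
    """Hand complete sentences to a TTS callback as soon as they appear in the stream."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush on sentence-ending punctuation so audio starts before the reply is finished.
        while (match := re.search(r"[.!?]\s", buffer)):
            sentence, buffer = buffer[: match.end()], buffer[match.end():]
            speak(sentence.strip())
    if buffer.strip():  # whatever is left when the stream closes
        speak(buffer.strip())

# Example with fake streaming chunks:
# stream_to_tts(iter(["Hello", " there.", " How can", " I help?"]), print)
```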
Drown_The_Gods t1_jdww8zc wrote
Use Talon Voice. The developer has their own engine that blows Whisper out of the water. Never worry about speed again. Don’t thank me, but do chuck them a few dollars if you find it useful.
moonpumper t1_jdsxn21 wrote
I just want a screen-free phone that's basically just Jarvis. Read my texts to me, look shit up for me, keep track of and make appointments for me, give me stock quotes, tell me the news, just don't suck me into an infinite scroll anymore. If I need to see something, cast it to a screen in my house. Done with phone screens.
RedditLovingSun t1_jdtnafr wrote
Can't wait till we get there with a better Alpaca model + local transcription and audio generation + ChatGPT-style plugins for operating apps. All possible today, we just have to wait for it to be developed.
SkyeandJett t1_jdt2zli wrote
That was my thought. No more phone. Just the smart watch.
SnipingNinja t1_jdv68wu wrote
I actually have a concept in mind. I don't have all the skills needed yet, but I'll be learning them over the next few months; hopefully I'm not too late by the time I've turned my idea into reality.
SkyeandJett t1_jdv6nju wrote
This is probably just my anxiety, but I feel like anything we think of or try to execute is going to be eclipsed before it can be realized. We're going to go overnight from this moment to androids indistinguishable from humans and FDVR. These past couple of weeks have been overwhelming in the extreme.
SnipingNinja t1_jdvg55n wrote
You're right, but I don't think that issue is relevant here: having a locally running AI would be useful regardless of other innovations, and there's something to be said for the cyberpunk feel of such a device.
[deleted] t1_jdv9fmu wrote
[deleted]
moonpumper t1_jdva8ow wrote
With ChatGPT-type stuff, how would it sound much different from a phone conversation? The whole idea is that the OS responds to natural language, like talking to a personal assistant or secretary.
[deleted] t1_jdvfk27 wrote
[deleted]
czmax t1_jdwcbuq wrote
I was hoping that wearables (like a watch) could do this for me. Or at least force development in that direction.
(It seems not to be panning out… but I still have hope. I'd love to only carry a watch for most of my day. Initially I'd go through screen withdrawal, but in the long run I think life would be better.)
Dwanyelle t1_jdsjgfb wrote
Yeah, I'd be surprised if we don't have something like that available publicly before the end of the year (if only because big tech is slow and unwieldy, and things need to work their way through the proper paperwork).
pokeuser61 t1_jdskvem wrote
It is both public and open source
Dwanyelle t1_jdsmha6 wrote
I should clarify: it will be a packaged product from a big tech company.
I could do this, sure; I can putz around on computers a bit. But once you can just click an "Install" button in the Microsoft Store, that's it.
micseydel t1_jdsr6vx wrote
Big tech will offer it as a service instead of a locally-running system. That will mean latency, increased data use, and other... differences 😅
Dwanyelle t1_jdss5uk wrote
Oh, there will definitely be a ton of downsides, but convenience will not be one of them.
GoldenRain t1_jdvlweg wrote
Like https://chat.d-id.com/ which already exists?
Tobislu t1_jdt046n wrote
Which means Ultron isn't far behind 👀
fuck_your_diploma t1_jdtdkjw wrote
Don’t tease me like this
Burgundy_and_Pearl t1_jdtf8ms wrote
As long as we don’t prompt it with Pinocchio.
SnipingNinja t1_jdv6hr0 wrote
imlaggingsobad t1_jdu555l wrote
I'm like 100% certain that Apple, Google and Meta are making a JARVIS assistant that connects to AR glasses. It would be a revolutionary product and it's actually feasible imo.
LevelWriting t1_jdvn9mt wrote
I would give up my phone if I could replace it with AR.
_dekappatated t1_jdt8e99 wrote
TIL there was a B programming language
GoSouthYoungMan t1_jdu54fg wrote
And before there was B, there was APL: A Programming Language. (This is not a joke.)
Grecu69 t1_jds1x7q wrote
This looks like a slightly better version of Siri imo
HarbingerDe t1_jdst2fh wrote
It's a significantly better version of Siri.
GPT-4 can borderline pass the Turing Test and Siri can barely do... anything?
kevinzvilt t1_jdsv83o wrote
Me: Siri, set my alarm for 7am.
Siri: Here is a list of videos titled Tom Tom Solo by River Banks!
DaffyDuck t1_jdu15vr wrote
A 13B-parameter LLaMA is not as good as GPT-4.
averyminya t1_jdt33w0 wrote
[deleted] t1_jdta7a5 wrote
Samantha >>>>>>>>>>
the_funambule t1_jdtjah6 wrote
ChatGPT states Samantha is the most accurate representation of AI in movies
darien_gap t1_jdthhsi wrote
I've been waiting for this since Apple's concept video in 1987: https://www.youtube.com/watch?v=umJsITGzXd0
Specific-Chicken5419 t1_jdrvy8m wrote
lol
axidentalaeronautic t1_jdsuc3k wrote
YESSSS 😫 this has been my dream for years.
InfoOnAI t1_jdtab19 wrote
I've been trying to set something similar up.
_Alasdair t1_jdy9mfd wrote
I built something exactly like this back when the GPT-3 API came out. It was pretty cool, but I eventually got bored with it because it couldn't do anything. I tried hooking it up to external APIs to get real-world live data, but by the end everything was so complicated and slow that I gave up.
Hopefully with the GPT-4 plugins we can now make something actually useful. It's gonna be awesome.
HesThePianoMan t1_jdtr4iy wrote
This is nothing special, it just sounds like Google Assistant.
Sigma_Atheist t1_jdswekv wrote
Marvel is cringe. Can we use some other name to compare stuff like this to?
UnexpectedVader t1_jdsyi5s wrote
I’m much more in favour of HAL 9000.
Anjz t1_jdsynyi wrote
You mean you don't like MODOK?
How about we just name it Dan? Dan's a cool guy.
sumane12 t1_jds5lwr wrote
That delay kills me, far too long. I'm guessing GPT-5 will have to be multimodal with sound so it can recognise words directly and doesn't need to transcribe them into text first.