sumane12 t1_jds5lwr wrote on March 26, 2023 at 7:39 PM

That delay kills me, far too long. I'm guessing GPT-5 will have to be multimodal with sound, so it can recognise words directly and doesn't need to convert speech into text first.

NWCoffeenut t1_jdsgb83 wrote on March 26, 2023 at 8:54 PM

I think a good part of the latency was with the TTS system. The actual text response for the most part came back reasonably quickly.
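
One way to settle where the delay actually comes from is to time each stage separately. This is only a rough sketch: `time_stages` and the three `fake_*` functions are made-up stand-ins (nothing here is from the demo), and the sleep values are placeholders rather than measurements. Swap in the real speech-to-text, model, and TTS calls to get real numbers.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def time_stages(stages: List[Tuple[str, Callable[[Any], Any]]],
                first_input: Any) -> Tuple[Any, Dict[str, float]]:
    """Feed each stage the previous stage's output and record seconds spent per stage."""
    timings: Dict[str, float] = {}
    value = first_input
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings


# Hypothetical stand-ins for the real components (assumptions, not the demo's code).
def fake_stt(audio: bytes) -> str:
    time.sleep(0.01)
    return "what is the weather"


def fake_llm(prompt: str) -> str:
    time.sleep(0.01)
    return "Reply to: " + prompt


def fake_tts(text: str) -> bytes:
    time.sleep(0.03)  # placeholder delay, pretending TTS is the slow stage
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = [("stt", fake_stt), ("llm", fake_llm), ("tts", fake_tts)]
    _, timings = time_stages(pipeline, b"\x00")
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.0f} ms")
```

Whichever stage dominates the printout is the one worth moving local or streaming.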

illathon t1_jdsoud8 wrote on March 26, 2023 at 9:55 PM

No, most implementations of Whisper are slow.

itsnotlupus t1_jdt280v wrote on March 26, 2023 at 11:37 PM

Whisper is the speech recognition component.
I don't think he said what he's using for TTS; it might be macOS's built-in thingy.

eggsnomellettes t1_jdt5dxl wrote on March 27, 2023 at 12:02 AM

They're using ElevenLabs, which isn't local, hence the slow API call.

tortoise888 t1_jdtp8yj wrote on March 27, 2023 at 2:47 AM

If we eventually get open-source, ElevenLabs-quality models running locally, it's gonna be insane.

Genesis_Fractiliza t1_jdu2f6x wrote on March 27, 2023 at 4:55 AM

!remind me 1 month

ebolathrowawayy t1_jdvfmrk wrote on March 27, 2023 at 2:19 PM

There's also Tortoise TTS which can be run locally but idk how fast it is.

stupidcasey t1_jdsff4l wrote on March 26, 2023 at 8:48 PM

I expect GPT-5 or 6 to be super multimodal, where they train it on anything and everything we have data for. Audio? Sure. Video? Of course. Crossword puzzles? Hell yeah. Pong? Yup. Car driving? Why not. I think the only thing stopping us is that it takes too long, and we'll have more processing power by then.

pokeuser61 t1_jdskrfs wrote on March 26, 2023 at 9:26 PM

If you ran this on the hardware that gpt5 will require, it wouldn’t have a delay.

RedditLovingSun t1_jdtn0z9 wrote on March 27, 2023 at 2:28 AM

From the title bar, it looks like he's using the Whisper API to transcribe his audio into a text query. That has to send an API request with the audio and wait for the text to come back over the internet. I'm sure a local audio-to-text transcriber would be considerably faster.

Edit: nvm, Whisper can be run locally, so he's probably doing that.

itsnotlupus t1_jdt2igm wrote on March 26, 2023 at 11:39 PM

The model's text output is (or can be) a stream, so it ought to be possible to pipe that text stream into a warmed-up TTS system and start getting audio before the text is fully generated.
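
A minimal sketch of that idea, assuming the model hands back text in arbitrary chunks and the TTS engine takes one sentence at a time. `sentences_from_stream`, `speak_stream`, and the `speak` callback are hypothetical names for illustration, not any real library's API, and the sentence splitting is deliberately naive (it will trip on abbreviations like "e.g.").

```python
import re
from typing import Callable, Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may still be growing.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


def speak_stream(chunks: Iterable[str], speak: Callable[[str], None]) -> None:
    """Send each sentence to a TTS callback without waiting for the full response."""
    for sentence in sentences_from_stream(chunks):
        speak(sentence)


if __name__ == "__main__":
    fake_model_output = ["Hel", "lo there. How", " are you? I am", " fine."]
    speak_stream(fake_model_output, speak=print)  # print stands in for a real TTS call
```

With this shape the first sentence can start playing while the model is still generating the rest, so the perceived delay is roughly the time to the first sentence rather than to the full reply.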

Drown_The_Gods t1_jdww8zc wrote on March 27, 2023 at 7:59 PM

Use Talon Voice. The developer has their own engine that blows Whisper out of the water. Never worry about speed again. Don’t thank me, but do chuck them a few dollars if you find it useful.

moonpumper t1_jdsxn21 wrote on March 26, 2023 at 11:02 PM

I just want a screen free phone that's basically just Jarvis. Read my texts to me, look shit up for me, keep track of and make appointments for me, give me stock quotes, tell me the news, just don't suck me into an infinite scroll anymore. If I need to see something cast it to a screen in my house. Done with phone screens.

RedditLovingSun t1_jdtnafr wrote on March 27, 2023 at 2:30 AM

Can't wait till we get there with a better Alpaca model + local transcription and audio generation + ChatGPT-style plugins for operating apps. It's all possible today; we just have to wait for it to be developed.

SkyeandJett t1_jdt2zli wrote on March 26, 2023 at 11:43 PM

That was my thought. No more phone. Just the smart watch.

SnipingNinja t1_jdv68wu wrote on March 27, 2023 at 1:07 PM

I actually have a concept in my mind, don't have all the skills needed but will be learning things in the next few months, hopefully I'm not too late when I'm done making my idea into reality.

SkyeandJett t1_jdv6nju wrote on March 27, 2023 at 1:11 PM

This is probably just my anxiety, but I feel like anything we think of or try to execute is going to be eclipsed before it can be realized. We're going to go overnight from this moment to androids indistinguishable from humans and FDVR. These past couple of weeks have been overwhelming in the extreme.

SnipingNinja t1_jdvg55n wrote on March 27, 2023 at 2:23 PM

You're right, but I don't think that issue is relevant here. Having a locally running AI would be useful regardless of other innovations, and there's something to be said for the cyberpunkness of such a device.

[deleted] t1_jdv9fmu wrote on March 27, 2023 at 1:33 PM

[deleted]

moonpumper t1_jdva8ow wrote on March 27, 2023 at 1:40 PM

With ChatGPT-type stuff, how would it sound much different from a phone conversation? The whole idea is that the OS responds to natural language, like talking to a personal assistant or secretary.

[deleted] t1_jdvfk27 wrote on March 27, 2023 at 2:19 PM

[deleted]

czmax t1_jdwcbuq wrote on March 27, 2023 at 5:53 PM

I was hoping that wearables (like a watch) could do this for me. Or at least force development in that direction.

(Seems to not be panning out… but I still have hope. I’d love to only carry a watch for most of my day. Initially I’d go through screen withdrawal, but in the long run I think life would be better.)

Dwanyelle t1_jdsjgfb wrote on March 26, 2023 at 9:16 PM

Yeah, I'd be surprised if we don't have something like that available publicly before the end of the year (if only because big tech is slow and unwieldy and things need to work their way through the proper paperwork).

pokeuser61 t1_jdskvem wrote on March 26, 2023 at 9:27 PM

It is both public and open source

Dwanyelle t1_jdsmha6 wrote on March 26, 2023 at 9:38 PM

I should clarify, it will be a packaged product from a big tech person.

I could do this, sure, I can putz around on computers a bit, but once you can just click an "install" button in the Microsoft store, that's it

micseydel t1_jdsr6vx wrote on March 26, 2023 at 10:13 PM

Big tech will offer it as a service instead of a locally-running system. That will mean latency, increased data use, and other... differences 😅

Dwanyelle t1_jdss5uk wrote on March 26, 2023 at 10:21 PM

Oh, there will definitely be a ton of downsides, but convenience will not be one of them.

GoldenRain t1_jdvlweg wrote on March 27, 2023 at 3:02 PM

Like https://chat.d-id.com/ which already exists?

Tobislu t1_jdt046n wrote on March 26, 2023 at 11:21 PM

Which means Ultron isn't far behind 👀

fuck_your_diploma t1_jdtdkjw wrote on March 27, 2023 at 1:08 AM

Don’t tease me like this

Burgundy_and_Pearl t1_jdtf8ms wrote on March 27, 2023 at 1:22 AM

As long as we don’t prompt it with Pinocchio.

SnipingNinja t1_jdv6hr0 wrote on March 27, 2023 at 1:09 PM

strings?

imlaggingsobad t1_jdu555l wrote on March 27, 2023 at 5:26 AM

I'm like 100% certain that Apple, Google and Meta are making a JARVIS assistant that connects to AR glasses. It would be a revolutionary product and it's actually feasible imo.

LevelWriting t1_jdvn9mt wrote on March 27, 2023 at 3:12 PM

I would give up my phone if I could replace it with AR.

_dekappatated t1_jdt8e99 wrote on March 27, 2023 at 12:26 AM

TIL there was a B programming language

GoSouthYoungMan t1_jdu54fg wrote on March 27, 2023 at 5:26 AM

And before there was B, there was APL: A Programming Language. (This is not a joke.)

Grecu69 t1_jds1x7q wrote on March 26, 2023 at 7:13 PM

This looks like a slightly better version of Siri, imo.

HarbingerDe t1_jdst2fh wrote on March 26, 2023 at 10:27 PM

It's a significantly better version of Siri.

GPT-4 can borderline pass the Turing Test and Siri can barely do... anything?

kevinzvilt t1_jdsv83o wrote on March 26, 2023 at 10:44 PM

Me: Siri, set my alarm for 7am.

Siri: Here is a list of videos titled Tom Tom Solo by River Banks!

JDP87 t1_jdt6uhc wrote on March 27, 2023 at 12:13 AM

At least you're getting an answer.

Working on that. Something went wrong. Please try again.

[deleted] t1_jdt7noy wrote on March 27, 2023 at 12:20 AM

[deleted]

DaffyDuck t1_jdu15vr wrote on March 27, 2023 at 4:41 AM

13B-parameter LLaMA is not as good as GPT-4.

averyminya t1_jdt33w0 wrote on March 26, 2023 at 11:44 PM

It's using LLaMA and Alpaca

[deleted] t1_jdta7a5 wrote on March 27, 2023 at 12:40 AM

Samantha >>>>>>>>>>

the_funambule t1_jdtjah6 wrote on March 27, 2023 at 1:56 AM

ChatGPT states Samantha is the most accurate representation of AI in movies

darien_gap t1_jdthhsi wrote on March 27, 2023 at 1:40 AM

I've been waiting for this since Apple's concept video in 1987: https://www.youtube.com/watch?v=umJsITGzXd0

Specific-Chicken5419 t1_jdrvy8m wrote on March 26, 2023 at 6:31 PM

lol

axidentalaeronautic t1_jdsuc3k wrote on March 26, 2023 at 10:37 PM

YESSSS 😫 this has been my dream for years.

InfoOnAI t1_jdtab19 wrote on March 27, 2023 at 12:41 AM

I've been trying to set something similar up.

_Alasdair t1_jdy9mfd wrote on March 28, 2023 at 1:48 AM

I built something exactly like this back when the GPT-3 API came out. It was pretty cool, but I eventually got bored with it because it couldn't do anything. I tried hooking it up to external APIs to get live real-world data, but by the end everything was so complicated and slow that I gave up.

Hopefully with the GPT4 plugins we can now make something actually useful. It's gonna be awesome.

HesThePianoMan t1_jdtr4iy wrote on March 27, 2023 at 3:03 AM

This is nothing special, it just sounds like Google Assistant.

Sigma_Atheist t1_jdswekv wrote on March 26, 2023 at 10:53 PM

Marvel is cringe. Can we use some other name to compare stuff like this to?

UnexpectedVader t1_jdsyi5s wrote on March 26, 2023 at 11:09 PM

I’m much more in favour of HAL 9000.

Anjz t1_jdsynyi wrote on March 26, 2023 at 11:10 PM

You mean you don't like MODOK?

How about we just name it Dan? Dan's a cool guy.
