itsnotlupus t1_jdt280v wrote

Whisper is the speech recognition component.
I don't think he said what he's using for TTS; it might be macOS's built-in thingy.

4

eggsnomellettes t1_jdt5dxl wrote

They're using ElevenLabs, which isn't local and hence requires a slow API call.

11

tortoise888 t1_jdtp8yj wrote

If we eventually get open-source, ElevenLabs-quality models running locally, it's gonna be insane.

1

ebolathrowawayy t1_jdvfmrk wrote

There's also Tortoise TTS, which can be run locally, but idk how fast it is.

1