Free and Open Source Software

17932 readers


If it's free and open source and it's also software, it can be discussed here. Subcommunity of Technology.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

Gaywallet@beehaw.org

alyaza@beehaw.org

Text-to-speech options? (beehaw.org)

submitted 8 months ago by friendly_ghost@beehaw.org to c/foss@beehaw.org

8 comments

I'm setting up a laptop with Mint (Cinnamon) for a person who needs text-to-speech software. It seems like most of the nice-sounding ones are proprietary. Any recommendations for FOSS alternatives? And any ideas why this is an underdeveloped area for open source?



[–] Shareni@programming.dev 14 points 8 months ago* (last edited 8 months ago)

It seems like most of the nice-sounding ones are proprietary.

That's pretty standard. Most FOSS projects don't have corporations feeding them hundreds of thousands of dollars. Even when they do, people still say GIMP is far worse than Photoshop. Blender is one of the rare complex projects that can compete with its proprietary alternatives.

And any ideas why this is an underdeveloped area for open source?

My best guess is that it's really expensive and time-consuming. I'd be surprised if those really good proprietary models cost less than $100k just to train.

[–] Lemongrab@lemmy.one 9 points 8 months ago

It's underdeveloped because it isn't flashy, even though it's quite necessary. Accessibility in general often gets little support from large backers, in OSS or otherwise.

Piper TTS has quality models. Here are two references for using it with speech-dispatcher:

Hacked together: https://github.com/rhasspy/piper/discussions/328

Ready-made: https://github.com/Elleo/pied
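If you want to hear Piper on its own before wiring it into speech-dispatcher, here is a minimal Python sketch that pipes text into the `piper` command-line tool and writes a WAV file. It assumes the `piper` binary is on your PATH and that you have already downloaded a voice model; `en_US-lessac-medium.onnx` is the example voice used in the Piper README, and the helper names `piper_command` and `speak_with_piper` are hypothetical. It does not configure speech-dispatcher; the two links above cover that part.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def piper_command(model: str, output_wav: str) -> List[str]:
    """Build a Piper command line that reads text on stdin and writes a WAV file."""
    # --model and --output_file are the flags shown in the Piper README example.
    return ["piper", "--model", model, "--output_file", output_wav]


def speak_with_piper(text: str, model: str, output_wav: str = "speech.wav") -> bool:
    """Synthesize `text` into `output_wav`; return False if piper is not installed."""
    if shutil.which("piper") is None:
        return False
    # Piper reads the text to synthesize from standard input.
    subprocess.run(piper_command(model, output_wav), input=text.encode("utf-8"), check=True)
    return Path(output_wav).exists()


if __name__ == "__main__":
    # Hypothetical local model path; download the .onnx voice (and its .json config) first.
    ok = speak_with_piper("Hello from Piper.", "en_US-lessac-medium.onnx")
    print("wrote speech.wav" if ok else "piper not found on PATH")
```

Once the WAV sounds right, the speech-dispatcher integration is what makes it usable from the desktop and screen readers.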

[–] ninpnin@sopuli.xyz 5 points 8 months ago

AFAIK, all of the state-of-the-art TTS models are openly available on huggingface.co or similar sites. However, I'm not sure whether there are nice front ends/UIs for them.

[–] valvin@beehaw.org 4 points 8 months ago (1 child)

TTS with Coqui XTTS is fun to run with a known voice (a 10-second WAV file is enough). It requires some resources, but far fewer than STT tools like faster-whisper. I think the main issue is not running these models but integrating them with the OS and other software.
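A rough sketch of voice cloning with XTTS through the Coqui `TTS` Python package, assuming the package is installed (`pip install TTS`) and the `xtts_v2` model can be downloaded on first use. The WAV file is the reference voice passed as `speaker_wav`, not the input to be read; the text is still passed as a string. The duration check and its 10-second default are assumptions based on the comment above, not something the library enforces, and the helper names `wav_duration_seconds` and `clone_voice` are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clone_voice(text: str, speaker_wav: str, out_path: str = "cloned.wav",
                language: str = "en", min_seconds: float = 10.0) -> str:
    """Read `text` aloud in the voice heard in `speaker_wav`, saving to `out_path`."""
    # Sanity check on the reference clip; the 10 s default is a rule of thumb (assumption).
    duration = wav_duration_seconds(speaker_wav)
    if duration < min_seconds:
        raise ValueError(f"reference clip is only {duration:.1f}s; record a longer sample")

    # Imported lazily so the check above works even without Coqui TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    # speaker_wav is the voice to imitate; text is what gets spoken.
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=language, file_path=out_path)
    return out_path
```

Usage would look like `clone_voice("Hello there.", "my_voice.wav")`, where `my_voice.wav` is a hypothetical recording of the target speaker. Mixing up the reference WAV and the text input is a likely cause of errors when trying to "TTS from a WAV file".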

[–] mariah@feddit.rocks 1 points 8 months ago

I tried TTS, but I get an error when trying to do TTS from a WAV file.

[–] h3ndrik@feddit.de 4 points 8 months ago* (last edited 8 months ago)

It's been an underdeveloped area for some time. espeak-ng is available on most distros and has some integrations that tie it into the desktop to some degree. There are more modern solutions that sound much better, for example Coqui's XTTS-v2, or maybe Piper, which is part of Home Assistant nowadays. If your language is English, you have quite a few more options to choose from. But it's a mixed bag whether they sound nice, are easy to install (which also depends on which Linux distro you use and whether they're packaged for it), and tie into the rest of the system. I'm not an expert on this, but I'd also like to have TTS and STT available on my Linux desktop without putting too much effort into it.
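Since espeak-ng is the option most likely to be packaged already, here is a small Python sketch that builds and runs an `espeak-ng` command. It relies only on the standard `-v` (voice), `-s` (speed in words per minute, default 175) and `-w` (write a WAV file instead of playing) options; the helper names `espeak_command` and `say` are hypothetical, and it assumes espeak-ng is installed (on Mint, `sudo apt install espeak-ng`).

```python
import shutil
import subprocess
from typing import List, Optional


def espeak_command(text: str, voice: str = "en", words_per_minute: int = 175,
                   wav_path: Optional[str] = None) -> List[str]:
    """Build an espeak-ng command line: -v picks the voice, -s the speed in words per minute."""
    cmd = ["espeak-ng", "-v", voice, "-s", str(words_per_minute)]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def say(text: str, **options) -> bool:
    """Speak `text` (or write a WAV if wav_path is given); return False if espeak-ng is missing."""
    if shutil.which("espeak-ng") is None:
        return False
    subprocess.run(espeak_command(text, **options), check=True)
    return True


if __name__ == "__main__":
    if not say("Testing espeak-ng on Linux Mint.", words_per_minute=150):
        print("espeak-ng not found; try: sudo apt install espeak-ng")
```

It sounds robotic next to Piper or XTTS, but it is lightweight and already works with speech-dispatcher on most distros.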

[–] Paragone@beehaw.org 2 points 7 months ago

If they have the horsepower to run it, I gather there is a reversal of Whisper called WhisperSpeech, or something like that, which uses an LLM to convert text to speech.

...

Here: found it for you.

https://github.com/collabora/WhisperSpeech

[–] shortwavesurfer@monero.town 2 points 8 months ago

As a blind person, I think it's mostly because Linux is only on about 4% of desktops. So not many blind people are using it and demanding better software.