
For neutral-sounding, very fast and efficient voices, I find the Coqui TTS VITS models to be very good. For slower, more expressive voices or for voice cloning, I think Coqui TTS XTTS is good (or you can look at mrq/tortoise-tts).

I'm still awaiting a StyleTTS2 implementation. The audio samples sound top notch: https://styletts2.github.io/



You're in luck, the code dropped 6 hours ago :) https://github.com/yl4579/StyleTTS2

Looks promising, I'm going to check it out too! MIT license, even! If it's fast enough for real time, it could be the new best option. The paper claims faster inference than VITS...


Ha awesome! I just checked the repo literally before I posted and it was still empty, thanks for the heads up, will give it a spin now.


Just a followup for those interested: inference implementation notes and a comparison clip between StyleTTS2, TTS VITS, and XTTS: https://fediverse.randomfoo.net/notice/AaOgprU715gcT5GrZ2


Wow you got it working so fast! I'm still stuck in package manager hell trying to debug a million little issues.


In my post I link to my issue, where I outline what I needed to do starting from a clean mamba env. That might help.

PyTorch nightly (which I use for CUDA 12) doesn't work with Python 3.12, but if you stick with 3.11 or 3.10 you should be OK. I installed the rest without pinning version numbers, which should be fine if you're on a clean venv. However, there's a bug in the Utils lib that requires a 1-line fix if you're trying to run inference (also linked). nltk was the only dependency not listed, so not bad compared to most code drops!
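
If it helps, here is a rough preflight sketch of the checks described above. It only encodes what I ran into: the Python 3.12 cutoff reflects PyTorch nightly at the time of writing and may no longer hold, and the module list (`torch`, `nltk`, `phonemizer`) is my assumption about what to look for, not an official requirements list. The function name `preflight` is made up for illustration.

```python
import sys
from importlib.util import find_spec


def preflight(version=sys.version_info[:2],
              modules=("torch", "nltk", "phonemizer")):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Assumption from the comment above: PyTorch nightly did not yet
    # support Python 3.12 when this was written.
    if tuple(version) >= (3, 12):
        problems.append(
            f"Python {version[0]}.{version[1]} may be too new; try 3.10 or 3.11"
        )
    # find_spec returns None when a top-level module is not installed.
    for name in modules:
        if find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Run it inside the env you plan to use; no output means none of the issues above were detected.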


I spent a couple of hours debugging why jupyter's debugger wasn't working right, so not exactly related to the code. I did also find and fix that utils bug you mentioned. But my current issue is that phonemizer won't find espeak even though I set the environment variables that are supposed to work. I'll figure it out eventually...

Thanks for writing up your experience! Good to know it works! And it's fast!


Are you on Windows? I've had the issue and was able to fix it by manually adding these system variables:

  PHONEMIZER_ESPEAK_LIBRARY = c:\Program Files\eSpeak NG\libespeak-ng.dll

  PHONEMIZER_ESPEAK_PATH = c:\Program Files\eSpeak NG
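
If you would rather not touch system settings, the same two variables can be set from Python before phonemizer is imported. This is a minimal sketch, assuming the default install directory shown above and that the variables are read when phonemizer loads; `configure_espeak` is a hypothetical helper name, not part of phonemizer.

```python
import os
from pathlib import Path


def configure_espeak(install_dir, environ=os.environ):
    """Set the phonemizer eSpeak variables; return True if the DLL exists."""
    root = Path(install_dir)
    library = root / "libespeak-ng.dll"
    # Refuse to set anything if the DLL is not where we expect it,
    # which catches typos in the path early.
    if not library.is_file():
        return False
    environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(library)
    environ["PHONEMIZER_ESPEAK_PATH"] = str(root)
    return True


if __name__ == "__main__":
    # Assumed default location; adjust to your install.
    found = configure_espeak(r"c:\Program Files\eSpeak NG")
    print("eSpeak NG found:", found)
```

Call it at the top of your script or notebook, before any `import phonemizer`. A `False` return usually means the path is wrong.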


Yes I'm on Windows at the moment. I did try setting those yesterday but I must have made a typo or something. I'll try again, thanks!

Edit: Got it working, sounds really great and is super fast as advertised. Amazing! Just tried modifying the code to make it speak more quickly and it worked first try and still sounds good too! This is way better than using Coqui TTS. Just need a few more pretrained models and the voice cloning that was in the paper and this will become super popular very quickly.



