    Alp
    @alpoktem
    Hi, are there any Dockerfiles I can readily use to do basic inference?
    2 replies
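    For reference, the basic inference such a Dockerfile would wrap boils down to a few lines. A minimal sketch, assuming the high-level TTS API from recent releases; the model name is just one example from the released-models list:

```python
# Minimal inference sketch (assumes the high-level API of recent Coqui
# releases; the model name is one example from the released-models list).
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/vits")  # downloads on first use
tts.tts_to_file(text="Hello world!", file_path="out.wav")
```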
    Nina Cheonkam Jeong
    @ninackjeong
    Is there anyone who has worked with Korean data using Coqui?
    9 replies
    Edresson 🐸
    @edresson:matrix.org
    [m]
    To fix this error, you need to add the parameter "ignored_speakers" to the koreanFormatter function, or add **kwargs to the koreanFormatter signature. Check the Mozilla formatter: https://github.com/coqui-ai/TTS/blob/c410bc58ef3bd07b72ab05d29bbdc2a6df47afea/TTS/tts/datasets/formatters.py#L31
    6 replies
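    For reference, a minimal sketch of a formatter that accepts the extra keyword arguments the loader passes; the koreanFormatter name and the metadata column layout are assumptions, and the dict item format follows recent Coqui versions (older releases returned lists):

```python
import os

def koreanFormatter(root_path, meta_file, ignored_speakers=None, **kwargs):
    """Sketch of a custom formatter; the column layout is an assumption."""
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            cols = line.strip().split("|")  # assumed layout: wav_id|transcript
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            items.append(
                {"text": cols[1], "audio_file": wav_file, "speaker_name": "korean"}
            )
    return items
```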
    awsabdulhamed
    @awsabdulhamed
    I am wondering if it is possible to get a dataset of English singers?
    4 replies
    awsabdulhamed
    @awsabdulhamed
    Hello guys, what is the best method to generate a singing voice? Is there any code that can help me?
    weberjulian 🐸
    @weberjulian:matrix.org
    [m]
    Hey, you can try to train a FastPitch model since it explicitly models pitch, but we've never experimented with singing datasets.
    1 reply
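    For anyone wanting to try this, a rough sketch of a FastPitch training run with the Trainer API, modeled on the Coqui LJSpeech FastPitch recipe. The dataset path is a placeholder, import paths drift between releases, and whether FastPitch copes with singing data is untested:

```python
from trainer import Trainer, TrainerArgs
from TTS.tts.configs.fast_pitch_config import FastPitchConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.forward_tts import ForwardTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

# Placeholder dataset laid out in ljspeech format (metadata.csv + wavs/).
dataset_config = BaseDatasetConfig(
    formatter="ljspeech", meta_file_train="metadata.csv", path="data/singing/"
)
config = FastPitchConfig(
    run_name="fastpitch_singing",
    batch_size=16,
    compute_f0=True,              # FastPitch trains on explicit pitch targets
    f0_cache_path="f0_cache/",
    output_path="output/",
    datasets=[dataset_config],
)
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
model = ForwardTTS(config, ap, tokenizer)
trainer = Trainer(
    TrainerArgs(), config, config.output_path,
    model=model, train_samples=train_samples, eval_samples=eval_samples,
)
trainer.fit()
```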
    sanjaesc
    @sanjaesc:matrix.org
    [m]
    I trained a VITS model with the Juice WRLD dataset which was shared here some time ago. The result was a somewhat singing voice when synthesizing, but I didn't investigate further. 😅
    2 replies
    Dane
    @bstg32:matrix.org
    [m]
    Oh, you used my little singing Juice model I sent; I hope you liked it. I was curious how good it would sound. I have a lot more to add to that dataset, hours and hours of raps and songs and him speaking. If anyone wants to teach a newbie about singing TTS and give me examples of how, I would love to help out in making the ultimate singing Juice WRLD model. I wish I could show him his text-to-speech model; I wonder what he would think about it. But it's interesting using him as a starting point, because I wonder: if I pretrained LJ using his model, would it make a singing black Linda? That'd be interesting. Or what about a French one?
    1 reply
    Dane
    @bstg32:matrix.org
    [m]
    I will re-upload the link tonight and share it tomorrow, once I fix it and make it better. Dude, you and I could totally collaborate if you can teach me how this voice training stuff works, and it would be so, so fun! Well, for text to speech, I'm blind, so I use things like eSpeak, Pico, Vocalizer by Nuance, ETI-Eloquence/IBMTTS, the list goes on. I can't use coqui-tts yet because it's not screen reader compatible, and I don't know how to get a faster-than-realtime model that I can run on practically any hardware and use with things like Orca, Speakup, or even Fenrir. Those are Linux screen readers. I also have my 200 dollar Windows PC, which I've nicknamed Wally because I bought it at Walmart. It's a stupid little Asus VivoBook thing that I actually kinda like. Drunk ramblings aside, I'm excited to play with this singing stuff now. We could make our own small helpers for music, and eventually maybe make a Raspberry Pi be a singer! What an idea!
    I would be very open to the idea of training a lieu voice and making our own AI hyperpop songs. What fun!
    1 reply
    Dane
    @bstg32:matrix.org
    [m]
    You're blind yourself? Wow! What a dream! Did I read that right? If so, I've managed to set up a team! That's cool! Hey, the blind members of this server, we should make a GitHub repo called openAccess, with Thorsten's permission of course with this next one, because I would love to contribute to OpenVoice and make an Australian English voice using my (perfect, my friends say) Australian accent. I can also do an Indian and Southern American one, but my normal accent, for those who have heard my voice in PM, is sort of a mixture between Southern, African American, and Caucasian. I don't know the proper wording for that. I have an interesting accent when I'm naturally speaking, more of a hyperish-sounding mixture of where I've been throughout my life. It'd be very interesting to clone my reading voice vs. my just-speaking-to-people-on-the-internet voice, that being TeamTalk, because woohoo for beasty audio quality and WAV recording. Side-ramble again: I can't wait for Element Call to be fully functioning. But I'm excited now! I hope I'm not breaking any rules with this message; I just got super excited, and I think we can finally open speech for everyone. I even have plans for a new AAC device using AI. Just making it easier for everyone, having the SingerAI built into the OS, so anyone can sing and speak multiple languages. I know this awesome guy named Mike Hanson, and he and I mess with Larynx from time to time, trying to figure out ways to make it work with screen readers and such, and it'd be awesome to have testers for a new system. Maybe we can somehow combine VITS fully into a system that could even run on 32-bit systems, like the RPi Zero? Yeah, it'd be slow, but think of how fast building from that small beast to something like an M1 would be! So, so fast! The tech is there!
    1 reply
    Dane
    @bstg32:matrix.org
    [m]
    And there is this awesome thing synesthesiam made for the Raspberry Pi called glowtts, that uses eSpeak, the synth us screen reader users know and love, as a phonemizer. Just imagine: a basic AI synth that could just send eSpeak's pitch, rate, and prosody straight to the network, and it would sound just like whatever voice was trained with VITS or similar. That would be amazing, and it would be very lightweight and fast. That would possibly be years in the making; I'm not good with this coding stuff, so somebody else would have to tell me if that's even feasible. It's just an idea though. I plan, with help from the community and self-teaching, to eventually make open speech accessible to everyone. Thorsten? I'm sorry for the misspelling, but the German guy who makes the awesome videos, this would probably interest you, I would imagine. Because we could all be a giant free world together! It could be wonderful! I love free and open source tech, and this would be a great fun thing to do, building the next generation of speech tech in all departments. Thorsten, I appreciate OpenVoice a lot, and I hope my ideas can be a contribution; I just don't really know how to get started. If this isn't the right place for this, again, I'm sorry, I'm just typing out loud, because my volume's a bit high and it's reading by character on the iPhone here.
    Dane
    @bstg32:matrix.org
    [m]
    Did you experiment with the voice converter with Juice's VITS model? I'm curious what it would sound like if that model tried to sing his own songs. How long did it take to train the voice? I'd love to experiment with that. Considering VITS is the best free neural net I've ever heard, maybe I could make a web app that wraps around TTS and lets people play with the voice converter and stuff, making their own music with a singing voice. I wouldn't use Juice's publicly; I'd train a singing version of Eloquence TTS or something like that, lol. That'd be interesting; I'm thinking of making an ETI-Eloquence dataset and training it. Then, expressive Eloquence!
    sanjaesc
    @sanjaesc:matrix.org
    [m]
    It was just a quick experiment, I didn't really do much testing with it. Think I only trained it for half a day but don't remember 😅
    Dane
    @bstg32:matrix.org
    [m]
    Do you still have the model, and would you mind sharing it with me? I would love to experiment with it, break it in very cool ways, make it speak multiple languages, etc.
    sanjaesc
    @sanjaesc:matrix.org
    [m]
    I don't have it anymore sorry. But it was just a simple test with the data you shared. It wouldn't be able to speak other languages.
    Dane
    @bstg32:matrix.org
    [m]
    I only said that because I know you created it, so I was just confirming whether or not I could openly and freely share what I learn as I start actually getting into this open speech world that I love so much. I have a lot in mind, like making natural voices for assistants, voices for computers (as a general predictable voice like Eloquence or eSpeak), a singing voice for foolery and fun, and a couple of test voices to see how well I can attempt accents. I even have a friend who's willing to help me make a Romanian English voice; that'll take a long time, but it'll be fun. Thanks for making the OpenVoice project. I only looked at it briefly a couple of months ago, so I don't know much about it.
    josh 🐸
    @josh-coqui:matrix.org
    [m]
    the ThorstenVoice data is under a Creative Commons Zero License : https://openvoice-tech.net/index.php?title=Thorsten_(neutral)
    😎
    thorsten.mueller
    @thorsten.mueller:matrix.org
    [m]
    Yes, it is ☺️
    Dane
    @bstg32:matrix.org
    [m]
    In that case, I now want to make an English Thorsten. Like one that I can just throw into an inferer real quick, on any computer, and it could speak and sing in both German and English. Maybe even Spanish, or Indian English years from now, unless that's already possible. I'm limited on hardware right now; I have my M1 mini only and my Google Colab. I was thinking of starting with Colab, considering they fixed it significantly accessibility-wise. I've installed coqui-tts on macOS before, on the M1, so I figured I'd use that as the inferer. I would use it as the trainer, but that's not feasible. I have some time on my hands, so I figured I could learn and make some interesting projects using Thorsten's models. I'm assuming Thorsten's Colab still works fine, and I can use it to train VITS models? Is VITS the best architecture to use for a custom voice? I read about people using FastPitch and Tacotron and others, but VITS sounds like an interesting one, since it can voice convert and almost voice clone now with YourTTS.
    Daniel D
    @daniel-dona
    Good morning! I'm training a custom dataset with VITS and I had to change the batch_size to fit my GPU memory. Does that change need to be compensated in the learning rate somehow? I read something about this when training Tacotron2 in the past, but I'm not sure...
    4 replies
    weberjulian 🐸
    @weberjulian:matrix.org
    [m]
    Try upping it to 8 and set the max audio length to a lower value
    You can also experiment with the architecture, lowering the hidden sizes and number of layers
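    On the learning-rate part of the question: the trainer doesn't compensate automatically. A common heuristic from the large-batch SGD literature is to scale the learning rate linearly with the batch size; treat it only as a starting point for VITS, which trains adversarially:

```python
# Linear scaling heuristic (a rule of thumb, not something the trainer
# applies automatically): keep lr / batch_size roughly constant.
base_lr, base_batch_size = 2e-4, 32  # placeholders; use your config's values
new_batch_size = 8                   # what actually fits in GPU memory

new_lr = base_lr * new_batch_size / base_batch_size
print(f"suggested starting lr: {new_lr:.1e}")  # 5.0e-05
```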
    Jarod
    @Chaanks
    Hello, I tried the VITS pretrained model and there is a bug. An error message is concatenated with the input text before synthesis; I extracted it from the tokens array:
    ailed to create secure directory (/run/user/2323/pulse): No such file or directory həlˈoʊ wˈɜːld
    Ludovic Vialle
    @lvialle
    Hello there! Is there a way to change or set the pronunciation of words via a dictionary? I know that tools like NeMo use a CMU dictionary, but I can't see how to achieve that with coqui-tts. I'd like to fix a few words that are pronounced as if they were French (like "Twitter" spoken as "Twitté") and add domain-specific words.
    4 replies
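    One workaround until per-word lexicon support is sorted out: pre-process the text and respell the problem words before synthesis. A sketch; the respell function and the mappings are made-up examples to tune by ear:

```python
import re

# Hypothetical respelling map for a French model: nudge the phonemizer
# toward the intended pronunciation of loanwords and domain terms.
RESPELLINGS = {
    "Twitter": "Touiteur",
}

def respell(text: str) -> str:
    for word, replacement in RESPELLINGS.items():
        text = re.sub(rf"\b{re.escape(word)}\b", replacement, text)
    return text

print(respell("Je lis Twitter tous les jours."))
```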
    erogol 🐸
    @golero:matrix.org
    [m]
    Nice tutorial video about 🐸TTS https://www.youtube.com/watch?v=cQl9aXp-uO4
    ZDisket
    @ZDisket
    VITS + TorchMoji
    2 replies
    sanjaesc
    @sanjaesc:matrix.org
    [m]
    How do you condition it during inference? Can you choose from a set of emojis?
    ZDisket
    @ZDisket
    No, I input separate text into TorchMoji, like "I'm so excited!" (as in the second clip), and feed the output as input to the TTS model. It's hard for me to explain in words; I'll wrap up a notebook in a sec.
    2 replies
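    Until the notebook is up, a conceptual sketch of that pipeline (not ZDisket's actual code). The TorchMoji half follows the repo's examples/encode_texts.py; how the resulting encoding is injected into the TTS model is the experimental part:

```python
import json

# TorchMoji pieces, following the repo's examples/encode_texts.py.
from torchmoji.sentence_tokenizer import SentenceTokenizer
from torchmoji.model_def import torchmoji_feature_encoding
from torchmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

with open(VOCAB_PATH, "r") as f:
    vocabulary = json.load(f)

tokenizer = SentenceTokenizer(vocabulary, 30)  # 30 = max tokens per sentence
encoder = torchmoji_feature_encoding(PRETRAINED_PATH)

tokenized, _, _ = tokenizer.tokenize_sentences(["I'm so excited!"])
emotion_encoding = encoder(tokenized)  # (1, 2304) sentence feature vector

# Experimental step (the part being described above): feed this encoding
# into the VITS model as an external conditioning vector, e.g. where a
# speaker d-vector would normally go.
```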
    erogol 🐸
    @golero:matrix.org
    [m]
    This is a smart 🔼 way to get expressive speech
    Looking forward to trying it out in a notebook
    Dane
    @bstg32:matrix.org
    [m]
    My blindy friend Ethan is using Ben Andrews' app to train Tacotron2 models. My question is: do I need a gigantic GPU to train models? Now, I have days and days I can leave my M1 on. So, if I can at all, I'll hire my M1 to make models. What would be the param to force CPU training? I'll do it, I'll put my poor M1 through that torture, as long as it wouldn't damage it.
    1 reply
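    On the param question: there isn't a dedicated CPU flag to hunt for; the usual trick is to hide the GPUs so torch falls back to CPU (and on an M1 there's no CUDA device anyway, so Coqui will already train on CPU). A sketch; it won't damage the machine, just keep it busy:

```python
import os

# Hide any CUDA GPUs before torch initializes, forcing CPU training.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
print(torch.cuda.is_available())  # False -> the trainer falls back to CPU
```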
    bstg32
    @bstg32:matrix.org
    [m]
    Was this an Nvidia GPU?
    erogol 🐸
    @golero:matrix.org
    [m]
    Yes it was.
    Baybars Külebi
    @gullabi
    Hi, is the trainer tested for multi-node training in clusters? I still couldn't get it to train multi-node after updates and using the new instructions from the docs.
    1 reply
    If it is intended to work in clusters and I am having a problem, I will open an issue with the details.
    Baybars Külebi
    @gullabi
    Update: I am currently trying to use the trainer combined with torch.distributed to run TTS on a cluster.
    4 replies
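    For debugging that, a small sanity check worth running on every node (launched with torch.distributed.run / torchrun so the env:// variables are set) before involving the trainer at all; if it prints wrong ranks or hangs, the cluster config is the problem rather than the trainer:

```python
# Launch this on each node with torch.distributed.run / torchrun so that
# RANK, WORLD_SIZE, MASTER_ADDR, etc. are set for us.
import os
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # reads the env:// variables
print(
    f"rank {dist.get_rank()}/{dist.get_world_size()} "
    f"local_rank {os.environ.get('LOCAL_RANK')} host {os.uname().nodename}"
)
dist.barrier()  # hangs here if the nodes cannot reach each other
dist.destroy_process_group()
```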
    Nabarun Goswami
    @naba89
    Hello, new here! Nice to meet you all! 🥳
    erogol 🐸
    @golero:matrix.org
    [m]
    @gullabi: multi-node meaning multiple machines with multiple GPUs, or just multiple GPUs? For the first case, 👟 is not tested.
    erogol 🐸
    @golero:matrix.org
    [m]
    Alas I have no idea. I've never tried. Hopefully someone in the channel might inform us
    pdav
    @pdav:matrix.org
    [m]
    the NaturalSpeech paper from Microsoft... wowie
    Nabarun Goswami
    @naba89
    Those results sound truly amazing. I wonder if having access to a memory bank of posterior latents during inference, rather than generating from scratch, is what makes the synthesized speech sound so good. However, the ablation shows that the memory feature leads to the least improvement in CMOS.
    awsabdulhamed
    @awsabdulhamed

    Hello guys, I'm trying to pip install TTS and I get this error, any solutions:

    error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe' failed with exit code 2
    [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for pyworld
    Failed to build pyworld
    ERROR: Could not build wheels for pyworld, which is required to install pyproject.toml-based projects

    1 reply
    arif334
    @arif334:matrix.org
    [m]
    Quick question: is any text_cleaner used during inference? I've trained my models with basic_cleaners, but I just noticed that it is not invoked during inference. Shouldn't we clean the text before passing it to the synthesizer?
    thorsten.mueller
    @thorsten.mueller:matrix.org
    [m]
    I am cleaning my text manually / with external tooling before passing it to TTS.
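    For anyone who wants to do the same in code rather than with external tooling, a sketch that reuses the training-time cleaner, assuming basic_cleaners still lives at this import path in your version:

```python
# Apply the same cleaner used at training time before synthesis.
from TTS.tts.utils.text.cleaners import basic_cleaners

raw = "  Some   RAW input TEXT!  "
cleaned = basic_cleaners(raw)  # lowercases and collapses whitespace
print(cleaned)
# then pass `cleaned` (not `raw`) to your Synthesizer / tts() call
```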