    Bülent Özden
    @bozden:mozilla.org
    [m]
    @ninackjeong: You can scroll the chat upwards to see previous posts, or click on the reply josh posted to locate it automatically.
    Nina Cheonkam Jeong
    @ninackjeong
    @bozden:mozilla.org ah, yes. zeroth, ksponspeech, pansori, I know what they are. I would check what kind of format Coqui STT requires.
    Bülent Özden
    @bozden:mozilla.org
    [m]
    1. It needs all audio in wav format.
    2. You would need train.csv, dev.csv (validation during training), and test.csv files.
    They look like this:
    wav_filename,wav_filesize,transcript
    common_voice_tr_28858334.wav,135980,her şey harika görünüyor
    common_voice_tr_28858331.wav,156716,o istedi
    common_voice_tr_28858337.wav,138284,telefonunu açmıyor
    common_voice_tr_28858335.wav,155564,kalça kemikleri sipsivri dışarı fırlamıştı
    The transcripts need to be cleaned of all punctuation and any other characters that are not in the alphabet.
    3. You would need an alphabet.txt file in the following format:
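    A minimal Python sketch of the three steps above, not an official Coqui STT tool. The helper names (clean_transcript, write_manifest, write_alphabet) and the `allowed` character set are made up for illustration. It assumes alphabet.txt holds one character per line, which is my understanding of the layout; please verify against the Coqui STT docs. Case folding is left out because it is language-specific.

```python
import csv
import os
import unicodedata


def clean_transcript(text, allowed):
    """Drop every character that is not in the allowed set."""
    text = unicodedata.normalize("NFC", text)
    kept = "".join(ch for ch in text if ch in allowed)
    return " ".join(kept.split())  # collapse runs of spaces


def write_manifest(rows, csv_path, allowed):
    """Write a train/dev/test CSV from (wav_path, raw_transcript) pairs."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, raw in rows:
            transcript = clean_transcript(raw, allowed)
            if transcript:  # skip clips whose text cleaned down to nothing
                size = os.path.getsize(wav_path)  # size in bytes
                writer.writerow([wav_path, size, transcript])


def write_alphabet(transcripts, alphabet_path):
    """Assumed layout: one character per line (space included)."""
    chars = sorted(set("".join(transcripts)))
    with open(alphabet_path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")
    return chars
```

    You would call write_manifest once per split (train, dev, test) and build the alphabet from the cleaned transcripts of all three.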
    Bülent Özden
    @bozden:mozilla.org
    [m]
    PS: I'm using the commonvoice-utils repo by spectie, but since Korean is not yet in Common Voice, the repo does not cover it. Perhaps you would like to help get it started?
    Nina Cheonkam Jeong
    @ninackjeong
    @bozden:mozilla.org got it! thank you for your explanation! Okay, if there is anything that I could do, I would go for it
    spectie
    @spectie:matrix.org
    [m]
    i can also look at including Korean
    if someone makes an issue
    Bülent Özden
    @bozden:mozilla.org
    [m]
    Nina Cheonkam Jeong
    @ninackjeong
    several Korean people would join the Korean model
    Bülent Özden
    @bozden:mozilla.org
    [m]
    Welcome @Ykmoon
    Nina Cheonkam Jeong
    @ninackjeong
    I have recordings of 41 Korean speakers from my experiment and one long spontaneous speech recording (my family's conversation). Could I contribute these to the Common Voice dataset? If so, are there any human-subjects issues?
    All were collected by me, and I may be the licensor.
    Bülent Özden
    @bozden:mozilla.org
    [m]
    @ninackjeong: No...
    • As I mentioned on DM, Common Voice does not use/include previous datasets/recordings. Volunteers need to record the sentences shown to them on the screen. It can start with family but should include thousands of different people.
    • But first you need to start the language there, by finding 1500+ more sentences for the text corpus
    • All text-corpus and voice-corpus should be CC0/Public Domain, no lesser license, so that the resultant datasets can be used for any purpose.
    • By default text-corpus should be 1-14 words long (English rules), and voice recordings 1-10 seconds.
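    The default limits in the last bullet can be checked with a short Python sketch. The function names are hypothetical, words are counted by splitting on whitespace (an assumption; the 1-14 rule is the English one and may be set differently for Korean), and the duration check only reads PCM wav files through the standard-library wave module.

```python
import wave


def word_count_ok(sentence, min_words=1, max_words=14):
    """True if the whitespace-separated word count is inside the limits."""
    return min_words <= len(sentence.split()) <= max_words


def clip_seconds(wav_path):
    """Duration of a PCM wav file in seconds, read from its header."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_ok(wav_path, min_seconds=1.0, max_seconds=10.0):
    """True if the clip length is inside the 1-10 second default."""
    return min_seconds <= clip_seconds(wav_path) <= max_seconds
```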
    Here is the general Common Voice related matrix/element channel:
    https://chat.mozilla.org/#/room/#common-voice:mozilla.org
    YkMoon
    @Ykmoon
    Hello :)
    Nina Cheonkam Jeong
    @ninackjeong
    @bozden:mozilla.org I am a little confused. If Common Voice does not use/include previous datasets/recordings, what kinds of utterances are in?
    @bozden:mozilla.org and if 41 people spoke the same utterances, would it be okay?
    thanks!
    If I could get some Korean samples, that would be nice.
    You need some 1200 more to be able to start recording.
    You add them through sentence collector here:
    These should be self-written or come from CC0 (public domain) sources.
    See the how-to page on Sentence Collector to learn more.
    Also make sure you read the About page on Common Voice:
    Bülent Özden
    @bozden:mozilla.org
    [m]
    All of this is about starting Korean on Common Voice. There is NO dataset yet. It will take time to build one, and the next dataset release will be in December 2022. It would be a good idea to start building a community for this purpose.

    For voice-chess you can use other datasets of course. Best is to use different sentences spoken by 1-5 people (as we cannot generate a custom one for chess). The more data, the better.
    1 person speaking => Voice / gender bias
    41 people speaking same sentences => Text corpus bias
    You can randomly select some from these to add to other datasets of course.
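    One way to do that random selection, as a rough Python sketch: keep only a few recordings of each repeated sentence so that the same text does not dominate. The tuple layout (speaker_id, transcript, wav_path), the function name, and the cap of 2 are assumptions for illustration, not a Coqui STT or Common Voice rule.

```python
import random
from collections import defaultdict


def cap_per_sentence(clips, max_per_sentence=2, seed=0):
    """Randomly keep at most max_per_sentence clips for each transcript.

    clips: list of (speaker_id, transcript, wav_path) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_text = defaultdict(list)
    for clip in clips:
        by_text[clip[1]].append(clip)

    selected = []
    for group in by_text.values():
        rng.shuffle(group)  # random order within one sentence
        selected.extend(group[:max_per_sentence])
    return selected
```

    With 41 speakers reading the same sentences, this leaves at most two voices per sentence; the rest of the variety has to come from other datasets.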
    Bülent Özden
    @bozden:mozilla.org
    [m]
    These are not specific to Coqui STT but general rules in ML/DL...
    Bülent Özden
    @bozden:mozilla.org
    [m]
    Well, I think people are adding new ones...
    Nina Cheonkam Jeong
    @ninackjeong
    @bozden:mozilla.org Thank you for such detailed guidelines. I think I've got it now. The first thing I need to do is add more public domain sentences via the sentence collector, and then start the recordings.
    @bozden:mozilla.org I read through the sentences collected so far, but they are quite written in style. It seems that they come from books. Does style matter?
    @bozden:mozilla.org for the voice chess, I would use YouTube or other datasets. Thanks.
    Bülent Özden
    @bozden:mozilla.org
    [m]
    Most of them are also shorter sentences, 1-5 words... You need to balance them.
    As sources are limited, many people, including myself, have to turn to old literary works that have entered the public domain. But they need to match the language as it is used today, so I go through them and "translate" the old wording into newer forms. These texts are mostly descriptive in nature. If you can find public domain conversational material (stories where people talk to each other, theatrical productions, etc.), that would be best.
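    To see whether the sentence lengths are balanced, a small Python sketch like this can count sentences per length bucket. The bucket edges (5, 10, 14 words) and the function name are arbitrary choices for illustration, and words are again counted by splitting on whitespace.

```python
from collections import Counter


def length_buckets(sentences, edges=(5, 10, 14)):
    """Count sentences per word-count bucket, e.g. '<=5', '<=10', '>14'."""
    counts = Counter()
    for sentence in sentences:
        n = len(sentence.split())
        # first edge the sentence fits under, else the overflow bucket
        label = next((f"<={e}" for e in edges if n <= e), f">{edges[-1]}")
        counts[label] += 1
    return counts
```

    If nearly everything lands in the first bucket, the corpus needs more medium and long sentences.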
    Bülent Özden
    @bozden:mozilla.org
    [m]
    I'll copy-paste these to the other open group as well, for others' sake...
    Nina Cheonkam Jeong
    @ninackjeong
    @bozden:mozilla.org I would copy conversations between me and other people from a messenger app, and then contribute them to the corpus.