    spectie
    @spectie:matrix.org
    [m]
    Then it will save the one at ,74
    smfai200
    @smfai200:mozilla.org
    [m]
    I was just excited to finally see some output working!
    😂
    spectie
    @spectie:matrix.org
    [m]
    Most of the developers are here
    I'm not sure who wrote the original swift code
    But maybe it was reuben
    manish.jain
    @manish.jain:matrix.org
    [m]
    reuben: Is there any documentation available with steps to configure the code for iOS?
    Can we run the sample Swift code on an iPhone 8?
    spectie: Thanks for adding me
    spectie
    @spectie:matrix.org
    [m]
    Np!
    reuben
    @reuben_m:matrix.org
    [m]
    manish.jain: I don't think there's much documentation for it. you build the static framework with bazel and then use it in your iOS app, or the example project.
    manish.jain: iOS support is still experimental so any reports from actual usage are greatly appreciated :)
    jlampart
    @jlampart:matrix.org
    [m]
    Considering that manually splitting 10-min+ audio myself for hundreds of audio files wouldn't be very scalable, would you say the training data would be unusable if some words are cut off a bit?
    The faster the speaker, the tougher it is for the subtitle timestamps to capture the exact audio down to the nearest millisecond, I would imagine
    spectie
    @spectie:matrix.org
    [m]
    Not unusable
    I would try it and see how it works
    If the results aren't good, then try and improve the data processing
    Maybe using dsalign or something like that
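A rough sketch of the baseline approach being discussed here, splitting on the existing subtitle timestamps before any silence-based refinement; this assumes the srt and pydub packages, and all file names are hypothetical:

    # Sketch: cut a long recording into per-subtitle clips.
    # Assumes: pip install srt pydub (pydub needs ffmpeg for non-WAV input);
    # audio.wav and subtitles.srt are hypothetical file names.
    import os
    import srt
    from pydub import AudioSegment

    audio = AudioSegment.from_wav("audio.wav")
    with open("subtitles.srt", encoding="utf-8") as f:
        subs = list(srt.parse(f.read()))

    os.makedirs("clips", exist_ok=True)
    for i, sub in enumerate(subs):
        start_ms = int(sub.start.total_seconds() * 1000)
        end_ms = int(sub.end.total_seconds() * 1000)
        audio[start_ms:end_ms].export(f"clips/clip_{i:04d}.wav", format="wav")
        # Pair each clip with sub.content as its transcript for training.

If the timestamps are slightly off, as discussed above, some clips will cut words at their edges; the sketches further down extend the cut points to nearby silences.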
    manish.jain
    @manish.jain:matrix.org
    [m]
    reuben: I'm having a hard time generating the static framework. Can you please send me builds for both the simulator and a real device, if you have them?
    I have added the model to make the code work, but it crashes on my iPhone 8, so I want to run it on the simulator
    smfai200
    @smfai200:mozilla.org
    [m]
    Which repo is better to follow, Coqui-STT or DeepSpeech?
    manish.jain
    @manish.jain:matrix.org
    [m]
    DeepSpeech
    spectie
    @spectie:matrix.org
    [m]
    deepspeech is unmaintained
    smfai200
    @smfai200:mozilla.org
    [m]
    Quite right, judging by what I saw on the git repo
    I guess we should move to the Coqui-STT repo
    manish.jain
    @manish.jain:matrix.org
    [m]
    so should I compile the code from STT?
    smfai200
    @smfai200:mozilla.org
    [m]
    Probably. I had the same misunderstanding as you!
    manish.jain
    @manish.jain:matrix.org
    [m]
    reuben: so I need to generate the framework for Swift from the STT repo with bazel
    reuben
    @reuben_m:matrix.org
    [m]
    correct
    manish.jain
    @manish.jain:matrix.org
    [m]
    that should be the correct path?
    okay
    jlampart
    @jlampart:matrix.org
    [m]
    What if the next silence is not after the next word, but a few more words until the speaker actually pauses for a second?
    Though to be sure, I'll study the YT subtitle file a bit more closely to figure out how the timestamps are determined, and whether they match when the speaker actually pauses for a second before continuing to speak
    Thanks for suggesting to split at the next silence after the timestamp, though, Ciaran!
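One way to approximate "split at the next silence after the timestamp" is pydub's silence detection; a rough sketch, where the silence length and threshold are guesses that would need tuning per recording:

    # Sketch: push each cut point to the first detected silence after the
    # subtitle's end timestamp, so trailing words are not clipped.
    from pydub import AudioSegment
    from pydub.silence import detect_silence

    audio = AudioSegment.from_wav("audio.wav")
    # [start, end] pairs in ms for every silence of at least 300 ms that is
    # quieter than 16 dB below the clip's average loudness (tunable guesses).
    silences = detect_silence(audio, min_silence_len=300,
                              silence_thresh=audio.dBFS - 16)

    def snap_to_silence(end_ms, max_shift_ms=2000):
        """Return the midpoint of the first silence starting after end_ms,
        or end_ms unchanged if none begins within max_shift_ms."""
        for sil_start, sil_end in silences:
            if end_ms <= sil_start <= end_ms + max_shift_ms:
                return (sil_start + sil_end) // 2
        return end_ms

The max_shift_ms cap is one crude answer to the "what if the speaker doesn't pause for a few more words" problem: if no silence appears soon enough, the original timestamp is kept rather than swallowing extra words the transcript doesn't contain.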
    Ciaran (ccoreilly)
    @ccoreilly:matrix.org
    [m]
    Yeah, that's true. It is not an easy task. Maybe, as spectie suggested, you can realign the text with DSAlign and then split.
    spectie
    @spectie:matrix.org
    [m]
    DSAlign does the splitting for you
    it first runs VAD and then aligns the text with the VAD segments
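For anyone curious what that first VAD pass looks like, here is a minimal sketch using the webrtcvad package; DSAlign's actual implementation differs, and the frame size and aggressiveness below are illustrative:

    # Sketch: frame-level voice activity detection, roughly the first stage
    # DSAlign runs before aligning the transcript to the speech segments.
    # Assumes a 16-bit mono PCM WAV at 8/16/32/48 kHz and pip install webrtcvad.
    import wave
    import webrtcvad

    vad = webrtcvad.Vad(2)     # aggressiveness 0 (lenient) to 3 (strict)
    FRAME_MS = 30              # webrtcvad accepts 10, 20, or 30 ms frames

    with wave.open("audio.wav", "rb") as wf:
        rate = wf.getframerate()
        pcm = wf.readframes(wf.getnframes())
    frame_bytes = int(rate * FRAME_MS / 1000) * 2   # 2 bytes per sample

    segments, seg_start = [], None
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        t_ms = offset // 2 * 1000 // rate
        if vad.is_speech(pcm[offset:offset + frame_bytes], rate):
            if seg_start is None:
                seg_start = t_ms
        elif seg_start is not None:
            segments.append((seg_start, t_ms))   # (start_ms, end_ms) of speech
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(pcm) // 2 * 1000 // rate))
    # `segments` now holds the speech spans the text gets aligned against.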
    jlampart
    @jlampart:matrix.org
    [m]
    Got it, will give it a try. Thanks!
    jlampart
    @jlampart:matrix.org
    [m]
    spectie: Just confirming this is the correct repo: https://github.com/mozilla/DSAlign
    spectie
    @spectie:matrix.org
    [m]
    yup
    jlampart
    @jlampart:matrix.org
    [m]
    I did a quick test and the aligned.json output looked like gibberish. I tried running the following command (with my own files and directory substituted):
    $ bin/align.sh --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/transcript.log
    I assume I need to configure it to work for Spanish (the provided script seems to be only for English)
    But thanks for confirming this is it. I assumed from your explanation above it was some tool where I can supply the .wav audio file and transcript and manually align them before it splits.
    Yeah, looks like I need to do more preparation: https://github.com/mozilla/DSAlign/blob/master/doc/algo.md
    spectie
    @spectie:matrix.org
    [m]
    yes
    but i don't think you want to do it manually for all of your data
    jlampart
    @jlampart:matrix.org
    [m]
    "--stt-model-dir <DIR> points DeepSpeech to the language specific model data directory. It defaults to models/en"
    Bingo!
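Putting the pieces together for a batch run over many files, with a Spanish model directory passed via the flag quoted above; a sketch where the models/es layout and the data directory structure are assumptions, and only the flags shown earlier in this thread come from DSAlign itself:

    # Sketch: batch-run DSAlign from its repo root over data/*/audio.wav,
    # pointing --stt-model-dir at an assumed Spanish model under models/es.
    import pathlib
    import subprocess

    for wav in sorted(pathlib.Path("data").glob("*/audio.wav")):
        d = wav.parent
        subprocess.run([
            "bin/align.sh",
            "--audio", str(wav),
            "--script", str(d / "transcript.txt"),
            "--aligned", str(d / "aligned.json"),
            "--tlog", str(d / "transcript.log"),
            "--stt-model-dir", "models/es",   # assumed Spanish model location
        ], check=True)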