    petri koski
    @petri_koski_twitter
    File "/home/petri/kur-env/lib/python3.4/site-packages/kur/utils/normalize.py", line 144, in <lambda>
    return lambda x: (x - self.state['mean']).dot(
    ValueError: operands could not be broadcast together with shapes (1213,442) (161,)
    after trying to train a model with: kur train speech_myyntipitsi.yml
    petri koski
    @petri_koski_twitter
    It seemed to be an audio decoding problem .. now it's training, but it gives this after each epoch: "InvalidArgumentError (see above for traceback): slice index 0 of dimension 0 out of bounds." Is it about vocab size?
    roxima
    @roxima
    Does Kur support training a speech model on non-English datasets?
    Jeungmin Oh
    @ohtangza
    Hello, Kur community. Thanks for a great framework. I need a little advice from you guys regarding project feasibility. Can we use Kur for an actual speech recognition system in a production-level service? (We are currently using the Google Speech API and expecting a huge bill if we do not build our own, due to a large volume of requests.) My team will soon have tens of hours of labelled audio and would love to build our own ASR model for a real service. We have a machine with 112 GB RAM and 2x Nvidia K80! Do you guys think this is possible? Please understand that I have no prior knowledge of DNNs (only traditional HMM/SVM-style ML experience).
    Stan Silas(Vivek Mangipudi)
    @StanSilas
    Hello, I'm new to Kur!
    I have a few questions:
    How many hours of audio + annotations are recommended to train/build our own domain-specific ASR?
    Can we use Kur for production?
    Has anyone been able to build/train their own ASR? (If so, could they please add a small tutorial on how to do so?)
    Thanks in advance
    Idealist001
    @Idealist001
    Does Kur support distributed TensorFlow?
    clabinc
    @clabinc
    Nice to join this room!
    I'm training multi-language (Korean and English) with one Kur model!
    I have three questions:
    One: is there any visualization tool for monitoring a model's behavior?
    Two: do you have plans for a standby mode or an API for Kur?
    Three: where can I get Kur 0.7? Isn't it a production version? If it's the beta one, I'd like to get it anyway...
    Ezequiel Adrián
    @EzequielAdrianM
    Hi guys, I'm facing some restrictions. I am currently training the speech.yml example on tensorflow-gpu with one GTX 960. The problem is that due to its low memory (2 GB), I am getting a lot of OOM warnings while creating tensors, and eventually Kur stops the training. My question: can I prevent this behaviour by changing something in the speech.yml settings? Or must I buy a better graphics card? Thank you!
    Adam Sypniewski
    @ajsyp
    @EzequielAdrianM Try decreasing your batch size and/or imposing a limit on the maximum audio length (with a max_duration: X entry under the settings.data section).
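    For example, both knobs together in a Kurfile sketch (the settings.data placement for max_duration is as above; putting batch_size under the train provider section, and the values 4 and 30, are just illustrative assumptions -- tune them to your card):

    settings:
      data:
        max_duration: 30    # assumed to be seconds; drops overly long utterances
    train:
      provider:
        batch_size: 4       # smaller batches mean smaller tensors on a 2GB GPU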
    @clabinc -- Kur 0.7 is in PyPI. Alternatively, you can just grab the latest development version from GitHub.
    Also, there have been a couple questions about multi-language. Yep, Kur can happily do different languages.
    Kur is also definitely ready for production, @StanSilas. We use it at Deepgram. Some examples of using Kur's Python API (@clabinc) can be found in the discussion on GitHub (https://github.com/deepgram/kur/issues/23).
    Ezequiel Adrián
    @EzequielAdrianM
    Suppose the model is completely trained and I want to recognize the speech in a new WAV audio file. How can I do that using your Python API and through the shell? Thanks
    Adam Sypniewski
    @ajsyp
    Take a look at the examples in #23 and see if that helps.
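    If you want the command-line route instead, the rough shape is an evaluate section in your Kurfile pointing at the new audio, then running kur evaluate speech.yml. A sketch (the supplier's path key and the destination key here are from memory -- treat them as assumptions and compare against the bundled examples):

    evaluate:
      data:
        - speech_recognition:
            path: new-audio/         # hypothetical local directory of WAV files
      destination: inference.pkl     # where Kur writes the model's output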
    Ezequiel Adrián
    @EzequielAdrianM
    Hi there! Why is "Inferring vocabulary from dataset" (from kur.supplier.speechrec) marked as a warning? What can I do in order to fix this warning?
    Liquorshotz
    @jungwon1413
    @ajsyp Nice to see your answers! Deep thanks for your wonderful framework, btw. Anyway, is there any plan for kurhub? I often see you mention it, but I never get a chance to see the webpage working.
    Adam Sypniewski
    @ajsyp

    @EzequielAdrianM -- you need a vocabulary for a training set, so that your model knows what it is permitted to output. You can specify one manually, either inline in the Kurfile (using vocab: ['a', 'b', 'c', ...]) or on-disk as a JSON file (using vocab: my-vocab.json, and then my-vocab.json contains ["a", "b", "c", ...]).
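    Concretely, both forms live under settings.data (a sketch; my-vocab.json is just the example name from above):

    # Inline:
    settings:
      data:
        vocab: ['a', 'b', 'c', ...]

    # Or on disk, where my-vocab.json contains ["a", "b", "c", ...]:
    settings:
      data:
        vocab: my-vocab.json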

    If you don't specify one, then the best Kur can do is guess your vocabulary from the training data itself. This is definitely intended for testing/debugging only. Why? Because otherwise, your model will "lock into" the vocabulary you present it, and then suddenly fail when you show it new data. For example, let's say you have a character-based model that learns the letters of the alphabet. Let's say you give it some training data that just happens to not include the letter z and you make Kur infer the vocabulary. Now it's only learning 25 letters instead of 26! And what happens when you show it data that includes z? Suddenly, it doesn't work, because it will infer 26 letters now, and that is incompatible with the 25-letter weights Kur saved earlier.

    So the short version is: only let it infer the vocabulary for testing/playing around. Then lock in a vocabulary.
    @jungwon1413 Great question. We've gone back and forth on it at Deepgram, and we had a beta run earlier this year as part of the Deep Learning Hackathon we hosted in San Francisco. Right now, though, it's in stasis while we juggle other priorities.
    Liquorshotz
    @jungwon1413
    @ajsyp Ahh, I see. One more question about general Kur framework usage: is there any way to implement a distributed deep learning model? (For example, I want to assign CNN layer 1 to PC-1's GPU-0, CNN layer 2 to PC-1's GPU-1, RNN layer 1 to PC-2's GPU-0, and RNN layer 2 to PC-2's GPU-1.) If there isn't, is there any plan to add such a feature in the future?
    Adam Sypniewski
    @ajsyp
    It already supports out-of-the-box multi-GPU training and/or inference using the TensorFlow or PyTorch backends, yep! The implementation uses data-parallel training (splitting training data over multiple devices, each of which has its own copy of the model parameters) rather than model-parallel (which splits the model parameters over multiple GPUs), so you can't tell layer X to be computed on device Y.
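    In Kurfile terms, the backend choice lives under settings. A sketch (the parallel key for spreading batches over multiple devices is the part to double-check against the current spec -- treat its name as an assumption):

    settings:
      backend:
        name: keras
        backend: tensorflow
        parallel: 2    # assumed key: replicate the model over 2 devices, data-parallel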
    Liquorshotz
    @jungwon1413
    @ajsyp Always thanks for your kind responses. So it's data-parallel rather than model-parallel! I guess it's possible to use multiple GPUs across different devices (in my case, PCs). I probably need to test this feature. :smile: I've got another question, about the backend provider. The recent version of TensorFlow (r1.4) supports Keras within its own module (tf.keras). Would this affect setting the backend in the YML file later on? (I don't know whether this change would make a deep learning model under a Keras environment faster or not, and I think it's not that big a deal when using the Kur framework.) Also, I'm assuming that the example model for speech recognition (speech.yml under the examples folder on GitHub) uses a 1D ConvNet, with 'utterance' as the input. I'm trying to turn this into a 2D ConvNet, and I thought 'asr' would be a good input for that. The problem is, I don't know what to put for the output, since I'm using the data specification. (If I put 'asr' for the output at the same time, it's like saying 'my model's input and output are the same.') I wonder if this is where the Kur API kicks in.
    Adam Sypniewski
    @ajsyp
    I haven't taken the time to port Kur's use of Keras over to TensorFlow's tf.keras. I don't really think this will have any impact on speed/accuracy or anything else, and it shouldn't affect the Kurfile at all.
    As for layer names, they are 100% arbitrary--Kur will happily let you play with the names. They are only used to logically connect / reference pieces of your model, and to tie the model inputs/outputs to the data sources. So you can call them whatever you want, as long as you are consistent and use the corresponding name for your data source.
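    For instance, a minimal sketch (my_audio and my_transcript are deliberately made-up names; the only requirement is that they match keys your data supplier actually provides):

    model:
      - input: my_audio        # must match a source name in your data section
      # ... layers ...
      - activation: softmax
        name: my_transcript    # the loss and data sections refer to it by this name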
    Liquorshotz
    @jungwon1413
    @ajsyp Wow, thank you for detailed answers! These are helping me a lot!
    Adam Sypniewski
    @ajsyp
    :thumbsup:
    Liquorshotz
    @jungwon1413
    @ajsyp One more question about the speech recognition supplier. When I check with the command 'kur -vv build speech.yml', I see the speech recognition example uses 1D conv. But when I actually check the utils/audiotools.py file, I see an utterance has a shape of (time, frequency). (I see the comment saying '# Format spec as (time, frequency)'.) I want to know whether the Kur speech recognition example uses a 1D conv structure or a 2D conv structure. (I know the direct output shape from the input layer is (None, None, 13) for mfcc, and (None, None, 161) for spec with a 16000 Hz sampling-rate dataset.)
    Do I really need to consider changing the convolutional layer structure for better performance? (I'm not so clear on whether the example's model is a 2D ConvNet or not.)
    Liquorshotz
    @jungwon1413
    Here's what I've tried so far:
    1) Changing the CNN size into [n, n] format from just an integer (which gave me an error).
    2) Setting the input shape to [330, 13] for mfcc (which worked fine with kur build ~, but failed on kur train ~).
    3) Changing the kernel value into [n, n] format from just an integer (which I expected to error, and it did).
    4) Using a custom name for the input, other than the pre-defined features from the speech_recognition supplier (such as utterance, asr, etc.) - this barely worked even after many, many modifications, and still gave me an error in the CTC loss part.
    Liquorshotz
    @jungwon1413
    I'm just trying to make the example speech recognition YML file look like the model described in the Deep Speech 2 paper: https://arxiv.org/abs/1512.02595
    Allen
    @allenleein
    Hi all! Is there anyone working on implementing Kur on iOS (offline)?
    mahi19
    @mahi19
    My laptop shut down. How do I reload my model after having reduced its loss from 250 to 20, so that training resumes from 20?
    Ezequiel Adrián
    @EzequielAdrianM
    @ajsyp Hi Adam, in which settings file, and on which line, do I have to add ['a', 'b', 'c', ...] in order to specify the training and test vocabulary? I am using the speech.yml sample but I can't find a way to solve that problem :c Thank you in advance
    Adam Sypniewski
    @ajsyp
    @EzequielAdrianM try this:
    settings:
      data:
        vocab: ['a', 'b', 'c', ...]
    @mahi19 -- Kur can automatically save during training (see the checkpoint feature in the Kurfile specification) or at validation time (using weights).
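    A sketch of what that looks like in a Kurfile (the exact keys under checkpoint are from memory -- verify them against the specification before relying on them):

    train:
      checkpoint:
        path: checkpoint.kur    # assumed key: where the periodic snapshot goes
        minutes: 30             # assumed key: how often to save
      weights:
        best: best.train.w      # saved whenever the training loss improves
        last: last.w            # the most recent weights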
    @jungwon1413 -- Kur can do 1, 2, or 3 dimensional convolutions on speech!
    Kur is, after all, just a framework for doing deep learning--you can use it for more than speech if you want.
    The speech.yml example we provided does 1D CNN, though.
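    For the curious: going from 1D to 2D is mostly a matter of giving the convolution two-element sizes, as in the sketch below (the kernel count and sizes are illustrative; reshaping the (time, frequency) input to carry a channel dimension is the step that usually causes the errors described above):

    - convolution:
        kernels: 32
        size: [11, 41]      # [time, frequency] receptive field
        strides: [2, 2]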
    sxpstls
    @sxpistols
    Hey guys, I have downloaded the data from the speech.yml URL. How can I change the URL to a path on my local PC?
    jack pod
    @lag1337_twitter
    Does the speech recognition support other languages (Unicode)?
    Ezequiel Adrián
    @EzequielAdrianM
    Hello friends, I want to know if it is possible to make Kur output alternative words for the words it guessed, along with their probabilities in percent. Too difficult?
    Petrimnz
    @Petrimnz
    Hello all! With the latest Kur (installed with pip on 17.8.2018) I am getting OOM errors with the TensorFlow backend. The process just gets killed, and dmesg gives me: "Out of memory: Kill process 2183 (kur) score 986 or sacrifice child [26757.059828] Killed process 2183 (kur) total-vm:39825696kB, anon-rss:15910776kB, file-rss:0kB, shmem-rss:4kB". I am doing speech-to-text using the out-of-the-box speech.yml file, but with my own data. The batch size is 1. The maximum audio length is 1777 seconds (yes, that's a lot .. but the average length is 300-500 seconds). Is this a bug, or do I just have audio files that are too big? My PC is running Ubuntu with 8 GB RAM and an i7 CPU, so no GPU -yet-. Switching to the Theano backend won't help, because it gets stuck compiling and the whole machine freezes.
    hisoyeah
    @hisoyeah
    Hello all! I'm trying to adapt the speech.yml example to my custom (French) dataset, which has a different vocab size. I did read all the posts about that, but I'm still having issues.
    hisoyeah
    @hisoyeah

    I have the latest Kur and here is my speech.yml:

      vocab:
        ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ', '"', "'", '.', 'é', 'è', 'ç', 'ê', 'ù', 'û', '-']
        size: 37

    I got this error message: expected <block end>, but found '<block mapping start>'
    in "<unicode string>", line 80, column 5:
    size: 37

    I tried removing size: 37 and hard-coding dense: "{{ 37 + 1 }}", but it failed in the loss part (CTC):

    2018-08-28 10:12:15.648394: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at ctc_loss_op.cc:166 : Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 38 labels: 31,26,32,31,0,14,16,23,12,0,14
    [ERROR 2018-08-28 10:12:15,957 kur.model.executor:352] Exception raised during training.

    Thanks for your help!

    Petrimnz
    @Petrimnz
    Hello! Try appending "END" to the vocab list (i.e. ..., '-', "END"]), and let me know if that helped.
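    In Kurfile terms (using the settings.data placement from earlier in this room; whether the extra "END" token is actually needed is exactly what's being tested here), that would be:

    settings:
      data:
        vocab: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
                ' ', '"', "'", '.', 'é', 'è', 'ç', 'ê', 'ù', 'û', '-', "END"]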
    Rex Ridley
    @lexridley_twitter
    Hi. Is this lobby still active? I have a probably stupid question - once I have trained a model, how exactly do I use it? I.e., how do I run audio through it to transcribe? (I'm new to Linux, Python, and Kur/DeepSpeech, but learning quickly!)
    @jungwon1413 @ajsyp