    Rajat Singhal
    @srajat84
    If it is the data pipeline, we can extend it by adding support for other storage types here: https://github.com/Open-Speech-EkStep/audio-to-speech-pipeline/tree/master/packages/ekstep_data_pipelines/common/infra_commons/storage (a rough sketch of the shape a new backend takes is below)
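    A very rough sketch of what a new backend would implement (the names here are illustrative, not the repo's actual interface; the real base class lives in infra_commons/storage):

    import shutil
    from abc import ABC, abstractmethod

    class BaseStorage(ABC):
        """Minimal contract a new storage backend would implement."""

        @abstractmethod
        def download_to_location(self, source_path: str, destination_path: str) -> None:
            ...

        @abstractmethod
        def upload_to_location(self, source_path: str, destination_path: str) -> None:
            ...

    class LocalStorage(BaseStorage):
        """Example backend backed by the local filesystem."""

        def download_to_location(self, source_path, destination_path):
            shutil.copy(source_path, destination_path)

        def upload_to_location(self, source_path, destination_path):
            shutil.copy(source_path, destination_path)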
    Meera Mohan
    @meera_mohan:matrix.org
    Thank you sir
    Rajat Singhal
    @srajat84
    @meera_mohan:matrix.org we are already collecting datasets of many languages for pre-training. What dataset are you looking for exactly, and how much? Please send your dataset requirements.
    vis27
    @vis27

    Hi @agupta54 I was following the steps at the link you provided, but at the bash start_pretraining_base.sh step I am getting the error below:
    "RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx"
    I assume an NVIDIA GPU is required to pretrain the model, so I switched to Google Colab to leverage a GPU, but there I was stuck at the last step of the wav2letter module, pip install -e .
    The error was:
    "ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/content/wav2letter/bindings/python/setup.py'"'"'; __file__='"'"'/content/wav2letter/bindings/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output."

    Hence I am unable to complete my testing. Could you please help me out with either of these errors?

    Anirudh Gupta
    @agupta54
    @vis27 pretraining is a very computationally intensive process, and it wouldn't make much sense to do it on a CPU. We assumed that NVIDIA drivers were already set up.
    On Google Colab, please make sure that you execute any shell command with a ! at the beginning of the command, or use %%bash in cells that consist strictly of shell commands (in your case, pip); see the example below.
    Even then there can be compatibility issues, since wav2letter requires a particular version of the C++ compiler, and it will be a little tricky to change the C++ version inside Colab.
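    For example, in a Colab cell, either prefix each shell command with "!":

    !pip install -e .

    or make the whole cell a shell cell by putting %%bash on its first line (the path is taken from your error, just as an illustration):

    %%bash
    cd /content/wav2letter/bindings/python
    pip install -e .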
    vis27
    @vis27
    @agupta54 Of course I was using ! at the beginning of shell commands, and I didn't face any issues when installing the fairseq and kenlm packages in Colab. It was the wav2letter package that was causing the issue; maybe it is due to the C++ compiler issue.
    Another query: if I am using a Docker environment to execute the above commands, should I install the NVIDIA driver inside the Docker environment as well? My laptop has an NVIDIA driver, but I guess it was not accessible from inside the Docker container.
    Mirishkar S Ganesh
    @mirishkarganesh
    Hi Team Vakyansh
    How can I test the models on my own audio data? Can you help me with an API, if you have one?
    Anirudh Gupta
    @agupta54
    Hi @mirishkarganesh what is the duration of the audio you want to test?
    Rishi Lulla
    @rishi-lulla
    Hi, I am new to Vakyansh. I would like to use the Hindi pre-trained model on some audio files I have. Can you guide me on how to get started? Is there a step-by-step process available for using the pre-trained model?
    Mirishkar S Ganesh
    @mirishkarganesh
    @agupta54 I would like to have transcripts for 20-30 seconds of audio data
    Anirudh Gupta
    @agupta54
    @rishi-lulla the pre-trained model will not be able to produce any text; it produces a representation of the audio. You can get this by loading the model in PyTorch and doing a forward pass (see the sketch below). If your goal is to generate text from audio, you will have to use the fine-tuned model.
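    A minimal sketch of that forward pass (the checkpoint filename is illustrative, and the output keys can vary across fairseq versions):

    import torch
    from fairseq import checkpoint_utils

    # Load a pre-trained wav2vec 2.0 checkpoint.
    models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        ["hindi_pretrained_4k.pt"]
    )
    model = models[0]
    model.eval()

    # One second of dummy 16 kHz audio; replace with your own waveform.
    wav = torch.randn(1, 16000)
    with torch.no_grad():
        out = model(wav, features_only=True, mask=False)
    print(out["x"].shape)  # (batch, frames, hidden_dim): a representation, not text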
    @mirishkarganesh for that you can use this repo https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation and use the single_file_inference.sh code. All installation and usage instructions are given there.
    Mirishkar S Ganesh
    @mirishkarganesh
    Hi, @agupta54 I tried to install the repo you mentioned above. At the wav2letter step I was stuck on the last part of the module, pip install -e . The error was: ERROR: Command errored out with exit status 1: /home/asr/anaconda3/envs/ekstp/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/asr/wav2letter/bindings/python/setup.py'"'"'; __file__='"'"'/home/asr/wav2letter/bindings/python/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
    Gowtham.R
    @gowtham1997

    Hi Vakyansh team,

    Thanks for the amazing work on CLSRIL-23
    Do you have the statistics/distribution of how many hours of pretraining data were used for each of the 23 Indic languages?

    14 replies
    Surendrasingh Sucharia
    @surendrasinghs

    I (as a product manager) would love to understand Vakyansh. Can you help me understand the various building blocks?

    It would be great if you could share:

    • Are there any user-facing apps that leverage the microservices?
    • A list of the microservices
    • Any pluggable tools or widgets?
    6 replies
    Aswin Pradeep
    @aswinpradeep
    Hi Team, I am pretty new to gRPC.
    I cloned the open API repo and ran server.py; it seems the model is loaded and the server is running fine.
    However, when I edit the IP address in main.py to point it at localhost, running the client code gives the error shown in the attached screenshot (image.png).
    soujyo
    @soujyo
    @aswinpradeep It should work fine if you use '127.0.0.1:50051' in main.py (see the connectivity check below).
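    If it still fails, a quick way to confirm that the client can reach the server at all (a generic grpcio check, not code from the repo):

    import grpc

    channel = grpc.insecure_channel("127.0.0.1:50051")
    try:
        # Block until the channel is ready, or give up after 5 seconds.
        grpc.channel_ready_future(channel).result(timeout=5)
        print("server reachable at 127.0.0.1:50051")
    except grpc.FutureTimeoutError:
        print("could not reach a gRPC server at 127.0.0.1:50051")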
    Aswin Pradeep
    @aswinpradeep
    It's still the same error (screenshot: image.png)
    @soujyo
    soujyo
    @soujyo
    It runs fine at my end. Let's connect on this tomorrow.
    Aswin Pradeep
    @aswinpradeep
    ok sure
    Aswin Pradeep
    @aswinpradeep
    @soujyo
    How can we connect? Shall I drop you an e-mail?
    soujyo
    @soujyo
    Sure, something around 4 pm would be good.
    Aswin Pradeep
    @aswinpradeep
    @agupta54
    I have been trying to run the open API directly with the fine-tuned models downloaded from the models repo, and they were not working.
    However, I tried generating a custom model using the script in the wav2vec experimentation repo, and things are fine. Can you help me understand why this custom_model generation is necessary and what basically happens in this phase?
    1 reply
    Ajitesh Sharma
    @aj7tesh

    hi @agupta54 This is Ajitesh from Team Anuvaad. There are a couple of problems we are facing; Soujyo helped with some of them, but a few remain. Kindly let us know how to solve them:

    1. To run the SRT generation I have to comment out this line:
      from inverse_text_normalization.run_predict import inverse_normalize_text
      in model_service.py and set enableInverseTextNormalization=False in examples/python/main.py

      Otherwise, with ITN enabled, I get the following error:

      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/inverse_text_normalization/hi/taggers/tokenize_and_classify_final.py", line 20, in <module>
      from inverse_text_normalization.hi.graph_utils import GraphFst, delete_extra_space, delete_space
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/inverse_text_normalization/hi/graph_utils.py", line 53, in <module>
      suppletive = pynini.string_file(get_abs_path(data_path + 'suppletive.tsv'))
      File "extensions/_pynini.pyx", line 1042, in _pynini.string_file
      File "extensions/_pynini.pyx", line 1118, in _pynini.string_file
      _pywrapfst.FstIOError: Read failed

    2. I was trying to generate SRT files for audio of various lengths. In most cases it worked for lengths <1 min with punctuation enabled.
      However, with a file of around 3 minutes, the SRT was printed, but in the last step I got the error below, which I think is related to punctuation, because when I set it to false it works (see the chunking sketch after this list).

      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/punctuate/punctuate_text.py", line 76, in get_tokens_and_labels_indices_from_text
      output = self.model(input_ids)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(input, **kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 149, in forward
      return self.module(
      inputs, kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(*input,
      kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/transformers/models/albert/modeling_albert.py", line 1069, in forward
      outputs = self.albert(
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(input, **kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/transformers/models/albert/modeling_albert.py", line 685, in forward
      embedding_output = self.embeddings(
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(
      input, **kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/transformers/models/albert/modeling_albert.py", line 239, in forward
      embeddings = inputs_embeds + position_embeddings + token_type_embeddings
      RuntimeError: The size of tensor a (848) must match the size of tensor b (512) at non-singleton dimension 1

    3. What are the ideal system requirements for hosting the open speech API service with these fine-tuned models? CPU as well as GPU?
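    For point 2, the traceback suggests the punctuation model's ALBERT backbone accepts at most 512 positions, while my 3-minute transcript tokenizes to 848. As a workaround I was considering chunking the token ids before the forward pass (a rough sketch, not code from the repo):

    def chunk_token_ids(input_ids, max_len=512):
        """Yield slices of at most max_len token ids so each slice
        fits the model's 512-position limit."""
        for start in range(0, len(input_ids), max_len):
            yield input_ids[start:start + max_len]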
    4 replies
    Mirishkar S Ganesh
    @mirishkarganesh
    Traceback (most recent call last):
    File "../../utils/inference/single_file_inference.py", line 386, in <module>
    result = parse_transcription(args_local.model, args_local.dict, args_local.wav, args_local.cuda, args_local.decoder, args_local.lexicon, args_local.lm_path, args_local.half)
    File "../../utils/inference/single_file_inference.py", line 361, in parse_transcription
    model.cuda()
    AttributeError: 'dict' object has no attribute 'cuda'
    2 replies
    Can you help me with the above issue?
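    I suspect the checkpoint is being loaded as a raw dict. A minimal reproduction of what I think is happening (the path is illustrative, not the repo's code):

    import torch
    from fairseq import checkpoint_utils

    # torch.load on a fairseq checkpoint returns a plain dict, not a module:
    ckpt = torch.load("checkpoint_best.pt", map_location="cpu")
    print(type(ckpt))  # <class 'dict'> -- a dict has no .cuda(), hence the error

    # Building the model object from the checkpoint first avoids it:
    models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        ["checkpoint_best.pt"]
    )
    models[0].cuda()  # works: this is an nn.Module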
    Akash Singh
    @singhaki
    Hi team,
    I was trying to generate a Hindi pretrained model using generate_custom_model.sh, but the generated model is giving blank output.
    1 reply
    Akash Singh
    @singhaki

    Also,
    First, I was trying to generate a custom model from hindi_finetuned_4k; there was an error for w2v_path, so I passed hindi_pretrained_4k instead. The model got converted, but when running inference on a single audio file it gave blank output. Which pretrained checkpoint should be used for custom model generation?

    Second, I also tried to convert the model to a Hugging Face Transformers model, but the converted model was giving random output. How can we convert to Hugging Face? I have tried converting the fairseq English model to Hugging Face, and it got converted.

    1 reply
    tensorfoo
    @tensorfoo
    Hi guys, just came across Vakyansh. Really impressed so far, well done!
    SUJIT SAHOO
    @Sujit27
    Hi @agupta54 @soujyo. This is Sujit from Anuvaad. I see two different repos for setting up an inference server: speech-recognition-open-api and inference_service_wrapper. What is the difference between the two, and which one is the preferred repo for setting up inference?
    2 replies
    Kanchan112
    @Kanchan112
    Hello, we are trying to develop ASR for the Nepali language. Instead of directly fine-tuning for ASR, we are thinking of doing some more pretraining with Nepali data, starting from the CLSRIL-23 model. How should we begin? We have 4 Tesla K80 GPUs, so how feasible would it be in terms of training resources if we decide to train on 100 hours of audio data?
    1 reply
    tensorfoo
    @tensorfoo
    @Kanchan112 yeah, that should be totally doable with those resources. But you might have to reduce the max token size, because those GPUs have less than 16 GB of VRAM. Try 120k?
    Kanchan112
    @Kanchan112
    @tensorfoo thank you, we will consider that!
    What does one update in the log file correspond to? Does it refer to an update after a pass through one batch?
    2 replies
    Aswin Pradeep
    @aswinpradeep
    @agupta54 Thanks for sharing the CLSRIL paper! By the way, what would be the recommended specs if we would like to experiment with fine-tuning?
    2 replies
    Aswin Pradeep
    @aswinpradeep
    Also, while going through your models repository, I can see only Kannada is based on XLSR. Is there any specific reason for that?
    1 reply
    Aswin Pradeep
    @aswinpradeep
    @agupta54
    I can see two types of fine-tuning: one without Hydra and another with it. Which branch would you recommend, and can you give a short comment on the key difference between the two methods? Also, please recommend infra for trying out fine-tuning on <50 hr of data.
    1 reply
    Ajitesh Sharma
    @aj7tesh
    @agupta54 @soujyo How do we enable real-time transcript generation in the Speech Recognition Open API, assuming the audio comes from a live mic? Is it processed in JavaScript and then sent in smaller buffer chunks to the speech recognition server? If not, how is it done?
    1 reply
    Mirishkar S Ganesh
    @mirishkarganesh
    Hi @agupta54 @srajat84, I am trying to build a pretrained model and have encountered the following error. Please go through it and help me solve these issues.
    1 reply
    SUJIT SAHOO
    @Sujit27
    Hi @agupta54. I was going through the Speaker Clustering work that you have done here. A link there points me to the Resemblyzer repo, which has the voice encoder used to create the embeddings from audio. But I could not find any link to the repo where you actually did the clustering and the subsequent steps mentioned on the page. Do you have such a repo?
    2 replies
    Ankur Bhatia
    @ankurbhatia24
    Hi Team,
    I was setting up the Intelligent Data Pipeline (https://open-speech-ekstep.github.io/intelligent_data_pipelines). I have set up the infra for it (Composer environment) and modified the configs that I could understand. My question now is how to start running the pipeline on a test audio folder that I have uploaded to my Google Cloud bucket. How does the process start?
    Also, some parts of the documentation were unclear about setting up the environment variables. It would be great if someone could help me with that.
    1 reply
    Anchal Jaiswal
    @Anchal5604218_twitter
    Hi Team,
    We are trying to fine-tune your Hindi pre-trained model for our own use case. Our requirement is to transcribe audio that is slightly code-mixed (the majority is in Hindi with some English words in between). We have already tried the APIs of the big cloud service providers, and they don't do a great job of recognizing code-mixed audio. My question is: would it be possible to fine-tune the Vakyansh model to better handle code-mixed audio, given that we are preparing around 200 hours of data for fine-tuning?
    Thanks in advance
    MRG
    @gurjar112
    Hi @agupta54
    I have a few queries.
    1. I wanted to generate a custom model using the script provided in the Vakyansh repo, but the custom model is not getting generated; the custom_model directory is blank.
      Please let me know where I am going wrong.
    2. How is inference done in real time?
    3. What strategies have you used to deploy a model?
      Thanks
    alicekile-tw
    @alicekile-tw
    Hello, is this website hosted somewhere?