    Aswin Pradeep
    @aswinpradeep
    [attached screenshot]
    Hi Team, I am pretty new to gRPC.
    I cloned the open API repo and ran server.py; it seems the model is loaded and the server is running fine.
    However, when I edit the IP address in main.py to point it at localhost, running the client code gives this error:
    soujyo
    @soujyo
    @aswinpradeep It should work fine if you use '127.0.0.1:50051' in main.py.
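    A minimal sketch of the suggested change, assuming a client built from the service's generated gRPC stub; the stub and request names below are placeholders, not the actual speech-recognition-open-api definitions:

      import grpc

      # Point the client at the locally running server instead of a remote IP.
      channel = grpc.insecure_channel('127.0.0.1:50051')
      # stub = SpeechRecognizerStub(channel)   # placeholder name for the generated stub
      # response = stub.recognize(request)     # placeholder RPC and request message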
    Aswin Pradeep
    @aswinpradeep
    [attached screenshot]
    It's still the same
    @soujyo
    soujyo
    @soujyo
    It runs fine at my end. Let's connect on this tomorrow.
    Aswin Pradeep
    @aswinpradeep
    ok sure
    Aswin Pradeep
    @aswinpradeep
    @soujyo
    How can we connect? Shall I drop you an email?
    soujyo
    @soujyo
    Sure, something at 4 pm would be good.
    Aswin Pradeep
    @aswinpradeep
    @agupta54
    I have been trying to run the open API directly with the fine-tuned models downloaded from the models repo, and they were not working.
    However, I tried generating a custom model using the script in the wav2vec experimentation repo, and things are fine. Can you help me understand why this custom_model generation is necessary and what is happening in this phase?
    Ajitesh Sharma
    @aj7tesh

    Hi @agupta54, this is Ajitesh from Team Anuvaad. There are a couple of problems we are facing. Soujyo helped with some of them, but a few remain. Kindly let us know how to solve them:

    1. To run the SRT generation I have to comment out this import in model_service.py:
      from inverse_text_normalization.run_predict import inverse_normalize_text
      and set enableInverseTextNormalization=False in examples/python/main.py.

      Otherwise, with ITN enabled, I get the following error:

      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/inverse_text_normalization/hi/taggers/tokenize_and_classify_final.py", line 20, in <module>
      from inverse_text_normalization.hi.graph_utils import GraphFst, delete_extra_space, delete_space
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/inverse_text_normalization/hi/graph_utils.py", line 53, in <module>
      suppletive = pynini.string_file(get_abs_path(data_path + 'suppletive.tsv'))
      File "extensions/_pynini.pyx", line 1042, in _pynini.string_file
      File "extensions/_pynini.pyx", line 1118, in _pynini.string_file
      _pywrapfst.FstIOError: Read failed

    2. I was trying to generate SRTs for audio files of various lengths. In most cases it worked when the length was <1 min with punctuation enabled.
      However, with a file of around 3 min, I was able to get the SRT printed, but in the last step I got the error below, which I think is related to punctuation, because when I set it to false it works (see the sketch after this list).

      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/punctuate/punctuate_text.py", line 76, in get_tokens_and_labels_indices_from_text
      output = self.model(input_ids)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(input, **kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 149, in forward
      return self.module(
      inputs, kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(*input,
      kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/transformers/models/albert/modeling_albert.py", line 1069, in forward
      outputs = self.albert(
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(input, **kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/transformers/models/albert/modeling_albert.py", line 685, in forward
      embedding_output = self.embeddings(
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
      result = self.forward(
      input, **kwargs)
      File "/home/ajitesh/anaconda3/envs/server-vakyansh/lib/python3.8/site-packages/transformers/models/albert/modeling_albert.py", line 239, in forward
      embeddings = inputs_embeds + position_embeddings + token_type_embeddings
      RuntimeError: The size of tensor a (848) must match the size of tensor b (512) at non-singleton dimension 1

    3. What are the ideal system requirements for hosting the open speech API service with these fine-tuned models? CPU as well as GPU?
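    A hedged note on problem 2 above: the 848-vs-512 mismatch is consistent with the ALBERT punctuation model's 512-position limit, so one common workaround is to split the transcript into windows of fewer than 512 tokens before punctuating. This is only a sketch of the idea; the tokenizer name and the punctuate() call are placeholders, not the actual punctuate package API.

      from transformers import AutoTokenizer

      MAX_TOKENS = 510  # leave room for the [CLS]/[SEP] special tokens

      def split_into_windows(text, tokenizer, max_tokens=MAX_TOKENS):
          """Split a transcript into word-aligned windows of at most max_tokens tokens."""
          windows, current = [], []
          for word in text.split():
              candidate = current + [word]
              if current and len(tokenizer.tokenize(" ".join(candidate))) > max_tokens:
                  windows.append(" ".join(current))
                  current = [word]
              else:
                  current = candidate
          if current:
              windows.append(" ".join(current))
          return windows

      # tokenizer = AutoTokenizer.from_pretrained("<punctuation-model>")  # placeholder
      # punctuated = " ".join(punctuate(w) for w in split_into_windows(transcript, tokenizer))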
    Mirishkar S Ganesh
    @mirishkarganesh
    Traceback (most recent call last):
    File "../../utils/inference/single_file_inference.py", line 386, in <module>
    result = parse_transcription(args_local.model, args_local.dict, args_local.wav, args_local.cuda, args_local.decoder, args_local.lexicon, args_local.lm_path, args_local.half)
    File "../../utils/inference/single_file_inference.py", line 361, in parse_transcription
    model.cuda()
    AttributeError: 'dict' object has no attribute 'cuda'
    Can you help me with the above issue?
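    A hedged sketch of what usually causes this error: torch.load() on a fairseq checkpoint returns a plain dict, not an nn.Module, so calling .cuda() on it fails. Building the model through fairseq's checkpoint utilities avoids that; the checkpoint path below is a placeholder.

      import torch
      from fairseq import checkpoint_utils

      # load_model_ensemble_and_task returns instantiated models, not a raw state dict
      models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
          ['/path/to/checkpoint_best.pt'])  # placeholder path
      model = models[0]
      model.eval()
      if torch.cuda.is_available():
          model.cuda()  # works: model is an nn.Module, not the checkpoint dict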
    Akash Singh
    @singhaki

    Hi team,
    I was trying to generate a Hindi pretrained model using generate_custom_model.sh, but the generated model is giving blank output.

    Also:
    First, I was trying to generate a custom model from hindi_finetuned_4k; there was an error for w2v_path, so I passed hindi_pretrained_4k. The model got converted, but while inferencing on a single audio file it was giving blank output. Which pretrained checkpoint should be used for custom model generation?

    Second, I have also tried to convert the model to a Hugging Face Transformers model, but the converted model was giving random output. How can we convert to Hugging Face? I have tried converting a fairseq English model to Hugging Face, and it got converted.
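    A hedged sketch of one way to sanity-check a converted Hugging Face model, assuming the conversion produced a Wav2Vec2ForCTC checkpoint with a matching processor/vocabulary (a vocabulary mismatch is a common cause of "random" output). The paths are placeholders.

      import torch
      import soundfile as sf
      from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

      processor = Wav2Vec2Processor.from_pretrained("/path/to/converted_model")  # placeholder
      model = Wav2Vec2ForCTC.from_pretrained("/path/to/converted_model")

      speech, sample_rate = sf.read("sample_16khz.wav")  # placeholder 16 kHz mono clip
      inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

      with torch.no_grad():
          logits = model(inputs.input_values).logits
      ids = torch.argmax(logits, dim=-1)
      print(processor.batch_decode(ids))  # gibberish here often points to a dict/vocab mismatch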

    tensorfoo
    @tensorfoo
    Hi guys, just came across Vakyansh, really impressed so far! Well done.
    SUJIT SAHOO
    @Sujit27
    Hi @agupta54 @soujyo. This is Sujit from Anuvaad. I see two different repos for setting up an inference server -- speech-recognition-open-api and inference_service_wrapper. What is the difference between the two, and which one is the preferred repo for setting up inference?
    Kanchan112
    @Kanchan112
    Hello, we are trying to develop ASR for the Nepali language. Instead of directly fine-tuning for ASR, we are thinking of doing some more pretraining with Nepali data, starting from the CLSRIL-23 model. How should we begin? We have 4 Tesla K80 GPUs, so how feasible would it be in terms of training resources if we decide to train on 100 hours of audio data?
    tensorfoo
    @tensorfoo
    @Kanchan112 Yeah, that should be totally doable with those resources, but you might have to reduce the max token size because the GPUs have less than 16 GB of VRAM. Try 120k?
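    A rough sketch of the trade-off behind this suggestion, assuming fairseq-style knobs where max_tokens is the number of audio samples per GPU per step and update_freq accumulates gradients; the numbers are illustrative, not the project's defaults:

      def effective_batch_samples(max_tokens, update_freq, num_gpus):
          """Audio samples that contribute to one optimizer update."""
          return max_tokens * update_freq * num_gpus

      # Lowering max_tokens to fit a smaller GPU while raising update_freq keeps the
      # effective batch roughly the same, at the cost of slower wall-clock training.
      print(effective_batch_samples(max_tokens=1_200_000, update_freq=1, num_gpus=4))   # 4800000
      print(effective_batch_samples(max_tokens=120_000, update_freq=10, num_gpus=4))    # 4800000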
    Kanchan112
    @Kanchan112
    @tensorfoo thank you, we will consider that!
    What does one update in the log file correspond to? Does it refer to an update after a pass through one batch?
    Aswin Pradeep
    @aswinpradeep
    @agupta54 Thanks for sharing the CLSRIL paper! Btw, what would be the recommended specs if we would like to experiment with fine-tuning?
    Aswin Pradeep
    @aswinpradeep
    Also, while going through your models repository, I can see only Kannada based on XLSR. Is there any specific reason for that?
    Aswin Pradeep
    @aswinpradeep
    @agupta54
    I can see two types of fine-tuning: one without Hydra and another with it. Which branch would you recommend, and can you give a short comment on the key difference between the two methods? Also, please recommend an infra setup to try out fine-tuning for <50 hr.
    Ajitesh Sharma
    @aj7tesh
    @agupta54 @soujyo How do we enable real-time transcript generation in the Speech Recognition Open API, assuming the audio comes from a live mic? Is it processed in JavaScript and then sent in smaller buffer chunks to the speech recognition server, and if not, how is it done?
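    A hedged sketch of the streaming pattern being asked about: capture small audio buffers from the mic (in a browser or any client), then send them as a stream of chunks over gRPC. The stub, RPC, and message names are placeholders, not the actual speech-recognition-open-api definitions.

      import grpc

      CHUNK_SECONDS = 1  # roughly 1 s of 16 kHz, 16-bit mono PCM per message

      def audio_chunks(source):
          """Yield successive raw PCM chunks read from a mic or file-like source."""
          while True:
              chunk = source.read(16000 * 2 * CHUNK_SECONDS)
              if not chunk:
                  break
              yield chunk  # in real code, wrap the bytes in the service's request message

      channel = grpc.insecure_channel('127.0.0.1:50051')
      # stub = SpeechRecognizerStub(channel)                      # placeholder generated stub
      # for partial in stub.recognize_stream(audio_chunks(mic)):  # placeholder streaming RPC
      #     print(partial.transcript)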
    Mirishkar S Ganesh
    @mirishkarganesh
    Hi @agupta54 @srajat84, I am trying to build a pretrained model and have run into the following error. Please go through it and help me out in solving these issues.
    SUJIT SAHOO
    @Sujit27
    Hi @agupta54. I was going through the Speaker Clustering work that you have done here. A link there points me to the Resemblyzer repo that has the voice encoder to create the embeddings from audio, but I could not find any link to the repo where you actually did the clustering and the subsequent steps mentioned on the page. Do you have such a repo?
    Ankur Bhatia
    @ankurbhatia24
    Hi Team,
    I was setting up the Intelligent Data Pipeline (https://open-speech-ekstep.github.io/intelligent_data_pipelines). I have set up the infra for it (Composer environment) and modified the configs I could understand. My question now is: how do I start running the pipeline on a test audio folder that I have uploaded to my Google Cloud bucket? How does the process start?
    Also, some parts of the documentation on setting up the environment variables were unclear. It would be great if someone could help me with that.
    Anchal Jaiswal
    @Anchal5604218_twitter
    Hi Team,
    We are trying to fine-tune your Hindi pre-trained model for our own use case. Our requirement is to transcribe audio that is slightly code-mixed (the majority is in Hindi with some English words in between). We have already tried the APIs of the big cloud service providers, and they don't do a great job of identifying code-mixed audio. My question is: would it be possible to fine-tune the Vakyansh model to better identify code-mixed audio, given that we are preparing around 200 hours of data for fine-tuning?
    Thanks in advance
    MRG
    @gurjar112
    Hi @agupta54
    I have a few queries.
    1. I wanted to generate a custom model using the script provided by the Vakyansh repo, but the custom model is not getting generated; the custom_model folder is blank.
      Please let me know where I am going wrong.
    2. How is inference done in real time?
    3. What strategies have you used to deploy a model?
      Thanks
    alicekile-tw
    @alicekile-tw
    Hello, is this website hosted somewhere?
    Shahzeb Ali
    @ShahzebAli42
    Hi Vakyansh, you did fabulous work; I have been following your wav2vec2 updates.
    Right now I am fine-tuning a model. Can anyone please tell me why I am getting a CUDA out-of-memory error after some epochs? What change can I make in the base_10.yml config to overcome this?
    Can anyone also describe the parameters in base_10h.yml? I am getting good transcription results with the 26th epoch weights but bad results with the 43rd epoch weights.
    Andrew Lauder
    @AndrewLauder
    @srajat84 We are also looking to collect more languages; please PM me.