    tripfish
    @tripfish

    I did a first test with Kaldi. It's very fast. Thanks a lot for this!
    I mainly use Dragon with a German profile. The problem is, however, that I sometimes need English words too. Switching between an English and a German Dragon profile takes far too long. My idea now was to run Kaldi alongside the German Dragon profile so that I could insert English words.

    I only defined the command "dictate <text>" in _multiedit. Unfortunately, this command is recognized very often, even though I didn't say the word "dictate" at all. I've tried other trigger words as well, but they are also often recognized, even though what I say doesn't even sound similar. Can I adjust the sensitivity further? Or is there another way to solve this?

    LexiconCode
    @LexiconCode
    @tripfish Yes, you can actually make commands more or less likely to be recognized; see the following documentation: https://dragonfly2.readthedocs.io/en/latest/kaldi_engine.html#grammar-rule-element-weights
    David Zurow
    @daanzu
    @tripfish I would suggest also making a <text> command with ActionBase() as a no-op action to slurp up the other sounds.
    There's also an engine parameter to discard utterances with poor quality, but it is new and not in that version. It isn't perfect either, though, so the no-op is the best solution regardless, I think.
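    As a minimal sketch of that suggestion (illustrative only; the rule and grammar names below are made up), a free-dictation catch-all could look like this with standard dragonfly elements:

    from dragonfly import Grammar, MappingRule, Dictation, ActionBase

    class CatchAllRule(MappingRule):
        mapping = {
            # No-op: any free dictation is matched here and simply discarded,
            # instead of being forced into one of the real commands.
            "<text>": ActionBase(),
        }
        extras = [Dictation("text")]

    grammar = Grammar("catch_all")
    grammar.add_rule(CatchAllRule())
    grammar.load()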
    tripfish
    @tripfish
    thanks! I will try it later.
    tripfish
    @tripfish
    This is much better. I've used a combination of both. Thanks!
    David Zurow
    @daanzu
    @tripfish good to hear!
    tripfish
    @tripfish
    I am using your model "kaldi_model_daanzu_20200905_1ep-biglm". Can I increase recognition accuracy for my pronunciation by training based on this model?
    What do I have to do? Can I use the "speech-training-recorder" for this?
    Can the model still be updated without losing my training when you release a new version of this model?
    David Zurow
    @daanzu
    @tripfish daanzu/kaldi-active-grammar#33 It's still quite complicated at the moment, but something I'm working on improving. The recorder is good for generating training data, but that isn't really the hard part currently. The other good way is to retain recognition data during normal use (see docs): either retaining everything (and possibly actively marking some utterances as incorrect), or just actively marking some utterances as correct (or important for training).
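    For reference, a sketch (not from the chat itself) of turning retention on when creating the engine, using the retain_* options that appear in the Kaldi options dump later in this log; the directory name is just an example:

    # Sketch: start the Kaldi engine with retention of recognition data enabled.
    from dragonfly import get_engine

    engine = get_engine("kaldi",
                        model_dir="kaldi_model",
                        retain_dir="retained",     # per-utterance .wav files and retain.tsv land here
                        retain_audio=True,
                        retain_metadata=True)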
    tripfish
    @tripfish
    @daanzu I've read the whole page of the issue. It's really complicated. The whole Kaldi universe is still unclear to me.
    How can I actively mark some utterances as incorrect or correct? I have read the docs page, but I don't see anywhere that explains it. Many thanks!
    Shervin Emami
    @shervinemami
    @tripfish If you run kaldi_module_plus.py with '-r', it will generate a 'retained/retain.tsv' log file with all your utterances. Then you can add a command to your grammar that marks your latest utterance as incorrect, by having it call this:
    get_current_engine().audio_store[0].set('tag', 'misrecognition')
    Shervin Emami
    @shervinemami
    The kaldi script holds onto the latest utterance and delays storing it to file by one utterance, so you can mark an utterance as a misrecognition after the mistake has happened, not before. I personally hooked up a USB foot pedal to mark my mistakes, so I can mark them without affecting the audio. But just using a spoken command works OK too.
    Here is a grammar you could add, for example:
    from dragonfly import MappingRule, Function, engines

    class WhoopsCommandRule(MappingRule):
        mapping = {
            # Tag the previous (still-buffered) utterance as a misrecognition in retain.tsv.
            "action whoops": Function(lambda: engines.get_engine().audio_store[0].set('tag', 'misrecognition')),
        }

    grammar.add_rule(WhoopsCommandRule())
    tripfish
    @tripfish
    @shervinemami thank you very much! I just tried. I see the "misrecognition" flag in the .tsv file. So that seems to be working. Do I have to do anything else or will the file be processed automatically?
    Shervin Emami
    @shervinemami
    That's all :-) It generates both that tsv file and a wav file of every utterance.
    A few days ago Daanzu recommended adding a grammar of no-op actions to slurp up other sounds. This is how I do that:
    from dragonfly import MappingRule, Function

    # The set of utterances that are often picked up by Kaldi from background noises.
    noiseUtterances = "eh|ha|heh|uh|pa|fu|ka|is|du|ki"

    def handleNoise():
        # Store a misrecognition tag, so it won't be used as training data? Might not be needed?
        #get_current_engine().audio_store[0].set('tag', 'misrecognition')
        pass

    # Catch the short random noise sounds. Better to be detected here than as a short command such as "pause"!
    class NoiseRule(MappingRule):
        mapping = {
            noiseUtterances: Function(handleNoise),
        }

    # sleep_grammar is an existing Grammar object elsewhere in my setup.
    sleep_grammar.add_rule(NoiseRule())
    tripfish
    @tripfish
    In my case, Kaldi keeps running and listening in the background. I only have two grammar commands, "dictate <text>" and "<text>". The latter just calls ActionBase() to discard all spoken words that do not begin with "dictate". These words now also appear in the tsv file and as wav files. Does this mean that these words will be interpreted as correctly recognized? Would that increase the recognition error rate?
    JohnDoe02
    @JohnDoe02
    @tripfish As long as you don't train a model with those files, it doesn't have any influence on the recognition. However, it does render your efforts of manually labelling incorrect recognitions useless, as you will indeed have a lot of incorrectly recognized utterances in your tsv that are not marked accordingly.
    Shervin Emami
    @shervinemami
    In that case, instead of calling ActionBase() to discard noise, you should use the "handleNoise" function I posted above and uncomment the line, so that it marks anything without "dictate" as being misrecognised.
    tripfish
    @tripfish
    Okay, thanks everyone! But training with these files is still complicated?
    David Zurow
    @daanzu
    @tripfish It's still complicated, but it should be getting easier with more implementation work. I'm hoping to put together a Docker image that is relatively turn-key. And then you would have a nice dataset to use once that is available.
    tripfish
    @tripfish
    :+1:
    tripfish
    @tripfish
    @daanzu I'm trying to get gcloud to work. But I get the following message:
    kaldi.compiler (ERROR): Exception performing alternative dictation
    Traceback (most recent call last):
      File "C:\Daten\kaldi-dragonfly\winpython\python-3.7.4.amd64\lib\site-packages\kaldi_active_grammar\compiler.py", line 518, in parse_output
        parsed_output = self.alternative_dictation_regex.sub(replace_dictation, parsed_output)
      File "C:\Daten\kaldi-dragonfly\winpython\python-3.7.4.amd64\lib\site-packages\kaldi_active_grammar\compiler.py", line 510, in replace_dictation
        dictation_audio = audio_data[dictation_span['offset_start'] : dictation_span['offset_end']]
    TypeError: slice indices must be integers or None or have an __index__ method
    David Zurow
    @daanzu
    @tripfish are you on the winpython version? there was a bug fix in a later version.
    tripfish
    @tripfish
    yes
    David Zurow
    @daanzu
    I will try to put up a new version today or tomorrow
    or you can run the bat file to open a command prompt and just pip update dragonfly and kaldiag
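    For anyone following along: "kaldiag" is kaldi-active-grammar, so inside that command prompt the upgrade would presumably be something like:

    pip install --upgrade dragonfly2 kaldi-active-grammar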
    tripfish
    @tripfish
    okay I'll try it later. thanks!
    tripfish
    @tripfish
    It works. Thank you!
    Unfortunately, the result from Google in my test is just as bad as yesterday. Azure clearly achieved better results. I can't really explain the poor result from Google, because the same sentences are recognized correctly on my smartphone, and even when I use Google Search with a microphone on my computer.
    Shervin Emami
    @shervinemami
    I've had the same results: Google speech recognition works great on my phone, but when I use the gcloud backend in Kaldi it has far lower accuracy, roughly on par with the default Kaldi accuracy for general dictation, and far worse than Dragon 15 at dictation.
    Maybe Google uses different dictation systems on the phone than through their cloud API.
    Lauren Horne
    @lahwran
    I've been pondering trying to run an android vm just to do speech recognition
    I'm more interested in custom ASR, though
    do any of yall know how hard it would be to integrate a custom language model, such as a fine-tuned gpt2 transformer?
    I currently use Dragon, but have the suspicion that using a transformer language model specialized to how I talk, while more computationally intensive, would probably produce significantly better recognitions.
    David Zurow
    @daanzu
    @lahwran integrating a custom ARPA LM with Kaldi is fairly easy, as is training one to use. I could direct you on how to do this if you are interested.
    RNNLMs are also fairly well supported, and my upcoming v2.0 of KaldiAG will support them, although they are slower to evaluate at run time.
    however, integrating more complex LMs like transformers isn't included in Kaldi AFAIK, although it is possible and I have seen at least one paper trying it.
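    Purely as an illustration of the "training an ARPA LM" step mentioned above (KenLM is not discussed in this chat; it is just one common tool, and the n-gram order and file names are arbitrary):

    # corpus.txt: one sentence per line, resembling what you actually say
    lmplz -o 3 < corpus.txt > custom_lm.arpa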
    Leif Linse
    @Leffe108
    I wonder how to get started with kaldi-active-grammar on windows? In the 1.8 release post and in the README there is mention of windows builds, but I don't see any artifact with names that match what is described in the release or README.
    Leif Linse
    @Leffe108
    I think I got something up and running with pip install dragonfly2[kaldi], added a model, and followed the instructions here for CLI args to start dragonfly with the kaldi engine. I downgraded numpy to 1.19.3 and got it to "Listening ..." but it doesn't seem to understand any word I try (although I don't know which words it does support).
    https://dragonfly2.readthedocs.io/en/latest/kaldi_engine.html
    https://developercommunity.visualstudio.com/comments/1244066/view.html
    I have used the mic with Dragon and Talon, and it is my master mic in Windows, so Python should default to it, I think. But I don't know what Python hears.
    PS C:\Users\Leif\kaldi-active-grammar\kaldi_active_grammar> py -3 -m dragonfly load _*.py --engine kaldi --engine-options "model_dir=kaldi_model vad_padding_end_ms=300"
    WARNING:engine:KaldiEngine(): Enabling logging of actions execution to avoid bug processing keyboard actions on Windows
    INFO:engine:Loading Kaldi-Active-Grammar v1.8.1 in process 20740.
    INFO:engine:Kaldi options: {'model_dir': 'kaldi_model', 'tmp_dir': None, 'audio_input_device': None, 'audio_self_threaded': True, 'audio_auto_reconnect': True, 'audio_reconnect_callback': None, 'retain_dir': None, 'retain_audio': False, 'retain_metadata': False, 'retain_approval_func': None, 'vad_aggressiveness': 3, 'vad_padding_start_ms': 150, 'vad_padding_end_ms': 300, 'vad_complex_padding_end_ms': 600, 'auto_add_to_user_lexicon': True, 'lazy_compilation': True, 'invalidate_cache': False, 'expected_error_rate_threshold': None, 'alternative_dictation': None, 'cloud_dictation_lang': 'en-US', 'decoder_init_config': None}
    Kaldi-Active-Grammar v1.8.1:
        If this free, open source engine is valuable to you, please consider donating
        https://github.com/daanzu/kaldi-active-grammar
        Disable message by calling `kaldi_active_grammar.disable_donation_message()`
    WARNING:kaldi.model:<kaldi_active_grammar.model.Model object at 0x0000025CAF515250>: creating tmp dir: 'kaldi_model.tmp\\'
    INFO:kaldi:<kaldi_active_grammar.utils.FSTFileCache object at 0x0000025CE8901790>: failed to load cache from 'kaldi_model.tmp\\file_cache.json'
    INFO:kaldi:<kaldi_active_grammar.utils.FSTFileCache object at 0x0000025CE8901790>: version or dependencies did not match cache from 'kaldi_model.tmp\\file_cache.json'; initializing empty
    INFO:kaldi.model:generating lexicon files
    INFO:engine:streaming audio from 'Microphone (Yeti Stereo Microph' using MME: 16000 sample_rate, 10 block_duration_ms, 30 latency_ms
    INFO:module:CommandModule('__init__.py'): Loading module: 'C:\Users\Leif\kaldi-active-grammar\kaldi_active_grammar\__init__.py'
    ERROR:module:CommandModule('__init__.py'): Error loading module: "'__name__' not in globals"
    Traceback (most recent call last):
      File "C:\Python38-64bit\lib\site-packages\dragonfly\loader.py", line 65, in load
        exec(compile(contents, self._path, 'exec'), namespace)
      File "C:\Users\Leif\kaldi-active-grammar\kaldi_active_grammar\__init__.py", line 20, in <module>
        from .compiler import Compiler, KaldiRule
    KeyError: "'__name__' not in globals"
    INFO:module:CommandModule('__main__.py'): Loading module: 'C:\Users\Leif\kaldi-active-grammar\kaldi_active_grammar\__main__.py'
    ERROR:module:CommandModule('__main__.py'): Error loading module: "'__name__' not in globals"
    Traceback (most recent call last):
      File "C:\Python38-64bit\lib\site-packages\dragonfly\loader.py", line 65, in load
        exec(compile(contents, self._path, 'exec'), namespace)
      File "C:\Users\Leif\kaldi-active-grammar\kaldi_active_grammar\__main__.py", line 11, in <module>
        from . import _name
    KeyError: "'__name__' not in globals"
    INFO:engine:Listening...
    Speech start detected.
    Sorry, what was that?
    Speech start detected.
    Sorry, what was that?
    
    / ... /
    David Zurow
    @daanzu
    @Leffe108 Apologies for the trouble, and for not having updated the portable version! I'm not sure what is going on with your particular error. It might be worth trying to install it in a Python virtualenv, but that is just a wild guess. However, I got around to updating and uploading the portable version. Maybe give it a try and see if it works: https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/kaldi-dragonfly-winpython37.zip
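    For reference, the virtualenv route he mentions would look roughly like this on Windows (environment name is arbitrary):

    py -3 -m venv dragonfly-venv
    dragonfly-venv\Scripts\activate
    pip install "dragonfly2[kaldi]"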
    LexiconCode
    @LexiconCode
    Is there some sort of lightweight package for detecting whether or not two words are homonyms?
    David Zurow
    @daanzu
    @LexiconCode I haven't thought about it deeply, but I think it is as simple as looking at the lexicon.txt file in one of my models, and checking if the pronunciation is identical
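    A rough sketch of that idea (the file path is an example; this assumes the usual Kaldi lexicon.txt layout of a word followed by its phones, one pronunciation per line):

    # Group words in a Kaldi-style lexicon.txt by pronunciation and check whether
    # two words share a pronunciation (i.e. are homophones).
    from collections import defaultdict

    words_by_pron = defaultdict(set)
    with open("kaldi_model/lexicon.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                words_by_pron[tuple(parts[1:])].add(parts[0])

    def are_homophones(a, b):
        return any(a in ws and b in ws for ws in words_by_pron.values())

    print(are_homophones("their", "there"))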
    LexiconCode
    @LexiconCode
    Oh, that's interesting. This issue is particularly difficult when parsing free dictation. A similar use case is comparing an OCR result to a dictation result.
    Lauren Horne
    @lahwran
    @daanzu Any chance you know what paper that was? I'm very interested in what it would take to get transformers working, especially if a pretrained transformer LM can be used, unlike the papers I've found, which all seem to attempt end-to-end transformers.
    David Zurow
    @daanzu
    @lahwran https://arxiv.org/pdf/2001.01140.pdf Unfortunately, as far as I know, the code for it isn't released. You could check with the author. I'd be curious to see it too.
    I don't think integrating a transformer with Kaldi is too difficult conceptually, though I say that only knowing the basics of how transformers usually work. Doing it efficiently may be more difficult. I might be able to give you pointers or help out some. Basically, you need to run the transformer so that it mimics an FST LM.
    Most simply, you could just take the n-best most likely recognitions for an utterance, run them through the transformer, and rank them by perplexity.
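    A toy sketch of that last idea, rescoring an n-best list with a pretrained GPT-2 from the transformers library (the hypotheses below are invented; this is not code from KaldiAG):

    # Rank n-best hypotheses by GPT-2 negative log-likelihood (lower = more likely).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def total_nll(text):
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss    # mean per-token NLL
        return loss.item() * (ids.size(1) - 1)    # total NLL over predicted tokens

    nbest = ["kiss this guy", "kiss the sky", "kissed a sky"]   # invented hypotheses
    print(sorted(nbest, key=total_nll))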