tieTYT
@tieTYT
nice
thanks for the info
saves me a lot of time getting the dictionary back in shape
JohnDoe02
@JohnDoe02
Is there a straightforward way to make dragonfly stop whatever it is doing and put it to sleep? I am trying to implement something like push-to-talk for chatting with my colleagues via Microsoft Teams, with the obvious problem that dragonfly wants to interpret anything I say as a voice command. My idea is to use a PS4 controller to mute/unmute Teams and, at the same time, put dragonfly to sleep or wake it up
LexiconCode
@LexiconCode
I'm not sure about stopping whatever it's doing, but dragonfly can make rules exclusive via a bool. When a rule becomes exclusive, all the other rules are ignored. Inside that exclusive rule you can include a sleep/wake command and a dictation element to catch all other dictation.
JohnDoe02
@JohnDoe02
The kaldi engine has an ignore_current_phrase() call, which seems to be close enough to "stop whatever it is doing"
I will try to combine that with your suggestion to make a sleep grammar exclusive. Hopefully it is safe to do so from a different thread
JohnDoe02
@JohnDoe02
first try is looking good :D
LexiconCode
@lexicon-code:matrix.org
[m]
I'd be interested as well if you feel like it's something you can share
JohnDoe02
@JohnDoe02
I've uploaded the relevant files to github: https://github.com/JohnDoe02/teams-dragonfly-push-to-talk
It's not in a runnable state, but it should be fairly easy to figure out. Within kaldi_module_loader, a thread is started which waits for PS4 controller input. The dragonfly engine object and a toggle object for the sleep grammar are passed to it. That's all
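
The toggle described above could look roughly like the following. This is only a sketch, not the code from the linked repository: the class name SleepToggle and both injected callables are made up for illustration. In a real loader they would presumably be the sleep grammar's exclusiveness setter and the kaldi engine's ignore_current_phrase(), but that wiring is an assumption. A re-entrant lock is used because the controller thread and the recognition thread may both touch the state.

```python
import threading


class SleepToggle:
    """Switch between 'asleep' and 'awake'; safe to call from any thread."""

    def __init__(self, set_exclusive, ignore_current_phrase):
        # Hypothetical wiring: set_exclusive(bool) would make the sleep
        # grammar exclusive, ignore_current_phrase() would drop the
        # utterance currently being decoded. Both are injected so this
        # sketch runs without dragonfly installed.
        self._set_exclusive = set_exclusive
        self._ignore_current_phrase = ignore_current_phrase
        self._lock = threading.RLock()
        self.asleep = False

    def set_asleep(self, asleep):
        with self._lock:
            if asleep == self.asleep:
                return  # nothing to do
            if asleep:
                # Discard whatever is mid-utterance before going to sleep.
                self._ignore_current_phrase()
            self._set_exclusive(asleep)
            self.asleep = asleep

    def toggle(self):
        # Intended to be called by the controller thread on a button press.
        with self._lock:
            self.set_asleep(not self.asleep)
```

The controller thread would then only need to call toggle() on each button press, alongside whatever mutes or unmutes Teams.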
LexiconCode
@lexicon-code:matrix.org
[m]
Thank you!
sean-hut
@sean-hut

I am trying to set up https://github.com/daanzu/kaldi-active-grammar

I tried following the installation instructions in the readme.

This command was successful:
pip install kaldi-active-grammar

This command was unsuccessful:
pip install 'dragonfly2[kaldi]'

The output and its error messages are here:
https://termbin.com/b6h1

I looked at the repository's open issues and did not see an issue title that looked like it was covering this issue.

I'm using a Linux system with:
Python 3.9.1
pip 20.3.3

David Zurow
@daanzu
@sean-hut ack, two issues: issue the first is that I need to rerun webrtcvad to build a binary wheel for py3.9. however, that won't fix issue the second, which is that evdev doesn't have any binary wheels available. have you installed your package manager's python dev package? @Danesprite may know more
LexiconCode
@LexiconCode
So, about callbacks for recognition states: on_begin is close to what I want, a function that fires when the user begins speaking. What I don't think exists is a way to have that function complete before recognition processing begins: begin on sound, function completes, then recognition processing starts. The use case is using the function to update the lists behind choice elements for the utterance.
LexiconCode
@LexiconCode
Actually, I misunderstood: on_begin is blocking
sean-hut
@sean-hut

@daanzu thank you for your work on free libre open source speech recognition.

Your suggestion worked. I installed python3-devel-3.9.1, and the command pip install 'dragonfly2[kaldi]' is now successful.

I cloned https://github.com/daanzu/kaldi-grammar-simple.

In the repository kaldi-grammar-simple I downloaded and unzipped https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/kaldi_model_daanzu_20200905_1ep-biglm.zip.

This command is unsuccessful:
python kaldi_module_loader_plus.py

This is the output and error message that I am getting:
https://termbin.com/0boup

I am following the instructions here https://voxhub.io/kag.

David Zurow
@daanzu
@sean-hut hmm, it would appear that talon has done something weird to your python installation. it doesn't look like anything python will run like that. I'm afraid I don't have experience with talon, especially on linux, so I don't know what its installation might be doing. maybe try checking with the talon folks, or try general python fixes. A very simple test would be to try running: python -c 'print(42)'
LexiconCode
@lexicon-code:matrix.org
[m]
I'm guessing it's an embedded version of Python, which may not have all the packages
Shervin Emami
@shervinemami
@JohnDoe02 I've got a custom push-to-talk setup for Dragon and one for Kaldi
I do it in a fairly complex way cos I do it through Aenea & Virtual Box for Windows in Linux, but maybe I can help you in a simpler setup
Shervin Emami
@shervinemami
For Dragon, the easiest way is to press Dragon's hotkey for Mute. Something like "/". Easiest is to use a USB footpedal or similar
sean-hut
@sean-hut

@LexiconCode thank you for your reply.

I am using Void Linux. https://voidlinux.org/

Here is information on the python 3 version I have installed.

python3-3.9.1_1 Python programming language (3.9 series)

https://github.com/void-linux/void-packages/tree/master/srcpkgs/python3

I did have to install python3-devel and python3-pip in addition to python3.

What other Python packages would you suggest I install?

LexiconCode
@lexicon-code:matrix.org
[m]
Given that information I'm pretty sure it's not embedded. Are you using any sort of Python virtual environment?
sean-hut
@sean-hut

@lexicon-code it does not look like I am using a virtual environment. This is based on the top answer here:

https://stackoverflow.com/questions/990754/how-to-leave-exit-deactivate-a-python-virtualenv

and that the deactivate command is not available.

deactivate: command not found
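
A more direct check than looking for the deactivate command is to ask the interpreter itself. Here is a minimal sketch using only the standard library; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # venv (and recent virtualenv) make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment" if in_virtualenv() else "system interpreter", sys.prefix)
```

It needs to be run with the same python command that is used to start the module loader, since a different interpreter can give a different answer.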

LexiconCode
@LexiconCode

@sean-hut I would highly recommend trying to utilize a virtualized instance of Python. I've had problems with underlying packages breaking due to changed Python dependencies with distributions of Python that are included with Linux. https://realpython.com/python-virtual-environments-a-primer/

Virtualization should make it easier to experiment with other Python versions and isolate packages to your projects. At some point I hope to automate this process as much as possible for new users.

sean-hut
@sean-hut
@LexiconCode thanks for sharing that link.
sean-hut
@sean-hut
Now when I run
python kaldi_module_loader_plus.py
it runs but I get an error message about "Invalid sample rate":
https://termbin.com/7jncs
Ryan Hileman
@lunixbochs
it's trying to open a 16khz mono stream
talon does a 16khz stereo stream, if talon was working then maybe talon was resampling for you
sean-hut
@sean-hut

@lunixbochs in the Talon Slack I received help from aegis and was able to get Talon working after I installed pulseaudio.

How would I find out if it was Talon or pulseaudio that was resampling for me?

Ryan Hileman
@lunixbochs
"paInvalidSampleRate" / pa_linux_alsa.c seems to be portaudio and alsa, so pulse isn't in the loop for kaldi
David Zurow
@daanzu
@sean-hut hmm, I haven't seen that error before, but a quick search looks like it is a somewhat common portaudio error. maybe try some of the suggestions there. the kaldi backend uses straight portaudio.
JohnDoe02
@JohnDoe02
@shervinemami Interesting! Do you have your push-to-talk implementation online somewhere?
sean-hut
@sean-hut

@daanzu @lunixbochs thank you for your help.

I have solved the error about "Invalid sample rate".

I uninstalled pulseaudio and pavucontrol.

Then I changed my .asoundrc file to:
https://termbin.com/pufz

The key settings were format S16_LE, channels 2 and rate 16000.

I am now getting another error message whenever I say anything.

"On Linux, the XDG_SESSION_TYPE environment variable may not be set correctly in some circumstances, in which case it can be set manually in ~/.profile."

$ echo $XDG_SESSION_TYPE
tty

What should XDG_SESSION_TYPE be set as?

The full output is here:
https://termbin.com/ylgq

sean-hut
@sean-hut
Setting XDG_SESSION_TYPE=x11 solves that error message.
Kyle M. Douglass
@kmdouglass
@sean-hut I build and run Dragonfly/Caster/Kaldi in a container. You might be able to reverse engineer what I've done from my Makefile and Dockerfile to get your bare metal system setup: https://github.com/kmdouglass/homelab/tree/master/speech-recognition/toolchains/caster-kaldi
The TROUBLESHOOTING.md file might also help you.
Dane Finlay
@Danesprite
Hi @sean-hut. I have updated Dragonfly's documentation to mention that XDG_SESSION_TYPE must be set to x11. The variable may be set to tty instead of x11 on certain desktop environments. I have also noticed this occurs when using OpenSSH X11 forwarding.
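
A loader script could warn about this before the engine starts. The sketch below is not part of Dragonfly; the function name check_session_type and the wording of the message are made up for illustration.

```python
import os


def check_session_type(environ=None):
    """Return a warning string if XDG_SESSION_TYPE is not 'x11', else None."""
    environ = os.environ if environ is None else environ
    value = environ.get("XDG_SESSION_TYPE")
    if value == "x11":
        return None
    # Covers both an unset variable and values such as 'tty'.
    return ("XDG_SESSION_TYPE is %r; expected 'x11'. "
            "Consider adding 'export XDG_SESSION_TYPE=x11' to ~/.profile."
            % (value,))


warning = check_session_type()
if warning:
    print(warning)
```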
sean-hut
@sean-hut
@Danesprite thank you for your work on free libre open source speech recognition.
LexiconCode
@LexiconCode
Does anyone have, squirreled away in their grammars, a function that restores the last minimized window that I might use? I can't seem to figure out how to determine which window was minimized last.
sean-hut
@sean-hut

@kmdouglass thank you for releasing your Dockerfile for this as free libre open source software.

I like the idea of running the speech recognition in a container.

The README says that the application container requires a PulseAudio socket on the host machine.

It would be nice if the container host did not need pulseaudio installed. I do not know how it would be implemented without pulseaudio.

Kyle M. Douglass
@kmdouglass
@sean-hut You're welcome! One of the image's build stages uses vanilla ALSA instead of PulseAudio. Take a look at the Dockerfile.
Dane Finlay
@Danesprite
@sean-hut No worries.
Dane Finlay
@Danesprite

@LexiconCode I don't think Windows has a simple way of retrieving this information. You should be able to set an event hook to catch the MINIMIZESTART and MINIMIZEEND events (see Event Constants) and maintain a stack of the current minimised windows.

If you want to go that way, you could use the code in Sapi5SharedEngine._do_recognition() as a starting point.

This is all assuming you want to match the system's last minimised window. If you just want the last window minimised by dragonfly, then you could just save the window's handle before minimising it. I think the event hook would be better though.
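
The bookkeeping half of that idea can be sketched without any Windows API calls. In the sketch below, the class and method names are made up for illustration, and the event hook itself is not shown; it is assumed that something calls on_minimize_start() and on_minimize_end() with the window handle when the corresponding MINIMIZESTART and MINIMIZEEND events arrive.

```python
class MinimizedWindowStack:
    """Track minimised windows, most recently minimised last."""

    def __init__(self):
        self._handles = []

    def on_minimize_start(self, hwnd):
        # A window was minimised: move it to the top of the stack.
        if hwnd in self._handles:
            self._handles.remove(hwnd)
        self._handles.append(hwnd)

    def on_minimize_end(self, hwnd):
        # A window was restored (by anyone): it is no longer a candidate.
        if hwnd in self._handles:
            self._handles.remove(hwnd)

    def pop_last(self):
        """Return the handle of the most recently minimised window, or None."""
        return self._handles.pop() if self._handles else None
```

A "restore last window" command would call pop_last() and restore the returned handle; how the handle is restored is left out here.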
sean-hut
@sean-hut

I am using https://github.com/daanzu/kaldi-grammar-simple

I am having a problem with dictation.

What I said:
dictate the quick brown fox jumped over the lazy dog

What is printed to the terminal:
Recognized: dictate the quick brown fox jumped over the lazy dog

What is actually typed out:
kjd xfgiv nos,l ysb cfmrdh s.do kjd pa/t hsu

I am using the dvorak programmer keyboard layout.
https://www.kaufmann.no/roland/dvorak/

It is selecting the correct key for the letter on the dvorak programmer layout but it then types the qwerty key value.

t -> k
h -> j
e -> d
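
The garbled output is consistent with each character being sent as the right physical key but interpreted as QWERTY. A small sketch reproduces it. The table below covers the standard Dvorak letter rows, which, as far as I know, Programmer Dvorak shares for letters; its punctuation positions may differ. The names are made up for illustration.

```python
# Letter and punctuation keys in physical order, top row to bottom row.
DVORAK = "',.pyfgcrl" "aoeuidhtns" ";qjkxbmwvz"
QWERTY = "qwertyuiop" "asdfghjkl;" "zxcvbnm,./"

# Map each Dvorak character to the character QWERTY has on the same key.
DVORAK_TO_QWERTY = str.maketrans(DVORAK, QWERTY)


def typed_as_qwerty(text):
    """Return what appears when Dvorak key positions are read as QWERTY."""
    return text.translate(DVORAK_TO_QWERTY)


print(typed_as_qwerty("the quick brown fox jumped over the lazy dog"))
# kjd xfgiv nos,l ysb cfmrdh s.do kjd pa/t hsu
```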

Dane Finlay
@Danesprite

@sean-hut I use multiple keyboard layouts and have seen this happen only occasionally. It appears to be a long-standing issue with the program Dragonfly uses for X11 keyboard input (xdotool): jordansissel/xdotool#150

Using the setxkbmap command to set the layout manually has worked for me in the past. You might try one of the workarounds mentioned in that issue.

sean-hut
@sean-hut

@Danesprite thank you for your helpful reply.

I was able to get it working. This is what worked for me.

While the speech recognition is running I switch the keyboard layout from dvorak programmer to qwerty and back to dvorak programmer with setxkbmap.