Dane Finlay
@Danesprite
Yeah fair, the clipboard code is a bit of a mess.
LexiconCode
@lexicon-code:matrix.org [m]

When working with extras, it would be nice in some circumstances to remember the last extra recognized for a particular spec. Consider the following command: select word [<left_or_right>] [<n>]. The user says "select word left"; if they then say "select word" again, the left extra would be remembered.

We could set a directional default in extras, but in more complex scenarios it's not that simple. Wrapping this up in a function would work, but it really does pollute grammar sets, making them harder to read. Any thoughts on implementation, or should I stick to using a function?

Ryan Hileman
@lunixbochs
Can you make a kind of sticky RuleRef?
David Zurow
@daanzu
@LexiconCode I can't think of a clean way to do this other than, as you said, a function.
Quintijn Hoogenboom
@quintijn
In voicecode, a long time ago, there was the "again" command, which could repeat the last performed move or select action, even "again three times", etc.
Dane Finlay
@Danesprite

If you do go with a function, then you can get at the underlying rule by including a _rule argument:

def function(_rule):
    # _rule is the rule object that matched the recognition.
    print(_rule)
    print(_rule._defaults["left_or_right"])

An alternative is a separate command for explicitly changing your default value for left_or_right. This would change it for all mappings using the extra.

I think Quintijn is right here though, repeating the last action is probably easier.
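For illustration, here is a minimal sketch of the function-based approach, remembering the last spoken left_or_right value in a module-level variable; the names _last_direction and _select_word are made up for this example:

from dragonfly import MappingRule, Function, Choice, IntegerRef

_last_direction = ["left"]  # mutable so the function below can update it

def _select_word(left_or_right=None, n=1):
    if left_or_right is None:
        # Extra was omitted: fall back to the last recognized value.
        left_or_right = _last_direction[0]
    else:
        # Extra was spoken: remember it for next time.
        _last_direction[0] = left_or_right
    print("select word", left_or_right, n)  # replace with the real action

class SelectWordRule(MappingRule):
    mapping = {
        "select word [<left_or_right>] [<n>]": Function(_select_word),
    }
    extras = [
        Choice("left_or_right", {"left": "left", "right": "right"}),
        IntegerRef("n", 1, 10),
    ]
    defaults = {"n": 1}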

Quintijn Hoogenboom
@quintijn
In voicecode quite a lot of navigation and selection commands were coupled to "again", and this seemed to work quite well!
Shervin Emami
@shervinemami
Nice, so is there an easy way to get Dragonfly to play back a previous command? I did set up an elaborate way to play back my recent commands by saying things like "play 3", as well as to record macros and play them back, such as "play inbox". But it took noticeable effort to set it all up, with code across something like 8 files! So I haven't put in the effort to document it and make it public. But if there was an easy way to play back or repeat the last command using just 1 or 2 lines of code, I can imagine many people would be interested!
David Zurow
@daanzu
@shervinemami I think I recall an example using a recognition observer in some of the old original dragonfly examples
Dane Finlay
@Danesprite
I think Caster has this functionality. I remember Christo Butcher's _cmdmemory.py module being quite useful for that.
LexiconCode
@LexiconCode
Caster's again do command is similar to what Quintijn described. It looks much more complex than it is due to utilizing Caster asynchronous actions; it could be converted to pure dragonfly. This allows repeating not only the last command but the whole last utterance, which is useful for CCR chains or commands with dragonfly repetition elements.
# Dragonfly imports; R, L, S and AsynchronousAction are Caster classes and
# need to be imported from your Caster installation.
from dragonfly import (Function, MappingRule, Playback, RecognitionHistory,
                       ShortIntegerRef)
from dragonfly.engines import get_current_engine

# Migrated this function from elsewhere for completeness of the post.
def get_and_register_history(utterances=10):
    history = RecognitionHistory(utterances)
    history.register()
    return history

_history = get_and_register_history(10)

class Again(MappingRule):

    mapping = {
        "again (<n> [(times|time)] | do)":
            R(Function(lambda n: Again._create_asynchronous(n)), show=False),
    }
    extras = [ShortIntegerRef("n", 1, 50)]
    defaults = {"n": 1}

    @staticmethod
    def _repeat(utterance):
        Playback([(utterance, 0.0)]).execute()
        return False

    @staticmethod
    def _create_asynchronous(n):
        last_utterance_index = 2
        if len(_history) == 0:
            return

        # ContextStack adds the words to history before executing them for WSR.
        if get_current_engine().name in ["sapi5shared", "sapi5", "sapi5inproc"]:
            if len(_history) == 1:
                return

        # Calculates the last utterance from the recognition history and
        # creates a list of str for Dragonfly Playback.
        utterance = list(map(str, _history[len(_history) - last_utterance_index]))

        if utterance[0] == "again":
            return
        # Create the AsynchronousAction.
        forward = [L(S(["cancel"], lambda: Again._repeat(utterance)))]
        AsynchronousAction(
            forward,
            rdescript="Repeat Last Action",
            time_in_seconds=0.2,
            repetitions=int(n),
            blocking=False).execute()
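As a point of comparison, a rough pure-dragonfly sketch of the same idea (an adaptation, not part of Caster) that replays the previous utterance with Playback might look like this:

from dragonfly import (MappingRule, Function, Playback, RecognitionHistory,
                       ShortIntegerRef)

_history = RecognitionHistory(10)
_history.register()

def _repeat_last(n=1):
    # The "again ..." utterance itself is the newest entry, so at least two
    # entries are needed to have something to repeat.
    if len(_history) < 2:
        return
    words = [str(w) for w in _history[-2]]
    if words and words[0] == "again":
        return
    for _ in range(int(n)):
        Playback([(words, 0.0)]).execute()

class AgainRule(MappingRule):
    mapping = {"again [<n> times]": Function(_repeat_last)}
    extras = [ShortIntegerRef("n", 1, 50)]
    defaults = {"n": 1}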
westocl
@westocl
I am getting started with the command line interface (CLI) in order to do some testing. Is it possible to specify command-line arguments to your own Python file? It seems that the test sub-command only allows you to specify a file, but not any arguments to go with it.
David Zurow
@daanzu
An alternative option might be to use environment variables, which can be specified for a single command invocation only:
MYARG=foo42 python -m dragonfly ...
westocl
@westocl
That seems clever. Can you elaborate a little? I very rarely set environment variables.
I'm working with Windows; maybe I can call a '.bat' file? That may be an ugly solution.
David Zurow
@daanzu
If you type in the above command line, filling in the rest, it runs the Python command with that environment variable set, and then forgets it. Read it in your Python file with os.environ['MYARG'].
Oops, you would need to do that in Git Bash. Normal Windows cmd can probably do it, but the syntax may be different.
It should be an easy Google question.
Or, ugh, PowerShell.
Or just setting the environment variable first with set on its own command line would work too, I think, but that sets it for multiple commands.
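For reference, a minimal sketch of reading such a variable inside the grammar module; the variable name MYARG, the fallback value, and the grammar file name are placeholders:

import os

# In Git Bash / Linux:  MYARG=foo42 python -m dragonfly test _my_grammar.py
# In Windows cmd:       set MYARG=foo42   (sets it for the whole session)
my_arg = os.environ.get("MYARG", "default-value")
print("MYARG =", my_arg)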
westocl
@westocl
Ahhh... learn something new every day. Never knew that you could just have temporary environment variables in a one-liner.
David Zurow
@daanzu
yep, can be pretty handy
westocl
@westocl
When running the text backend for test, does the text engine execute the same recognition sequence: "on_begin_callback", "recognition_callback", "failure_callback", etc.?
David Zurow
@daanzu
@Danesprite Is mimic() supposed to adhere to grammar/rule contexts?
David Zurow
@daanzu
I'm trying to remember whether it's a good idea to use one Element in multiple Rules. Perhaps this should be mentioned in https://dragonfly2.readthedocs.io/en/latest/elements.html#refelementclasses ?
Dane Finlay
@Danesprite

@westocl

Environment variables are a good way to go here. There is no way to specify arguments to the Python files in question, since they are in fact being imported as modules, not run as programs.

Regarding recognition callbacks, yes, the sequence of callbacks should be exactly the same when using the text backend. That is the whole point of that engine backend. :-)

@daanzu

Yep, mimic() is supposed to do that.

I think reusing an element is fine in most cases. I would say just copy the element with copy.copy() if it causes a problem. Mentioning this in the documentation sounds like a good idea to me.
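To illustrate the callback sequence mentioned above with the text back-end, here is a minimal sketch; the grammar, rule, and observer names are made up for this example:

from dragonfly import (Grammar, MappingRule, Function, RecognitionObserver,
                       get_engine)

class CallbackPrinter(RecognitionObserver):
    def on_begin(self):
        print("on_begin")
    def on_recognition(self, words):
        print("on_recognition:", " ".join(words))
    def on_failure(self):
        print("on_failure")

engine = get_engine("text")
engine.connect()

grammar = Grammar("callback_demo")
grammar.add_rule(MappingRule(name="demo_rule",
                             mapping={"hello world": Function(lambda: None)}))
grammar.load()

observer = CallbackPrinter()
observer.register()

# Matching words trigger on_begin then on_recognition; words that match no
# grammar trigger on_begin then on_failure (and mimic() raises MimicFailure).
engine.mimic("hello world")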

westocl
@westocl
@daanzu, @Danesprite Thanks for your help.
Dane Finlay
@Danesprite
No problem. :+1:
Dane Finlay
@Danesprite

If the engine.speak() method for text-to-speech is used by anyone in this channel, I was wondering if there is any interest in changing Dragonfly to make use of the text-to-speech built into Windows for all engine back-ends, if it is available, instead of only with the WSR/SAPI 5 back-end. The Natlink back-end would still use Dragon's TTS.

I suppose this would be mostly interesting to Kaldi users.

Vojtěch Drábek
@comodoro
It seems the right time to look here; I would definitely be interested. Another advantage is already having TTS support for other languages (for possible future Dragonfly development). I wonder: SAPI 5 seems possible to use everywhere, but how the MS Mobile voices can be used is unknown to me.
David Zurow
@daanzu
@Danesprite Thanks for the info! Regarding TTS, that sounds good to me. It would be nice to integrate something open and cross platform, but that would entail significantly more work.
Dane Finlay
@Danesprite

@comodoro Okay then, I will have a look into this.

I hadn't considered the advantage for other languages. Windows has TTS support for quite a few languages or "voices" through SAPI 5. It should be possible to separate the TTS functionality from each engine class so that you could, for example, use the SAPI 5 TTS instead of Dragon's.

I don't think any of this would work on mobile unless pywin32 does. I would guess only x86/x86_64 devices would work.

@daanzu No worries. It is certainly possible to add an integration for eSpeak or Festival that simply shells out to the command-line programs:
$ echo "speak some words" | espeak --stdin
$ echo "speak some words" | festival --tts
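If it helps, a minimal sketch of such a shell-out from Python (assuming espeak is installed and on PATH) could be:

import subprocess

def speak(text):
    # Pipe the text to espeak on stdin, mirroring the shell command above.
    subprocess.run(["espeak", "--stdin"], input=text.encode("utf-8"), check=True)

speak("speak some words")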
Vojtěch Drábek
@comodoro
Yes, I remember from a while ago, I think it was a school assignment, that it was super easy to use SAPI 5, though in .NET. In Python it should not be too hard either; I mean using the API, not saying that it is no work to implement. I have been under the impression, however, that SAPI 5 TTS is slowly being phased out or frozen (like the SAPI STT, perhaps) in favor of MS Mobile voices, and some searching did not reveal much about handling those. I guess there is nothing wrong with using just SAPI for now, but a Czech system, for example, has one MS Mobile voice and no SAPI 5 voice.
Dane Finlay
@Danesprite

@comodoro Ah, okay that is a shame. Thanks for elaborating on MS Mobile voices. I would never have guessed what it was from the name. Leave it to Microsoft to make things more complicated than they ought to be.

Both the TTS and STT parts of SAPI haven't really been actively worked on for a long time. It isn't too difficult to work with the API in Python, I suppose. Dragonfly's SAPI 5 engine back-end works using COM. I can see it is pretty simple to set the current TTS voice. The API error messages given could be more helpful though.

If there is a public API for utilising MS Mobile voices, it would probably require .NET. Since we are using CPython, that would be difficult. I'll just stick with SAPI for now. I suppose you could try Google TTS instead for Czech.
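For the curious, a minimal sketch of driving a SAPI 5 voice directly through COM with pywin32; this is not Dragonfly's actual implementation, just an illustration:

import win32com.client

voice = win32com.client.Dispatch("SAPI.SpVoice")

# List the installed SAPI 5 voices.
tokens = voice.GetVoices()
for i in range(tokens.Count):
    print(tokens.Item(i).GetDescription())

# Optionally select a voice, then speak.
voice.Voice = tokens.Item(0)
voice.Speak("speak some words")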

Vojtěch Drábek
@comodoro
Well, there is https://pypi.org/project/winrt/, but it is experimental and for Python 3.7 or later, so not worth the effort. Google TTS has one big, almost decisive disadvantage, and that is the cloud. But open Czech STT for Dragonfly is not on the horizon, unless Deepspeech adds grammars, or perhaps, @daanzu, what is needed for a Kaldi model? I have several hundred hours of data, and grapheme-to-phoneme mappings could be generated, e.g. using espeak; Czech is less irregular than English.
Ryan Hileman
@lunixbochs
I thought deepspeech was dead?
Shervin Emami
@shervinemami
I've tried various TTS options on Windows & Linux, mostly from when I tried out what it's like to be a fully blind computer programmer. (The answer is that it's extremely frustrating, but it is possible!)
There's basically 2 paths you'd want to choose between for TTS: 1) Nice & natural sounding TTS. 2) Robotic & harsh sounding TTS.
Shervin Emami
@shervinemami
Nice-sounding TTS is great for beginners, or anyone just wanting to hear the text easily at normal playback speed (between around 0.7x and 1.5x). Robotic TTS is great for people who want to use TTS a lot at fast playback speeds (1.5x to 4x), for example to hear the content of a whole paragraph or page of text and consume it very quickly; if that is something you do often, you want speed & efficiency even if it takes some weeks to get accustomed to the fast robotic voice.
Shervin Emami
@shervinemami
"eSpeak" is great at supporting the power users that want fast robotic speech, it's open source & portable & well established. While for natural nice sounding speech, there are various open source options that work on Linux & other OSes but it's an area that Google & Microsoft & others also invest in since it has commercial prospects for them. My preference is that we make a nice & naturally sounding open-source cross-platform solution as the default TTS backend, and potentially allow people to replace it with alternative backend such as a Microsoft / Google or espeak if the user wants it. But default to open-source cross-platform.
Shervin Emami
@shervinemami
I personally really like SVOX "pico2wave", I believe it's a free open source TTS with a nice & natural voice in Linux. I've also tried some commercial TTS systems including Acapela TTS & Cepstral TTS, and they tend to be smarter at handling language intonations but I prefer the way symbols are handled by SVOX pico2wave, since it's important for programming & technical content rather than just natural language content.
David Zurow
@daanzu
deepspeech: Mozilla terminated its involvement, but since the project is open source, I believe various people are continuing to work on it, including some of the original contributors.
@comodoro For Kaldi, what you have may be enough to work with. What is needed is: the audio and matched transcripts, plus a lexicon listing all of the words and their matching pronunciations. I think a lexicon may be available for Czech already.
David Zurow
@daanzu
@shervinemami thanks for the info and comparison. Very interesting!
Vojtěch Drábek
@comodoro
I fail to see an official statement, but the repository seems to be alive. Anyway, there is at minimum a fork called Coqui, recently announced by some of the same people.