Erez Shinan
@erezsh
Btw, some whimsical news: Lark-parser is the 1100th most downloaded Python package this month
marsanos
@marsanos_twitter
Quite a few packages out there! But certainly not nearly so many as nice as Lark. ;-)
Erez Shinan
@erezsh
Cheers!
srlombriga
@srlombriga
Hi! I am trying to implement a Power Query parser using Lark (based on the grammar documentation that is available on https://docs.microsoft.com/en-us/powerquery-m/m-spec-consolidated-grammar). Unfortunately, I can’t find a way to implement the following rule: “available-identifier: A keyword-or-identifier that is not a keyword”. The EBNF standard allows the use of the except-symbol (-), but I keep getting GrammarError when trying to use it. Does Lark have a similar feature?
MegaIng
@MegaIng
No, not directly. You can use a complex regex to simulate that, but it shouldn't be a problem anyway: Lark will disambiguate for you and not confuse identifiers and keywords.
@srlombriga
Erez Shinan
@erezsh
@srlombriga You can give it low priority, to ensure that it's only matched if it didn't match any keyword
like
AVAILABLE_IDENTIFIER.-1: /\w+/
Costa Alexoglou
@konsalex

Hey folks! I want to create a Python transpiler that takes as input a file such as example.gry (Gry from "Greek Python") and converts Greek keywords to the proper English ones, e.g. εκτύπωσε("We love Lark") to print("We love Lark").

Can anyone point me to what I should read to get started? I read the JSON example, but I can't yet work out how to handle function declarations, or how to deal with nested cases like print("print('example')"), where only the outer print should be converted.

MegaIng
@MegaIng
There is a python3 grammar: https://github.com/lark-parser/lark/blob/master/examples/advanced/python3.lark and a corresponding Script: https://github.com/lark-parser/lark/blob/master/examples/advanced/python_parser.py You should just be able to translate the keywords and then use https://github.com/lark-parser/lark/blob/master/examples/advanced/reconstruct_python.py with the english grammar to generate back valid python code
Note that for things like function names (e.g. print), I would actually suggest using a wrapper library instead of doing it in the parser: e.g. add a from gry_builtins import * line at the top of the file, which adds aliases for all the relevant builtins
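A minimal sketch of that wrapper module, assuming a hypothetical gry_builtins.py. Apart from εκτύπωσε, the Greek names below are illustrative translations picked for this example, not an agreed vocabulary. Python 3 accepts Greek letters in identifiers, so plain assignments are enough:

```python
# gry_builtins.py: hypothetical module of Greek aliases for Python builtins.
# A .gry file would start with "from gry_builtins import *".

εκτύπωσε = print  # "print"
μήκος = len       # "length" (illustrative)
εύρος = range     # "range" (illustrative)

# Restrict what "from gry_builtins import *" exports to the aliases above.
__all__ = ["εκτύπωσε", "μήκος", "εύρος"]
```

A .gry file that starts with that import can then call εκτύπωσε("We love Lark") directly; only the keywords (def, from, and so on) still need to be handled by the grammar.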
@konsalex I might create a proof of concept to show you.
Costa Alexoglou
@konsalex
Hey @MegaIng. Thanks for the support. A PoC would be super awesome, but either way, thank you for the reference files.
Costa Alexoglou
@konsalex
Hey @MegaIng, using your material I managed to create a simple PoC. It's not the most elegant, but it works. However, there seems to be no support for Greek letters. How could I extend the alphabet to include Greek letters for parsing? (e.g. extend it with the rule \p{Greek})
Costa Alexoglou
@konsalex

Hey Erez!

I tried regex and stumbled upon this error when I modified/extended the NAME terminal.

lark.exceptions.UnexpectedToken: Unexpected token Token('NAME', 'grython') at line 3, column 5.
Expected one of: 
        * "%="
        * LPAR
        * DOT
        * PLUS
        * "=="
        * _NEWLINE
        * SEMICOLON
        * STAR
        * COLON
        * "-="
        * AND
        * PERCENT
        * "<<"
        * ">>="
        * "!="
        * OR
        * "**="
        * "@="
        * EQUAL
        * "//="
        * COMMA
        * VBAR
        * "*="
        * ">>"
        * "&="
        * ">="
        * IN
        * "<>"
        * "|="
        * LSQB
        * "<="
        * IS
        * "<<="
        * MORETHAN
        * "^="
        * LESSTHAN
        * AT
        * AMPERSAND
        * MINUS
        * SLASH
        * "/="
        * "**"
        * CIRCUMFLEX
        * IF
        * "//"
        * "+="
        * NOT
Previous tokens: [Token('NAME', 'από')]
The NAME terminal is: NAME: /[a-zA-Z_\p{Greek}]\w*/
Costa Alexoglou
@konsalex
And the file (testing code) I was testing:
import numpy

από grython εισήγαγε *
από grython εισήγαγε *
από grython εισήγαγε πακέτο
MegaIng
@MegaIng
This probably has nothing to do with regex, but with your grammar: it appears as if από (probably "from"?) is not recognized as a keyword.
Costa Alexoglou
@konsalex

Any clue how to fix this in the grammar?
Here is the grammar, to avoid a huge paste:

https://pastebin.com/P27sAabP

Any feedback on how I'm approaching the grammar for transpiling would also be awesome.

MegaIng
@MegaIng
Try changing FROM: "από" | "απο" to _from: "από" | "απο"
Costa Alexoglou
@konsalex
Still no luck: lark.exceptions.LexError: Lexer does not allow zero-width terminals. (NAME: '([a-zA-Z_]\\w*|[\\p{Greek}]*)')
Then if I change * to + it raises another error
MegaIng
@MegaIng
I mean yeah, the terminal can be zero-width ([\p{Greek}]* matches the empty string). The other error would be more interesting
Costa Alexoglou
@konsalex

If I use the native NAME regex the error is becoming:

lark.exceptions.UnexpectedCharacters: No terminal defined for 'π' at line 5 col 22

από grython εισήγαγε πακέτο
                                               ^
Expected one of:
    * _NEWLINE
    * NAME
    * STAR
    * LPAR

Previous tokens: Token('IMPORT', 'εισήγαγε')

Would sharing an example repo help you, @MegaIng?

Erez Shinan
@erezsh
@konsalex This is a grammar error, probably with one of the rules. If you provide a full example that fails I can help. Otherwise I would just be guessing how you set it up.
Costa Alexoglou
@konsalex
@erezsh here: https://github.com/konsalex/Grython
Just run reconstruct_grython.py, which uses greek.lark
Erez Shinan
@erezsh
@konsalex Well, setting a negative priority on NAME seems to solve it. I don't know what you changed to make it stop working with regular priority.
NAME.-10: /([a-zA-Z_]\w*)|(\p{Greek}\w*)/
I suggest starting with the original codebase (no regex, no Greek), checking that it works, and then applying your changes gradually until it stops working
Costa Alexoglou
@konsalex
Actually, to be honest, I didn't change anything. I used the example from here: https://github.com/lark-parser/lark/blob/master/examples/advanced/python3.lark and just created 3-4 rules for catching the Greek keywords I'm interested in. That's weird
MegaIng
@MegaIng
The problem is the terminals of the form FROM: "από" | "απο": they are regexps, so there is no collision resolution between them and NAME. That is why I made my suggestion (which I think partially helped)
Change all your new alternatives to the rule form (_from) and it should work
Erez Shinan
@erezsh
@MegaIng Good catch. That's why.
schism15
@schism15:matrix.org
[m]
Hello. I've been working with Lark as a lexer/parser. Now I need to validate the syntax of the information I've parsed. Is this something that would ideally be handled by a Lark lexer/parser (either through Transformers or some other Lark construct)?
I was reading an article about parsers on tomassetti.me, and it casts validation as something separate from parsing.
So I'm not sure whether it is better to use a Transformer (e.g. a method for the broadest branch rule that returns a bool indicating whether the syntax is valid), or whether validation should happen outside of Lark in some other function/class.
MegaIng
@MegaIng
@schism15:matrix.org What do you mean by 'valid syntax'? Most of the syntax should be fully defined by the grammar. Maybe you mean semantics?
In general, you could use a Visitor and raise Exceptions on errors
schism15
@schism15:matrix.org
[m]
Hi. Yes, I mean semantics.
I just skimmed the doc on Visitors and this looks promising thanks
Costa Alexoglou
@konsalex

@MegaIng thanks for the explanation. Now it is indeed working. The issue now appears to be

Tree('compound_stmt', [Tree('funcdef', [Token('NAME', 'test'), Tree('suite', [Tree('funccall', [Tree('var', [Token('NAME', 'print ')]), Tree('arguments', [Tree('string', [Token('STRING', '"δουλεύει"')])])]), Tree('return_stmt', [Tree('testlist', [Tree('const_false', [])])])])])])

That the function keyword συνάρτηση is not a token, so how can I change its value from συνάρτηση to def when I cannot access it as a token in the tree?

_funcdecl: "συνάρτηση" | "συναρτηση"
funcdef: _funcdecl NAME "(" parameters? ")" ["->" test] ":" suite
MegaIng
@MegaIng
You now need to reconstruct the tree back into text.
@konsalex You can take a look at the example I provided in the advanced folder. You would just use another grammar with the english keywords.
chuck_master_grep
@cad:matrix.org
[m]
Hey all... so I'm parsing some data from an early-90s database (where someone decided it would be a good idea to embed a custom not-quite-CSV format into individual database cells), and whoever made this database handled the problem of "special characters" by substituting other, arbitrary character values and then looking them up in a table of known values. To deal with this, I need to be able to have arbitrary non-printing characters as terminals. I know the hex values for these terminals, e.g. from printing text.encode("utf8"). Is there an elegant way to handle this in Lark? Are there any gotchas I should look out for?
As a specific example, if I have the string <TŽŽO (as hex: 3c 54 c2 9d c2 8e c2 8e 4f), I need to recognize 0xc29d, 0xc28e, and O as separate terminals (obviously I know how to do this for 'O', since it's a valid ASCII letter)
MegaIng
@MegaIng
@cad:matrix.org You can use use_bytes=True to work on a byte level instead of on a string level.
chuck_master_grep
@cad:matrix.org
[m]
Ahh, didn't catch that in the docs
Erez Shinan
@erezsh
@MegaIng Do you remember why we added the flag, instead of just detecting the string type? I'm sure there was a reason, but I can't think why
MegaIng
@MegaIng
@erezsh Because we still use a Unicode string grammar, and therefore need to know up front to compile the patterns as byte patterns, without knowing what type we will get when .parse gets called.
We could instead delay the compilation, or compile for both, but I don't think that would be worth it.