Shahar Soel
@bd82

@matthew-dean regarding nested parenthesized expression parsing:
Have a look at the calculator example: after a left parenthesis, start a new expression from scratch. There is indeed recursion.
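As a minimal sketch of the recursion described here (a hand-rolled recursive-descent parser in plain JavaScript, not Chevrotain's API): a left parenthesis simply restarts the expression rule from scratch.

```javascript
// Sketch: "(" restarts the expression rule, giving natural recursion.
// Tokens are plain strings; "+" is the only operator for brevity.
function parseExpression(tokens, pos = { i: 0 }) {
  let value = parseAtom(tokens, pos);
  while (tokens[pos.i] === "+") {
    pos.i++;
    value += parseAtom(tokens, pos);
  }
  return value;
}

function parseAtom(tokens, pos) {
  if (tokens[pos.i] === "(") {
    pos.i++;                                    // consume "("
    const value = parseExpression(tokens, pos); // recurse: a fresh expression
    if (tokens[pos.i] !== ")") throw new Error("expected )");
    pos.i++;                                    // consume ")"
    return value;
  }
  return Number(tokens[pos.i++]);               // a number literal
}
```

For example, `parseExpression(["(", "1", "+", "(", "2", "+", "3", ")", ")"])` evaluates to 6.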

Matthew Dean
@matthew-dean
@bd82 The tricky thing is that the logical and/or/not must be followed by parentheses. For some reason the recursion of math expressions seemed intuitive, but this is tripping me up.
Matthew Dean
@matthew-dean
Therefore (a > b) and (b > c) is allowed but ((a > b and (b > c))) is not. But if I require parens between logical parens and then recurse, then an optional parens ends up requiring a second set of parens, which it must do ONLY if joined by and/or. So in the examples you have optional parens, but in this case it's a mix of required and optional, depending on what's being joined.
Incidentally, this is the same as media query syntax, which may have an expression or another media query, either of which starts with a left parenthesis. An expression is valid unless it's part of another query, in which case it needs to be wrapped in parentheses.
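The mixed required/optional constraint described above can be sketched as a hand-rolled recognizer (plain JavaScript, not Chevrotain; "cmp" stands in for a whole comparison like a > b): a lone comparison may be bare, but operands joined by and/or must each be parenthesized.

```javascript
// condition := cmp | "(" condition ")" (("and"|"or") "(" condition ")")*
function parseCondition(tokens, pos = { i: 0 }) {
  if (tokens[pos.i] === "cmp") { pos.i++; return; } // bare comparison allowed
  parseGroup(tokens, pos);
  while (tokens[pos.i] === "and" || tokens[pos.i] === "or") {
    pos.i++;
    parseGroup(tokens, pos); // a join forces a parenthesized operand
  }
}

function parseGroup(tokens, pos) {
  if (tokens[pos.i] !== "(") throw new Error("expected (");
  pos.i++;
  parseCondition(tokens, pos); // recursion restarts the condition rule
  if (tokens[pos.i] !== ")") throw new Error("expected )");
  pos.i++;
}

function isValid(tokens) {
  try {
    const pos = { i: 0 };
    parseCondition(tokens, pos);
    return pos.i === tokens.length;
  } catch {
    return false;
  }
}
```

Under this sketch, `["cmp"]` and `["(", "cmp", ")", "and", "(", "cmp", ")"]` are accepted, while `["(", "cmp", "and", "(", "cmp", ")", ")"]` is rejected, matching the examples above.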
Ricky Reusser
@rreusser
      $.SUBRULE($.expression, { LABEL: "lhs" })
      $.MANY(() => {
        $.OPTION(() => $.CONSUME(SomeOperator, { LABEL: "op" }))
        $.SUBRULE2($.expression, { LABEL: "rhs" })
      })
I think the above code successfully picks up expressions with an optional operator between them, but is there any way to line up the expressions with the operator?
Both "expr * expr expr" and "expr expr * expr" seem to give one lhs expression, two rhs expressions, and an array with a single operator.
there doesn't seem to be enough information to distinguish one from the other
Ricky Reusser
@rreusser
my workaround has been to route the other option through a separate rule so that you'd get something more like expr * (expr expr) or (expr expr) * expr, which is enough to disambiguate the two
Shahar Soel
@bd82
@matthew-dean perhaps you should perform the analysis of whether a parenthesis is missing POST parsing? Basically accept a generic expression grammar and later figure out if it's valid, or even its meaning if needed.
Shahar Soel
@bd82

but is there any way to line up the expressions with the operator?

@rreusser, what about using the same LABEL for RHS and LHS, so you will have one array of operands and one array of operators? I am not sure the rhs/lhs terminology is actually accurate here.
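The single-LABEL idea can be sketched in plain JavaScript (the `ctx` shape here is an assumption for illustration, not actual Chevrotain output): with one "operand" array and one "op" array, operands[i] and operands[i + 1] are joined by ops[i].

```javascript
// Fold a flat operand/operator pair of arrays left-to-right.
function foldBinary(ctx, apply) {
  const operands = ctx.operand;  // e.g. [2, 3, 4]
  const ops = ctx.op || [];      // e.g. ["*", "+"]
  let result = operands[0];
  ops.forEach((op, i) => {
    result = apply(op, result, operands[i + 1]);
  });
  return result;
}

// A toy interpreter for the two operators used below.
const apply = (op, a, b) => (op === "*" ? a * b : a + b);
```

For example, `foldBinary({ operand: [2, 3, 4], op: ["*", "+"] }, apply)` yields 10 (2 * 3, then + 4).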

Ricky Reusser
@rreusser
ah, I hadn't thought of doing that across different subrules. I adapted the code from the calculator example: https://github.com/SAP/chevrotain/blob/a81b06c5b87f2c814667002fd2dd86b507ad474f/examples/grammars/calculator/calculator_pure_grammar.js#L101-L110
in that case it seems to work reasonably cleanly, but I'll give it a try and see if that changes how things work out for me
Nanonid
@Nanonid
Hey Shahar. I've published a simple library built on your XML example grammar. https://www.npmjs.com/package/xmljsonkit
Hello everyone
I've just opened #1245 but wondering if I would need to write my problem here first :/
Shahar Soel
@bd82
Hello @Nanonid you may want to use https://www.npmjs.com/package/@xml-tools/parser instead of building on top of the XML example grammar, as this package is production quality, e.g. it is used in the prettier xml-parser.
I meant prettier XML-Plugin
Matthew Dean
@matthew-dean

For visitor methods (https://sap.github.io/chevrotain/docs/tutorial/step3a_adding_actions_visitor.html#introduction), I would have expected that this.visit, if visiting an array, would return an array of visit results. However, that does not appear to be the case. I think this is one of those cases where there is non-intuitive behavior based on the structure of the CST. That is, the CST always produces children as arrays, for whatever reason, even if they only have one production. So maybe the typical scenario is that you would only visit arrays with a length of 1?

...Okay, just found the doc note: "If an array is passed (ctx.fromClause is an array) it is equivalent to passing the first element of that array"... IMO that should be called out in a dedicated section on arrays, with a convenience method such as visitArray to return an array of results.
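The visitArray convenience suggested here can be sketched in a few lines (plain JavaScript; `visitor` below is a stand-in object with a visit method, not Chevrotain's BaseCstVisitor):

```javascript
// Map visitor.visit over a CST children array and collect the results,
// tolerating a missing array (Chevrotain omits keys for unmatched children).
function visitArray(visitor, nodes) {
  return (nodes || []).map((node) => visitor.visit(node));
}

// Stand-in visitor for illustration only.
const visitor = { visit: (node) => node.name.toUpperCase() };
```

For example, `visitArray(visitor, [{ name: "a" }, { name: "b" }])` returns `["A", "B"]`.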

Shahar Soel
@bd82

@matthew-dean

Although it's not high on my priority queue, I'm currently trying to deprecate/replace the regexp-to-ast library
and evaluate cutting out less used / less needed features to shrink Chevrotain down to a more manageable size.

As a workaround you can see some visitor utilities implemented in the Java-Parser's visitor class:

Shahar Soel
@bd82

I think this is one of those cases where there is non-intuitive behavior based on the structure of the CST. That is, the CST always produces children as arrays, for whatever reason, even if they only have one production.

The structure of the CST (a map of arrays) was something I put a lot of thought and effort into in the past.
Initially I wanted to avoid the arrays when there is only one element, but there is the problem of consistency:

• If a repetition is only entered once, should there be an array or not?
• If there are two distinct "places" in a single rule where the same token may be optionally consumed, should it always be modeled as an array or only if both "places" have been "consumed"?

I chose to always use arrays as the most consistent option (imho), and attempt to "hide" some of it using
utilities in the visitor.

Although I do agree that at some point a Visitor 2.0 for CST should be created, based also on previous feedback you have provided :smile:

Cheers.
Shahar.

davout1806
@davout1806
Am I missing something regarding using the Visitor? The object (ctx) passed to each visitor method is the collection of the node's children. But there is no information about the order in which the children occurred; they're grouped by token. My lexer has the correct order of the tokens in the tokens property. Thanks.
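Since CST children are grouped by name, source order has to be recovered from token offsets. A minimal sketch (plain JavaScript; the ctx below uses stub token objects, though every real Chevrotain token does carry a startOffset):

```javascript
// Flatten all children arrays of a ctx and sort them back into source order
// by startOffset.
function childrenInSourceOrder(ctx) {
  return Object.values(ctx)
    .flat()
    .sort((a, b) => a.startOffset - b.startOffset);
}

// Stub ctx for illustration: children grouped by token name.
const ctx = {
  Identifier: [{ image: "x", startOffset: 0 }, { image: "y", startOffset: 8 }],
  Plus: [{ image: "+", startOffset: 4 }],
};
```

Here `childrenInSourceOrder(ctx).map((t) => t.image)` yields `["x", "+", "y"]`.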
Shahar Soel
@bd82
Hello @davout1806 There is an in-depth discussion on the topic here:
bvfnbk
@bvfnbk

Hi all, I hope this is the right place to ask my question - if not, please ignore ;) I fiddled around a bit with the lexer and noticed something I cannot quite explain; I created a simple token with a Unicode regex. When I tokenize a character matching this token I get an error.

The pen https://codepen.io/bvfnbk/pen/GRZbyRq illustrates this. I am not sure if https://sap.github.io/chevrotain/docs/guide/resolving_lexer_errors.html#UNICODE_OPTIMIZE applies here and, if so, how to provide a start_char_hint...

4 replies
Emiliano Heyns
@retorquere

I want to (re)create a bibtex parser using chevrotain. I've followed the tutorials and I think I have a decent grasp on how chevrotain works. I have an existing parser written using PEG.js.

If I understand correctly, tokenization in chevrotain is context-free; I specify what the tokens look like, but I can't steer the tokenization process further. The problem I'm facing is that bibtex is strongly context-dependent, also in the tokenization phase: there are lists where "and" is a separator, and the way to recognize these fields is by the name that came before; there are verbatim and non-verbatim fields, with very different parsing rules, again distinguished by the field name; and there are verbatim and non-verbatim commands, which means the parsing rules for the arguments depend on the command name that came before them.

An example of a verbatim command would be \url{http://www.google.com{}}; the URL and the trailing {} must be parsed in verbatim mode because of the preceding command \url.

I've seen the lexer modes section of the docs, but that seems to require that the mode switches are indicated by unique entry and exit tokens, and that is not the case in bibtex; verbatim mode ends when the matching brace is found, so in the earlier example, it is the last } that ends verbatim mode, and not the one immediately preceding it.

Shahar Soel
@bd82

Hello @retorquere

It is possible Chevrotain would not fit your scenario, as there may be too much context in the lexer phase and you would need a lexer-less parser (like PEG.js).

Custom Token Patterns enable you to inspect previously lexed tokens, perhaps that will provide enough context: https://sap.github.io/chevrotain/docs/guide/custom_token_patterns.html#lexing-context

You can also write your own Lexer by hand and convert the results to Chevrotain Tokens.

The last option I can think of is to defer identification of a token to the parsing phase, e.g. identify a token of type XorY, and then combine it with logic in the parser (perhaps with backtracking) to make a final determination.
However, if you need to go this route on a regular basis, it's a good indication the tool may not be suitable for your use case.
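A context-sensitive custom pattern can be sketched as a plain function in the general shape Chevrotain's custom token patterns take (text, start offset, previously matched tokens), returning a RegExp-exec-like array or null. Everything concrete below - the field names, the isListField-style check, and the stub token shapes - is an assumption for illustration, not bibtex's real rules:

```javascript
// Fields whose values are "and"-separated lists (illustrative assumption).
const LIST_FIELDS = new Set(["author", "editor"]);

// Match "and" as a list separator only when the most recently lexed token
// suggests we are inside a list field; otherwise decline the match.
function matchListSeparator(text, startOffset, matchedTokens) {
  const prev = matchedTokens[matchedTokens.length - 1];
  if (!prev || !LIST_FIELDS.has(prev.image)) return null;
  return /^and\b/.exec(text.slice(startOffset)); // null when no "and" here
}
```

For example, with a preceding `author` token the function matches "and", while after a `title` token it returns null, so a plain-word tokenization can win instead.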

Emiliano Heyns
@retorquere
Hmm, the custom token pattern could suffice. I could use that in combination with push_mode and pop_mode, right?
Is it possible to do mode manipulation in the custom token pattern itself? Or to dynamically return the pop_mode there?
Shahar Soel
@bd82
You can define a custom token to push or pop a mode, but only statically. If you need more dynamic logic, perhaps you will want to build your own hand-crafted lexer instead of limiting yourself to Chevrotain's APIs.
Shahar Soel
@bd82
I meant in a static manner
Emiliano Heyns
@retorquere
I was already wondering :D. I think I'll be able to do it with the custom token approach while still getting the benefits from the mode management and the parse phase.
Shahar Soel
@bd82
:thumbsup:
Emiliano Heyns
@retorquere
Is it possible to have a token set a specific mode rather than pushing the mode? IOW is there a way to discard the mode stack and start with a new stack?
Emiliano Heyns
@retorquere
And can I check the current mode in the exec function of a custom token?
Shahar Soel
@bd82
Hello. @retorquere
The mode handling is not that advanced; you can inspect the custom token execution logic here:
Don't be wary of implementing your own lexer if you have unique logic required; it is normally the simpler part of implementing a parser.
Bill Barthel
@bbarthel
Is there any recommended way to easily change the error messages? eg "Expecting: one of these possible Token sequences:"
2 replies
Bill Barthel
@bbarthel
If you want to create another Identifier, eg EntityName that has the same pattern, what do you do?
4 replies
Bill Barthel
@bbarthel
Appears that LABEL: is not working ... even in the playground ...
Shahar Soel
@bd82
Hello @bbarthel see replies inside the threads for each of your questions.
Bill Barthel
@bbarthel
Thank you @bd82!
Emiliano Heyns
@retorquere

Don't be wary of implementing your own lexer if you have unique logic required, it is normally the simpler part of implementing a parser.

@bd82 I've gone with the custom lexer -- what does createChevToken(chevTokenClass, acornToken) expect for parameters?

Ah wait -- I create tokens as usual, but override the image and Xoffset? That's it?
Emiliano Heyns
@retorquere
Should endOffset be the position of the last char of the match (inclusive), or one past it?
Emiliano Heyns
@retorquere
When I implement my own lexer, how can I assign tokens to Lexer.SKIPPED?
15 replies
Emiliano Heyns
@retorquere
How do I get the current token in a GATE function?
3 replies
Emiliano Heyns
@retorquere
Can I put tokens back into the token stream in the parser? I am parsing bibtex which has a sort of macro facility (@string declarations), so when a macro is encountered, the tokens associated with the @string should be offered to the parser instead.
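One way to approach the @string question, sketched in plain JavaScript rather than Chevrotain's API, is a pre-parse pass that splices each macro reference's definition tokens into the stream before the parser sees it. The token shapes and the macro table below are assumptions for illustration:

```javascript
// Replace each MacroRef token with the tokens of its @string definition.
function expandMacros(tokens, macros) {
  const out = [];
  for (const tok of tokens) {
    if (tok.type === "MacroRef" && macros.has(tok.image)) {
      out.push(...macros.get(tok.image)); // splice in the definition's tokens
    } else {
      out.push(tok);
    }
  }
  return out;
}

// Illustrative macro table: @string{jan = "January"}
const macros = new Map([
  ["jan", [{ type: "String", image: "January" }]],
]);
```

For example, `expandMacros([{ type: "MacroRef", image: "jan" }], macros)` yields the single String token "January". A variant of this would collect the macro table from @string declarations in a first pass over the same token stream.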