If you build a wrapper around a RULE such that:

  • Before invocation the position of the next token would be saved.
  • After invocation the position of the previous token would be saved.

I think you should be able to add the position range to your object automatically.

Shahar Soel
@bd82
It should be something like this:
  RULE(name, def) {

    const orgGrammarRuleWrapper = super.RULE(name, def);

    const ourNewRuleWrapper =  function(idxInCallingRule, args) {
      // nextToken before rule invocation
      const start = this.LA(1).startOffset;
      const ruleResult = orgGrammarRuleWrapper.call(this, idxInCallingRule, args)
      // prev token after successful rule invocation
      const end = this.LA(0).endOffset;

      ruleResult.position = {
        start: start, end: end
      }

      return ruleResult;
    }

    // this is because inside `defineRule` the original wrapper is added directly
    // on the parser instance, so we need to overwrite it.
    this[name] = ourNewRuleWrapper
    // Dark voodoo magic
    ourNewRuleWrapper["originalGrammarAction"] = def
    return ourNewRuleWrapper;
  }
Maybe you could add your wrapper on every rule definition instead:
    $.RULE("json", POSITION_WRAPPER(() => {
      $.OR([
        { ALT: () => $.SUBRULE($.object) },
        { ALT: () => $.SUBRULE($.array) }
      ])
    }))
Lukas Köbis
@lukas1994
ahh nice! that's helpful, I'll try that :)
Shahar Soel
@bd82
So POSITION_WRAPPER would collect the position information. I think you will not need to go into Chevrotain internals as much with this approach.
Shahar Soel
@bd82
This is starting to look better, disregard my previous example and try to build on this:
  RULE(name, def) {
    return super.RULE(name, () => {
      const start = this.LA(1).startOffset;
      const ruleResult = def();
      const end = this.LA(0).endOffset;

      if (ruleResult !== undefined) {
        ruleResult.position = {
          start: start, end: end
        }
      }

      return ruleResult;
    });
  }
Please keep in mind that this may not behave as you expect when error recovery is involved as tokens can be skipped / dropped during error recovery.
Good night
Lukas Köbis
@lukas1994

I tried this but when I overwrite the RULE function the parser actually returns a different result:

RULE(name, def) {
  return super.RULE(name, () => {
    const ruleResult = def();
    return ruleResult;
  });
}

This test case passed before:

Object {
    -   "postfix": "",
    -   "prefix": "$",
    -   "type": "NUMBER",
    -   "value": 12000,
    +   "code": Object {
    +     "0": Object {
    +       "0": "$",
    +       "1": "1",
    +       "2": "2",
    +       "3": ",",
    +       "4": "0",
    +       "5": "0",
    +       "6": "0",
    +       "endColumn": 7,
    +       "endLine": 1,
    +       "startColumn": 1,
    +       "startLine": 1,
    +     },
    +     "endColumn": 7,
    +     "endLine": 1,
    +     "startColumn": 1,
    +     "startLine": 1,
    +   },
    +   "type": "CODE",
      }
Sorry I'm dumb. I use parametrized rules and needed to pass the parameters along. It all works now! Thanks :)
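The fix described above (forwarding rule parameters through the wrapper) can be sketched as follows. This is a minimal illustration under stated assumptions: MiniParser is a hypothetical stand-in base class, not Chevrotain, and the token objects only carry the fields the sketch needs.

```javascript
// Stand-in base class for illustration only; this is NOT Chevrotain.
class MiniParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }
  // LA(1) is the next token, LA(0) the previously consumed one.
  LA(k) {
    return this.tokens[this.pos + k - 1];
  }
  CONSUME() {
    return this.tokens[this.pos++];
  }
  RULE(name, impl) {
    const rule = (...args) => impl.apply(this, args);
    this[name] = rule;
    return rule;
  }
}

class PositionParser extends MiniParser {
  RULE(name, def) {
    // Forward every argument so parameterized rules keep working.
    return super.RULE(name, (...args) => {
      const start = this.LA(1).startOffset;
      const ruleResult = def.apply(this, args);
      const end = this.LA(0).endOffset;
      if (ruleResult !== null && typeof ruleResult === "object") {
        ruleResult.position = { start, end };
      }
      return ruleResult;
    });
  }
}

const parser = new PositionParser([
  { image: "$", startOffset: 0, endOffset: 0 },
  { image: "12", startOffset: 1, endOffset: 2 },
]);

// A parameterized rule: `allowPrefix` must survive the wrapper.
parser.RULE("number", function (allowPrefix) {
  const prefix = allowPrefix ? this.CONSUME().image : "";
  const value = Number(this.CONSUME().image);
  return { type: "NUMBER", prefix, value };
});

const numberNode = parser.number(true);
```

With a wrapper that calls def() without arguments, allowPrefix would arrive as undefined and the rule would take a different branch, which is one way a previously passing test could start returning a different result.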
One other random question. What is the offset parameter for? I get startColumn and startLine. Do I also need to keep track of startOffset? I just want to show error lint warnings at the right positions.
Shahar Soel
@bd82
Offset is the matching index in the input string, e.g. "123456".charAt(2) --> "3"
Lukas Köbis
@lukas1994
ohh so if my input is "11\n22" and offset is 3 then it'll point to the first 2? so offset and line/column are giving me the same information?
Shahar Soel
@bd82
Yes, it's the same information in a different format
You have a few options to control what is being collected by the Lexer: https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/api.d.ts#L1303-L1313
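The equivalence between an offset and a line/column pair can be sketched in plain JavaScript. The helper names are hypothetical, and the sketch assumes a 0-based offset, 1-based lines and columns, and "\n" as the only line terminator.

```javascript
// Convert a 0-based offset into a 1-based line and column.
function offsetToLineColumn(text, offset) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < offset; i++) {
    if (text[i] === "\n") {
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { line, column };
}

// Convert a 1-based line and column back into a 0-based offset.
function lineColumnToOffset(text, line, column) {
  const lines = text.split("\n");
  let offset = 0;
  for (let i = 0; i < line - 1; i++) {
    offset += lines[i].length + 1; // +1 for the "\n" that was split away
  }
  return offset + column - 1;
}

const input = "11\n22";
const pos = offsetToLineColumn(input, 3);
const back = lineColumnToOffset(input, pos.line, pos.column);
```

For the input "11\n22", offset 3 maps to line 2, column 1 (the first 2), and converting back yields 3 again, so either representation is enough to place a lint warning.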
Santhosh Kumar
@brsanthu
@team we are looking into using this library for one of our DSL implementations. We want to provide a UI editor for the language, and we want to use Monaco as the editor. Is there an example we could refer to that shows how to connect Monaco with a Chevrotain-based DSL?
Shahar Soel
@bd82

@brsanthu There is no Monaco-specific example. Generally, to support editor services scenarios you would implement support for LSP (Language Server Protocol).
There are some XML related examples in these two projects for building Language Services with a Chevrotain Parser:

Note that the Parser is just a small part of such a scenario...

You can also find some other examples for Language Services in this thread: SAP/chevrotain#921
Santhosh Kumar
@brsanthu
@bd82 thanks for the pointers.
Santhosh Kumar
@brsanthu
We are getting the error Cannot read property 'length' of undefined with the stack trace below. Any idea what I could be doing wrong?
TypeError: Cannot read property 'length' of undefined
    at pathToHashKeys (node_modules/chevrotain/lib/src/parse/grammar/lookahead.js:354:57)
    at node_modules/chevrotain/lib/src/parse/grammar/lookahead.js:389:24
    at Object.forEach (node_modules/chevrotain/lib/src/utils/utils.js:77:30)
    at node_modules/chevrotain/lib/src/parse/grammar/lookahead.js:388:17
    at Object.map (node_modules/chevrotain/lib/src/utils/utils.js:45:30)
    at lookAheadSequenceFromAlternatives (node_modules/chevrotain/lib/src/parse/grammar/lookahead.js:386:30)
    at Object.getLookaheadPathsForOptionalProd (node_modules/chevrotain/lib/src/parse/grammar/lookahead.js:458:12)
    at SqlParser.ErrorHandler.raiseEarlyExitException (node_modules/chevrotain/lib/src/parse/parser/traits/error_handler.js:46:56)
    at SqlParser.RecognizerEngine.atLeastOneSepFirstInternalLogic (node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:290:24)
    at SqlParser.RecognizerEngine.atLeastOneSepFirstInternal (node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:257:14)
Shahar Soel
@bd82
I would guess that there is something invalid in the grammar, but I wonder why it was not picked up by earlier checks. Try to reproduce it in a small example and open an issue.
Santhosh Kumar
@brsanthu
@bd82 I couldn't reduce the project to a simple enough use case that exhibited the issue. So we reverted the change to the earlier version, made incremental refactorings, and it is working now.
Any idea how to go about adding comments support? I'm going through https://github.com/jhipster/prettier-java/tree/master/packages/java-parser/src, which seems like a big project that is using this library. They create line and block comment tokens in the lexer, but I couldn't find any place where those tokens are referenced in the parser or visitor.
Matthew Dean
@matthew-dean

I guess I can't do this:

          GATE: () => {
            const isDeclaration = $.BACKTRACK($.testVariable)() && !$.isVariableCall
            $.isVariableCall = false
            return isDeclaration
          },

That is, I can't call $.BACKTRACK within a gate, I guess? I get "Cannot read property 'isBackTrackingStack' of undefined"

Matthew Dean
@matthew-dean
Ah, it just needed a this context. Fixed it!
          GATE: () => {
            const isDeclaration = $.BACKTRACK.call($, $.testVariable) && $.isVariableCall
            $.isVariableCall = false
            return isDeclaration
          },
Matthew Dean
@matthew-dean
I ended up with: GATE: () => ($.BACKTRACK($.testVariable)).call($) && !$.isVariableCall
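The "this" issue above can be reproduced without Chevrotain. The sketch below assumes, as the error message suggests, that the helper returns a plain function which reads parser state through "this"; GateDemo and tryRule are hypothetical stand-ins, not the library's API.

```javascript
class GateDemo {
  constructor() {
    this.isVariableCall = false;
    this.lookaheadOk = true;
  }
  // Hypothetical stand-in for a helper such as BACKTRACK: it returns a
  // plain function that reads state through `this`.
  tryRule() {
    return function () {
      return this.lookaheadOk;
    };
  }
}

const $ = new GateDemo();

// Calling the returned function bare leaves `this` undefined
// (class bodies are strict mode), so the property read throws.
let bareCallError = null;
try {
  $.tryRule()();
} catch (e) {
  bareCallError = e;
}

// Binding the receiver explicitly makes the same call succeed.
const gateResult = $.tryRule().call($) && !$.isVariableCall;
```

The bare call fails with a TypeError about reading a property of undefined, while the .call($) form evaluates the gate normally.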
Santhosh Kumar
@brsanthu

We are trying to support a special wildcard in an SQL-like syntax. Typically we say select * from table, but we need to support select *name from table, select name* from table, select name*name from table, etc., where * is part of the identifier. We created a WildcardIdentifier token (see below), but the lexer is still tokenizing name* as two tokens instead of one.

export const WildcardIdentifier = createToken({
  name: 'WildcardIdentifier',
  pattern: /[a-z][a-z0-9_*]*[*]|[*][a-z][a-z0-9_*]*|[a-z][a-z0-9_*]*[a-z0-9_]/i,
  line_breaks: false,
  longer_alt: Identifier,
  start_chars_hint: ['*', ...atoz, ...AtoZ],
});

Any hints to the problem?

Santhosh Kumar
@brsanthu

We tried below iteration as well.

export const WildcardIdentifier = createToken({
  name: 'WildcardIdentifier',
  pattern: /[a-z*][a-z0-9_*]*/i,
  line_breaks: false,
  start_chars_hint: ['*', ...atoz, ...AtoZ],
});

export const Asterisk = createToken({
  name: 'Asterisk',
  pattern: '*',
  label: '*',
  longer_alt: WildcardIdentifier,
});

export const ALL_TOKENS = [
...
Asterisk,
WildcardIdentifier,
...
]

But here too, the input name* is tokenized as two tokens, name and *, instead of a single one.

Santhosh Kumar
@brsanthu
Solved: another token, Identifier, was listed before WildcardIdentifier and didn't have longer_alt. We specified it as follows and it is working fine.
Shahar Soel
@bd82

Any idea how to go about adding comments support?

@brsanthu Are comments allowed anywhere in your language?

Solved: another token, Identifier, was listed before WildcardIdentifier and didn't have longer_alt. We specified it as follows and it is working fine.

@brsanthu sounds like longer_alt is the right approach here. Keep in mind you can also use custom token patterns: https://sap.github.io/chevrotain/docs/guide/custom_token_patterns.html if you want to add more logic to your matchers.
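Why token order and longer_alt matter here can be sketched with a toy first-match lexer. This is a simplified model for illustration, not Chevrotain's implementation; the Identifier pattern is an assumption, since the original definition was not posted.

```javascript
// Simplified first-match lexer for illustration; NOT Chevrotain's implementation.
function matchAt(type, input, offset) {
  const flags = type.pattern.flags.replace("y", "") + "y";
  const sticky = new RegExp(type.pattern.source, flags);
  sticky.lastIndex = offset;
  const m = sticky.exec(input);
  return m !== null && m[0].length > 0 ? m[0] : null;
}

function tokenize(input, tokenTypes) {
  const tokens = [];
  let offset = 0;
  while (offset < input.length) {
    let found = null;
    for (const type of tokenTypes) {
      const image = matchAt(type, input, offset);
      if (image === null) continue;
      found = { name: type.name, image };
      // longer_alt: prefer the alternative type when it matches more text.
      if (type.longer_alt) {
        const altImage = matchAt(type.longer_alt, input, offset);
        if (altImage !== null && altImage.length > image.length) {
          found = { name: type.longer_alt.name, image: altImage };
        }
      }
      break; // the first matching type in the list wins
    }
    if (found === null) {
      throw new Error(`No token matches at offset ${offset}`);
    }
    tokens.push(found);
    offset += found.image.length;
  }
  return tokens;
}

const WildcardIdentifier = {
  name: "WildcardIdentifier",
  pattern: /[a-z*][a-z0-9_*]*/i,
};
// Assumed Identifier pattern (hypothetical).
const plainIdentifier = { name: "Identifier", pattern: /[a-z][a-z0-9_]*/i };
const linkedIdentifier = { ...plainIdentifier, longer_alt: WildcardIdentifier };

const split = tokenize("name*", [plainIdentifier, WildcardIdentifier]);
const joined = tokenize("name*", [linkedIdentifier, WildcardIdentifier]);
```

In this model, split contains Identifier "name" followed by WildcardIdentifier "*", while joined contains a single WildcardIdentifier "name*", because the earlier Identifier type defers to its longer alternative.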

Shahar Soel
@bd82

Ah, it just needed a this context. Fixed it!

@matthew-dean all those context issues seem weird. I wonder if using regular "this" instead of "$" would have solved it. After all, a lambda function should preserve the correct "this" context.

Santhosh Kumar
@brsanthu
@bd82 yes, comments can appear anywhere. Think of SQL: you can comment between statements, between fields, between clauses, etc.
Xtext has a concept of hidden tokens, and you can mark any token as hidden. The parser will automatically collect those tokens without having to define them in each place. Is there such a construct here?
Shahar Soel
@bd82
@brsanthu there are no hidden tokens, but there is token grouping. I think it should fit your use case: https://sap.github.io/chevrotain/docs/features/token_grouping.html
Santhosh Kumar
@brsanthu
@bd82 I tried grouping, which does collect all the tokens automatically, but you lose the details of the comment positions, so we cannot use it to serialize the model back into text. Is there a way to interleave tokens and groups in the correct order?
Shahar Soel
@bd82
@brsanthu There is no built-in solution for this in Chevrotain. This is the problem that the Java-Parser solves, as it is used in a Prettier plugin which needs to reconstruct the whole text.
You could post-process the token vector to add comment information on top of the existing tokens and then adjust the Chevrotain CONSUME to handle it. Alternatively, you could add the comments to your CST via a visitor by comparing the comment positions with the tokens inside your CST structure. There are many ways to approach this problem, but I do admit it is not a trivial one.
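One possible shape for the token-vector post-processing mentioned above is sketched below. It assumes the comments were collected into a separate array (for example through a token group) and that both arrays are sorted by startOffset; the function name and the leadingComments field are hypothetical, not part of Chevrotain.

```javascript
// Attach each comment to the first token that starts after it.
// Both arrays are assumed to be sorted by startOffset.
function attachLeadingComments(tokens, comments) {
  const annotated = tokens.map((t) => ({ ...t, leadingComments: [] }));
  const trailingComments = [];
  let i = 0;
  for (const comment of comments) {
    // Skip tokens that start before this comment.
    while (i < annotated.length && annotated[i].startOffset < comment.startOffset) {
      i++;
    }
    if (i < annotated.length) {
      annotated[i].leadingComments.push(comment);
    } else {
      trailingComments.push(comment); // nothing follows this comment
    }
  }
  return { tokens: annotated, trailingComments };
}

// Offsets for the text: select /* id */ a from t -- done
const tokens = [
  { image: "select", startOffset: 0 },
  { image: "a", startOffset: 16 },
  { image: "from", startOffset: 18 },
  { image: "t", startOffset: 23 },
];
const comments = [
  { image: "/* id */", startOffset: 7 },
  { image: "-- done", startOffset: 25 },
];

const merged = attachLeadingComments(tokens, comments);
```

Here the block comment ends up attached to the token a and the line comment is reported as trailing, which gives a printer enough information to re-emit both in their original order.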
John Doe Antler
@JohnDoeAntler_gitlab
// lexer.ts
export const Identifier = createToken({
    name: 'Identifier',
    pattern: /[a-zA-Z\_]\w*/,
});

export const Const = createToken({
    name: "Const",
    pattern: /const/,
    longer_alt: Identifier,
});

export const tokens = [
    ...
    Const,
    ...
    Identifier,
];

// input
consttest

// lex result
{
  tokens: [
    {
      image: 'const',
      ...
    },
    {
      image: 'test', 
      ...
    }
  ],
  groups: {},
  errors: []
}
Is longer_alt just broken or deprecated? :/
Shahar Soel
@bd82
@JohnDoeAntler_gitlab it's not broken or deprecated afaik. I am unable to reproduce your issue:
const { Lexer, createToken } = require("chevrotain")

// ----------------- lexer -----------------
const Identifier = createToken({
  name: 'Identifier',
  pattern: /[a-zA-Z\_]\w*/,
});

const Const = createToken({
  name: "Const",
  pattern: /const/,
  longer_alt: Identifier,
});

const allTokens = [
  Const,
  Identifier
]

const lexer = new Lexer(allTokens, {positionTracking: "onlyOffset"})
const result = lexer.tokenize("consttest")
// only a single token in the result
John Doe Antler
@JohnDoeAntler_gitlab

@JohnDoeAntler_gitlab it's not broken or deprecated afaik. I am unable to reproduce your issue:

I figured it out: my exports and imports were messed up, so the lexer could not load longer_alt properly... Sorry for disturbing you ;c

John Doe Antler
@JohnDoeAntler_gitlab
another question, how to invoke the rule in parser's constructor with typescript? :/
export class ASParser extends CstParser {
    constructor() {
        super(tokens);
        this.RULE("global", () => {
            this.CONSUME(Public);
        });
        this.performSelfAnalysis();
    }
}

export const parserInstance = new ASParser();

export const parse = (str: string) => {
    const tokens = tokenize(str);

    // input
    parserInstance.input = tokens;

    // js
    // parserInstance.global();

    return parserInstance.errors;
}
Shahar Soel
@bd82
With TypeScript you can define the parsing rules as class properties (not inside the constructor), in the future such syntax will also be supported in ECMAScript.
John Doe Antler
@JohnDoeAntler_gitlab
ok thanks <3
Shahar Soel
@bd82
you are welcome :smile:
Lukas Köbis
@lukas1994
Would any chevrotain experts be available for a few hours of consulting? I've set up a grammar (~300 lines of code) but I'm having some issues with it (spent a fair amount of time on it already). You can email me at lukas@causal.app Thanks!