Mathias
@mathiversen
I'm getting it to work with many more elements when they don't have so many attributes, or when they're not too deeply nested. I need to further investigate why real webpages get so slow though... hmm
Mathias
@mathiversen
Found one error: apparently websites aren't too picky when it comes to ending void elements.
Matthew Graham
@mdg
Does anyone have any examples of a grammar that uses PrecClimber? I found the calculator example, but it doesn't have a pest grammar as part of the example.
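Not an authoritative answer, but here is a rough sketch of how PrecClimber is usually wired up, with a made-up grammar to go with it (the rule and function names below are assumptions for the sketch, not the calculator example's actual code, and Rule is the usual #[derive(Parser)] enum):

expression = { term ~ ((add | subtract | multiply | divide) ~ term)* }
term = _{ number | "(" ~ expression ~ ")" }
add = { "+" }
subtract = { "-" }
multiply = { "*" }
divide = { "/" }
number = @{ ASCII_DIGIT+ }

use pest::iterators::Pairs;
use pest::prec_climber::{Assoc, Operator, PrecClimber};

fn climber() -> PrecClimber<Rule> {
    // Lower-precedence operators come first.
    PrecClimber::new(vec![
        Operator::new(Rule::add, Assoc::Left) | Operator::new(Rule::subtract, Assoc::Left),
        Operator::new(Rule::multiply, Assoc::Left) | Operator::new(Rule::divide, Assoc::Left),
    ])
}

fn eval(pairs: Pairs<Rule>, climber: &PrecClimber<Rule>) -> f64 {
    climber.climb(
        pairs,
        // Primary: a number or a parenthesised sub-expression.
        |pair| match pair.as_rule() {
            Rule::number => pair.as_str().parse::<f64>().unwrap(),
            Rule::expression => eval(pair.into_inner(), climber),
            _ => unreachable!(),
        },
        // Infix: combine the operands the climber hands us, by operator.
        |lhs: f64, op, rhs: f64| match op.as_rule() {
            Rule::add => lhs + rhs,
            Rule::subtract => lhs - rhs,
            Rule::multiply => lhs * rhs,
            Rule::divide => lhs / rhs,
            _ => unreachable!(),
        },
    )
}

The entry point would then parse Rule::expression and call eval(pair.into_inner(), &climber()).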
Matthew Graham
@mdg
I was able to work out my issue. I had to use a combination of a regular rule to recognize when to use the climber and a second silent rule that wouldn't cause nesting on subsequent expressions. This doesn't sound ideal, but it's working for now.
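If I'm reading that right, the shape is roughly this (hypothetical rule names, untested):

expression = { operand ~ (operator ~ operand)* }   // regular rule: this is what the climber sees
operator = _{ add | subtract | multiply | divide } // silent: the concrete operator pairs surface directly
operand = _{ "(" ~ expression ~ ")" | number }     // silent: no extra nesting around sub-expressions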
Owen Nelson
@onelson
Working on a grammar which (similar to @expectocode ) defines COMMENT = _{ "#" ~ (!NEWLINE ~ ANY)* ~ NEWLINE? } but I'm having trouble coming up with a way to get the comment rule to skip # within quotes. Seems like this has got to be a common problem for comments. Is there some trick to it?
Owen Nelson
@onelson
Oh oh oh, I think I see. I can disarm the COMMENT rule with an extra level of indirection. If I put the quoted string into an atomic rule, and wrap that with a ! rule, then I still get the match but the comment doesn't break anything.
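For anyone who hits this later: the reason the atomic wrapper works, if I understand pest correctly, is that the implicit COMMENT and WHITESPACE rules are not applied inside atomic (@) rules, so a # inside an atomic quoted string can't start a comment. A minimal sketch (rule names assumed):

string = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" }        // atomic: COMMENT is suspended in here
COMMENT = _{ "#" ~ (!NEWLINE ~ ANY)* ~ NEWLINE? }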
Mathias
@mathiversen
My html-parser is now sort of working. Turns out the internet is full of non-compliant code (who could have guessed). But I've managed to make this https://gist.github.com/mathiversen/69d744da9d80bf774ed549511e12d884 work on most websites. Happy to receive any comments! :)
Not sure I've fully grasped when it's preferred to use atomics, for example.
Jeffrey Goldberg
@jpgoldberg

I understand that I can (and should) use SOI and EOI in my start rule, but I would also like to parse strings using non-start rules while making sure that the parse matches the entire string.

Currently, I have a nasty approach in which I go through the pairs and find the lowest start position and the highest end position. But I feel that there must be some nicer and more general way to do this.
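One approach that may be nicer, sketched here with hypothetical MyParser/Rule names: pest is happy to match just a prefix of the input when a rule has no EOI, so you can compare the matched span's end against the input length.

use pest::Parser;

// Returns true only if the rule consumed the entire input.
fn parses_fully(input: &str) -> bool {
    match MyParser::parse(Rule::inner_rule, input) {
        Ok(mut pairs) => pairs
            .next()
            .map(|pair| pair.as_span().end() == input.len())
            .unwrap_or(false),
        Err(_) => false,
    }
}

The alternative is a thin checked_x = { SOI ~ x ~ EOI } wrapper per rule you want to test, at the cost of some grammar noise.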

Jeffrey Goldberg
@jpgoldberg
@Phrohdoh I haven't looked in detail at what you are asking about, but I am skeptical of using what is a kind of context-free grammar for phonology. Syllable structure may well have very tree-like properties, but phonology is inherently context-sensitive. This will force you to make unnatural use of look-aheads.
Jeffrey Goldberg
@jpgoldberg

Is there an idiomatic way to have variants of the same grammar? Suppose (asking for a friend) one were trying to develop a parser that followed the specification for email addresses from RFC5322.

I (I mean my friend) would like to have a way to

  1. Optionally disallow expansion of any of the obsolete non-terminals
  2. Optionally disallow comment-folding-white-space.

On a related note, is there a way to expand to null? I found that I get an error with a rule like

cfws = { }
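One workaround that may fit (untested against that grammar): make the rule optional at its call sites rather than nullable in itself, e.g. local ~ cfws? ~ "@" ~ domain. Alternatively, an empty string literal is a valid pest expression, so a rule like the following should compile where the bare { } does not:

cfws = { "" }   // always succeeds, matching the empty string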

Gonçalo Soares
@GoncaloFDS
Hey, first of all, I wanna say this project is awesome! And now I need some help: I need my grammar to be able to parse regex strings like "^[^\.]*(\.{1}png)$". I believe the way to do this is using the stack, but that is still a WIP, so can you help me with this?
Luiz Berti
@luizberti_gitlab

Hi there, I am trying to ignore some characters when matching an atomic rule. I want to match hex numbers such as 0xFF_aa_b and so on, and my current rule works fine:

hexnum = @{
    "0x"
    ~ ASCII_HEX_DIGIT
    ~ (ASCII_HEX_DIGIT | ("_" ~ &ASCII_HEX_DIGIT))*
}

Except it always matches hexnum: "0xFF_aa_b", and I would like it to match as hexnum: "FFaab" to alleviate the cleanup toil afterwards.

This answer on Reddit has a broken link, but it seems to be roughly what I want. I already tried rearranging several times, but no success.

Seems like DROP might be useful in this case too, but there's not much documentation on it, and what I tried so far hasn't worked.

Can anyone help? Thanks in advance

Andy Philpotts
@sirxyzzy
Hi there! I just wanted to give a shout-out to Pest. I have been working on a project to parse MIB files, which are written in ASN.1, and without Pest I don't think I would have been able to make this happen. In case anyone wants to critique my code or the grammar I created, the project is on GitHub.
Jeffrey Goldberg
@jpgoldberg

@luizberti_gitlab, I am very new to pest, so I might be completely wrong, but I believe that you shouldn't try to achieve what you are after in the grammar itself. Instead you do that in how you process the resulting pairs.

I'm facing something similar, in that I have a pest grammar for RFC5322 email addresses in which

(first comment) jsmith@ (second comment)\nexample.com (third comment)

is a valid address. But I want to prune what the specs call "comments and folding whitespace (CFWS)", and I assume that I do that not in the pest grammar itself but in my handling of the resulting pairs, eliminating the relevant non-terminal nodes.

That is, I plan to walk through the pairs created by a parse and drop any pair corresponding to the rule for cfws. I just haven't gotten around to trying that yet.

So I think you will need to create a rule specifically for 0x and another for _ so that you can walk through your pairs and filter those. This will mean that you need to change your hexnum rule to be a compound atomic.

Again, I am just figuring this stuff out for myself now. It is possible that my thinking about all of this is entirely wrong.
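To make that concrete, here is a sketch of the compound-atomic idea plus the string cleanup (rule shapes assumed, not tested against the original grammar):

hexnum = ${ "0x" ~ hex_body }
hex_body = @{ ASCII_HEX_DIGIT ~ (ASCII_HEX_DIGIT | ("_" ~ &ASCII_HEX_DIGIT))* }

The ${ } compound-atomic keeps implicit whitespace out but still exposes inner pairs, so the Rust side can grab the digits without the 0x prefix and strip the underscores itself:

use pest::iterators::Pair;

// Takes a Rule::hexnum pair and returns "FFaab" for the input "0xFF_aa_b".
fn clean_hex(pair: Pair<Rule>) -> String {
    let body = pair.into_inner().next().unwrap(); // the hex_body pair, after "0x"
    body.as_str().replace('_', "")
}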

Andy Philpotts
@sirxyzzy

I am just starting to use pest_consume and finding it super helpful in many cases, but I am wondering if there is a more concise way to handle optionals. I have quite a few rules with multiple optional clauses; this is a simpler one:

module_body = { export_list? ~ import_list? ~ assignment_list }

So when dealing with this using pest_consume, it seems I have to cover each possible permutation, something vaguely like:

    fn module_body(node: Node) -> Result<...> {
        Ok(match_nodes!(node.into_children();
            [assignment_list(a)] => ...,
            [export_list(e), assignment_list(a)] => ...,
            [import_list(i), assignment_list(a)] => ...,
            [export_list(e), import_list(i), assignment_list(a)] => ...,
        ))
    }

I'm not looking forward to handling cases like the one below, which, if I'm right, will need 32 match cases. I'll probably just treat it as a list of nodes to avoid that (a sketch of that approach follows the grammar below).

snmp_object_type_macro_type = { "OBJECT-TYPE"
                          ~ snmp_syntax_part
                          ~ snmp_units_part?
                          ~ snmp_access_part
                          ~ snmp_status_part
                          ~ snmp_descr_part?
                          ~ snmp_refer_part?
                          ~ snmp_index_part?
                          ~ snmp_def_val_part? }
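For what it's worth, one way to dodge the permutation explosion entirely, sketched with plain pest pairs rather than pest_consume's match_nodes! (the function name and arm bodies are placeholders):

use pest::iterators::Pair;

fn object_type(pair: Pair<Rule>) {
    // Walk the children in order; optional parts simply never show up
    // when absent, so there is one arm per part instead of 2^n cases.
    for part in pair.into_inner() {
        match part.as_rule() {
            Rule::snmp_syntax_part => { /* required */ }
            Rule::snmp_units_part => { /* optional */ }
            Rule::snmp_access_part => { /* required */ }
            // ... one arm for each remaining part
            _ => unreachable!(),
        }
    }
}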
Kmeakin
@Kmeakin
Is it possible to make pest continue parsing after it encounters an error?
E.g. when parsing something like (] {), I'd like it to report both the missing ) and the missing (.
Daniel Hines
@d4hines

Hi! I'm trying out Pest today. I modified the CSV grammar example from the book to include my grammar. However, when I get a parse error, it looks really ugly:

thread 'main' panicked at 'unsuccessful parse: Error { variant: ParsingError { positives: [var_id], negatives: [] }, location: Pos(12), line_col: Pos((2, 12)), path: None, line: "    occurs(", continued_line: None }', src\main.rs:15:16
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\csv-tool.exe` (exit code: 101)

How can I get it to look pretty like on the pest.rs main page?

Aha, I need to use unwrap_or_else(|e| panic!("{}", e))

Now it looks much better:

thread 'main' panicked at ' --> 2:12
  |
2 |     occurs(
  |            ^---
  |
  = expected var_id', src\main.rs:16:25

But what if my users don't know what a var_id is? Is there a pattern for including more useful info?
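One pattern that may help, assuming your users shouldn't see raw rule names: pest's Error::renamed_rules lets you substitute friendlier descriptions before the error is rendered (MyParser, Rule::main, and the wording below are made up for the sketch):

use pest::error::Error;
use pest::iterators::Pairs;
use pest::Parser;

fn parse_pretty(input: &str) -> Result<Pairs<Rule>, Error<Rule>> {
    MyParser::parse(Rule::main, input).map_err(|error| {
        // Replace rule names in the rendered message with user-facing text.
        error.renamed_rules(|rule| match rule {
            Rule::var_id => "a variable name, e.g. `foo`".to_owned(),
            other => format!("{:?}", other),
        })
    })
}

The Display output then reads "expected a variable name, e.g. `foo`" instead of "expected var_id".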

Michael Jones
@michaeljones
Hi, I'm new to the project. All the links to the online fiddle that I've tried don't seem to work. Currently using Firefox.
Michael Jones
@michaeljones

I would also welcome advice about whether or not I should attempt to parse a language like Elm using Pest. Here is a complete example of the syntax: https://github.com/pdamoc/elm-syntax-sscce/blob/master/src/Main.elm

It is whitespace-dependent, but I'm not sure it is as regular (not in the technical sense) as, say, simple markdown lists. From what I can tell from reading different issues, it might be possible to make it work with Pest, but perhaps I would be better off with nom or combine or something? I would certainly welcome your input.

Andy Philpotts
@sirxyzzy
@michaeljones I think Pest is a better fit than nom (I haven't tried combine). The main reason is whitespace and comments: the built-in support in Pest is so much more useful than trying to do the same thing manually in nom. I feel nom is ideal for decoding protocols, which have very constrained syntaxes, and can create a very fast and lean parser, but Pest is better suited to parsing a human-readable language.
@michaeljones Oh, and the fiddle ( https://pest.rs/#editor ) is working for me, in Chrome, Edge and Firefox!
segeljakt
@segeljakt
There is also lalrpop, though LALR and PEG are pretty different.
It's hard to compare pest with lalrpop. The pest grammar is probably briefer, but it also requires more "glue code" to make it work with Rust, though this might have been solved since I last used it.
segeljakt
@segeljakt
If you are implementing programming-language parsers, pest outputs token trees while lalrpop outputs the AST directly. In this sense, pest's output is more flexible but also lower-level, which might lead to more work.
Michael Jones
@michaeljones
Thank you to both of you.

I might return to pest to re-assess it, but I couldn't initially see how to handle complex indentation-dependent code. It is interesting that it produces tokens rather than ASTs, though. I hadn't realised that.

And when I pointed out the issue with the fiddle, I meant that various links to the fiddle, from various GitHub issues, would load the fiddle but without any actual code in it. The links were like "look at this example" and you'd click on it and see nothing.

jemenake
@jemenake

Is there a best-practice way to prevent identifiers from being keywords in a Pest grammar? Right now, I'm using the following design:

kw_bool = { "bool" }
kw_int = { "int" }
kw_void = { "void" }
keywords = { kw_bool | kw_int | kw_void }
id = @{ !(keywords ~ (WHITESPACE|EOI)) ~ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }

but this feels a little heavy-handed to me.
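One slightly tighter variant of the same idea, untested against the full grammar: check the word boundary with a negative lookahead instead of WHITESPACE|EOI, so that "boolean" still parses as an id while a bare "bool" does not:

keyword = @{ ("bool" | "int" | "void") ~ !ASCII_ALPHANUMERIC }
id = @{ !keyword ~ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }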

segeljakt
@segeljakt
You could use uppercase
I don't know why pest doesn't have it as a convention.
asthasr
@asthasr
Hi, guys. Quick question: is there an easy way to "label" a group of rules so that they get a nice name in the generated parser? E.g. if I want to match Mo | Tu | We | Th | Fr as weekday and Sa | Su as weekend -- and use those rules in other rules -- but want to end up with a token called day that I can reference, what's the best way to do it? Or am I barking up the wrong tree?
asthasr
@asthasr
Actually, I figured that out.
weekday = _{ "Mo" | "Tu" | "We" | "Th" | "Fr" }
weekend = _{ "Sa" | "Su" }
day = { 'A'..'Z' ~ 'a'..'z' }
number = { "1" | "2" }

strict_rule = { weekday ~ "1" | weekend ~ "2" }
friendly_rule = _{ PUSH(&strict_rule) ~ POP_ALL ~ day ~ number }
It feels like there should be a nicer way to do this, to say "this rule also produces this token," but this does work.
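Possibly a simpler spelling of the same trick, if I'm reading the PUSH/POP_ALL pair right: & is already a zero-width positive predicate, so the stack round-trip may be unnecessary (untested):

friendly_rule = _{ &strict_rule ~ day ~ number }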
jemenake
@jemenake

I don't see why you can't just use

weekday = _{ "Mo" | "Tu" | "We" | "Th" | "Fr" }
weekend = _{ "Sa" | "Su" }
day = { weekday | weekend }

You can always access the actual text of the "day" rule by calling my_day_pair.as_str(), and, if needed, you can make decisions based upon the type of day with

match my_day_pair.into_inner().next().unwrap().as_rule() {
    Rule::weekday => { ... },
    Rule::weekend => { ... }
}

No?

asthasr
@asthasr
@jemenake Note that strict_rule adds additional requirements to the weekday/weekend rules. If my "top level" rule calls something ambiguous like day, it's not going to be able to differentiate between the two types of day. Of course, I could change my day rule to match your day instead of using char ranges and that's fine.
Basically, I want the parser to be able to make decisions that differentiate the weekday/weekend rules, but also group them into a final output token.
I could just make the decision in code, of course, as with your match example, but I think that's unnecessarily "dumb" of the parser... if you consider tokens that are output as being the "API" of a parser, I feel that I should be able to scope it down to what matters.
jemenake
@jemenake
So, it sounds like what you're after is some kind of alias feature, where you could alter the rule Pest uses for matching the input, but could also just have your Rust code match on an aliased rule (which you call a "token", though I don't think Pest thinks in terms of tokens in the traditional lexer/parser sense) called "day". This way, you could change some of your Pest grammar rules between matching weekday, weekend, alldays, etc. without needing to change your Rust code, because it would always just match on Rule::day. Does that kinda describe what you're after?
asthasr
@asthasr
Yes, exactly, @jemenake.
asthasr
@asthasr

Something "Serde-like" would also be cool, similar to this:

#[pest(parser = MyParser, entrypoint = some_rule)]
struct MyStruct {
    #[pest(rule = day)]
    day: String,

    #[pest(rule = day_of_month)]
    day_of_month: u8,

    #[pest(rule = month)]
    month: MyMonth,
}

With the macros implying something like a MyStruct::parse(&str) -> Result<MyStruct> function.

But this is probably the purview of a different crate than pest itself.
jemenake
@jemenake
I've never seen anything like that in Pest (but I'm pretty new to it). I can totally see the benefit of giving any rule an alias, since it would add so much to code reusability. I could see some potential problems if two rules (with the same alias) have different sub-rules, since it could require more match clauses deeper into the rule unpacking as you figure out what you're dealing with... in essence, I think we're talking about subtype polymorphism here, so you kinda need a way to discover the particular subclass you're dealing with, or to enforce some kind of standard characteristics among the subclasses.
asthasr
@asthasr
What if you didn't, though? I mean, in the end you're basically producing (name_of_rule, string). And if you alias it the way I did earlier, using PUSH/POP, then if it fails a "strict" rule, that strict rule's failure is still "tracked" by the parser. It seems like it's "okay" if there's a failure between the text parsing and the Rust value instantiation... it's kind of like you have two layers. One layer, the pest-generated text parser, can fail at parsing the text. The next layer, the deserialization macros, can fail at parsing the pest pairs.
Another possibility that might make sense would be to do it outside the normal flow of "rules". So, at the bottom of the example grammar I gave above, what if you had something like tag day = { weekend | weekday }, and if either of those rules is produced, that "tag" gets populated as well?