Taryn
@Phrohdoh
hm that makes sense
I'll have to see how the reference impl works, since the reference peg is written this way
thank you
Taryn
@Phrohdoh
I misunderstood the reference peg's precedence, fixing that'll probably solve my issue. :-)
Tanuj
@expectocode
good luck :)
Kate Goldenring
@kate-goldenring

I have created a grammar for parsing udev rules, but when I pass a bad udev rule (one where a field such as "TYPO" does not fit the grammar), the parser doesn't throw an error unless it is the first condition in the udev_rule. How do I parse it in a way that I know the udev_rule_string was incorrectly formatted / doesn't fit the grammar?

The Pest Book says that, because of eagerness, "If an expression fails to match, the failure propagates upwards, eventually leading to a failed parse, unless the failure is "caught" somewhere in the grammar." But I don't see this happening.

Here is the grammar:

WHITESPACE = _{ " " }
udev_rule = { condition ~ ("," ~ condition)* }
condition = ${ field ~ operation ~ quoted_value }
operation = { equality | inequality | assignment }
equality = { "==" }
inequality = { "!=" }
assignment = { "=" }
field = { devpath | kernel | tag | driver | subsystem }
quoted_value = {"\"" ~ value ~ "\""}
value = { (ASCII_ALPHANUMERIC | SPACE_SEPARATOR | "$" | "." | "_" | "*" | "?" | "[" | "]" | "-" | "|" | "\\" | "/" | "%")* } 

// Supported fields
devpath = { "DEVPATH" }
kernel = { "KERNEL" }
tag = { "TAG" }
driver = { "DRIVER" }
subsystem = { "SUBSYSTEM" }

Here is the function for parsing

pub fn parse_udev_rule(udev_rule_string: &str) {
    let udev_rule = UdevRuleParser::parse(Rule::udev_rule, udev_rule_string)
        .unwrap() // panics here if the input doesn't match udev_rule
        .next()
        .unwrap();
}

Panics on unwrap:
parse_udev_rule("TYPO==\"blah\", KERNEL==\"video[0-9]*\"");
Doesn't panic on unwrap: WHY?
parse_udev_rule("KERNEL==\"video[0-9]*\", TYPO==\"blah\" ");

Tanuj
@expectocode
@kate-goldenring might sound silly, but could it be because you aren't matching SOI / EOI in your grammar?
Kate Goldenring
@kate-goldenring
@expectocode That is probably part of it! However, when I change the grammar to udev_rule = { SOI ~ condition ~ ("," ~ condition)* ~ EOI }, the unwrap() following next() always panics
Kate Goldenring
@kate-goldenring

I next tried udev_rule = { SOI ~ (condition ~ ("," ~ condition)*) ~ EOI } and it still didn't work. Moving the inner rule into its own rule fixes the issue:

udev_rule = { SOI ~ inner_rule ~ EOI }
inner_rule = { condition ~ ("," ~ condition)* }

Why does it need to be moved into a new rule?

Tanuj
@expectocode
Not sure - the one you said causes it to always panic works for me in the pest.rs editor
(screenshot of the grammar working in the pest.rs editor)
super-continent
@super-continent
having trouble getting pest to recognize comments in my script, my project has this file defining the grammar, but any comments I put in just trigger an error
anyone have ideas of what I'm doing wrong?
Kate Goldenring
@kate-goldenring
@expectocode I realized it was because, when I iterated across the rules within udev_rule, I was assuming they were all condition rules; however, EOI is also an element
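
For anyone hitting the same thing: when EOI appears in a non-silent rule it produces its own pair, so iterating the children of udev_rule yields the condition pairs plus a final Rule::EOI pair. A minimal sketch (using the grammar above; input is a placeholder) that skips it:

let udev_rule = UdevRuleParser::parse(Rule::udev_rule, input)
    .expect("parse failed")
    .next()
    .unwrap();
for inner in udev_rule.into_inner() {
    match inner.as_rule() {
        Rule::condition => { /* handle one condition */ }
        Rule::EOI => (), // produced by the ~ EOI at the end of udev_rule
        _ => unreachable!(),
    }
}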
super-continent
@super-continent
still haven't figured out the comment rules; it couldn't be because of the implementation, since COMMENT doesn't create any tokens, so I'm not sure why it won't recognize and ignore comments
Tanuj
@expectocode
@kate-goldenring glad you figured it out! If I may ask, what's the parser for?
@super-continent assuming you're talking about the line you removed in the last commit, I think it's because your ANY+ would accept the NEWLINE, and then the NEWLINE wouldn't match
Tanuj
@expectocode
Here's the comment rule I used for my language: COMMENT = _{ "#" ~ (!NEWLINE ~ ANY)* ~ &(NEWLINE | EOI) }
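
The trailing &(NEWLINE | EOI) is a non-consuming lookahead, so the newline is left over for whatever rule actually expects it. A tiny hypothetical fragment (WHITESPACE and line are made up for illustration):

WHITESPACE = _{ " " }
COMMENT    = _{ "#" ~ (!NEWLINE ~ ANY)* ~ &(NEWLINE | EOI) }
line       = { "do_thing" ~ NEWLINE } // still sees its NEWLINE because COMMENT only peeks at it

With that, an input like do_thing # note followed by a newline still parses as line: the implicit COMMENT eats the # note part but leaves the newline for the explicit NEWLINE in line.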
Mathias
@mathiversen
Hi! I started a little toy project the other day, where I'm building an HTML parser. I've managed to write something that passes my 20-ish tests for what HTML is, and it runs pretty smoothly when the input string is relatively short (20-ish elements), but it runs into an infinite loop when I use it on a 50-ish element document. Here's the pest code: https://gist.github.com/mathiversen/69d744da9d80bf774ed549511e12d884
Any ideas on how to make it work on large documents? :)
Mathias
@mathiversen
I'm getting it to work with many more elements when they don't have so many attributes, or when they're not too deeply nested. I need to further investigate why real webpages get so slow though... hmm
Mathias
@mathiversen
found one error, apparently websites aren't too picky when it comes to ending void elements
Matthew Graham
@mdg
does anyone have any examples of a grammar that uses PrecClimber? I found the calculator example, but it doesn't have a pest grammar as part of the example
Matthew Graham
@mdg
I was able to work out my issue. I had to use a combination of a regular rule to recognize when to use the climber and a second silent rule that wouldn't cause nesting on subsequent expressions. This doesn't sound ideal, but it's working for now.
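
For anyone searching later, a minimal PrecClimber sketch (the grammar and rule names are hypothetical, not from @mdg's project), using the pest::prec_climber API from pest 2.x:

// Hypothetical grammar, assumed to be compiled with #[derive(Parser)] into a Rule enum:
//   expr = { term ~ (op ~ term)* }
//   term = @{ ASCII_DIGIT+ }
//   op   = _{ add | mul }
//   add  = { "+" }
//   mul  = { "*" }
use pest::iterators::Pair;
use pest::prec_climber::{Assoc, Operator, PrecClimber};

fn climber() -> PrecClimber<Rule> {
    // Operators listed from lowest to highest precedence.
    PrecClimber::new(vec![
        Operator::new(Rule::add, Assoc::Left),
        Operator::new(Rule::mul, Assoc::Left),
    ])
}

fn eval(expr: Pair<Rule>, climber: &PrecClimber<Rule>) -> i64 {
    climber.climb(
        expr.into_inner(),
        |primary: Pair<Rule>| primary.as_str().parse::<i64>().unwrap(),
        |lhs: i64, op: Pair<Rule>, rhs: i64| match op.as_rule() {
            Rule::add => lhs + rhs,
            Rule::mul => lhs * rhs,
            _ => unreachable!(),
        },
    )
}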
Owen Nelson
@onelson
Working on a grammar which (similar to @expectocode ) defines COMMENT = _{ "#" ~ (!NEWLINE ~ ANY)* ~ NEWLINE? } but I'm having trouble coming up with a way to get the comment rule to skip # within quotes. Seems like this has got to be a common problem for comments. Is there some trick to it?
Owen Nelson
@onelson
Oh oh oh, I think I see. I can disarm the COMMENT rule with an extra level of indirection. If I put the quoted string into an atomic rule, and wrap that with a ! rule, then I still get the match but the comment doesn't break anything.
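
The reason this works, for anyone else who hits it: the implicit WHITESPACE/COMMENT rules are not run inside atomic rules, so a # inside an atomic quoted-string rule is just an ordinary character; it is only in non-atomic rules that pest tries to match COMMENT between tokens. A rough sketch (the rule names are hypothetical):

WHITESPACE = _{ " " }
COMMENT    = _{ "#" ~ (!NEWLINE ~ ANY)* ~ NEWLINE? }
string     = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" } // atomic: a "#" inside the quotes is literal text
ident      = @{ ASCII_ALPHA+ }
setting    = { ident ~ "=" ~ string }          // non-atomic: COMMENT can run between these tokens

With this, name = "a # b" parses with the # kept inside string rather than being treated as the start of a comment.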
Mathias
@mathiversen
My html-parser is now sort of working. Turns out the internet is full of non-compliant code (who could have guessed). But I've managed to make this https://gist.github.com/mathiversen/69d744da9d80bf774ed549511e12d884 work on most websites. Happy to receive any comments! :)
not sure I've fully grasped when it's preferred to use atomics, for example
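
A rough rule of thumb on atomics (a general sketch, not advice specific to the HTML grammar): use @ for token-like rules where whitespace must not be skipped and only the matched text matters, $ (compound-atomic) when whitespace still must not be skipped but you want the inner pairs, and plain rules when the implicit WHITESPACE/COMMENT skipping between elements is exactly what you want. Hypothetical illustration:

WHITESPACE = _{ " " | "\t" | NEWLINE }
ident      = @{ ASCII_ALPHA+ }                      // atomic: no implicit whitespace, no inner pairs
attr       = ${ ident ~ "=" ~ "\"" ~ ident ~ "\"" } // compound-atomic: no implicit whitespace, inner pairs kept
element    = { "<" ~ ident ~ attr* ~ ">" }          // normal: whitespace is skipped between tokens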
Jeffrey Goldberg
@jpgoldberg

I understand that I can (and should) use SOI and EOI around my start rule, but I would also like to parse strings using non-start rules while making sure that the parse matches the entire string.

Currently, I have a nasty approach in which I'm going through the pairs and finding the lowest start position and the highest end position. But I feel that there must be some nicer, and more general way to do this.
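
One possibly simpler check (an assumption about how to use the API, not something confirmed in the thread): parsing always starts at position 0, so it should be enough to compare the end of the top-level pair's span against the input length.

// Sketch: did the non-start rule consume the whole string? (MyParser / some_inner_rule are placeholders)
fn parses_completely(input: &str) -> bool {
    match MyParser::parse(Rule::some_inner_rule, input) {
        Ok(mut pairs) => {
            let pair = pairs.next().unwrap();
            pair.as_span().end() == input.len() // no trailing, unconsumed input
        }
        Err(_) => false,
    }
}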

Jeffrey Goldberg
@jpgoldberg
@Phrohdoh I haven't looked in detail at what you are asking about, but I am skeptical of using what is a kind of context-free grammar for phonology. Syllable structure may well have very tree-like properties, but phonology is inherently context sensitive. This will force you to make unnatural use of look-aheads.
Jeffrey Goldberg
@jpgoldberg

Is there an idiomatic way to have variants of the same grammar? Suppose (asking for a friend) one were trying to develop a parser that followed the specification for email addresses from RFC5322.

I (I mean my friend) would like to have a way to

  1. Optionally disallow expansion of any of the obsolete non-terminals
  2. Optionally disallow comments and folding white space (CFWS)

On a related note, is there a way to expand to null? I found that I get an error with a rule like

cfws = { }
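
On the expand-to-null question: an empty rule body is rejected, but two common workarounds (sketched below with placeholder sub-rules, not the full RFC5322 definitions) are to make the rule a repetition that can match zero items, or to keep it non-empty and mark it optional wherever it is used.

cfws      = _{ (fws | comment)* }                 // a * repetition may match nothing at all
// ...or keep cfws non-empty and write it as optional at each use site:
addr_spec = { cfws? ~ local_part ~ "@" ~ domain ~ cfws? }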

Gonçalo Soares
@GoncaloFDS
Hey, first of all, I want to say this project is awesome! And now I need some help: I need my grammar to be able to parse regex strings like "^[^\.]*(\.{1}png)$". I believe the way to do this is using the stack, but that is still a WIP, so can you help me with this?
Luiz Berti
@luizberti_gitlab

Hi there, I am trying to ignore some characters when matching an atomic rule. I want to match hex numbers such as 0xFF_aa_b and so on, and my current rule works fine:

hexnum = @{
    "0x"
    ~ ASCII_HEX_DIGIT
    ~ (ASCII_HEX_DIGIT | ("_" ~ &ASCII_HEX_DIGIT))*
}

Except it always matches hexnum: "0xFF_aa_b", and I would like it to match as hexnum: "FFaab" to alleviate the cleanup toil afterwards.

This answer on Reddit has a broken link, but it seems to be roughly what I want. I already tried rearranging several times, but no success.

Seems like DROP might be useful in this case too, but there's not much documentation on it, and what I tried so far hasn't worked.

Can anyone help? Thanks in advance

Andy Philpotts
@sirxyzzy
Hi there! I just wanted to give a shout-out to Pest. I have been working on a project to parse MIB files, which are written in ASN.1, and without Pest I don't think I would have been able to make this happen. In case anyone wants to critique my code or the grammar I created, the project is on GitHub.
Jeffrey Goldberg
@jpgoldberg

@luizberti_gitlab, I am very new to pest, so I might be completely wrong, but I believe that you shouldn't try to achieve what you are after in the grammar itself. Instead you do that in how you process the resulting pairs.

I'm facing something similar, in that I have a pest grammar for RFC5322 email addresses in which

(first comment) jsmith@ (second comment)\nexample.com (third comment)

is a valid address. But I want to prune what the specs call "comments and folding whitespace (CFWS)", and I assume that I do that not in the pest specification itself, but in how I handle the resulting non-terminal nodes.

That is, I plan to walk through the pairs created by a parse and drop any pair corresponding to the rule for cfws. I just haven't gotten around to trying that yet.

So I think you will need to create a rule specifically for 0x and another for _ so that you can walk through your pairs and filter those. This will mean that you need to change your hexnum rule to be a compound atomic.

Again, I am just figuring this stuff out for myself now. It is possible that my thinking about all of this is entirely wrong.
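
Following up on the compound-atomic suggestion for @luizberti_gitlab's hexnum: a sketch of what it might look like (hexgroup and NumParser are made-up names), with the Rust side stitching the digit groups back together:

// Grammar sketch:
//   hexnum   = ${ "0x" ~ hexgroup ~ ("_" ~ hexgroup)* }
//   hexgroup = @{ ASCII_HEX_DIGIT+ }

let pair = NumParser::parse(Rule::hexnum, "0xFF_aa_b")
    .expect("parse failed")
    .next()
    .unwrap();
// "0x" and "_" are plain literals and produce no pairs, so only the hexgroup pairs remain.
let digits: String = pair.into_inner().map(|p| p.as_str()).collect();
assert_eq!(digits, "FFaab");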

Andy Philpotts
@sirxyzzy

I am just starting to use pest_consume and finding it super helpful in many cases, but I am wondering if there is a more concise way to handle optionals. I have quite a few rules with multiple optional clauses; this is a simpler one:

module_body = { export_list? ~ import_list? ~ assignment_list }

So when dealing with this using pest_consume, it seems I have to cover each possible permutation, something vaguely like:

    fn module_body(node: Node) -> Result<...> {
        Ok(match_nodes!(node.into_children();
            [assignment_list(a)] => ...,
            [export_list(e), assignment_list(a)] => ...,
            [import_list(i), assignment_list(a)] => ...,
            [export_list(e), import_list(i), assignment_list(a)] => ...,
        ))
    }

I'm not looking forward to handling cases like the one below, which, if I'm right, will need 32 match cases. I'll probably just treat it as a list of nodes to avoid that.

snmp_object_type_macro_type = { "OBJECT-TYPE"
                          ~ snmp_syntax_part
                          ~ snmp_units_part?
                          ~ snmp_access_part
                          ~ snmp_status_part
                          ~ snmp_descr_part?
                          ~ snmp_refer_part?
                          ~ snmp_index_part?
                          ~ snmp_def_val_part? }
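
One way to avoid enumerating every permutation (an assumption about structuring the handler, shown with plain pest pairs and made-up Ast types rather than match_nodes!): walk the children once and dispatch on each child's rule, since the grammar already fixes their order.

fn module_body(pair: pest::iterators::Pair<Rule>) -> ModuleBody {
    let mut body = ModuleBody::default(); // hypothetical struct with Option fields for the optional parts
    for child in pair.into_inner() {
        match child.as_rule() {
            Rule::export_list => body.exports = Some(parse_export_list(child)),
            Rule::import_list => body.imports = Some(parse_import_list(child)),
            Rule::assignment_list => body.assignments = parse_assignment_list(child),
            _ => unreachable!(),
        }
    }
    body
}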
Kmeakin
@Kmeakin
is it possible to make pest continue parsing after it encounters an error?
e.g. when parsing something like (] {) I'd like it to report both the missing ) and the missing (
Daniel Hines
@d4hines

Hi! I'm trying out Pest today. I modified the CSV grammar example from the book to include my grammar. However, when I get a parse error, it looks really ugly:

thread 'main' panicked at 'unsuccessful parse: Error { variant: ParsingError { positives: [var_id], negatives: [] }, location: Pos(12), line_col: Pos((2, 12)), path: None, line: "    occurs(", continued_line: None }', src\main.rs:15:16
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\csv-tool.exe` (exit code: 101)

How can I get it to look pretty like on the pest.rs main page?

Aha, I need to use unwrap_or_else(|e| panic!("{}", e))

Now it looks much better:

thread 'main' panicked at ' --> 2:12
  |
2 |     occurs(
  |            ^---
  |
  = expected var_id', src\main.rs:16:25

But what if my users don't know what a var_id is? Is there a pattern for including more useful info?
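
One option (assuming pest 2.x; the friendly wording and the MyParser / Rule::program names are made up): pest's Error type has a renamed_rules method that maps each rule to a user-facing description before the error is displayed.

let parsed = MyParser::parse(Rule::program, input).map_err(|e| {
    e.renamed_rules(|rule| match rule {
        Rule::var_id => "a variable name".to_owned(),
        other => format!("{:?}", other),
    })
});
match parsed {
    Ok(_pairs) => { /* continue with the parse tree */ }
    Err(e) => eprintln!("{}", e), // same pretty arrow output, friendlier wording
}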

Michael Jones
@michaeljones
Hi, I'm new to the project. None of the links to the online fiddle that I've tried seem to work? Currently using Firefox
Michael Jones
@michaeljones

I would also welcome advice about whether or not I should attempt to parse a language like Elm using Pest. Here is a complete example of the syntax: https://github.com/pdamoc/elm-syntax-sscce/blob/master/src/Main.elm

It is whitespace dependent, but I'm not sure it is as regular (not in the technical sense) as, say, simple markdown lists. From what I can tell from reading different issues, it might be possible to make it work with Pest, but perhaps I would be better off with nom or combine or something? I would certainly welcome your input.

Andy Philpotts
@sirxyzzy
@michaeljones I think Pest is a better fit than nom (I haven't tried combine). The main reason is whitespace and comments: the built-in support in Pest is so much more useful than trying to do the same thing manually in nom. I feel nom is ideal for decoding protocols, which have very constrained syntaxes, and can create a very fast and lean parser, but Pest is better suited to parsing a human-readable language.
@michaeljones Oh, and the fiddle ( https://pest.rs/#editor ) is working for me, in Chrome, Edge and Firefox!
segeljakt
@segeljakt
there is also lalrpop, though LALR and PEG are pretty different
it's hard to compare pest with lalrpop: the pest grammar is probably briefer, but it also requires more "glue code" to make it work with Rust, though this might have been solved since I last used it
segeljakt
@segeljakt
if you are implementing programming language parsers, pest outputs token trees while lalrpop outputs the AST directly. In this sense, pest's output is more flexible but also lower-level, which might lead to more work