Johannes Maas
@Y0hy0h
And how do I continue searching if end fails? attempt will go back to before end started consuming, right? So how can I keep looking until end is successful or I reach the end of file?
Johannes Maas
@Y0hy0h
Oh, I'm realizing in general this is a bad idea, because a legitimate end could begin in the middle of what we read before end failed. E.g., if end matches aab and we parse aaab, then it would consume aa, fail on the next a, then continue at that a, next encounter the b, and still fail, whereas there is an aab in the input...
I think in my case this might not actually happen, but I'll try it with attempt or something.
Out of curiosity, is there a way to continue searching even after a parser has failed?
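To make the aab/aaab case concrete, here is a minimal sketch in plain Rust with no combine involved. Both function names are made up for illustration. The first resumes at the token where a partial match failed; the second rewinds after every failed try and starts again one token later, which is roughly the behaviour you would get by rewinding to where end started and then skipping a single token.

```rust
/// Searches for `end`, but after a partial match fails it resumes at the
/// token where the failure happened (no backtracking).
fn find_resuming_at_failure(input: &str, end: &str) -> Option<usize> {
    let input: Vec<char> = input.chars().collect();
    let end: Vec<char> = end.chars().collect();
    let mut pos = 0;
    while pos < input.len() {
        // Count how many leading tokens of `end` match at `pos`.
        let matched = input[pos..]
            .iter()
            .zip(end.iter())
            .take_while(|(a, b)| a == b)
            .count();
        if matched == end.len() {
            return Some(pos);
        }
        // Continue at the failing token; always advance by at least one.
        pos += matched.max(1);
    }
    None
}

/// Searches for `end`, rewinding after every failed try and retrying
/// one token later.
fn find_with_backtracking(input: &str, end: &str) -> Option<usize> {
    let input: Vec<char> = input.chars().collect();
    let end: Vec<char> = end.chars().collect();
    (0..input.len()).find(|&start| input[start..].starts_with(&end))
}
```

With input aaab and end aab, the first function misses the match while the second finds it at index 1. The price of the second is that every position may re-read up to the full length of end.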
Johannes Maas
@Y0hy0h
I'm sorry, it looks like what I want is actually the last example given in the skip_until documentation. :see_no_evil:
Thanks a lot for your swift and concise response! :)
Markus Westerlind
@Marwes

Oh, I'm realizing in general this is a bad idea, because a legitimate end could begin in the middle of what we read before end failed. E.g., if end matches aab and we parse aaab, then it would consume aa, fail on the next a, then continue at that a, next encounter the b, and still fail, whereas there is an aab in the input...

Yeah, I was going to mention this problem, though from the angle of skip_until not doing anything clever around that, so it can be quite slow if end can match a long string

It was late though so I didn't have the time
JackFly26
@JackFly26
why does combine::parser::char::digit return a Digit instead of an opaque type?
oh no the last message was july 11
dhgelling
@dhgelling
Hey, I have a simple parser defined, but am wondering what the simplest way to read from a file is. I'm only using parsers from combine::parser::repeat and combine::parser::char
Lloyd
@lloydmeta

I've been using Combine for AoC for a while, and something that still stumps me is how to parse something like a Vec<Vec<A>> where the individual A in the text to be parsed are separated by newlines, and each Vec<A> is separated by two newlines.

e.g.

abc

a
b
c

ab
ac

a
a
a
a

b

Should be parsed into

vec![
  vec!["abc"],
  vec!["a", "b", "c"],
  vec!["ab", "ac"],
  vec!["a", "a", "a", "a"],
  vec!["b"],
]

For AoC, since it doesn't really matter, I end up splitting the input by \n\n first and parsing, silently throwing out invalid groups, but this feels ugly and is probably not how Combine is meant to be used, so wondering if someone can help direct me to The Right Way :tm: :)

https://github.com/lloydmeta/aoc2020-rs/blob/227f84d1a412715b6dd67bd84bcb5024bf6d83e1/src/day_06.rs#L75-L89
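
For reference, the split-first workaround can be sketched in plain Rust without combine. parse_groups is a hypothetical name, and the rule that an answer is a non-empty run of letters is an assumption taken from the example input. Unlike the workaround as described, this version reports an invalid group as an error instead of silently dropping it.

```rust
/// Splits `input` into blank-line-separated groups of newline-separated answers.
/// Assumption: a valid answer is a non-empty run of alphabetic characters.
fn parse_groups(input: &str) -> Result<Vec<Vec<String>>, String> {
    input
        .trim_end() // tolerate a trailing newline at the end of the file
        .split("\n\n")
        .enumerate()
        .map(|(index, group)| {
            group
                .lines()
                .map(|line| {
                    if !line.is_empty() && line.chars().all(|c| c.is_alphabetic()) {
                        Ok(line.to_string())
                    } else {
                        Err(format!("invalid answer {:?} in group {}", line, index + 1))
                    }
                })
                .collect::<Result<Vec<String>, String>>()
        })
        .collect()
}
```

This keeps the two levels of separation visible (split on a blank line, then on single newlines), which is the same structure a sep_by1 inside a sep_by1 expresses in combine.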

Markus Westerlind
@Marwes
@lloydmeta Since this involves two tokens '\n\n' you will want to first look at https://docs.rs/combine/4.4.0/combine/parser/combinator/fn.attempt.html then since you basically want to stop the inner parser you would look at https://docs.rs/combine/4.4.0/combine/fn.not_followed_by.html (or a plain satisfy() would work here as well)
let person_answers_parser = many::<String, _, _>(letter()).map(PersonAnswers);
let group_people_answers_parser =
    sep_by1(person_answers_parser, (newline(), not_followed_by(newline()))).map(GroupAnswers);
let parser = sep_by1(group_people_answers_parser, (newline(), newline()));
Lloyd
@lloydmeta
aah ok thanks @Marwes will give that a go
dhgelling
@dhgelling
I want to use combine for an advent of code solution, but need mutually recursive parsers. Is there a way to define parsers using functions without everything being cluttered up by where clauses?
I just want to parse from a string anyway
dhgelling
@dhgelling
oh also, I tend to use sep_by1 to parse input separated by newlines, but often there is a final newline at the end of the document. What is the best way to handle that?
ah never mind, I thought sep_end_by would force the separator to be at the end, but turns out it doesn't
Markus Westerlind
@Marwes

I want to use combine for an advent of code solution, but need mutually recursive parsers. Is there a way to define parsers using functions without everything being cluttered up by where clauses?

I suppose you could do it like combine-language and declare a struct parameterized by the input and then declare the required where bound on the impl https://github.com/Marwes/combine-language/blob/873c7f1aa977731a87e29fd8ced8ce48b589dcb1/src/lib.rs#L336-L340

The where clause has to go somewhere though
dhgelling
@dhgelling

Thanks =) I'm trying to use the not_followed_by construct you suggested above, but it's not accepting my input. The parser looks like this:

let rule = some_parser_not_accepting_newline;
let rules = sep_by1(rule, (newline(), not_followed_by(newline())));
let file = (rules.skip((newline(), newline())), string("somestring"));

but it fails on the empty line with the message

Error while parsing input: Parse error at line: 7, column: 1
Unexpected `
`

Some debug prints show that it's failing in the separator of the sep_by1, but I don't know how to fix it

Markus Westerlind
@Marwes
Might need an attempt in there perhaps

let rules = sep_by1(rule, attempt((newline(), not_followed_by(newline()))));
Since the separator ends up committing the first newline
dhgelling
@dhgelling
yeah it works wrapping the separator in attempt(), but I'm not clear on why that's needed. If my separator is string("next") and the next input is new instead, would it consume the first two characters anyway?
hmm yes it seems so, guess in my mind the separator was automatically wrapped in attempt() anyway, since it might fail at the end of the sequence
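A toy model in plain Rust of why the separator needs attempt. This is not combine's implementation: every name below is hypothetical, and the rule in after_separator (a list can only end cleanly when the separator failed without consuming input) is an assumption based on the behaviour observed in this thread.

```rust
/// A failed toy parse, recording how many characters matched before the failure.
#[derive(Debug, PartialEq)]
struct Failure {
    consumed: usize,
}

/// What a toy list parser does after trying the separator.
#[derive(Debug, PartialEq)]
enum AfterSeparator {
    ParseNextItem,
    EndOfList,
    HardError,
}

/// Toy stand-in for a literal parser: matches `expected` at the start of `input`.
fn match_literal(input: &str, expected: &str) -> Result<usize, Failure> {
    let matched = input
        .chars()
        .zip(expected.chars())
        .take_while(|(a, b)| a == b)
        .count();
    if matched == expected.chars().count() {
        Ok(expected.len())
    } else {
        Err(Failure { consumed: matched })
    }
}

/// Toy stand-in for attempt: a failure is reported as if nothing was consumed.
fn attempt(result: Result<usize, Failure>) -> Result<usize, Failure> {
    result.map_err(|_| Failure { consumed: 0 })
}

/// Assumed rule: the list may only end cleanly if the separator failed
/// without consuming anything.
fn after_separator(result: Result<usize, Failure>) -> AfterSeparator {
    match result {
        Ok(_) => AfterSeparator::ParseNextItem,
        Err(Failure { consumed: 0 }) => AfterSeparator::EndOfList,
        Err(_) => AfterSeparator::HardError,
    }
}
```

In this model, matching next against new fails after consuming ne, so the list parser sees a hard error; wrapping the same call in attempt turns it into a clean end of list.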
eaglgenes101
@eaglgenes101
Is it okay to reuse combine parsers?
And why do they take &mut self for their methods anyways?
Markus Westerlind
@Marwes

Is it okay to reuse combine parsers?

Yes

And why do they take &mut self for their methods anyways?

It allows them to take FnMut functions so that they can mutate things (say push to a Vec for the many parser)
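
A minimal sketch of that point in plain Rust. ToyParser and Satisfy are hypothetical stand-ins, not combine's types. The predicate pushes to a Vec each time it runs, which only compiles because parse takes &mut self, and the same parser value is run twice.

```rust
/// Hypothetical miniature parser trait; note the `&mut self` receiver.
trait ToyParser {
    fn parse<'a>(&mut self, input: &'a str) -> Option<(char, &'a str)>;
}

/// Accepts one character if `predicate` returns true for it.
struct Satisfy<F> {
    predicate: F,
}

impl<F: FnMut(char) -> bool> ToyParser for Satisfy<F> {
    fn parse<'a>(&mut self, input: &'a str) -> Option<(char, &'a str)> {
        let mut chars = input.chars();
        let c = chars.next()?;
        // Calling an FnMut needs mutable access, hence `&mut self`.
        if (self.predicate)(c) {
            Some((c, chars.as_str()))
        } else {
            None
        }
    }
}

/// Runs one parser value twice and returns every character the predicate saw.
fn demo() -> Vec<char> {
    let mut seen = Vec::new();
    let mut digit = Satisfy {
        predicate: |c: char| {
            seen.push(c); // mutation from inside the predicate
            c.is_ascii_digit()
        },
    };
    let first = digit.parse("1a");
    let second = digit.parse("a1"); // the same parser, reused
    assert_eq!(first, Some(('1', "a")));
    assert_eq!(second, None);
    seen
}
```

With a &self receiver the bound would have to be Fn, and the seen.push(c) call inside the closure would be rejected.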

eaglgenes101
@eaglgenes101
Okay good experience, but trying to make spans work was a pain
I'm just interested in the location of the token where a parser began parsing, and of the token where it either committed a success or ran into an error
Seems severely burdensome to have to replace half the types in my function signatures with spanned analogs just to get those
eaglgenes101
@eaglgenes101
(Also, I tried to implement the combinators myself. What's with all the parse_mode stuff in the library combinators, and do I need to concern myself with those?)
marwes
@marwes:matrix.org

Seems severely burdensome to have to replace half the types in my function signatures with spanned analogs just to get those

If you need the default errors to know about spans then I am afraid you need a custom stream that knows about spans (the stream would also tokenize, so I'd expect a custom one would be needed regardless). If you only need the span of a particular parser you can do something like (position(), my_parser, position()) to get the start and end of it
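
The idea behind the (position(), my_parser, position()) tuple can be sketched in plain Rust. This only illustrates the shape of the result and does not use combine: spanned and digits are hypothetical helpers, and offsets are byte offsets into the input string.

```rust
/// Hypothetical wrapper: runs `parser` on `input[start..]` and returns its
/// output together with the byte offsets where it started and stopped.
fn spanned<T>(
    input: &str,
    start: usize,
    parser: impl Fn(&str) -> Option<(T, usize)>,
) -> Option<(usize, T, usize)> {
    let (value, consumed) = parser(&input[start..])?;
    Some((start, value, start + consumed))
}

/// Toy parser: reads a run of ASCII digits and returns the number plus
/// the count of bytes it consumed.
fn digits(input: &str) -> Option<(u32, usize)> {
    let len = input.chars().take_while(|c| c.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    input[..len].parse::<u32>().ok().map(|n| (n, len))
}
```

Only the parser whose span is needed gets wrapped, so the rest of the function signatures stay unchanged.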

(Also, I tried to implement the combinators myself. What's with all the parse_mode stuff in the library combinators, and do I need to concern myself with those?)

The parse_mode stuff is quite a mess when coming in from the outside, but it is necessary to support parsing incomplete/partial input. If you need a custom parser I'd recommend copying a parser as a base from the library itself or using the parsers in https://docs.rs/combine/4.6.0/combine/parser/function/index.html as they let you invoke all the parse machinery manually while still only needing to write one function

eaglgenes101
@eaglgenes101
And what's the idea of add_committed_expected_error? The implementations I could find in the library mostly just forward it or implement the method much like add_error
eaglgenes101
@eaglgenes101
Right now I'm trying to understand the choice combinator to see if I can implement one that has different error-merging behavior, returning the error corresponding to the longest matching alternative instead of (from what I can tell) the error the last alternative returns
eaglgenes101
@eaglgenes101
...and nevermind, that's already what the choice combinator does for errors. Argh.
eaglgenes101
@eaglgenes101
Okay, I finally figured out the issue I was having-- many of the individual parsers set the expected set in the errors, and the combinators kind of let this happen. Ideally, the many combinator and friends would make it so that setting the expected set of the wrapped combinator merely causes it to be added to the final combinator rather than completely overwriting it.
I assume this is a bit of an oversight regarding how different little parts of combine interact when put together
eaglgenes101
@eaglgenes101
As a workaround, I created my own combinator that, when an error occurs, holds onto a tracked error's value and just merges the inner combinator's error contribution on top
eaglgenes101
@eaglgenes101
Should I file an issue about this?
marwes
@marwes:matrix.org
Sure, though it may be difficult to change without breaking other assumptions
Ed Page
@epage

Wondering if someone can help me with a change I'm making to toml_edit. I'm updating the parser to handle un-escaped quotes right before/after triple-quotes. I'm assuming I need to wrap stuff with attempt but didn't get anywhere when sprinkling it around

I've updated toml_edit's parser
https://github.com/ordian/toml_edit/pull/125/commits/37c3d8c82d38333dcbfdfe8dea8f18f653b4197c
(core is ml_basic_body and ml_literal_body, with ml_literal_body being the easier-to-read one)

to reflect the ABNF grammar
https://github.com/toml-lang/toml/blob/master/toml.abnf

To handle test cases like:

lit_one = ''''one quote''''
lit_two = '''''two quotes'''''
lit_one_space = ''' 'one quote' '''
lit_two_space = ''' ''two quotes'' '''

https://github.com/BurntSushi/toml-test/blob/master/tests/valid/string/multiline-quotes.toml

And get errors like

 thread 'parser::tests::values' panicked at 'Unexpected error for "'''I [dw]on't need \\d{2} apples'''": Parse error at line: 1, column: 35
Unexpected `end of input`
Expected `'''`
While parsing a Multiline Literal String

https://github.com/ordian/toml_edit/pull/125/checks?check_run_id=3414324671

Ed Page
@epage

Also, anyone have experience optimizing combine parsers? toml_edit takes twice as long to parse as toml-rs, and all of that is within the parsing; the deep stacks are making it harder to differentiate what is costing us.

This is part of the effort to switch cargo from toml-rs to toml_edit so we can mainline cargo-add.

marwes
@marwes:matrix.org
I'd look into parsing &[u8] instead of &str as the char decoding takes a fair bit of time
https://docs.rs/combine/4.6.1/combine/macro.dispatch.html may be useful in some really hot locations where there are a lot of alternative parsers