Ghost
@ghost~56d3703de610378809c41a40

Quick question regarding nom: I have a binary file that declares its version at the beginning of the file. I use a tuple parser to parse the various sections, so each subparser takes the bytes it needs. The thing is, one of my last parsers needs the version information from the parser that runs first.

How would I design around this? The bytes that reach my last parser no longer contain the version information, and since I map on the tuple results, I only get the version after all parsers have run.

I'm not sure if putting all my nom functions into a struct and just setting the version as a field of that struct is a clean nom-esque way to do it. It certainly works and I settled for something similar to this, but I'd like to know if there is a better way.

Here is the code I use for parsing:

map(
    tuple((
        context("Magic number", parse_magic_number), // 0xcafebabe
        // Peeking keeps the version bytes in the input, so `parse_cp` can
        // simply call `parse_version` again and get at the version information.
        context("Version", peek(parse_version)),
        context("Constant pool", parse_cp), // here I need the version information
    )),
    |(_, version, _)| ClassFile { version },
)(i)

The above only works because the version is needed in the next parser. But how would I approach it if I need the version 5 parsers down the line?

Denis Lisov
@tanriol
How about something like
let (input, version) = parse_version(input)?;
// ....
let (input, constant_pool) = parse_cp(input, version)?;
IMHO, one big feature of nom 5 is that one can write it in the imperative way easily and reuse all the normal ways of writing Rust code.
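A minimal sketch of the imperative style suggested above, written against the standard library only so that it stands alone. `ParseResult`, `be_u16`, the simplified class-file layout, and the version check inside `parse_cp` are assumptions made for illustration; they are hand-rolled stand-ins, not nom's API. With nom 5 functions the same shape should carry over, with `IResult` in place of `ParseResult`.

```rust
/// Hand-rolled stand-in for nom's `IResult`: remaining input plus parsed value.
type ParseResult<'a, T> = Result<(&'a [u8], T), String>;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Version {
    minor: u16,
    major: u16,
}

#[derive(Debug, PartialEq)]
struct ClassFile {
    version: Version,
    constant_pool_count: u16,
}

/// Read one big-endian u16 and return the rest of the input.
fn be_u16(input: &[u8]) -> ParseResult<'_, u16> {
    if input.len() < 2 {
        return Err("need 2 bytes".to_string());
    }
    Ok((&input[2..], u16::from_be_bytes([input[0], input[1]])))
}

/// Consume the 4-byte magic number 0xCAFEBABE.
fn parse_magic_number(input: &[u8]) -> ParseResult<'_, ()> {
    if input.starts_with(&[0xCA, 0xFE, 0xBA, 0xBE]) {
        Ok((&input[4..], ()))
    } else {
        Err("bad magic number".to_string())
    }
}

/// Minor version first, then major version.
fn parse_version(input: &[u8]) -> ParseResult<'_, Version> {
    let (input, minor) = be_u16(input)?;
    let (input, major) = be_u16(input)?;
    Ok((input, Version { minor, major }))
}

/// The version arrives as an ordinary argument, no matter how many
/// parsers ran between `parse_version` and this one.
fn parse_cp(input: &[u8], version: Version) -> ParseResult<'_, u16> {
    let (input, count) = be_u16(input)?;
    // Hypothetical version-dependent rule, purely for illustration.
    if version.major < 45 {
        return Err(format!("unsupported major version {}", version.major));
    }
    Ok((input, count))
}

fn parse_class_file(input: &[u8]) -> ParseResult<'_, ClassFile> {
    let (input, ()) = parse_magic_number(input)?;
    let (input, version) = parse_version(input)?;
    // ... any number of other parsers can run here ...
    let (input, constant_pool_count) = parse_cp(input, version)?;
    Ok((input, ClassFile { version, constant_pool_count }))
}
```

Because `version` is a plain local variable, it stays available to a parser five steps later just as easily as to the next one, and no `peek` is needed.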
Ghost
@ghost~56d3703de610378809c41a40
lol. ye I think my mindset just wants to have those big parser blocks :D
Andrew
@apullin
Is there an easy, off-the-shelf parser that will split up a string by spaces, except treat quoted strings as whole literals without splitting?
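For reference, a hand-written splitter for this is short. The sketch below uses only the standard library rather than nom, and `split_quoted` is a hypothetical helper name. It assumes double quotes only, no escape sequences inside quotes, and that an unterminated quote simply runs to the end of the input.

```rust
/// Split on whitespace, keeping double-quoted runs together.
/// The quote characters themselves are dropped from the output.
fn split_quoted(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    // Tracks whether a token has started, so that `""` yields an empty token.
    let mut has_token = false;

    for c in input.chars() {
        match c {
            '"' => {
                in_quotes = !in_quotes;
                has_token = true;
            }
            c if c.is_whitespace() && !in_quotes => {
                // Whitespace outside quotes ends the current token, if any.
                if has_token {
                    tokens.push(std::mem::take(&mut current));
                    has_token = false;
                }
            }
            c => {
                current.push(c);
                has_token = true;
            }
        }
    }
    if has_token {
        tokens.push(current);
    }
    tokens
}
```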
Sven Thiele
@sthiele
I have to parse a language in which comments can appear anywhere; they start with '%' and extend to the end of the line. Is there a good pattern for dealing with something like this?
Nick Overdijk
@NickNick
@sthiele I made my own parser that consumes input to the end of the line
YMMV. Notice the trim, etc. :)
Sven Thiele
@sthiele
@NickNick thx . I was wondering about how one can intersperse another parser with such a comment parser similar to ws
Nick Overdijk
@NickNick
@sthiele I make a function like 'nom_comment' that matches a comment or not? Then use that with alt or opt or something..
Nick Overdijk
@NickNick
If your nom_comment and nom_other_thing return different things, you can make nom_either by mapping the parser output to an enum containing both.
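One way to picture the "intersperse comments like ws" idea is a wrapper that skips whitespace and comments before running the wrapped parser. The sketch below is std-only and does not use nom; `skip_ws_and_comments`, `lexeme`, and `identifier` are hypothetical names, and parsers are modelled as functions returning `Option<(rest, value)>` rather than nom's `IResult`.

```rust
/// Skip any run of whitespace and `%`-to-end-of-line comments.
fn skip_ws_and_comments(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        if let Some(after_percent) = trimmed.strip_prefix('%') {
            // Drop the comment text up to (not including) the newline;
            // the next loop iteration trims the newline itself.
            input = match after_percent.find('\n') {
                Some(idx) => &after_percent[idx..],
                None => "",
            };
        } else {
            return trimmed;
        }
    }
}

/// Wrap a token parser so that whitespace and comments in front of the
/// token are ignored, in the spirit of a `ws`-style combinator.
fn lexeme<'a, T>(
    parser: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> impl Fn(&'a str) -> Option<(&'a str, T)> {
    move |input| parser(skip_ws_and_comments(input))
}

/// Hypothetical token parser: one non-empty run of ASCII alphanumerics.
fn identifier(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```

Wrapping every token-level parser in `lexeme` means the grammar rules themselves never have to mention comments.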
Sven Thiele
@sthiele
@NickNick thank you! I got it working.
Jason Liquorish
@Bassetts

Hi, I am trying to parse a format that has 48 bytes representing a name that is null terminated. I have done take_str!(48), which works great but includes all the null characters. How can I parse this into a &str and drop all the null characters?

For example I am currently getting

B. SMITH\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}

using just take_str

Denis Lisov
@tanriol
Not sure whether there's any special support for that, especially in the macros.
Jason Liquorish
@Bassetts
I thought terminated with take_until and take_while was what I wanted but that crashed, haven't had a chance to see why it crashed
Jason Liquorish
@Bassetts
Ah, I got it, was so simple I was over thinking it
>> name: take_until!("\0")
>> take!(48 - name.len())
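The same idea, take the bytes before the first NUL and then skip the rest of the fixed-width field, can be sketched with the standard library alone. `parse_name` is a hypothetical helper, not a nom function; it limits the NUL search to the field itself, so it also covers the case (an assumption about the format) where a name fills all 48 bytes and has no terminator.

```rust
/// Read a fixed-width, NUL-padded name field.
/// Returns the remaining input and the name without its padding.
fn parse_name(input: &[u8], width: usize) -> Result<(&[u8], &str), String> {
    if input.len() < width {
        return Err(format!("need {} bytes, got {}", width, input.len()));
    }
    let (field, rest) = input.split_at(width);
    // Keep everything before the first NUL; if there is none,
    // the name occupies the whole field.
    let end = field.iter().position(|&b| b == 0).unwrap_or(width);
    let name = std::str::from_utf8(&field[..end]).map_err(|e| e.to_string())?;
    Ok((rest, name))
}
```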
John Barker
@j16r
trying to upgrade this project to nom 0.4, this test is failing: https://github.com/j16r/rust-mailbox/blob/master/src/util/address.rs#L260 and it's not obvious to me why, is there a way to debug the parsers that nom is taking?
John Barker
@j16r
I think it has something to do with the change in eof handling
Denis Lisov
@tanriol
Yes, likely to be related to complete/incomplete handling. By the way, why not upgrade straight to nom 5?
John Barker
@j16r
I figure it might be easier to go through 0.4 first
Denis Lisov
@tanriol
(it's 4.0, not 0.4)
John Barker
@j16r
ahh, yes, 4 first, then 5, still .. figuring out the eof changes, some were easy
Denis Lisov
@tanriol
Note that the complete/incomplete handling changed significantly between nom 4 and nom 5.
John Barker
@j16r
hmm, wonder if it's more intuitive
Denis Lisov
@tanriol
In nom 5 (with functions, not with macros) you use the correct base parsers (ones from the complete or streaming module depending on which handling you need)
The macros are always streaming in nom 5, IIRC.
Sorry, have to go now :-(
John Barker
@j16r
nw, not gotten much further, but I'll try 5 and see if it helps
John Barker
@j16r
ahh, the dbg! in nom doesn't print if the parser succeeds :/
John Barker
@j16r
how might one choose to use the std library dbg! in place of the nom dbg!?
Jason Liquorish
@Bassetts
Is there a combinator that will let me take the result of a parser and the remaining input and pass it to another parser? Something that would work on
let (input, event_string_code) = map_res(take(4usize), std::str::from_utf8)(input)?;
let (input, event_details) = EventDataDetails::parse(input, event_string_code)?;

Ok((
  input,
  PacketEventData {
    event_string_code,
    event_details,
  },
))
EventDataDetails::parse needs both the remaining input and event_string_code but I can only find combinators that either pass the remaining input or the result of the first parser
Jason Liquorish
@Bassetts
No idea if it's the correct thing to do but I changed EventDataDetails::parse to return Fn(&[u8]) and have done
flat_map(
  map_res(take(4usize), std::str::from_utf8),
  |event_string_code| {
    map(
      EventDataDetails::parse(event_string_code),
      |event_details| PacketEventData {
        event_string_code,
        event_details,
      },
    )
  },
)(input)?;
Russ Cam
@forloop_twitter

is there a way to specify take_till! until one of a set of alternatives? For example, something like this:

take_till!(
                alt!(
                    tag!("note: Run with `RUST_BACKTRACE=1` for a backtrace.") |
                    tag!("note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace") |
                    tag!("\n\n") |
                    tag!("\r\n\r\n")
                )
            )

(I'm looking at parsing cargo test output to junit, and am looking at updating https://crates.io/crates/cargo-results to handle additional unhandled cases. Perhaps there's a better way of converting output to junit?)

Denis Lisov
@tanriol
I'd probably just do that manually. Try searching for every string and take the minimal index.
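A minimal sketch of that manual approach, using only the standard library. `take_until_any` is a hypothetical helper name, not part of nom: it searches for every needle and splits at the smallest index found.

```rust
/// Split `input` at the earliest occurrence of any needle.
/// Returns (text before the match, text starting at the match),
/// or `None` if none of the needles occurs.
fn take_until_any<'a>(input: &'a str, needles: &[&str]) -> Option<(&'a str, &'a str)> {
    needles
        .iter()
        .filter_map(|needle| input.find(*needle))
        .min()
        .map(|idx| input.split_at(idx))
}
```

The indices returned by `str::find` are byte offsets on character boundaries, so `split_at` is safe to call with them.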
Ghost
@ghost~56d3703de610378809c41a40

Can someone help me out a little bit? I am trying to create a parser that can read multi-line strings.

somekey = some value with spaces but \
     all white-space after the backslash is ignored until the first non-whitespace character

The parser should return some value with spaces but all white-space after the backslash is ignored until the first non-whitespace character. The key-value thing is not the problem and it works fine with one-line values but I struggle to somehow make the backslash parser.

The current parser I use is this one:

fn ws<I, O, E: ParseError<I>, F>(parser: F) -> impl FnMut(I) -> IResult<I, O, E>
where
    I: InputTakeAtPosition,
    <I as InputTakeAtPosition>::Item: AsChar + Clone,
    F: FnMut(I) -> IResult<I, O, E>,
{
    delimited(multispace0, parser, multispace0)
}

fn parse_entry(i: &str) -> IResult<&str, (&str, &str)> {
    separated_pair(
        preceded(multispace0, alphanumeric0),
        ws(tag("=")),
        preceded(multispace0, not_line_ending),
    )(i)
}

Can anyone help me out with this? The output would have to have the single backslash removed, so I guess zero-copy (&str) isn't generally possible as a return type with this kind of problem, is it?

Denis Lisov
@tanriol
Correct, zero-copy won't work in the general case.
I'd probably go for separated_list alternating a slice of non-newline non-backslash and backslash followed by whitespace.
And then concatenate them (possibly into Cow)
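To illustrate the concatenation step and why `Cow` fits, here is a std-only sketch that does not use nom. `join_continuations` is a hypothetical helper; it assumes the full logical value (continuation lines included) has already been sliced out, and that every backslash in it marks a continuation.

```rust
use std::borrow::Cow;

/// Join a value that may contain backslash continuations.
/// Borrows when nothing needs rewriting, allocates otherwise.
fn join_continuations(value: &str) -> Cow<'_, str> {
    if !value.contains('\\') {
        // One-line value: hand back the borrowed slice (zero-copy).
        return Cow::Borrowed(value);
    }
    let mut out = String::with_capacity(value.len());
    let mut rest = value;
    while let Some(idx) = rest.find('\\') {
        out.push_str(&rest[..idx]);
        // Drop the backslash and all whitespace after it,
        // including the newline and the next line's indentation.
        rest = rest[idx + 1..].trim_start();
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

One-line values stay zero-copy, and only multi-line values pay for an allocation.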
Ghost
@ghost~56d3703de610378809c41a40
Thanks @tanriol! After reading the docs for separated_list: would that work when both kinds of values can appear in any order, i.e. values can be either one-line or multi-line? It seems that the separated_list parser alternates, meaning that multiple multi-line values wouldn't work in this case, would they?
key1 = line value
key2 = multi-line \
    line \
   line
key3 = line value again
key4 = again, line value
Denis Lisov
@tanriol
Why does it matter if you use separated_list(...) in place of not_line_ending in parse_entry?
Ghost
@ghost~56d3703de610378809c41a40
ohhh I would have used it somewhere else, that is totally true!
thanks
John Barker
@j16r
is there something like this https://github.com/Geal/nom/blob/master/examples/string.rs more generally available so I don't have to copy and paste a bunch of code?
Tom Alexander
@tomalexander
Hey I've updated my long-dormant PR and I was hoping I could get some eyes on it. No rush, but considering how long it was dormant I wouldn't be surprised if my updates are completely unnoticed: Geal/nom#469
zserik
@zserik
Is there any documentation available on how streaming parsers differentiate between Incomplete (try to fetch more data) and EOF (end-of-file condition: re-entry after Incomplete, but no more data will ever become available for this particular byte stream, which is important in case the EOF would finish some element which would otherwise continue)? And is there documentation available on how to integrate streaming parsers with AsyncRead byte streams?
Denis Lisov
@tanriol
@zserik AFAIK, they do not differentiate, you build the async integration yourself.
zserik
@zserik
OK, but how are cases handled where both partial input (Incomplete) and full input (EOF at reading) are possible alternatives and might yield different results? How should that information be propagated through the parser?