Michael Dale Long
@nikarul
Hello, I'm trying to write a parser that needs to parse Python-style variable names ([A-Za-z0-9_] characters, except the first character cannot be a number). I thought I would start by writing a custom version of nom::character::complete::alphanumeric1 that also accepts underscores, but when I tried copying it to my code and modifying it, I found that nom::traits is private. Any suggestions on the idiomatic way to do this with Nom?
Denis Lisov
@tanriol
IIRC, the traits are reexported at top level, aren't they?
Michael Dale Long
@nikarul
They are indeed. Thank you!
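For reference, the identifier rule from the question (letters, digits and underscores, with no leading digit) can be sketched without any library. This is a hand-rolled, std-only illustration and not nom's API; the name `identifier` and the `Option` return type are assumptions made for the sketch. The `(rest, matched)` ordering copies the `(remaining, output)` order of nom's `IResult`.

```rust
/// Splits a leading Python-style identifier off `input`.
/// Returns `Some((rest, ident))`, or `None` if `input` does not start with one.
fn identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    // First character: ASCII letter or underscore, never a digit.
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return None,
    }
    // Following characters: ASCII letters, digits, or underscores.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some((&input[end..], &input[..end]))
}
```

The same two-step shape (one check for the first character, then a run of allowed characters) is what a combinator version would express.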
Plecra
@Plecra
Is there a variant of preceded that includes both parts in the result?
/ tuple that returns a single slice
oo, recognize looks like it'll help
Plecra
@Plecra
Is there an anybyte equivalent?
I want to simplify this
fn ident(s: &[u8]) -> IResult<&[u8], &[u8]> {
    recognize(preceded(
        // Read from the closure's own input `i` rather than the captured outer `s`.
        |i| if let Some(&l) = i.get(0) {
            if letter(l) {
                Ok((&i[1..], ()))
            } else {
                Err(Err::Error(ParseError::from_error_kind(i, ErrorKind::OneOf)))
            }
        } else {
            Err(Err::Incomplete(Needed::Size(1)))
        },
        take_while(|b| matches!(b, b'A'..=b'Z' | b'a'..=b'z' | b'_' | b'0'..=b'9'))
    ))(s)
}
Plecra
@Plecra
fn anybyte(s: &[u8]) -> IResult<&[u8], u8> {
    s.split_first()
        .map(|(&b, rest)| (rest, b))
        .ok_or(Err::Incomplete(Needed::Size(1)))
}
June Wunder
@junewunder
Hello! I'm having a problem with creating a custom ErrorKind and with parser completeness. The docs for alt say "with a custom error type, it is possible to have alt return the error of the parser that went the farthest in the input data", but there doesn't seem to be any documentation on how to do this.
When all of my alt cases fail, I get an Incomplete, not an error. How would I get my desired behavior?
Here is the exact code I care about:
let (input, defs) = complete(many1(annotated_terminal(alt((
        p_fn_named,
        p_prim,
        p_struct,
        p_enum,
    )))))(input)?;
I'm trying to parse many top-level definitions, and if the whole file is not parsed, then I want the error from why it wasn't able to parse
Thank you for any help!
Denis Lisov
@tanriol
@junewunder I think you need something like terminated(many1(complete(cut(alt((...))))), eof)
June Wunder
@junewunder
oh interesting!! okay lemme look into that real quick
and btw! I've been using nom to make a mixfix parser library, is there any interest in building tokenstream functionality into nom?
my idea is that we'd have nom::tokens just like nom::bytes with some combinators to do with taking tokens from a tokenstream
I also could just build it out into its own library which might be a better idea
Denis Lisov
@tanriol
May be interesting, but I'm not a nom developer :-)
June Wunder
@junewunder
ah rad! thank you!
Sven Thiele
@sthiele
Hi, I need to parse an expression which can be either an int literal or a float literal: alt((float_literal, int_literal)).
So far I've been using number::complete::double to parse the float literal, but this caused trouble because double also accepts integers: double("1") yields 1.0. Is there an easy way to fix this?
Restioson
@Restioson
can you swap the order so it's alt((int_literal, float_literal))?
does that make a difference?
Sven Thiele
@sthiele
Then I get errors for something like "1.1)": expected ')' found '.'
Denis Lisov
@tanriol
I'd say int_literal should go first and have some lookahead assertion like "not followed immediately with ., e or E"
Sven Thiele
@sthiele
Then I get problems with ranges "1..2"
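One way to reconcile the three cases above ("1", "1.1)" and "1..2") is to treat a literal as a float only when the '.' is directly followed by a digit. The sketch below is hand-rolled, std-only Rust and not nom code; `Number`, `digits` and `number` are names invented for the illustration, and exponents (e or E) are deliberately left out to keep it short.

```rust
#[derive(Debug, PartialEq)]
enum Number {
    Int(i64),
    Float(f64),
}

/// Length of the leading run of ASCII digits in `s`.
fn digits(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_digit()).count()
}

/// Parses an unsigned int or float literal, returning `(rest, value)`.
fn number(input: &str) -> Option<(&str, Number)> {
    let int_len = digits(input);
    if int_len == 0 {
        return None;
    }
    let after_int = &input[int_len..];
    // A fraction needs '.' followed by at least one digit,
    // so in "1..2" the leading "1" stays an integer.
    if let Some(frac) = after_int.strip_prefix('.') {
        let frac_len = digits(frac);
        if frac_len > 0 {
            let end = int_len + 1 + frac_len;
            let value = input[..end].parse().ok()?;
            return Some((&input[end..], Number::Float(value)));
        }
    }
    let value = input[..int_len].parse().ok()?;
    Some((after_int, Number::Int(value)))
}
```

With this rule `number("1..2")` stops after the `1` and leaves `..2` for a range parser, while `number("1.1)")` consumes `1.1` and leaves `)`.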
Restioson
@Restioson
Also, is there any way to return an ErrorKind for a custom parser thingie
fn parse_date(input: &str) -> IResult<&str, DateTime<Utc>> {
    DateTime::parse_from_str(input, "%F")
        .map_err(|_| nom::Err::Error((input, /* here */)))
}
want to adapt this chrono parser so I don't have to DIY it. the %F format may become more complex
Sven
@SirWindfield

I have some problems and I do not really understand what is wrong with my code:

use nom::IResult;
use crate::types::JavaVersion;
use nom::bits::complete::take;
use crate::constants::MAGIC_NUMBER;
use nom::error::context;
use crate::constants::sizes::{MINOR_SIZE, MAJOR_SIZE}; // both are assigned a value of 2 respectively
use nom::sequence::tuple;
use nom::combinator::map;

pub fn parse_version_info(i: &[u8]) -> IResult<&[u8], JavaVersion> {
    map(
        tuple((
            take(MINOR_SIZE),
            take(MAJOR_SIZE)
            )),
        |(minor, major)| {
            JavaVersion {
                major,
                minor,
            }
        }
    )(i)
}

#[cfg(test)]
mod tests {
    use crate::parser::{parse_class_file, parse_version_info};
    use byteorder::{WriteBytesExt, BigEndian};

    fn int2bytes(i: i32) -> Vec<u8> {
        let mut bs = [0u8; std::mem::size_of::<i32>()];
        bs.as_mut()
            .write_i32::<BigEndian>(i)
            .expect("Failed to convert to byte array");

        bs.to_vec() // arrays have no `into_vec`; `to_vec` copies into a Vec
    }

    #[test]
    fn test_version_info() {
        let i = 0x3200;
        // Keep the Vec alive in its own binding before borrowing it as a slice.
        let bytes = int2bytes(i);
        let i = bytes.as_slice();

        let version = parse_version_info(i);
    }
}

I get the following error:

error[E0308]: mismatched types
  --> java-class-file\src\parser.rs:22:7
   |
22 |     )(i)
   |       ^ expected tuple, found `&[u8]`
   |
   = note:  expected tuple `(_, usize)`
           found reference `&[u8]`

error[E0308]: mismatched types
  --> java-class-file\src\parser.rs:11:5
   |
11 | /     map(
12 | |         tuple((
13 | |             take(MINOR_SIZE),
14 | |             take(MAJOR_SIZE)
...  |
21 | |         }
22 | |     )(i)
   | |________^ expected `&[u8]`, found tuple
   |
   = note: expected enum `std::result::Result<(&[u8], types::JavaVersion), nom::internal::Err<(&[u8], nom::error::ErrorKind)>>`
              found enum `std::result::Result<((_, usize), types::JavaVersion), nom::internal::Err<_>>`

error: aborting due to 2 previous errors
I have no clue what I am doing wrong. Why exactly is it expected that the return type is a tuple here?
I want to return an IResult holding the rest of the u8 slice plus a JavaVersion that I parsed earlier
Denis Lisov
@tanriol
@SirWindfield Bit-level parsers require as input (and return as remaining input) (&[u8], usize) (usually), not just &[u8], because they have to keep track of the current bit position too.
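To illustrate why the input becomes a tuple, here is a hand-rolled, std-only sketch of a bit-level take. It is not nom's implementation; `BitInput` and `take_bits` are names assumed for the example. The second tuple element is the number of bits already consumed from the first byte, which a plain `&[u8]` cannot express. For whole-byte fields such as the two 2-byte version numbers in the question, a byte-level parser such as `nom::bytes::complete::take` or `nom::number::complete::be_u16` works directly on `&[u8]` and may be the simpler fit.

```rust
/// Bit-level input: the remaining bytes plus the number of bits
/// already consumed from the first byte (0..=7).
type BitInput<'a> = (&'a [u8], usize);

/// Takes `count` bits (at most 32), most significant bit first.
fn take_bits<'a>(input: BitInput<'a>, count: usize) -> Option<(BitInput<'a>, u32)> {
    let (mut bytes, mut offset) = input;
    if count > 32 {
        return None;
    }
    let mut value = 0u32;
    for _ in 0..count {
        // Running out of bytes means the input is too short.
        let byte = *bytes.first()?;
        let bit = (byte >> (7 - offset)) & 1;
        value = (value << 1) | u32::from(bit);
        offset += 1;
        if offset == 8 {
            // Finished this byte: advance and reset the bit position.
            bytes = &bytes[1..];
            offset = 0;
        }
    }
    Some(((bytes, offset), value))
}
```

After taking 3 bits from a fresh buffer, the remaining input still starts at the same byte but carries an offset of 3.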
Ignacio Monzalvo
@imonzalvo

Hi there! A newbie from Uruguay here.
I've been working on a modsecurity parser for some weeks and I'm really glad I chose nom for this project. But I'm quite stuck right now.

I want this parser to be able to parse comments. This should be a very simple task, just reading from '#' until EOL.
What is the recommended way to achieve this?

Denis Lisov
@tanriol
I'd say tuple((tag("#"), is_not("\r\n"))) or something like that?..
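The same idea (match the '#', then take everything that is not a line ending) can be written out by hand. This is a std-only sketch rather than nom code, and `comment` is a name assumed for the example. Unlike the `is_not` suggestion, it also accepts an empty comment body.

```rust
/// Parses a `#` comment running to the end of the line.
/// Returns `Some((rest, body))`; the line ending itself stays in `rest`.
fn comment(input: &str) -> Option<(&str, &str)> {
    let after_hash = input.strip_prefix('#')?;
    // Stop at the first CR or LF, or at the end of input if there is none.
    let end = after_hash
        .find(|c: char| c == '\r' || c == '\n')
        .unwrap_or(after_hash.len());
    Some((&after_hash[end..], &after_hash[..end]))
}
```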
Sven
@SirWindfield
Hey! I have a custom binary file that has tag bytes that denote the type of the next value. In total there are 13 different tag bytes and depending on the tag byte, I need to read in a different number of bytes afterwards. How would I do this in nom with just combining parsers? My current approach is to read in the byte (call take), and match over that one single byte, returning different parsers depending on the value it has.
The goal is to ditch that first byte read and somehow combine it with the match statement to create one single parser only.
Denis Lisov
@tanriol
@SirWindfield You can do that with alt and tags in every branch if you wish.
Sven
@SirWindfield
@tanriol I kind of did that actually. Is there a tag specific to bytes? Or do I have to use tag("\x01") or tag([0x01]) for the time being?
On a side note, is there a way to tell nom to only match the next N bytes on a parser? The binary file contains some kind of variable table that has its size declared using two bytes at the beginning. So I only want to apply my tag byte logic until those N bytes are passed. Afterwards other sub-parsers should take over.
Denis Lisov
@tanriol
You can take N bytes and use map_parser on them
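A rough std-only sketch of both ideas follows: dispatching on a tag byte, and confining the tagged values to a region whose byte length is given by a two-byte prefix. It is not nom code. The tag values 0x01 and 0x02, the `Value` enum and the big-endian length are all invented for the illustration; the real format has 13 tags that are not listed in the discussion.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Byte(u8),
    Short(u16),
}

/// Parses one tagged value. The tag decides how many bytes follow.
fn value(input: &[u8]) -> Option<(&[u8], Value)> {
    match input {
        [0x01, b, rest @ ..] => Some((rest, Value::Byte(*b))),
        [0x02, hi, lo, rest @ ..] => {
            Some((rest, Value::Short(u16::from_be_bytes([*hi, *lo]))))
        }
        _ => None,
    }
}

/// Reads a big-endian u16 byte length, then parses values from exactly
/// that many bytes and returns whatever follows the region untouched.
fn table(input: &[u8]) -> Option<(&[u8], Vec<Value>)> {
    let (len, body) = match input {
        [hi, lo, body @ ..] => (usize::from(u16::from_be_bytes([*hi, *lo])), body),
        _ => return None,
    };
    if body.len() < len {
        return None;
    }
    // Bytes past `len` belong to the next section of the file.
    let (mut region, rest) = body.split_at(len);
    let mut values = Vec::new();
    while !region.is_empty() {
        let (next, v) = value(region)?;
        values.push(v);
        region = next;
    }
    Some((rest, values))
}
```

Splitting the region off first means a malformed entry can never read past the declared table size, which is the effect the take-then-`map_parser` suggestion aims for.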
Ryan Faulhaber
@rfaulhaber

hey folks, I'm trying to write an s-expression parser and I have a function that looks like this:

    delimited(
        char('('),
        list_content,
        preceded(multispace0, char(')')),
    )(input)

I want it to ignore whitespace and apparently that's causing issues. this function fails when called with the input ( 123 456 ) but doesn't when the input is ( 123 456 ) (note the extra space at the end!). any obvious reason why this might be? the error I get is: Err(Error(("", Char)))

Sven
@SirWindfield

Quick question regarding nom: I have a binary file that declares its version at the beginning of the file. I use a tuple parser to parse the various sectors, so each subparser takes the bytes it'll need. The thing is, one of my last parsers needs the version information from the parser that runs at first.

How would I design around this? The bytes that reach my last parser do not contain the version information anymore, and since I map on the tuple results, I do only get the version after all parsers have run.

I'm not sure if putting all my nom functions into a struct and just set the version inside the struct as a field is a clean nom-esque way to do it. It certainly works and I settled for something similar to this, but I'd like to know if there is a better way.

Here is the code I use for parsing:

map(
        tuple((
            context("Magic number", parse_magic_number),             // 0xcafebabe
            // Peek needed to make version available in CP parsing.
            context("Version", peek(parse_version)),     // Peeking keeps the version bytes intact, meaning I simply just call `parse_version` inside of `parse_cp` again, giving me access to the version information.
            context("Constant pool", parse_cp),     // here I need the version information,
        )),
        |(_, version)| ClassFile { version },
    )(i)

The above only works because the version is needed in the next parser. But how would I approach it if I need the version 5 parsers down the line?

Denis Lisov
@tanriol
How about something like
let (input, version) = parse_version(input)?;
// ....
let (input, constant_pool) = parse_cp(input, version)?;
IMHO, one big feature of nom 5 is that one can write it in the imperative way easily and reuse all the normal ways of writing Rust code.
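Spelled out a little further, the imperative style looks like the std-only sketch below. It uses `Option` instead of nom's `IResult` so that it stands alone; `Version`, `ClassFile`, `be_u16`, the `pool_count` field and the "major must be at least 45" check are all assumptions made up for the illustration, not details of the real parser.

```rust
#[derive(Debug, PartialEq)]
struct Version {
    minor: u16,
    major: u16,
}

#[derive(Debug, PartialEq)]
struct ClassFile {
    version: Version,
    pool_count: u16,
}

/// Reads one big-endian u16 and returns `(rest, value)`.
fn be_u16(input: &[u8]) -> Option<(&[u8], u16)> {
    match input {
        [hi, lo, rest @ ..] => Some((rest, u16::from_be_bytes([*hi, *lo]))),
        _ => None,
    }
}

fn parse_version(input: &[u8]) -> Option<(&[u8], Version)> {
    let (input, minor) = be_u16(input)?;
    let (input, major) = be_u16(input)?;
    Some((input, Version { minor, major }))
}

/// A later stage that needs the version as an ordinary argument.
/// The version check is an invented rule for this sketch.
fn parse_cp<'a>(input: &'a [u8], version: &Version) -> Option<(&'a [u8], u16)> {
    if version.major < 45 {
        return None;
    }
    be_u16(input)
}

fn parse_class_file(input: &[u8]) -> Option<(&[u8], ClassFile)> {
    let (input, version) = parse_version(input)?;
    // Any number of other stages could run here; `version` stays in scope.
    let (input, pool_count) = parse_cp(input, &version)?;
    Some((input, ClassFile { version, pool_count }))
}
```

Because `version` is a plain local variable, it is just as available five stages later as it is in the next one.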
Sven
@SirWindfield
lol. ye I think my mindset just wants to have those big parser blocks :D
Andrew
@apullin
Is there an easy, off-the-shelf parser that will split up a string by spaces, except treat quoted strings as whole literals without splitting?
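Nothing off-the-shelf is named in this discussion, but the splitting rule itself is small enough to sketch by hand. The following is std-only Rust, not a nom parser; `split_quoted` is a name invented for the example, and it assumes double quotes only, with no escape sequences.

```rust
/// Splits `input` on whitespace, keeping double-quoted runs together.
/// Quotes are stripped. Returns `None` if a quote is left unterminated.
fn split_quoted(input: &str) -> Option<Vec<String>> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_token = false;
    let mut in_quotes = false;
    for c in input.chars() {
        match c {
            '"' => {
                in_quotes = !in_quotes;
                in_token = true; // so that `""` still yields an empty token
            }
            c if c.is_whitespace() && !in_quotes => {
                // Whitespace outside quotes ends the current token, if any.
                if in_token {
                    tokens.push(std::mem::take(&mut current));
                    in_token = false;
                }
            }
            c => {
                current.push(c);
                in_token = true;
            }
        }
    }
    if in_quotes {
        return None;
    }
    if in_token {
        tokens.push(current);
    }
    Some(tokens)
}
```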
Sven Thiele
@sthiele
I have to parse a language that can contain comments that start with '%' and extend to the end of line everywhere. Is there a good pattern how to deal with something like this?
Nick Overdijk
@NickNick
@sthiele I made my own parser that consumes input to the end of the line
YMMV. Notice the trim, etc. :)
Sven Thiele
@sthiele
@NickNick thx. I was wondering how one can intersperse another parser with such a comment parser, similar to ws
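One common pattern for this is to fold comments into the notion of "whitespace" and skip both before every token. Below is a std-only sketch of that skipping step, not nom code; `skip_trivia` is a name assumed for the example, and '%' is the comment marker from the question.

```rust
/// Skips any mix of whitespace and `%` line comments at the start of `input`.
fn skip_trivia(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        match trimmed.strip_prefix('%') {
            // Drop the comment body; the newline that ends it is
            // removed by `trim_start` on the next iteration.
            Some(body) => {
                input = match body.find('\n') {
                    Some(i) => &body[i..],
                    None => "",
                };
            }
            // No comment ahead: the next token starts here.
            None => return trimmed,
        }
    }
}
```

Calling `skip_trivia` on the remaining input before each token parser then makes comments legal wherever whitespace is.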