Virgile Andreani
@Armavica
I see
However I am in fact scanning for all such structures in the input, so even if the attacker can insert bad cases, I think I should be able to still find the good ones, which is what matters in my case.
Denis Lisov
@tanriol
I'd probably not bet on that - reading the same data in different ways (or, for example, different instances of this structure) is one of the frequent sources of security vulnerabilities.
Returning to the original question, I'd take the specification, the expected data structure and find some sequence of 4+ bytes that's known ahead of time so that I could seek to it with take_until
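To make the "seek to a known byte sequence" idea concrete, here is a minimal dependency-free sketch of what a take_until-style search does on a byte slice. It is a hand-rolled stand-in, not nom's implementation, and the function name and the `MAGIC` marker in the usage note are made up for illustration.

```rust
/// Searches `input` for the first occurrence of `marker`.
/// Returns (rest, skipped): `rest` starts at the marker, `skipped` is
/// everything before it. Returns None if the marker never appears.
fn take_until_marker<'a>(input: &'a [u8], marker: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if marker.is_empty() {
        return None; // `windows(0)` would panic, so reject an empty marker
    }
    let pos = input.windows(marker.len()).position(|w| w == marker)?;
    Some((&input[pos..], &input[..pos]))
}
```

Calling it with a buffer such as `b"xxMAGICyy"` and the hypothetical marker `b"MAGIC"` leaves the remaining input positioned at the marker, so the structure parser can take over from there.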
Restioson
@Restioson
hi
is there any way to have alt((a, b)) not return (&str, &str)
this is what i have
45 | pub fn alt<I: Clone, O, E: ParseError<I>, List: Alt<I, O, E>>(l: List) -> impl Fn(I) -> IResult<I, O, E> {
   |                                                 ------------ required by this bound in `nom::branch::alt`
   |
   = note: expected enum `std::result::Result<(&str, &str), nom::internal::Err<(&str, nom::error::ErrorKind)>>`
              found enum `std::result::Result<(&str, screen::settings::administration::parse_search::Criteria<'_>), nom::internal::Err<(&str, nom::error::ErrorKind)>>`
i think the issue is the two combinators return diff types, do they each need to return Either or something
Denis Lisov
@tanriol
All the alt branches need to return the same type. For me it's usually an enum.
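A minimal sketch of the "wrap every branch in one enum" idea, written as plain functions in nom's (rest, value) shape so it runs without the crate. The `Criteria` variants and the `id` / `name` parsers are assumptions for illustration; with nom itself, the same effect would typically come from wrapping each branch in `map` before handing it to `alt`.

```rust
/// Hypothetical output type: one variant per branch.
#[derive(Debug, PartialEq)]
enum Criteria<'a> {
    Id(u32),
    Name(&'a str),
}

/// (rest, value) on success, the unconsumed input on failure.
type PResult<'a, T> = Result<(&'a str, T), &'a str>;

/// Branch 1: a run of ASCII digits, returned as a number.
fn id(input: &str) -> PResult<'_, u32> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    if end == 0 {
        return Err(input);
    }
    input[..end].parse::<u32>().map(|n| (&input[end..], n)).map_err(|_| input)
}

/// Branch 2: a run of ASCII letters, returned as a slice.
fn name(input: &str) -> PResult<'_, &str> {
    let end = input.find(|c: char| !c.is_ascii_alphabetic()).unwrap_or(input.len());
    if end == 0 { Err(input) } else { Ok((&input[end..], &input[..end])) }
}

/// The branches produce different types (u32 vs &str), so each result is
/// mapped into `Criteria` before the alternatives are combined.
fn criteria(input: &str) -> PResult<'_, Criteria<'_>> {
    id(input)
        .map(|(rest, n)| (rest, Criteria::Id(n)))
        .or_else(|_| name(input).map(|(rest, s)| (rest, Criteria::Name(s))))
}
```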
Restioson
@Restioson
ok thx
Michael Dale Long
@nikarul
Hello, I'm trying to write a parser that needs to parse Python-style variable names ([A-Za-z0-9_] characters, except the first character cannot be a number). I thought I would start by writing a custom version of nom::character::complete::alphanumeric1 that also accepts underscores, but when I tried copying it to my code and modifying it, I found that nom::traits is private. Any suggestions on the idiomatic way to do this with Nom?
Denis Lisov
@tanriol
IIRC, the traits are reexported at top level, aren't they?
Michael Dale Long
@nikarul
They are indeed. Thank you!
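For reference, the identifier rule itself is small enough to sketch without any library. The function below is a hand-rolled illustration (the name `identifier` is an assumption) that returns (rest, identifier) in the same order nom uses; with nom, a common shape for this is `recognize` over a first-character parser followed by a repeated alphanumeric-or-underscore parser.

```rust
/// Splits a Python-style identifier off the front of `input`.
/// Returns (rest, identifier), or None if `input` does not start with one.
fn identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    // First character: ASCII letter or underscore, never a digit.
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return None,
    }
    // Remaining characters: letters, digits, or underscores.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map_or(input.len(), |(idx, _)| idx);
    Some((&input[end..], &input[..end]))
}
```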
Plecra
@Plecra
Is there a variant of preceded that includes both parts in the result?
/ tuple that returns a single slice
oo, recognize looks like it'll help
Plecra
@Plecra
Is there an anybyte equivalent?
I want to simplify this
fn ident(s: &[u8]) -> IResult<&[u8], &[u8]> {
    recognize(preceded(
        |i| if let Some(&l) = s.get(0) {
            if letter(l) {
                Ok((&s[1..], ()))
            } else {
                Err(Err::Error(ParseError::from_error_kind(s, ErrorKind::OneOf)))
            }
        } else {
            Err(Err::Incomplete(Needed::Size(1)))
        },
        take_while(|b| matches!(b, b'A'..=b'Z' | b'a'..=b'z' | b'_' | b'0'..=b'9'))
    ))(s)
}
Plecra
@Plecra
fn anybyte(s: &[u8]) -> IResult<&[u8], u8> {
    s.split_first()
        .map(|(&b, rest)| (rest, b))
        .ok_or(Err::Incomplete(Needed::Size(1)))
}
June Wunder
@junewunder
Hello! I'm having a problem with creating a custom ErrorKind and with parser completeness. The docs for alt say "with a custom error type, it is possible to have alt return the error of the parser that went the farthest in the input data", but there doesn't seem to be any documentation on how to do this.
When all of my alt cases fail, I get an Incomplete, not an error. How would I get my desired behavior?
Here is the exact code I care about:
let (input, defs) = complete(many1(annotated_terminal(alt((
        p_fn_named,
        p_prim,
        p_struct,
        p_enum,
    )))))(input)?;
I'm trying to parse many top-level definitions, and if the whole file is not parsed, then I want the error from why it wasn't able to parse
Thank you for any help!
Denis Lisov
@tanriol
@junewunder I think you need something like terminated(many1(complete(cut(alt((...))))), eof)
June Wunder
@junewunder
oh interesting!! okay lemme look into that real quick
and btw! I've been using nom to make a mixfix parser library, is there any interest in building tokenstream functionality into nom?
my idea is that we'd have nom::tokens just like nom::bytes with some combinators to do with taking tokens from a tokenstream
I also could just build it out into its own library which might be a better idea
Denis Lisov
@tanriol
May be interesting, but I'm not a nom developer :-)
June Wunder
@junewunder
ah rad! thank you!
Sven Thiele
@sthiele
Hi, I need to parse an expression which can be either an int literal or a float literal alt((float_literal, int_literal)).
So far I've been using number::complete::double to parse the float literal, but this caused trouble because double also accepts integers double("1") == 1.0. Is there an easy way to fix this?
Restioson
@Restioson
can you swap the order so its alt((int_literal, float_literal)) ?
does that make a diff
Sven Thiele
@sthiele
Then I get errors for something like "1.1)" expected ')' found '.'
Denis Lisov
@tanriol
I'd say int_literal should go first and have some lookahead assertion like "not followed immediately with ., e or E"
Sven Thiele
@sthiele
Then I get problems with ranges "1..2"
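One possible way to reconcile all three cases ("1", "1.1)" and "1..2") is to decide int vs float in a single pass and only treat a '.' as a fraction when a digit follows it. The sketch below is a dependency-free illustration of that rule, not something proposed in the chat; the `Number` enum and the `number` function are assumed names, and the exponent handling is a guess at what the grammar needs.

```rust
#[derive(Debug, PartialEq)]
enum Number {
    Int(i64),
    Float(f64),
}

/// Returns (rest, number), or None if `input` does not start with a digit.
fn number(input: &str) -> Option<(&str, Number)> {
    let bytes = input.as_bytes();
    // Counts consecutive ASCII digits starting at `from`.
    let digits = |from: usize| bytes[from..].iter().take_while(|b| b.is_ascii_digit()).count();

    let int_len = digits(0);
    if int_len == 0 {
        return None;
    }
    let mut end = int_len;
    let mut is_float = false;

    // Fraction: '.' only counts when a digit follows, so "1..2" yields Int(1) and leaves "..2".
    if bytes.get(end).copied() == Some(b'.')
        && bytes.get(end + 1).map_or(false, |b| b.is_ascii_digit())
    {
        end += 1 + digits(end + 1);
        is_float = true;
    }
    // Exponent: 'e' or 'E', optional sign, at least one digit.
    if matches!(bytes.get(end).copied(), Some(b'e') | Some(b'E')) {
        let mut exp = end + 1;
        if matches!(bytes.get(exp).copied(), Some(b'+') | Some(b'-')) {
            exp += 1;
        }
        let exp_len = digits(exp);
        if exp_len > 0 {
            end = exp + exp_len;
            is_float = true;
        }
    }

    let (text, rest) = input.split_at(end);
    let value = if is_float {
        Number::Float(text.parse().ok()?)
    } else {
        Number::Int(text.parse().ok()?)
    };
    Some((rest, value))
}
```

A literal such as "1." is read as Int(1) followed by "." under this rule; whether that is acceptable depends on the language being parsed.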
Restioson
@Restioson
Also, is there any way to return an ErrorKind for a custom parser thingie
fn parse_date(input: &str) -> IResult<&str, DateTime<Utc>> {
    DateTime::parse_from_str(input, "%F")
        .map_err(|_| nom::Err::Error((input, /* here */)))
}
want to adapt this chrono parser so i dont have to DIY it. the "%F" format may become more complex
Sven
@SirWindfield

I have some problems and I do not really understand what is wrong with my code:

use nom::IResult;
use crate::types::JavaVersion;
use nom::bits::complete::take;
use crate::constants::MAGIC_NUMBER;
use nom::error::context;
use crate::constants::sizes::{MINOR_SIZE, MAJOR_SIZE}; // both are assigned a value of 2 respectively
use nom::sequence::tuple;
use nom::combinator::map;

pub fn parse_version_info(i: &[u8]) -> IResult<&[u8], JavaVersion> {
    map(
        tuple((
            take(MINOR_SIZE),
            take(MAJOR_SIZE)
            )),
        |(minor, major)| {
            JavaVersion {
                major,
                minor,
            }
        }
    )(i)
}

#[cfg(test)]
mod tests {
    use crate::parser::{parse_class_file, parse_version_info};
    use byteorder::{WriteBytesExt, BigEndian};

    fn int2bytes(i: i32) -> Vec<u8> {
        let mut bs = [0u8; std::mem::size_of::<i32>()];
        bs.as_mut()
            .write_i32::<BigEndian>(i)
            .expect("Failed to convert to byte array");

        bs.into_vec()
    }

    #[test]
    fn test_version_info() {
        let i = 0x3200;
        let i = int2bytes(i).as_slice();

        let version = parse_version_info(i);
    }
}

I get the following error:

error[E0308]: mismatched types
  --> java-class-file\src\parser.rs:22:7
   |
22 |     )(i)
   |       ^ expected tuple, found `&[u8]`
   |
   = note:  expected tuple `(_, usize)`
           found reference `&[u8]`

error[E0308]: mismatched types
  --> java-class-file\src\parser.rs:11:5
   |
11 | /     map(
12 | |         tuple((
13 | |             take(MINOR_SIZE),
14 | |             take(MAJOR_SIZE)
...  |
21 | |         }
22 | |     )(i)
   | |________^ expected `&[u8]`, found tuple
   |
   = note: expected enum `std::result::Result<(&[u8], types::JavaVersion), nom::internal::Err<(&[u8], nom::error::ErrorKind)>>`
              found enum `std::result::Result<((_, usize), types::JavaVersion), nom::internal::Err<_>>`

error: aborting due to 2 previous errors
I have no clue what I am doing wrong. Why exactly is it expected that the return type is a tuple here?
I want to return a IResult holding the rest of the u8 slice plus a JavaVersion that I parsed earlier
Denis Lisov
@tanriol
@SirWindfield Bit-level parsers require as input (and return as remaining input) (&[u8], usize) (usually), not just &[u8], because they have to keep track of the current bit position too.
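Since the two version fields are whole bytes, a byte-level parser avoids the (&[u8], usize) input entirely. Below is a dependency-free sketch of that shape; the `JavaVersion` field types (u16) and the big-endian layout are assumptions, and with nom the byte-level counterparts would be things like `nom::bytes::complete::take` or `nom::number::complete::be_u16` rather than `nom::bits::complete::take`.

```rust
/// Hypothetical stand-in for the crate's own JavaVersion type.
#[derive(Debug, PartialEq)]
struct JavaVersion {
    major: u16,
    minor: u16,
}

/// Reads one big-endian u16 and returns (rest, value).
fn be_u16(input: &[u8]) -> Option<(&[u8], u16)> {
    if input.len() < 2 {
        return None;
    }
    let (head, rest) = input.split_at(2);
    Some((rest, u16::from_be_bytes([head[0], head[1]])))
}

/// Reads minor then major, mirroring the field order in the question.
/// Input and remaining input are both plain byte slices.
fn parse_version_info(input: &[u8]) -> Option<(&[u8], JavaVersion)> {
    let (rest, minor) = be_u16(input)?;
    let (rest, major) = be_u16(rest)?;
    Some((rest, JavaVersion { major, minor }))
}
```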
Ignacio Monzalvo
@imonzalvo

Hi there! A newbie from Uruguay here.
I've been working on a modsecurity parser for some weeks and I'm really glad I chose nom for this project. But I'm quite stuck right now.

I want this parser to be able to parse comments. This should be a very simple task, just reading from '#' until EOL.
What is the recommended way to achieve this?

Denis Lisov
@tanriol
I'd say tuple((tag("#"), is_not("\r\n"))) or something like that?..
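The same "'#' then everything up to the line ending" rule, sketched as a plain function so it can be tried without the crate. The name `comment` is an assumption, and this version deliberately accepts an empty comment (a bare "#"), which may or may not match how the combinator version behaves.

```rust
/// Returns (rest, text): `text` is the comment body without the leading '#',
/// and `rest` starts at the line ending (or is empty at end of input).
fn comment(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('#')?;
    let end = body
        .find(|c: char| c == '\r' || c == '\n')
        .unwrap_or(body.len());
    Some((&body[end..], &body[..end]))
}
```

The line ending is left in `rest` on purpose, so whatever parser handles line separation elsewhere still sees it.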
Sven
@SirWindfield
Hey! I have a custom binary file that has tag bytes that denote the type of the next value. In total there are 13 different tag bytes and depending on the tag byte, I need to read in a different number of bytes afterwards. How would I do this in nom with just combining parsers? My current approach is to read in the byte (call take), and match over that one single byte, returning different parsers depending on the value it has.
The goal is to ditch that first byte read and somehow combine it with the match statement to create one single parser only.
Denis Lisov
@tanriol
@SirWindfield You can do that with alt and tags in every branch if you wish.
Sven
@SirWindfield
@tanriol I kind of did that actually. Is there a tag specific to bytes? Or do I have to use tag("\x01") or tag([0x01]) for the time being?
On a side note, is there a way to tell nom to only match the next N bytes on a parser? The binary file contains some kind of variable table that has its size declared using two bytes at the beginning. So I only want to apply my tag byte logic until those N bytes are passed. Afterwards other sub-parsers should take over.
Denis Lisov
@tanriol
You can take N bytes and use map_parser on them
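A rough sketch of both ideas together: dispatching on a tag byte, and confining that logic to a length-prefixed region by first taking N bytes and then parsing only inside them. It is written without nom so it runs standalone; the two tag values, their payload sizes, the `Value` enum and the big-endian length are all invented for illustration, since the real format has 13 tags that the chat does not list.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Byte(u8),   // hypothetical tag 0x01: one payload byte
    Short(u16), // hypothetical tag 0x02: two payload bytes, big-endian
}

/// Splits `n` bytes off the front, returning (taken, rest).
fn take(input: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if input.len() < n { None } else { Some(input.split_at(n)) }
}

/// Parses one tagged value; the tag byte selects the payload size.
fn value(input: &[u8]) -> Option<(&[u8], Value)> {
    let (&tag, rest) = input.split_first()?;
    match tag {
        0x01 => {
            let (p, rest) = take(rest, 1)?;
            Some((rest, Value::Byte(p[0])))
        }
        0x02 => {
            let (p, rest) = take(rest, 2)?;
            Some((rest, Value::Short(u16::from_be_bytes([p[0], p[1]]))))
        }
        _ => None,
    }
}

/// Parses a table: a two-byte length, then values filling exactly that many bytes.
/// The inner loop only ever sees `body`, so it cannot run past the table.
fn table(input: &[u8]) -> Option<(&[u8], Vec<Value>)> {
    let (len, rest) = take(input, 2)?;
    let len = u16::from_be_bytes([len[0], len[1]]) as usize;
    let (mut body, rest) = take(rest, len)?;
    let mut values = Vec::new();
    while !body.is_empty() {
        let (next, v) = value(body)?;
        values.push(v);
        body = next;
    }
    Some((rest, values))
}
```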
Ryan Faulhaber
@rfaulhaber

hey folks, I'm trying to write an s-expression parser and I have a function that looks like this:

    delimited(
        char('('),
        list_content,
        preceded(multispace0, char(')')),
    )(input)

I want it to ignore whitespace and apparently that's causing issues. this function fails when called with the input ( 123 456 ) but doesn't when the input is ( 123 456 ) (note the extra space at the end!). any obvious reason why this might be? the error I get is: Err(Error(("", Char)))