Denis Lisov
@tanriol
I'd not be so sure. Extra memory copy on the one hand, yes... more complex parsers, more branches and more code bloat on the other.
Virgile Andreani
@Armavica
Hello! I am looking for a way to discard the beginning of the input until a given parser applies. Is there a way to do that?
Denis Lisov
@tanriol
You can, but there's no easy way... and the performance may be pretty bad. What do you know about the parser that has to work?
Virgile Andreani
@Armavica
Ok, thank you. I am looking for a DER object with der-parser. For now I am trying the parser at each increasing offset of the input, which is indeed pretty inefficient. But maybe I could first search for a pattern in my data and then parse only from there.
Denis Lisov
@tanriol
Sounds a bit dangerous... I don't know this area well enough, so I'm not sure whether you're likely to have false positives.
Virgile Andreani
@Armavica
Why dangerous? I think false positives are fine, because they are going to be invalidated by the parser.
Denis Lisov
@tanriol
That probably depends on whether you know the expected structure ahead of time so that you could compare the one found to the one expected.
If you know it, you're probably safe from the "noise" prefix accidentally containing a matching structure, but, on the other hand, this sounds like a security vulnerability if the attacker could somehow control the "noise" part and inject the bad version of the structure.
Virgile Andreani
@Armavica
I see
However I am in fact scanning for all such structures in the input, so even if the attacker can insert bad cases, I think I should be able to still find the good ones, which is what matters in my case.
Denis Lisov
@tanriol
I'd probably not bet on that - reading the same data in different ways (or, for example, different instances of this structure) is one of the frequent sources of security vulnerabilities.
Returning to the original question, I'd take the specification and the expected data structure and find some sequence of 4+ bytes that's known ahead of time, so that I could seek to it with take_until.
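A rough sketch of the seek-then-parse idea, written against the standard library only rather than nom's or der-parser's actual API. The names `SIGNATURE`, `find_candidates`, `parse_record` and `scan` are hypothetical, and the 4-byte marker and record layout are invented for illustration; a real scan would use bytes taken from the relevant specification. Every signature hit is still handed to the parser, so false positives are rejected there.

```rust
/// Hypothetical 4-byte marker assumed to start every record of interest.
const SIGNATURE: &[u8] = &[0xDE, 0xAD, 0xBE, 0xEF];

/// Returns every offset at which `SIGNATURE` occurs in `input`.
fn find_candidates(input: &[u8]) -> Vec<usize> {
    input
        .windows(SIGNATURE.len())
        .enumerate()
        .filter(|(_, window)| *window == SIGNATURE)
        .map(|(offset, _)| offset)
        .collect()
}

/// Stand-in for the real parser (an assumption, not DER): accepts the
/// signature, a one-byte length, and that many payload bytes.
/// Returns `(rest, payload)` on success.
fn parse_record(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let rest = input.strip_prefix(SIGNATURE)?;
    let (&len, rest) = rest.split_first()?;
    if rest.len() < len as usize {
        return None;
    }
    let (payload, rest) = rest.split_at(len as usize);
    Some((rest, payload))
}

/// Runs the parser only at offsets where the signature was found and
/// keeps the hits that the parser accepts.
fn scan(input: &[u8]) -> Vec<(usize, &[u8])> {
    find_candidates(input)
        .into_iter()
        .filter_map(|offset| {
            parse_record(&input[offset..]).map(|(_, payload)| (offset, payload))
        })
        .collect()
}
```

The parser runs once per signature hit instead of once per byte offset, which is where the saving over the try-every-offset approach comes from.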
Restioson
@Restioson
hi
is there any way to have alt((a, b)) not return (&str, &str)
this is what i have
45 | pub fn alt<I: Clone, O, E: ParseError<I>, List: Alt<I, O, E>>(l: List) -> impl Fn(I) -> IResult<I, O, E> {
   |                                                 ------------ required by this bound in `nom::branch::alt`
   |
   = note: expected enum `std::result::Result<(&str, &str), nom::internal::Err<(&str, nom::error::ErrorKind)>>`
              found enum `std::result::Result<(&str, screen::settings::administration::parse_search::Criteria<'_>), nom::internal::Err<(&str, nom::error::ErrorKind)>>`
i think the issue is the two combinators return diff types, do they each need to return Either or something
Denis Lisov
@tanriol
All the alt branches need to return the same type. For me it's usually an enum.
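A minimal sketch of the "same type, usually an enum" point, using only the standard library and `Option` in place of nom's `IResult`. `Term`, `word`, `number` and `term` are hypothetical names. In nom itself, the usual way to get there is to wrap a branch in `map` so that it yields the enum variant.

```rust
/// One output type shared by every branch.
#[derive(Debug, PartialEq)]
enum Term<'a> {
    Word(&'a str),
    Number(u32),
}

/// Parses one or more ASCII letters into `Term::Word`.
fn word(input: &str) -> Option<(&str, Term<'_>)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((&input[end..], Term::Word(&input[..end])))
}

/// Parses one or more ASCII digits into `Term::Number`.
fn number(input: &str) -> Option<(&str, Term<'_>)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    // An empty digit run fails to parse, so this branch is rejected.
    let value = input[..end].parse().ok()?;
    Some((&input[end..], Term::Number(value)))
}

/// Tries each branch in order; this only type-checks because both
/// branches return `Term`.
fn term(input: &str) -> Option<(&str, Term<'_>)> {
    word(input).or_else(|| number(input))
}
```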
Restioson
@Restioson
ok thx
Michael Dale Long
@nikarul
Hello, I'm trying to write a parser that needs to parse Python-style variable names ([A-Za-z0-9_] characters, except the first character cannot be a number). I thought I would start by writing a custom version of nom::character::complete::alphanumeric1 that also accepts underscores, but when I tried copying it to my code and modifying it, I found that nom::traits is private. Any suggestions on the idiomatic way to do this with Nom?
Denis Lisov
@tanriol
IIRC, the traits are reexported at top level, aren't they?
Michael Dale Long
@nikarul
They are indeed. Thank you!
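For reference, the identifier rule described above ([A-Za-z0-9_], first character not a digit) can be sketched without nom at all. This is a standard-library-only illustration, and `identifier` is a hypothetical name, not a nom function. It returns `(rest, identifier)` in the same order nom parsers use.

```rust
/// Splits a leading identifier off `input`, returning `(rest, identifier)`.
fn identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    // The first character must be an ASCII letter or underscore, never a digit.
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return None,
    }
    // The remaining characters may also include digits; stop at the first
    // character that is neither alphanumeric nor an underscore.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map(|(index, _)| index)
        .unwrap_or(input.len());
    Some((&input[end..], &input[..end]))
}
```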
Plecra
@Plecra
Is there a variant of preceded that includes both parts in the result?
/ tuple that returns a single slice
oo, recognize looks like it'll help
Plecra
@Plecra
Is there an anybyte equivalent?
I want to simplify this
fn ident(s: &[u8]) -> IResult<&[u8], &[u8]> {
    recognize(preceded(
        // Use the closure's own input `i` rather than the outer `s`.
        |i| if let Some(&l) = i.get(0) {
            if letter(l) {
                Ok((&i[1..], ()))
            } else {
                Err(Err::Error(ParseError::from_error_kind(i, ErrorKind::OneOf)))
            }
        } else {
            Err(Err::Incomplete(Needed::Size(1)))
        },
        take_while(|b| matches!(b, b'A'..=b'Z' | b'a'..=b'z' | b'_' | b'0'..=b'9'))
    ))(s)
}
Plecra
@Plecra
fn anybyte(s: &[u8]) -> IResult<&[u8], u8> {
    s.split_first()
        .map(|(&b, rest)| (rest, b))
        .ok_or(Err::Incomplete(Needed::Size(1)))
}
June Wunder
@junewunder
Hello! I'm having a problem with creating a custom ErrorKind and with parser completeness. The docs for alt say "with a custom error type, it is possible to have alt return the error of the parser that went the farthest in the input data", but there seems to be no documentation on how to do this.
When all of my alt cases fail, I get an Incomplete, not an error. How would I get my desired behavior?
Here is the exact code I care about:
let (input, defs) = complete(many1(annotated_terminal(alt((
        p_fn_named,
        p_prim,
        p_struct,
        p_enum,
    )))))(input)?;
I'm trying to parse many top-level definitions, and if the whole file is not parsed, I want the error explaining why it could not be parsed
Thank you for any help!
Denis Lisov
@tanriol
@junewunder I think you need something like terminated(many1(complete(cut(alt((...))))), eof)
June Wunder
@junewunder
oh interesting!! okay lemme look into that real quick
and btw! I've been using nom to make a mixfix parser library, is there any interest in building tokenstream functionality into nom?
my idea is that we'd have nom::tokens just like nom::bytes with some combinators to do with taking tokens from a tokenstream
I also could just build it out into its own library which might be a better idea
Denis Lisov
@tanriol
May be interesting, but I'm not a nom developer :-)
June Wunder
@junewunder
ah rad! thank you!
Sven Thiele
@sthiele
Hi, I need to parse an expression which can be either an int literal or a float literal alt((float_literal, int_literal)).
So far I've been using number::complete::double to parse the float literal, but this caused trouble because double also accepts integers double("1") == 1.0. Is there an easy way to fix this?
Restioson
@Restioson
can you swap the order so its alt((int_literal, float_literal)) ?
does that make a diff
Sven Thiele
@sthiele
Then I get errors for something like "1.1)" expected ')' found '.'
Denis Lisov
@tanriol
I'd say int_literal should go first and have some lookahead assertion like "not followed immediately by ., e or E"
Sven Thiele
@sthiele
Then I get problems with ranges "1..2"
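One way to reconcile the lookahead with ranges, sketched with the standard library only: treat a '.' as the start of a fraction only when a digit follows it. `Literal`, `digits` and `literal` are hypothetical names, exponents (e, E) are deliberately left out, and this is not how number::complete::double behaves.

```rust
#[derive(Debug, PartialEq)]
enum Literal {
    Int(i64),
    Float(f64),
}

/// Length of the run of ASCII digits at the start of `s`.
fn digits(s: &str) -> usize {
    s.bytes().take_while(u8::is_ascii_digit).count()
}

/// Parses an integer or a `digits.digits` float, returning `(rest, literal)`.
fn literal(input: &str) -> Option<(&str, Literal)> {
    let int_len = digits(input);
    if int_len == 0 {
        return None;
    }
    let after_int = &input[int_len..];
    // A '.' only starts a fraction when a digit follows it,
    // so the "1" in "1..2" stays an integer.
    if let Some(fraction) = after_int.strip_prefix('.') {
        let frac_len = digits(fraction);
        if frac_len > 0 {
            let end = int_len + 1 + frac_len;
            let value = input[..end].parse().ok()?;
            return Some((&input[end..], Literal::Float(value)));
        }
    }
    let value = input[..int_len].parse().ok()?;
    Some((after_int, Literal::Int(value)))
}
```

With this rule "1.1)" yields a float followed by ")", while "1..2" yields the integer 1 and leaves "..2" for the range parser.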
Restioson
@Restioson
Also, is there any way to return an ErrorKind for a custom parser thingie
fn parse_date(input: &str) -> IResult<&str, DateTime<Utc>> {
    DateTime::parse_from_str(input, "%F")
        .map_err(|_| nom::Err::Error((input, /* here */)))
}
want to adapt this chrono parser so i dont have to DIY it. the %F may become more complex
Sven
@SirWindfield

I have some problems and I do not really understand what is wrong with my code:

use nom::IResult;
use crate::types::JavaVersion;
use nom::bits::complete::take;
use crate::constants::MAGIC_NUMBER;
use nom::error::context;
use crate::constants::sizes::{MINOR_SIZE, MAJOR_SIZE}; // both are assigned a value of 2 respectively
use nom::sequence::tuple;
use nom::combinator::map;

pub fn parse_version_info(i: &[u8]) -> IResult<&[u8], JavaVersion> {
    map(
        tuple((
            take(MINOR_SIZE),
            take(MAJOR_SIZE)
            )),
        |(minor, major)| {
            JavaVersion {
                major,
                minor,
            }
        }
    )(i)
}

#[cfg(test)]
mod tests {
    use crate::parser::{parse_class_file, parse_version_info};
    use byteorder::{WriteBytesExt, BigEndian};

    fn int2bytes(i: i32) -> Vec<u8> {
        let mut bs = [0u8; std::mem::size_of::<i32>()];
        bs.as_mut()
            .write_i32::<BigEndian>(i)
            .expect("Failed to convert to byte array");

        bs.into_vec()
    }

    #[test]
    fn test_version_info() {
        let i = 0x3200;
        let i = int2bytes(i).as_slice();

        let version = parse_version_info(i);
    }
}

I get the following error:

error[E0308]: mismatched types
  --> java-class-file\src\parser.rs:22:7
   |
22 |     )(i)
   |       ^ expected tuple, found `&[u8]`
   |
   = note:  expected tuple `(_, usize)`
           found reference `&[u8]`

error[E0308]: mismatched types
  --> java-class-file\src\parser.rs:11:5
   |
11 | /     map(
12 | |         tuple((
13 | |             take(MINOR_SIZE),
14 | |             take(MAJOR_SIZE)
...  |
21 | |         }
22 | |     )(i)
   | |________^ expected `&[u8]`, found tuple
   |
   = note: expected enum `std::result::Result<(&[u8], types::JavaVersion), nom::internal::Err<(&[u8], nom::error::ErrorKind)>>`
              found enum `std::result::Result<((_, usize), types::JavaVersion), nom::internal::Err<_>>`

error: aborting due to 2 previous errors
I have no clue what I am doing wrong. Why exactly is it expected that the return type is a tuple here?
I want to return an IResult holding the rest of the u8 slice plus a JavaVersion that I parsed earlier
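One likely cause, offered as a reading of the error text rather than a confirmed diagnosis: nom::bits::complete::take is a bit-level parser whose input is a (bytes, bit offset) pair, which is where the `(_, usize)` comes from, whereas nom::bytes::complete::take accepts &[u8] directly. The sketch below avoids nom entirely and shows, with only the standard library, what the function is meant to compute. The field types of `JavaVersion` are assumed to be u16, read as big-endian with the minor version first, as in the posted code; `Option` stands in for `IResult`.

```rust
/// Assumed shape of the `JavaVersion` type from the message above.
#[derive(Debug, PartialEq)]
struct JavaVersion {
    major: u16,
    minor: u16,
}

/// Reads one big-endian u16 and returns it with the remaining bytes.
fn be_u16(input: &[u8]) -> Option<(&[u8], u16)> {
    if input.len() < 2 {
        return None;
    }
    let (head, rest) = input.split_at(2);
    Some((rest, u16::from_be_bytes([head[0], head[1]])))
}

/// Parses the minor version followed by the major version,
/// returning the unparsed remainder alongside the result.
fn parse_version_info(input: &[u8]) -> Option<(&[u8], JavaVersion)> {
    let (rest, minor) = be_u16(input)?;
    let (rest, major) = be_u16(rest)?;
    Some((rest, JavaVersion { major, minor }))
}
```

In nom, nom::number::complete::be_u16 plays the role of the hand-written `be_u16` here, so the two fields can be read as numbers instead of being taken as raw slices.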