idl0r
@idl0r:matrix.org
[m]
i struggle a bit since i'm new to rust as well as nom and PEG like parsing
and the documentation is somewhat tricky for beginners :P
idl0r
@idl0r:matrix.org
[m]
so if i have a nom parser, let's say a JSON parser, can i access some particular values easily afterwards. like i need one specific string from a 1000 line JSON file. or would i have to iterate over a vector or hash and try finding that one particular?
xiretza
@xiretza:xiretza.xyz
[m]
have you considered using a JSON parser to parse JSON? nom is really only useful for custom protocols/formats
idl0r
@idl0r:matrix.org
[m]
it's not JSON. it was just meant as an example
xiretza
@xiretza:xiretza.xyz
[m]
you'd parse the data into a structure that makes such lookups easy
idl0r
@idl0r:matrix.org
[m]
https://www.juniper.net/documentation/us/en/software/junos/junos-xml-protocol/topics/concept/junos-xml-protocol-configuration-mapping-to-json.html those "CLI Configuration Statements" is what the format / config looks like. and i need to access e.g. routing-options->static->route 1.2.3.4/5->next-hop
it's neither JSON nor XML
like i said, it's those "CLI Configuration Statements" ones
a basic example would look like: http://dpaste.com/6J4MA4SNA
cheako
@cheako:matrix.org
[m]
That reminds me of bind config files, but it's been a long time. Perhaps a RecDescent parser would work better?
xiretza
@xiretza:xiretza.xyz
[m]
isn't nom recursive descent?
idl0r
@idl0r:matrix.org
[m]
i usually wrote stuff like that by hand when i did perl/python/ruby/c or something like that but i have my issues with being new to rust and esp. the borrow stuff and was thinking using something like nom or pest might be easier but nom/pest is like learning another language :D
xiretza
@xiretza:xiretza.xyz
[m]
god these are some weird data structures
can't you just export it as JSON?
idl0r
@idl0r:matrix.org
[m]
yes and no. there are commands to do so but the config isn't read from their shell so that's not an option
i wouldn't even mind if it treats all those as strings for now
xiretza
@xiretza:xiretza.xyz
[m]
well that's easy then, this should do it:
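use std::collections::HashMap;

enum Statement {
    // A terminal statement; all values are kept as plain strings.
    Leaf {
        values: Vec<String>,
    },
    // A block holding nested statements, keyed by statement name.
    Container {
        children: HashMap<String, Statement>,
    },
}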
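To illustrate how a structure like this makes lookups such as routing-options->static->route 1.2.3.4/5->next-hop straightforward, here is a minimal sketch. The `get`, `leaf`, `container` and `sample_config` helpers and the `10.0.0.1` next-hop value are hypothetical, added only for this example; nothing here comes from nom itself.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Statement {
    Leaf { values: Vec<String> },
    Container { children: HashMap<String, Statement> },
}

impl Statement {
    /// Walk down the tree one key at a time (hypothetical helper).
    /// Returns None if a key is missing or a leaf is reached too early.
    fn get(&self, path: &[&str]) -> Option<&Statement> {
        let mut current = self;
        for key in path {
            match current {
                Statement::Container { children } => current = children.get(*key)?,
                Statement::Leaf { .. } => return None,
            }
        }
        Some(current)
    }
}

/// Build a leaf from string slices (hypothetical helper).
fn leaf(values: &[&str]) -> Statement {
    Statement::Leaf { values: values.iter().map(|v| v.to_string()).collect() }
}

/// Build a container from (key, statement) pairs (hypothetical helper).
fn container(entries: Vec<(&str, Statement)>) -> Statement {
    Statement::Container {
        children: entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect(),
    }
}

/// Made-up sample data mirroring the path mentioned in the chat.
fn sample_config() -> Statement {
    container(vec![(
        "routing-options",
        container(vec![(
            "static",
            container(vec![(
                "route",
                container(vec![(
                    "1.2.3.4/5",
                    container(vec![("next-hop", leaf(&["10.0.0.1"]))]),
                )]),
            )]),
        )]),
    )])
}
```

Because the tree only hands out shared references during lookup, no parent or prev/next pointers are needed, which sidesteps the single-mutable-borrow problems mentioned above.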
idl0r
@idl0r:matrix.org
[m]
i wanted to do something like that initially when i wanted to parse it on my own and also having prev/next pointer for navigating back/forth into those containers but that was more frustrating because i always stumbled into issues with borrowing and having only one mutable :/
xiretza
@xiretza:xiretza.xyz
[m]
you really don't want doubly linked lists in rust, no
idl0r
@idl0r:matrix.org
[m]
yeah, just need to figure out the rust way to do that :D
idl0r
@idl0r:matrix.org
[m]
xiretza
@xiretza:xiretza.xyz
[m]
I've heard good things about it
idl0r
@idl0r:matrix.org
[m]
holy.. i'm rather shocked. doing it the safe way seems very complicated
i hope i'll get used to rust one day
tanriol
@tanriol:matrix.org
[m]
Building new nontrivial data containers is something that's inherently difficult, and Rust makes you see and acknowledge the difficulty. Do you really need that? A vector usually works as well, or possibly better due to better cache locality.
idl0r
@idl0r:matrix.org
[m]
well, i'm not sure if i really need that with rust. just trying to figure out how to parse and access something like http://dpaste.com/6J4MA4SNA afterwards, like conf["protocols"]["bgp"]["group"]["someisp"]["neighbor"]
and the recursion was the most difficult and frustrating part at the moment
tanriol
@tanriol:matrix.org
[m]
I'm not sure whether one can parse that without understanding the specific keywords at all...
I'd try parsing that into something along the lines of serde_json::Value
idl0r
@idl0r:matrix.org
[m]
yeah, i was playing with it but failed again with recursion, borrow and mutability
like when i parse that line by line and come to e.g. that system {} block, i have to keep in mind that i am in the system block right now and append stuff like that "host-name" to it but then comes the user block which i need to append to it as well but that block has even more so i need to know i am in system.user and at the end i have to go back to system. and then i somehow have to get specific values/locations after i have parsed anything, like the neighbour thingy i mentioned above
i thought i can (more) easily do that with nom, pest or that serde stuff but i failed again and again, at least for now :P
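One way around the "which block am I in right now" bookkeeping is to let recursion carry that state: each nested block is parsed by a recursive call that returns its finished map, so no parent pointer or shared mutable state is needed. The sketch below is hand-written Rust rather than nom, and it rests on assumptions about the format that the chat does not confirm: tokens are whitespace-separated, `;` ends a leaf, `{` and `}` delimit blocks, there are no quoted strings or comments, and every word before a `{` becomes one nesting level. All function names are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Statement {
    Leaf { values: Vec<String> },
    Container { children: HashMap<String, Statement> },
}

impl Statement {
    /// Follow a chain of keys through nested containers.
    fn get(&self, path: &[&str]) -> Option<&Statement> {
        let mut current = self;
        for key in path {
            match current {
                Statement::Container { children } => current = children.get(*key)?,
                Statement::Leaf { .. } => return None,
            }
        }
        Some(current)
    }
}

/// Split the input into words and the punctuation tokens `{`, `}` and `;`.
fn tokenize(src: &str) -> std::vec::IntoIter<String> {
    src.replace('{', " { ")
        .replace('}', " } ")
        .replace(';', " ; ")
        .split_whitespace()
        .map(str::to_string)
        .collect::<Vec<_>>()
        .into_iter()
}

/// Insert `stmt` under a chain of keys, creating intermediate containers.
fn insert_path(map: &mut HashMap<String, Statement>, path: &[String], stmt: Statement) {
    match path {
        [] => {}
        [last] => {
            map.insert(last.clone(), stmt);
        }
        [head, rest @ ..] => {
            let entry = map
                .entry(head.clone())
                .or_insert_with(|| Statement::Container { children: HashMap::new() });
            // Simplification: if a leaf already sits at `head`, the nested statement is dropped.
            if let Statement::Container { children } = entry {
                insert_path(children, rest, stmt);
            }
        }
    }
}

/// Parse statements until the closing `}` (nested) or end of input (top level).
fn parse_block(
    tokens: &mut std::vec::IntoIter<String>,
    nested: bool,
) -> Result<HashMap<String, Statement>, String> {
    let mut map = HashMap::new();
    let mut words: Vec<String> = Vec::new();
    while let Some(tok) = tokens.next() {
        match tok.as_str() {
            ";" => {
                // Leaf: first word is the key, the rest are values.
                // Simplification: a repeated key overwrites the earlier leaf.
                if let Some((key, values)) = words.split_first() {
                    map.insert(key.clone(), Statement::Leaf { values: values.to_vec() });
                }
                words.clear();
            }
            "{" => {
                if words.is_empty() {
                    return Err("block without a name".to_string());
                }
                // The recursive call owns the nested state and returns when it sees `}`.
                let children = parse_block(tokens, true)?;
                insert_path(&mut map, &words, Statement::Container { children });
                words.clear();
            }
            "}" => {
                return if nested && words.is_empty() {
                    Ok(map)
                } else {
                    Err("unexpected `}`".to_string())
                };
            }
            other => words.push(other.to_string()),
        }
    }
    if nested || !words.is_empty() {
        Err("unexpected end of input".to_string())
    } else {
        Ok(map)
    }
}

/// Parse a whole config into a root container.
fn parse_config(src: &str) -> Result<Statement, String> {
    let mut tokens = tokenize(src);
    let children = parse_block(&mut tokens, false)?;
    Ok(Statement::Container { children })
}
```

With this shape, "going back to system" after the user block is just the recursive call returning; the caller still holds its own map and keeps appending to it.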
cheako
@cheako:matrix.org
[m]
I think for nom, the current state is held in the instruction pointer... If you need to keep track of lines, do so by including how many lines an atom crosses.
viktor-grunwaldt
@viktor-grunwaldt

Hi, I've been writing a parser for aoc 2021 day 16 and need nom::multi::length_value, but it seems that InputTake is not implemented for the bit array representation (i.e. (&'a [u8], usize)). Should I just use take and fiddle around with inputs, or is there an elegant solution?
here's the relevant part of code if it would help:

type BitInput<'a> = (&'a [u8], usize);

(...)

fn p_operator_by_bitlen(input: BitInput) -> IResult<BitInput, Vec<PacketContainer>> {
    preceded(
        tag(0b1, 1u8),
        length_value(
            take::<_,u16,usize,_>(15), 
            many0(p_packet_container)
    ))(input)
}
jeremy
@jeremy:cyborgman.co
[m]

Is there a way to use escaped_transform() in a &[u8]? I am trying to parse a sequence of bytes and remove some escaped characters. I can get it to work when using String and &str, but not with a slice of u8

fn remove_escape(input: &[u8])->IResult<&[u8], Vec<u8>>{
    escaped_transform(
        is_not([0x7D]),
        0x7D as char,
        value(0x69, is_a([0x7D]))
    )(input)
}

In this example 0x7D denotes an escaped character, if the subsequent character is also a 0x7D I want to transform it to a 0x69

xiretza
@xiretza:xiretza.xyz
[m]
in what way does it not work?
jeremy
@jeremy:cyborgman.co
[m]
Sorry, thats key information :), it does not compile:
xiretza
@xiretza:xiretza.xyz
[m]
value(&[0x69], is_a([0x7D])), maybe?
jeremy
@jeremy:cyborgman.co
[m]
nope, ive tried playing with moving between a u8, [u8], and &[u8] and it doesnt compile
xiretza
@xiretza:xiretza.xyz
[m]
ah, that's a &[u8; 1], value([0x69].as_slice(), is_a([0x7D])) then
jeremy
@jeremy:cyborgman.co
[m]
thank you!!!!!! that worked
xiretza
@xiretza:xiretza.xyz
[m]
*[0x69] or maybe &*[0x69] should work too
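For comparison, the transformation described above (0x7D as the escape byte, with a following 0x7D rewritten to 0x69) can also be sketched in plain Rust without nom, which may be handy as a reference to test the nom parser against. This is a sketch under assumptions: the `UnescapeError` type is hypothetical, and an escape byte followed by anything other than 0x7D is treated as an error here, which the chat does not specify.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum UnescapeError {
    /// The escape byte was the last byte of the input.
    TrailingEscape,
    /// The escape byte was followed by something other than 0x7D.
    UnknownEscape(u8),
}

const ESCAPE: u8 = 0x7D;

/// Copy bytes through, replacing each `0x7D 0x7D` pair with a single 0x69.
fn remove_escape(input: &[u8]) -> Result<Vec<u8>, UnescapeError> {
    let mut out = Vec::with_capacity(input.len());
    let mut bytes = input.iter().copied();
    while let Some(b) = bytes.next() {
        if b != ESCAPE {
            out.push(b);
            continue;
        }
        // We just consumed the escape byte; decide based on the next one.
        match bytes.next() {
            Some(ESCAPE) => out.push(0x69),
            Some(other) => return Err(UnescapeError::UnknownEscape(other)),
            None => return Err(UnescapeError::TrailingEscape),
        }
    }
    Ok(out)
}
```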
Zacchary Dempsey-Plante
@zedseven

Hi, I'm new to nom and I'm having some issues with implementing something that the project README lists as an example use case - TLV parsing.

The specific type of TLV I'm working with is EMV tag-length-value, where the tag is variable-length. The first byte of the tag is special - it indicates there are more bytes to follow if the last 5 bits are 1. All remaining bytes of the tag indicate this by having their first bit as 1.
I'm not sure how to take 1 byte, inspect its value, and take_while afterwards depending on the value of the first take. I'm inclined to write this part in pure Rust, but that feels like I'm missing the point when nom functions like take_while exist.

Even if I did write it in pure Rust, I'm not sure how to return a custom error if, for example, the first byte indicated there were more to come and it was at the EOF.

I'm sure these are trivial issues and I'm overthinking things, but I'd appreciate some guidance if possible.

Side-note: the README lists TLV as one of the common patterns nom is great for, but I can't find any examples for this anywhere.

tanriol
@tanriol:matrix.org
[m]
IMHO, it's perfectly okay to write something (in your case, tag) in pure Rust if this specific parser is easier done in it. If you wanted to enforce nom-style, I'd suggest something like alt((single_byte_tag, multi_byte_tag)), where the single-byte and multi-byte parsers check that the bits in question are set or not set.
Zacchary Dempsey-Plante
@zedseven
Ah, thank you. I think that's what I was stuck on. Is there a 'proper' way to return an error in those functions, that remains useful further up the chain? I've seen nom::Err and the different enumerations of it, but is the 'standard' way to just create a VerboseError manually? Apologies if I'm missing something here - I'm not at my computer at the moment, so I'm working from memory.
tanriol
@tanriol:matrix.org
[m]
Depends on your idea of useful errors. May be just choosing the "right" ErrorKind, or adding context in the key places, or creating your own error and implementing the nom traits for it.
Not sure about VerboseError, haven't used them yet.
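As a rough illustration of the pure-Rust route for the variable-length tag, the sketch below follows the rule as described in the question: if the last 5 bits of the first byte are all 1, more bytes follow, and each following byte signals a further byte by having its first bit set. The `TagError` type and `parse_tag` name are hypothetical, and wiring the error into nom's error traits is left out because it depends on the nom version and error type in use.

```rust
/// Hypothetical custom error for tag parsing.
#[derive(Debug, PartialEq)]
enum TagError {
    /// No input at all.
    Empty,
    /// A byte announced a continuation, but the input ended.
    Truncated,
}

/// Split a variable-length tag off the front of `input`.
/// Returns (remaining input, tag bytes), mirroring nom's result ordering.
fn parse_tag(input: &[u8]) -> Result<(&[u8], &[u8]), TagError> {
    let (&first, _) = input.split_first().ok_or(TagError::Empty)?;
    let mut len = 1;
    // Low 5 bits all set: the tag continues in the following bytes.
    if first & 0x1F == 0x1F {
        loop {
            let &byte = input.get(len).ok_or(TagError::Truncated)?;
            len += 1;
            // High bit clear: this was the last byte of the tag.
            if byte & 0x80 == 0 {
                break;
            }
        }
    }
    let (tag, rest) = input.split_at(len);
    Ok((rest, tag))
}
```

Returning a dedicated `Truncated` variant covers the case raised in the question, where the first byte promises more bytes but the input ends.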