twitchyliquid64
@twitchyliquid64:matrix.org
[m]
If you can represent that state in terms of what function is being called in a stack of functions, then yes!
What I am describing is referred to as 'recursive descent'.
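Here is a minimal std-only Rust sketch (not nom) that makes 'recursive descent' concrete. It uses a hypothetical grammar of nested parentheses, group := '(' group* ')'. Each call to `parse_group` handles exactly one nesting level, so the parser's "state" (how deep it is) lives entirely on the call stack. Like nom, it returns `(remaining_input, output)`.

```rust
/// A parsed group holds its nested groups (hypothetical example AST).
#[derive(Debug, PartialEq)]
pub struct Group(pub Vec<Group>);

/// group := '(' group* ')'
/// One call per nesting level: the call stack tracks how deep we are.
pub fn parse_group(input: &str) -> Result<(&str, Group), String> {
    let mut rest = input
        .strip_prefix('(')
        .ok_or_else(|| format!("expected '(' at {:?}", input))?;
    let mut children = Vec::new();
    loop {
        // a ')' closes the current level and returns to the caller's level
        if let Some(after) = rest.strip_prefix(')') {
            return Ok((after, Group(children)));
        }
        // anything else must be a nested group, so descend one level
        let (after, child) = parse_group(rest)?;
        children.push(child);
        rest = after;
    }
}
```

For example, `parse_group("(()(()))x")` returns `Ok(("x", Group(vec![Group(vec![]), Group(vec![Group(vec![])])])))`. Unbalanced input such as `((` fails once the innermost call runs out of input.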
Barnaby Keene
@Southclaws
hmm, this may require some thought
I'm writing a shell that has different types of "pipe", each represented by different syntax.
For example, one pipe type is a jq expression parser, so you can pipe a command that produces JSON into it: curl some.api |< .jq.expression[].etc >| echo
What I'm trying to figure out is the rule that, once you hit a |<, only two things are valid: 1. capture any text, and 2. capture >| as the "closing" tag.
I think I can do it with terminated though
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
Outside of |<, are you trying to accept anything?
Or are you parsing everything into tiny units, but want to parse within the |< differently?
Barnaby Keene
@Southclaws
well, the stuff outside is basically any text, I have no other parsing rules for the "commands" or jq strings or other things
At first I used a naive delimiter, but that would allow x |< y |<, which breaks my grammar rules. I want to enforce that once you hit a |<, the only possible way of closing that expression is with a >|.
The "expression" itself is just free text, because I don't plan to write the actual parsers for the special pipes with nom; they are effectively just .+ (in regex lingo).
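The rule described above (after |<, accept opaque free text, and allow nothing but >| as the way out) can be checked directly. This is a std-only sketch, not nom code: `special_pipe` is a hypothetical name, and it returns `(remaining_input, inner_text)` in nom's style. In nom, the core would be roughly `delimited(tag("|<"), take_until(">|"), tag(">|"))`. The extra `contains("|<")` check is what rejects a second opener before the closer.

```rust
/// Parses `|< free text >|`, returning (remaining_input, inner_text).
pub fn special_pipe(input: &str) -> Result<(&str, &str), String> {
    // the opening tag is mandatory (leading spaces are skipped)
    let body = input
        .trim_start()
        .strip_prefix("|<")
        .ok_or("expected `|<`")?;
    // once inside, the only way out is `>|`
    let end = body.find(">|").ok_or("unclosed `|<`: expected `>|`")?;
    let inner = &body[..end];
    // a second opener before the closer is a grammar error (`x |< y |< ...`)
    if inner.contains("|<") {
        return Err("nested `|<` before closing `>|`".to_string());
    }
    // the expression itself is opaque free text (`.+`), but must not be blank
    if inner.trim().is_empty() {
        return Err("empty expression between `|<` and `>|`".to_string());
    }
    Ok((&body[end + 2..], inner.trim()))
}
```

On `|< .jq.expression[].etc >| echo`, this yields `Ok((" echo", ".jq.expression[].etc"))`. Trimming the inner text is a design choice; drop the `trim()` calls if the special-pipe parser (jq, Deno) should see the whitespace.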
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
Yeah, so you can totally do this in nom, but it's not idiomatic to capture free text instead of breaking it down lol
lemme find you an example
Barnaby Keene
@Southclaws
I can't really break it down though, because one of the pipes is a JavaScript expression, so I can't really write an entire JS parser in nom because Deno is doing that bit
but it sounds like I'm on the right track, thanks for helping out!
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
so the basic idea is you need to write two parsers and combine them: One parser that captures any token EXCEPT |<, and one for the inside of your |<>| brace boi
Barnaby Keene
@Southclaws
ah okay, that makes sense!
take_until1 right?
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
I think so, except I'm not sure whether take_until1 only works on individual bytes or whether you can pass it not(tag("|<"))
Either way, you need to make a parser that won't 'accept' (consume) the |< from the input stream.
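Here is a sketch of the "outside" parser in plain std Rust rather than nom. It stops just before `|<` without consuming it, which is the property described above. For comparison, nom's `take_until` / `take_until1` take a literal pattern such as `"|<"`, not a parser like `not(tag("|<"))`, and in `complete` mode they return an error when the pattern never appears. This sketch instead treats the rest of the line as plain text in that case. `plain_text` is a hypothetical name.

```rust
/// Consumes plain text up to, but not including, the next `|<`.
pub fn plain_text(input: &str) -> Result<(&str, &str), String> {
    // no `|<` left: the rest of the line is plain text
    let end = input.find("|<").unwrap_or(input.len());
    // refuse to succeed without consuming anything, so a loop that alternates
    // this with the `|<` parser always makes progress
    if end == 0 {
        return Err("no plain text before `|<`".to_string());
    }
    // split_at leaves the `|<` at the front of the remaining input
    let (text, rest) = input.split_at(end);
    Ok((rest, text))
}
```

`plain_text("curl some.api |< .jq >| echo")` returns `Ok(("|< .jq >| echo", "curl some.api "))`, so the next parser still sees the `|<`.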
Barnaby Keene
@Southclaws
makes sense, yeah
So "anything" was wrong; it should have been "anything except this closing tag".
thanks for the help, it makes a lot more sense now!
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
exactly
and then your second parser is one that only works if the content is wrapped in your brace boiz, like this
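use nom::{
    bytes::complete::tag,
    character::complete::multispace0,
    multi::fold_many1,
    sequence::tuple,
    IResult,
};

// `AST` and `parse_bit_of_syntax` belong to the rest of the grammar and are not shown here.
fn parse_tuple(i: &str) -> IResult<&str, AST> {
    // optional leading whitespace, then the mandatory opening |<
    let (i, _) = multispace0(i)?;
    let (i, _) = tag("|<")(i)?;

    // one or more inner items, each optionally padded with whitespace
    let (i, elements) = fold_many1(
        tuple((multispace0, parse_bit_of_syntax, multispace0)),
        Vec::new(), // nom 6 signature; nom 7 takes an init function instead, e.g. `Vec::new`
        |mut acc, (_, thing, _)| {
            acc.push(Box::new(thing));
            acc
        },
    )(i)?;
    // the only way to close the expression is >|
    let (i, _) = tuple((multispace0, tag(">|")))(i)?;

    Ok((i, AST::Tuple { inners: elements }))
}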
(You could also use delimited)
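`delimited` in nom runs three parsers in sequence and keeps only the middle result. To show that shape without depending on nom, here is a hand-rolled std-only version. The `tag`, `take_until` and `delimited` functions below are simplified re-implementations written for this sketch: same idea as nom's, not its actual signatures.

```rust
/// Result convention borrowed from nom: Ok((remaining_input, output)).
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Matches the literal `t` and returns it.
fn tag<'a>(t: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |i: &'a str| match i.strip_prefix(t) {
        Some(rest) => Ok((rest, &i[..t.len()])),
        None => Err(format!("expected {:?}", t)),
    }
}

/// Returns everything before the first `pat`, leaving `pat` in the input.
fn take_until<'a>(pat: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |i: &'a str| match i.find(pat) {
        Some(n) => Ok((&i[n..], &i[..n])),
        None => Err(format!("{:?} not found", pat)),
    }
}

/// Runs open, inner, close in sequence and keeps only inner's output.
fn delimited<'a, O1, O2, O3>(
    open: impl Fn(&'a str) -> PResult<'a, O1>,
    inner: impl Fn(&'a str) -> PResult<'a, O2>,
    close: impl Fn(&'a str) -> PResult<'a, O3>,
) -> impl Fn(&'a str) -> PResult<'a, O2> {
    move |i: &'a str| {
        let (i, _) = open(i)?;
        let (i, out) = inner(i)?;
        let (i, _) = close(i)?;
        Ok((i, out))
    }
}
```

Calling `delimited(tag("|<"), take_until(">|"), tag(">|"))` on `"|< .jq >| echo"` yields `Ok((" echo", " .jq "))`. On its own, this composition accepts `|< y |< z >|` and returns `" y |< z "`. Rejecting a second `|<` needs an explicit check.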
And then you combine the two parsers (the first one, and the one for your braces) with alt
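This puts the two halves together. nom's `alt` takes a tuple of parsers and returns the result of the first one that succeeds on the same input. The std-only sketch below imitates that with a two-way `alt2`, a hypothetical helper rather than nom's API, and loops until the line is consumed. The output is a list of segments; `Segment`, `command`, `special_pipe` and `pipeline` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
pub enum Segment<'a> {
    Command(&'a str),
    SpecialPipe(&'a str),
}

type PResult<'a, O> = Result<(&'a str, O), String>;

/// Parser 1: plain text up to (not including) `|<`; fails if it would be empty.
fn command(i: &str) -> PResult<'_, Segment<'_>> {
    let end = i.find("|<").unwrap_or(i.len());
    if end == 0 {
        return Err("expected command text".to_string());
    }
    Ok((&i[end..], Segment::Command(&i[..end])))
}

/// Parser 2: `|< ... >|`, inner text kept verbatim.
fn special_pipe(i: &str) -> PResult<'_, Segment<'_>> {
    let body = i.strip_prefix("|<").ok_or("expected `|<`")?;
    let end = body.find(">|").ok_or("unclosed `|<`: expected `>|`")?;
    Ok((&body[end + 2..], Segment::SpecialPipe(&body[..end])))
}

/// Two-way `alt`: try `first`, and on failure try `second` on the SAME input.
fn alt2<'a, O>(
    first: impl Fn(&'a str) -> PResult<'a, O>,
    second: impl Fn(&'a str) -> PResult<'a, O>,
    i: &'a str,
) -> PResult<'a, O> {
    first(i).or_else(|_| second(i))
}

/// Repeats `alt2(special_pipe, command)` until the input is used up.
pub fn pipeline(mut i: &str) -> Result<Vec<Segment<'_>>, String> {
    let mut segments = Vec::new();
    while !i.is_empty() {
        let (rest, segment) = alt2(special_pipe, command, i)?;
        segments.push(segment);
        i = rest;
    }
    Ok(segments)
}
```

With this ordering, `x |< y |<` is rejected, but the error comes from the fallback `command` parser rather than from the unclosed `|<`. In nom, wrapping the part after `|<` in `cut` stops `alt` from trying other branches, so the more useful "unclosed" error would surface instead.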
Barnaby Keene
@Southclaws
Thanks, that's really useful!
Atmaram Naik
@atmnk

I was wondering if we could have a derive macro for AST elements, like:
```
#[derive(Parser,Debug,PartialEq)]
#[nom(tuple((ty,tag("=>"),ny,tag("=>"),gy)))]
pub struct Struct {
    ty: ty::Type,
    ny: ty::Type,
    gy: ty::Type,
}
```

I have made a PoC for deriving on simple structs and unit-based enums,
and I have opened a feature request for it.
I wanted to check whether it would be worth the effort.
望将
@CSUwangj
Hi, I recently started trying to use nom to parse markdown and wanted to get some ideas from pandoc, but it has functions like notAfterString that I find hard to implement with nom. Is anyone familiar with pandoc/Haskell, or able to point me in the right direction on how to handle inline emphasis and strong emphasis?
For example, a * foo bar* should be parsed as <p>a * foo bar*</p>, not as <p>a <em>foo bar</em></p>.
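CommonMark specifies this case with "flanking" rules. A single `*` can open emphasis only if it is left-flanking: not followed by whitespace, and either not followed by punctuation or preceded by whitespace/punctuation. It can close only if it is right-flanking, the mirror image. The start and end of a line count as whitespace. A simplified std-only check follows. It assumes ASCII punctuation only (the spec uses Unicode punctuation), single-character runs, and no delimiter-stack processing, so it only decides whether a given `*` *may* open or close. The function names are hypothetical.

```rust
/// Line start/end (None) counts as whitespace for flanking purposes.
fn is_ws(c: Option<char>) -> bool {
    c.map_or(true, char::is_whitespace)
}

/// ASCII-only approximation of "punctuation character".
fn is_punct(c: Option<char>) -> bool {
    c.map_or(false, |c| c.is_ascii_punctuation())
}

/// Left-flanking: not followed by whitespace, and either not followed by
/// punctuation or preceded by whitespace/punctuation.
pub fn left_flanking(before: Option<char>, after: Option<char>) -> bool {
    !is_ws(after) && (!is_punct(after) || is_ws(before) || is_punct(before))
}

/// Right-flanking: the mirror image of left-flanking.
pub fn right_flanking(before: Option<char>, after: Option<char>) -> bool {
    !is_ws(before) && (!is_punct(before) || is_ws(after) || is_punct(after))
}

/// Can the single `*` at byte offset `idx` of `line` open emphasis?
pub fn star_can_open(line: &str, idx: usize) -> bool {
    let before = line[..idx].chars().next_back();
    let after = line[idx + 1..].chars().next();
    left_flanking(before, after)
}
```

In `a * foo bar*`, the first `*` is followed by a space, so it cannot open. The last `*` could close, but nothing opened, so both stay literal and the result is `<p>a * foo bar*</p>`.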
Sean Carroll
@seancarroll

I'm new to nom and have also been attempting to build a markdown parser with it. I'm sure the following is not entirely correct or CommonMark-compliant, as I'm still learning, but my first crack at it was:

fn italics(i: &str) -> IResult<&str, String> {
    let (i, _) = tag("*")(i)?;
    let (i, _) = not(tag(" "))(i)?;
    map(
        terminated(many0(is_not("*")), tag("*")),
        |vec| vec.join(""),
    )(i)
}

Based on a few tests, this appears to correctly parse *foo bar* while failing on * foo bar*.
You would still need a paragraph parser to parse the entire a * foo bar*.
I hope this helps in some way. I'm sure someone more experienced can provide a more robust or correct parser. Would benefit me as well :)

望将
@CSUwangj
Ah I see, that's what I did before, but many test cases failed with this simple implementation.
I've implemented a parser for fenced code blocks that passes most test cases; FYI, it's at https://github.com/CSUwangj/pure-markdown/blob/master/src/block/fenced_code.rs#L55
Sean Carroll
@seancarroll
I would imagine so. It's definitely an incomplete implementation.
望将
@CSUwangj
I wrote a parser combinator, take_except, that takes bytes until a given parser succeeds, but this approach fails on emphasis and strong emphasis.
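A `take_except`-style combinator, as described, can be sketched in std-only Rust by trying the stop parser at every character boundary. This is a guess at the shape, not the poster's actual code. `take_except` and the example `digit` stop parser are hypothetical, and results use nom's `(remaining, output)` order. The sketch also hints at why this tends to break down for emphasis: it always stops at the *first* position where the stop parser succeeds, while whether a `*` really closes emphasis usually depends on surrounding context (flanking, nesting).

```rust
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Takes input up to the first position where `stop` would succeed, without
/// consuming what `stop` matches. If `stop` never matches, takes everything.
pub fn take_except<'a, O>(
    input: &'a str,
    stop: impl Fn(&'a str) -> PResult<'a, O>,
) -> PResult<'a, &'a str> {
    for (idx, _) in input.char_indices() {
        if stop(&input[idx..]).is_ok() {
            return Ok((&input[idx..], &input[..idx]));
        }
    }
    Ok(("", input))
}

/// Example stop parser (hypothetical): a single ASCII digit.
pub fn digit(i: &str) -> PResult<'_, char> {
    match i.chars().next() {
        Some(c) if c.is_ascii_digit() => Ok((&i[1..], c)),
        _ => Err(format!("expected digit at {:?}", i)),
    }
}
```

`take_except("abc123", digit)` gives `Ok(("123", "abc"))`, leaving the digit in the input for the next parser.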
Sean Carroll
@seancarroll
As a novice at writing parsers, I find emphasis particularly tricky, especially when trying to support strong emphasis.
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
You might be able to use take_until instead of many0 on line 5 (the terminated(...) line) too.
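Following the `take_until` suggestion, here is a std-only sketch of the same `italics` rule. It returns a borrowed slice instead of building a `String` from a `Vec`, which is roughly what `preceded(tag("*"), terminated(take_until("*"), tag("*")))` does in nom, and it keeps the `not(tag(" "))` check. It is as simplistic as the original and will not handle nesting or the full CommonMark rules. The error type and messages are arbitrary choices for the sketch.

```rust
/// `*text*` -> (remaining_input, text), borrowing the text instead of copying it.
pub fn italics(i: &str) -> Result<(&str, &str), &'static str> {
    // opening '*'
    let rest = i.strip_prefix('*').ok_or("expected opening `*`")?;
    // like `not(tag(" "))`: peek at the next char without consuming it
    if rest.starts_with(' ') {
        return Err("`*` followed by a space cannot open emphasis");
    }
    // like `terminated(take_until("*"), tag("*"))`: one slice, no Vec + join
    let end = rest.find('*').ok_or("missing closing `*`")?;
    if end == 0 {
        return Err("empty emphasis");
    }
    Ok((&rest[end + 1..], &rest[..end]))
}
```

`italics("*foo bar* baz")` returns `Ok((" baz", "foo bar"))`, while `* foo bar*` is rejected.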
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
One last general comment: capturing a ton of bytes inside parsers (i.e. terminated(many0(is_not("*")), tag("*"))) is usually frowned upon, and rarely what you want for languages that can 'nest' features/syntax in a tree. In this case, you want to recurse back into some parser instead of consuming the bytes.
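"Recurse back into some parser" can look like this std-only sketch. Emphasis (`*...*`) and a link-like bracket construct (`[...]`) both call back into the same `parse_inlines` function, so each can contain the other. The `Inline` type, the `[...]` construct and the function names are hypothetical, and the sketch ignores CommonMark's flanking and precedence rules. Its only point is the recursion: the closer of the enclosing construct is passed down, so each level knows where it ends.

```rust
#[derive(Debug, PartialEq)]
pub enum Inline {
    Text(String),
    Emph(Vec<Inline>),
    Link(Vec<Inline>),
}

/// Moves any pending plain text into the output as a Text node.
fn flush(text: &mut String, out: &mut Vec<Inline>) {
    if !text.is_empty() {
        out.push(Inline::Text(std::mem::take(text)));
    }
}

/// Parses inline content until `closer` (or end of input if `closer` is None).
/// `*...*` and `[...]` recurse back into this same function, so they can nest.
pub fn parse_inlines(mut input: &str, closer: Option<char>) -> Result<(&str, Vec<Inline>), String> {
    let mut out = Vec::new();
    let mut text = String::new();
    loop {
        let mut chars = input.chars();
        match chars.next() {
            // end of input is only fine at the top level
            None => match closer {
                Some(c) => return Err(format!("missing closing `{}`", c)),
                None => break,
            },
            // the closer of the enclosing construct ends this level
            Some(c) if Some(c) == closer => {
                input = chars.as_str();
                break;
            }
            // an opener: recurse instead of grabbing raw bytes
            Some(open @ ('*' | '[')) => {
                flush(&mut text, &mut out);
                let close = if open == '*' { '*' } else { ']' };
                let (rest, inner) = parse_inlines(chars.as_str(), Some(close))?;
                out.push(if open == '*' { Inline::Emph(inner) } else { Inline::Link(inner) });
                input = rest;
            }
            Some(c) => {
                text.push(c);
                input = chars.as_str();
            }
        }
    }
    flush(&mut text, &mut out);
    Ok((input, out))
}
```

For `a *b [c *d*]* e`, this produces `Text("a ")`, then `Emph` containing `Text("b ")` and a `Link` (which itself contains an `Emph`), then `Text(" e")`. An unclosed `*abc` is an error.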
Sean Carroll
@seancarroll
Good advice. I have an inline text parser where I believe I'm doing that, if I understand the recommendation, but I didn't think to apply it to emphasis.
twitchyliquid64
@twitchyliquid64:matrix.org
[m]
By way of example, I have a weird parser I built for a language that can represent a sequence of <geometry>, but where an array of geometry (i.e. [4]<geometry>) is also valid. To parse this, I have a nom parser that matches and parses the [4] part, and then recurses to get the nested parse fragment.
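Here is a std-only sketch of that pattern, using a grammar reconstructed from the description: item := "<geometry>" | "[" count "]" item. After matching the `[4]` prefix, the parser recurses for whatever follows, so arrays of arrays come for free. `Item` and `parse_item` are hypothetical, and this is not the original parser.

```rust
#[derive(Debug, PartialEq)]
pub enum Item {
    Geometry,
    Array(usize, Box<Item>),
}

/// item := "<geometry>" | "[" count "]" item
pub fn parse_item(input: &str) -> Result<(&str, Item), String> {
    if let Some(rest) = input.strip_prefix("<geometry>") {
        return Ok((rest, Item::Geometry));
    }
    // `[N]` prefix: parse the count, then recurse for the element type
    let rest = input
        .strip_prefix('[')
        .ok_or_else(|| format!("expected `<geometry>` or `[` at {:?}", input))?;
    let close = rest.find(']').ok_or("missing `]`")?;
    let count = rest[..close].parse::<usize>().map_err(|e| e.to_string())?;
    let (rest, inner) = parse_item(&rest[close + 1..])?;
    Ok((rest, Item::Array(count, Box::new(inner))))
}
```

`parse_item("[2][3]<geometry>")` returns `Array(2, Array(3, Geometry))` with nothing left over.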
Sean Carroll
@seancarroll
@CSUwangj , while doing some research I stumbled upon this. https://github.com/chipsenkbeil/vimwiki-rs. I thought you might find it helpful
Atmaram Naik
@atmnk
use nom_parser_derive::*;
use nom_parser::Parsable;
use nom_parser::ParseResult;
use nom::combinator::{opt};
use nom::sequence::{tuple};
use nom::bytes::complete::{tag};
use nom::character::complete::{ digit1};
#[derive(Parser,PartialEq,Clone,Debug)]
pub enum BinaryExpression{
    #[nom(tuple(field(0),tag(" + "),field(1)))]
    Add(Number,Number),
    #[nom(tuple(field(0),tag(" - "),field(1)))]
    Subtract(Number,Number),
    #[nom(tuple(field(0),tag(" * "),field(1)))]
    Multiply(Number,Number),
    #[nom(tuple(field(0),tag(" / "),field(1)))]
    Divide(Number,Number)
}
impl BinaryExpression {
    /// Evaluates the expression. `Divide` uses integer division (4 / 3 == 1)
    /// and panics if the right-hand side is zero.
    pub fn apply(&self) -> Number {
        match self {
            Self::Add(one, two) => Number(one.0 + two.0),
            Self::Subtract(one, two) => Number(one.0 - two.0),
            Self::Multiply(one, two) => Number(one.0 * two.0),
            Self::Divide(one, two) => Number(one.0 / two.0),
        }
    }
}
#[derive(Parser,PartialEq,Clone,Debug)]
pub struct Number(#[nom(func(parse_isize))]isize);
fn parse_isize<'a>(input: &'a str) -> ParseResult<'a, isize> {
    // optional leading '-' followed by one or more ASCII digits
    let (i, (sign, digits)) = tuple((opt(tag("-")), digit1))(input)?;
    // `digits` is already a &str, so it can be parsed directly (no format! needed);
    // this still panics if the literal does not fit in an isize
    let value: isize = digits.parse().expect("integer literal out of range for isize");
    Ok((i, if sign.is_some() { -value } else { value }))
}
fn main() {
    let (_,expr) = BinaryExpression::parse("1 + 2").unwrap();
    assert_eq!(expr,BinaryExpression::Add(Number(1),Number(2)));
    assert_eq!(expr.apply(),Number(3));

    let (_,expr) = BinaryExpression::parse("1 - 2").unwrap();
    assert_eq!(expr,BinaryExpression::Subtract(Number(1),Number(2)));
    assert_eq!(expr.apply(),Number(-1));

    let (_,expr) = BinaryExpression::parse("4 * 2").unwrap();
    assert_eq!(expr,BinaryExpression::Multiply(Number(4),Number(2)));
    assert_eq!(expr.apply(),Number(8));

    let (_,expr) = BinaryExpression::parse("4 / 3").unwrap();
    assert_eq!(expr,BinaryExpression::Divide(Number(4),Number(3)));
    assert_eq!(expr.apply(),Number(1));

}
I was able to come up with a derive macro to parse ASTs; above is a simple example.
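To see what the derive saves, here is roughly what hand-written equivalents of `Number` and part of `BinaryExpression` could look like in std-only Rust. This is not the macro's actual expansion. The `Parse` trait stands in for `Parsable` and `PResult` for `ParseResult`; both are hypothetical. Only `Add` and `Subtract` are written out.

```rust
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Hypothetical stand-in for the `Parsable` trait: each AST type parses itself.
pub trait Parse: Sized {
    fn parse(input: &str) -> PResult<'_, Self>;
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Number(pub isize);

impl Parse for Number {
    // hand-written version of `#[nom(func(parse_isize))]`: optional '-' then digits
    fn parse(input: &str) -> PResult<'_, Self> {
        let sign_len = usize::from(input.starts_with('-'));
        let digits = input[sign_len..].bytes().take_while(u8::is_ascii_digit).count();
        if digits == 0 {
            return Err(format!("expected a number at {:?}", input));
        }
        let end = sign_len + digits;
        let value = input[..end].parse::<isize>().map_err(|e| e.to_string())?;
        Ok((&input[end..], Number(value)))
    }
}

#[derive(Debug, PartialEq)]
pub enum BinaryExpression {
    Add(Number, Number),
    Subtract(Number, Number),
    // Multiply and Divide would follow exactly the same pattern
}

impl Parse for BinaryExpression {
    // hand-written version of `tuple(field(0), tag(" + "), field(1))` and friends
    fn parse(input: &str) -> PResult<'_, Self> {
        let (rest, lhs) = Number::parse(input)?;
        if let Some(after) = rest.strip_prefix(" + ") {
            let (rest, rhs) = Number::parse(after)?;
            return Ok((rest, BinaryExpression::Add(lhs, rhs)));
        }
        if let Some(after) = rest.strip_prefix(" - ") {
            let (rest, rhs) = Number::parse(after)?;
            return Ok((rest, BinaryExpression::Subtract(lhs, rhs)));
        }
        Err(format!("expected ` + ` or ` - ` at {:?}", rest))
    }
}
```

Each variant costs a few lines of boilerplate by hand, which is what a one-line `#[nom(...)]` attribute replaces.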
Atmaram Naik
@atmnk
use nom_parser_derive::*;
use nom_parser::Parsable;
use nom_parser::ParseResult;
use nom::sequence::{tuple};
use nom::branch::alt;
use nom::character::complete::{alpha1, alphanumeric1};
use nom::multi::many0;
use nom::bytes::complete::tag;

#[derive(Parser,PartialEq,Debug,Clone)]
#[nom(tuple(tag("fn"),field(identifier),tag("("),field(args),tag(")"),tag("{"),field(statements),tag("}")))]
pub struct Function{
    identifier:Identifier,
    #[nom(separated_list0(tag(","),field))]
    args:Vec<Argument>,
    #[nom(many0(terminated(field,tag(";"))))]
    statements:Vec<Statement>
}
#[derive(Parser,PartialEq,Debug,Clone)]
#[nom(tuple(field(name),field(field_type)))]
pub struct Argument{
    name:Identifier,
    #[nom(opt(tuple(tag(":"),field)))]
    field_type:Option<Type>
}
#[derive(Parser,PartialEq,Debug,Clone)]
pub enum Statement{
    #[nom(tag("Hello"))]
    Hello,
    #[nom(tag("World"))]
    World
}
#[derive(Parser,PartialEq,Debug,Clone)]
pub struct Identifier(#[nom(func(ident))]String);
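pub fn ident<'a>(input: &'a str) -> ParseResult<'a,String> {
    // first char: a letter or '_'; the rest: letters, digits or '_'
    // (the previous version stopped at the first '_' after the first char)
    nom::combinator::map(
        tuple((
            alt((alpha1, tag("_"))),
            many0(alt((alphanumeric1, tag("_")))),
        )),
        |(first, rest)| format!("{}{}", first, rest.join("")),
    )(input)
}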
#[derive(Parser,PartialEq,Debug,Clone)]
pub enum Type{
    #[nom(tag("Int"))]
    Int,
    #[nom(tag("String"))]
    String
}
fn main() {
    let (_,val) = Function::parse("fn hello( a : String , b ){ Hello; World; } ").unwrap();
    assert_eq!(val, Function {
        identifier: Identifier("hello".to_string()),
        args: vec![
            Argument {
                name: Identifier("a".to_string()),
                field_type: Some(Type::String),
            },
            Argument {
                name: Identifier("b".to_string()),
                field_type: None,
            },
        ],
        statements: vec![Statement::Hello, Statement::World],
    });
}
Here is an example of how much code this saves: a simple function parser.
望将
@CSUwangj

> @CSUwangj , while doing some research I stumbled upon this. https://github.com/chipsenkbeil/vimwiki-rs. I thought you might find it helpful

thanks! I'll check it