    implisci
    @implisci
    How does one create a function that creates and returns a Parser? The function would take (say) a String argument that would be used in the body of the returned parser. The argument would appear within P(...) as an interpolated string variable, for example P(s"$foo"), where foo is the String passed to the parser-creating function. Thus, if a parser exists for detecting the text within an HTML tag such as <p> some text </p>, essentially the same function could return parsers that work for other tags such as <li> some text </li>. In this case the function could take 2 string params: one for the begin tag and another for the end tag.
    implisci
    @implisci

    Adding the parameters to the parser method as implicits does the trick (it doesn't compile without the implicit). I'd be curious to know how this works.

    def foo[_: P](implicit begin: String, end: String) = P(s"$begin" ~ CharIn("a-z").rep(0).! ~ s"$end" ~ End)
    fastparse.parse("<h1>blah</h1>",foo(_,"<h1>","</h1>")) 
    // Parsed[String] = Success("blah", 13)

    Interestingly, a further simplification also works. I thought it would lead to some eager evaluation or compile error.

    def h1parser(s:String) = parse(s,foo(_,"<h1>","</h1>"))
    h1parser("<h1>blah</h1>") 
    // Parsed[String] = Success("blah", 13)
    implisci
    @implisci

    One does not need implicit for begin and end if the call format is modified slightly, as below. String interpolation of the parameters is not needed either.

    def foo[_: P](begin: String, end: String) = P(begin ~ CharIn("a-z").rep(0).! ~  end ~ End)
    def h1parser(s:String) = fastparse.parse(s, foo("<h1>", "</h1>")(_))

    I assume h1parser is compiled and will not need to be reinitialized. The foo parser, I think, needs to be reinitialized for each set of begin, end parameters (when it is called). Hence, one should create all the parsers one needs beforehand (or memoize them) to avoid the initialization cost. Correct?

    Li Haoyi
    @lihaoyi
    parsers are just methods; there is nothing to initialize and no initialization cost
    (this wasn't the case in fastparse 1.x, but it is in fastparse 2.x)
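On why the implicit version compiles: `[_: P]` is a context bound, i.e. an implicit parameter that carries the parsing context. When the method already has an implicit parameter list, Scala 2 puts that evidence parameter into the same list, at the front, which is what the working `foo(_, "<h1>", "</h1>")` call shows (the `_` fills the context slot). The explicit two-list form is the safer one, since an implicit `String` parameter will pick up any implicit `String` in scope. For the original question, a factory that takes only the tag name could look like this minimal sketch; the names `tagged`, `h1`, `li` and the "any text up to the next `<`" body rule are illustrative assumptions, and fastparse 2.x is assumed to be on the classpath:

```scala
import fastparse._, NoWhitespace._

object Tags {
  // Parser "factory": `name` is an ordinary argument; the parsing context
  // comes from the [_: P] context bound. Nothing is built ahead of time.
  def tagged[_: P](name: String): P[String] =
    P("<" ~ name ~ ">" ~ CharsWhile(_ != '<', 0).! ~ "</" ~ name ~ ">")

  // Concrete parsers are just calls with different arguments.
  def h1[_: P]: P[String] = P(tagged("h1") ~ End)
  def li[_: P]: P[String] = P(tagged("li") ~ End)
}
```

With this, `fastparse.parse("<li> some text </li>", Tags.li(_))` should succeed with the value `" some text "`, while a mismatched closing tag fails.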
    implisci
    @implisci
    Is there a way to feed StringIn with a list of strings? It supposedly(?) takes in String* but one gets "Function can only accept constant singleton type" . The docs say that the list of strings fed to StringIn is backed up by a Trie. Wouldn't the advantage of StringIn versus using | repeatedly be larger for many strings in the input to StringIn?
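`StringIn` is a macro that works on string literals known at compile time, which is presumably where the "constant singleton type" error comes from. One runtime alternative is to chain the words with `|`, as in this sketch; it gives up the trie, so for a long word list it will likely be slower than `StringIn`. The helper names `oneOf` and `firstOf` are hypothetical, and sorting longest-first is an assumption made so that a word is never shadowed by one of its own prefixes:

```scala
import fastparse._, NoWhitespace._

object Keywords {
  // Tries each word in turn with |; the first word that matches wins.
  private def firstOf[_: P](words: List[String]): P[String] = words match {
    case Nil       => Fail
    case w :: rest => P(P(w).! | firstOf(rest))
  }

  // Longer words first, so "int" is tried before its prefix "in".
  def oneOf[_: P](words: Seq[String]): P[String] =
    firstOf(words.sortBy(w => -w.length).toList)
}
```

Usage: `parse("int", Keywords.oneOf(Seq("in", "int", "if"))(_))`.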
    urbanchr
    @urbanchr

    Is there an easy way to transform a Seq into a List when using .tupled? One thing I really love about fastparse is that one can easily write a .map using .tupled, for example:

    def Exp[_ : P]: P[Exp] = P(  P("if" ~ BExp ~ "then" ~ Exp ~ "else" ~ Exp).map{If.tupled}  .... )

    Unfortunately, my ASTs often take lists as arguments, for example

    case class Call(name: String, args: List[Exp]) extends Exp

    Now the problem is that I cannot write the parser for this simply as

    P (Ident ~ "(" ~ Exp.rep(sep=",")  ~ ")").map{Call.tupled} ...

    This will complain that .rep returns a Seq while Call expects a list. Of course I can write

    P (Ident ~ "(" ~ Exp.rep(sep=",")  ~ ")").map{ case (l, r) => Call(l, r.toList)}

    but that seems overly verbose. Is there any (simple) way to make the transition from Seq to List automatic behind the scenes?

    urbanchr
    @urbanchr
    Update: I can see that Haoyi Li in Hands-on-Scala uses Seq throughout to get the convenient syntax, but I would prefer to stick with Lists if possible. He defines Call as
    case class Call(expr: Expr, args: Seq[Expr]) extends Expr
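One way to keep `List`s in the AST and still use `.tupled` is to convert the `Seq` inside the parser, before the tuple is built, so the tuple already has the shape `Call` expects. A minimal sketch, assuming a toy grammar of lowercase identifiers and calls; `Var`, `ident`, `args` and `exp` are made up for the example:

```scala
import fastparse._, NoWhitespace._

sealed trait Exp
case class Var(name: String) extends Exp
case class Call(name: String, args: List[Exp]) extends Exp

object CallParser {
  def ident[_: P]: P[String] = P(CharIn("a-z").rep(1).!)

  // Convert here, so the tuple below is (String, List[Exp]).
  def args[_: P]: P[List[Exp]] = P(exp.rep(sep = ",").map(_.toList))

  def call[_: P]: P[Exp] = P(ident ~ "(" ~ args ~ ")").map(Call.tupled)

  def exp[_: P]: P[Exp] = P(call | ident.map(Var(_)))
}
```

The conversion cost is paid once per argument list, and the `.map(Call.tupled)` line stays as short as in the `If` example.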
    Mike Limansky
    @limansky

    Hi! I'm trying to make parsing fail based on some runtime conditions. As I understand it, this can be done using flatMap with Pass/Fail, but I get a compiler exception on this piece of code:

    import fastparse.NoWhitespace._
    import fastparse.Parsed.{Failure, Success}
    import fastparse._
    
    sealed trait Expr
    case class Foo(s: String) extends Expr
    case class Bar(s: String) extends Expr
    
    object Parser {
      val foos = Set("aaa", "bbb", "ccc")
      val bars = Set("ddd", "eee", "123")
    
      def p[_: P]: P[Expr] = P("x=\"" ~ CharPred(_ != '\"').rep.! ~ "\"")
        .flatMap(s =>
          if (foos.contains(s)) Pass(Foo(s))
          else if (bars.contains(s)) Pass(Bar(s))
          else Fail(s"unknown $s")
        )
    
      def parseExpr(s: String): Either[String, Expr] = {
        parse(s, p(_)) match {
          case Success(value, _) => Right(value)
          case Failure(str, i, extra) => Left(s"unable to parse, $str")
        }
      }
    
    }

    Am I doing something wrong?

    Mike Limansky
    @limansky
    Oh, it looks like I've run into a known problem: lihaoyi/fastparse#217. The workaround from the issue comment works fine.
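If the workaround from that issue is not an option, another route avoids `flatMap` altogether: classify in a plain function returning `Option`, then use `.filter` and `.map`. This is a swapped-in technique, not the workaround from the issue, and it changes the error you get (the filter's failure instead of `unknown $s`). `classify` is a hypothetical helper:

```scala
import fastparse._, NoWhitespace._

sealed trait Expr
case class Foo(s: String) extends Expr
case class Bar(s: String) extends Expr

object ExprParser {
  val foos = Set("aaa", "bbb", "ccc")
  val bars = Set("ddd", "eee", "123")

  // Plain function: None means "this value should make the parse fail".
  def classify(s: String): Option[Expr] =
    if (foos.contains(s)) Some(Foo(s))
    else if (bars.contains(s)) Some(Bar(s))
    else None

  def p[_: P]: P[Expr] =
    P("x=\"" ~ CharPred(_ != '"').rep.! ~ "\"")
      .map(classify(_))
      .filter(_.isDefined) // reject unknown values at parse time
      .map(_.get)          // safe: only defined values get here
}
```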
    Daniel Joanes
    @djoanes
    How can I parse 'field' > 10 and ('field' = 5 or 'field' = 8) into AND(Cond("field", ">", 10), OR(Cond("field", "=", 5), Cond("field", "=", 8)))?
    The brackets are tripping me up.
    Are there any examples I can look at?
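A common way to handle the brackets is one rule per precedence level: `or` binds loosest, then `and`, then an atom, where an atom is either a single condition or a bracketed expression that starts again from the loosest level. A minimal sketch, assuming quoted field names, the operators `>`, `<`, `=`, integer values and left-associative `and`/`or`; the case class names are illustrative:

```scala
import fastparse._, SingleLineWhitespace._

sealed trait Filter
case class Cond(field: String, op: String, value: Int) extends Filter
case class And(left: Filter, right: Filter) extends Filter
case class Or(left: Filter, right: Filter) extends Filter

object FilterParser {
  // 'field': ~~ stops whitespace from being skipped inside the quotes
  def field[_: P]: P[String] = P("'" ~~ CharsWhile(_ != '\'').! ~~ "'")
  def op[_: P]: P[String] = P(CharIn("<>=").!)
  def number[_: P]: P[Int] = P(CharsWhileIn("0-9").!.map(_.toInt))
  def cond[_: P]: P[Filter] =
    P(field ~ op ~ number).map { case (f, o, n) => Cond(f, o, n) }

  // A bracketed expression restarts at the loosest-binding level (or).
  def atom[_: P]: P[Filter] = P(cond | "(" ~ orExpr ~ ")")

  // "and" binds tighter than "or"; both fold left over repetitions.
  def andExpr[_: P]: P[Filter] =
    P(atom ~ ("and" ~ atom).rep).map { case (h, t) => t.foldLeft(h)((l, r) => And(l, r)) }
  def orExpr[_: P]: P[Filter] =
    P(andExpr ~ ("or" ~ andExpr).rep).map { case (h, t) => t.foldLeft(h)((l, r) => Or(l, r)) }

  def query[_: P]: P[Filter] = P(orExpr ~ End)
}
```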
    Diego Colombo
    @diegocolombo
    Hi. I'm having some trouble using MultiLineWhitespace. It doesn't consume the \n when there is no space between it and an expression. E.g. "STARTS(customers.note,\"b\") \nAND \nENDS(customers.code, \"a\")" works fine, but "STARTS(customers.note,\"b\")\nAND \nENDS(customers.code, \"a\")" doesn't. Is this the expected behavior?
    Mike Limansky
    @limansky

    Hi. Assume I have a function def combine[_: P, T](a: T, b: T): P[T] which can return Pass or Fail depending on its args. Is it possible to convert it to def combineM[_: P, T]: P[(T, T) => T] with the same logic, but inside the parser? I'm trying to refactor from:

    def plus[_: P]: P[Unit]
    def p[_: P] = P(x ~ plus ~x).flatMap { case(a, b) => combine(a,b) }

    to

    def plus[_:P, Expr]: P[(Expr, Expr) => Expr] = ???
    def p[_: P] = P(x ~ plus ~ x).flatMap { case (a, p, b) => p(a,b) }
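A plain `(T, T) => T` cannot itself make the parse fail, but the operator parser can return a function whose result says whether the combination is acceptable, and the decision can then be made with `.filter` (avoiding `flatMap`). A sketch under those assumptions, with a hypothetical overflow-checked `+` on `Int` standing in for `combine` and `Option` as the accept/reject signal:

```scala
import fastparse._, NoWhitespace._

object OpParser {
  // None means "reject this combination"
  type Combine = (Int, Int) => Option[Int]

  // Hypothetical runtime condition: fail on Int overflow
  val addChecked: Combine = (a, b) => {
    val r = a.toLong + b
    if (r.isValidInt) Some(r.toInt) else None
  }

  def num[_: P]: P[Int] = P(CharsWhileIn("0-9").!.map(_.toInt))

  // The operator parser yields the combining function itself
  def plus[_: P]: P[Combine] = P("+").map(_ => addChecked)

  def sum[_: P]: P[Int] =
    P(num ~ plus ~ num ~ End)
      .map { case (a, f, b) => f(a, b) }
      .filter(_.isDefined)
      .map(_.get)
}
```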
    Agam Brahma
    @agam
    This might be very noob-ish, but: I'm stringing together a simple sexp-parser for something, and I'm trying to understand why, in the last one below, I'm getting an error about it having a Seq[String] and expecting a List[SchemeVal], when I've very explicitly annotated exprParser as being P[SchemeVal]:
      def stringParser[_: P] = P( "\"" ~ CharsWhile(_ != '\"').! ~ "\"" ).map( SchemeString(_) )
    
      def atomParser[_: P] = P( CharIn("a-zA-Z!#$%&|*+-/:<=>?@^_~") ~ CharIn("0-9a-zA-Z!#$%&|*+-/:<=>?@^_~").rep ).!.map( SchemeAtom(_) )
    
      def numberParser[_: P] = P( CharIn("0-9").rep.! ).map( (x) => SchemeNumber(x.toInt) )
    
      def exprParser[_: P]: P[SchemeVal] = P( atomParser | stringParser | numberParser | "(" ~ listParser ~ ")" )
    
      def listParser[_: P]: P[SchemeList] = P( exprParser.!.rep(sep=" ") ).map( SchemeList(_) )
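The `Seq[String]` comes from the `.!` in `listParser`: `.!` discards the parsed `SchemeVal` and returns the matched text instead, so `.rep` collects `String`s. Dropping `.!` and converting the `Seq` to a `List` should fix the types. A sketch with assumed definitions for the `SchemeVal` hierarchy (not shown in the question); it also lets strings be empty, requires at least one digit for numbers, and swaps the `CharIn` symbol list for a `CharPred`, because inside `CharIn` a `-` between two characters most likely denotes a range (`+-/` would cover `+ , - . /`):

```scala
import fastparse._, NoWhitespace._

// Assumed AST; the question does not show these definitions.
sealed trait SchemeVal
case class SchemeAtom(name: String) extends SchemeVal
case class SchemeString(value: String) extends SchemeVal
case class SchemeNumber(value: Int) extends SchemeVal
case class SchemeList(items: List[SchemeVal]) extends SchemeVal

object SchemeParser {
  private val symbolChars = "!#$%&|*+-/:<=>?@^_~"
  private def isSymbol(c: Char): Boolean = symbolChars.indexOf(c) >= 0

  // CharsWhile(..., 0) also accepts the empty string ""
  def stringParser[_: P]: P[SchemeVal] =
    P("\"" ~ CharsWhile(_ != '"', 0).! ~ "\"").map(SchemeString(_))

  def atomParser[_: P]: P[SchemeVal] =
    P(CharPred(c => c.isLetter || isSymbol(c)) ~ CharsWhile(c => c.isLetterOrDigit || isSymbol(c), 0)).!
      .map(SchemeAtom(_))

  // At least one digit, so "".toInt is never attempted
  def numberParser[_: P]: P[SchemeVal] = P(CharsWhileIn("0-9").!).map(s => SchemeNumber(s.toInt))

  def exprParser[_: P]: P[SchemeVal] =
    P(atomParser | stringParser | numberParser | "(" ~ listParser ~ ")")

  // No .! here: it would replace each parsed SchemeVal with its raw text
  def listParser[_: P]: P[SchemeList] =
    P(exprParser.rep(sep = " ")).map(xs => SchemeList(xs.toList))
}
```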
    David
    @davoclavo

    I'm building a streaming byte parser and am trying to use fastparse 2.2.2 with geny.Readable, but I can't figure out how to build Byte matchers; I can only find ones for Strings, so...

    Do I have to treat them all as String? Or should I use a version prior to 2.0.4 to use the Byte matchers?

    Thanks a lot for the tips!

    Li Haoyi
    @lihaoyi
    fastparse 2.x only works with strings; 1.x had support for bytes, but there was relatively low usage so I dropped support in 2.x
    avitaleOPEN
    @avitaleOPEN
    Hi. I'm new to parser combinators (though not to parser generators like ANTLR), and I was wondering whether it is possible to create parsers based on island grammars using fastparse, e.g. "match every import statement in a Java file, but ignore the rest". Does fastparse have any feature for working directly with such a thing? Thx!
    avitaleOPEN
    @avitaleOPEN
    I've come to a solution playing around with negative lookahead. Maybe that's the way
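The negative-lookahead idea can look roughly like this: skip one character at a time as long as an import does not start there ("water"), and collect the imports ("islands") in between. This is a naive sketch with made-up names: it knows nothing about comments, string literals or word boundaries (e.g. `reimport x;` would match):

```scala
import fastparse._, NoWhitespace._

object Imports {
  // "import", some spaces or tabs, then everything up to the semicolon
  def importStmt[_: P]: P[String] =
    P("import" ~ CharsWhileIn(" \t") ~ CharsWhile(_ != ';').! ~ ";")

  // Water: one character that is not the start of an import statement
  def water[_: P]: P[Unit] = P(!importStmt ~ AnyChar)

  // Islands separated by water, then whatever water is left at the end
  def imports[_: P]: P[Seq[String]] =
    P((water.rep ~ importStmt).rep ~ water.rep ~ End)
}
```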
    Andrea
    @Andrea
    Hi there: I am sure I am missing something obvious, but I expected the whitespace handling to work in this case. I got around it by using the NoWhitespace option and parsing the spaces myself:
    @ import $ivy.`com.lihaoyi::fastparse:2.2.2`  
    @ import fastparse._ 
    @ import SingleLineWhitespace._ 
    
    @ def number[_: P]: P[Int]       = P(CharIn("0-9").rep(1).!.map(_.toInt)) 
    @ parse(" 22", number(_)) 
    res5: Parsed[Int] = Failure(
      "",
      0,
      Extra(
        IndexedParserInput(" 22"),
        0,
        0,
        ammonite.$sess.cmd5$$$Lambda$2051/468567811@2f20f7ad,
        List()
      )
    )
    @ def numbers[_: P]: P[Seq[Int]] = P(number.rep(2)) 
    @ parse("  22  44", number(_)) 
    res6: Parsed[Int] = Failure("", 0, Extra(IndexedParserInput("  22  44"), 0, 0, ammonite.$sess.cmd6$$$Lambda$2091/1165646637@48cb2d73, List()))
    Li Haoyi
    @lihaoyi
    @Andrea whitespace only applies between things, but not at the beginning or end of the parse
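One way to allow surrounding whitespace, sketched here, is to wrap the parse in `Start ~ ... ~ End`: this relies on the whitespace-aware `~` also consuming whitespace after the zero-width `Start` and before `End`. Separately, with `SingleLineWhitespace` in scope `CharIn("0-9").rep(1)` also skips whitespace between repetitions, so `"22  44"` would be captured as one chunk and `.toInt` would throw; `CharsWhileIn` is a single terminal and avoids that. The object and method names are illustrative:

```scala
import fastparse._, SingleLineWhitespace._

object Numbers {
  // Single terminal: no whitespace can sneak in between digits
  def number[_: P]: P[Int] = P(CharsWhileIn("0-9").!.map(_.toInt))

  // ~ after Start eats leading whitespace; ~ before End eats trailing whitespace
  def single[_: P]: P[Int] = P(Start ~ number ~ End)
  def numbers[_: P]: P[Seq[Int]] = P(Start ~ number.rep(1) ~ End)
}
```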
    Timofey
    @GusevTimofey

    Hi there! What is a way to make the leftBracket and rightBracket optional at the same time:

    def leftBracket[_: P]: P[Unit] = P("(")
    def rightBracket[_: P]: P[Unit] = P(")")
    def parser[_: P]: P[ParsedRule] = P(leftBracket.? ~ parseRule ~ rightBracket.?)

    I want to accept only (text) or text, but right now the parser also accepts (text and text), because each bracket is optional on its own.
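To require both brackets or neither, the usual shape is two alternatives rather than two independent `.?`. A minimal sketch where `rule` is a hypothetical stand-in for `parseRule`:

```scala
import fastparse._, NoWhitespace._

object Bracketed {
  // Hypothetical stand-in for parseRule
  def rule[_: P]: P[String] = P(CharsWhileIn("a-z").!)

  // Both brackets or neither
  def parser[_: P]: P[String] = P("(" ~ rule ~ ")" | rule)

  def whole[_: P]: P[String] = P(parser ~ End)
}
```

If the bracketed branch fails halfway (e.g. on `(text`), `|` backtracks and tries the bare `rule`, which then fails on the `(`.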

    xuanbachle
    @xuanbachle
    How can I get the position of an AST node after parsing? In the scala-parser-combinators library I can use Positional to get line and column information, but it's unclear how to do it in FastParse.
    Li Haoyi
    @lihaoyi
    You can use the Index parser and then store the Int somewhere in your AST node
    xuanbachle
    @xuanbachle
    Could you help give an example of the Index parser? Thank you.
    Li Haoyi
    @lihaoyi
    look at the docs
    xuanbachle
    @xuanbachle
    Ah, sounds good. Thank you.
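A minimal sketch of the `Index` approach: `Index` consumes nothing and returns the current offset, which is stored in the node. Converting the offset to a line and column is done by hand here in a hypothetical `lineCol` helper (1-based), assuming `\n` line endings; `Ident` and the grammar are illustrative:

```scala
import fastparse._, NoWhitespace._

case class Ident(name: String, offset: Int)

object Positions {
  // Index consumes nothing and yields the current offset into the input
  def ident[_: P]: P[Ident] =
    P(Index ~ CharsWhileIn("a-z").!).map { case (i, s) => Ident(s, i) }

  def idents[_: P]: P[Seq[Ident]] = P(ident.rep(sep = " ") ~ End)

  // Hypothetical helper: 1-based (line, column) for an offset
  def lineCol(text: String, offset: Int): (Int, Int) = {
    val before = text.substring(0, offset)
    (before.count(_ == '\n') + 1, offset - before.lastIndexOf('\n'))
  }
}
```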
    Glen Marchesani
    @fizzy33
    @olafurpg any pointers on getting a Scala.js 1.x-ready com.geirsson:fastparse:1.0.0?
    Ólafur Páll Geirsson
    @olafurpg
    @fizzy33 I think it's already published under the org.scalameta organization
    have you checked mvnrepository.com?
    Glen Marchesani
    @fizzy33
    it is only for pre 1.0 scala js
    Glen Marchesani
    @fizzy33
    oh snap, not sure which Maven repo search I used, but obviously not that one. @olafurpg thanks
    that solves my issue
    xraybat
    @xraybat

    hello, trying to (fast)parse "coord 1,2" into 3 parts (with SingleLineWhitespace._). when i try :

    def parser[_: P] = 
      P("coord".!
        ~ CharIn("0-9").rep(exactly=2, sep=",").!)

    i get:

    found 'coord', value = (coord,1,2), index = 9
    value_1 = coord
    value_2 = 1,2

    with both coords in the one value. when i try:

    def parser[_: P] = 
      P("coord".!
        ~ CharIn("0-9").rep(exactly=1, sep=",").!
        ~ CharIn("0-9").rep(exactly=1, sep=",").!)

    i get :

    Expected parser:1:1 / [0-9]:1:8, found ",2"

    what am i doing wrong here?

    Gábor Bakos
    @aborg0

    You were not considering the , separator (with exactly=1 none of them expects separator text). Something like this should work:

        def parser[_: P] = 
          P("coord".!
           ~ CharIn("0-9").rep(exactly=1, sep=",").! ~ ","
           ~ CharIn("0-9").rep(exactly=1, sep=",").!)

    or just

        def parser[_: P] = 
          P("coord".!  ~ CharIn("0-9").! ~ "," ~ CharIn("0-9").!)
    xraybat
    @xraybat
    thanks @aborg0, your solution is much simpler. i thought i was making it more complex than need be. and i should add .rep(1) to handle more than single digits (and an End). so:
    P("coord".! ~ CharIn("0-9").rep(1).! ~ "," ~ CharIn("0-9").rep(1).! ~ End)
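One caveat with the final version: under `SingleLineWhitespace`, `.rep` also skips whitespace between repetitions, so `CharIn("0-9").rep(1).!` accepts `"1 2"` as a single number. A sketch that uses the single-terminal `CharsWhileIn` instead and maps the result to a hypothetical `Coord` case class:

```scala
import fastparse._, SingleLineWhitespace._

case class Coord(x: Int, y: Int)

object CoordParser {
  // Single terminal, so no whitespace is allowed inside a number
  def int[_: P]: P[Int] = P(CharsWhileIn("0-9").!.map(_.toInt))

  // Whitespace is still allowed around "coord" and the comma
  def coord[_: P]: P[Coord] =
    P("coord" ~ int ~ "," ~ int ~ End).map { case (x, y) => Coord(x, y) }
}
```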
    Performant Data
    @performantdata

    I feel like I'm missing something. Maybe I missed it in the docs.

    How do you continue parsing on an InputStream? I have to think that the ParsingRun needs to be saved between calls to parse(), because it must contain at least one character that it has read off the InputStream, which it can't be sure that it can "put back" (via reset()). parse() must need that character in order to know when it has exhausted the parser's matching.

    Performant Data
    @performantdata
    And is the InputStream being run through the default charset codec? It seems like Char-based parsing is all that's available.
    Li Haoyi
    @lihaoyi
    parsing has to happen all in one method call, so if your InputStream doesn't have all the data ready, the parsing needs to block and wait for it
    not sure about reusing the input stream between parses; you might be able to get the un-parsed data from the previous parsing run and prepend it the next time someone tries to parse something
    prepend = wrap the InputStream with a new one which returns the prepended data before reading from the underlying stream
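The "prepend" idea maps onto `java.io.SequenceInputStream`, which reads its first stream to the end and then continues with the second. A sketch, assuming you already have the unparsed leftover as a `String` (recovering it from the previous run is not shown) and that UTF-8 is the right encoding; `Prepend.prepend` is a hypothetical helper:

```scala
import java.io.{ByteArrayInputStream, InputStream, SequenceInputStream}
import java.nio.charset.StandardCharsets

object Prepend {
  // Yields `leftover` first, then whatever `underlying` still has to offer
  def prepend(leftover: String, underlying: InputStream): InputStream =
    new SequenceInputStream(
      new ByteArrayInputStream(leftover.getBytes(StandardCharsets.UTF_8)),
      underlying
    )
}
```

The wrapped stream is then what you would hand to the next parse call in place of the original one.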
    Li Haoyi
    @lihaoyi
    you don't need ParsingRun, just ParserInput which you can construct yourself and pass in
    Carlos Silva
    @alchimystic
    hello. I'm using your pythonparse example to parse some python files.
    I'm using Statements.suite as the entry point, as it seems to be the only way to properly handle trailing tabs. I am trying to get a Seq[Ast.stmt].
    I am able to parse most of the files, but I'm stuck on one that contains only a def (a Python function) and nothing else.
    I tried adding an alternative " | funcdef" or even " | decorated" to the suite method, but it fails to compile: decorated returns a single statement, while the original returns a Seq of them. How can I wrap a Seq around it? Maybe by adding a .map after P[...]?
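On the last question: `.map(Seq(_))` lifts a parser of one value into a parser of a one-element `Seq`, so it can sit in an alternative next to a `Seq`-producing parser. The pythonparse rules themselves are not reproduced here; this sketch uses made-up `word` and `many` parsers to show the shape:

```scala
import fastparse._, NoWhitespace._

object WrapSeq {
  def word[_: P]: P[String] = P(CharsWhileIn("a-z").!)

  // Produces a Seq on its own, like a suite of statements
  def many[_: P]: P[Seq[String]] = P("[" ~ word.rep(sep = ",") ~ "]")

  // .map(Seq(_)) makes both sides of | have type Seq[String]
  def either[_: P]: P[Seq[String]] = P(many | word.map(Seq(_)))
}
```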
    Matt Jadczak
    @mjadczak

    I'm trying to use fastparse to parse a legacy config format which I cannot change. It's a pretty vanilla "key-value" format, a little like HOCON (just with more weirdness), but one of its quirks is that whitespace is allowed inside key names, but is stripped off the start and end. So a line like

             this.key.is weird = This value is allowed to have spaces !

    would be parsed as the key having parts ["this", "key", "is weird"] and the value would be "This value is allowed to have spaces !"
    I am currently defining my key parsing logic something like P(whitespace ~ keyPart ~ ("." ~/ keyPart).rep) and then mapping via a function which takes the last element parsed and manually removes the whitespace. Is this the best I can do, or is there some way to express "repeat values, but the last one cannot end in whitespace but can contain it"?

    Matt Jadczak
    @mjadczak
    (where keyPart is P(CharsWhile(c => c != '=' && c != '.' && c != '\n').!) )
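If whitespace around each dot-separated part is insignificant, one simple option is to trim every part as it is parsed instead of special-casing the last one. A sketch reusing the `keyPart` character rule above plus `.map(_.trim)`; the `Entry` case class and the trimmed `value` rule are assumptions, and a part made only of spaces would come out as an empty string:

```scala
import fastparse._, NoWhitespace._

case class Entry(key: Seq[String], value: String)

object ConfigParser {
  // Each part may contain spaces; leading/trailing whitespace is trimmed per part
  def keyPart[_: P]: P[String] =
    P(CharsWhile(c => c != '=' && c != '.' && c != '\n').!.map(_.trim))

  def key[_: P]: P[Seq[String]] = P(keyPart.rep(min = 1, sep = "."))

  def value[_: P]: P[String] = P(CharsWhile(_ != '\n', 0).!.map(_.trim))

  def entry[_: P]: P[Entry] =
    P(key ~ "=" ~ value).map { case (k, v) => Entry(k, v) }
}
```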