    Gargaj
    @Gargaj
    basically i'm hoping to get that working with both abc { ... } and { ... }
    Gargaj
    @Gargaj
    bit of elaboration: my problem seems to be that i have two parsers and both can start with the same token
    Nicholas Blumhardt
    @nblumhardt
    @Gargaj seems like what you have should work - any chance of a more complete sample? might be a good one for the longer-form Stack Overflow format, if you end up posting a q there please drop a link here, will take a look :+1:
    Gargaj
    @Gargaj
    yeah I figured it out
    i needed some more use of .Try()
    cos i didn't realize i need .Try() to allow backtracking in the token list
    i even found a hack for parsing a list that has a dangling delimiter (so the JavaScript-style [1,2,3,] empty element) by just creating an end token that can be either ] or ,]
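A minimal sketch of the dangling-delimiter trick described above, using Superpower's text parsers rather than Gargaj's token-list setup (all names here are hypothetical, and the empty list `[]` is not handled):

```csharp
using System.Linq;
using Superpower;
using Superpower.Parsers;

static class DanglingListSketch
{
    // Accepts "[1,2,3]" and the JavaScript-style "[1,2,3,]" with a trailing comma.
    // Try() lets the parser back out of a ',' it has already consumed when no
    // item follows, leaving the dangling comma for the OptionalOrDefault branch.
    public static readonly TextParser<int[]> IntList =
        from open in Character.EqualTo('[')
        from first in Numerics.IntegerInt32
        from rest in Character.EqualTo(',').IgnoreThen(Numerics.IntegerInt32).Try().Many()
        from dangling in Character.EqualTo(',').OptionalOrDefault()
        from close in Character.EqualTo(']')
        select new[] { first }.Concat(rest).ToArray();
}
```

Without the Try(), Many() would commit to the dangling comma and then fail looking for another integer, which is exactly the backtracking problem described above.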
    i.sinister
    @i-sinister
    Good day, everybody. I need to pass a "context object" to the TokenListParser so that I can do symbol lookups (variables, actually). The parser delegate signature does not allow it, but there are several workarounds, and I like none of them.
    First, I can have the parsers as readonly instance properties of some Parser class, but then they have to be recreated for every parse operation, which is not good for performance, especially considering the LINQ usage when using combinators.
    Another approach is to put a reference to the "context object" into every token so that it is always available to the parser - this is also a performance killer, because it would require converting tokens from an enum to (at least) a struct with a pointer to the context.
    The third option is to build the AST and perform lookups/validation at a later stage - this one seems like doing unnecessary work and (maybe) producing an invalid tree (I'm also losing token location information).
    The last option is to do variable lookup at the tokenization stage (similar to the "lexer hack"), but this approach does not solve my problem 100%, because in some cases I need to know the "context around the token" (i.e. the future AST node), so it's really better done at the parsing stage.
    And there is also an "option" (which is not really an option of using Superpower) to write a "parallel implementation" of TokenListParser (with combinators etc.) that accepts a context argument - I'd like to avoid that, of course, as it means writing a lot of code and fixing lots of bugs in it.
    So what are the recommended/best practices for handling the "accessing context during parsing" problem?
    i.sinister
    @i-sinister
    @nblumhardt, is this chat alive?
    Nicholas Blumhardt
    @nblumhardt
    Hi @i-sinister :-) ... yup! I don't have anything to add to your analysis above, though - using instance-based parsers for context-sensitive grammars works but isn't very efficient. Sometimes the context-sensitivity can be kept to just a few rules, though, with the majority of syntactic forms still context-free; sounds like that's the best option.
    Truly context-sensitive grammars are a bit of a special case, though - most of the time, AST post-processing and a forgiving grammar is the way to go
    no easy answers to all these questions, though
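The "instance-based parsers" option discussed above looks roughly like the following hypothetical sketch: the parser property closes over the context, which is exactly why a fresh instance (and fresh parsers) must be built for every parse.

```csharp
using System.Collections.Generic;
using Superpower;
using Superpower.Parsers;

// Hypothetical sketch: because Variable captures _variables, a new
// ContextBoundParser - and thus new parser instances - is needed per parse.
class ContextBoundParser
{
    readonly IReadOnlyDictionary<string, int> _variables;

    public ContextBoundParser(IReadOnlyDictionary<string, int> variables) =>
        _variables = variables;

    // Resolves a variable reference against the captured symbol table.
    public TextParser<int> Variable =>
        Identifier.CStyle.Select(span => _variables[span.ToStringValue()]);
}
```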
    Erik Schierboom
    @ErikSchierboom
    Just wanted to let you know I love superpower. Brilliant job!
    Nicholas Blumhardt
    @nblumhardt
    Thanks @ErikSchierboom :+1: :-)
    Jeivardan
    @jeivardan
    Hi @nblumhardt, my problem statement: I'll receive a response when I execute a command, and the start of the response string contains one of the following tokens: ":", "?", "Finished", "Error", "Info", "Warning". Depending on the token, the response body (the rest of the string after the token) may vary. For example, if the token is ":" it means a valid prompt and there is no response body; if the token is "Finished", the response body contains two things: first the name of the command I executed, and the rest is the actual response data. I need to convert this whole response into an object with members like { responseType, cmdName, responsedata }. Is it possible with Superpower?
    Jeivardan
    @jeivardan
    Possible types of responses
    1) " : " ( : Means a valid prompt and no response body).
    2) " ? some response message" (? Means Invalid prompt after executing an invalid command)
    3) " Finished : CommandName : responsedata"
    this is the first level of parsing the response, and further the responsedata can be parsed
    Jeivardan
    @jeivardan
    Can Superpower solve my problem or should I look into ANTLR
    Jeivardan
    @jeivardan
    @nblumhardt any suggestion please I am confused

    For a command possible responses are

    ":"

    "?"

    "Finished : commandname : msgbody"

    "Error errorcode commandname(if)"

    "Info : msgbody"

    "Warning errorcode msgbody"

    Andrew Savinykh
    @AndrewSav
    @jeivardan you've got to try and see if it works out for you. There are a lot of unknowns here: what the response can contain, what the body can contain, whether you're going to tokenize them or not, whether you have a grammar for them, what the other "tags" mean, etc., etc.
    Kenneth Ellested
    @ellested_gitlab

    Is Nicholas to be found here?

    I just wanted to send my deep thanks for his great work on this incredible library. I was actually "forced" in this direction, as I couldn't find an HTML parser for .NET that could load the nodes and retain the exact stream positions, while at the same time allowing for all the quirks found in HTML documents. I've tried HtmlAgilityPack, AngleSharp and other libraries - but this problem is apparently quite general. Anyway, I started some days ago, and I was lost several times trying to find my way around the code and concepts. But the more I used it, the more sense it suddenly made. The HTML parser is maybe just 300 lines of code, and it reads every quirk correctly so far - and it's the Superpower library that makes this possible. Some of the usual problems are matching tags, non-closed tags (occasionally), multiline attributes, markup in script and style tags, and so on. I know that large documents are not suited for this kind of parser design, but the speed is around 1.5 seconds for a 1.2MB document (around 10 times longer than HAP and AngleSharp). That's pretty good without optimizations and probably some mistakes made.

    To make a long story short, every developer should learn Superpower - the investment in time will come back 10 fold. A lot of new opportunities will also open, and you can make a lot of cool stuff with your new Superpowers (great name too 😀)

    Thanks Nicholas

    Nicholas Blumhardt
    @nblumhardt
    Woot! @ellested_gitlab that's awesome to hear, thanks for dropping by - much appreciated :sunglasses:
    Kenneth Ellested
    @ellested_gitlab
    Hi Nicholas - I've made a custom TextParser<TextSpan> based on the built-in Comment parsers. The idea is to match any character until a certain string/span occurs. I'm almost sure it can be done with the built-in parsers, but I haven't found out how yet. Example: "I need this text, until I encounter a <STOP>". So I need any character until the word <STOP>. My problem seems to be that Character.AnyChar is greedy, and I can't figure out a way to limit it. I'm sure this is ultra simple when you know how :-)
    Kenneth Ellested
    @ellested_gitlab

    This is actually what I'm looking for:
        public static TextParser<TextSpan> MatchUntil(string stopword) =>
            from value in Span.Regex($@".+?(?={Regex.Escape(stopword)})")
            select value;

    But I can't figure out how to make it with the fast parser methods in the library. I've tried combinations with Or, IgnoreThen and Try, but I fail every time. I'm sure I'm missing the point somewhere, so it would be great to see how this is done without the Regex.

    Nicholas Blumhardt
    @nblumhardt
    @ellested_gitlab I think we're missing something like Span.Until("<STOP>") - I have a feeling I've seen an implementation of it in the past, but can't put my finger on where it was, sorry :)
    Kenneth Ellested
    @ellested_gitlab
    OK, I feel better now :) - I was pretty sure I was just overlooking something fundamental. Anyway, seems like a nice challenge, so I will give it a try.
    Kenneth Ellested
    @ellested_gitlab
        public static class SpanEx
        {
            public static TextParser<TextSpan> Until(string stopword)
            {
                bool isWithinLength(TextSpan ts) => ts.Length >= stopword.Length;
                bool isStopwordMatching(TextSpan ts) => ts.First(stopword.Length).EqualsValue(stopword);
                bool isMatch(TextSpan ts) => isWithinLength(ts) && isStopwordMatching(ts);
    
                return (TextSpan input) =>
                {
                    TextSpan x = input;
    
                    while (!x.IsAtEnd && !isMatch(x))
                        x = x.ConsumeChar().Remainder;
    
                    return isMatch(x)
                      ? Result.Value(input.Until(x), x, x)
                      : Result.Empty<TextSpan>(input, $"Until expected {stopword}");
                };
            }
        }
    Came up with this, which is at least 15% faster than the Regex on my integration tests. I'm not so confident about how the error messaging works yet, so I'm not sure if this is fully compatible.
    If it's not totally off, I can submit a PR with my tests...
    Nicholas Blumhardt
    @nblumhardt
    Looks about right to me @ellested_gitlab - haven't thought through it in detail but a PR would be welcome, we can dig in further there! :+1:
    Khiem Pham
    @vi3tkhi3m
    Hi, anyone know how I can extract a string between two brackets that has nested brackets? Ex. (I want to (extract (this)) text). Output should be: I want to (extract (this)) text. I've tried to use .Contained(OpenBracket, ClosingBracket), but this will close as soon as it sees the first closing bracket ... Thanks in advance!
    Kristian Hellang
    @khellang

    Hello 👋🏻 Does anyone have any pointers on how to tokenize/parse a template like this:

     Hello {upper(firstName)}!

    Basically, I want to tokenize everything outside the curlies as just text, including whitespace, but parse everything inside the curlies with full fidelity as identifiers etc., ignoring whitespace

    Kristian Hellang
    @khellang
    It feels like I want to nest tokenizers, where the outer would separate template from text, while the inner would dig into the template itself
    Kristian Hellang
    @khellang
    I guess I could always write the tokenizer by hand, but using TokenizerBuilder is just too lovely
    Nicholas Blumhardt
    @nblumhardt
    Heya @khellang !
    Yes, in fact there's an example of a parser exactly like this one at:
    Nicholas Blumhardt
    @nblumhardt
    Actually, that one might be more complicated than you need, since the expressions in that language include top-level { and }, so the end of a "hole" depends on the expression grammar; e.g. Hello {greeting({name: 'ted'})}!
    Nicholas Blumhardt
    @nblumhardt
    (Or Hello {greeting({name: '}')}! :-) )
    Think you'll need to either adopt something like that, though, or write the tokenizer by hand; TokenizerBuilder is a bit too simplistic for this
    Nicholas Blumhardt
    @nblumhardt
    Digging back into it some more, the Serilog.Expressions one was even nastier because of the need to support , and : as delimiters between the expression and the alignment/width/format specifiers, while also using them in various roles within the expression syntax. Hopefully if you tackle writing the tokenizer by hand, it won't be quite that nasty :-)
    If you need another set of eyes on anything, ping me here or by mail :)
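A hand-written tokenizer for the template case would be a subclass of Superpower's Tokenizer<TKind>. A very rough, mode-switching skeleton (hypothetical token kinds; the in-hole rules are elided):

```csharp
using System.Collections.Generic;
using Superpower;
using Superpower.Model;

enum TemplateToken { Text, LBrace, RBrace, Identifier /* ... */ }

// Skeleton only: outside a hole, everything (whitespace included) is swallowed
// into one Text token; inside, richer rules would take over.
class TemplateTokenizer : Tokenizer<TemplateToken>
{
    protected override IEnumerable<Result<TemplateToken>> Tokenize(TextSpan span)
    {
        var rest = span;
        while (!rest.IsAtEnd)
        {
            if (rest.ConsumeChar().Value == '{')
            {
                // Entering a hole: emit '{', then apply identifier,
                // punctuation and whitespace-skipping rules (elided here).
                var brace = rest.ConsumeChar();
                yield return Result.Value(TemplateToken.LBrace, rest, brace.Remainder);
                rest = brace.Remainder;
            }
            else
            {
                // Raw text mode: consume up to the next '{'.
                var start = rest;
                while (!rest.IsAtEnd && rest.ConsumeChar().Value != '{')
                    rest = rest.ConsumeChar().Remainder;
                yield return Result.Value(TemplateToken.Text, start, rest);
            }
        }
    }
}
```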
    Nicholas Blumhardt
    @nblumhardt
    @vi3tkhi3m is it just text you're dealing with, or are there other more complex aspects to the grammar? If it's just text, iterating through character by character and tracking a depth variable for parenthesis nesting will be a lot more straightforward than doing this with a parser, I think.
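The depth-tracking suggestion can be sketched in plain C#, no parser involved (the helper name is made up):

```csharp
static class Brackets
{
    // Returns the text between the first '(' and its matching ')', honouring
    // nested parentheses, or null when no balanced pair is found.
    public static string ExtractBracketed(string input)
    {
        var start = input.IndexOf('(');
        if (start < 0) return null;

        var depth = 0;
        for (var i = start; i < input.Length; i++)
        {
            if (input[i] == '(') depth++;
            else if (input[i] == ')' && --depth == 0)
                return input.Substring(start + 1, i - start - 1);
        }
        return null; // unbalanced
    }
}

// Brackets.ExtractBracketed("(I want to (extract (this)) text)")
//   => "I want to (extract (this)) text"
```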
    Kristian Hellang
    @khellang
    Thanks @nblumhardt! I ended up writing the tokenizer by hand. Was pretty straightforward for what I needed :)
    Thanks for an awesome library 😍
    Nicholas Blumhardt
    @nblumhardt
    @khellang :bow:
    José Manuel Nieto
    @SuperJMN
    Hi! Trouble Man here, kicking again!
    I hope somebody can help me with this. I'm a bit ashamed that I cannot handle it by myself: https://stackoverflow.com/questions/66959755/parse-string-between-a-pair-of-delimiters-that-are-strings