    Nicholas Blumhardt
    @nblumhardt
    Woot! @ellested_gitlab that's awesome to hear, thanks for dropping by - much appreciated :sunglasses:
    Kenneth Ellested
    @ellested_gitlab
    Hi Nicholas - I've made a custom TextParser<TextSpan> based on the built-in Comment parsers. The idea is to match any character until a certain string/span occurs. I'm almost sure it can be done with the built-in parsers, but I haven't figured out how yet. Example: "I need this text, until I encounter a <STOP>". So I need any character until the word <STOP>. My problem seems to be that Character.AnyChar is greedy, and I can't figure out a way to limit it. I'm sure this is ultra simple when you know how :-)
    Kenneth Ellested
    @ellested_gitlab

    This is actually what I'm looking for:
    public static TextParser<TextSpan> MatchUntil(string stopword) =>
    from value in Span.Regex($@".+?(?={Regex.Escape(stopword)})")
    select value;

    But I can't figure out how to do it with the fast parser methods in the library. I've tried combinations of Or, IgnoreThen and Try, but I fail every time. I'm sure I'm missing the point somewhere, so it would be great to see how this is done without the Regex.

    Nicholas Blumhardt
    @nblumhardt
    @ellested_gitlab I think we're missing something like Span.Until("<STOP>") - I have a feeling I've seen an implementation of it in the past, but can't put my finger on where it was, sorry :)
    Kenneth Ellested
    @ellested_gitlab
    OK, I feel better now :) - I was pretty sure I was just overlooking something fundamental. Anyway, seems like a nice challenge, so I will give it a try.
    Kenneth Ellested
    @ellested_gitlab
        using Superpower;
        using Superpower.Model;

        public static class SpanEx
        {
            // Matches every character up to (but not including) the first occurrence of stopword.
            public static TextParser<TextSpan> Until(string stopword)
            {
                bool isWithinLength(TextSpan ts) => ts.Length >= stopword.Length;
                bool isStopwordMatching(TextSpan ts) => ts.First(stopword.Length).EqualsValue(stopword);
                bool isMatch(TextSpan ts) => isWithinLength(ts) && isStopwordMatching(ts);

                return (TextSpan input) =>
                {
                    TextSpan x = input;

                    // Advance one character at a time until the stopword starts at x.
                    while (!x.IsAtEnd && !isMatch(x))
                        x = x.ConsumeChar().Remainder;

                    // The value starts at input; x is the remainder, positioned at the stopword.
                    return isMatch(x)
                      ? Result.Value(input.Until(x), input, x)
                      : Result.Empty<TextSpan>(input, $"Until expected {stopword}");
                };
            }
        }
    Came up with this, which is at least 15% faster than the Regex in my integration tests. I'm not so confident about how the error messaging works yet, so I'm not sure if this is fully compatible.
    If it's not totally off, I can submit a PR with my tests...
    Nicholas Blumhardt
    @nblumhardt
    Looks about right to me @ellested_gitlab - haven't thought through it in detail but a PR would be welcome, we can dig in further there! :+1:
    Khiem Pham
    @vi3tkhi3m
    Hi, does anyone know how I can extract a string between two brackets that contains nested brackets? E.g. (I want to (extract (this)) text). The output should be: I want to (extract (this)) text. I've tried to use .Contained(OpenBracket, ClosingBracket), but this will close as soon as it sees the first closing bracket ... Thanks in advance!
    Kristian Hellang
    @khellang

    Hello 👋🏻 Does anyone have any pointers on how to tokenize/parse a template like this:

     Hello {upper(firstName)}!

    Basically, I want to tokenize everything outside the curlies as just text, including whitespace, but parse everything inside the curlies with full fidelity as identifiers etc., ignoring whitespace

    Kristian Hellang
    @khellang
    It feels like I want to nest tokenizers, where the outer would separate template from text, while the inner would dig into the template itself
    Kristian Hellang
    @khellang
    I guess I could always write the tokenizer by hand, but using TokenizerBuilder is just too lovely
    Nicholas Blumhardt
    @nblumhardt
    Heya @khellang !
    Yes, in fact there's an example of a parser exactly like this one at:
    Nicholas Blumhardt
    @nblumhardt
    Actually, that one might be more complicated than you need, since the expressions in that language include top-level { and }, so the end of a "hole" depends on the expression grammar; e.g. Hello {greeting({name: 'ted'})}!
    Nicholas Blumhardt
    @nblumhardt
    (Or Hello {greeting({name: '}')}! :-) )
    Think you'll need to either adopt something like that, though, or write the tokenizer by hand; TokenizerBuilder is a bit too simplistic for this
    Nicholas Blumhardt
    @nblumhardt
    Digging back into it some more, the Serilog.Expressions one was even nastier because of the need to support , and : as delimiters between the expression and the alignment/width/format specifiers, while also using them in various roles within the expression syntax. Hopefully if you tackle writing the tokenizer by hand, it won't be quite that nasty :-)
    If you need another set of eyes on anything, ping me here or by mail :)
    Nicholas Blumhardt
    @nblumhardt
    @vi3tkhi3m is it just text you're dealing with, or are there other more complex aspects to the grammar? If it's just text, iterating through character by character and tracking a depth variable for parenthesis nesting will be a lot more straightforward than doing this with a parser, I think.
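The depth-counting approach suggested above could look roughly like the following. This is only a minimal sketch in plain C#, assuming the input is ordinary text with `(` and `)` as the only significant characters; the `BracketText` class and `TryExtractOutermost` method are hypothetical names made up for illustration and are not part of Superpower.

```csharp
using System;

public static class BracketText
{
    // Finds the first '(' and its matching ')', tracking nesting depth,
    // and returns the text between them. Returns false when no balanced
    // pair exists. (Hypothetical helper, not a Superpower API.)
    public static bool TryExtractOutermost(string input, out string inner)
    {
        int depth = 0;   // number of currently open parentheses
        int start = -1;  // index just after the outermost '('

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '(')
            {
                if (depth == 0) start = i + 1;
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--;
                if (depth == 0)
                {
                    // Back at depth zero: this ')' closes the outermost '('.
                    inner = input.Substring(start, i - start);
                    return true;
                }
            }
        }

        inner = string.Empty;
        return false;
    }
}
```

Called with the example from the question, `BracketText.TryExtractOutermost("(I want to (extract (this)) text)", out var inner)` returns true and leaves `I want to (extract (this)) text` in `inner`.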
    Kristian Hellang
    @khellang
    Thanks @nblumhardt! I ended up writing the tokenizer by hand. It was pretty straightforward for what I needed :)
    Thanks for an awesome library 😍
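The hand-written tokenizer itself was not posted, so the following is only a guess at what the outer pass of such a tokenizer might look like. It is a minimal sketch in plain C#, assuming that holes never contain `{` or `}` themselves (the harder case raised earlier in the thread is deliberately not handled); `SegmentKind`, `Segment` and `TemplateSplitter` are hypothetical names, not part of Superpower.

```csharp
using System;
using System.Collections.Generic;

public enum SegmentKind { Text, Hole }

public readonly struct Segment
{
    public Segment(SegmentKind kind, string value) { Kind = kind; Value = value; }
    public SegmentKind Kind { get; }
    public string Value { get; }
}

public static class TemplateSplitter
{
    // Splits a template into literal text (whitespace preserved) and the
    // raw contents of each {...} hole. Assumes holes contain no braces.
    public static List<Segment> Split(string template)
    {
        var segments = new List<Segment>();
        int pos = 0;

        while (pos < template.Length)
        {
            int open = template.IndexOf('{', pos);
            if (open < 0)
            {
                // No more holes: the rest is literal text.
                segments.Add(new Segment(SegmentKind.Text, template.Substring(pos)));
                break;
            }

            if (open > pos)
                segments.Add(new Segment(SegmentKind.Text, template.Substring(pos, open - pos)));

            int close = template.IndexOf('}', open + 1);
            if (close < 0)
                throw new FormatException($"Unclosed '{{' at position {open}.");

            segments.Add(new Segment(SegmentKind.Hole, template.Substring(open + 1, close - open - 1)));
            pos = close + 1;
        }

        return segments;
    }
}
```

For `Hello {upper(firstName)}!` this yields three segments: the text `Hello `, the hole `upper(firstName)`, and the text `!`. Each hole's contents could then be handed to a separate, whitespace-ignoring tokenizer for the expression grammar, which is the "nested tokenizers" idea described earlier.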
    Nicholas Blumhardt
    @nblumhardt
    @khellang :bow:
    José Manuel Nieto
    @SuperJMN
    Hi! Trouble Man here, kicking again!
    I hope somebody can help me with this. I'm a bit ashamed that I can't handle it by myself: https://stackoverflow.com/questions/66959755/parse-string-between-a-pair-of-delimiters-that-are-strings