These are chat archives for picoe/Eto.Parse

7th
Sep 2015
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 04:42
I still have problems with "\n" after changing = to :=
malformed_line := { malformed_line_unit } ; 
malformed_line_unit := ? Terminals.AnyChar ? - '\n';

> ast.Matches ["malformed_line", true].Matches[33]
{
    Name: "malformed_line_unit"
    Parser: {Eto.Parse.UnaryParser}
    Scanner: {Eto.Parse.Scanners.StringScanner}
    StringValue: "\n"
    Success: true
    Tag: (null)
    Text: "\n"
    Value: "\n"
}
> 
I don't understand how the second rule matches the '\n' character when I want it not to...
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 07:00
I tried to write
cr := #x0D ;
lf := #x0A ;
in the grammar, but then the grammar itself fails to parse.
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 07:13
I also tried to take code directly from the text of the standard:
crlf := = { ? ISO 6429 character Carriage Return ? }, ? ISO 6429 character Line Feed ?, { ? ISO 6429 character Carriage Return ? };
Curtis Wensley
@cwensley
Sep 07 2015 07:20
I think it won't work because you are using '\n', which is a C# escape sequence and not part of EBNF afaik.. using ? Terminals.Eol ? might work better.
or if you have the EbnfStyle.CharacterSets style set (not part of Iso14977), you can do: [#x0A]
CharacterSets are mutually exclusive with EbnfStyle.SquareBracketAsOptional, which is part of Iso14977.
if character escapes are actually part of the standard, that'd be an easy fix by adding AllowEscapeCharacters = true to the terminal_string in the EbnfGrammar constructor.
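A minimal sketch of the point above, written in Python rather than against Eto.Parse's C# API. The combinator names (any_char, literal, except_) are hypothetical, and it assumes the grammar text reaches the EBNF parser with a real backslash in it (for example, read from a file). Without escape processing, '\n' in the EBNF is the two-character string backslash + n, so AnyChar - '\n' only excludes that pair and still accepts an actual newline.

```python
# Hypothetical mini-combinators (not Eto.Parse's API): a parser takes
# (text, pos) and returns the new position on success or None on failure.

def any_char(text, pos):
    # Matches exactly one character of any kind.
    return pos + 1 if pos < len(text) else None

def literal(s):
    # Matches the exact string s at pos.
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def except_(include, exclude):
    # "include - exclude": fail if exclude matches here, else try include.
    def parse(text, pos):
        if exclude(text, pos) is not None:
            return None
        return include(text, pos)
    return parse

# With no escape processing, the EBNF literal '\n' is two characters:
# a backslash followed by the letter n.
unit = except_(any_char, literal("\\n"))
print(unit("\n", 0))    # a real newline is NOT excluded
print(unit("\\n", 0))   # only backslash-n is excluded

# If the literal were escape-processed into a real newline, the exception works.
unit_escaped = except_(any_char, literal("\n"))
print(unit_escaped("\n", 0))
```

This is why an escape-aware terminal string (or ? Terminals.Eol ?) is needed: the exception can only exclude what the literal actually contains.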
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 07:28
I have no constructor (because I don't derive a class; I use the existing one). And I don't see a way to add a Parser to my grammar object after its construction.
So I agree with you that the ISO standard doesn't define all the details. But I don't know what to do next.
Curtis Wensley
@cwensley
Sep 07 2015 07:31
no, this is in the Eto.Parse code, so it'd require a change there
but the ? Terminals.Eol ? is what you could do
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 07:31
this is not the only character I want
Curtis Wensley
@cwensley
Sep 07 2015 07:32
or as I mentioned earlier, use the EbnfGrammar.SpecialParsers to define anything you want
so.. ebnfGrammar.SpecialParsers["Woot"] = Terminals.Set("\r\n\tblah");
then in your ebnf: ? Woot ?
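A rough Python sketch of the idea, assuming SpecialParsers acts as a name-to-parser lookup consulted when a ? Name ? special sequence is compiled, and that Terminals.Set matches any single character from the given string. Both are assumptions about Eto.Parse, and the names below (char_set, special_parsers, special_sequence) are hypothetical.

```python
def char_set(chars):
    # Matches one character if it belongs to the given set.
    allowed = set(chars)
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in allowed else None
    return parse

# Registry of named parsers that special sequences can refer to.
special_parsers = {}
special_parsers["Woot"] = char_set("\r\n\tblah")

def special_sequence(name):
    # Resolve ? name ? when the grammar is built; unknown names fail loudly.
    try:
        return special_parsers[name]
    except KeyError:
        raise ValueError(f"undefined special sequence: ? {name} ?")

woot = special_sequence("Woot")
print(woot("\t", 0))  # a tab is in the set
print(woot("x", 0))   # 'x' is not
```

Note that a set matches single characters, so "\r\n\tblah" here also accepts 'b', 'l', 'a' and 'h' on their own.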
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 07:33
yes, will try this now
Curtis Wensley
@cwensley
Sep 07 2015 07:34
Terminals.Eol does match \r\n or \n btw
Hm, I should add an EbnfStyle.EscapeStrings to allow for escapes in literals in ebnf though, it'd probably be very useful.
It'd also allow for hex such as '\x013'
Curtis Wensley
@cwensley
Sep 07 2015 07:47
there we go
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 07:53
You are right, that Terminals.Eol is exactly what I need, but I can't use it because of NIH syndrome :)
Curtis Wensley
@cwensley
Sep 07 2015 07:55
haha
that's a hard syndrome to overcome sometimes
this is the error: Index=3, Context="
nn>>>-nn
---n-n-
"
Expected:
eol: EOL
Curtis Wensley
@cwensley
Sep 07 2015 08:15
hm that doesn't make sense
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 08:15
what do you mean?
or, what steps should I take to provide more details or to diagnose the problem?
I changed nbody from 'n' to 'A' and replaced the last '=' with ':=', but nothing changed
Eol does match
(I mean the first eol in the C# string)
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 08:25
@cwensley And I can explain the source of my NIH syndrome. I'm not sure about the '-' operator. I've never successfully used the construct := { ? Terminals.AnyChar ? } - { ? Terminals.Eol ? }
Curtis Wensley
@cwensley
Sep 07 2015 08:26
yeah that won't work
because it'll match all the AnyChars first
you need to do { ? Terminals.AnyChar ? - ? Terminals.Eol ? }
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 08:27
one AnyChar will not match the 2-symbol sequence
Curtis Wensley
@cwensley
Sep 07 2015 08:27
all expressions in Eto are 'greedy'.. e.g. the { ? Terminal.AnyChar ? } will basically match the entire string
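To see why the position of the '-' matters, here is a small Python model of greedy, non-backtracking repetition. It is not Eto.Parse code; it assumes an exception is tested once, at the position where the excepted parser starts, and that Eol matches "\r\n" or "\n" as described above. All names are hypothetical.

```python
def any_char(text, pos):
    return pos + 1 if pos < len(text) else None

def eol(text, pos):
    # Matches "\r\n" or "\n".
    if text.startswith("\r\n", pos):
        return pos + 2
    if text.startswith("\n", pos):
        return pos + 1
    return None

def repeat(p):
    # { p }: zero or more, greedy, never gives characters back. Stops on
    # failure or on a zero-length match to avoid looping forever.
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def except_(include, exclude):
    def parse(text, pos):
        if exclude(text, pos) is not None:
            return None
        return include(text, pos)
    return parse

text = "first line\nsecond line"

# { AnyChar } - Eol: the exception is checked once, at the start; then the
# repetition swallows everything, newline included.
wrong = except_(repeat(any_char), eol)
print(wrong(text, 0))   # whole input

# { AnyChar - Eol }: the exception is checked before every character.
right = repeat(except_(any_char, eol))
print(right(text, 0))   # stops just before "\n"
```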
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 08:28
So, what should I do about my non-working example above?
(about syntax3.ebnf)
Curtis Wensley
@cwensley
Sep 07 2015 08:31
hm, not sure.. I'd have to debug it..
I can't spot any problems at initial glance
Curtis Wensley
@cwensley
Sep 07 2015 08:44
oh cool, I'll take a look
Curtis Wensley
@cwensley
Sep 07 2015 08:52
ok, so it's a little odd behaviour, but the nbody is using { } which means 'zero or more'.. so, it matches and never gets to the myws rule in the alternative
if you move nbody to the last alternative for file_unit, it'll work
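A small Python model of ordered alternatives, with hypothetical stand-ins for the rules in syntax3.ebnf (which is not shown here): nbody := { 'n' } and myws := ' '. Because { 'n' } can succeed without consuming anything, it wins the alternative and myws is never tried.

```python
def literal(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def repeat(p):
    # Zero or more, greedy; stops on failure or a zero-length match.
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def choice(*alts):
    # Ordered choice: the first alternative that succeeds wins, even if it
    # consumed nothing; later alternatives are never tried.
    def parse(text, pos):
        for alt in alts:
            r = alt(text, pos)
            if r is not None:
                return r
        return None
    return parse

nbody = repeat(literal("n"))   # nbody := { 'n' } ;  (can match nothing)
myws = literal(" ")            # myws  := ' ' ;      (hypothetical)

file_unit_bad = choice(nbody, myws)   # file_unit := nbody | myws ;
file_unit_good = choice(myws, nbody)  # file_unit := myws | nbody ;

print(file_unit_bad(" ", 0))    # succeeds with length 0
print(file_unit_good(" ", 0))
print(file_unit_good("nn", 0))
```

Inside a surrounding { file_unit } repetition, that zero-length success ends the loop, so parsing stalls at the first space.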
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 08:54
is there an IDE for debugging grammars?
I want highlighting for undefined symbols, possibly empty rules, etc.
Curtis Wensley
@cwensley
Sep 07 2015 08:55
no but that'd be awesome to have
wouldn't be too difficult to put together either
eto.parse is just purely a hobby of mine so I don't always get a lot of time to work on it
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 08:57
I've seen how "not difficult" it is just to use this library. I think an IDE would take several months
meebey has an IRC client; it took 10 years to write
Curtis Wensley
@cwensley
Sep 07 2015 08:58
well depends on what you mean by "ide".. a simple editor/debugger wouldn't take much to put together initially.. but maybe it wouldn't meet expectations
the main difficulty with Eto.Parse is that it's a recursive descent parser rather than a state-machine parser.. the order of rules counts
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:01
all this should be written up in a teaching course (like Wikibooks). I only learned this on the third day of solving a "simple" task
Curtis Wensley
@cwensley
Sep 07 2015 09:03
indeed that would be nice. Using ebnf or bnf certainly complicates things quite a bit
mainly due to their limitations and inconsistent formats
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:06
You could recommend your own format (propose an RFC if there isn't one yet)
Curtis Wensley
@cwensley
Sep 07 2015 09:08
I suppose, but defining your grammar through code is what I'd recommend instead
but if you have existing ebnf or are more comfortable with it, it is not a bad way to go but you have to mix it with code using the SpecialParsers
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:10
What was my logic? 1) I need to parse a file. 2) I was taught that there are such things as parsers. 3) BNF is the syntax for them. 4) there is the more modern EBNF. 5) the EBNF article on Wikipedia points to the ISO standard of 1996
Curtis Wensley
@cwensley
Sep 07 2015 09:11
indeed that is a logical path
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:12
EBNF comes earlier on the Wikipedia page than ABNF (that's why I haven't tried the RFC one yet)
Curtis Wensley
@cwensley
Sep 07 2015 09:13
oh yay another format for Eto.Parse to support!
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:13
I don't want to use code, because BNF is something I have a bit of knowledge of, and your code model is completely unknown to me
Curtis Wensley
@cwensley
Sep 07 2015 09:14
yep, no problem.. that's why I've added the ebnf, bnf, and gold parsers.. there are just ambiguities that need to be addressed, and recursive descent parsing to take into consideration
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:23
oh, now I see my mistake
space instead of '-'
Curtis Wensley
@cwensley
Sep 07 2015 09:26
yeah you got it.
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:26
Ok. Now I'll make a third attempt at rewriting my grammar
Curtis Wensley
@cwensley
Sep 07 2015 09:26
ah, if you changed nbody that way to 'A', { 'A' }, you don't have to move it to the end
that is the 'typical' bnf way of doing things
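Continuing the same kind of hypothetical Python model: with the one-or-more form 'A', { 'A' }, nbody fails outright on a space instead of succeeding on nothing, so the alternative falls through to myws and the rule order no longer matters for this input. myws := ' ' is again an assumed stand-in.

```python
def literal(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def repeat(p):
    # Zero or more, greedy; stops on failure or a zero-length match.
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def seq(*parts):
    # Each part must match in turn; any failure fails the whole sequence.
    def parse(text, pos):
        for p in parts:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*alts):
    def parse(text, pos):
        for alt in alts:
            r = alt(text, pos)
            if r is not None:
                return r
        return None
    return parse

nbody = seq(literal("A"), repeat(literal("A")))   # nbody := 'A', { 'A' } ;
myws = literal(" ")
file_unit = choice(nbody, myws)                    # nbody can stay first

print(nbody(" ", 0))         # fails instead of matching nothing
print(file_unit(" ", 0))
print(file_unit("AAA ", 0))
```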
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:27
also, I didn't get what to do about constructs similar to { AA - BBB }
Curtis Wensley
@cwensley
Sep 07 2015 09:28
what do you mean?
the AnyChar stuff?
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:28
for example, a comment line consists of comment symbols except for some sequences
Curtis Wensley
@cwensley
Sep 07 2015 09:29
you just can't do { ? Terminals.AnyChar ? } pretty much anywhere in your grammar
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:30
yes, but what should I do?
Curtis Wensley
@cwensley
Sep 07 2015 09:30
it will match the rest of the input since, well, it matches zero or more of any character
you need to limit AnyChar to the exception, so { ? Terminals.AnyChar ? - ? Terminals.Eol ? } should match everything up till the newline
anyway.. off to bed.. it's really late here.
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:35
You've already said that 3 times. Everything you say (in this thread) is correct. But you aren't hearing what I'm saying about other kinds of terminators
like in bash, where you can have fragments (here-documents) that end with a specific word
Curtis Wensley
@cwensley
Sep 07 2015 09:36
sorry about that. I didn't see your questions about that specifically. What are you trying to do exactly?
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:36
like cat <<-'EOF' | sed -e 's/^ //' -e 's/ $//' | ed -s server.xml
H
/BEGIN realm/i
.
/BEGIN realm/+1,/END realm/-1d
.-1r realm.xml
wq
EOF
I am trying to learn one more feature of the parser that I need to understand to write my grammar
specifically, I want to be able to define strings that end with a specific sequence of characters
Curtis Wensley
@cwensley
Sep 07 2015 09:39
myEbnfGrammar.SpecialParsers["UntilEOF"] = (+Terminals.AnyChar).Until("EOF"); (;
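A Python sketch of what such an "until" parser does, assuming it consumes at least one character (the + in +Terminals.AnyChar) and stops just before the terminator without consuming it. Eto.Parse's actual Until semantics may differ; the until function below is a hypothetical model. Using "\nEOF" as the terminator is an illustrative choice that is closer to bash, where the delimiter must sit on its own line.

```python
def until(terminator):
    # Consume characters one at a time, stopping just before the first
    # position where `terminator` occurs. Requires at least one character
    # and fails if the terminator never appears.
    def parse(text, pos):
        start = pos
        while pos < len(text):
            if text.startswith(terminator, pos):
                return pos if pos > start else None
            pos += 1
        return None
    return parse

heredoc = ("H\n/BEGIN realm/i\n.\n/BEGIN realm/+1,/END realm/-1d\n"
           ".-1r realm.xml\nwq\nEOF\n")
end = until("\nEOF")(heredoc, 0)
print(repr(heredoc[:end]))   # the body, without the terminator
print(repr(heredoc[end:]))   # left for the next parser
```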
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 09:39
thanks
Curtis Wensley
@cwensley
Sep 07 2015 09:40
no way to do that using ebnf I'm afraid.. at least with Eto.Parse.
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 10:50
@cwensley What do you think about other parser generators? ruffin--/SqlDbSharp#3
GPPG, C#, BSD, https://gppg.codeplex.com
GPLEX, C#, BSD, https://gplex.codeplex.com
YaccConstructor (GLL), F#, Apache 2.0, http://yaccconstructor.github.io/YaccConstructor/gll.html
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 11:22
GLL and GLR should make it easiest to define the grammar (because they should resolve ambiguities automagically)
But I have not tried them
on Wikipedia there is no page for GLL, but there is one for GLR: https://en.wikipedia.org/wiki/GLR_parser
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 19:40

You asked about practical application of '-' operation. Here it is:
 vmid := letter_english | digit | minus | ".";
 vend := letter_english | digit;
 variable_name := letter_english , [ [ { vmid } ] , vend ] ;
it has exactly the same problem: the greedy { vmid } eats the whole word, so there is nothing left for vend and the rule doesn't match the whole name
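A Python model of this rule, assuming letter_english is an ASCII letter, digit is 0-9 and minus is '-'. The original rule stops after the first letter because { vmid } also eats the final letter or digit that vend needs, and nothing gives it back. One possible rewrite, not taken from the thread, requires every run of '-' or '.' to be followed by a vend, so greedy repetition is harmless: variable_name := letter_english , { { minus | "." } , vend } ;

```python
import string

def char_in(chars):
    # Matches one character from the given set.
    allowed = set(chars)
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in allowed else None
    return parse

def seq(*parts):
    def parse(text, pos):
        for p in parts:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def repeat(p):
    # Zero or more, greedy; stops on failure or a zero-length match.
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def optional(p):
    # [ p ]: on failure, succeed without consuming anything.
    def parse(text, pos):
        r = p(text, pos)
        return pos if r is None else r
    return parse

letter = char_in(string.ascii_letters)
vend = char_in(string.ascii_letters + string.digits)
vmid = char_in(string.ascii_letters + string.digits + "-.")

# variable_name := letter_english , [ [ { vmid } ] , vend ] ;
original = seq(letter, optional(seq(optional(repeat(vmid)), vend)))
# variable_name := letter_english , { { minus | "." } , vend } ;
rewritten = seq(letter, repeat(seq(repeat(char_in("-.")), vend)))

print(original("foo-bar", 0))   # only "f": vend found nothing left
print(rewritten("foo-bar", 0))  # the whole name
print(rewritten("foo-", 0))     # trailing '-' is left unmatched
```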
Curtis Wensley
@cwensley
Sep 07 2015 20:19
@ArsenShnurkov, thanks. Hm, I wonder if it would make sense to interpret the '-' applied to a repeat as an until instead of using an except parser..
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 20:21
I debugged my grammar further, but not all the way yet
it still doesn't work for me, for some unknown reason
ArsenShnurkov
@ArsenShnurkov
Sep 07 2015 20:36
oh, found one more bug: no "line_separator" rule is defined
Curtis Wensley
@cwensley
Sep 07 2015 23:50
did you define it? I can't see it in the syntax