Parsing expression grammar: Difference between revisions

Content deleted Content added
Indirect left recursion: Practical significance
Compared to regular expressions: Simplified example. Elaborated point about nondeterminism.
Line 177:
 
The <code>*</code>, <code>+</code>, and <code>?</code> of parsing expressions are similar to those in regular expressions, but a difference is that these operate strictly in a greedy mode. This is ultimately due to <code>/</code> being an ordered choice. A consequence is that something can match as a regular expression which does not match as parsing expression:
: <code>[oa]+[ab]?[oa]+[b][obbc]+[ccd]</code>
is both a valid regular expression and a valid parsing expression. As regular expression, it matches <code>bc</code>, but as parsing expression it does not match, because the <code>[ab]?</code> will match the <code>b</code>, then <code>[bc]</code> will match the <code>c</code>, leaving nothing for the <code>[cd]</code>, so at that point matching the sequence fails. "Trying again" with having <code>[ab]?</code> match the empty string is explicitly against the semantics of parsing expressions; this is not an edge case of a particular matching algorithm, instead it is the sought behaviour.
is both a valid regular expression and a valid parsing expression. As regular expression, it matches
 
: <code>ooaooboooc</code>
Even regular expressions that depend on nondeterminism ''can'' be compiled into a parsing expression grammar, by having a separate nonterminal for every state of the corresponding [[deterministic finite automaton|DFA]] and encoding its transition function into the definitions of these nonterminals —
with <code>[ab]</code> against the <code>a</code> and <code>[b]</code> against the <code>b</code>, but as parsing expression it does not match, because <code>[oa]+</code> matches <code>ooaoo</code>, <code>[ab]</code> matches the <code>b</code>, second <code>[oa]+</code> matches <code>ooo</code>, causing <code>[b]</code> to fail against the <code>c</code>.
<syntaxhighlight lang="peg">
A ← 'x' B / 'y' C
</syntaxhighlight>
is effectively saying "from state A transition to state B if the next character is x, but to state C if the next character is y" — but this works because nondeterminism can be eliminated within the realm of regular languages. It would not make use of the parsing expression variants of the repetition operations.
 
=== Compared to context-free grammars ===