Parsing expression grammar: Difference between revisions

Content deleted Content added
Advantages: Actually write about advantages (as opposed to underhandedly warn about memory requirements, which is covered in another section)
Indirect left recursion: Sum and Product should have + instead of * for repetition. With that change, it became obvious that the ordered choice should start with the lowest priority, not the highest.
Line 190:
A PEG is called ''well-formed''<ref name="For04"/> if it contains no ''[[left recursion|left-recursive]]'' rules, i.e., rules that allow a nonterminal to expand to an expression in which the same nonterminal occurs as the leftmost symbol. For a left-to-right top-down parser, such rules cause infinite regress: parsing will continually expand the same nonterminal without moving forward in the string.
 
Therefore, to allow packrat parsing, left recursion must be eliminated. For example, in the arithmetic grammar above, it could seem tempting to express operator precedence as a matter of ordered choice — <code>ValueSum / Product / SumValue</code> would mean first try viewing as <code>ValueSum</code> (since we parse top–down), second try viewing as <code>Product</code>, and only third try viewing as <code>SumValue</code> — rather than via nesting of definitions. This (non-well-formed) grammar seeks to keep precedence order only in one line:
 
<syntaxhighlight lang="peg">
Value ← [0-9.]+ / '(' Expr ')'
Product ← Expr (('*' / '/') Expr)*+
Sum ← Expr (('+' / '-') Expr)*+
Expr ← ValueSum / Product / SumValue
</syntaxhighlight>
 
Unfortunately matching an <code>Expr</code> (once it has been found <code>Value</code> did not match) requires testing if a <code>ProductSum</code> matches, while matching a <code>ProductSum</code> requires testing if an <code>Expr</code> matches. Because the term appears in the leftmost position, these rules make up a [[circular definition]] that cannot be resolved. (Circular definitions that can be resolved exist—such as in the original formulation from the first example—but such definitions are required not to exhibit pathological recursion.) However, left-recursive rules can always be rewritten to eliminate left-recursion.<ref name="pegs" /><ref name="AhoSethiUllman 1986">{{cite book |last1=Aho |first1=A.V. |last2=Sethi |first2=R. |last3=Ullman |first3=J.D. |date=1986 |title=Compilers: Principles, Techniques, and Tools |publisher=[[Addison-Wesley Longman]] |place=Boston, MA, USA |url=https://archive.org/details/compilersprincip0000ahoa/ |url-access=registration |isbn=0-201-10088-6}}</ref> For example, the following left-recursive CFG rule:
<syntaxhighlight lang="peg">
string-of-a ← string-of-a 'a' | 'a'