Parsing expression grammar: Difference between revisions

Content deleted Content added
Indirect left recursion: Clarify intent of example
Advantages: Actually write about advantages (as opposed to underhandedly warn about memory requirements, which is covered in another section)
Line 153:
 
== Advantages ==
Compared to pure [[regular expressions]] (i.e., withoutdescribing back-referencesa language recognisable using a [[finite automaton]]), PEGs are strictlyvastly more powerful,. butIn requireparticular significantlythey morecan memory.handle Forunbounded examplerecursion, aand regularso expressionmatch inherentlyparentheses cannotdown findto an arbitrary numbernesting ofdepth; matchedregular pairsexpressions can at best keep track of parentheses,nesting becausedown itto issome notfixed recursivedepth, butbecause a PEGfinite automaton (having a finite set of internal states) can only distinguish finitely many different nesting depths. In Howevermore theoretical terms, <math> \{a^n PEGb^n\}_{n will\geqslant require0} an</math> amount(the language of memoryall proportionalstrings toof thezero lengthor ofmore the input<math>a</math>'s, whilefollowed aby regularan expression''equal matchernumber'' willof require<math>b</math>s) onlyis a constantparsing amountexpression oflanguage, memory.matched by the grammar
 
<syntaxhighlight lang="peg">
Any PEG can be parsed in linear time by using a packrat parser, as described above.
start ← Matched !.
Matched ← ('a' Matched 'b')?
</syntaxhighlight>
 
Here <code>Matched !.</code> is the starting expression. The <code>!.</code> part enforces that the input ends after the <code>Matched</code>, by saying “there is no next character”; unlike regular expressions, which have magic constraints <code>$</code> or <code>\Z</code> for this, parsing expressions can express the end of input using only its basic primitives.
Many CFGs contain ambiguities, even when they're intended to describe unambiguous languages. The "[[dangling else]]" problem in C, C++, and Java is one example. These problems are often resolved by applying a rule outside of the grammar. In a PEG, these ambiguities never arise, because of prioritization.
 
The <code>*</code>, <code>+</code>, and <code>?</code> of parsing expressions are similar to those in regular expressions, but a difference is that these operate strictly in a greedy mode. This is ultimately due to <code>/</code> being an ordered choice.
 
In the strict formal sense, PEGs are likely incomparable to [[context-free grammar]]s (CFGs), but practically there are many things that PEGs can do which pure CFGs cannot, whereas it is difficult to come up with examples of the contrary. In particular PEGs can be crafted to natively resolve ambiguities, such as the "[[dangling else]]" problem in C, C++, and Java, whereas CFG-based parsing often needs a rule outside of the grammar to resolve them. Moreover any PEG can be parsed in linear time by using a packrat parser, as described above, whereas parsing according to a general CFG is asymptotically equivalent<ref>{{cite journal |last1=Lee |first1=Lillian |title=Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication |journal=J. ACM |date=January 2002 |volume=49 |issue=1 |page=1–15 |doi=10.1145/505241.505242}}</ref> to [[logical matrix|boolean matrix]] multiplication (thus likely between quadratic and cubic).
 
== Disadvantages ==