Content deleted Content added
→Indirect left recursion: Sum and Product should have + instead of * for repetition. With that change, it became obvious that the ordered choice should start with the lowest priority, not the highest. |
→Advantages: Added example of non-context-free parsing expression language. |
||
Line 153:
== Advantages ==
Compared to pure [[regular expressions]] (i.e., describing a language recognisable using a [[finite automaton]]), PEGs are vastly more powerful. In particular they can handle unbounded recursion, and so match parentheses down to an arbitrary nesting depth; regular expressions can at best keep track of nesting down to some fixed depth, because a finite automaton (having a finite set of internal states) can only distinguish finitely many different nesting depths. In more theoretical terms, <math> \{a^n b^n\}_{n \geqslant 0} </math> (the language of all strings of zero or more <math>a</math>'s, followed by an ''equal number'' of <math>b</math>s) is not a regular language, but it is easily seen to be a parsing expression language, matched by the grammar
<syntaxhighlight lang="peg">
Line 160:
</syntaxhighlight>
Here <code>Matched !.</code> is the starting expression. The <code>!.</code> part enforces that the input ends after the <code>Matched</code>, by saying “there is no next character”; unlike regular expressions, which have magic constraints <code>$</code> or <code>\Z</code> for this, parsing expressions can express the end of input using only
The <code>*</code>, <code>+</code>, and <code>?</code> of parsing expressions are similar to those in regular expressions, but a difference is that these operate strictly in a greedy mode. This is ultimately due to <code>/</code> being an ordered choice.
In the strict formal sense, PEGs are likely incomparable to [[context-free grammar]]s (CFGs), but practically there are many things that PEGs can do which pure CFGs cannot, whereas it is difficult to come up with examples of the contrary. In particular PEGs can be crafted to natively resolve ambiguities, such as the "[[dangling else]]" problem in C, C++, and Java, whereas CFG-based parsing often needs a rule outside of the grammar to resolve them. Moreover any PEG can be parsed in linear time by using a packrat parser, as described above, whereas parsing according to a general CFG is asymptotically equivalent<ref>{{cite journal |last1=Lee |first1=Lillian |title=Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication |journal=J. ACM |date=January 2002 |volume=49 |issue=1 |page=1–15 |doi=10.1145/505241.505242}}</ref> to [[logical matrix|boolean matrix]] multiplication (thus likely between quadratic and cubic).
One classical example of a formal language which is provably not context-free is the language <math> \{a^n b^n c^n\}_{n \geqslant 0} </math>: an arbitrary number of <math>a</math>s are followed by an equal number of <math>b</math>s, which in turn are followed by an equal number of <math>c</math>s. This, too, is a parsing expression language, matched by the grammar
<syntaxhighlight lang="peg">
start ← AB 'c'*
AB ← 'a' AB 'b' / &(BC !.)
BC ← ('b' BC 'c')?
</syntaxhighlight>
For <code>AB</code> to match, the first stretch of <math>a</math>s must be followed by an equal number of <math>b</math>s, and in addition <code>BC</code> has to match where the <math>a</math>s switch to <math>b</math>s, which means those <math>b</math>s are followed by an equal number of <math>c</math>s.
== Disadvantages ==
|