Parsing expression grammar: Difference between revisions

Content deleted Content added
See also: Moving link from "Semantics" section
Memory consumption: Rewritten to not give undue weight to nesting depth parameter.
Line 175:
|archive-date= 28 July 2011
|date=10 March 2010
}}</ref> to eliminate redundant parsing steps. Packrat parsing requires internal storage proportional to the total input size, rather than to the depth of the parse tree as with LR parsers. Whether this is a significant difference depends on circumstances; if parsing is a service provided as a [[Function (computer programming)|function]] then the parser will have stored the full parse tree up until returning it, and already that parse tree will typically be of size proportional to the total input size. If parsing is instead provided as a [[Generator (computer programming)|generator]] then one might get away with only keeping parts of the parse tree in memory, but the feasibility of this depends on the grammar. A parsing expression grammar can be designed so that only after consuming the full input will the parser discover that it needs to backtrack to the beginning,<ref>For example, there could at the very end of input be a directive to the effect that “in this file, comma is a [[decimal separator]], so all those function calls f(3,14*r) you thought had two arguments? They don't. Now go back to the start of input and parse it all again.” Arguably that would be a poor design of the input language, but the point is that parsing expression grammars are powerful enough to handle this, purely as a matter of syntax.</ref> which again could require storage proportional to total input size.
}}</ref> to eliminate redundant parsing steps. Packrat parsing requires storage proportional to the total input size, rather than the depth of the parse tree as with LR parsers. This is a significant difference in many domains: for example, hand-written source code has an effectively constant expression nesting depth independent of the length of the program—expressions nested beyond a certain depth tend to get refactored.
 
For recursive grammars and some inputs, the depth of the parse tree can be proportional to the input size,<ref>for example, the LISP expression (x (x (x (x ....))))</ref> so both an LR parser and a packrat parser will appear to have the same worst-case asymptotic performance. AHowever in many domains, for example hand-written source code, the expression nesting depth has an effectively constant bound quite independent of the length of the program, because expressions nested beyond a certain depth tend to get [[refactor]]ed. When it is not necessary to keep the full parse tree, a more accurate analysis would take the depth of the parse tree into account separately from the input size. <ref>This is similar to a situation which arises in [[graph algorithms]]: the [[Bellman–Ford algorithm]] and [[Floyd–Warshall algorithm]] appear to have the same running time (<math>O(|V|^3)</math>) if only the number of vertices is considered. However, a more precise analysis which accounts for the number of edges as a separate parameter assigns the [[Bellman–Ford algorithm]] a time of <math>O(|V|*|E|)</math>, which is quadratic for sparse graphs with <math>|E| \in O(|V|)</math>.</ref>
For some grammars and some inputs, the depth of the parse tree can be proportional to the input size,<ref>for example, the LISP expression (x (x (x (x ....))))</ref>
so both an LR parser and a packrat parser will appear to have the same worst-case asymptotic performance. A more accurate analysis would take the depth of the parse tree into account separately from the input size. This is similar to a situation which arises in [[graph algorithms]]: the [[Bellman–Ford algorithm]] and [[Floyd–Warshall algorithm]] appear to have the same running time (<math>O(|V|^3)</math>) if only the number of vertices is considered. However, a more precise analysis which accounts for the number of edges as a separate parameter assigns the [[Bellman–Ford algorithm]] a time of <math>O(|V|*|E|)</math>, which is quadratic for sparse graphs with <math>|E| \in O(|V|)</math>.
 
=== Indirect left recursion ===