Operator-precedence parser: Difference between revisions

Content deleted Content added
GCC uses operator-precedence parsers too
Add example of putting an operator-precedence parser in the middle of a recursive-descent parser
Line 10:
 
[[Perl 6]] "sandwiches" an operator-precedence parser in between two [[Recursive descent parser]]s in order to achieve a balance of speed and dynamism. [[GNU Compiler Collection|GCC]]'s C and C++ parsers, which are hand-coded [[recursive descent parser]]s, are both sped up by an operator-precedence parser that can quickly examine arithmetic expressions.
 
== Example algorithm to parse infix notation ==
 
An [EBNF] grammar that parses infix notation will usually look like this:
 
<pre>
expression ::= equality-expression
equality-expression ::= additive-expression ( ( '==' | '!=' ) additive-expression ) *
additive-expression ::= multiplicative-expression ( ( '+' | '-' ) multiplicative-expression ) *
multiplicative-expression ::= primary ( ( '*' | '/' ) primary ) *
primary ::= '(' expression ')' | NUMBER | VARIABLE | '-' primary
</pre>
 
With many levels of precedence, implementing this grammar with a [[Recursive descent parser|predictive recursive-descent parser]] can become inefficient. Parsing a number, for example, can require five function calls (one for each non-terminal in the grammar, until we reach <em>primary</em>).
 
An operator-precedence parser can do the same more efficiently. The idea is that we can left associate the arithmetic operations as long as we find operators with the same precedence, but we have to save a temporary result to evaluate higher precedence operators. The algorithm that is presented here does not need an explicit stack: instead, it uses recursive calls to implement the stack.
 
The algorithm is not a pure operator-precedence parser like the Dijkstra [[Reverse Polish Notation#Converting from infix notation|shunting yard]] algorithm. It assumes that the <em>primary</em> nonterminal is parsed in a separate subroutine, like in a [[recursive descent parser]].
 
Precedence levels are greater or equal to 0. The pseudo-code is as follows:
 
parse_expression ()
:* return parse_expression_1 (parse_primary (), 0)
parse_expression_1 (lhs, min_precedence)
:* while the next token is a binary operator whose precedence is >= min_precedence
::* op := next token
::* rhs := parse_primary ()
::* while the next token is a binary operator whose precedence is greater than <em>op</em>'s, or a right-associative operator whose precedence is equal to <em>op</em>'s
:::* lookahead := next token
:::* rhs := parse_expression_1 (rhs, lookahead's precedence)
::* lhs := the result of applying <em>op</em> with operands <em>lhs</em> and <em>rhs</em>
:* return lhs
 
An example execution on the expression 2 + 3 * 4 + 5 == 19 is as follows. We give precedence 0 to equality expressions, 1 to additive expressions, 2 to multiplicative expressions.
 
parse_expression_1 (lhs = 2, min_precedence = 0)
* the next token is +, with precedence 1. the while loop is entered.
* op is + (precedence 1)
* rhs is 3
* the next token is *, with precedence 2. recursive invocation.<br>parse_expression_1 (lhs = 3, min_precedence = 2)
:* the next token is *, with precedence 2. the while loop is entered.
::* op is * (precedence 2)
::* rhs is 4
::* the next token is +, with precedence 1. no recursive invocation.
::* lhs is assigned 3*4 = 12
::* the next token is +, with precedence 1. the while loop is left.
:* 12 is returned.
* the next token is +, with precedence 1. no recursive invocation.
* lhs is assigned 2+12 = 14
* the next token is +, with precedence 1. the while loop is not left.
* op is + (precedence 1)
* rhs is 5
* the next token is ==, with precedence 0. no recursive invocation.
* lhs is assigned 14+5 = 19
* the next token is ==, with precedence 0. the while loop is not left.
* op is == (precedence 0)
* rhs is 19
* the next token is <em>end-of-line</em>, which is not an operator. no recursive invocation.
* lhs is assigned the result of evaluating 19 == 19, for example 1 (as in the C standard).
* the next token is <em>end-of-line</em>, which is not an operator. the while loop is left.
1 is returned.
</pre>
 
[[Category:Parsing algorithms]]