Operator-precedence parser: Difference between revisions

Content deleted Content added
Kr5t (talk | contribs)
Create section on Pratt parsing, which is a generalization of precedence climbing, as noted by the person who coined the term "precedence climbing", Thomas Novell
Rescuing 1 sources and tagging 0 as dead.) #IABot (v2.0.9.5
 
(33 intermediate revisions by 25 users not shown)
Line 1:
{{Short description|Bottom-up parser that interprets an operator-precedence grammar}}
In [[computer science]], an '''operator -precedence parser''' is a [[Bottom-up parsing|bottom-up parser]] that interprets an [[operator-precedence grammar]]. For example, most [[calculator]]s use operator -precedence parsers to convert from the human-readable [[infix notation]] relying on [[order of operations]] to a format that is optimized for evaluation such as [[Reverse Polish notation]] (RPN).
 
[[Edsger Dijkstra]]'s [[shunting yard algorithm]] is commonly used to implement operator -precedence parsers.
 
== Relationship to other parsers ==
Line 7 ⟶ 8:
An operator-precedence parser is a simple [[shift-reduce parser]] that is capable of parsing a subset of [[LR parser|LR(1)]] grammars. More precisely, the operator-precedence parser can parse all LR(1) grammars where two consecutive [[nonterminal]]s and [[empty string|epsilon]] never appear in the right-hand side of any rule.
 
Operator-precedence parsers are not used often in practice; however they do have some properties that make them useful within a larger design. First, they are simple enough to write by hand, which is not generally the case with more sophisticated right shift-reduce parsers. Second, they can be written to consult an operator table at [[Run time (program lifecycle phase)|run time]], which makes them suitable for languages that can add to or change their operators while parsing. (An example is [[Haskell (programming language)|Haskell]], which allows user-defined infix operators with custom associativity and precedence; consequentiallyconsequently, an operator-precedence parser must be run on the program ''after'' parsing of all referenced modules.)
 
[[Raku (programming language)|Raku]] sandwiches an operator-precedence parser between two [[recursive descent parser]]s in order to achieve a balance of speed and dynamism. This is expressed in the virtual machine for Raku, [[Parrot virtual machine|Parrot]], as the [[Parser Grammar Engine]] (PGE). [[GNU Compiler Collection|GCC]]'s C and C++ parsers, which are hand-coded recursive descent parsers, are both sped up by an operator-precedence parser that can quickly examine arithmetic expressions. Operator -precedence parsers are also embedded within [[compiler -compiler]]-generated parsers to noticeably speed up the recursive descent approach to expression parsing.<ref name="Harwell2008">{{cite web| last=Harwell | first=Sam | date=2008-08-29 | title=Operator precedence parser | url=https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687077/Operator+precedence+parser | publisher=ANTLR3 Wiki | accessdateaccess-date=2017-10-25}}</ref>
 
== Precedence climbing method ==
Line 16 ⟶ 17:
An infix-notation expression grammar in [[EBNF]] format will usually look like this:
 
<syntaxhighlight lang="ebnfabnf">
expression ::= equality-expression
equality-expression ::= additive-expression ( ( '==' | '!=' ) additive-expression ) *
Line 34 ⟶ 35:
The pseudocode for the algorithm is as follows. The parser starts at function ''parse_expression''. Precedence levels are greater than or equal to 0.
 
<u>parse_expression()</u>
'''return''' parse_expression_1(parse_primary(), 0)
 
<u>parse_expression_1(lhs, min_precedence)</u>
''lookahead'' := peek next token
'''while''' ''lookahead'' is a binary operator whose precedence is >= ''min_precedence''
Line 47 ⟶ 48:
than ''op''<nowiki>'</nowiki>s, or a right-associative operator
whose precedence is equal to ''op'''s
''rhs'' := ''parse_expression_1'' (''rhs'', precedence of ''op'' + (1 if ''lookahead's' precedence is greater, else 0))
''lookahead'' := peek next token
''lhs'' := the result of applying ''op'' with operands ''lhs'' and ''rhs''
Line 54 ⟶ 55:
Note that in the case of a production rule like this (where the operator can only appear once):
 
<syntaxhighlight lang="bnfabnf">
equality-expression ::= additive-expression ( '==' | '!=' ) additive-expression
</syntaxhighlight>
Line 92 ⟶ 93:
 
== Pratt parsing ==
{{external links|section|date=August 2023}}
 
Another precedence parser known as Pratt parsing was first described by Vaughn[[Vaughan Pratt]] in the 1973 paper "Top downDown operatorOperator precedencePrecedence",<ref>Pratt, Vaughan. "[https://web.archive.org/web/20151223215421/http://hall.org.ua/halls/wizzard/pdf/Vaughan.Pratt.TDOP.pdf Top downDown operatorOperator precedencePrecedence]." ''Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages'' (1973).</ref>, based on [[Recursive descent parser|recursive descent]]. Though it predates precedence climbing, it can be viewed as a generalization of precedence climbing .<ref>{{cite web |last1=Norvell |first1=Theodore |title=Parsing Expressions by Recursive Descent |url=http://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm |website=www.engr.mun.ca |quote=The purpose of this post is to [... start] with precedence climbing and refactoring it to use the command pattern until we arrive at a Pratt parser. [This is the author who coined the term "precedence climbing".]}}</ref>
 
Pratt designed the parser originally to implement the [[CGOL]] programming language, and it was treated in much more depth in a Masters Thesis under his supervision.<ref>Van De Vanter, Michael L. "[http://publications.csail.mit.edu/lcs/specpub.php?id=715 A Formalization and Correctness Proof of the CGOL Language System]." (Master's Thesis). MIT Laboratory for Computer Science Technical Report MIT-LCS-TR-147 (Cambridge, Massachusetts). 1975.</ref>
Line 100 ⟶ 101:
* [[Douglas Crockford]] based the JavaScript parser in [[JSLint]] on Pratt parsing.<ref>{{cite web|author=Crockford, D|title=Top Down Operator Precedence|date=2007-02-21|url=http://crockford.com/javascript/tdop/tdop.html}}</ref>
* Comparison between Python implementations of precedence climbing and Pratt parsing: [http://www.oilshell.org/blog/2016/11/01.html "Pratt Parsing and Precedence Climbing Are the Same Algorithm" (2016) by Andy Chu]
* Tutorial using [[PythonRust (programming language)|PythonRust]]: [httphttps://effbotmatklad.orggithub.io/2020/04/zone13/simple-topbut-downpowerful-pratt-parsing.htmhtml "Simple Top-Downbut ParsingPowerful inPratt PythonParsing" (20082020) by FredrikAleksey LundhKladov]
* Tutorial using [[Rust (programming language)|Rust]]: [https://williamr.dev/posts/pratt-parsing/ "The Pratt Parsing Technique" (2024) by William Rågstad]
* Tutorial using [[Python (programming language)|Python]]: [http://effbot.org/zone/simple-top-down-parsing.htm "Simple Top-Down Parsing in Python" (2008) by Fredrik Lundh] {{Webarchive|url=https://web.archive.org/web/20150228044653/http://effbot.org/zone/simple-top-down-parsing.htm |date=2015-02-28 }}
* Tutorial using [[Java (programming language)|Java]]: [http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/ "Pratt Parsers: Expression Parsing Made Easy" (2011) by Bob Nystrom, author of Crafting Interpreters]
* Implementation in [[C Sharp (programming language)|C#]]: [https://github.com/atifaziz/Gratt "Gratt: A Generic Vaughn Pratt's top-down operator precedence parser for .NET Standard"] (a [[Generic programming|generic]] version inspired by the Java implementation presented by Bob Nystrom in [http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/ "Pratt Parsers: Expression Parsing Made Easy"])
 
== Alternative methods ==
Line 109 ⟶ 113:
Such trees do not necessarily need to be implemented using data structures conventionally used for trees. Instead, tokens can be stored in flat structures, such as tables, by simultaneously building a priority list which states what elements to process in which order.
 
=== Full parenthesization ===
Another approach is to first fully parenthesize the expression, inserting a number of parentheses around each operator, such that they lead to the correct precedence even when parsed with a linear, left-to-right parser. This algorithm was used in the early [[Fortran#FORTRAN|FORTRAN I]] compiler:<ref name="Padua2000">{{cite journal |last1=Padua |first1=David |title=The Fortran I Compiler |year=2000 |journal=Computing in Science & Engineering |volume=2 |issue=1 |pages=70–75 |url=http://polaris.cs.uiuc.edu/publications/c1070.pdf |doi=10.1109/5992.814661 |bibcode=2000CSE.....2a..70P |archive-date=2020-06-17 |access-date=2016-03-29 |archive-url=https://web.archive.org/web/20200617113640/http://polaris.cs.uiuc.edu/publications/c1070.pdf |url-status=dead }}</ref>
 
<blockquote>
Line 119 ⟶ 124:
Although not obvious, the algorithm was correct, and, in the words of [[Donald Knuth|Knuth]], “The resulting formula is properly parenthesized, believe it or not.”<ref name="Knuth1962">{{cite journal |last1=Knuth |first1=Donald E. |title=A HISTORY OF WRITING COMPILERS |year=1962 |publisher=Edmund C. Berkeley |journal=Computers and Automation |volume=11 |issue=12 |pages=8–14 |url=https://archive.org/details/bitsavers_computersA_13990695}}</ref>
</blockquote>
{{missing information|section|why parenthesization works|date=May 2023}}
 
Example code of a simple C application that handles parenthesisation of basic math operators (<code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>^</code>, <code>(</code> and <code>)</code><!-- parentheses -->):
Line 125 ⟶ 131:
#include <string.h>
 
// The command-line argument boundary is our lexer.
int main(int argc, char *argv[]) {
int i;
printf("((((");
for (i=1; i!=argc; i++) {
// strlen(argv[i]) == 2
if (argv[i] && !argv[i][1]) {
switch (*argv[i]) {
case '(': printf("(((("); continue;
case ')': printf("))))"); continue;
case '^': printf(")^("); continue;
case '*': printf("))*(("); continue;
case '/': printf("))/(("); continue;
case '+':
// unary check: either first or had an operator expecting secondary argument
if (i == 1 || strchr("(^*/+-", *argv[i-1]))
printf("+");
Line 157 ⟶ 166:
</syntaxhighlight>
 
First, you need to compile your program. Assuming your program is written in C and the source code is in a file named program.c, you would use the following command:
For example, when compiled and invoked from the command line with parameters
<pre>a * b +gcc program.c ^ d /-o eprogram</pre>
The above command tells gcc to compile program.c and create an executable named program.
 
Command to run the program with parameters, For example; a * b + c ^ d / e
<pre>./program a '*' b + c '^' d / e</pre>
 
it produces
<pre>((((a))*((b)))+(((c)^(d))/((e))))</pre>
Line 170 ⟶ 184:
 
==References==
{{Reflist}}
<references/>
 
==External links==
* {{cite web| last=Clarke | first=Keith | date=1992-05-26 | title=Re: compact recursive-descent parsing of expressions | url=http://groups.google.com/group/comp.compilers/browse_thread/thread/aee24b9504d527ca/1210ef8ae529accd?#1210ef8ae529accd | accessdateaccess-date=2012-01-24}}
* [http://groups.google.com/group/comp.compilers/browse_thread/thread/442ca111b2dfb38d/5390237263b3625b?#5390237263b3625b Example C++ code by Keith Clarke for parsing infix expressions using the precedence climbing method]
* {{cite journal| last=Samelson | first=Klaus | authorlinkauthor-link=Klaus Samelson |author2=[[Friedrich L. Bauer]] |author2-link=Friedrich L. Bauer |date=February 1960 | title=Sequential formula translation | journal=Communications of the ACM | volume=3 | issue=2 | pages=76–83 | doi=10.1145/366959.366968| doi-access=free }}
* [https://web.archive.org/web/20160817143616/http://radovan-augustin.info/techpapers/parser/Parser.pdf Parser for expression with infix notation using precedence lists]
 
{{Parsers}}