Operator-precedence parser: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 01:54, 7 November 2020 edit Kr5t (talk \| contribs) 406 edits Create section on Pratt parsing, which is a generalization of precedence climbing, as noted by the person who coined the term "precedence climbing", Thomas Novell ← Previous edit		Latest revision as of 20:16, 15 August 2025 edit undo InternetArchiveBot (talk \| contribs) Bots, Pending changes reviewers 5,674,603 edits Rescuing 1 sources and tagging 0 as dead.) #IABot (v2.0.9.5
(33 intermediate revisions by 25 users not shown)
Line 1: {{Short description\|Bottom-up parser that interprets an operator-precedence grammar}} In [[computer science]], an '''operator -precedence parser''' is a [[Bottom-up parsing\|bottom-up parser]] that interprets an [[operator-precedence grammar]]. For example, most [[calculator]]s use operator -precedence parsers to convert from the human-readable [[infix notation]] relying on [[order of operations]] to a format that is optimized for evaluation such as [[Reverse Polish notation]] (RPN). [[Edsger Dijkstra]]'s [[shunting yard algorithm]] is commonly used to implement operator -precedence parsers. == Relationship to other parsers == Line 7 ⟶ 8: An operator-precedence parser is a simple [[shift-reduce parser]] that is capable of parsing a subset of [[LR parser\|LR(1)]] grammars. More precisely, the operator-precedence parser can parse all LR(1) grammars where two consecutive [[nonterminal]]s and [[empty string\|epsilon]] never appear in the right-hand side of any rule. Operator-precedence parsers are not used often in practice; however they do have some properties that make them useful within a larger design. First, they are simple enough to write by hand, which is not generally the case with more sophisticated right shift-reduce parsers. Second, they can be written to consult an operator table at [[Run time (program lifecycle phase)\|run time]], which makes them suitable for languages that can add to or change their operators while parsing. (An example is [[Haskell (programming language)\|Haskell]], which allows user-defined infix operators with custom associativity and precedence; ~~consequentially~~consequently, an operator-precedence parser must be run on the program ''after'' parsing of all referenced modules.) [[Raku (programming language)\|Raku]] sandwiches an operator-precedence parser between two [[recursive descent parser]]s in order to achieve a balance of speed and dynamism~~. This is expressed in the virtual machine for Raku, [[Parrot virtual machine\|Parrot]], as the [[Parser Grammar Engine]] (PGE)~~. [[GNU Compiler Collection\|GCC]]'s C and C++ parsers, which are hand-coded recursive descent parsers, are both sped up by an operator-precedence parser that can quickly examine arithmetic expressions. Operator -precedence parsers are also embedded within [[compiler -compiler]]-generated parsers to noticeably speed up the recursive descent approach to expression parsing.<ref name="Harwell2008">{{cite web\| last=Harwell \| first=Sam \| date=2008-08-29 \| title=Operator precedence parser \| url=https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687077/Operator+precedence+parser \| publisher=ANTLR3 Wiki \| ~~accessdate~~access-date=2017-10-25}}</ref> == Precedence climbing method == Line 16 ⟶ 17: An infix-notation expression grammar in [[EBNF]] format will usually look like this: <syntaxhighlight lang="~~ebnf~~abnf"> expression ::= equality-expression equality-expression ::= additive-expression ( ( '==' \| '!=' ) additive-expression ) * Line 34 ⟶ 35: The pseudocode for the algorithm is as follows. The parser starts at function ''parse_expression''. Precedence levels are greater than or equal to 0. <u>parse_expression()</u> '''return''' parse_expression_1(parse_primary(), 0) <u>parse_expression_1(lhs, min_precedence)</u> ''lookahead'' := peek next token '''while''' ''lookahead'' is a binary operator whose precedence is >= ''min_precedence'' Line 47 ⟶ 48: than ''op''<nowiki>'</nowiki>s, or a right-associative operator whose precedence is equal to ''op'''s ''rhs'' := ''parse_expression_1'' (''rhs'', precedence of ''op'' + (1 if ''lookahead's' precedence is greater, else 0)) ''lookahead'' := peek next token ''lhs'' := the result of applying ''op'' with operands ''lhs'' and ''rhs'' Line 54 ⟶ 55: Note that in the case of a production rule like this (where the operator can only appear once): <syntaxhighlight lang="~~bnf~~abnf"> equality-expression ::= additive-expression ( '==' \| '!=' ) additive-expression </syntaxhighlight> Line 92 ⟶ 93: == Pratt parsing == {{external links\|section\|date=August 2023}} Another precedence parser known as Pratt parsing was first described by ~~Vaughn~~[[Vaughan Pratt]] in the 1973 paper "Top ~~down~~Down ~~operator~~Operator ~~precedence~~Precedence",<ref>Pratt, Vaughan. "[https://web.archive.org/web/20151223215421/http://hall.org.ua/halls/wizzard/pdf/Vaughan.Pratt.TDOP.pdf Top ~~down~~Down ~~operator~~Operator ~~precedence~~Precedence]." ''Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages'' (1973).</ref>, based on [[Recursive descent parser\|recursive descent]]. Though it predates precedence climbing, it can be viewed as a generalization of precedence climbing .<ref>{{cite web \|last1=Norvell \|first1=Theodore \|title=Parsing Expressions by Recursive Descent \|url=http://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm \|website=www.engr.mun.ca \|quote=The purpose of this post is to [... start] with precedence climbing and refactoring it to use the command pattern until we arrive at a Pratt parser. [This is the author who coined the term "precedence climbing".]}}</ref> Pratt designed the parser originally to implement the [[CGOL]] programming language, and it was treated in much more depth in a Masters Thesis under his supervision.<ref>Van De Vanter, Michael L. "[http://publications.csail.mit.edu/lcs/specpub.php?id=715 A Formalization and Correctness Proof of the CGOL Language System]." (Master's Thesis). MIT Laboratory for Computer Science Technical Report MIT-LCS-TR-147 (Cambridge, Massachusetts). 1975.</ref> Line 100 ⟶ 101: * [[Douglas Crockford]] based the JavaScript parser in [[JSLint]] on Pratt parsing.<ref>{{cite web\|author=Crockford, D\|title=Top Down Operator Precedence\|date=2007-02-21\|url=http://crockford.com/javascript/tdop/tdop.html}}</ref> * Comparison between Python implementations of precedence climbing and Pratt parsing: [http://www.oilshell.org/blog/2016/11/01.html "Pratt Parsing and Precedence Climbing Are the Same Algorithm" (2016) by Andy Chu] * Tutorial using [[~~Python~~Rust (programming language)\|~~Python~~Rust]]: [~~http~~https://~~effbot~~matklad.~~org~~github.io/2020/04/~~zone~~13/simple-~~top~~but-~~down~~powerful-pratt-parsing.~~htm~~html "Simple ~~Top-Down~~but ~~Parsing~~Powerful inPratt ~~Python~~Parsing" (~~2008~~2020) by ~~Fredrik~~Aleksey ~~Lundh~~Kladov] * Tutorial using [[Rust (programming language)\|Rust]]: [https://williamr.dev/posts/pratt-parsing/ "The Pratt Parsing Technique" (2024) by William Rågstad] * Tutorial using [[Python (programming language)\|Python]]: [http://effbot.org/zone/simple-top-down-parsing.htm "Simple Top-Down Parsing in Python" (2008) by Fredrik Lundh] {{Webarchive\|url=https://web.archive.org/web/20150228044653/http://effbot.org/zone/simple-top-down-parsing.htm \|date=2015-02-28 }} * Tutorial using [[Java (programming language)\|Java]]: [http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/ "Pratt Parsers: Expression Parsing Made Easy" (2011) by Bob Nystrom, author of Crafting Interpreters] * Implementation in [[C Sharp (programming language)\|C#]]: [https://github.com/atifaziz/Gratt "Gratt: A Generic Vaughn Pratt's top-down operator precedence parser for .NET Standard"] (a [[Generic programming\|generic]] version inspired by the Java implementation presented by Bob Nystrom in [http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/ "Pratt Parsers: Expression Parsing Made Easy"]) == Alternative methods == Line 109 ⟶ 113: Such trees do not necessarily need to be implemented using data structures conventionally used for trees. Instead, tokens can be stored in flat structures, such as tables, by simultaneously building a priority list which states what elements to process in which order. === Full parenthesization === Another approach is to first fully parenthesize the expression, inserting a number of parentheses around each operator, such that they lead to the correct precedence even when parsed with a linear, left-to-right parser. This algorithm was used in the early [[Fortran#FORTRAN\|FORTRAN I]] compiler:<ref name="Padua2000">{{cite journal \|last1=Padua \|first1=David \|title=The Fortran I Compiler \|year=2000 \|journal=Computing in Science & Engineering \|volume=2 \|issue=1 \|pages=70–75 \|url=http://polaris.cs.uiuc.edu/publications/c1070.pdf \|doi=10.1109/5992.814661 \|bibcode=2000CSE.....2a..70P \|archive-date=2020-06-17 \|access-date=2016-03-29 \|archive-url=https://web.archive.org/web/20200617113640/http://polaris.cs.uiuc.edu/publications/c1070.pdf \|url-status=dead }}</ref> <blockquote> Line 119 ⟶ 124: Although not obvious, the algorithm was correct, and, in the words of [[Donald Knuth\|Knuth]], “The resulting formula is properly parenthesized, believe it or not.”<ref name="Knuth1962">{{cite journal \|last1=Knuth \|first1=Donald E. \|title=A HISTORY OF WRITING COMPILERS \|year=1962 \|publisher=Edmund C. Berkeley \|journal=Computers and Automation \|volume=11 \|issue=12 \|pages=8–14 \|url=https://archive.org/details/bitsavers_computersA_13990695}}</ref> </blockquote> {{missing information\|section\|why parenthesization works\|date=May 2023}} Example code of a simple C application that handles parenthesisation of basic math operators (<code>+</code>, <code>-</code>, <code></code>, <code>/</code>, <code>^</code>, <code>(</code> and <code>)</code><!-- parentheses -->): Line 125 ⟶ 131: #include <string.h> // The command-line argument boundary is our lexer. int main(int argc, char argv[]) { int i; printf("(((("); for (i=1; i!=argc; i++) { // strlen(argv[i]) == 2 if (argv[i] && !argv[i][1]) { switch (argv[i]) { case '(': printf("(((("); continue; case ')': printf("))))"); continue; case '^': printf(")^("); continue; case '': printf("))(("); continue; case '/': printf("))/(("); continue; case '+': // unary check: either first or had an operator expecting secondary argument if (i == 1 \|\| strchr("(^/+-", argv[i-1])) printf("+"); Line 157 ⟶ 166: </syntaxhighlight> First, you need to compile your program. Assuming your program is written in C and the source code is in a file named program.c, you would use the following command: ~~For example, when compiled and invoked from the command line with parameters~~ <pre>~~a b +~~gcc program.c ~~^ d /~~-o eprogram</pre> The above command tells gcc to compile program.c and create an executable named program. Command to run the program with parameters, For example; a * b + c ^ d / e <pre>./program a '' b + c '^' d / e</pre> it produces <pre>((((a))((b)))+(((c)^(d))/((e))))</pre> Line 170 ⟶ 184: ==References== {{Reflist}} ~~<references/>~~ ==External links== * {{cite web\| last=Clarke \| first=Keith \| date=1992-05-26 \| title=Re: compact recursive-descent parsing of expressions \| url=http://groups.google.com/group/comp.compilers/browse_thread/thread/aee24b9504d527ca/1210ef8ae529accd?#1210ef8ae529accd \| ~~accessdate~~access-date=2012-01-24}} * [http://groups.google.com/group/comp.compilers/browse_thread/thread/442ca111b2dfb38d/5390237263b3625b?#5390237263b3625b Example C++ code by Keith Clarke for parsing infix expressions using the precedence climbing method] * {{cite journal\| last=Samelson \| first=Klaus \| ~~authorlink~~author-link=Klaus Samelson \|author2=[[Friedrich L. Bauer]] \|author2-link=Friedrich L. Bauer \|date=February 1960 \| title=Sequential formula translation \| journal=Communications of the ACM \| volume=3 \| issue=2 \| pages=76–83 \| doi=10.1145/366959.366968\| doi-access=free }} * [https://web.archive.org/web/20160817143616/http://radovan-augustin.info/techpapers/parser/Parser.pdf Parser for expression with infix notation using precedence lists] {{Parsers}}