Recursive ascent parser: Difference between revisions

Content deleted Content added
Djspiewak (talk | contribs)
No edit summary
avoid unnec redirect, link scope
 
(44 intermediate revisions by 32 users not shown)
Line 1:
RecursiveIn [[computer science]], '''recursive ascent parsing''' is a technique for implementing an [[LALR]]LR parser]] which uses mutually-recursive functions rather than tables. Thus, the parser is ''directly encoded'' in the host language similar to [[recursive descent]]. Direct encoding usually yields a parser which is faster than its table-driven equivalent<ref>{{cite webjournal|title=Very fast LR parsing|author=Thomas J Penello|urljournal=http://portalACM SIGPLAN Notices |year=1986|volume=21 |issue=7 |pages=145–151 |doi=10.acm.org1145/citation.cfm?id=13310.13326 |doi-access=free }}</ref> for the same reason that compilation is faster than interpretation. It is also (nominally) possible to hand edit a recursive ascent parser, whereas a table-tabular implementation is nigh unreadable to the average human.
 
Recursive ascent was first described by Thomas PenelloPennello in his article {{cite webbook|titlechapter=Very fast LR parsing| doi=10.1145/12276.13326 |chapter-url=http://portal.acm.org/citation.cfm?id=13310.13326| title=Proceedings of the 1986 SIGPLAN symposium on Compiler construction - SIGPLAN '86 | year=1986 | last1=Pennello | first1=Thomas J. | pages=145–151 | isbn=0897911970 | s2cid=17214407 }} in 1986. He was not intending to create a hand-editable implementation of an LR parser, but rather a maintainable and efficient parser implemented in [[assembly language]]. The technique was later expounded upon by G.H. Roberts<ref>{{cite webjournal|title=Recursive ascent: an LR analog to recursive descent|year=1988|author=G.H. Roberts|urljournal=http://portalACM SIGPLAN Notices |volume=23 |issue=8 |pages=23–29 |doi=10.acm.org1145/citation.cfm?id=47907.47909 |s2cid=12740771 |doi-access=free }}</ref> in 1988 as well as in an article by Leermakers, Augusteijn, Kruseman Aretz<ref>{{cite webjournal|title=A functional LR parser|author=Leermakers, Augusteijn, Kruseman Aretz|urljournal=http://portalTheoretical Computer Science |year=1992|volume=104 |issue=2 |pages=313–323 |doi=10.acm.org1016/citation.cfm?id0304-3975(92)90128-3 |doi-access=146986.146994free }}</ref> in 1992 as part ofin the textbook,journal ''Theoretical Computer Science'' also by Leermakers and Augusteijn. An extremely readable description of the technique was written by Morell and Middleton<ref>{{cite webnews|title=Recursive-ascent parsing|authorauthor1=Larry Morell and |author2=David Middleton |name-list-style=amp |year=2003|journal=Journal of Computing Sciences in Colleges|volume=18|issue=6|pages=186–201|url=http://portal.acm.org/citation.cfm?id=770849}}</ref> in 2003. A good exposition can also be found in a TOPLAS article by Sperber and Thiemann.<ref>{{cite journal|title=Generation of LR parsers by partial evaluation|author=Sperber and Thiemann|journal=ACM Transactions on Programming Languages and Systems |year=2000|volume=22 |issue=2 |pages=224–264 |doi=10.1145/349214.349219 |s2cid=14955687 |doi-access=free }}</ref>
 
Recursive ascent has also been merged with recursive descent, yielding a technique known as [[recursive ascent/descent]]. This implementation technique is arguably easier to hand-edit due to the reduction in states and fact that some of these states are more intuitively top-down rather than bottom up. It can also yield some minimal performance improvements over conventional recursive ascent.<ref>{{cite web|title=ScalaBison Recursive Ascent-Descent Parser Generator|author1=John Boyland |author2=Daniel Spiewak |name-list-style=amp |year=2009|url=http://www.cs.uwm.edu/~boyland/papers/scala-bison.pdf|archive-url=https://web.archive.org/web/20090718161230/http://www.cs.uwm.edu/~boyland/papers/scala-bison.pdf|archive-date=2009-07-18|url-status=dead}}</ref>
 
== Summary ==
 
Intuitively, recursive ascent is a literal implementation of the [[LR parser|LR parsing]] concept. Each function in the parser represents a single LR [[Finite -state machine|automaton]] state]]. Within each function, a multi-branch statement is used to select the appropriate action based on the current token poppedread offfrom the input stackstream. Once the token has been identified, action is taken based on the state being encoded. There are two different fundamental actions which may be taken based on the token in question:
 
* '''Shift''' - Encoded as a function call, effectively jumping to a new automaton state.
* '''Reduce''' - Encoded differently according to the [[semantic action routine]] for the relevant [[Formal grammar#The syntax of grammars|production]]. The result of this routine is wrapped in an [[Algebraic data type|ADT]] which is returned to the caller. The reduce action must also record the number of tokens which were shifted ''prior'' to the reduce, passing this value back to the caller along with the reduce value. This shift counter determines at which point up the call stack the reduce should be handled.
 
There is also a third LR automaton action which may be taken in a given state, but only after a reduce where the shift counter has decremented to zero (indicating that the current state should handle the result). This is the '''goto''' action, which is essentially a special case of '''shift''' designed to handle non-terminals in a production. This action must be handled ''after'' the multi-branch statement, since this is where any reduction results will "resurface" from farther down the call stack.
Line 14 ⟶ 16:
== Example ==
 
Consider the following grammar in [[GNU Bison | bison]] syntax:
 
<pre>expr : expr '+' term { $$ = $1 + $3; }
Line 29 ⟶ 31:
;</pre>
 
This grammar is [[LR parsing#A note about LR(0) versus SLR and LALR parsing parser| LR(0)]] in that it is left-recursive (in the '''expr''' non-terminal) but does not require any lookahead. Recursive ascent is also capable of handling grammars which are LALR(1) in much the same way that table-driven parsers handle such cases (by pre-computing conflict resolutions based on possible lookahead).
 
The following is a [[Scala (programming language) | Scala]] implementation of a recursive ascent parser based on the above grammar:
 
<syntaxhighlight lang="scala">
<pre>object ExprParser {
object ExprParser {
private type Result = (NonTerminal, Int)
Line 48 ⟶ 51:
def this(c: Char) = this(c.toString)
}
object #:: {
def unapply[A](str: Stream[A]) = str match {
case Stream.cons(hd, tail) => Some((hd, tail))
case Stream() => None
}
}
Line 305 ⟶ 301:
(res, goto - 1)
}
}
}</pre>
</syntaxhighlight>
 
The following is a [[Prolog]] implementation of a recursive ascent parser based on the above grammar:
== See Also ==
<syntaxhighlight lang="prolog">
state(S), [S] --> [S].
state(S0, S), [S] --> [S0].
 
* [[Recursive descent parser]]
* [[Recursive ascent/descent parser]]
 
/*
=== External ===
0. S --> E$
1. E --> E + T
2. E --> E - T
3. E --> T
4. T --> (E)
5. T --> N
6. N --> 0
7. N --> 1
*/
accept --> state(s([], [e(_)])).
r(1) --> state(s(Ts, [t(A1), '+', e(A0)|Ss]), s(Ts, [e(A0+A1)|Ss])).
r(2) --> state(s(Ts, [t(A1), '-', e(A0)|Ss]), s(Ts, [e(A0-A1)|Ss])).
r(3) --> state(s(Ts, [t(A)|Ss]), s(Ts, [e(A)|Ss])).
r(4) --> state(s(Ts, [')', e(A), '('|Ss]), s(Ts, [t(A)|Ss])).
r(5) --> state(s(Ts, [n(A)|Ss]), s(Ts, [t(A)|Ss])).
r(6) --> state(s(Ts, ['0'|Ss]), s(Ts, [n(0)|Ss])).
r(7) --> state(s(Ts, ['1'|Ss]), s(Ts, [n(1)|Ss])).
t(T) --> state(s([T|Ts], Ss), s(Ts, [T|Ss])).
 
* [http://compilers.iecc.com/comparch/article/93-05-016 Parsing techniques]
* [http://www.aclweb.org/anthology-new/E/E91/E91-1012.pdf Non-Deterministic Recursive Ascent Parsing], Renee Leermakers. 1991 conference on European chapter of the Association for Computational Linguistics.
 
/*
=== References ===
S --> .E$
E --> .E + T
E --> .E - T
E --> .T
T --> .(E)
T --> .N
N --> .0
N --> .1
*/
s0 --> t('('), s3, s2, s1.
s0 --> t('0'), s11, s10, s2, s1.
s0 --> t('1'), s12, s10, s2, s1.
 
/*
S --> E.$
E --> E. + T
E --> E. - T
*/
s1 --> accept.
s1 --> t('+'), s7, s1.
s1 --> t('-'), s8, s1.
 
/*
E --> T.
*/
s2 --> r(3).
 
/*
T --> (.E)
E --> .E + T
E --> .E - T
E --> .T
T --> .(E)
T --> .N
N --> .0
N --> .1
*/
s3 --> t('('), s3, s2, s4.
s3 --> t('0'), s11, s10, s2, s4.
s3 --> t('1'), s12, s10, s2, s4.
 
/*
T --> (E.)
E --> E .+ T
E --> E .- T
*/
s4 --> t(')'), s9.
s4 --> t('+'), s7, s4.
s4 --> t('-'), s8, s4.
 
/*
E --> E + T.
*/
s5 --> r(1).
 
/*
E --> E - T.
*/
s6 --> r(2).
 
/*
E --> E + .T
T --> .(E)
T --> .N
N --> .0
N --> .1
*/
s7 --> t('('), s3, s5.
s7 --> t('0'), s11, s10, s5.
s7 --> t('1'), s12, s10, s5.
 
/*
E --> E - .T
T --> .(E)
T --> .N
N --> .0
N --> .1
*/
s8 --> t('('), s3, s6.
s8 --> t('0'), s11, s10, s6.
s8 --> t('1'), s12, s10, s6.
 
/*
T --> (E).
*/
s9 --> r(4).
 
/*
T --> N.
*/
s10 --> r(5).
 
/*
N --> '0'.
*/
s11 --> r(6).
 
/*
N --> '1'.
*/
s12 --> r(7).
 
parser(Cs, T) :-
length(Cs, _),
phrase(s0, [s(Cs, [])], [s([], [e(T)])]).
 
% state(S0, S), [S] --> [S0, S].
%?- S0 = [s("(1+1)", [])|_], phrase(s0, S0, [S]), maplist(portray_clause, S0).
</syntaxhighlight>
 
==See also==
 
* [[Recursive descent parser]]
 
== References ==
{{reflist}}
 
{{Parsers}}
 
[[Category:Parsing algorithms]]