{{Short description|Program whose source code consists entirely of calls to functions}}
{{confuse|Multi-threaded programming|Jump threading}}
In [[computer science]], '''threaded code''' is a programming technique where the [[Source code|code]] has a form that essentially consists entirely of calls to [[subroutine]]s. It is often used in [[compiler]]s, which may generate code in that form or be implemented in that form themselves. The code may be processed by an [[interpreter (computing)|interpreter]] or it may simply be a sequence of [[machine code]] call instructions.
Threaded code has better [[code density|density]] than code generated by alternative generation techniques and by alternative [[calling convention]]s. In [[Cache (computing)|cached]] architectures, it may [[Execution (computing)|execute]] slightly slower.{{citation needed|date=February 2012}} However, a program that is small enough to fit in a [[computer processor]]'s [[CPU cache|cache]] may run faster than a larger program that suffers many [[cache miss]]es.<ref name="tuwien1">{{cite web|url=http://www.complang.tuwien.ac.at/forth/threading/ |title=Speed of various interpreter dispatch techniques V2}}</ref> Small programs may also be faster at [[thread switching]], when other programs have filled the cache.
Threaded code is best known for its use in many compilers of [[programming language]]s, such as [[Forth (programming language)|Forth]], many implementations of [[BASIC]], some implementations of [[COBOL]], early versions of [[B (programming language)|B]],<ref>Dennis M. Ritchie, [https://www.bell-labs.com/usr/dmr/www/chist.html "The Development of the C Language"], 1993. Quote: "The B compiler on the PDP-7 did not generate machine instructions, but instead 'threaded code' ..."</ref> and other languages for small [[minicomputer]]s and for [[amateur radio satellite]]s.{{citation needed|date=February 2016}}
==History==
{{Original research section|date=February 2020}}
The common way to make computer programs is to use a [[compiler]] to translate [[source code]] (written in some symbolic language) to [[machine code]]. The resulting executable is typically fast but, because it is specific to a hardware platform, it is not portable. A different approach is to generate instructions for a [[virtual machine]] and to use an interpreter on each hardware platform. The interpreter instantiates the virtual machine environment and executes the instructions. Thus, only the interpreter must be compiled.
Early computers had relatively little memory. For example, most [[Data General Nova]] and [[IBM 1130]] systems, and many of the first [[microcomputer]]s, had only 4 kB of RAM installed. Consequently, a lot of time was spent trying to find ways to reduce a program's size, to fit in the available memory.
One solution is to use an interpreter which reads the symbolic language a bit at a time, and calls functions to perform the actions. As the source code is typically much [[code density|denser]] than the resulting machine code, this can reduce overall memory use. This was the reason [[Microsoft BASIC]] is an interpreter:{{efn|[[Dartmouth BASIC]], upon which Microsoft BASIC is ultimately based, was a compiler that ran on mainframe machines.}} its own source code had to fit in the 4 kB memory of machines like the [[Altair 8800]] and still leave room for the user's source code.
Threaded code is a formatting style for compiled code that minimizes memory use. Instead of writing out every step of an operation at its every occurrence in the program, as was common in [[macro assembler]]s for instance, the compiler writes each common bit of code into a subroutine. Thus, each bit exists in only one place in memory (see "[[Don't repeat yourself]]"). The top-level application in these programs may consist of nothing but subroutine calls. Many of these subroutines, in turn, also consist of nothing but lower-level subroutine calls.
Mainframes and some early microprocessors such as the [[RCA 1802]] required several instructions to call a subroutine. In the top-level application and in many subroutines, that sequence is constantly repeated, with only the subroutine address changing from one call to the next. This means that a program consisting of many function calls may have considerable amounts of repeated code as well.
Over the years, programmers have created many variations on that "interpreter" or "small selector". The particular address in the list of addresses may be extracted using an index, [[general-purpose register]] or pointer. The addresses may be direct or indirect, contiguous or non-contiguous (linked by pointers), relative or absolute, resolved at compile time or dynamically built. No single variation is "best" for all situations.
==Development==
To save space, programmers squeezed the lists of subroutine calls into simple lists of subroutine addresses, and used a small loop to call each subroutine in turn. For example, the following pseudocode uses this technique to add two numbers A and B. In the example, the list is labeled '''thread''' and a variable '''ip''' (Instruction Pointer) tracks our place within the list. Another variable '''sp''' (Stack Pointer) contains an address elsewhere in memory that is available to hold a value temporarily.
<syntaxhighlight lang="c">
start:
  ip = &thread  // points to the address '&pushA', not the textual label 'thread'
top:
  jump *ip++  // follow ip to address in thread, follow that address to subroutine, advance ip
thread:
  &pushA
  &pushB
  &add
  ...
pushA:
  *sp++ = A  // follow sp to available memory, store A there, advance sp to the next cell
  jump top
pushB:
  *sp++ = B
  jump top
add:
  addend1 = *--sp  // Pop the top two values off the stack
  addend2 = *--sp
  *sp++ = addend1 + addend2  // Add the two values together and store the result on the top of the stack
  jump top
</syntaxhighlight>
<!--In this case, decoding the [[bytecode]]s is performed once, during program compilation or program load, so it is not repeated each time an instruction is executed. This can save much time and space when decode and dispatch overhead is large compared to the execution cost.-->
The calling loop at <code>top</code> is so simple that it can be repeated inline at the end of each subroutine. Control now jumps once, from the end of a subroutine to the start of another, instead of jumping twice via <code>top</code>. For example:
<syntaxhighlight lang="c">
start:
  ip = &thread  // ip points to &pushA (which points to the first instruction of pushA)
  jump *ip++  // send control to the first instruction of pushA and advance ip
thread:
  &pushA
  &pushB
  &add
  ...
pushA:
  *sp++ = A  // push A onto the stack
  jump *ip++  // send control where ip says to (i.e. to pushB) and advance ip
pushB:
  *sp++ = B
  jump *ip++
add:
  addend1 = *--sp  // Pop the top two values off the stack
  addend2 = *--sp
  *sp++ = addend1 + addend2  // Add the two values together and store the result on top of the stack
  jump *ip++
</syntaxhighlight>
This is called '''direct threaded code''' (DTC). Although the technique is older, the first widely circulated use of the term "threaded code" is probably James R. Bell's 1973 article "Threaded Code".<ref>{{cite journal|last=Bell|first=James R.|title=Threaded code|journal=Communications of the ACM|year=1973|volume=16|issue=6|pages=370–372|doi=10.1145/362248.362270|s2cid=19042952 |doi-access=free}}</ref>
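As an illustration, direct threading maps almost directly onto GNU C, whose "labels as values" extension (see the GCC link under External links) provides the indirect <code>goto</code> that stands in for <code>jump *ip++</code>. The following minimal sketch is hypothetical, not taken from any real Forth system; the names <code>thread</code>, <code>stack</code> and the handlers are invented for the example.

<syntaxhighlight lang="c">
/* Minimal direct-threaded interpreter sketch in GNU C.
   Requires the GCC/Clang "labels as values" extension for 'goto *'. */
#include <stdio.h>

int main(void) {
    long stack[8], *sp = stack;              /* small operand stack */
    long A = 2, B = 3;

    /* The thread: a list of handler addresses, as in direct threading. */
    void *thread[] = { &&pushA, &&pushB, &&add, &&halt };
    void **ip = thread;                      /* instruction pointer */

    goto **ip++;                             /* dispatch to the first handler */

pushA:
    *sp++ = A;
    goto **ip++;                             /* jump to the next handler in the thread */
pushB:
    *sp++ = B;
    goto **ip++;
add: {
    long addend2 = *--sp;                    /* pop the two operands */
    long addend1 = *--sp;
    *sp++ = addend1 + addend2;               /* push their sum */
    goto **ip++;
}
halt:
    printf("result = %ld\n", *--sp);         /* prints 5 */
    return 0;
}
</syntaxhighlight>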
In 1970, [[Charles H. Moore]] invented a more compact arrangement, '''indirect threaded code''' (ITC), for his Forth virtual machine. Moore arrived at this arrangement because [[Data General Nova|Nova]] minicomputers had an [[indirection bit]] in every address, which made ITC easy and fast. Later, he said that he found it so convenient that he propagated it into all later Forth designs.<ref>Moore, Charles H., published remarks in Byte Magazine's Forth Issue</ref>
===Direct threading===
Addresses in the thread are the addresses of machine language. This form is simple, but may have overheads because the thread consists only of machine addresses, so all further parameters must be loaded indirectly from memory. Some Forth systems produce direct-threaded code. On many machines direct-threading is faster than subroutine threading (see reference below).
An example of a [[stack machine]] might execute the sequence "push A, push B, add". That might be translated to the following thread and routines, where <code>ip</code> is initialized to the address labeled <code>thread</code> (i.e., the address where <code>&pushA</code> is stored).
<syntaxhighlight lang="c">
#define PUSH(x) (*sp++ = (x))
#define POP() (*--sp)
start:
  ip = &thread  // ip points to &pushA (which points to the first instruction of pushA)
  jump *ip++  // send control to the first instruction of pushA and advance ip
thread:
  &pushA
  &pushB
  &add
  ...
pushA:
  PUSH(A)
  jump *ip++  // send control where ip says to (i.e. to pushB) and advance ip
pushB:
  PUSH(B)
  jump *ip++
add:
  result = POP() + POP()
  PUSH(result)
  jump *ip++
</syntaxhighlight>
Alternatively, operands may be included in the thread. This can remove some indirection needed above, but makes the thread larger:
<syntaxhighlight lang="c">
#define PUSH(x) (*sp++ = (x))
#define POP() (*--sp)
start:
  ip = &thread
  jump *ip++
thread:
  &push
  &A  // address where A is stored, not the literal A
  &push
  &B
  &add
  ...
push:
  variable_address = *ip++  // must move ip past the operand address, since it is not a subroutine address
  PUSH(*variable_address)  // Read value from variable and push on stack
  jump *ip++
add:
  result = POP() + POP()
  PUSH(result)
  jump *ip++
</syntaxhighlight>
===Indirect threading===
Indirect threading uses pointers to locations that in turn point to machine code. The indirect pointer may be followed by operands which are stored in the indirect "block" rather than storing them repeatedly in the thread. Thus, indirect code is often more compact than direct-threaded code. The indirection typically makes it slower, though usually still faster than bytecode interpreters. Where the handler operands include both values and types, the space savings over direct-threaded code may be significant. Older Forth systems typically produce indirect-threaded code.
For example, if the goal is to execute "push A, push B, add", the following might be used. Here, <code>ip</code> is initialized to address <code>&thread</code>, each code fragment (<code>push</code>, <code>add</code>) is found by double-indirecting through <code>ip</code> and an indirect block; and any operands to the fragment are found in the indirect block following the fragment's address. This requires keeping the ''current'' subroutine in <code>ip</code>, unlike all previous examples where it contained the ''next'' subroutine to be called.
<syntaxhighlight lang="c">
start:
  ip = &thread // points to '&i_pushA'
  jump *(*ip) // follow pointers to the first instruction of 'push', do NOT advance ip yet
thread:
  &i_pushA
  &i_pushB
  &i_add
  ...
i_pushA:
  &push
  &A
i_pushB:
  &push
  &B
i_add:
  &add
push:
  *sp++ = *(**ip + 1) // look 1 past start of indirect block for operand address
  jump *(*++ip) // advance ip in thread, jump through next indirect block to next subroutine
add:
  addend1 = *--sp
  addend2 = *--sp
  *sp++ = addend1 + addend2
  jump *(*++ip)
</syntaxhighlight>
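The double indirection can also be modelled in portable C by letting each thread entry point to a small structure (the indirect block) whose first field is the code to execute and whose remaining fields hold operands. The sketch below is illustrative only: a trampoline loop stands in for the pseudocode's jumps, and all names are invented.

<syntaxhighlight lang="c">
/* Indirect-threading sketch in portable C: the thread holds addresses of
   indirect blocks ("words"), not of code; operands live in the blocks. */
#include <stdio.h>

typedef struct Word Word;
typedef void (*Handler)(const Word *);

struct Word {          /* an indirect block: handler plus optional operand */
    Handler code;
    long   *operand;   /* used only by 'push' */
};

static long stack[8], *sp = stack;

static void push(const Word *w) { *sp++ = *w->operand; }
static void add(const Word *w)  {
    (void)w;                                 /* 'add' takes no operand */
    long addend2 = *--sp, addend1 = *--sp;
    *sp++ = addend1 + addend2;
}

static long A = 2, B = 3;
static const Word i_pushA = { push, &A };
static const Word i_pushB = { push, &B };
static const Word i_add   = { add,  NULL };

/* The thread: addresses of indirect blocks, NULL-terminated here. */
static const Word *thread[] = { &i_pushA, &i_pushB, &i_add, NULL };

int main(void) {
    for (const Word **ip = thread; *ip; ip++)   /* trampoline replaces 'jump *(*++ip)' */
        (*ip)->code(*ip);                       /* thread -> block -> code: double indirection */
    printf("result = %ld\n", *--sp);            /* prints 5 */
    return 0;
}
</syntaxhighlight>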
===Subroutine threading===
As an example of call threading for "push A, push B, add":
<syntaxhighlight lang="c">
thread:
  call pushA
  call pushB
  call add
  ret
pushA:
  *sp++ = A
  ret
pushB:
  *sp++ = B
  ret
add:
  addend1 = *--sp
  addend2 = *--sp
  *sp++ = addend1 + addend2
  ret
</syntaxhighlight>
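In C terms, subroutine threading is what a compiler produces naturally for a function whose body is nothing but calls. The rough C equivalent of the thread above, with invented names, is sketched below; the compiler itself emits the <code>call pushA; call pushB; call add; ret</code> sequence.

<syntaxhighlight lang="c">
/* Subroutine-threading sketch: the "thread" is ordinary compiled code
   consisting only of call instructions. Names are invented. */
#include <stdio.h>

static long stack[8], *sp = stack;
static long A = 2, B = 3;

static void pushA(void) { *sp++ = A; }
static void pushB(void) { *sp++ = B; }
static void add(void)   {
    long addend2 = *--sp, addend1 = *--sp;
    *sp++ = addend1 + addend2;
}

static void thread(void) {   /* compiles to: call pushA; call pushB; call add; ret */
    pushA();
    pushB();
    add();
}

int main(void) {
    thread();
    printf("result = %ld\n", *--sp);   /* prints 5 */
    return 0;
}
</syntaxhighlight>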
===Token threading===
Token-threaded code implements the thread as a list of indices into a table of operations; the index width is naturally chosen to be as small as possible for density and efficiency. One byte (8 bits) is the natural choice for ease of programming, but smaller sizes such as 4 bits, or larger ones such as 12 or 16 bits, can be used depending on the number of operations supported. As long as the index width is narrower than a machine pointer, token-threaded code will naturally be more compact than the other threading types without much special effort by the programmer.
A common approach, historically, is [[bytecode]], which typically uses 8-bit opcodes with a stack-based virtual machine. The archetypal bytecode [[interpreter (computing)|interpreter]] is known as a "decode and dispatch interpreter" and follows the form:
<syntaxhighlight lang="c">
start:
  vpc = &thread
dispatch:
  addr = decode(&vpc) // Convert the next bytecode operation to a pointer to machine code that implements it
  // Any inter-instruction operations are performed here (e.g. updating global state, event processing, etc)
  jump addr
CODE_PTR decode(BYTE_CODE **p) {
  // In a more complex encoding, there may be multiple tables to choose between or control/mode flags
  return table[*(*p)++];
}
thread: /* Contains bytecode, not machine addresses. Hence it is more compact. */
  1 /*pushA*/
  2 /*pushB*/
  0 /*add*/
pushA:
  *sp++ = A
  jump dispatch
pushB:
  *sp++ = B
  jump dispatch
add:
  addend1 = *--sp
  addend2 = *--sp
  *sp++ = addend1 + addend2
  jump dispatch
</syntaxhighlight>
If the virtual machine uses only byte-size instructions, <code>decode()</code> is simply a fetch from <code>thread</code>, but often there are commonly used 1-byte instructions plus some less-common multibyte instructions (see [[complex instruction set computer]]), in which case <code>decode()</code> is more complex. The decoding of single-byte opcodes can be handled very simply and efficiently by a branch table, using the opcode directly as an index.
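Such a branch table can be written in portable C as an array of function pointers indexed directly by the opcode, as in the hypothetical sketch below. The opcode numbering follows the example thread above (0 = add, 1 = pushA, 2 = pushB); the halt opcode is invented for the sketch.

<syntaxhighlight lang="c">
/* Token-threading (bytecode) sketch: decode is a branch table indexed by
   the 1-byte opcode. The 'halt' opcode is invented for this example. */
#include <stdio.h>

static long stack[8], *sp = stack;
static long A = 2, B = 3;
static int running = 1;

static void op_add(void)   {
    long addend2 = *--sp, addend1 = *--sp;
    *sp++ = addend1 + addend2;
}
static void op_pushA(void) { *sp++ = A; }
static void op_pushB(void) { *sp++ = B; }
static void op_halt(void)  { running = 0; }

static void (*const table[])(void) = { op_add, op_pushA, op_pushB, op_halt };

int main(void) {
    /* The thread holds bytecode, not machine addresses, so it is denser. */
    const unsigned char thread[] = { 1 /*pushA*/, 2 /*pushB*/, 0 /*add*/, 3 /*halt*/ };
    const unsigned char *vpc = thread;

    while (running)
        table[*vpc++]();               /* decode and dispatch */

    printf("result = %ld\n", *--sp);   /* prints 5 */
    return 0;
}
</syntaxhighlight>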
For instructions where the individual operations are simple, such as "push" and "add", the [[Computational overhead|overhead]] involved in deciding what to execute is larger than the cost of actually executing it, so such interpreters are often much slower than machine code. However, for more complex ("compound") instructions, the overhead percentage is proportionally less significant.
Token-threaded code can sometimes run faster than the equivalent machine code when that machine code is too large to fit in the physical CPU's L1 instruction cache. The higher [[code density]] of threaded code, especially token-threaded code, can allow it to fit entirely in the L1 cache when it otherwise would not have, thereby avoiding cache thrashing. However, threaded code consumes both instruction cache (for the implementation of each operation) and data cache (for the bytecode and tables), unlike machine code, which consumes only instruction cache; this means threaded code reduces the amount of data that can be held in cache for processing by the CPU at any given time. In any case, if the problem being computed involves applying a large number of operations to a small amount of data, then using threaded code may be an ideal optimization.<ref name="heller"/>
==={{anchor|String threading}}Lesser-used threading===
An example is string threading, in which operations are identified by strings, usually looked up by a [[hash table]]. This was used in [[Charles H. Moore]]'s earliest Forth implementations and in the [[University of Illinois Urbana-Champaign|University of Illinois]]'s experimental hardware-interpreted computer language. It is also used in Bashforth.
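A string-threaded dispatch might look like the following portable C sketch; for brevity it uses a linear search where a real system would use a hash table, and all names are invented.

<syntaxhighlight lang="c">
/* String-threading sketch: the thread is a list of operation names that
   are looked up at run time. Names are invented for the example. */
#include <stdio.h>
#include <string.h>

static long stack[8], *sp = stack;
static long A = 2, B = 3;

static void pushA(void) { *sp++ = A; }
static void pushB(void) { *sp++ = B; }
static void add(void)   {
    long addend2 = *--sp, addend1 = *--sp;
    *sp++ = addend1 + addend2;
}

/* Dictionary of named operations; a real system would hash these. */
static const struct { const char *name; void (*code)(void); } dict[] = {
    { "pushA", pushA }, { "pushB", pushB }, { "add", add },
};

static void (*lookup(const char *name))(void) {
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(dict[i].name, name) == 0)
            return dict[i].code;
    return NULL;
}

int main(void) {
    const char *thread[] = { "pushA", "pushB", "add", NULL };
    for (const char **ip = thread; *ip; ip++)
        lookup(*ip)();                 /* look up each name and run it */
    printf("result = %ld\n", *--sp);   /* prints 5 */
    return 0;
}
</syntaxhighlight>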
===RPL===
[[Hewlett-Packard|HP]]'s [[RPL (programming language)|RPL]], first introduced in the [[HP-18C]] calculator in 1986, is a type of proprietary hybrid (direct-threaded and indirect-threaded) ''threaded-interpreted language'' (TIL)<ref name="Loelinger_1981"/> that, unlike other TILs, allows embedding of RPL "objects" into the "runstream", i.e. the stream of addresses through which the interpreter pointer advances. An RPL "object" can be thought of as a special data type whose in-memory structure contains an address to an "object prolog" at the start of the object, followed by data or executable code. The object prolog determines how the object's body should be executed or processed. Using the "RPL inner loop",<ref name="Busby_2018"/> which was invented and published (and patented<ref name="Wickes_1986"/>) by William C. Wickes in 1986 and published in "Programming Environments", Institute for Applied Forth Research, Inc., 1988,<ref name="Wickes_1988"/> execution proceeds as follows:
# Dereference the IP (instruction pointer) and store it into O (current object pointer)
# Increment the IP by the length of one address pointer
# Dereference O and store its address in O_1 (this is the second level of indirection)
# Transfer control to the next pointer or embedded object by setting the PC (program counter) to O_1 plus one address pointer
# Go back to step 1
This can be represented more precisely by:
<pre>
    O  = [I]
    I  = I + Δ
    PC = [O] + Δ
</pre>
Where above, O is the current object pointer, I is the interpreter pointer, Δ is the length of one address word and the "[]" operator stands for "dereference".
When control is transferred to an object pointer or an embedded object, execution continues as follows:
<pre>
    PROLOG -> PROLOG ( The prolog address at the start of the prolog code points to itself )
        IF O + Δ =/= PC
        THEN GOTO INDIRECT ( Test for direct execution )
            O = I - Δ ( Correct O to point to the start of the embedded object )
            I = I + α ( Correct I to point after the embedded object, where α is the length of the object )
        INDIRECT ( rest of prolog )
</pre>
On HP's [[HP Saturn|Saturn]] microprocessors that use RPL, there is a third level of indirection made possible by an architectural / programming trick which allows faster execution.<ref name="Busby_2018"/>
==Branches==
In all interpreters, a branch simply changes the thread pointer (<code>ip</code>) to a different address in the thread. A conditional jump-if-zero branch that jumps only if the top-of-stack value is zero could be implemented as follows. This example uses the embedded-parameter version of direct threading, so the <code>&thread[123]</code> line is the destination of the jump when the condition is true; it must therefore be skipped over (<code>ip++</code>) when the branch is not taken.
<syntaxhighlight lang="c">
thread:
  ...
  &brz
  &thread[123]
  ...
brz:
  when_true_ip = *ip++ // Get destination address for branch
  if (*--sp == 0) // Pop/Consume top of stack and check if it's zero
    ip = when_true_ip // Set ip to the branch destination
  jump *ip++
</syntaxhighlight>
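The same idea can be shown in portable C. The sketch below uses a plain switch dispatcher and an invented four-opcode encoding in which the branch destination is an index into the thread, stored inline after the <code>BRZ</code> opcode and consumed whether or not the branch is taken.

<syntaxhighlight lang="c">
/* Conditional-branch sketch for a bytecode interpreter. The encoding
   (PUSH/BRZ/PRINT/HALT) is invented for this example. */
#include <stdio.h>

int main(void) {
    long stack[8], *sp = stack;
    enum { PUSH, BRZ, PRINT, HALT };
    const unsigned char thread[] = {
        PUSH, 0,       /* push the literal 0                     */
        BRZ,  7,       /* top of stack is zero: jump to index 7  */
        PRINT, 1,      /* skipped when the branch is taken       */
        HALT,
        PRINT, 2,      /* branch target: prints 2                */
        HALT,
    };
    const unsigned char *ip = thread;

    for (;;) {
        switch (*ip++) {
        case PUSH:  *sp++ = *ip++; break;
        case BRZ: {
            unsigned char when_true_ip = *ip++;   /* consume operand either way */
            if (*--sp == 0)
                ip = thread + when_true_ip;       /* redirect the thread pointer */
            break;
        }
        case PRINT: printf("%d\n", (int)*ip++); break;
        case HALT:  return 0;
        }
    }
}
</syntaxhighlight>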
==Common amenities==
In an indirect-threaded virtual machine, such as the one given here, the operations are:
<syntaxhighlight lang="c">
next:
  *ip++ -> w
  jump **w++

nest:
  ip -> *rp++
  w -> ip
  next

unnest:
  *--rp -> ip
  next
</syntaxhighlight>
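A C rendering of <code>next</code>, <code>nest</code> and <code>unnest</code> is sketched below, with a trampoline loop in place of machine-level jumps. The word layout (cell 0 of each word holds its handler) and all names are invented, and storing function pointers in <code>void *</code> cells is assumed to work, which ISO C does not guarantee but mainstream platforms allow.

<syntaxhighlight lang="c">
/* next/nest/unnest sketch for an indirect-threaded VM. Illustrative only;
   assumes data and function pointers interconvert (non-standard but common). */
#include <stdio.h>
#include <stddef.h>

typedef void *cell;
typedef void (*Prim)(void);

static cell *ip, *w;                       /* thread pointer, word pointer */
static cell *rstack[16], **rp = rstack;    /* return stack */
static long  dstack[16], *sp = dstack;     /* data stack */
static int   running = 1;

static void next(void)   { w = *ip++; ((Prim)*w)(); }   /* *ip++ -> w; jump **w++ */
static void nest(void)   { *rp++ = ip; ip = w + 1; }    /* ip -> *rp++; w -> ip   */
static void unnest(void) { ip = *--rp; }                /* *--rp -> ip            */

static void lit(void)  { *sp++ = (long)(size_t)*ip++; } /* push inline literal */
static void add(void)  { long b = *--sp, a = *--sp; *sp++ = a + b; }
static void halt(void) { running = 0; }

/* Primitive words: a single cell holding the handler. */
static cell w_lit[]  = { (cell)lit };
static cell w_add[]  = { (cell)add };
static cell w_exit[] = { (cell)unnest };
static cell w_halt[] = { (cell)halt };

/* Colon definition "2 3 + ;": nest, then a nested thread ending in exit. */
static cell w_five[] = { (cell)nest,
                         w_lit, (cell)(size_t)2,
                         w_lit, (cell)(size_t)3,
                         w_add, w_exit };

static cell top_thread[] = { w_five, w_halt };

int main(void) {
    ip = top_thread;
    while (running) next();            /* trampoline in place of machine jumps */
    printf("result = %ld\n", *--sp);   /* prints 5 */
    return 0;
}
</syntaxhighlight>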
==See also==
* [[Continuation-passing style]], which replaces the global variable <code>ip</code> with a function parameter
* [[History of general purpose CPUs]]
* [[Just-in-time compilation]]
* [[Return-oriented programming]]
* [[Tail call]]
==Notes==
{{notelist}}
==References==
{{reflist|refs=
<ref name="Loelinger_1981">{{cite book |title=Threaded Interpretive Languages: Their Design and Implementation |author-first=R. G. |author-last=Loelinger |___location=Dayton, Ohio, USA |edition=2nd printing, 1st |date=1981 |orig-date=August 1979 |publisher=[[BYTE Books]], [[BYTE Publications Inc.]] |publication-place=Peterborough, New Hampshire, UK |isbn=0-07038360-X |lccn=80-19392 |id={{ISBN|978-0-07038360-9}} |url=https://archive.org/details/R.G.LoeligerThreadedInterpretiveLanguagesTheirDesignAndImplementationByteBooks1981 |access-date=2023-08-03}} (xiv+2+251 pages)</ref>
<ref name="Busby_2018">{{cite web |title=The RPL inner loop explained |author-last=Busby |author-first=Jonathan |work=The Museum of HP Calculators |date=2018-09-07 |url=https://www.hpmuseum.org/forum/thread-11358.html |access-date=2019-12-27 |url-status=live |archive-url=https://web.archive.org/web/20230803201320/https://www.hpmuseum.org/forum/thread-11358.html |archive-date=2023-08-03}}</ref>
<ref name="Wickes_1986">{{cite web |title=Data processing system and method for the direct and indirect execution of uniformly structured object types |author-last=Wickes |author-first=William C. |website=uspto.gov |date=1986-05-30 |url=http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=8&p=1&f=G&l=50&d=PTXT&S1=((%22Hewlett+Packard%22.ASNM.)+AND+Wickes.INNM.)&OS=AN/%22Hewlett+Packard%22+and+IN/Wickes&RS=(AN/%22Hewlett+Packard%22+AND+IN/Wickes) |access-date=2019-12-27}}</ref>
<ref name="Wickes_1988">{{cite conference |title=RPL: A Mathematical<!-- also seen as: "Mathematics". Check actual publication's cover. --> Control Language |author-last=Wickes |author-first=William C. |editor-first=Lawrence P. |editor-last=Forsely |date=1988-10-01 |orig-date=14–18 June 1988 |conference=Proceedings of the 1988 Rochester Forth Conference: Programming Environments |volume=8 |publisher=Institute for Applied Forth Research, Inc., [[University of Rochester]] |___location=Rochester, New York, USA |isbn=978-0-91459308-9 |oclc=839704944 |url=https://dl.acm.org/doi/abs/10.5555/534949 }} (NB. This title is often cited as "RPL: A Mathematics Control Language". An excerpt is available at: [https://web.archive.org/web/20230328115142/https://www.hpcalc.org/details/1743 RPLMan from Goodies Disk 4][https://web.archive.org/web/20220419184811/https://www.hpcalc.org/hp48/docs/programming/rplman.zip Zip File])</ref>
}}
== Further reading ==
* [http://cm.bell-labs.com/cm/cs/who/dmr/chist.html The Development of the C Language] {{Webarchive|url=https://web.archive.org/web/20150328220551/http://cm.bell-labs.com/cm/cs/who/dmr/chist.html |date=2015-03-28}} by [[Dennis Ritchie|Dennis M. Ritchie]] describes B (a precursor of C) as implemented using "threaded code".
* {{cite web |title=What is RPL? |author-first=Joseph K. |author-last=Horn |url=http://www.hpcalc.org/hp48/docs/programming/rpl3.txt |access-date=2017-09-17 |url-status=live |archive-url=https://web.archive.org/web/20170917221524/http://www.hpcalc.org/hp48/docs/programming/rpl3.txt |archive-date=2017-09-17}} (NB. Brief overview on the threaded languages, System and User RPL, used on the HP calculators like the [[HP 48]].)
== External links ==
* Anton Ertl's explanatory page [http://www.complang.tuwien.ac.at/forth/threaded-code.html What is Threaded Code?] describes different threading techniques and provides further references.
* [https://thinking-forth.sourceforge.net/ Thinking Forth Project] includes the seminal (but out of print) book Thinking Forth by [http://home.earthlink.net/~lbrodie/ Leo Brodie] {{Webarchive|url=https://web.archive.org/web/20051113041339/http://home.earthlink.net/~lbrodie/ |date=2005-11-13 }} published in 1984.
* Brad Rodriguez's [http://www.bradrodriguez.com/papers/moving1.htm Moving FORTH: Part 1: Design Decisions in the Forth Kernel] covers threading techniques in depth.
* [https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html GCC extensions. Labels as Values]
[[Category:Compilers]]