Talk:Bytecode: Difference between revisions

{{Talk header}}
{{VitalWikiProject articlebanner shell|topiccollapsed=Technology|level=5yes|class=Start}}|vital=yes|1=
{{WikiProjectBannerShell|collapsed=yes|1=
{{WikiProject Computing |class=Start |importance=High |software=y |software-importance=High}}
{{WikiProject Technology}}
{{WikiProject Computer science}}
 
The focus should probably be on 'byte-oriented', in the sense of simplifying instruction decoding. The op-code is only one of several fields -- it is not a great benefit if the op-code is easy to extract, while other fields are complex. I've always thought the instruction encoding used for the EM-1 'machine' was a good example: the opcode is one byte, the escape sequence is one byte, and address fields are one or two bytes. There are a few exceptions where the instructions and arguments were encoded into one byte, but this was to speed execution of very common instructions. (See Informatica Report IR-81 (from 1983) by Andrew S Tanenbaum et al.: Description of a machine architecture for use with block structured languages.) Although the term 'bytecode' is not used by the authors, it has been used in descriptions of the Amsterdam Compiler Kit, of which EM-1 was a central concept. [[User:Athulin|Athulin]] ([[User talk:Athulin|talk]]) 09:46, 10 January 2011 (UTC)
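To make the 'byte-oriented' point concrete, here is a minimal sketch of a decoder for an encoding of the kind described above: a one-byte opcode, an optional one-byte escape, and operands of one or two bytes. The opcode values, mnemonics, and the escape byte are invented for illustration and are not taken from EM-1 or IR-81.

```python
# Hypothetical byte-oriented encoding; every value below is an assumption
# made for illustration, not the actual EM-1 instruction set.
ESCAPE = 0xFF  # assumed escape byte that selects a second opcode table

# opcode -> (mnemonic, operand size in bytes)
PRIMARY = {0x01: ("LOAD_CONST", 1), 0x02: ("LOAD_ADDR", 2), 0x03: ("ADD", 0)}
ESCAPED = {0x01: ("LOAD_CONST_WIDE", 2)}


def decode(code: bytes):
    """Decode a byte string into a list of (mnemonic, operand) tuples."""
    out = []
    pc = 0
    while pc < len(code):
        table = PRIMARY
        if code[pc] == ESCAPE:
            # The escape byte only switches tables; the opcode follows it.
            table = ESCAPED
            pc += 1
        mnemonic, size = table[code[pc]]
        pc += 1
        # Operands are whole bytes, so extraction is a plain slice.
        operand = int.from_bytes(code[pc:pc + size], "big") if size else None
        pc += size
        out.append((mnemonic, operand))
    return out


program = bytes([0x01, 0x05, 0x02, 0x12, 0x34, 0x03, 0xFF, 0x01, 0x00, 0x2A])
print(decode(program))
```

Every field here starts and ends on a byte boundary, so the decoder needs no shifting or masking, which is the simplification the comment above is pointing at.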
:
:Yes. [[VAX]] was well designed for byte-by-byte interpreters, specifically a microcoded processor. But that was its main failure: it does not lend itself to parallel or out-of-order (OoO) execution. Each operand has a one-byte operand descriptor, followed by the appropriate number of operand bytes. To find the next instruction, you have to process all the operand descriptor bytes, one by one. Even though DEC hoped for a long life for VAX, after not so many years they went to the RISC architecture [[Alpha AXP|Alpha]]. VAX instructions can be 1 to 56 bytes long, which I suspect is a wider range than the JVM's. [[User:Gah4|Gah4]] ([[User talk:Gah4|talk]]) 20:29, 14 August 2025 (UTC)
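The sequential-decoding point in the reply above can be sketched with a toy model. This is not the real VAX encoding: the opcodes, operand counts, and descriptor values are all invented. The only property carried over is that each operand begins with a descriptor byte that determines how many bytes follow, so the start of the next instruction is unknown until every descriptor has been examined in order.

```python
# Toy model only: these tables are assumptions for illustration and do NOT
# reproduce the actual VAX opcode or operand-specifier encoding.
OPERAND_COUNT = {0x00: 0, 0x10: 2, 0x20: 3}     # opcode -> number of operands
EXTRA_BYTES = {0x0: 0, 0x1: 1, 0x2: 2, 0x4: 4}  # descriptor high nibble -> bytes after it


def instruction_length(code: bytes, pc: int) -> int:
    """Return the length in bytes of the instruction starting at pc."""
    length = 1  # the opcode byte itself
    for _ in range(OPERAND_COUNT[code[pc]]):
        # The position of this descriptor depends on all earlier descriptors,
        # so the loop cannot skip ahead or run out of order.
        descriptor = code[pc + length]
        length += 1 + EXTRA_BYTES[descriptor >> 4]
    return length


def instruction_starts(code: bytes):
    """Return the offsets at which each instruction begins."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += instruction_length(code, pc)
    return starts


code = bytes([0x10, 0x00, 0x21, 0xAA, 0xBB,
              0x20, 0x10, 0x07, 0x00, 0x40, 1, 2, 3, 4,
              0x00])
print(instruction_starts(code))
```

With a fixed-length encoding, `instruction_starts` would reduce to simple arithmetic on the offset; here each boundary depends on the contents of the previous instruction.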
 
==Layman's terms==
 
It seems to me like the list of examples will grow indefinitely, as most interpreters use bytecode as an intermediate representation for interpretation, including many of those which can also optionally produce native code or compile to another language (e.g. to C or LLVM). I can myself immediately think of various examples which are missing from the list, but am not sure it would be wise to add them... [[Special:Contributions/76.10.128.192|76.10.128.192]] ([[User talk:76.10.128.192|talk]]) 15:13, 5 July 2013 (UTC) It wouldn't.
 
:IMO WP lists are not very useful. They are basically OR and provide little value. I think examples are OK, but list a couple and leave it at that. Don't try to be exhaustive. [[User:Stevebroshar|Stevebroshar]] ([[User talk:Stevebroshar|talk]]) 13:21, 11 August 2025 (UTC)
 
== Bytecode versus P-code ==
:I only have a suggestion: look for material coming out of the compilers / interpreters designed for portability starting in the 1960s, and usually based on the ideas of solving the M*N compiler problem (languages*architectures), which some researchers proposed to solve by defining a universal intermediate language between the M languages and the N computer architectures, making it into an M+N problem. When I hear the term I think of Algol 60 (Randell & Russell: Algol 60 implementation, (1964)), Smalltalk-80 and Pascal, and certainly the Amsterdam Compiler Kit (Andrew Tanenbaum). Tanenbaum's paper on the design of the EM-1 byte code ("DESCRIPTION OF A MACHINE ARCHITECTURE FOR USE WITH BLOCK STRUCTURED LANGUAGES", Informatica Report IR-81) is worth reading. It doesn't use the term 'byte code', but the instruction format is heavily byte-oriented for processing efficiency, and anyone knowing EM-1 would almost certainly interpret the term as a synonym for intermediate code based on the same design principles. At a stretch, Griswold's book on portable SNOBOL4 implementation might be relevant, perhaps also Lisp Machine stuff, and I vaguely remember another author who wrote a lot about portable code (Winter?). Barron's and Pemberton's books on Pascal implementation may also provide clues. [[User:Athulin|Athulin]] ([[User talk:Athulin|talk]]) 09:27, 20 October 2022 (UTC)
::... and interestingly enough I came across https://en.wikipedia.org/wiki/Virtual_machine#History, where the term 'O-Code' is used for one such intermediate language. It is possible that the term P code is a reference back to O code: they seem to have been used in much the same way. [[User:Athulin|Athulin]] ([[User talk:Athulin|talk]]) 18:02, 29 November 2022 (UTC)
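The M*N versus M+N arithmetic mentioned above is easy to show as a worked example. The function name and the sample counts are made up for illustration; the only claim is the counting itself: one translator per (language, architecture) pair without an intermediate language, versus one front end per language plus one back end per architecture with it.

```python
def translators_needed(m_languages: int, n_architectures: int):
    """Return (direct, via_intermediate) translator counts."""
    # Direct approach: a separate compiler for every language/architecture pair.
    direct = m_languages * n_architectures
    # Shared intermediate language: M front ends plus N back ends.
    via_intermediate = m_languages + n_architectures
    return direct, via_intermediate


# Example with 4 languages and 5 architectures (arbitrary numbers).
print(translators_needed(4, 5))
```

For 4 languages and 5 architectures this gives 20 translators against 9, and the gap widens as either count grows.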
 
== First sentence is wrong ==
 
WRT "Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter"
 
Most of that is wrong. I don't think bytecode is the same as portable code. I think that they have been conflated but they mean different things. Bytecode need not be portable. Portable code need not be byte-sized.
 
Bytecode is hardly an instruction set. An instruction set is what processors have. Bytecode is generally not what a processor processes. It's an intermediate executable format. Intermediate between source code and native executable. I've heard that there is a CPU for Java bytecode so in that context the bytecode is an instruction set. But, in the more common uses, it's not what I'd call an instruction set.
 
Efficient? Who says?
 
Bytecode is often _not_ interpreted! Some (many/most?) runtimes convert bytecode to native code at runtime ... via JIT compilation. [[User:Stevebroshar|Stevebroshar]] ([[User talk:Stevebroshar|talk]]) 13:27, 11 August 2025 (UTC)
 
:In general, I believe that it is reasonable to describe Pascal P-code and Java byte code as machine languages for abstract machines. Do you not consider [[MIX (abstract machine)|MIX]] to be an instruction set?
:What is the encoding of the P-code for the [[Pascal (programming language)#The Pascal-P system|Pascal-P system]]?
::
::Yes it is strange. Bytecode should apply to any byte oriented instruction encoding, back to (as far as I know) IBM System/360. An important idea behind S/360 was the low-end microcoded machines. That is, ones that interpret the instructions in software. Before S/360, the IBM scientific machines used 36 bit words. Not so many years later, we have VAX, again byte oriented and designed for microcoded processors that interpret the byte codes. VAX followed DEC 36 bit machines, such as the PDP-10. (Seems to be a pattern here.) Once byte addressable machines became popular, byte oriented intermediate code became popular for many different cases. Even more, early in the Java years, Sun had designed and built hardware for running JVM! [[User:Gah4|Gah4]] ([[User talk:Gah4|talk]]) 18:42, 11 August 2025 (UTC)
::
::But okay, MIX. MIX is designed to be either binary or decimal, such that properly written programs run in either case. Assuming ''byte'' means eight-bit unit, I think that disqualifies MIX. But the idea isn't so far off. Back to the years close to the beginning of MIX, decimal machines were not so rare for commercial processors. I suspect, though, that Knuth was trying to get people to think more generally. [[User:Gah4|Gah4]] ([[User talk:Gah4|talk]]) 18:47, 11 August 2025 (UTC)
:::The [[IBM 7030]] used 64-bit words. Other vendors had 48-bit and 60-bit scientific machines. The weirdest was the [[UNIVAC LARC]], which had a 12-digit word but allowed \ (ignore), ^ (space), - (minus), . (period) and + (plus) as digits. -- [[User:Chatul|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul|talk]]) 10:52, 12 August 2025 (UTC)
:::{{tq| Bytecode should apply to any byte oriented instruction encoding, back to (as far as I know) IBM System/360.}} S/360 is a byte-addressable processor, but its ''instruction set'' isn't byte-aligned, it's halfword-aligned.
:::{{tq|An important idea behind S/360 was the low-end microcoded machines. That is, ones that interpret the instructions in software.}} Microcoding goes anywhere from "it's kinda like [[SIMH]]" vertical microcode, where it looks like a software interpreter, to "it's a state machine defined by words in a control memory with bitfields that either go directly to CPU circuitry or control which next control memory word is fetched" horizontal microcode, with various types in between. Often instruction fetch and decode are helped by specialized hardware controlled by the microcode, or are done by a separate hardwired or microcoded engine, with the results fed to a separate execution unit. [[User:Guy Harris|Guy Harris]] ([[User talk:Guy Harris|talk]]) 23:30, 14 August 2025 (UTC)
:{{tq|An instruction set is what processors have.}} OK, is [https://bitsavers.org/pdf/ibm/system38/GA21-9331-1_System_38_Functional_Reference_Manual_Feb81.pdf the never-directly-executed instructions generated by all compilers for System/38 ''not'' used for internal development at IBM, and one of the compilers used for internal development at IBM], or is [https://bitsavers.org/pdf/ibm/system38/SC21-9037-3_IBM_System_38_Internal_Microprogramming_Instructions_Formats_and_Functions_Reference_4th_ed_198508.pdf the executed-by-the-microcoded-processor instructions generated by another compiler used for internal development at IBM, possibly an assembler used for internal development at IBM, and the low-level system code that translates the first instruction set into this instruction set in order to run that code], the instruction set for [[IBM System/38]]? The [[AS/400]] originally continued with those instruction sets, but switched to an extended form of [[PowerPC]]/[[Power ISA]] for the second instruction set ''while continuing to run the same software without recompilation'' (unless you screwed up and "removed observability", meaning "discarding the first-instruction-set code and leaving behind only the second-instruction-set code generated by the binary-to-binary translator").
:{{tq|Bytecode need not be portable.}} If, for a given bytecode, you can write bytecode interpreters that run on more than one type of CPU (whether by having different interpreters for different machines, or by writing a portable interpreter), it's at least in principle portable, and if you actually ''do'' that, it ''is'' portable. (The OS portability issue is probably a bigger issue than the CPU portability issue.)
:{{tq|I've heard that there is a CPU for Java bytecode so in that context the bytecode is an instruction set.}} [[picoJava]] did that.
:Some other machines that could perhaps be considered bytecode machines are the [[Pascal MicroEngine]] (using the same [[MCP-1600]] that the [[LSI-11]] used), possibly the [[Lilith (computer)|Lilith]] running "M-code" for [[Modula-2]], and the Xerox D-machines, microcoded to interpret, among other things, the stack-machine code generated by the [[Mesa (programming language)|Mesa]] compiler. [[User:Guy Harris|Guy Harris]] ([[User talk:Guy Harris|talk]]) 23:54, 14 August 2025 (UTC)