==Architecture==
The architecture comprises a central execution unit that operates on a set of 25 registers and is surrounded by a number of ''resources'' that perform operations that interact with the environment. Each thread has its own set of hardware registers, enabling threads to execute concurrently.
The instruction set comprises both a (more or less standard) sequential programming model, and instructions that implement multi-threading, multi-core and I/O operations.
Most instructions can access only the 12 general-purpose registers r0–r11.
* r12 = cp = Constant pool pointer
* r13 = dp = Data pointer
* r15 = lr = Link register
Registers 16 through 24 are only accessible to specialized instructions.
The status register contains various mode bits, but the processor does ''not'' have the standard ALU result flags like [[Carry flag|carry]], [[Zero flag|zero]], [[Negative flag|negative]] or [[Overflow flag|overflow]].
===Instruction encoding===
{|class="wikitable" style="text-align:center"
|+XMOS instruction formats
! 1<br />5 || 1<br />4 || 1<br />3 || 1<br />2 || 1<br />1 || 1<br />0 ||
|-
|colspan=6| opcode ||colspan=10| immediate ||align=left| 10/20-bit immediate
|-
|colspan=6| opcode ||colspan=2| 1 1 ||colspan=2| opc ||colspan=6| immediate ||align=left| 6/16-bit immediate
|-
|colspan=5| opcode ||colspan=5| 9×''a''+3×''b''+''c'' ||colspan=2|
|-
|colspan=5| opcode ||colspan=5| 27+3×''b''+''c'' || * || o ||colspan=2| b b ||colspan=2| c c ||align=left| 2-operand register
|-
|colspan=5| opcode ||colspan=6| 1 1 1 1 1 1 || o ||colspan=2| 1 1 ||colspan=2| opc ||align=left| 0-operand
|}
The last four forms share the same opcode range, because the number of operands is determined by bits 5 through 10.
In the second form, some instructions (loads and stores) use all 4 bits to encode the register number, allowing access to r12–r15.
Because constants are always unsigned, many instructions come in add/subtract pairs, e.g. jump forward and backward.
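
The pairing can be illustrated with a small sketch: a signed displacement is split into a direction, which selects one opcode of the pair, and an unsigned magnitude, which goes into the immediate field. The function name and the direction labels are hypothetical and are not XS1 mnemonics; any scaling of the displacement is left out.

```python
def split_displacement(offset):
    """Split a signed displacement into (direction, unsigned magnitude).

    The direction picks which opcode of an add/subtract pair to use;
    the magnitude is what would be stored in the unsigned immediate.
    Labels are illustrative only.
    """
    if offset >= 0:
        return ("forward", offset)
    return ("backward", -offset)
```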
** '''11111''': 3 additional operands, in addition to the operands of the following register instruction
The encoding of the 3-operand register opcodes is quite unusual, since 12 registers is not a power of 2.
In all cases, the low 2 bits of the register number are placed in a 2-bit field, reducing the problem to encoding the high bits, which are in the range of 0 to 2.
The 3-operand form places the low register numbers in the low 6 instruction bits.
The 2-operand form uses the unused 5 combinations (27–31) in the 5-bit field.
1-operand instructions use the tenth combination value, with all 6 bits set, and place the register number in the 4 available bits.
Finally, the 1-operand encoding, with a register number 12 or more (the ''b'' field contains binary 11), is also used to encode 0-operand instructions.
(A few instructions use the register ''c'' field value 0–11 as a small immediate constant, or use it to select one of 12 convenient bit-shift
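
The 3-operand packing described above can be sketched as follows. This is a minimal illustration, not a reference encoder: the function names are hypothetical, and the order of the three 2-bit fields in the low 6 bits (''a'', then ''b'', then ''c'', from bit 5 down) is an assumption.

```python
def encode_3op(opcode, a, b, c):
    """Pack a 5-bit opcode and three register numbers (0-11) into 16 bits."""
    if not 0 <= opcode <= 0b11111:
        raise ValueError("opcode must fit in 5 bits")
    for reg in (a, b, c):
        if not 0 <= reg <= 11:
            raise ValueError("register must be r0-r11")
    # High parts of each register number are 0..2, combined base 3 (0..26).
    high = 9 * (a >> 2) + 3 * (b >> 2) + (c >> 2)
    # Low 2 bits of each register number go into the low 6 bits
    # (field order a, b, c is an assumption).
    low = ((a & 3) << 4) | ((b & 3) << 2) | (c & 3)
    return (opcode << 11) | (high << 6) | low


def decode_3op(word):
    """Inverse of encode_3op; returns (opcode, a, b, c)."""
    opcode = (word >> 11) & 0x1F
    high = (word >> 6) & 0x1F
    if high > 26:
        raise ValueError("values 27-31 mark the shorter operand forms")
    a_hi, rest = divmod(high, 9)
    b_hi, c_hi = divmod(rest, 3)
    a = (a_hi << 2) | ((word >> 4) & 3)
    b = (b_hi << 2) | ((word >> 2) & 3)
    c = (c_hi << 2) | (word & 3)
    return opcode, a, b, c
```

Because the combined high part never exceeds 26, the values 27–31 of the 5-bit field remain free for the forms with fewer operands.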
One 10-bit immediate opcode (PFIX, opcode 111100) is used to add an additional 10 bits to the 6- or 10-bit immediate in the following instruction.
One 3-operand opcode (EOPR, opcode 11111) is reserved for an "additional operands" prefix.
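
The effect of the immediate prefix can be sketched as simple bit concatenation, giving the 16- and 20-bit immediates listed in the table. The function name is hypothetical, and placing the prefix bits above the immediate of the following instruction is an assumption of this sketch.

```python
def extend_immediate(prefix, imm, width):
    """Combine a 10-bit prefix with the immediate of the next instruction.

    width is the size of the following instruction's immediate field
    (6 or 10 bits); the result is a 16- or 20-bit value.
    Assumes the prefix supplies the high-order bits.
    """
    if width not in (6, 10):
        raise ValueError("immediate field must be 6 or 10 bits wide")
    if not 0 <= prefix < (1 << 10):
        raise ValueError("prefix must fit in 10 bits")
    if not 0 <= imm < (1 << width):
        raise ValueError("immediate does not fit in its field")
    return (prefix << width) | imm
```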
==Sequential programming model==
Almost all instructions execute in a single cycle. If an instruction does not need data from memory (for example, an arithmetic operation), that instruction prefetches a word of instructions. Because most instructions are encoded in 16 bits, and because most instructions are not loads or stores (a typical mix is 20% loads and stores, 80% other instructions<ref>
{{cite book
}}</ref>), the [[Instruction prefetch|prefetch]] mechanism can stay ahead of the instruction stream. This acts like a very small [[instruction cache]], but its behaviour can be predicted at [[compile time]], making timing behaviour as predictable as functional behaviour.
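
A back-of-the-envelope check of this argument is sketched below. It assumes a 32-bit instruction word (the word size is an assumption here), that every instruction is 16 bits long, and that every instruction which is not a load or store prefetches one word. Branches and 32-bit instructions are ignored, and the function name is hypothetical.

```python
def prefetch_supply_ratio(mem_fraction, word_bits=32, instr_bits=16):
    """Instructions fetched per instruction executed, in steady state.

    mem_fraction is the share of instructions that are loads or stores
    and therefore cannot prefetch. A result above 1.0 means the prefetch
    keeps ahead of execution under this simplified model.
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0 and 1")
    instructions_per_fetch = word_bits / instr_bits
    return (1.0 - mem_fraction) * instructions_per_fetch
```

With the 20% figure quoted above, `prefetch_supply_ratio(0.2)` is about 1.6, so under these assumptions more instructions are fetched than are consumed.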
==Parallel programming model==
The XS1 instruction set is designed to support both multi-threading and multi-core computations. To this end it supports channel communication (for distributed memory computations) and barriers and locks (for shared memory computations).
A thread initiates execution on one or more newly allocated threads by setting their initial register values.
Communication between threads is performed using channels that provide full-duplex data transfer between channel-ends. This enables, among other things, the implementation of [[Communicating sequential processes|CSP]]-based languages and languages based on the [[Pi calculus]]. The instruction set is agnostic as to where a channel is connected, whether inside a core or outside it.
Channel ends have a buffer able to hold sufficient tokens to allow at least one word to be buffered. If an output instruction is executed when the channel is too full to take the data then the thread which executed the instruction is paused. It is restarted when there is enough room in the channel for the instruction to successfully complete. Likewise, when an input instruction is executed and there is not enough data available then the thread is paused and will be restarted when enough data becomes available.
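
The pausing behaviour can be imitated in software with a bounded queue, as in the sketch below. This is an analogy rather than a model of the hardware: it covers one direction of a channel only, assumes a word is four 8-bit tokens, and blocks token by token rather than per instruction. All names are hypothetical.

```python
import queue
import threading

TOKENS_PER_WORD = 4  # assumption: one word is four 8-bit tokens


def make_channel(capacity_tokens=TOKENS_PER_WORD):
    """One direction of a channel, buffering at least one word."""
    return queue.Queue(maxsize=capacity_tokens)


def out_word(chan, value):
    """Output a 32-bit word; the caller is paused while the buffer is full."""
    for shift in (24, 16, 8, 0):
        chan.put((value >> shift) & 0xFF)  # blocks until there is room


def in_word(chan):
    """Input a 32-bit word; the caller is paused until data is available."""
    value = 0
    for _ in range(TOKENS_PER_WORD):
        value = (value << 8) | chan.get()  # blocks until a token arrives
    return value


def run_demo(words):
    """Send words from the calling thread to a receiver thread."""
    chan = make_channel()
    received = []

    def receiver():
        for _ in words:
            received.append(in_word(chan))

    worker = threading.Thread(target=receiver)
    worker.start()
    for word in words:
        out_word(chan, word)
    worker.join()
    return received
```

Calling `run_demo([1, 0xDEADBEEF, 42])` returns the same three words in order; the sender stalls whenever the one-word buffer is full and resumes as the receiver drains it.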