XCore Architecture: Difference between revisions

WP:FIX + general fixes, typo(s) fixed: a a → a using AWB
Line 24:
 
==Architecture==
The architecture comprises a central execution unit that operates on a set of 25 registers and is surrounded by a number of ''resources'' that perform operations that interact with the environment. Each thread has its own set of hardware registers, enabling threads to execute concurrently.
The instruction set comprises both a (more or less standard) sequential programming model, and instructions that implement multi-threading, multi-core and I/O operations.
 
Most instructions can access only the 12 general-purpose registers r0–r11. In general, they are completely interchangeable, except that some instructions use r11 implicitly. There are also 4 base registers usable by some instructions:
* r12 = cp = Constant pool pointer
* r13 = dp = Data pointer
Line 33:
* r15 = lr = Link register
 
Registers 16 through 24 are only accessible to specialized instructions. Except for the first two (r16 = pc = program counter, r17 = sr = status register), they are dedicated to exception and interrupt handling.
 
The status register contains various mode bits, but the processor does ''not'' have the standard ALU result flags like [[Carry flag|carry]], [[Zero flag|zero]], [[Negative flag|negative]] or [[Overflow flag|overflow]]. Add-with-carry and subtract-with-carry instructions exist, but they specify 5 operand registers: 2 inputs and a carry input, plus a result output and a carry output.
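
The five-operand add with carry can be modelled as follows. This is a minimal sketch, assuming 32-bit registers; the function names <code>add_with_carry</code> and <code>add64</code> are hypothetical and are not instruction mnemonics from the architecture manual.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def add_with_carry(x, y, carry_in):
    """Model of a flagless add: 2 inputs + carry in -> (sum, carry out)."""
    total = (x & MASK32) + (y & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32


def add64(a, b):
    """Add two 64-bit values by chaining the carry through two 32-bit adds."""
    lo, carry = add_with_carry(a, b, 0)            # low words (masked inside)
    hi, carry = add_with_carry(a >> 32, b >> 32, carry)  # high words + carry
    return (hi << 32) | lo, carry
```

Because the carry travels in an ordinary register rather than in a status flag, the two halves of a multi-word addition do not have to be adjacent in the instruction stream.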
 
===Instruction encoding===
Line 42:
{|class="wikitable" style="text-align:center"
|+XMOS instruction formats
! 1<br />5 || 1<br />4 || 1<br />3 || 1<br />2 || 1<br />1 || 1<br />0 || <br />9 || <br />8 || <br />7 || <br />6 || <br />5 || <br />4 || <br />3 || <br />2 || <br />1 || <br />0 || Description
|-
|colspan=6| opcode ||colspan=10| immediate ||align=left| 10/20-bit immediate
Line 50:
|colspan=6| opcode ||colspan=2| 1 1 ||colspan=2| opc ||colspan=6| immediate ||align=left| 6/16-bit immediate
|-
|colspan=5| opcode ||colspan=5| 9×''a''+3×''b''+''c'' ||colspan=2| a a ||colspan=2| b b ||colspan=2| c c ||align=left| 3-operand register
|-
|colspan=5| opcode ||colspan=5| 27+3×''b''+''c'' || * || o ||colspan=2| b b ||colspan=2| c c ||align=left| 2-operand register
Line 58:
|colspan=5| opcode ||colspan=6| 1 1 1 1 1 1 || o ||colspan=2| 1 1 ||colspan=2| opc ||align=left| 0-operand
|}
The last four forms share the same opcode range, because the number of operands is determined by bits 5 through 10. The last 3 forms use bit 4 as an additional opcode bit. (And the last form uses bits 1 and 0 as well.)
 
In the second form, some instructions (loads and stores) use all 4 bits to encode the register number, allowing access to r12–r15. Other instructions (conditional branches) do not allow register numbers above 11, instead allowing the third form to share the opcode range.
 
Because constants are always unsigned, many instructions come in add/subtract pairs, e.g. jump forward and backward.
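
As an illustration of the add/subtract pairing, an assembler could pick the direction from the sign of a displacement and encode only its magnitude. The sketch below is hypothetical; the function name and the direction labels are not taken from the architecture manual.

```python
def split_displacement(offset):
    """Split a signed displacement into (direction, unsigned magnitude).

    The direction selects which member of an instruction pair is used
    (for example jump forward or jump backward); the magnitude is the
    unsigned constant that goes into the immediate field.
    """
    if offset >= 0:
        return "forward", offset
    return "backward", -offset
```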
Line 78:
** '''11111''': 3 additional operands, in addition to the operands of the following register instruction
 
The encoding of the 3-operand register opcodes is quite unusual, since 12 registers is not a power of 2. The encoding used fits 0 to 3 operands, and the number of operands, into 11 bits. Thus, each 5-bit opcode can be assigned four times, once to a 3-operand instruction, once to a 2-operand, etc.
 
In all cases, the low 2 bits of the register number are placed in a 2-bit field, reducing the problem to encoding the high bits, which are in the range of 0 to 2.
 
The 3-operand form places the low 2 bits of each register number in the low 6 instruction bits. The high 2 bits of each register number are combined in base-3 into a number between 0 and 26 (using 9×''a''+3×''b''+''c'') and stored in the remaining 5 bits.
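
The 3-operand packing can be sketched as below, using the bit positions from the instruction format table (opcode in bits 15–11, combined field in bits 10–6, then three 2-bit fields). The function names are hypothetical, and the sketch covers only the register fields, not the meaning of any opcode.

```python
def encode_3r(opcode, ra, rb, rc):
    """Pack a 5-bit opcode and three registers r0-r11 into a 16-bit word."""
    for reg in (ra, rb, rc):
        if not 0 <= reg <= 11:
            raise ValueError("3-operand form only reaches r0-r11")
    # High 2 bits of each register (0..2) combined in base 3 -> 0..26.
    combined = 9 * (ra >> 2) + 3 * (rb >> 2) + (rc >> 2)
    return (((opcode & 0x1F) << 11) | (combined << 6)
            | ((ra & 3) << 4) | ((rb & 3) << 2) | (rc & 3))


def decode_3r(word):
    """Inverse of encode_3r; returns (opcode, ra, rb, rc)."""
    combined = (word >> 6) & 0x1F
    if combined > 26:
        raise ValueError("values 27-31 belong to the shorter operand forms")
    a_hi, rest = divmod(combined, 9)
    b_hi, c_hi = divmod(rest, 3)
    ra = (a_hi << 2) | ((word >> 4) & 3)
    rb = (b_hi << 2) | ((word >> 2) & 3)
    rc = (c_hi << 2) | (word & 3)
    return (word >> 11) & 0x1F, ra, rb, rc
```

For example, registers r11, r4 and r1 have high parts 2, 1 and 0, giving a combined field of 9×2 + 3×1 + 0 = 21.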
 
The 2-operand form uses the unused 5 combinations (27–31) in the 5-bit field. Operand ''a'' is not used, and the 2-bit field for its low bits is reassigned; one bit is used for an additional opcode bit, and the other is used as an additional combination register specifier, doubling the number of available combinations to 10, and allowing all 9 combinations of 3×''b''+''c'' to be represented. This is done in a manner similar to [[bi-quinary coded decimal]]: the combination, modulo 5, is stored in the 5-bit field (as (3×''b''+''c'')&nbsp;mod&nbsp;5 + 27), and the 1-bit quotient (⌊(3×''b''+''c'')/5⌋) is stored in instruction bit 5 (marked with an asterisk in the table above).<ref>The architecture manual documents bit 5 as the "most significant bit", but fails to mention the non-binary base; some [http://git.infradead.org/users/segher/dis-xs1.git/blob/HEAD:/dis-xs1.fs XS-1 disassembler source code] makes it clear. In the definition of <code>parse-inssn-r2</code>, the <code>1 #split 1b - swap 5 * +</code> portion splits the 6-bit register field into a 5-bit and a 1-bit part, subtracts 27 (hex 1b) from the high part, multiplies the low part by 5, and adds them.</ref>
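
The quotient/remainder split for the 2-operand form can be sketched as follows, again using the bit positions from the table above (remainder + 27 in bits 10–6, quotient in bit 5, extra opcode bit in bit 4). The function names are hypothetical.

```python
def encode_2r(opcode, extra_bit, rb, rc):
    """Pack a 5-bit opcode, 1 extra opcode bit and two registers r0-r11."""
    for reg in (rb, rc):
        if not 0 <= reg <= 11:
            raise ValueError("2-operand form only reaches r0-r11")
    combination = 3 * (rb >> 2) + (rc >> 2)      # 0..8
    quotient, remainder = divmod(combination, 5)  # bi-quinary style split
    return (((opcode & 0x1F) << 11) | ((27 + remainder) << 6)
            | (quotient << 5) | ((extra_bit & 1) << 4)
            | ((rb & 3) << 2) | (rc & 3))


def decode_2r(word):
    """Inverse of encode_2r; returns (opcode, extra_bit, rb, rc)."""
    field = (word >> 6) & 0x1F
    if field < 27:
        raise ValueError("values 0-26 belong to the 3-operand form")
    combination = (field - 27) + 5 * ((word >> 5) & 1)
    if combination > 8:
        raise ValueError("combination 9 is used by shorter operand forms")
    b_hi, c_hi = divmod(combination, 3)
    rb = (b_hi << 2) | ((word >> 2) & 3)
    rc = (c_hi << 2) | (word & 3)
    return (word >> 11) & 0x1F, (word >> 4) & 1, rb, rc
```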
 
1-operand instructions use the tenth combination value, with all 6 bits set, and place the register number in the 4 available bits. Only operand ''c'' is specified, and the high bits are stored in the ''b'' field.
 
Finally, the 1-operand encoding, with a register number 12 or more (the ''b'' field contains binary 11), is also used to encode 0-operand instructions. The two low-order bits of the ''c'' field are available for additional opcode bits (bringing the total to 8).
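
Taken together, the rules above let a decoder recover the operand count from bits 10 through 5 and the low four bits. The following is a minimal sketch, assuming the word is already known to lie in the register-format opcode range; the function name and return convention are hypothetical.

```python
def classify_register_form(word):
    """Return (operand_count, detail) for a register-format instruction word.

    detail is the register number for 1-operand instructions, the two
    extra opcode bits for 0-operand instructions, and None otherwise.
    """
    field = (word >> 6) & 0x1F
    if field <= 26:
        return 3, None                      # base-3 combination of a, b, c
    combination = (field - 27) + 5 * ((word >> 5) & 1)
    if combination <= 8:
        return 2, None                      # 3*b + c
    # Tenth combination (all of bits 10-5 set): 1- or 0-operand form.
    reg = word & 0xF                        # b field = high bits, c = low bits
    if reg <= 11:
        return 1, reg
    return 0, word & 3                      # b field is binary 11
```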
 
(A few instructions use the register ''c'' field value 0–11 as a small immediate constant, or use it to select one of 12 convenient bit-shift
Line 97:
One 10-bit immediate opcode (PFIX, opcode 111100) is used to add an additional 10 bits to the 6- or 10-bit immediate in the following instruction.
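
The effect of the prefix on an immediate can be sketched as below. The sketch assumes that the 10 prefix bits become the high-order bits of the extended immediate, which is consistent with the "10/20-bit" and "6/16-bit" labels in the table but is not spelled out in the text; the function name is hypothetical.

```python
PFIX_OPCODE = 0b111100  # 6-bit opcode of the prefix, from the text above


def extend_immediate(prefix_word, immediate, immediate_bits):
    """Combine a PFIX word with the immediate of the following instruction.

    immediate_bits is 6 or 10, the width of the following immediate field.
    Assumption: the prefix supplies the high-order 10 bits.
    """
    if (prefix_word >> 10) & 0x3F != PFIX_OPCODE:
        raise ValueError("not a PFIX instruction word")
    if immediate_bits not in (6, 10):
        raise ValueError("immediate field must be 6 or 10 bits wide")
    prefix_bits = prefix_word & 0x3FF
    return (prefix_bits << immediate_bits) | (immediate & ((1 << immediate_bits) - 1))
```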
 
One 3-operand opcode (EOPR, opcode 11111) is reserved for an "additional operands" prefix. Its 3 operands are used along with those of the following instruction word to produce additional 32-bit instructions with up to 6 operands. This is also used for rarely used 3- and 2-operand instructions; in such cases the EOPR specifies all 3 or 2 operands, and the following instruction word is a 0-operand instruction. (In the 2-operand case, the extra opcode bit in the leading EOPR ''is'' used.)
 
==Sequential programming model==
Line 109:
Almost all instructions execute in a single cycle. If an instruction does not need data from memory (for example, an arithmetic operation), the instruction will prefetch a word of instructions. Because most instructions are encoded in 16 bits, and because most instructions are not loads or stores (a typical mix is 20% loads and stores, 80% other instructions<ref>
{{cite book
|author1=Jurij Šilc |author2=Borut Robič |author3=Theo Ungerer
| year = 1999
| title = Processor Architecture
| publisher = Springer
| isbn = 3-540-64798-8
}}</ref>), the [[Instruction prefetch|prefetch]] mechanism can stay ahead of the instruction stream. This acts like a very small [[instruction cache]], but its behaviour can be predicted at [[compile time]], making timing behaviour as predictable as functional behaviour.
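
The arithmetic behind this claim can be sketched as follows. The sketch assumes that a fetched word is 32 bits and therefore holds two 16-bit instructions, and that every instruction consumes one cycle; neither the function name nor the buffer model comes from the architecture manual.

```python
def prefetch_surplus(memory_fraction, instructions_per_word=2):
    """Average instructions gained per cycle by the prefetch mechanism.

    Each non-memory instruction leaves the memory port free to fetch one
    word (instructions_per_word instructions); every cycle consumes one
    instruction. A positive result means the prefetch stays ahead.
    """
    fetched = (1.0 - memory_fraction) * instructions_per_word
    consumed = 1.0
    return fetched - consumed
```

With the quoted mix of 20% loads and stores, the surplus is 0.8 × 2 − 1 = 0.6 instructions per cycle; under these assumptions the break-even point is a 50% share of loads and stores.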
 
Line 141:
 
==Parallel programming model==
The XS1 instruction set is designed to support both multi-threading and multi-core computations. To this end it supports channel communication (to support distributed memory computations) and barriers and locks (to support shared memory computations).
A thread initiates execution on one or more newly allocated threads by setting their initial register values.
 
Communication between threads is performed using channels that provide full-duplex data transfer between channel-ends. This enables, among other things, the implementation of [[Communicating sequential processes|CSP]]-based languages and languages based on the [[Pi calculus]]. The instruction set is agnostic as to where a channel is connected, whether inside or outside the core. Channels carry messages constructed from data and control tokens between the two channel ends. The control tokens can be used to encode communication protocols.
 
Channel ends have a buffer able to hold sufficient tokens to allow at least one word to be buffered. If an output instruction is executed when the channel is too full to take the data then the thread which executed the instruction is paused. It is restarted when there is enough room in the channel for the instruction to successfully complete. Likewise, when an input instruction is executed and there is not enough data available then the thread is paused and will be restarted when enough data becomes available.
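
The pause-and-restart behaviour can be modelled with a small bounded buffer. This is a simplified sketch, not the hardware mechanism: the class and method names are hypothetical, the default capacity of four tokens assumes that a word is four tokens, and a return value of <code>False</code> or <code>None</code> stands in for the thread being paused.

```python
from collections import deque


class ChannelBuffer:
    """Toy model of the token buffer at a channel end."""

    def __init__(self, capacity_tokens=4):  # assumption: one word = 4 tokens
        self.capacity = capacity_tokens
        self.tokens = deque()

    def try_output(self, tokens):
        """Accept tokens, or return False if the outputting thread must pause."""
        if len(self.tokens) + len(tokens) > self.capacity:
            return False                     # channel too full: thread pauses
        self.tokens.extend(tokens)
        return True

    def try_input(self, count):
        """Return count tokens, or None if the inputting thread must pause."""
        if len(self.tokens) < count:
            return None                      # not enough data: thread pauses
        return [self.tokens.popleft() for _ in range(count)]
```

In this model a paused thread would simply retry the same call once the other side has made progress, which mirrors the instruction being restarted when room or data becomes available.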