XCore Architecture: Difference between revisions

Other infoboxes have "-bit" after the bit width. (Yes, one could argue that it's redundant. If you care, change all of them.)
Six operands how? Double length instructions? Not all single cycle. "including a divide and remainder (which are the only instructions that are not single cycle)" - https://www.xcore.com/forum/viewtopic.php?f=7&t=873
Line 19:
|publisher=[[XMOS]]}}</ref>
 
The architecture encodes instructions compactly, using 16 bits for frequently used instructions (with up to three operands) and 32 bits for less frequently used instructions (with up to six operands). Almost all instructions execute in a single cycle, and the architecture is event-driven in order to decouple the timing requirements of a program from its execution speed. A program will normally perform its computations and then wait for an [[Event (computing)|event]] (e.g. a [[Message passing|message]], a timer, or an external I/O event) before continuing.
 
Processors with this architecture include the [[XCore XS1-G4]] and [[XCore XS1-L1]].
 
==Architecture==
 
The architecture comprises a central execution unit that operates on a set of 25 registers and is surrounded by a number of ''resources'' that perform operations that interact with the environment. Each thread has its own set of hardware registers, enabling threads to execute concurrently.
The instruction set comprises both a (more or less standard) sequential programming model, and instructions that implement multi-threading, multi-core and I/O operations.
Line 40 ⟶ 38:
 
===Instruction encoding===
Most instructions use a 16-bit encoding, while a few use a 32-bit encoding.
 
Instructions can use between zero and six operands. Most common arithmetic operations (such as ADD, SUB, MULT) are [[Instruction_set#Number_of_operands|three-operand instructions]] based on a set of 12 general purpose registers.
 
{|class="wikitable" style="text-align:center"
Line 96 ⟶ 93:
constants 0–8, 16, 24, or 32.)
 
Less frequently used instructions are encoded in 32 bits. 32-bit instructions allow 16- or 20-bit immediate operands (such as far branches), up to six register operands (for example, long multiply, which has four source and two destination operands) and additional opcode space for rarely used instructions.
 
One 10-bit immediate opcode (PFIX, opcode 111100) is used to add an additional 10 bits to the 6- or 10-bit immediate in the following instruction.
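
The effect of the prefix on the immediate width can be sketched in a few lines of Python. This is an illustrative model only and is not taken from the XS1 documentation: the function name <code>apply_prefix</code> is hypothetical, and the sketch assumes that the 10 prefix bits become the high-order bits of the extended immediate.

```python
def apply_prefix(prefix: int, immediate: int, immediate_bits: int) -> int:
    """Extend a 6- or 10-bit immediate with a 10-bit prefix value.

    Assumption: the prefix supplies the high-order bits of the result.
    """
    if immediate_bits not in (6, 10):
        raise ValueError("immediate must be 6 or 10 bits wide")
    if not 0 <= prefix < (1 << 10):
        raise ValueError("prefix must fit in 10 bits")
    if not 0 <= immediate < (1 << immediate_bits):
        raise ValueError("immediate does not fit in its field")
    return (prefix << immediate_bits) | immediate


# A 6-bit immediate grows to 16 bits, a 10-bit immediate to 20 bits.
widest_16 = apply_prefix(0x3FF, 0x3F, 6)
widest_20 = apply_prefix(0x3FF, 0x3FF, 10)
```

Under that assumption the two results are the largest 16-bit and 20-bit values, which matches the 16- and 20-bit immediate operands described for 32-bit instructions.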
Line 103 ⟶ 100:
 
==Sequential programming model==
 
Each thread has access to 12 general-purpose registers, R0...R11. In addition, there are four special-purpose registers: SP (stack pointer), LR (link register, which stores the return address), CP (constant pool, which points to a part of memory that stores constants) and DP (data pool, which points to global variables). In addition to those 16, another nine registers store the PC, kernel PC, exception type, exception data, and saved copies of all of those in case of an exception or interrupt.<ref>{{cite web
|title=XMOS XS1 Architecture |format=PDF
Line 111 ⟶ 107:
The instruction set is a [[Load-store architecture|load-store]] instruction set.
 
Almost all instructions execute in a single cycle. If an instruction does not need data from memory (for example, an arithmetic operation), the instruction will prefetch a word of instructions. Because most instructions are encoded in 16 bits, and because most instructions are not loads or stores (a typical number is 20% loads and stores, 80% other instructions<ref>
{{cite book
|author1=Jurij Šilc |author2=Borut Robič |author3=Theo Ungerer
Line 144 ⟶ 140:
It is up to the callee to save the link register if it is not a leaf function; a single instruction extends the stack and saves the link register.
 
==Parallel programming model==
The XS1 instruction set is designed to support both multi-threaded and multi-core computations. To this end, it supports channel communication (to support distributed memory computations) and barriers and locks (to support shared memory computations).
A thread initiates execution on one or more newly allocated threads by setting their initial register values.
 
Communication between threads is performed using channels that provide full-duplex data transfer between channel ends. This enables, among other things, the implementation of [[Communicating sequential processes|CSP]]-based languages and languages based on the [[Pi calculus]]. The instruction set is agnostic as to where a channel is connected, whether inside a core or outside it. Channels carry messages constructed from data and control tokens between the two channel ends. The control tokens can be used to encode communication protocols.
 
Channel ends have a buffer able to hold sufficient tokens to allow at least one word to be buffered. If an output instruction is executed when the channel is too full to take the data then the thread which executed the instruction is paused. It is restarted when there is enough room in the channel for the instruction to successfully complete. Likewise, when an input instruction is executed and there is not enough data available then the thread is paused and will be restarted when enough data becomes available.
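
The pausing behaviour described above can be illustrated with a toy software model. This is a sketch under stated assumptions, not a description of the hardware: the class <code>ChannelEnd</code> and its methods are hypothetical names, the buffer is assumed to hold exactly one word, and a word is assumed to be four tokens. A return value stands in for "the thread is paused".

```python
from collections import deque


class ChannelEnd:
    """Toy model of a channel end's token buffer (not the real hardware)."""

    def __init__(self, capacity_tokens: int = 4):
        # Assumption: one word is four tokens, and the buffer holds one word.
        self.capacity = capacity_tokens
        self.buffer = deque()

    def output(self, tokens: list) -> bool:
        """Return True if the output completes, False if the thread would pause."""
        if len(self.buffer) + len(tokens) > self.capacity:
            return False  # not enough room: the outputting thread is paused
        self.buffer.extend(tokens)
        return True

    def input(self, count: int):
        """Return `count` tokens, or None if the thread would pause."""
        if len(self.buffer) < count:
            return None  # not enough data: the inputting thread is paused
        return [self.buffer.popleft() for _ in range(count)]


end = ChannelEnd()
first = end.output([1, 2, 3, 4])  # fits: the buffer now holds one word
second = end.output([5])          # buffer full: the output would pause
word = end.input(4)               # drains the buffered word
retry = end.output([5])           # room again: the paused output can complete
starved = end.input(4)            # only one token buffered: the input would pause
```

In this model a paused operation is simply retried by the caller; on the real device the thread is restarted automatically once room or data becomes available.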
 
A thread can, with a single instruction, synchronise with a group of threads using a barrier synchronisation. Alternatively a thread can synchronise using a lock, providing mutual exclusion. In order to communicate data when using barriers and locks, threads can either write data into the registers of another thread, or they can access memory of another thread (provided both threads execute on the same core). If shared memory is used, then the compiler or the programmer must ensure that there are no race conditions.
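
As a software analogy for the shared-memory style described above, the following sketch uses Python's standard <code>threading</code> module. It does not use or represent XS1 instructions; the names <code>worker</code>, <code>shared_total</code> and <code>after_barrier</code> are chosen for illustration. A lock provides mutual exclusion over a shared variable, and a barrier ensures that no thread continues until every thread has contributed.

```python
import threading

NUM_THREADS = 4
barrier = threading.Barrier(NUM_THREADS)
lock = threading.Lock()
shared_total = 0
after_barrier = []


def worker(value: int) -> None:
    global shared_total
    # The lock gives mutual exclusion over the shared variable.
    with lock:
        shared_total += value
    # No thread passes this point until all of them have arrived.
    barrier.wait()
    # Every contribution has been made by now, so each thread sees the full total.
    with lock:
        after_barrier.append(shared_total)


threads = [
    threading.Thread(target=worker, args=(i,)) for i in range(1, NUM_THREADS + 1)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the two updates to shared state would be a race condition of the kind that the compiler or programmer is responsible for avoiding.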
 
==I/O and timing instructions==
 
The XS1 architecture is event-driven. It has an instruction that can dispatch external [[Event (computing)|events]], in addition to supporting traditional [[interrupts]]. If the program chooses to use events, then the underlying processor has to expect an event and wait at a specific point in the program so that the event can be handled synchronously. If desired, I/O can be handled asynchronously using interrupts. Events and interrupts can be used on any resource that the implementation supports.
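
The synchronous style of event handling can be sketched as follows. This is a minimal illustration in Python, not XS1 code: the resource names, the <code>handlers</code> table and the function <code>wait_event</code> are all hypothetical, and a queue stands in for resources becoming ready.

```python
import queue

# Hypothetical resource names; a real program would use ports, timers or channel ends.
handlers = {
    "timer": lambda data: f"tick at {data}",
    "port": lambda data: f"pin value {data}",
}

pending = queue.Queue()


def wait_event():
    """Block at one well-defined point until a resource is ready,
    then run that resource's handler synchronously."""
    resource, data = pending.get()  # the thread waits here
    return handlers[resource](data)


# Simulate two resources becoming ready, then handle them in order.
pending.put(("port", 1))
pending.put(("timer", 100))
results = [wait_event(), wait_event()]
```

The point of the sketch is that the program decides where it waits, so each handler runs at a known place in the control flow, in contrast to an interrupt, which may arrive at any point.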
 
Line 172 ⟶ 157:
 
==Devices==
 
The XS1 instruction set is implemented by the [[XCore XS1-G4]], [[XCore XS1-L1]], [[XCore XS1-SU]], and [[XCore XS1-AnA]]. The first is a four-core processing node; the other three are single- and dual-core processing nodes.