Systolic array

This is an old revision of this page, as edited by RainierHa (talk | contribs) at 00:18, 21 May 2007. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer architecture, a systolic array is a pipe network arrangement of data processing units (DPUs); see the figure for an example with 32-bit-wide DPUs. DPUs are similar to central processing units (CPUs) but have no program counter, since operation is transport-triggered, i.e. triggered by the arrival of a data object (a mechanism also used in transport triggered architectures). The DPUs are arranged in an array (often rectangular) in which data flows between neighbours, usually with different data flowing in different directions. The data streams entering and leaving the ports of the array are generated by auto-sequencing memory units (ASMs). Each ASM includes a data counter. In embedded systems a data stream may also be input from and/or output to an external source.

The systolic array paradigm, data-stream-driven by data counters, is the counterpart of the von Neumann paradigm, instruction-stream-driven by a program counter (see von Neumann or von Neumann architecture). Because a systolic array includes multiple data counters, it supports data parallelism. The name derives from analogy with the regular pumping of blood by the heart.

H. T. Kung and Charles E. Leiserson published the first paper describing systolic arrays in 1978; however, the first machine known to have used the technique was the Colossus Mark II in 1944.

Each processor at each step takes in data from one or more neighbours (e.g. North and West), processes it and, in the next step, outputs results in the opposite direction (South and East).

Matrix multiplication is one example of a systolic algorithm. One matrix is fed in a row at a time from the top of the array and passed down the array; the other matrix is fed in a column at a time from the left-hand side of the array and passed from left to right. Dummy values are then passed in until each processor has seen one whole row and one whole column. At this point the result of the multiplication is stored in the array and can be output a row or a column at a time, flowing down or across the array.
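The data flow described above can be sketched as a small software simulation. This is only an illustrative sketch, not a description of any particular machine: the function name systolic_matmul, the use of zeros as the dummy values, and the skewing of the input streams (row i of the left-hand input delayed by i steps, column j of the top input delayed by j steps) are assumptions made here so that matching operands meet in the right cell at the right step.

```python
def systolic_matmul(A, B):
    """Simulate a rectangular systolic array that accumulates A x B in place.

    Assumed (hypothetical) layout: cell (i, j) keeps a running sum for
    C[i][j]. Elements of A enter at the west edge and move east; elements
    of B enter at the north edge and move south. Zeros act as dummy values.
    """
    n, depth, m = len(A), len(B), len(B[0])
    assert all(len(row) == depth for row in A), "inner dimensions must match"

    a_reg = [[0] * m for _ in range(n)]  # value each cell received from the west
    b_reg = [[0] * m for _ in range(n)]  # value each cell received from the north
    acc = [[0] * m for _ in range(n)]    # result accumulated in each cell

    # The last operand pair reaches cell (n-1, m-1) at step depth + n + m - 3.
    for t in range(depth + n + m - 2):
        # Communicate phase: every cell latches its neighbour's old value.
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    k = t - i  # skewed feed: row i starts i steps late
                    new_a[i][j] = A[i][k] if 0 <= k < depth else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # skewed feed: column j starts j steps late
                    new_b[i][j] = B[k][j] if 0 <= k < depth else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b

        # Compute phase: every cell multiplies its two inputs and accumulates.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this sketch the results stay in the cells, as in the description; reading out acc row by row stands in for shifting the results down or across the array.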

Systolic arrays are arrays of processors which are connected to a small number of nearest neighbours in a mesh-like topology. Processors perform a sequence of operations on data that flows between them. Generally the operations will be the same in each processor, with each processor performing an operation (or a small number of operations) on a data item and then passing it on to its neighbour. Like SIMD machines, systolic arrays compute in "lock-step", with each processor alternating between compute and communicate phases. One well-known systolic array is CMU's iWarp processor, which has been manufactured by Intel. An iWarp system has a linear array of processors connected by data buses going in both directions.


An Example - Polynomial Evaluation

Horner's rule for evaluating a polynomial is:

$$y = (\cdots(((a_n x + a_{n-1})x + a_{n-2})x + a_{n-3})x + \cdots + a_1)x + a_0$$

This can be computed by a linear systolic array in which the processors are arranged in pairs: one multiplies its input by x and passes the result to the right, and the next adds a_j and passes the result to the right.
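The paired multiply and add cells can be sketched as a lock-step pipeline simulation. This is a minimal sketch under stated assumptions: the function name horner_systolic, the representation of each data item as an (x, partial result) pair travelling together, and the choice to inject a_n as the initial partial result at the left end are all hypothetical details chosen here for illustration.

```python
def horner_systolic(coeffs, xs):
    """Evaluate a polynomial at each x in xs with a simulated linear array.

    coeffs is [a_n, ..., a_1, a_0]. Each coefficient after a_n gets a pair
    of cells: a multiplier (times x) followed by an adder (plus a_j).
    """
    a_n, rest = coeffs[0], coeffs[1:]
    if not rest:  # constant polynomial: no cells are needed
        return [a_n for _ in xs]

    cells = []
    for a_j in rest:
        cells.append(("mul", None))
        cells.append(("add", a_j))

    xs = list(xs)
    regs = [None] * len(cells)  # output register of each cell (None = empty)
    results = []

    # One token enters per step; the last token needs len(cells) steps to exit.
    for t in range(len(xs) + len(cells) - 1):
        incoming = (xs[t], a_n) if t < len(xs) else None
        new_regs = [None] * len(cells)
        for i, (op, a_j) in enumerate(cells):
            # Each cell reads what its left neighbour produced last step.
            token = incoming if i == 0 else regs[i - 1]
            if token is None:
                continue
            x, y = token
            new_regs[i] = (x, y * x) if op == "mul" else (x, y + a_j)
        regs = new_regs
        if regs[-1] is not None:  # a finished value leaves the right end
            results.append(regs[-1][1])
    return results


if __name__ == "__main__":
    # 2x^2 + 3x + 4 evaluated at x = 0, 1, 2
    print(horner_systolic([2, 3, 4], [0, 1, 2]))
```

Because a new x can enter at every step, several evaluations are in flight at once, each at a different stage of Horner's rule.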


This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.