Systolic array: Difference between revisions

Content deleted Content added
Add how/why systolic arrays had to be rediscovered
m spelling
Line 1:
{{Use dmy dates|date=October 2019}}
In [[parallel computing|parallel]] [[computer architectures]], a '''systolic array''' is a homogeneous [[Graph (discrete mathematics)|network]] of tightly coupled [[data processing unit]]s (DPUs) called cells or [[Node (computer science)|node]]s. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighborsneighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in [[Colossus computer|Colossus]], which was an early computer used to break German [[Lorenz cipher|Lorenz]] ciphers during [[World War II]].<ref>{{YouTube |id=g2tMcMQqSbA |title=Colossus - The Greatest Secret in the History of Computing |time=41m45s}}</ref> Due to the classified nature of Colossus, they were independently invented or rediscovered by [[H. T. Kung]] and [[Charles Leiserson]] who described arrays for many dense linear algebra computations (matrix product, solving systems of [[linear equation]]s, [[LU decomposition]], etc.) for banded matrices. Early applications include computing [[greatest common divisor]]s of integers and polynomials.<ref>http://www.eecs.harvard.edu/~htk/publication/1984-ieeetoc-brent-kung.pdf</ref> They are sometimes classified as [[MISD|multiple-instruction single-data]] (MISD) architectures under [[Flynn's taxonomy]], but this classification is questionable because a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories: [[SISD]], [[SIMD]], [[MISD]], [[MIMD]], as discussed later in this article.
 
The parallel input [[data]] flows through a network of hard-wired [[Microprocessor|processor]] nodes, which combine, process, [[merge algorithm|merge]] or [[sorting algorithm|sort]] the input data into a derived result. Because the [[wave]]-like propagation of data through a systolic array resembles the [[pulse]] of the human circulatory system, the name ''systolic'' was coined from medical terminology. The name is derived from [[systole]] as an analogy to the regular pumping of blood by the heart.
Line 8:
 
==Architecture==
A systolic array typically consists of a large [[monolithic system|monolithic]] [[Graph (discrete mathematics)|network]] of primitive computing [[Node (computer science)|node]]s which can be hardwired or software configured for a specific application. The nodes are usually fixed and identical, while the interconnect is programmable. The more general '''wavefrontwave front''' processors, by contrast, employ sophisticated and individually programmable nodes which may or may not be monolithic, depending on the array size and design parameters. The other distinction is that systolic arrays rely on [[synchronous]] data transfers, while [[wavefront]] tend to work [[wikt:asynchronous|asynchronous]]ly.
 
Unlike the more common [[Von Neumann architecture]], where program execution follows a script of instructions stored in common memory, [[address space|addressed]] and sequenced under the control of the [[CPU]]'s [[program counter]] (PC), the individual nodes within a systolic array are triggered by the arrival of new data and always process the data in exactly the same way. The actual processing within each node may be hard wired or block [[microcode|micro code]]d, in which case the common node personality can be block programmable.
 
The systolic array paradigm with data-streams driven by data [[Counter (digital)|counter]]s, is the counterpart of the Von Neumann architecture with instruction-stream driven by a program counter. Because a systolic array usually sends and receives multiple data streams, and multiple data counters are needed to generate these data streams, it supports [[data parallelism]].
Line 35:
An example of a systolic [[algorithm]] might be designed for [[matrix multiplication]]. One [[matrix (math)|matrix]] is fed in a row at a time from the top of the array and is passed down the array, the other matrix is fed in a column at a time from the left hand side of the array and passes from left to right. Dummy values are then passed in until each processor has seen one whole row and one whole column. At this point, the result of the multiplication is stored in the array and can now be output a row or a column at a time, flowing down or across the array.<ref>[http://web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrix-multiplication-systolic.pdf Systolic Array Matrix Multiplication]</ref>
 
Systolic arrays are arrays of [[data processing unit|DPU]]s which are connected to a small number of nearest neighbour DPUs in a mesh-like topology. DPUs perform a sequence of operations on data that flows between them. Because the traditional systolic array synthesis methods have been practiced by algebraic algorithms, only uniform arrays with only linear pipes can be obtained, so that the architectures are the same in all DPUs. The consequence is, that only applications with regular data dependencies can be implemented on classical systolic arrays. Like [[SIMD]] machines, clocked systolic arrays compute in "lock-step" with each processor undertaking alternate compute | communicate phases. But systolic arrays with asynchronous handshake between DPUs are called ''wavefront arrays''.
phases. But systolic arrays with asynchronous handshake between DPUs are called ''wavefront arrays''.
One well-known systolic array is Carnegie Mellon University's [[iWarp]] processor, which has been manufactured by Intel. An iWarp system has a linear array processor connected by data buses going in both directions.