This article discusses the analysis of parallel algorithms running on parallel random-access machines (PRAMs): models of parallel computation where an unbounded number of processors have shared access to a random-access memory (RAM).
Overview
Suppose computations are executed on a PRAM machine using p processors. Let Tp denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions; a short sketch computing work and span for a small example follows the list:
- The work of a computation executed by p processors is the total number of primitive operations that the processors perform.[1] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T1.
- The span or critical path length is the time T∞ spent computing using an idealized machine with an infinite number of processors.[2]
- The cost of the computation is the quantity pTp. This expresses the total time spent, by all processors, in both computing and waiting.[1]
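To make the definitions concrete, the following sketch (illustrative Python, not drawn from the cited sources) computes the work and span of a computation expressed as a dependency DAG of unit-cost operations. The example DAG is the balanced binary tree of additions that sums eight values.

```python
# A minimal sketch (illustrative, not from the cited sources): computing
# work and span for a computation expressed as a DAG of unit-cost
# operations. The example is a binary reduction tree summing 8 values.
from math import ceil, log2

def work_and_span(dag):
    """dag maps each operation to the operations it depends on.
    Work = total number of operations; span = longest dependency chain."""
    memo = {}
    def depth(node):
        if node not in memo:
            memo[node] = 1 + max((depth(d) for d in dag[node]), default=0)
        return memo[node]
    return len(dag), max(depth(n) for n in dag)

# The additions of a balanced reduction over n = 8 inputs (inputs are free):
dag = {
    "a0": [], "a1": [], "a2": [], "a3": [],   # level 1: four pairwise sums
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],   # level 2
    "c0": ["b0", "b1"],                       # level 3: final sum
}
t1, t_inf = work_and_span(dag)
print(t1, t_inf)                # work T1 = 7 = n - 1, span T_inf = 3
assert t_inf == ceil(log2(8))   # span of a balanced reduction is log2(n)
```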
Several useful results follow from the definitions of work, span and cost:
- Work law. The cost is always at least the work: pTp ≥ T1. This follows from the fact that p processors can perform at most p operations in parallel.[2][1]
- Span law. A finite number p of processors cannot outperform an infinite number, so that Tp ≥ T∞.[2]
Using these definitions and laws, the following measures of performance can be given; a short sketch computing them from example timings follows the list:
- Speedup is the gain in speed made by parallel execution compared to sequential execution: Sp = T1 ∕ Tp. When the speedup is Ω(n) for input size n (using big O notation), the speedup is linear, which is optimal in the PRAM model because the work law implies that T1 ∕ Tp ≤ p (super-linear speedup can occur in practice due to memory hierarchy effects). The situation T1 ∕ Tp = p is called perfect linear speedup.[2] An algorithm that exhibits linear speedup is said to be scalable.[1]
- Efficiency is the speedup per processor, Sp ∕ p.[1]
- Parallelism is the ratio T1 ∕ T∞. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if p > T1 ∕ T∞, then T1 ∕ Tp ≤ T1 ∕ T∞ < p.[2]
- The slackness is T1 ∕ (pT∞). A slackness less than one implies (by the span law) that perfect linear speedup is impossible on p processors.[2]
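These measures reduce to a few ratios of the quantities defined above. The following sketch (illustrative names, not a library API) computes them from hypothetical timings and checks that the work and span laws bound the speedup for the example values.

```python
# A minimal sketch (illustrative names, not a library API): the measures
# defined above, computed from hypothetical timings T1, Tp, T_inf and p.

def measures(t1, tp, t_inf, p):
    speedup = t1 / tp             # Sp = T1 / Tp
    efficiency = speedup / p      # Sp / p
    parallelism = t1 / t_inf      # T1 / T_inf: maximum possible speedup
    slackness = t1 / (p * t_inf)  # parallelism per processor
    return speedup, efficiency, parallelism, slackness

# Hypothetical example: T1 = 1000, T_inf = 10, and Tp = 30 on p = 50.
sp, eff, par, slack = measures(1000, 30, 10, 50)
print(sp, eff, par, slack)     # 33.3, 0.67, 100.0, 2.0
assert sp <= 50 and sp <= par  # work law and span law both bound Sp
```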
Execution on a limited number of processors
Any computation that can run in parallel on P processors can be executed on p < P processors by letting each processor execute multiple units of work; a result called Brent's law states that such a "simulation" can be performed in time Tp bounded by[3]

Tp ≤ TP + (T1 − TP) ∕ p,

where TP is the running time on the larger number P of processors. Less precisely, Tp = O(T1 ∕ p + TP); in particular, taking P unbounded gives Tp ≤ T1 ∕ p + T∞.
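The bound can be checked empirically. The sketch below (an illustrative greedy simulation, not taken from the cited source) schedules a unit-cost DAG step by step, running at most p ready operations per step, and verifies Brent's bound for the reduction tree used earlier.

```python
# A minimal sketch (illustrative, not from the cited source): greedy
# simulation of a unit-cost computation DAG on p processors, checking
# Brent's bound Tp <= T_inf + (T1 - T_inf) / p.

def greedy_schedule(dag, p):
    """Each step, run up to p ready operations; return number of steps."""
    done, steps = set(), 0
    while len(done) < len(dag):
        ready = [n for n in dag if n not in done
                 and all(d in done for d in dag[n])]
        done.update(ready[:p])  # at most p operations per time step
        steps += 1
    return steps

# Same reduction tree as before: T1 = 7, T_inf = 3.
dag = {
    "a0": [], "a1": [], "a2": [], "a3": [],
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],
    "c0": ["b0", "b1"],
}
t1, t_inf = 7, 3
for p in (1, 2, 4):
    tp = greedy_schedule(dag, p)
    assert tp <= t_inf + (t1 - t_inf) / p  # Brent's bound holds
    print(p, tp)                           # 1->7, 2->4, 4->3 steps
```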
References
- Casanova, Henri; Legrand, Arnaud; Robert, Yves (2008). Parallel Algorithms. CRC Press. p. 10.
- Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
- Gustafson, John L. (2011). "Brent's Theorem". Encyclopedia of Parallel Computing. Springer. pp. 182–185.