This article discusses the analysis of parallel algorithms running on parallel random-access machines (PRAMs): models of parallel computation where an unbounded number of processors have shared access to a random-access memory (RAM).
Overview
Suppose computations are executed on a PRAM machine using p processors. Let Tp denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions; a short sketch computing work and span for a small example follows the list:
- The work of a computation executed by p processors is the total number of primitive operations that the processors perform.[1] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T1.
- The span or critical path length is the time T∞ spent computing using an idealized machine with an infinite number of processors.[2]
- The cost of the computation is the quantity pTp. This expresses the total time spent, by all processors, in both computing and waiting.[1]
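To make the definitions concrete, the following sketch (illustrative Python, not drawn from the cited sources) computes the work and span of a computation expressed as a dependency DAG of unit-cost operations. The example DAG is the balanced binary tree of additions that sums eight values.

```python
# A minimal sketch (illustrative, not from the cited sources): computing
# work and span for a computation expressed as a DAG of unit-cost
# operations. The example is a binary reduction tree summing 8 values.
from math import ceil, log2

def work_and_span(dag):
    """dag maps each operation to the operations it depends on.
    Work = total number of operations; span = longest dependency chain."""
    memo = {}
    def depth(node):
        if node not in memo:
            memo[node] = 1 + max((depth(d) for d in dag[node]), default=0)
        return memo[node]
    return len(dag), max(depth(n) for n in dag)

# The additions of a balanced reduction over n = 8 inputs (inputs are free):
dag = {
    "a0": [], "a1": [], "a2": [], "a3": [],   # level 1: four pairwise sums
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],   # level 2
    "c0": ["b0", "b1"],                       # level 3: final sum
}
t1, t_inf = work_and_span(dag)
print(t1, t_inf)                # work T1 = 7 = n - 1, span T_inf = 3
assert t_inf == ceil(log2(8))   # span of a balanced reduction is log2(n)
```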
Several useful results follow from the definitions of work, span and cost:
- Work law. The cost is always at least the work: pTp ≥ T1. This follows from the fact that p processors can perform at most p operations in parallel.[2][1]
- Span law. A finite number p of processors cannot outperform an infinite number, so that Tp ≥ T∞.[2]
Using these definitions and laws, the following measures of performance can be given; a short sketch computing them from example timings follows the list:
- Speedup is the gain in speed made by parallel execution compared to sequential execution: Sp = T1 ∕ Tp. When the speedup is Ω(n) for input size n (using big O notation), the speedup is linear, which is optimal in the PRAM model because the work law implies that T1 ∕ Tp ≤ p (super-linear speedup can occur in practice due to memory hierarchy effects). The situation T1 ∕ Tp = p is called perfect linear speedup.[2] An algorithm that exhibits linear speedup is said to be scalable.[1]
- Efficiency is the speedup per processor, Sp ∕ p.[1]
- Parallelism is the ratio T1 ∕ T∞. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if p > T1 ∕ T∞, then T1 ∕ Tp ≤ T1 ∕ T∞ < p.[2]
- The slackness is T1 ∕ (pT∞). A slackness less than one implies (by the span law) that perfect linear speedup is impossible on p processors.[2]
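These measures reduce to a few ratios of the quantities defined above. The following sketch (illustrative names, not a library API) computes them from hypothetical timings and checks that the work and span laws bound the speedup for the example values.

```python
# A minimal sketch (illustrative names, not a library API): the measures
# defined above, computed from hypothetical timings T1, Tp, T_inf and p.

def measures(t1, tp, t_inf, p):
    speedup = t1 / tp             # Sp = T1 / Tp
    efficiency = speedup / p      # Sp / p
    parallelism = t1 / t_inf      # T1 / T_inf: maximum possible speedup
    slackness = t1 / (p * t_inf)  # parallelism per processor
    return speedup, efficiency, parallelism, slackness

# Hypothetical example: T1 = 1000, T_inf = 10, and Tp = 30 on p = 50.
sp, eff, par, slack = measures(1000, 30, 10, 50)
print(sp, eff, par, slack)     # 33.3, 0.67, 100.0, 2.0
assert sp <= 50 and sp <= par  # work law and span law both bound Sp
```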
Execution on a limited number of processors
Any computation that can run in parallel on P processors can be executed on p < P processors by letting each processor execute multiple units of work; a result called Brent's law states that such a "simulation" can be performed in time Tp bounded by[3]

Tp ≤ TP + (T1 − TP) ∕ p,

where TP is the running time on the larger number P of processors. Less precisely, Tp = O(T1 ∕ p + TP); in particular, taking P unbounded gives Tp ≤ T1 ∕ p + T∞.
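The bound can be checked empirically. The sketch below (an illustrative greedy simulation, not taken from the cited source) schedules a unit-cost DAG step by step, running at most p ready operations per step, and verifies Brent's bound for the reduction tree used earlier.

```python
# A minimal sketch (illustrative, not from the cited source): greedy
# simulation of a unit-cost computation DAG on p processors, checking
# Brent's bound Tp <= T_inf + (T1 - T_inf) / p.

def greedy_schedule(dag, p):
    """Each step, run up to p ready operations; return number of steps."""
    done, steps = set(), 0
    while len(done) < len(dag):
        ready = [n for n in dag if n not in done
                 and all(d in done for d in dag[n])]
        done.update(ready[:p])  # at most p operations per time step
        steps += 1
    return steps

# Same reduction tree as before: T1 = 7, T_inf = 3.
dag = {
    "a0": [], "a1": [], "a2": [], "a3": [],
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],
    "c0": ["b0", "b1"],
}
t1, t_inf = 7, 3
for p in (1, 2, 4):
    tp = greedy_schedule(dag, p)
    assert tp <= t_inf + (t1 - t_inf) / p  # Brent's bound holds
    print(p, tp)                           # 1->7, 2->4, 4->3 steps
```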
References
- Casanova, Henri; Legrand, Arnaud; Robert, Yves (2008). Parallel Algorithms. CRC Press. p. 10.
- Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
- Gustafson, John L. (2011). "Brent's Theorem". Encyclopedia of Parallel Computing. Springer. pp. 182–185.