Analysis of parallel algorithms

{{short description|Subfield of computer science}}
 
In [[computer science]], the '''analysis of parallel [[algorithm]]s''' is the process of finding the [[computational complexity]] of algorithms executed in [[Parallel computing|parallel]] – the amount of time, storage, or other [[System resource|resources]] needed to execute them. In many respects, analysis of parallel algorithms is similar to the [[analysis of algorithms|analysis]] of [[sequential algorithm]]s, but is generally more involved because one must reason about the behavior of multiple cooperating [[Thread (computing)|threads]] of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.
 
==Background==
 
==Definitions==
{{anchor|Overview}}
Suppose computations are executed on a machine that has {{mvar|p}} processors. Let {{mvar|T<sub>p</sub>}} denote the time that elapses between the start of the computation and its end. Analysis of the computation's [[Time complexity|running time]] focuses on the following notions, illustrated by a short example after the list:
 
* The ''work'' of a computation executed by {{mvar|p}} processors is the total number of primitive operations that the processors perform.<ref name="casanova">{{cite book |title=Parallel Algorithms |first1=Henri |last1=Casanova |first2=Arnaud |last2=Legrand |first3=Yves |last3=Robert |publisher=CRC Press |year=2008 |pages=10|citeseerx=10.1.1.466.8142 }}</ref> Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted {{math|''T''<sub>1</sub>}}.
* The ''depth'' or ''span'' is the length of the longest series of operations that have to be performed sequentially due to [[data dependency|data dependencies]] (the ''{{visible anchor|Critical path|text=critical path}}''). The depth may also be called the ''critical path length'' of the computation.<ref name="cacm">{{cite journal |first=Guy |last=Blelloch |title=Programming Parallel Algorithms |journal=[[Communications of the ACM]] |volume=39 |issue=3 |year=1996 |doi=10.1145/227234.227246 |url=https://www.cs.cmu.edu/afs/cs/Web/People/blelloch/papers/B85.pdf |pages=85–97|citeseerx=10.1.1.141.5884 |s2cid=12118850 }}</ref> Minimizing the depth/span is important in designing parallel algorithms, because the depth/span determines the shortest possible execution time.<ref name="spp">{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |pages=4–5}}</ref> Alternatively, the span can be defined as the time {{math|''T''<sub>∞</sub>}} spent computing using an idealized machine with an infinite number of processors.<ref name="clrs">{{Introduction to Algorithms|3|pages=779–784}}</ref>
* The ''cost'' of the computation is the quantity {{mvar|pT<sub>p</sub>}}. This expresses the total time spent, by all processors, in both computing and waiting.<ref name="casanova"/>
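
As an illustrative sketch of these definitions, consider summing {{math|1=''n'' = 2<sup>''k''</sup>}} numbers by pairwise (tree) reduction: the additions form a balanced binary tree, so the work is {{math|''n'' − 1}} additions and the span is {{math|log<sub>2</sub> ''n''}}, the longest chain of additions that must be performed one after another.

<syntaxhighlight lang="python">
# Illustrative sketch: work and span of a pairwise (tree) parallel sum,
# counting each addition as one primitive operation.
import math

def parallel_sum_costs(n):
    """Return (work, span) for summing n elements by pairwise reduction."""
    work = n - 1                    # T_1: total additions across all processors
    span = math.ceil(math.log2(n))  # T_inf: levels of the reduction tree (critical path)
    return work, span

print(parallel_sum_costs(1024))  # (1023, 10): work of 1023 additions, span of 10
</syntaxhighlight>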
 
Using these definitions and laws, the following measures of performance can be given (a numeric example follows the list):
 
* ''[[Speedup]]'' is the gain in speed made by parallel execution compared to sequential execution: {{math|1=''S<sub>p</sub>'' = ''T''<sub>1</sub> / ''T<sub>p</sub>''}}. When the speedup is {{math|Ω(''p'')}} for {{mvar|p}} processors (using [[big O notation]]), the speedup is linear, which is optimal in simple models of computation because the work law implies that {{math|''T''<sub>1</sub> / ''T<sub>p</sub>'' ≤ ''p''}} ([[Speedup#Super-linear speedup|super-linear speedup]] can occur in practice due to [[memory hierarchy]] effects). The situation {{math|1=''T''<sub>1</sub> / ''T<sub>p</sub>'' = ''p''}} is called perfect linear speedup.<ref name="clrs"/> An algorithm that exhibits linear speedup is said to be [[scalability|scalable]].<ref name="casanova"/> Analytical expressions for the speedup of many important parallel algorithms are available in the literature.<ref>{{Cite book |last=Kurgalin |first=Sergei |last2=Borzunov |first2=Sergei |title=The discrete math workbook: a companion manual using Python |date=2020 |publisher=Springer Nature |isbn=978-3-030-42220-2 |edition=2nd |series=Texts in Computer Science |___location=Cham, Switzerland}}</ref>
* ''Efficiency'' is the speedup per processor, {{math|''S<sub>p</sub>'' / ''p''}}.<ref name="casanova"/>
* ''Parallelism'' is the ratio {{math|''T''<sub>1</sub> / ''T''<sub>∞</sub>}}. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if {{math|''p'' > ''T''<sub>1</sub> / ''T''<sub>∞</sub>}}, then:<ref name="clrs" /> <math display="block">\frac{T_1}{T_p} \leq \frac{T_1}{T_\infty} < p.</math>
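
As a hypothetical numeric sketch of these measures, suppose a computation has {{math|1=''T''<sub>1</sub> = 1000}} and {{math|1=''T''<sub>∞</sub> = 10}}, and runs in {{math|1=''T<sub>p</sub>'' = 150}} on {{math|1=''p'' = 8}} processors. The speedup is then {{math|1000 / 150 ≈ 6.7}}, the efficiency about {{math|0.83}}, and the parallelism {{math|100}}, consistent with the work and span laws.

<syntaxhighlight lang="python">
# Hypothetical numbers, for illustration only: how speedup, efficiency and
# parallelism are computed from T_1, T_p and T_inf.
def performance_measures(t1, tp, tinf, p):
    speedup = t1 / tp          # S_p = T_1 / T_p
    efficiency = speedup / p   # S_p / p
    parallelism = t1 / tinf    # T_1 / T_inf, an upper bound on the speedup
    return speedup, efficiency, parallelism

# T_1 = 1000, T_p = 150, T_inf = 10, p = 8
print(performance_measures(1000, 150, 10, 8))  # approx. (6.67, 0.83, 100.0)
</syntaxhighlight>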