Analysis of parallel algorithms

{{short description|Subfield of computer science}}
 
In [[computer science]], the '''analysis of parallel [[algorithm]]s''' is the process of finding the [[computational complexity]] of algorithms executed in [[Parallel computing|parallel]] – the amount of time, storage, or other [[System resource|resources]] needed to execute them. In many respects, analysis of parallel algorithms is similar to the [[analysis of algorithms|analysis]] of [[sequential algorithm]]s, but is generally more involved because one must reason about the behavior of multiple cooperating [[Thread (computing)|threads]] of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.
 
==Background==
 
==Definitions==
{{anchor|Overview}}
Suppose computations are executed on a machine that has {{mvar|p}} processors. Let {{mvar|T<sub>p</sub>}} denote the time that elapses between the start of the computation and its end. Analysis of the computation's [[Time complexity|running time]] focuses on the following notions, illustrated by a short example after the list:
 
* The ''work'' of a computation executed by {{mvar|p}} processors is the total number of primitive operations that the processors perform.<ref name="casanova">{{cite book |title=Parallel Algorithms |first1=Henri |last1=Casanova |first2=Arnaud |last2=Legrand |first3=Yves |last3=Robert |publisher=CRC Press |year=2008 |pages=10|citeseerx=10.1.1.466.8142 }}</ref> Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted {{math|''T''<sub>1</sub>}}.
* The ''depth'' or ''span'' is the length of the longest series of operations that have to be performed sequentially due to [[data dependency|data dependencies]] (the ''{{visible anchor|Critical path|text=critical path}}''). The depth may also be called the ''critical path length'' of the computation.<ref name="cacm">{{cite journal |first=Guy |last=Blelloch |title=Programming Parallel Algorithms |journal=[[Communications of the ACM]] |volume=39 |issue=3 |year=1996 |doi=10.1145/227234.227246 |url=https://www.cs.cmu.edu/afs/cs/Web/People/blelloch/papers/B85.pdf |pages=85–97|citeseerx=10.1.1.141.5884 |s2cid=12118850 }}</ref> Minimizing the depth/span is important in designing parallel algorithms, because the depth/span determines the shortest possible execution time.<ref name="spp">{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |pages=4–5}}</ref> Alternatively, the span can be defined as the time {{math|''T''<sub>∞</sub>}} spent computing using an idealized machine with an infinite number of processors.<ref name="clrs">{{Introduction to Algorithms|3|pages=779–784}}</ref>
* The ''cost'' of the computation is the quantity {{mvar|pT<sub>p</sub>}}. This expresses the total time spent, by all processors, in both computing and waiting.<ref name="casanova"/>
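
As an illustrative sketch of these definitions, consider summing {{math|1=''n'' = 2<sup>''k''</sup>}} numbers by pairwise (tree) reduction: the additions form a balanced binary tree, so the work is {{math|''n'' − 1}} additions and the span is {{math|log<sub>2</sub> ''n''}}, the longest chain of additions that must be performed one after another.

<syntaxhighlight lang="python">
# Illustrative sketch: work and span of a pairwise (tree) parallel sum,
# counting each addition as one primitive operation.
import math

def parallel_sum_costs(n):
    """Return (work, span) for summing n elements by pairwise reduction."""
    work = n - 1                    # T_1: total additions across all processors
    span = math.ceil(math.log2(n))  # T_inf: levels of the reduction tree (critical path)
    return work, span

print(parallel_sum_costs(1024))  # (1023, 10): work of 1023 additions, span of 10
</syntaxhighlight>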
 
Using these definitions and laws, the following measures of performance can be given (a numeric example follows the list):
 
* ''[[Speedup]]'' is the gain in speed made by parallel execution compared to sequential execution: {{math|1=''S<sub>p</sub>'' = ''T''<sub>1</sub> / ''T<sub>p</sub>''}}. When the speedup is {{math|Ω(''p'')}} for {{mvar|p}} processors (using [[big O notation]]), the speedup is linear, which is optimal in simple models of computation because the work law implies that {{math|''T''<sub>1</sub> / ''T<sub>p</sub>'' ≤ ''p''}} ([[Speedup#Super-linear speedup|super-linear speedup]] can occur in practice due to [[memory hierarchy]] effects). The situation {{math|1=''T''<sub>1</sub> / ''T<sub>p</sub>'' = ''p''}} is called perfect linear speedup.<ref name="clrs"/> An algorithm that exhibits linear speedup is said to be [[scalability|scalable]].<ref name="casanova"/> Analytical expressions for the speedup of many important parallel algorithms are available in the literature.<ref>{{Cite book |last=Kurgalin |first=Sergei |last2=Borzunov |first2=Sergei |title=The discrete math workbook: a companion manual using Python |date=2020 |publisher=Springer Nature |isbn=978-3-030-42220-2 |edition=2nd |series=Texts in Computer Science |___location=Cham, Switzerland}}</ref>
* ''Efficiency'' is the speedup per processor, {{math|''S<sub>p</sub>'' / ''p''}}.<ref name="casanova"/>
* ''Parallelism'' is the ratio {{math|''T''<sub>1</sub> / ''T''<sub>∞</sub>}}. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if {{math|''p'' > ''T''<sub>1</sub> / ''T''<sub>∞</sub>}}, then:<ref name="clrs" /> <math display="block">\frac{T_1}{T_p} \leq \frac{T_1}{T_\infty} < p.</math>
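
As a hypothetical numeric sketch of these measures, suppose a computation has {{math|1=''T''<sub>1</sub> = 1000}} and {{math|1=''T''<sub>∞</sub> = 10}}, and runs in {{math|1=''T<sub>p</sub>'' = 150}} on {{math|1=''p'' = 8}} processors. The speedup is then {{math|1000 / 150 ≈ 6.7}}, the efficiency about {{math|0.83}}, and the parallelism {{math|100}}, consistent with the work and span laws.

<syntaxhighlight lang="python">
# Hypothetical numbers, for illustration only: how speedup, efficiency and
# parallelism are computed from T_1, T_p and T_inf.
def performance_measures(t1, tp, tinf, p):
    speedup = t1 / tp          # S_p = T_1 / T_p
    efficiency = speedup / p   # S_p / p
    parallelism = t1 / tinf    # T_1 / T_inf, an upper bound on the speedup
    return speedup, efficiency, parallelism

# T_1 = 1000, T_p = 150, T_inf = 10, p = 8
print(performance_measures(1000, 150, 10, 8))  # approx. (6.67, 0.83, 100.0)
</syntaxhighlight>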