{{Multiple issues|
{{Orphan|date=February 2019}}
{{no footnotes|date=March 2018}}
}}
A central problem in [[graph theory]] is the [[shortest path problem]]. The problem of finding the shortest path between every pair of nodes is known as the '''all-pairs shortest paths (APSP)''' problem. As [[sequential algorithm]]s for this problem often yield long runtimes, parallelization has proven to be beneficial in this field. In this article two efficient parallel algorithms solving this problem are introduced.
A trivial parallelization can be obtained by parallelizing the loop of <code>DijkstraAPSP</code> in line ''8''.
However, when the sequential <code>DijkstraSSSP</code> is used, the number of processors that can be employed is limited by the number of iterations executed in the loop.
Therefore, <math>|V|</math> is an upper bound on the number of processors for this trivial parallelization.
For example, let the number of processors <math>p</math> be equal to the number of nodes <math>|V|</math>. This results in each processor executing <code>DijkstraSSSP</code> exactly once in parallel.
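The following Python sketch illustrates this trivial parallelization under these assumptions; the names (<code>dijkstra_sssp</code>, <code>adj</code>) are chosen for illustration only, and each worker process runs the sequential single-source computation for one root node.

<syntaxhighlight lang="python">
from functools import partial
from multiprocessing import Pool

INF = float("inf")

def dijkstra_sssp(adj, root):
    """Sequential Dijkstra from one root; adj is an adjacency matrix with INF for missing edges."""
    n = len(adj)
    dist = [INF] * n
    dist[root] = 0
    finished = [False] * n
    for _ in range(n):
        # pick the unfinished node with the smallest tentative distance
        x = min((v for v in range(n) if not finished[v]), key=lambda v: dist[v])
        finished[x] = True
        for v in range(n):  # relax all outgoing edges of x
            if dist[x] + adj[x][v] < dist[v]:
                dist[v] = dist[x] + adj[x][v]
    return dist

if __name__ == "__main__":
    adj = [
        [0,   3,   INF, 7],
        [3,   0,   2,   INF],
        [INF, 2,   0,   1],
        [7,   INF, 1,   0],
    ]
    # One SSSP task per root node; with p = |V| workers all tasks run in parallel.
    with Pool(processes=len(adj)) as pool:
        distance_matrix = pool.map(partial(dijkstra_sssp, adj), range(len(adj)))
    for row in distance_matrix:
        print(row)
</syntaxhighlight>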
In total this yields a runtime of <math>O(|V|^2 \cdot \frac{|V|}{p})</math>, when <math>|V|</math> is a multiple of <math>p</math>.
Consequently, the efficiency of this parallelization is perfect: employing <math>p</math> processors reduces the runtime by a factor of <math>p</math>.
Another benefit of this parallelization is that no communication between the processors is required. However, every processor must have enough local memory to store the entire adjacency matrix of the graph.
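As a short worked check of this claim (treating the cost of a single <code>DijkstraSSSP</code> run as identical in the sequential and the parallel case), the speedup and efficiency are
:<math>S_p = \frac{T_\text{seq}}{T_p} \approx \frac{|V| \cdot |V|^2}{|V|^2 \cdot \frac{|V|}{p}} = p, \qquad E_p = \frac{S_p}{p} = 1.</math>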
With this definition each partition has a size of <math>k=\frac{p}{|V|}</math> processors. The partitions can perform their computations in parallel as the results of each are independent of each other. Therefore, the parallelization presented in the previous section corresponds to a partition size of 1 with <math>p=|V|</math> processors.
The main difficulty is the parallel execution of <code>DijkstraSSSP</code> by multiple processors for a single root node. The idea for this parallelization is to distribute the management of the distance list <math>d_v</math> in DijkstraSSSP within the partition. Each processor in the partition is therefore exclusively responsible for <math>\frac{|V|}{k}</math> elements of <math>d_v</math>. For example, consider <math>|V|=4</math> and <math>p=8</math>: this yields a partition size of <math>k=2</math>. In this case, the first processor of each partition is responsible for <math>d_{v,1}</math>, <math>d_{v,2}</math> and the second processor is responsible for <math>d_{v,3}</math> and <math>d_{v,4}</math>. The total distance list is then <math>d_v = [d_{v,1},d_{v,2},d_{v,3},d_{v,4}]</math>.
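A minimal sketch of this ownership scheme, assuming the elements of <math>d_v</math> are assigned in contiguous blocks (the helper name <code>owned_indices</code> is hypothetical):

<syntaxhighlight lang="python">
def owned_indices(local_rank, num_nodes, partition_size):
    """Contiguous block of d_v owned by one processor of a partition (block size |V|/k)."""
    block = num_nodes // partition_size
    return range(local_rank * block, (local_rank + 1) * block)

# Example from the text: |V| = 4 nodes, p = 8 processors, partition size k = 2.
for local_rank in range(2):
    print(local_rank, list(owned_indices(local_rank, 4, 2)))
# prints: 0 [0, 1]   and   1 [2, 3]
</syntaxhighlight>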
The <code>DijkstraSSSP</code> algorithm mainly consists of the repetition of two steps: First, the nearest node <math>x</math> in the distance list <math>d_v</math> has to be found. For this node the shortest path has already been found.
# Find the node <math>x</math> with the shortest distance in <math>d_v</math>.
#* Each processor owns a part of <math>d_v</math>: Each processor scans for the local minimum <math>\tilde{x}</math> in its part, for example using linear search.
#* Compute the global minimum <math>x</math> in <math>d_v</math> by performing a reduce-operation over the local minima <math>\tilde{x}</math> (see the sketch after this list).
#* [[Broadcast]] the global minimum <math>x</math> to all processors in the partition.
# Adjust the distance of all neighbors of <math>x</math> in <math>d_v</math>.
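The first step of this loop body could look as follows in an [[Message Passing Interface|MPI]]-style sketch written with the Python package <code>mpi4py</code>; this is only an illustration under the assumption that the communicator consists of exactly the <math>k</math> processors of one partition, and the remaining Dijkstra bookkeeping is omitted.

<syntaxhighlight lang="python">
from mpi4py import MPI

comm = MPI.COMM_WORLD  # assumed to contain exactly the k processors of one partition
rank = comm.Get_rank()

INF = float("inf")

def find_global_minimum(local_dist, local_nodes):
    """local_dist[i] is the tentative distance of node local_nodes[i], i.e. the owned part of d_v."""
    # Local minimum via linear search over the owned part of d_v.
    local_min = min(zip(local_dist, local_nodes), default=(INF, None))
    # Reduce-operation over all local minima; (distance, node) pairs compare by distance first.
    global_min = comm.reduce(local_min, op=MPI.MIN, root=0)
    # Broadcast the global minimum back to every processor of the partition.
    return comm.bcast(global_min, root=0)

if __name__ == "__main__":
    # Toy data: this processor owns two entries of d_v.
    dist, node = find_global_minimum([4.0 + rank, 2.0 + rank], [2 * rank, 2 * rank + 1])
    print(f"rank {rank}: nearest node {node} at distance {dist}")
</syntaxhighlight>

Run for example with <code>mpiexec -n 2 python find_minimum.py</code>; every processor of the partition then prints the same globally nearest node.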
For the second iteration the subtasks executed are shown explicitly in the image:
# Computation of the local minimum node in <math>d_A</math>
# Computation of the global minimum node in <math>d_A</math> through a reduce-operation
# Broadcast of the global minimum node in <math>d_A</math>
# Marking of the global nearest node as "finished" and adjusting the distance of its neighbors
== Floyd algorithm ==
The Floyd algorithm solves the all-pairs shortest paths problem for directed graphs. With the [[adjacency matrix]] of a graph as input, it calculates shorter paths iteratively. After |''V''| iterations the distance matrix contains all the shortest paths. The following describes a sequential version of the algorithm in pseudo code:
1 '''func''' Floyd_All_Pairs_SP(''A'') {
[[File:2d block-mapping.png|thumb|300x300px|partition of a matrix with 2-D block mapping]]
Where ''A'' is the [[adjacency matrix]], ''n'' = |''V''| the number of nodes and ''D'' the distance matrix. For a more detailed description of the sequential algorithm, look up the [[Floyd–Warshall algorithm]].
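A runnable Python sketch of this sequential algorithm is shown below; it follows the same triple-loop structure, with ''A'' given as an adjacency matrix that contains <code>inf</code> where no edge exists (the function name and example graph are illustrative only).

<syntaxhighlight lang="python">
from math import inf

def floyd_all_pairs_sp(A):
    """Sequential Floyd algorithm: A is the adjacency matrix, the returned D is the distance matrix."""
    n = len(A)
    D = [row[:] for row in A]      # D is initialised with the adjacency matrix
    for k in range(n):             # allow node k as an additional intermediate node
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D

# Example: a small directed graph on three nodes.
A = [[0, 4, inf],
     [inf, 0, 1],
     [2, inf, 0]]
print(floyd_all_pairs_sp(A))   # [[0, 4, 5], [3, 0, 1], [2, 6, 0]]
</syntaxhighlight>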
===Parallelization===
As there is no additional computation in the algorithm and the computation is split equally among the ''p'' processes, we have a runtime of <math>O(n^3 / p)</math> for the computational part.
In each iteration of the algorithm there is a one-to-all broadcast operation performed along the row and column of the processes. There are <math> n / \sqrt p </math> elements broadcast. Afterwards there is a synchronisation step performed. How much time these operations take is highly dependent on the architecture of the parallel system used. Therefore, the time needed for communication and data transfer in the algorithm is <math>T_\text{comm} = n (T_\text{synch} + T_\text{broadcast})</math>.
For the whole algorithm we have the following runtime:
For the runtime of the data transfer between the processes in the pipelined version of the algorithm we assume that a process can transfer ''k'' elements to a neighbouring process in <math>O(k)</math> time. In every step there are <math>n / \sqrt p </math> elements of a row or a column sent to a neighbouring process. Such a step takes <math>O(n / \sqrt p )</math> time. After <math>\sqrt p </math> steps the relevant data of the first row and column arrive at process <math> p_{\sqrt p ,\sqrt p}</math> (in <math>O(n)</math> time).
The values of successive rows and columns follow after time <math>O(n^2 / p)</math> in a pipelined mode. Process <math>p_{\sqrt p ,\sqrt p}</math> finishes its last computation after <math>O(n^3 / p) + O(n)</math> time. Therefore, the additional time needed for communication in the pipelined version is <math>O(n)</math>.
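The <math>O(n)</math> arrival time stated above follows directly from multiplying the number of pipeline steps by the cost per step:
:<math>\sqrt p \cdot O\!\left(\frac{n}{\sqrt p}\right) = O(n).</math>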
The overall runtime for the pipelined version of the algorithm is: