{{Multiple issues|
{{Orphan|date=February 2019}}
{{no footnotes|date=March 2018}}
{{how-to|date=May 2019}}
}}
A central problem in algorithmic [[graph theory]] is the [[shortest path problem]]. The problem of finding the shortest path between every pair of nodes is known as the '''all-pairs-shortest-paths (APSP)''' problem. As [[sequential algorithm]]s for this problem often yield long runtimes, parallelization has proven beneficial in this field. In this article two efficient algorithms solving this problem are introduced.
 
Another variation of the problem is the single-source-shortest-paths (SSSP) problem, which also has parallel approaches: [[Parallel single-source shortest path algorithm]].
 
== Problem definition ==
Let <math>G=(V,E,w)</math> be a directed graph with the set of nodes <math>V</math> and the set of edges <math>E\subseteq V\times V</math>. Each edge <math>e \in E</math> has a weight <math>w(e)</math> assigned. The goal of the all-pairs-shortest-paths problem is to find the shortest path between '''all''' pairs of nodes of the graph. For these paths to be unique it is required that the graph does not contain cycles with negative weight.
 
In the remainder of the article it is assumed that the graph is represented using an [[adjacency matrix]]. We expect the output of the algorithm to be a distance matrix <math>D</math>. In <math>D</math>, every entry <math>d_{i,j}</math> is the weight of the shortest path in <math>G</math> from node <math>i</math> to node <math>j</math>.
 
The [[Floyd–Warshall algorithm]] presented later can handle negative edge weights, whereas [[Dijkstra's algorithm]] requires all edges to have a non-negative weight.
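
For illustration, a minimal Python sketch of this representation (the four-node example graph and its edge weights are invented for this illustration; a missing edge is encoded as infinity):

 # Example input: a directed graph with four nodes 0..3 as an adjacency matrix.
 # A missing edge is encoded as infinity, the diagonal is 0.
 INF = float("inf")
 A = [
     [0,   3,   INF, 7],    # edges leaving node 0
     [8,   0,   2,   INF],  # edges leaving node 1
     [5,   INF, 0,   1],    # edges leaving node 2
     [2,   INF, INF, 0],    # edges leaving node 3
 ]
 # The algorithms below fill a distance matrix D of the same shape,
 # where D[i][j] is the weight of the shortest path from node i to node j.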

== Dijkstra algorithm ==

The [[Dijkstra algorithm]] solves the single-source-shortest-paths problem. Executing it once with every node of the graph as the root node yields a solver for the all-pairs-shortest-paths problem, as in the following pseudocode:

 1 '''func''' DijkstraSSSP(''G'', ''v'') {
 2     ... ''// standard Dijkstra SSSP implementation''
 3     '''return''' <math>d_v</math>
 4 }
 5
 6 '''func''' DijkstraAPSP(''G'') {
 7     ''D'' := new |''V''| <math>\times</math> |''V''| matrix
 8     '''for''' ''v'' '''from''' 1 '''to''' |''V''| {
 9         ''//'' <math>d_v</math> ''is stored as the v-th row of D''
 10        ''D''[''v''] := DijkstraSSSP(''G'', ''v'')
 11    }
 12 }
 
In this example we assume that <code>DijkstraSSSP</code> takes the graph <math>G</math> and the root node <math>v</math> as input.
The result of the execution in turn is the distance list <math>d_v</math>. In <math>d_v</math>, the <math>i</math>-th element stores the distance from the root node <math>v</math> to the node <math>i</math>.
Therefore the list <math>d_v</math> corresponds exactly to the <math>v</math>-th row of the APSP distance matrix <math>D</math>.
For this reason, <code>DijkstraAPSP</code> iterates over all nodes of the graph <math>G</math> and executes <code>DijkstraSSSP</code> with each of them as root node, storing the results in <math>D</math>.
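
A runnable Python sketch of this scheme, using the adjacency-matrix representation introduced above and a binary heap for the SSSP step (the function names mirror the pseudocode; everything else is illustrative):

 import heapq
 
 def dijkstra_sssp(A, v):
     # Single-source shortest paths from root node v on adjacency matrix A.
     n = len(A)
     INF = float("inf")
     d = [INF] * n
     d[v] = 0
     heap = [(0, v)]
     while heap:
         dist, u = heapq.heappop(heap)
         if dist > d[u]:
             continue  # stale heap entry
         for w in range(n):
             if A[u][w] != INF and dist + A[u][w] < d[w]:
                 d[w] = dist + A[u][w]
                 heapq.heappush(heap, (d[w], w))
     return d  # d[i] is the distance from v to i
 
 def dijkstra_apsp(A):
     # Row v of the distance matrix D is the SSSP result for root v.
     return [dijkstra_sssp(A, v) for v in range(len(A))]

Calling <code>dijkstra_apsp(A)</code> on the example matrix from above returns the complete distance matrix <math>D</math>.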
 
The runtime of <code>DijkstraSSSP</code> is <math>O(|V|^2)</math> as we expect the graph to be represented using an [[adjacency matrix]].
Therefore <code>DijkstraAPSP</code> has a total sequential runtime of <math>O(|V|^3)</math>.
 
===Parallelization for up to |''V''| processors===
#* Each processor owns a part of <math>d_v</math>: Each processor scans for the local minimum <math>\tilde{x}</math> in its part, for example using linear search.
#* Compute the global minimum <math>x</math> in <math>d_v</math> by performing a [[reduce-operation]] across all <math>\tilde{x}</math>.
#* [[Broadcast]] the global minimum <math>x</math> to all nodes in the partition.
# Adjust the distance of all neighbors of <math>x</math> in <math>d_v</math>
#* Every processor now knows the global nearest node <math>x</math> and its distance. Based on this information, adjust the neighbors of <math>x</math> in <math>d_v</math> which are managed by the corresponding processor (see the sketch below).
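
The following Python sketch with mpi4py illustrates the search for the global nearest node (it assumes MPI and mpi4py are available, that each rank holds a contiguous block of <math>d_v</math> starting at the global index <code>offset</code>, and that <code>finished</code> marks the locally owned nodes that are already settled; these names are illustrative):

 from mpi4py import MPI
 
 def find_global_nearest(local_d, finished, offset, comm=MPI.COMM_WORLD):
     INF = float("inf")
     # Local minimum over the block owned by this rank (linear search).
     local_best = (INF, -1)
     for i, dist in enumerate(local_d):
         if not finished[i] and dist < local_best[0]:
             local_best = (dist, offset + i)  # store the global node index
     # Reduce and broadcast in one call: the minimum over (distance, node)
     # pairs is taken lexicographically, and every rank receives the result.
     return comm.allreduce(local_best, op=MPI.MIN)

Every rank then adjusts the neighbors of the returned node that lie in its own part of <math>d_v</math>.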
 
The top row in the image corresponds to <math>d_A</math> after the initialization, the bottom one to <math>d_A</math> after the termination of the algorithm.
The nodes are distributed in a way that '''p1''' is responsible for the nodes '''A''' and '''B''', while '''p2''' is responsible for '''C''' and '''D'''.
The distance list <math>d_A</math> is distributed accordingly.
For the second iteration the subtasks executed are shown explicitly in the image:
# Computation of the local minimum distances in the part of <math>d_A</math> owned by each processor
# Computation of the global minimum distance in <math>d_A</math> by performing a reduce operation and broadcasting the result
# Marking of the global nearest node as "finished" and adjusting the distance of its neighbors
 
== Floyd–Warshall algorithm ==
 
The [[Floyd–Warshall algorithm]] solves the all-pairs-shortest-paths problem for directed graphs. With the [[adjacency matrix]] of a graph as input, it calculates shorter paths iteratively. After |''V''| iterations the distance matrix contains all the shortest paths. The following describes a sequential version of the algorithm in pseudocode:
 
 1 '''func''' Floyd_All_Pairs_SP(''A'') {
 2     <math>D^{(0)} = A</math>
 3     '''for''' ''k'' := 1 '''to''' ''n'' '''do'''
 4         '''for''' ''i'' := 1 '''to''' ''n'' '''do'''
 5             '''for''' ''j'' := 1 '''to''' ''n'' '''do'''
 6                 <math>d^{(k)}_{i,j} := \min(d^{(k-1)}_{i,j}, d^{(k-1)}_{i,k} + d^{(k-1)}_{k,j})</math>
 7 }
[[File:2d block-mapping.png|thumb|300x300px|partition of a matrix with 2-D block mapping]]
 
In the pseudocode above, ''A'' is the [[adjacency matrix]], ''n'' = |''V''| the number of nodes and ''D'' the distance matrix. For a more detailed description of the sequential algorithm, see [[Floyd–Warshall algorithm]].
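
The same algorithm as a short, runnable Python function operating on the adjacency-matrix representation used above; the innermost update corresponds to line 6 of the pseudocode:

 def floyd_all_pairs_sp(A):
     n = len(A)
     D = [row[:] for row in A]  # D^(0) = A, computed on a copy
     for k in range(n):
         for i in range(n):
             for j in range(n):
                 # line 6: is the path via node k shorter?
                 if D[i][k] + D[k][j] < D[i][j]:
                     D[i][j] = D[i][k] + D[k][j]
     return D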
 
===Parallelization===
 
The basic idea to parallelize the algorithm is to partition the matrix and split the computation between the processes. Each process is assigned to a specific part of the matrix. A common way to achieve this is '''2-D Block Mapping'''. Here the matrix is partitioned into squares of the same size and each square gets assigned to a process. For an <math>n \times n</math>-matrix and ''p'' processes each process calculates a <math>n/ \sqrt p \times n/ \sqrt p</math> sized part of the distance matrix. For <math> p = n^2 </math> processes each would get assigned to exactly one element of the matrix. Because of that the parallelization only scales to a maximum of <math>n^2</math> processes. In the following we denote by <math>p_{i,j}</math> the process that is assigned to the square in the ''i''-th row and the ''j''-th column.
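
As a small illustration, the following Python helper computes which process owns a given matrix element under 2-D block mapping (assuming, as is usual for this scheme, that ''p'' is a perfect square and that <math>\sqrt p</math> divides ''n''):

 import math
 
 def block_owner(i, j, n, p):
     # Process grid is q x q with q = sqrt(p); each block has size (n/q) x (n/q).
     q = math.isqrt(p)
     b = n // q
     return (i // b, j // b)  # indices (r, c) of the owning process p_{r,c}
 
 # Example: an 8x8 matrix on p = 16 processes gives a 4x4 process grid with
 # 2x2 blocks; element (5, 2) belongs to process p_{2,1}.
 print(block_owner(5, 2, n=8, p=16))  # prints (2, 1)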
 
As the calculation of the parts of the distance matrix is dependent on results from other parts, the processes have to communicate with each other and exchange data. In the following we denote by <math>d^{(k)}_{i,j}</math> the element in the ''i''-th row and ''j''-th column of the distance matrix after the ''k''-th iteration. To calculate <math>d^{(k)}_{i,j}</math> we need the elements <math>d^{(k-1)}_{i,j}</math>, <math>d^{(k-1)}_{i,k}</math> and <math>d^{(k-1)}_{k,j}</math>, as specified in line 6 of the algorithm. <math>d^{(k-1)}_{i,j}</math> is available to each process, as it was calculated by the process itself in the previous iteration.
 
Additionally, each process needs a part of the ''k''-th row and the ''k''-th column of the <math>D^{(k-1)}</math> matrix. The element <math>d^{(k-1)}_{i,k}</math> is held by a process in the same row, and the element <math>d^{(k-1)}_{k,j}</math> is held by a process in the same column as the process that wants to compute <math>d^{(k)}_{i,j}</math>. Each process that has calculated a part of the ''k''-th row of the <math>D^{(k-1)}</math> matrix has to send this part to all processes in its column. Each process that has calculated a part of the ''k''-th column of the <math>D^{(k-1)}</math> matrix has to send this part to all processes in its row. All these processes have to perform a one-to-all broadcast operation along the row or the column. The data dependencies are illustrated in the image below.
 
1 '''func''' Floyd_All_Pairs_Parallel(<math>D^{(0)}</math>) {
2 '''for''' ''k'' := 1 '''to''' ''n'' '''do''' {
3 Each process <math>p_{i,j}</math> that has a segment of the k-th row of <math>D^{(k-1)}</math>,
broadcasts it to the <math>p_{*,j}</math> processes;
4 Each process <math>p_{i,j}</math> that has a segment of the k-th column of <math>D^{(k-1)}</math>,
broadcasts it to the <math>p_{i,*}</math> processes;
5 Each process waits to receive the needed segments;
6 Each process computes its part of the <math>D^{(k)}</math> matrix;
7 }
8 }
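
For concreteness, the following mpi4py sketch shows the same broadcast-then-update pattern, simplified to a 1-D row-block mapping in which each rank owns a contiguous block of rows; the 2-D block mapping follows the same pattern with broadcasts along process rows ''and'' columns. mpi4py and a number of rows divisible by the number of ranks are assumed:

 from mpi4py import MPI
 
 def parallel_floyd_rows(local_rows, comm=MPI.COMM_WORLD):
     # local_rows: this rank's contiguous block of rows of D^(0).
     rank, size = comm.Get_rank(), comm.Get_size()
     b = len(local_rows)      # rows per rank
     n = b * size
     for k in range(n):
         owner = k // b       # rank that owns global row k
         row_k = list(local_rows[k - owner * b]) if rank == owner else None
         row_k = comm.bcast(row_k, root=owner)   # analogue of lines 3-5
         for i in range(b):                      # analogue of line 6
             d_ik = local_rows[i][k]
             for j in range(n):
                 if d_ik + row_k[j] < local_rows[i][j]:
                     local_rows[i][j] = d_ik + row_k[j]
     return local_rows

Each rank initially receives its block of rows of <math>D^{(0)}</math> (for example via <code>comm.scatter</code>) and afterwards holds the corresponding rows of the final distance matrix.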
 
[[File:Data-denpendencies-floyd.png|thumb|400x400px|data dependencies in Floyd algorithm]]
===Runtime===
 
The runtime of the sequential algorithm is determined by the triply nested for loop. The computation in line 6 can be done in constant time (<math>O(1)</math>). Therefore, the runtime of the sequential algorithm is <math>O(n^3)</math>.
 
====2-D block mapping====
 
==Bibliography==
* Grama, A.: ''Introduction to [[parallel computing]].'' Pearson Education, 2003.
* {{citation
| last1 = Kumar | first1 = Vipin
| last2 = Singh | first2 = Vineet
| date = October 1991
| doi = 10.1016/0743-7315(91)90083-l
| issue = 2
| journal = Journal of Parallel and Distributed Computing
| pages = 124–138
| title = Scalability of parallel algorithms for the all-pairs shortest-path problem
| url = https://www-users.cse.umn.edu/~kumar001/papers/shortest-path.ps
| volume = 13}}
* Foster, I.: ''Designing and Building Parallel Programs'' (Online).
* Bindel, D.: ''[https://www.cs.cornell.edu/~bindel/class/cs5220-f11/code/path.pdf Parallel All-Pairs Shortest Paths].'' Applications of Parallel Computers, Fall 2011.
 
[[Category:Graph algorithms]]