Distributed minimum spanning tree: Difference between revisions

Content deleted Content added
No edit summary
No edit summary
Line 1:
The '''distributed minimum spanning tree (MST)''' problem involves the construction of a [[minimum spanning tree]] by a [[distributed algorithm]], in a network where nodes communicate by message passing. It is radically different from the classical sequential problem, although the most basic approach resembles [[Borůvka's algorithm]]. Distributed minimumOne spanningimportant treeapplication of this problem is mainlyto find a tree that can be used for [[Broadcasting_(computing)|broadcastbroadcasting]]. In particular, if the cost for a message to pass through an edge in a graph is significant, a MST can minimize the total cost for a source process to communicate with all the other processes in the network.
 
The problem was first suggested and solved in <math>O(V \log V)</math> time in 1983 by Gallager et. al.,<ref name="GHS" /> where <math>V</math> is the number of vertices in the [[graph theory|graph]]. Later, the solution was improved to <math>O(V)</math><ref>[[Baruch Awerbuch]]. “Optimal Distributed Algorithms for Minimum Weight Spanning Tree, Counting, Leader Election, and Related Problems,” ''Proceedings of the 19th ACM [[Symposium on Theory of Computing]] (STOC)'', New York City, New York, May 1987.
</ref> and finally<ref>Juan Garay, Shay Kutten and [[David_Peleg_(scientist)| David Peleg]], “A Sub-Linear Time Distributed Algorithm for Minimum-Weight Spanning Trees (Extended Abstract),” ''Proceedings of the IEEE [[Symposium on Foundations of Computer Science]]'' (FOCS), 1993.</ref> <ref>Shay Kutten and [[David_Peleg_(scientist)| David Peleg]], “Fast Distributed Construction of Smallk-Dominating Sets and Applications,” ''Journal of Algorithms'', Volume 28, Issue 1, July 1998, Pages 40-66.</ref>
<math>O(\sqrt V \log^* V + D)</math> where ''D'' is the network, or graph diameter. Lower A lower bound on the time complexity of the solution has been eventually shown to be<ref>[[David_Peleg_(scientist)| David Peleg]] and Vitaly Rubinovich “A near tight lower bound on the time complexity of Distributed Minimum Spanning Tree Construction,“ ''[[SIAM Journal on Computing]]'', 2000, and ''IEEE Symposium on Foundations of Computer Science (FOCS)'', 1999.</ref>
<math>\Omega\left({\frac{\sqrt V}{\log V}}\right).</math>
 
Line 15:
 
== MST in message-passing model ==
[[Message_passing|Message-passing]] model is one of the common used models in [[distributed computing]]. In this model, each process is modeled as a node of a graph. The communication channel between two processes is an edge of the graph.
 
The [[Message_passing|Messagemessage-passing]] model is one of the commonmost commonly used models in [[distributed computing]]. In this model, each process is modeled as a node of a graph. The communication channel between two processes is an edge of the graph.
Two common used minimum spanning tree algorithms are [[Prim's_algorithm|Prim’s algorithm]] and [[Kruskal's_algorithm|Kruskal’s algorithm]]. However, it is difficult to apply these two algorithms in the message-passing model. The mainly challenges are:
* Both [[Prim's_algorithm|Prim’s algorithm]] and [[Kruskal's_algorithm|Kruskal’s algorithm]] executes sequentially. Therefore, when a process A is executing, other processes have to wait, which obviously degrades the performance.
* Both [[Prim's_algorithm|Prim’s algorithm]] and [[Kruskal's_algorithm|Kruskal’s algorithm]] require processes to know the state of the whole graph, which is very difficult to discover in message-passing model.
 
Two commoncommonly used algorithms for the classical minimum spanning tree algorithmsproblem are [[Prim's_algorithm|Prim’s algorithm]] and [[Kruskal's_algorithm|Kruskal’s algorithm]]. However, it is difficult to apply these two algorithms in the distributed message-passing model. The mainlymain challenges are:
Due to those difficulties, traditional MST algorithms are not suitable in message-passing model. Therefore, new category of MST algorithms was introduced for constructing a MST in message-passing model.
* Both [[Prim's_algorithm|Prim’s algorithm]] and [[Kruskal's_algorithm|Kruskal’s algorithm]] require processing one node or vertex at a time, making it difficult to make them run in parallel. (For example, Kruskal's algorithm processes edges in turn, deciding whether to include the edge in the MST based on whether it would form a cycle with all previously chosen edges.)
* Both [[Prim's_algorithm|Prim’s algorithm]] and [[Kruskal's_algorithm|Kruskal’s algorithm]] executesrequire sequentially.processes Therefore,to whenknow athe processstate Aof isthe executingwhole graph, otherwhich processesis havevery difficult to wait,discover whichin obviouslythe degrades themessage-passing performancemodel.
 
Due to these difficulties, new techniques were needed for distributed MST algorithms in the message-passing model. Some bear similarities to [[Borůvka's algorithm]] for the classical MST problem.
 
== GHS algorithm ==
GHS algorithm is one of the best-known algorithms in distributed computing theory. This algorithm can construct the MST in [[Message_passing|Message-passing]] model.
 
The GHS algorithm of [[Robert G. Gallager|Gallager]], Humblet and Spira is one of the best-known algorithms in distributed computing theory. This algorithm can construct the MST in [[Message_passing|Message-passing]] model.
=== Precondition ===
 
* The algorithm should run on connected undirected graph.<ref name="GHS" />
*=== The graph should have distinct finite weight assigned to each edge.Preconditions<ref name="GHS" /> ===
* The algorithm should run on a connected undirected graph.<ref name="GHS" />
* Each node initially knows the weight for each edge incident to that node.<ref name="GHS" />
* The graph should have distinct finite weights assigned to each edge. (This assumption can be removed by breaking ties between edge weights in a consistent way.)
* Initially, each node is in a quiescent state and it either spontaneously awakens or awakened by receipt of any message from another node.<ref name="GHS" />
* Each node initially knows the weight for each edge incident to that node.<ref name="GHS" />
* Messages can be transmitted independently in both directions on an edge and arrive after an unpredictable but finite delay, without error.<ref name="GHS" />
* Initially, each node is in a quiescent state and it either spontaneously awakens or is awakened by receipt of any message from another node.<ref name="GHS" />
* Each edge satisfies [[FIFO]] property.<ref name="GHS" />
* Messages can be transmitted independently in both directions on an edge and arrive after an unpredictable but finite delay, without error.<ref name="GHS" />
* Each edge delivers messages in [[FIFO]] order.
 
=== Properties of MST ===
Let's define fragment of a MST G as a sub-tree of G, that is, a connected set of nodes and edges of G. There are two properties of MST:
# Given a fragment of a minimum spanning tree G, let e be a minimum-weight outgoing edge of the fragment. Then joining e and its adjacent non-fragment node to the fragment yields another fragment of G.<ref name="GHS" />
# If all the edges of a connected graph have different weights, then the MST is unique.<ref name="GHS" />
 
Let's defineDefine fragment of a MST GT asto be a sub-tree of GT, that is, a connected set of nodes and edges of GT. There are two properties of MSTMSTs:
# Given a fragment of a minimum spanning treeMST GT, let e be a minimum-weight outgoing edge of the fragment. Then joining e and its adjacent non-fragment node to the fragment yields another fragment of GT.<ref name="GHS" />
# If all the edges of a connected graph have different weights, then the MST of the graph is unique.<ref name="GHS" />
 
These two properties form the basicbasis for proving correctness of the GHS algorithm. In general, the GHS algorithm is a buttonbottom-up algorithm in the sense that it starts by letting each individual node be a fragment and joining fragments in a certain way to form new fragments. This fragments-process of joining processfragments repeats until there is only one fragment left and property 1 and 2 imply the resultresulting fragment is ana MST.
 
=== Description of the algorithm ===
The GHS algorithm assigns a ''level'' ofto each fragment, which is a non-decreasing integer with initial value 0. Each non-zero level fragment has an ''ID', which is the ID of the core edge in the fragment., The core edge of each fragmentwhich is selected when the fragment is constructed. During the execution of the algorithm, each node can classify each of its incident edgeedges into three categories:<ref name="GHS" /><ref name="Lynch" />
* '''Branch''': Edgesedges are those that have already been determined to be part of the minimum spanning treeMST.
* '''Rejected''': Edgesedges are those that have already been determined not to be part of the minimum spanning treeMST.
* '''Basic''': Edgesedges are neither of branch edgeedges andnor rejected edgeedges.
 
For level -0 fragments, each awakenawakened node will do the followingsfollowing:
 
# ChoosesChoose its minimum-weight incident edge and marks that edge as a branch edge.
For level 0 fragments, each awaken node will do the followings:
# Send a message via the branch edge to notify the node on the other side.
# Chooses its minimum-weight incident edge and marks that edge as branch
# SendsWait for a message viafrom the branchother edgeend to notifyof the node on the other sideedge.
# Waiting for message from the other end of the fragment.
The edge chosen by both nodes it connects becomes the core with level 1.
 
For a non-zero level fragment, an execution of the algorithm can be separated into three stages in each level:
 
For non-zero level fragment, an execution of the algorithm can be separated into three stages in each level:
==== Broadcast ====
The two nodes adjacent to the core broadcast messages to the rest of the nodes in the fragment. The messages are sent via the branch edge but not via the core. Each broadcast message contains the ID and level of the fragment. At the end of this stage, each node has received the new fragment ID and level.
==== ConvertcastConvergecast ====
In this stage, eachall nodenodes triesin tothe fragment cooperate to find the minimum weight outgoing edge of the fragment. Outgoing edges are edges connecting to other fragments. The messages sent in this stage are in the opposite direction of the broadcast stage. Initialized by all the leaves (the nodes that have only one branch edge), a message is sent through the branch edge. The message contains the minimum weight (infinity if not exist) of itsthe incident outgoing edge it found (theor infinity if no such edge was found). The way to find the minimum outgoing edge will be discussed later). For each non-leaf node, (let the number of its branch edges isbe n) after receiving n-1 convertcastconvergecast messages, it will pick up the minimum weight from the messages and compare it to the weights of its incident outgoing edges. The smallest weight will be sent toward the branch it received the broadcast from.
==== Change core ====
After the completecompletion of the previous stage, the two nodes connected by the core can acknowledgeinform each other of the best edges they received. Then they can identify the minimum outgoing edge from the entire fragment. A message will be sent from the core to the minimum outgoing edge via a setpath of branch edge(s)edges. Finally, a message will be sent out via the chosen outgoing edge to request to combine thosethe two fragments that the edge connects. Depending on the levels of those two fragments, one of 2two combined operations couldare performperformed to form a new fragment (details discussed below).
 
==== How to find minimum weight incident outgoing edge? ====
As discussed above, every node needs to find its minimum weight outgoing incident edge after the receipt of a broadcast message from the core. If node n receives a broadcast, it will pick its minimum weight basic edge and send a message to the node n’ on the other side with its fragment’s ID and level. Let node n’ be the node on the other sideThen, then node n’ will decide whether the edge is an outgoing edge and send back a message to notify node n of the result. The decision is made according to followingsthe following:<br/>
'''Case 1''': Fragment_ID(n) = Fragment_ID(n’).<br/>
Then, node n and n’ belongs to same fragment (so the edge is not outgoing).<br/>
'''Case 2''': Fragment_ID(n) != Fragment_ID(n’) and Level(n) <= Level(n’).<br/>
Then, node n and n’ belongs to the different fragments (so the edge is outgoing).<br/>
'''Case 3''': Fragment_ID(n) != Fragment_ID(n’) and Level(n) > Level(n’).<br/>
We can’tcannot make any conclusion. The reason is thosethe two nodes may belongsbelong to the same fragment already but node n’ haven’thas not discovered this fact yet due to the delay of thea broadcast message. In this case, the algorithm letlets node n’ postpone the response until its level becomes higher than or equal to the level it received from node n.
 
==== How to combine two fragments? ====
Let F and F’ be the two fragments that need to be combined. There isare two ways to do this:<ref name="GHS" /><ref name="Lynch" />
* '''Merge''': This operation occurs if both F and F’ share thea common minimum weight outgoing edge, and Level(F) = Level(F’). The level of the combined fragment will have level asbe Level(F) + 1.
* '''Absorb''': This operation occurs if Level(F) < Level(F’). The combined fragment will have the same level as F’.
 
Furthermore, when this operation occurs, F must be in the stage of changing the core while F’ can be in broadcast or convertcastconvergecast stage. Therefore, some work is needed to make sure an absorb operation won’twill not interfere with operations in F’. Let e be the edge that F and F’ want to combine with and node n and n’ be the two nodes connected by e in F and F’, respectively. There are two cases to consider:<br/>
 
'''Case 1''': Node n’ hasn’t sent convertcastconvergecast message back to the core.<br/>
Furthermore, when this operation occurs, F must be in the stage of changing the core while F’ can be in broadcast or convertcast stage. Therefore, some work is needed to make sure absorb operation won’t interfere with operations in F’. Let e be the edge that F and F’ want to combine with and node n and n’ be the two nodes connected by e in F and F’ respectively. There are two cases to consider:<br/>
In this case, node n’ can simply initialinitiate a broadcast to F to update the fragment ID and collect minimum weight outgoing edge in F.<br/>
'''Case 1''': Node n’ hasn’t sent convertcast message back to the core.<br/>
'''Case 2''': Node n’ has already finished the convertcastconvergecast.<br/>
In this case, node n’ can simply initial a broadcast to F to update the fragment ID and collect minimum weight outgoing edge in F.<br/>
That means n’ has packedpicked a minimum weight outgoing edge e’, which is different from e(otherwise, node n’ will postpone the response according to the algorithm), into the convertcastconvergecast message that sends to the core and weight(e’) < weight(e). Therefore, e’ is still the minimum outgoing edge to choose after F is absorbed.
'''Case 2''': Node n’ has already finished the convertcast.<br/>
That means n’ has packed a minimum weight outgoing edge e’, which is different from e(otherwise, node n’ will postpone the response according to the algorithm), into the convertcast message that sends to the core and weight(e’) < weight(e). Therefore, e’ is still the minimum outgoing edge to choose after F is absorbed.
 
=== Progress property ===
This algorithm has a nice property that the lowest level fragments won’twill not be blocked, although some operations in non-lowest level fragments may be blocked. This property implies the algorithm will eventually terminate with a minimum spanning tree.
 
== Approximation algorithms ==