Distributed minimum spanning tree: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 05:42, 17 December 2011 edit Hongbinlu (talk \| contribs) 9 edits No edit summary ← Previous edit		Latest revision as of 19:50, 30 December 2024 edit undo Nyxion303 (talk \| contribs) Extended confirmed users 8,457 edits Reverted 1 edit by 189.156.187.30 (talk): Rv; removed with no reason Tags: Twinkle Undo
(39 intermediate revisions by 26 users not shown)
Line 1: [[Image:Minimum spanning tree.svg\|thumb\|400px\|right\|''Example of a MST:'' The minimum spanning tree of a [[planar graph]]. Each edge is labeled with its weight, which here is roughly proportional to its length.]] The '''distributed minimum spanning tree (MST)''' problem involves the construction of a [[minimum spanning tree]] by a [[distributed algorithm]], in a network where nodes communicate by message passing. It is radically different from the classical sequential problem, although the most basic approach resembles [[Borůvka's algorithm]]. ~~Distributed~~One ~~minimum~~important ~~spanning~~application ~~tree~~of this problem is ~~mainly~~to find a tree that can be used for [[~~Broadcasting_~~Broadcasting (computing)\|~~broadcast~~broadcasting]]. In particular, if the cost for a message to pass through an edge in a graph is significant, aan MST can minimize the total cost for a source process to communicate with all the other processes in the network. The problem was first suggested and solved in <math>O(V \log V)</math> time in 1983 by Gallager ''et. al.'',<ref name="GHS" /> where <math>V</math> is the number of vertices in the [[graph theory\|graph]]. Later, the solution was improved to <math>O(V)</math><ref>[[Baruch Awerbuch]]. “Optimal Distributed Algorithms for Minimum Weight Spanning Tree, Counting, Leader Election, and Related Problems,” ''Proceedings of the 19th ACM [[Symposium on Theory of Computing]] (STOC)'', New York City, New York, May 1987. </ref> and finally<ref>Juan Garay, Shay Kutten and [[~~David_Peleg_~~David Peleg (scientist)\| David Peleg]], “A Sub-Linear Time Distributed Algorithm for Minimum-Weight Spanning Trees (Extended Abstract),” ''Proceedings of the IEEE [[Symposium on Foundations of Computer Science]]'' (FOCS), 1993.</ref> <ref>Shay Kutten and [[~~David_Peleg_~~David Peleg (scientist)\| David Peleg]], “Fast Distributed Construction of Smallk-Dominating Sets and Applications,” ''Journal of Algorithms'', Volume 28, Issue 1, July 1998, Pages 40-66.</ref> <math>O(\sqrt V \log^* V + D)</math> where ''D'' is the network, or graph diameter. ~~Lower~~A lower bound on the time complexity of the solution has been eventually shown to be<ref>[[~~David_Peleg_~~David Peleg (scientist)\| David Peleg]] and Vitaly Rubinovich “A near tight lower bound on the time complexity of Distributed Minimum Spanning Tree Construction,“ ''[[SIAM Journal on Computing]]'', 2000, and ''IEEE Symposium on Foundations of Computer Science (FOCS)'', 1999.</ref> <math>\Omega\left({\frac{\sqrt V}{\log V}+D}\right).</math> == Overview == The input graph <math>G(V,E)</math> is considered to be a network, where vertices <math>V</math> are independent computing nodes and edges <math>E</math> are communication links. Links are weighted as in the classical problem. At the beginning of the algorithm, nodes know only the weights of the links which are connected to them. (It is possible to consider models in which they know more, for example their neighbors' links.) As the output of the algorithm, every node knows which of its links belong to the ~~Minimum~~minimum ~~Spanning~~spanning ~~Tree~~tree and which do not. == MST in message-passing model == The [[~~Message_passing\|~~Message passing\|message-passing]] model is one of the ~~common~~most commonly used models in [[distributed computing]]. In this model, each process is modeled as a node of a graph. ~~The~~Each communication channel between two processes is an edge of the graph. Two ~~common~~commonly used algorithms for the classical minimum spanning tree ~~algorithms~~problem are [[Prim'~~s_algorithm\|Prim’s~~s algorithm]] and [[Kruskal'~~s_algorithm\|Kruskal’s~~s algorithm]]. However, it is difficult to apply these two algorithms in the distributed message-passing model. The ~~mainly~~main challenges are: * Both [[Prim'~~s_algorithm\|Prim’s~~s algorithm]] and [[Kruskal'~~s_algorithm\|Kruskal’s~~s algorithm]] ~~executes~~require ~~sequentially.~~processing ~~Therefore,~~one ~~when~~node or vertex at a ~~process~~time, Amaking isit ~~executing~~difficult to make them run in parallel. For example, ~~other~~Kruskal's algorithm processes ~~have~~edges toin ~~wait~~turn, ~~which~~deciding ~~obviously~~whether ~~degrades~~to include the ~~performance~~edge in the MST based on whether it would form a cycle with all previously chosen edges. * Both [[Prim'~~s_algorithm\|Prim’s~~s algorithm]] and [[Kruskal'~~s_algorithm\|Kruskal’s~~s algorithm]] require processes to know the state of the whole graph, which is very difficult to discover in the message-passing model. Due to ~~those~~these difficulties, ~~traditional~~new ~~MST~~techniques ~~algorithms~~were ~~are~~needed ~~not~~for ~~suitable~~distributed MST algorithms in the message-passing model. ~~Therefore,~~Some ~~new~~bear ~~category~~similarities ofto ~~MST algorithms was~~[[Borůvka's ~~introduced~~algorithm]] for ~~constructing~~the aclassical MST ~~in message-passing model~~problem. == GHS algorithm == The GHS algorithm<ref name="GHS" /> of [[Robert G. Gallager\|Gallager]], Humblet and Spira is one of the best-known algorithms in distributed computing theory. This algorithm ~~can~~constructs ~~construct the~~an MST in ~~[[Message_passing\|Message~~the asynchronous message-passing]] model. === ~~Precondition~~Assumptions === The GHS algorithm requires several assumptions. * The algorithm should run on connected undirected graph.<ref name="GHS" /> * The input graph is connected and undirected. * The graph should have distinct finite weight assigned to each edge.<ref name="GHS" /> * Each edge in the input graph has distinct finite weights. This assumption is not needed if there is a consistent method to break ties between edge weights. * Each node initially knows the weight ~~for~~of each edge incident to that node.~~<ref name="GHS" />~~ * Initially, each node is in aan ~~quiescent~~inactive state. ~~and~~Each itnode either spontaneously awakens or is awakened by receipt of any message from another node.~~<ref name="GHS" />~~ * Messages can be transmitted independently in both directions on an edge and arrive after an unpredictable but finite delay, without error.~~<ref name="GHS" />~~ * Each edge satisfies [[FIFO]] property.<ref name="GHS" /> * Each edge delivers messages in [[FIFO (computing and electronics)\|FIFO]] order. === Properties of ~~MST~~MSTs === ~~Let's~~Define ~~define~~a fragment of aan MST G<math>T</math> asto be a sub-tree of G,<math>T</math>. ~~that~~That is, a fragment is a connected set of nodes and edges of G<math>T</math>. ~~There~~MSTs ~~are~~have two important properties ofin ~~MST~~relation to fragments:<ref name="GHS" /> # Given a fragment of aan ~~minimum~~MST ~~spanning tree G~~<math>T</math>, let <math>e</math> be a minimum-weight outgoing edge of the fragment. Then joining <math>e</math> and its adjacent non-fragment node to the fragment yields another fragment of Gan MST.~~<ref name="GHS" />~~ # If all the edges of a connected graph have different weights, then the MST of the graph is unique.~~<ref name="GHS" />~~ These two properties form the ~~basic~~basis for proving correctness of the GHS algorithm. In general, the GHS algorithm is a ~~button~~bottom-up algorithm in the sense that it starts by letting each individual node be a fragment, and then joining fragments inuntil a ~~certain~~single ~~way to form new fragments. This fragments-joining process repeats until there~~fragment is ~~only one fragment~~ left. ~~and~~The ~~property~~above ~~1 and 2~~properties imply that the ~~result~~remaining fragment ismust be an MST.▼ ▲These two properties form the basic correctness of GHS algorithm. In general, GHS algorithm is a button-up algorithm in the sense that it starts by letting each individual node be a fragment and joining fragments in a certain way to form new fragments. This fragments-joining process repeats until there is only one fragment left and property 1 and 2 imply the result fragment is an MST. === Description of the algorithm === The GHS algorithm assigns a ''level'' ofto each fragment, which is a non-decreasing integer with initial value 0. ~~Each~~Furthermore, each fragment with a non-zero level ~~fragment~~ has an ''ID'', which is the ID of the core edge in the fragment., ~~The core edge of each fragment~~which is selected when the fragment is constructed. During the execution of the algorithm, each node can classify each of its incident ~~edge~~edges into three categories:<ref name="GHS" /><ref name="Lynch" /> * '''Branch''': ~~Edges~~edges are those that have ~~already~~ been determined to be part of the ~~minimum spanning tree~~MST. * '''Rejected''': ~~Edges~~edges are those that have ~~already~~ been determined not to be part of the ~~minimum spanning tree~~MST. * '''Basic''': ~~Edges~~edges are ~~neither~~all ofedges that are neither branch ~~edge~~edges ~~and~~nor rejected ~~edge~~edges. ~~For~~In level -0 fragments, each ~~awaken~~awakened node will do the ~~followings~~following:▼ # ~~Chooses~~Choose its minimum-weight incident edge and ~~marks~~mark that edge as a branch edge.▼ # ~~Sends~~Send a message via the branch edge to notify the node on the other side.▼ # ~~Waiting~~Wait for a message from the other end of the ~~fragment~~edge.▼ The edge that is chosen by ~~both~~the two nodes it connects becomes the core ~~with~~edge, and is assigned level 1.▼ In non-zero-level fragments, a separate algorithm is executed in each level. This algorithm can be separated into three stages: broadcast, convergecast, and change core. ▲For level 0 fragments, each awaken node will do the followings: ▲# Chooses its minimum-weight incident edge and marks that edge as branch ▲# Sends a message via the branch edge to notify the node on the other side. ▲# Waiting for message from the other end of the fragment. ▲The edge chosen by both nodes it connects becomes the core with level 1. ==== Broadcast ====▼ The two nodes adjacent to the core broadcast messages to the rest of the nodes in the fragment. The messages are sent via the branch edge but not via the core. Each broadcast message contains the ID and level of the fragment. At the end of this stage, each node has received the new fragment ID and level.▼ ==== ~~Convertcast~~Convergecast ====▼ In this stage, ~~each~~all ~~node~~nodes ~~tries~~in tothe fragment cooperate to find the minimum weight outgoing edge of the fragment. Outgoing edges are edges connecting to other fragments. The messages sent in this stage are in the opposite direction of the broadcast stage. Initialized by all the leaves (the nodes that have only one branch edge), a message is sent through the branch edge. The message contains the minimum weight ~~(infinity if not exist)~~ of ~~its~~the incident outgoing edge it found (~~the~~or infinity if no such edge was found). The way to find the minimum outgoing edge will be discussed later). For each non-leaf node, ~~(let~~given the number of its branch edges isas <math>n)</math>, after receiving <math>n - 1</math> ~~convertcast~~convergecast messages, it will pick up the minimum weight from the messages and compare it to the weights of its incident outgoing edges. The smallest weight will be sent toward the branch it received the broadcast from.▼ ~~For non-zero level fragment, an execution of the algorithm can be separated into three stages in each level:~~ ▲==== Broadcast ==== ▲The two nodes adjacent to the core broadcast messages to the rest of the nodes. The messages are sent via the branch edge but not via the core. Each broadcast message contains the ID and level of the fragment. At the end of this stage, each node received the new fragment ID and level. ▲==== Convertcast ==== ▲In this stage, each node tries to cooperate to find the minimum weight outgoing edge of the fragment. Outgoing edges are edges connecting to other fragments. The messages sent in this stage are in the opposite direction of the broadcast stage. Initialized by all the leaves (the nodes have only one branch edge), a message is sent through the branch edge. The message contains the minimum weight (infinity if not exist) of its outgoing edge it found (the way to find the minimum outgoing edge will be discussed later). For each non-leaf node, (let the number of its branch edges is n) after receiving n-1 convertcast messages, it will pick up the minimum weight from the messages and compare it to the weights of its incident outgoing edges. The smallest weight will be sent toward the branch it received broadcast from. ==== Change core ==== After the ~~complete~~completion of the previous stage, the two nodes connected by the core can ~~acknowledge~~inform each other of the best edges they received. Then they can identify the minimum outgoing edge from the entire fragment. A message will be sent from the core to the minimum outgoing edge via a ~~set~~path of branch ~~edge(s)~~edges. Finally, a message will be sent out via the chosen outgoing edge to request to combine ~~those~~the two fragments that the edge connects. Depending on the levels of those two fragments, one of 2two combined operations ~~could~~are ~~perform~~performed to form a new fragment; (details discussed below). ==== ~~How~~Finding ~~to find~~the minimum-weight ~~weight~~incident outgoing edge? ==== As discussed above, every node needs to find its minimum weight outgoing incident edge after the receipt of a broadcast message from the core. If node <math>n</math> receives a broadcast, it will pick its minimum weight basic edge and send a message to the node <math>n'</math> on the other side with its ~~fragment’s~~fragment's ID and level. ~~Let node n’ be the node on the other side~~Then, ~~then~~ node n’<math>n'</math> will decide whether the edge is an outgoing edge and send back a message to notify node <math>n</math> of the result. The decision is made according to ~~followings~~the following:~~<br/>~~ # <math>\mathit{Fragment}_\mathit{ID}(n) = \mathit{Fragment}_\mathit{ID}(n')</math>. That is, nodes <math>n</math> and <math>n'</math> belong to same fragment, so the edge is not outgoing. ~~'''Case 1''': Fragment_ID(n) = Fragment_ID(n’)<br/>~~ # <math>\mathit{Fragment}_\mathit{ID}(n) \neq \mathit{Fragment}_\mathit{ID}(n')</math> and <math>\mathit{Level}(n) \leq \mathit{Level}(n')</math>. That is, nodes <math>n</math> and <math>n'</math> belong to the different fragments, so the edge is outgoing. ~~Then node n and n’ belongs to same fragment.<br/>~~ # <math>\mathit{Fragment}_\mathit{ID}(n) \neq \mathit{Fragment}_\mathit{ID}(n')</math> and <math>\mathit{Level}(n) > \mathit{Level}(n')</math>. We ~~can’t~~cannot make any conclusion. The reason is ~~those~~that the two nodes may ~~belongs~~belong to the same fragment already, but node n’<math>n'</math> has ~~haven’t~~not discovered this fact yet due to the delay of ~~the~~a broadcast message. In this case, the algorithm ~~let~~lets node n’<math>n'</math> postpone the response until its level becomes higher than or equal to the level it received from node <math>n</math>.▼ ~~'''Case 2''': Fragment_ID(n) != Fragment_ID(n’) and Level(n) <= Level(n’)<br/>~~ ~~Then node n and n’ belongs to the different fragments.<br/>~~ ~~'''Case 3''': Fragment_ID(n) != Fragment_ID(n’) and Level(n) > Level(n’)<br/>~~ ▲We can’t make any conclusion. The reason is those two nodes may belongs to the same fragment already but node n’ haven’t discovered this fact yet due to the delay of the broadcast message. In this case, the algorithm let node n’ postpone the response until its level becomes higher or equal to the level it received from node n. ==== ~~How to combine~~Combining two fragments? ==== Let <math>F</math> and F’<math>F'</math> be the two fragments that need to be combined. There isare two ways to do this:<ref name="GHS" /><ref name="Lynch" /> * '''Merge''': This operation occurs if both <math>F</math> and F’<math>F'</math> share ~~the~~a common minimum weight outgoing edge, and <math>\mathit{Level}(F) = \mathit{Level}(F’F')</math>. The level of the combined fragment will ~~have level as~~be <math>\mathit{Level}(F) + 1</math>. * '''Absorb''': This operation occurs if <math>\mathit{Level}(F) < \mathit{Level}(F’F')</math>. The combined fragment will have the same level as F’<math>F'</math>. Furthermore, when ~~this~~an "Absorb" operation occurs, <math>F</math> must be in the stage of changing the core, while F’<math>F'</math> can be in ~~broadcast~~an ~~or convertcast~~arbitrary stage. Therefore, ~~some~~"Absorb" ~~work~~operations ismay ~~needed~~be todone ~~make~~differently ~~sure~~depending ~~absorb~~on ~~operation~~the ~~won’t~~state ~~interfere~~of ~~with operations in F’~~<math>F'</math>. Let <math>e</math> be the edge that <math>F</math> and F’<math>F'</math> want to combine with, and ~~node~~let <math>n</math> and n’<math>n'</math> be the two nodes connected by <math>e</math> in <math>F</math> and F’<math>F'</math>, respectively. There are two cases to consider:~~<br/>~~▼ {{numbered list \| Node <math>n'</math> has received broadcast message but it has not sent a convergecast message back to the core. In this case, fragment <math>F</math> can simply join the broadcast process of <math>F'</math>. Specifically, we image <math>F</math> and <math>F'</math> have already combined to form a new fragment <math>F''</math>, so we want to find the minimum weight outgoing edge of <math>F''</math>. In order to do that, node <math>n'</math> can initiate a broadcast to <math>F</math> to update the fragment ID of each node in <math>F</math> and collect minimum weight outgoing edge in <math>F</math>. \| Node <math>n'</math> has already sent a convergecast message back to the core. Before node <math>n'</math> sent a convergecast message, it must have picked a minimum weight outgoing edge. As we discussed above, <math>n'</math> does that by choosing its minimum weight basic edge, sending a test message to the other side of the chosen edge, and waiting for the response. Suppose <math>e'</math> is the chosen edge, we can conclude the following: # <math>e' \neq e</math> # <math>\mathit{weight}(e') < \mathit{weight}(e)</math> The second statement follows if the first one holds. For the first statement, suppose <math>n'</math> chose the edge <math>e</math> and sent a test message to <math>n</math> via edge <math>e</math>. Then, node <math>n</math> will delay the response (according to case 3 of "Finding the minimum weight incident outgoing edge"). Then, it is impossible that <math>n'</math> has already sent its convergecast message. By the aforementioned conclusions 1 and 2, we can conclude it is safe to absorb <math>F</math> into <math>F'</math> since <math>e'</math> is still the minimum outgoing edge to report after <math>F</math> is absorbed. }} ==== Maximum number of levels ==== ▲Furthermore, when this operation occurs, F must be in the stage of changing the core while F’ can be in broadcast or convertcast stage. Therefore, some work is needed to make sure absorb operation won’t interfere with operations in F’. Let e be the edge that F and F’ want to combine with and node n and n’ be the two nodes connected by e in F and F’ respectively. There are two cases to consider:<br/> As mentioned above, fragments are combined by either "Merge" or "Absorb" operation. The "Absorb" operation does not change the maximum level among all fragments. The "Merge" operation may increase the maximum level by 1. In the worst case, all fragments are combined with "Merge" operations, so the number of fragments decreases by half in each level. Therefore, the maximum number of levels is <math>O(\log V)</math>, where <math>V</math> is the number of nodes. ~~'''Case 1''': Node n’ hasn’t sent convertcast message back to the core.<br/>~~ ~~In this case, node n’ can simply initial a broadcast to F to update the fragment ID and collect minimum weight outgoing edge in F.<br/>~~ ~~'''Case 2''': Node n’ has already finished the convertcast.<br/>~~ That means n’ has packed a minimum weight outgoing edge e’, which is different from e(otherwise, node n’ will postpone the response according to the algorithm), into the convertcast message that sends to the core and weight(e’) < weight(e). Therefore, e’ is still the minimum outgoing edge to choose after F is absorbed. === Progress property === ~~This~~The GHS algorithm has a nice property that the lowest level fragments ~~won’t~~will not be blocked, although some operations in the non-lowest level fragments may be blocked. This property implies the algorithm will eventually terminate with a minimum spanning tree. == Approximation algorithms == An <math>O(\log n)</math>-approximation algorithm was developed by Maleq Khan and Gopal Pandurangan.<ref name="~~khan~~Khan"> ~~Maleq Khan and Gopal Pandurangan. “A Fast Distributed Approximation Algorithm for Minimum Spanning Trees,” ''Distributed Computing'', vol. 20, no. 6, pp. 391–402, Apr. 2008.<~~/~~ref~~> This algorithm runs in <math>O(D+L\log n)</math> time, where <math>L</math> is the local shortest path diameter<ref name=~~khan~~"Khan" /> of the graph.▼ ▲An <math>O(\log n)</math>-approximation algorithm was developed by Maleq Khan and Gopal Pandurangan.<ref name="khan"> Maleq Khan and Gopal Pandurangan. “A Fast Distributed Approximation Algorithm for Minimum Spanning Trees,” ''Distributed Computing'', vol. 20, no. 6, pp. 391–402, Apr. 2008.</ref> This algorithm runs in <math>O(D+L\log n)</math> time, where <math>L</math> is the local shortest path diameter<ref name=khan/> of the graph. == References == <references> <ref name="GHS">[[Robert G. Gallager]], Pierre A. Humblet, and P. M. Spira, “A"A distributed algorithm for minimum-weight spanning trees,”" ''ACM Transactions on Programming Languages and Systems'', vol. 5, no. 1, pp. 66–77, January 1983.</ref> <ref name="Khan">Maleq Khan and Gopal Pandurangan. "A Fast Distributed Approximation Algorithm for Minimum Spanning Trees,"" ''Distributed Computing'', vol. 20, no. 6, pp. 391–402, Apr. 2008.</ref> <ref name="Lynch">Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.</ref> </references>