Expected linear time MST algorithm: Difference between revisions

Content deleted Content added
m Expected Analysis: Typo fixing, replaced: the the → the using AWB
FrescoBot (talk | contribs)
m Bot: fixing section wikilinks and minor changes
Line 11:
| title = A randomized linear-time algorithm to find minimum spanning trees
| volume = 42
| year = 1995}}</ref> The algorithm relies on techniques from [[Borůvka's algorithm]] along with an algorithm for [[MST verification algorithm|verifying a minimum spanning tree in linear time]]<ref name=MST-V1>{{cite journal
<ref name=MST-V1>{{cite journal
| last1 = Dixon | first1 = Brandon
| last2 = Rauch | first2 = Monika
Line 99 ⟶ 98:
 
===Expected Analysis===
Ignoring work done in recursive subproblems the total amount of work done in a single invocation of the algorithm is [[linear time|linear]] in the number of edges in the input graph. Step 1 takes constant time. Borůvka steps can be executed in time linear in the number of edges as mentioned in the [[#Borůvka stepsStep|Borůvka step]] section. Step 3 iterates through the edges and flips a single coin for each one so it is linear in the number of edges. Step 4 can be executed in linear time using a modified [[MST verification algorithm|linear time minimum spanning tree verification algorithm]].<ref name=MST-V1/><ref name=MST-V2/> Since the work done in one iteration of the algorithm is linear in the number of edges the work done in one complete run of the algorithm (including all recursive calls) is bounded by a constant factor times the total number of edges in the original problem and all recursive subproblems.
 
Each invocation of the algorithm produces at most two subproblems so the set of subproblems forms a [[binary tree]]. Each [[#Borůvka stepsStep|Borůvka step]] reduces the number of vertices by at least a factor of two so after two Borůvka steps the number of vertices has been reduced by a factor of four. Thus, if the original graph has ''n'' vertices and ''m'' edges then at depth ''d'' of the tree each subproblem is on a graph of at most ''n''/4<sup>''d''</sup> vertices. Also the tree has at most log<sub>4</sub>''n'' levels.
 
[[File:Linear MST Algorithm Left Subchildren.svg|thumb|right|Left paths of a binary tree are circled in blue]]
Line 107 ⟶ 106:
To reason about the recursion tree let the left child problem be the subproblem in the recursive call in step 3 and the right child problem be the subproblem in the recursive call in step 5. Count the total number of edges in the original problem and all subproblems by counting the number of edges in each left path of the tree. A left path begins at either a right child or the root and includes all nodes reachable through a path of left children. The left paths of a binary tree are shown circled in blue in the diagram on the right.
 
Each edge in a left child problem is selected from the edges of its parent problem (less the edges contracted in the [[#Borůvka stepsStep|Borůvka steps]]) with probability 1/2. If a parent problem has ''x'' edges then the [[Expected value|expected number]] of edges in the left child problem is at most ''x''/2. If ''x'' is replaced by a random variable ''X'' then by the [[Linearity_of_expectation#Linearity|linearity of expectation]] the expected number of edges in the left child problem ''Y'' is given by <math>E[Y] \leq E[X]/2</math>. Thus if the expected number of edges in a problem at the top of a left path is ''k'' then the sum of the expected number of edges in each subproblem in the left path is at most <math>\sum_{d=0}^{\infty} \frac{k}{2^d}=2k</math> (see [[Geometric series]]). The root has ''m'' edges so the [[Expected value|expected number]] of edges is equal to 2''m'' plus twice the expected number of edges in each right subproblem.
 
The expected number of edges in each right subproblem is equal to the number of [[#F-heavy and F-light Edges|F-light]] edges in the parent problem where ''F'' is the minimum spanning tree of the left subproblem. The number of F-light edges is less than or equal to twice the number of vertices in the subproblem by the [[#Random Sampling Lemma|sampling lemma]]. The number of vertices in a subproblem at depth ''d'' is ''n''/4<sup>''d''</sup> so the total number of vertices in all right subproblems is given by <math>\sum_{d=1}^{\infty}\frac{2^{d-1}n}{4^d}=n/2</math>. Thus, the expected number of edges in the original problem and all subproblems is at most 2''m''+''n''. Since ''n'' at most 2''m'' for a graph with no isolated vertices the algorithm runs in expected time ''O''(''m'').