Euclidean minimum spanning tree: Difference between revisions

Content deleted Content added
Definition and related problems: split paragraph again
Computational complexity: fewer repetitions of Euclidean
Line 52:
 
==Computational complexity==
For points in any dimension, the Euclidean minimum spanning tree can be constructed in time <math>O(n^2)</math> by constructing a [[complete graph]] with an edge between every pair of points, weighted by [[Euclidean distance]], and then applying a graph minimum spanning tree algorithm such as the [[Prim's algorithm|Prim–Dijkstra–Jarník algorithm]] or [[Borůvka's algorithm]] on it. These algorithms can be made to take time <math>O(n^2)</math> on complete graphs, unlike another common choice, [[Kruskal's algorithm]], which is slower because it involves sorting all distances.{{r|eppstein}} For points in low-dimensional spaces, the problem may be solved more quickly, as detailed below.
 
Computing Euclidean distances involves a [[square root]] calculation. In any comparison of edge weights, this calculation can be avoided by comparing the squares of the Euclidean distances instead instead of the distances themselves. Because this change does not affect the outcome of these comparisons, it does not change the result of the minimum spanning tree computation. This shortcut speeds up calculation and allows a minimum spanning tree for points with integer coordinates to be constructed using only integer arithmetic.{{r|yao82}}
 
===Two dimensions===
A faster approach to finding the Euclidean minimum spanning tree inof aplanar planepoints uses the property that it is a subgraph of the Delaunay triangulation:
# Compute the Delaunay triangulation, which can be done in <math>O(n\log n)</math> time. Because the Delaunay triangulation is a [[planar graph]], it has at most <math>3n-6</math> edges.
# Label each edge with its (squared) length.
Line 63:
The final result is an algorithm taking <math>O(n\log n)</math> time.{{r|shahoe}}
 
If the input coordinates are integers and can be used as [[array index|array indices]], faster algorithms are possible: the Delaunay triangulation can be constructed by a [[randomized algorithm]] in <math>O(n\log\log n)</math> expected time.{{r|bm09}} Additionally, since the Delaunay triangulation is a [[planar graph]], its minimum spanning tree can be found in [[linear time]] by a variant of Borůvka's algorithm that removes all but the cheapest edge between each pair of components after each stage of the algorithm.{{r|eppstein|mares}} Therefore, the total expected time for this algorithm is <math>O(n\log\log n)</math>.{{r|bm09}} In the other direction, the Delaunay triangulation can be constructed from the Euclidean minimum spanning tree in the near-linear time bound <math>O(n\log^* n)</math>, where <math>\log^*</math> denotes the [[iterated logarithm]].{{r|devillers}}
 
===Higher dimensions===
Line 70:
for any <math>\varepsilon>0</math>—faster than the quadratic time bound for the complete graph and Delaunay triangulation algorithms.{{r|aesw}}
 
The optimal time complexity for higher-dimensional Euclidean minimum spanning trees remains unknown,{{r|topp}} but it is closely related to the time for computing ''bichromatic closest pairs''. In the bichromatic closest pair problem, the input is a set of points, given two different colors (say, red and blue). The output is a pair of a red point and a blue point with the minimum possible distance. This pair always forms one of the edges in the minimum spanning tree. Therefore, the bichromatic closest pair problem can be solved in the amount of time that it takes to construct a Euclidean minimum spanning tree and scan its edges for the shortest red–blue edge. Conversely, for any red–blue coloring of any subset of a given set of points, the bichromatic closest pair produces one edge of the Euclidean minimum spanning tree. By carefully choosing a sequence of colorings of subsets, and finding the bichromatic closest pair of each subproblem, it is possible to find the Euclidean minimum spanning tree in time proportional to the optimal time for finding bichromatic closest pairs for the same number of points, whatever that optimal time turns out to be.{{r|aesw|kln}}
 
For uniformly random point sets in any bounded dimension, the [[Yao graph]]{{r|yao82}} or Delaunay triangulation have linear expected numbers of edges, are guaranteed to contain the minimum spanning tree, and can be constructed in linear expected time.{{r|bwy|clarkson|dwyer}} This allows the minimum spanning tree itself to be constructed in linear time, by using a [[Expected linear time MST algorithm|randomized linear time algorithm for graph minimum spanning trees]].{{r|kkt}} However, the poor performance of these methods on inputs coming from clustered data has led [[algorithm engineering]] researchers to develop methods with a somewhat slower <math>O(n\log n)</math> time bound, for random inputs or inputs whose distances and clustering resemble those of random data, while exhibiting better performance on real-world data.{{r|nzz|cck|mrg}}
 
A [[well-separated pair decomposition]] is a family of pairs of subsets of the given points, so that every pair of points belong to one of these pairs of subsets, and so that all pairs of points coming from the same pair of subsets have approximately the same length. It is possible to find a well-separated pair decomposition with a linear number of subsets, and a representative pair of points for each subset, in time <math>O(n\log n)</math>. The minimum spanning tree of the graph formed by these representative pairs is then an approximation to the Euclidean minimum spanning tree. Using these ideas, it is possible to produce a <math>(1+\varepsilon)</math>-approximation in <math>O(n\log n)</math> time, whenever <math>\varepsilon</math> is a constant. More precisely, by choosing each representative pair to approximate the closest pair in its equivalence class, and carefully varying the quality of this approximation for different pairs, the dependence on <math>\varepsilon</math> in the time bound can be given as <math>O(n \log n + (\varepsilon^{-2} \log ^2 \tfrac{1}{\varepsilon})n),</math> for any fixed dimension.{{r|arymou}}
 
===Dynamic and kinetic===
The Euclidean minimum spanning tree has been generalized in many different ways to systems of moving or changing points:
*If a set of points undergoes a sequence of dynamic insertions or deletions of points, each of these updates causes a bounded amount of change to the Euclidean minimum spanning tree of the points. When the update sequence is known in advance, for points in the plane, the change after each insertion or deletion can be found in time <math>O(\log^2 n)</math> per insertion or deletion.{{r|offline}} When the updates must be handled in an online manner, a slower (but still polylogarithmic) <math>O(\log^{10} n)</math> time bound is known.{{r|chadyn}} For higher dimensional versions of the problem the time per update is slower but still sublinear.{{r|extrema}}
*For <math>n</math> points moving linearly with constant speed, or with more general algebraic motions, the Euclidean minimum spanning tree will change by a sequence of swaps, in which one edge is removed and another replaces it at a point in time where both have equal length.{{r|kti}} For linear motions, the number of changes is at most slightly larger than <math>n^{25/9}</math>.{{r|chalev}} For more general algebraic motions, there is a near-cubic upper bound on the number of swaps that can occur, based on the theory of [[Davenport–Schinzel sequence]]s.{{r|rzcomb}}
*The ''minimum moving spanning tree problem'' again concerns points moving linearly with constant speed, over an interval of time, and seeks a single tree that minimizes the maximum sum of weights occurring at any instant during this interval. It is NP-hard to compute exactly, but can be approximated to within a factor of two in polynomial time.{{r|abb}}
*The [[kinetic Euclidean minimum spanning tree]] problem asks for a [[kinetic data structure]] that can maintain the Euclidean minimum spanning tree as its points undergo both continuous motions and insertions and deletions. Several papers have studied such structures,{{r|bgz|aegh|rzkin|rakwz|msvw}} and a kinetic structure for algebraically moving points with near-cubic total time, nearly matching the bound on the number of swaps, is known.{{r|rakwz}}
 
===Lower bound===
An asymptotic lower bound of <math>\Omega(n\log n)</math> of the Euclidean minimum spanning tree problem can be established in restricted models of computation. These include the [[algebraic decision tree]] and [[algebraic computation tree]] models, in which the algorithm has access to the input points only through certain restricted primitives that perform simple algebraic computations on their coordinates. In these models, the [[closest pair of points problem]] requires <math>\Omega(n\log n)</math> time, but the closest pair is necessarily an edge of the Euclidean minimum spanning tree, so the Euclidean minimum spanning tree also requires this much time.{{r|yao89}} However, these lower bounds do not apply to models of computation with integer point coordinates, in which [[bitwise operation]]s and [[Random access|table indexing]] operations are permitted using those coordinates. In these models, faster algorithms are possible, as described above.{{r|bm09}}
 
== Applications ==