Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
The algorithm: COI edit — this is the best ref I can find on breaking ties in nearest neighbor selection (Murtagh doesn't mention this detail), but I won't complain if someone replaces it with another good reference
top: stack
Line 1:
In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is a method that can be used to perform several types of [[agglomerative hierarchical clustering]], in which a hierarchy of clusters is created by repeatedly merging pairs of smaller clusters to form larger clusters. In particular it can be used for [[Ward's method]], [[complete-linkage clustering]], and [[single-linkage clustering]], which all work by merging the closest two clusters under different definitions of the distance between clusters.
 
The main idea of the algorithm is to find pairs of clusters to merge by following paths in the [[nearest neighbor graph]] of the clusters until the paths terminate in pairs of mutual nearest neighbors. That pair of clusters is then chosen as the pair to merge by the algorithm. The algorithm uses a [[Stack (abstract data type)|stack]] to find and merge mutually nearest pairs of clusters efficiently.
The resulting system of merges occurs in a different order than in a naive implementation of the same clustering algorithms, but can be shown to generate the same hierarchy of clusters.
 
The nearest-neighbor chain algorithm uses an amount of memory that is linear in the number of points to be clustered, and determines a clustering in time quadratic in the number of points (linear in the input size, when the input is given as an explicit distance matrix).