Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
expand lead; split off history
No edit summary
Line 1:
In the theory of [[cluster analysis]], the '''nearest-neighborneighbour chain algorithm''' is a method that can be used to perform several types of [[agglomerative hierarchical clustering]], in which a hierarchy of clusters is created by repeatedly merging pairs of smaller clusters to form larger clusters. In particular it can be used for [[Ward's method]], [[complete-linkage clustering]], and [[single-linkage clustering]], which all work by merging the closest two clusters under different definitions of the distance between clusters.
 
The main idea of the algorithm is to find pairs of clusters to merge by following paths in the [[nearest neighborneighbour graph]] of the clusters until the paths terminate in pairs of mutual nearest neighbors. That pair of clusters is then chosen as the pair to merge by the algorithm. The resulting system of merges occurs in a different order than in a naive implementation of the same clustering algorithms, but can be shown to generate the same hierarchy of clusters.
 
The nearest-neighbor chain algorithm uses an amount of memory that is linear in the number of points to be clustered, and determines a clustering in time quadratic in the number of points (linear in the input size, when the input is given as an explicit distance matrix).