In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is an [[algorithm]] that can bespeed used to performup several typesmethods offor [[agglomerative hierarchical clustering]]. These are methods that take a collection of points as input, inand whichcreate a hierarchy of clusters isof createdpoints by repeatedly merging pairs of smaller clusters to form larger clusters. InThe particularclustering itmethods that the nearest-neighbor chain algorithm can be used for include [[Ward's method]], [[complete-linkage clustering]], and [[single-linkage clustering]],; whichthese all work by repeatedly merging the closest two clusters underbut use different definitions of the distance between clusters.
The main idea of the algorithm is to find pairs of clusters to merge by following [[Path (graph theory)|paths]] in the [[nearest neighbor graph]] of the clusters. untilEvery thesuch pathspath will eventually terminate inat pairsa pair of mutualclusters that are nearest neighbors.Thatof paireach ofother, clustersand isthe thenalgorithm chosenchooses that pair of clusters as the pair to merge. In order to save work by there-using algorithm.as Themuch as possible of each path, the algorithm uses a [[Stack (abstract data type)|stack data structure]] to keep track of each path that it follows. By following paths in this way, the nearest-neighbor chain algorithm merges its clusters in a different order than methods that always find and merge mutuallythe nearestclosest pairspair of clusters. efficientlyHowever, despite that difference, it always generates the same hierarchy of clusters.
The resulting system of merges occurs in a different order than in a naive implementation of the same clustering algorithms, but can be shown to generate the same hierarchy of clusters.
The nearest-neighbor chain algorithm determinesconstructs a clustering in time quadraticproportional into the square of the number of points to be clustered. This timeis boundalso isproportional linearto inthe size of its input, when the input sizeis forprovided anin inputthe givenform asof an explicit distance matrix. The methodalgorithm uses an amount of memory thatproportional is linear into the number of points, towhen beit clustered,is used for clustering methods such as Ward's method that allow constant-time calculation of the distance between clusters. However, for some other clustering methods it uses a larger amount of memory mustin bean usedauxiliary todata maintainstructure anthat arrayit keeps to keep track of the distances between pairs of clusters.