Revision as of 06:50, 4 December 2016 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,850 edits Fix more citation bot lossage ← Previous edit		Revision as of 08:23, 11 December 2016 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,850 edits →top: ce Next edit →
Line 1: In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is an [[algorithm]] that can bespeed ~~used to perform~~up several ~~types~~methods offor [[agglomerative hierarchical clustering]]. These are methods that take a collection of points as input, inand ~~which~~create a hierarchy of clusters isof ~~created~~points by repeatedly merging pairs of smaller clusters to form larger clusters. InThe ~~particular~~clustering itmethods that the nearest-neighbor chain algorithm can be used for include [[Ward's method]], [[complete-linkage clustering]], and [[single-linkage clustering]],; ~~which~~these all work by repeatedly merging the closest two clusters ~~under~~but use different definitions of the distance between clusters. The main idea of the algorithm is to find pairs of clusters to merge by following [[Path (graph theory)\|paths]] in the [[nearest neighbor graph]] of the clusters. ~~until~~Every ~~the~~such ~~paths~~path will eventually terminate inat ~~pairs~~a pair of ~~mutual~~clusters that are nearest neighbors. ~~That~~of ~~pair~~each ofother, ~~clusters~~and isthe ~~then~~algorithm ~~chosen~~chooses that pair of clusters as the pair to merge. In order to save work by ~~the~~re-using ~~algorithm.~~as ~~The~~much as possible of each path, the algorithm uses a [[Stack (abstract data type)\|stack data structure]] to keep track of each path that it follows. By following paths in this way, the nearest-neighbor chain algorithm merges its clusters in a different order than methods that always find and merge ~~mutually~~the ~~nearest~~closest ~~pairs~~pair of clusters. ~~efficiently~~However, despite that difference, it always generates the same hierarchy of clusters. ~~The resulting system of merges occurs in a different order than in a naive implementation of the same clustering algorithms, but can be shown to generate the same hierarchy of clusters.~~ The nearest-neighbor chain algorithm ~~determines~~constructs a clustering in time ~~quadratic~~proportional into the square of the number of points to be clustered. This ~~time~~is ~~bound~~also isproportional ~~linear~~to inthe size of its input, when the input ~~size~~is ~~for~~provided anin ~~input~~the ~~given~~form asof an explicit distance matrix. The ~~method~~algorithm uses an amount of memory ~~that~~proportional ~~is linear in~~to the number of points, towhen beit ~~clustered,~~is used for clustering methods such as Ward's method that allow constant-time calculation of the distance between clusters. However, for some other clustering methods it uses a larger amount of memory ~~must~~in bean ~~used~~auxiliary todata ~~maintain~~structure anthat ~~array~~it keeps to keep track of the distances between pairs of clusters. ==Background==

Nearest-neighbor chain algorithm: Difference between revisions