Revision as of 08:35, 11 December 2016 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,850 edits →Distances sensitive to merge order: simpler ← Previous edit		Revision as of 08:41, 11 December 2016 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,850 edits →top: which one Next edit →
Line 1: In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is an [[algorithm]] that can speed up several methods for [[agglomerative hierarchical clustering]]. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include [[Ward's method]], [[complete-linkage clustering]], and [[single-linkage clustering]]; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called ''reducible'' and are characterized by a simple inequality among certain cluster distances. The main idea of the algorithm is to find pairs of clusters to merge by following [[Path (graph theory)\|paths]] in the [[nearest neighbor graph]] of the clusters. Every such path will eventually terminate at a pair of clusters that are nearest neighbors of each other, and the algorithm chooses that pair of clusters as the pair to merge. In order to save work by re-using as much as possible of each path, the algorithm uses a [[Stack (abstract data type)\|stack data structure]] to keep track of each path that it follows. By following paths in this way, the nearest-neighbor chain algorithm merges its clusters in a different order than methods that always find and merge the closest pair of clusters. However, despite that difference, it always generates the same hierarchy of clusters.

Nearest-neighbor chain algorithm: Difference between revisions