Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
Background: illo; ce
Line 10:
Alternatively, a hierarchical clustering may be represented as a [[binary tree]] with the points at its leaves; the clusters of the clustering are the sets of points in subtrees descending from each node of the tree.
 
In agglomerative clustering methods, the input also includes a distance function defined on the points, or a numerical measure of their dissimilarity.
In agglomerative clustering methods, the input also includes a distance function defined on the points, or a numerical measure of their dissimilarity that is symmetric (insensitive to the ordering within each pair of points) but (unlike a distance) may not satisfy the triangle inequality. Depending on the method, this dissimilarity function can be extended in several different ways to pairs of clusters; for instance, in the [[single-linkage clustering]] method, the distance between two clusters is defined to be the minimum distance between any two points from each cluster. Given this distance between clusters, a hierarchical clustering may be defined by a [[greedy algorithm]] that initially places each point in its own single-point cluster and then repeatedly merges the [[closest pair]] of clusters.<ref name="murtagh-tcj"/>
The distance or dissimilarity should be symmetric: the distance between two points does not depend on which of them is considered first.
However, unlike the distances in a [[metric space]], it is not required to satisfy the [[triangle inequality]].
 
In agglomerative clustering methodsNext, the input also includes a distancedissimilarity function definedis onextended thefrom points, or a numerical measurepairs of their dissimilarity that is symmetric (insensitivepoints to the ordering within each pairpairs of points) but (unlike a distance) may not satisfy the triangle inequalityclusters. DependingDifferent onclustering themethods method,perform this dissimilarity function can be extendedextension in several different ways. to pairs of clusters; forFor instance, in the [[single-linkage clustering]] method, the distance between two clusters is defined to be the minimum distance between any two points from each cluster. Given this distance between clusters, a hierarchical clustering may be defined by a [[greedy algorithm]] that initially places each point in its own single-point cluster and then repeatedly mergesforms a new cluster by merging the [[closest pair]] of clusters.<ref name="murtagh-tcj"/>
 
The bottleneck of this greedy algorithm is the subproblem of finding which two clusters to merge in each step.
Known methods for repeatedly finding the closest pair of clusters in a dynamic set of clusters either require superlinear space to maintain a [[data structure]] that can find closest pairs quickly, or they take greater than linear time to find each closest pair.<ref name="e-jea">{{citation
| last = Eppstein | first = David | authorlink = David Eppstein