Nearest-neighbor chain algorithm, revision as of 23:10, 29 October 2016, edited by David Eppstein
Alternatively, a hierarchical clustering may be represented as a [[binary tree]] with the points at its leaves; the clusters of the clustering are the sets of points in subtrees descending from each node of the tree.

In agglomerative clustering methods, the input also includes a distance function defined on the points, or a numerical measure of their dissimilarity. The distance or dissimilarity should be symmetric: the distance between two points does not depend on which of them is considered first. However, unlike the distances in a [[metric space]], it is not required to satisfy the [[triangle inequality]]. Next, the dissimilarity function is extended from pairs of points to pairs of clusters. Different clustering methods perform this extension in different ways.
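The following Python sketch is a rough illustration, not taken from the cited sources. It shows two standard ways of extending a point distance to clusters: single linkage (minimum pairwise distance) and complete linkage (maximum pairwise distance). It also includes a naive greedy merging procedure that takes either one as a parameter. The function names are hypothetical, Euclidean distance (via `math.dist`) is assumed as the point dissimilarity, and the points are assumed to be distinct. The closest pair is found by brute force over all pairs of current clusters, so the sketch makes no attempt at efficiency.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = FrozenSet[Point]
Linkage = Callable[[Cluster, Cluster], float]


def single_linkage(a: Cluster, b: Cluster) -> float:
    """Smallest distance between a point of a and a point of b."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_linkage(a: Cluster, b: Cluster) -> float:
    """Largest distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p in a for q in b)


def greedy_agglomerative(
    points: Sequence[Point], linkage: Linkage
) -> List[Tuple[Cluster, Cluster, float]]:
    """Start from singleton clusters and repeatedly merge the closest pair.

    Returns the merges in the order performed. Assumes distinct points,
    so that every singleton cluster is a distinct set.
    """
    clusters: List[Cluster] = [frozenset([p]) for p in points]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        # Brute-force closest-pair search over all pairs of current clusters;
        # this search is the expensive step of the greedy method.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a | b)
    return merges


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (5.0,), (7.0,)]
    print([d for _, _, d in greedy_agglomerative(pts, single_linkage)])    # [1.0, 2.0, 4.0]
    print([d for _, _, d in greedy_agglomerative(pts, complete_linkage)])  # [1.0, 2.0, 7.0]
```

On the four one-dimensional points in the example, both linkages perform the same first two merges. They differ only in the distance at which the final two clusters are joined.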
For instance, in the [[single-linkage clustering]] method, the distance between two clusters is defined to be the minimum distance between two points, one from each cluster. Given this distance between clusters, a hierarchical clustering may be defined by a [[greedy algorithm]] that initially places each point in its own single-point cluster and then repeatedly forms a new cluster by merging the [[closest pair]] of clusters.<ref name="murtagh-tcj"/>

The bottleneck of this greedy algorithm is the subproblem of finding which two clusters to merge in each step. Known methods for repeatedly finding the closest pair of clusters in a dynamic set of clusters either require superlinear space to maintain a [[data structure]] that can find closest pairs quickly, or they take more than linear time to find each closest pair.<ref name="e-jea">{{citation | last = Eppstein | first = David | authorlink = David Eppstein
