Content deleted Content added
→History: more context |
→Background: illo; ce |
||
Line 6:
==Background==
[[File:Hierarchical clustering diagram.png|thumb|upright=1.35|A hierarchical clustering of six points. The points to be clustered are at the top of the diagram, and the nodes below them represent clusters.]]
The input to a clustering problem consists of a set of points. A ''cluster'' is any proper subset of the points, and a hierarchical clustering is a [[maximal element|maximal]] family of clusters with the property that any two clusters in the family are either nested or [[disjoint set|disjoint]].
Alternatively, a hierarchical clustering may be represented as a [[binary tree]] with the points at its leaves; the clusters of the clustering are the sets of points in subtrees descending from each node of the tree.
Line 11 ⟶ 12:
In agglomerative clustering methods, the input also includes a distance function defined on the points, or a numerical measure of their dissimilarity that is symmetric (insensitive to the ordering within each pair of points) but (unlike a distance) may not satisfy the triangle inequality. Depending on the method, this dissimilarity function can be extended in several different ways to pairs of clusters; for instance, in the [[single-linkage clustering]] method, the distance between two clusters is defined to be the minimum distance between any two points from each cluster. Given this distance between clusters, a hierarchical clustering may be defined by a [[greedy algorithm]] that initially places each point in its own single-point cluster and then repeatedly merges the [[closest pair]] of clusters.<ref name="murtagh-tcj"/>
| last = Eppstein | first = David | authorlink = David Eppstein
| arxiv = cs.DS/9912014
Line 31 ⟶ 32:
| url = http://www.cs.duke.edu/~edels/Papers/1984-J-05-HierarchicalClustering.pdf
| volume = 1
| year = 1984}}.</ref> The nearest-neighbor chain algorithm uses a smaller amount of time and space than the greedy algorithm by merging pairs of clusters in a different order.
==The algorithm==
|