Nearest-neighbor chain algorithm: Difference between revisions

[[File:Hierarchical clustering diagram.png|thumb|upright=1.35|A hierarchical clustering of six points. The points to be clustered are at the top of the diagram, and the nodes below them represent clusters.]]
The input to a clustering problem consists of a set of points.<ref name="murtagh-tcj"/> A ''cluster'' is any proper subset of the points, and a hierarchical clustering is a [[maximal element|maximal]] family of clusters with the property that any two clusters in the family are either nested or [[disjoint set|disjoint]].
Alternatively, a hierarchical clustering may be represented as a [[binary tree]] with the points at its leaves; the clusters of the clustering are the sets of points in subtrees descending from each node of the tree.<ref>{{citation|title=Clustering|volume=10|series=IEEE Press Series on Computational Intelligence|first1=Rui|last1=Xu|first2=Don|last2=Wunsch|publisher=John Wiley & Sons|year=2008|isbn=978-0-470-38278-3|page=31|contribution-url=https://books.google.com/books?id=kYC3YCyl_tkC&pg=PA31|contribution=3.1 Hierarchical Clustering: Introduction}}.</ref>
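The binary-tree view and the nested-or-disjoint property can be illustrated with a small sketch (the tree and helper function here are hypothetical, not from any particular source): each subtree's leaf set is a cluster, and any two such sets are either nested or disjoint.

```python
def clusters(node):
    """Return (leaf set of node, list of all subtree leaf sets).

    A leaf is any non-tuple value; an internal node is a 2-tuple of children.
    """
    if not isinstance(node, tuple):
        return {node}, []
    left_leaves, left_family = clusters(node[0])
    right_leaves, right_family = clusters(node[1])
    leaves = left_leaves | right_leaves
    return leaves, left_family + right_family + [leaves]

# A hierarchical clustering of six points, stored as a binary tree.
tree = ((("a", "b"), "c"), (("d", "e"), "f"))
_, family = clusters(tree)
# Every pair of clusters in the family is nested or disjoint:
assert all(a <= b or b <= a or not (a & b) for a in family for b in family)
```

Note that the family here includes the root's leaf set (all six points); under the article's definition only the proper subsets count as clusters, but the nested-or-disjoint property holds either way.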
 
In agglomerative clustering methods, the input also includes a distance function defined on the points, or a numerical measure of their dissimilarity.
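As a sketch of how such a dissimilarity measure drives one agglomerative step, the following hypothetical example uses points in the plane with single-linkage dissimilarity (minimum pairwise distance) and merges the closest pair of clusters; the cluster layout and function names are illustrative assumptions, not part of the algorithm described here.

```python
from math import dist
from itertools import combinations

def single_linkage(A, B):
    """Dissimilarity of two clusters: minimum distance over cross pairs."""
    return min(dist(a, b) for a in A for b in B)

clusters = [[(0, 0)], [(0, 1)], [(5, 5)]]
# One greedy agglomerative step: merge the two closest clusters.
A, B = min(combinations(clusters, 2), key=lambda ab: single_linkage(*ab))
clusters.remove(A)
clusters.remove(B)
clusters.append(A + B)
# → [[(5, 5)], [(0, 0), (0, 1)]]
```

Repeating this step until one cluster remains produces the binary merge tree; the nearest-neighbor chain algorithm is a way of organizing these merges more efficiently than re-scanning all pairs each time.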
 
==Application to specific clustering distances==
 
===Ward's method===
[[Ward's method]] is an agglomerative clustering method in which the dissimilarity between two clusters {{mvar|A}} and {{mvar|B}} is measured by the amount by which merging the two clusters into a single larger cluster would increase the average squared distance of a point to its cluster [[centroid]].<ref name="mirkin">{{citation