Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
added animation, notes on order-sensitive merge costs, and some typo fixes
Line 67:
 
==The algorithm==
[[File:Nearest-neighbor chain algorithm animated.gif|frame|323px|alt=Animated execution of Nearest-neighbor chain algorithm|Animation of the algorithm using Ward's distance. Black dots are points, grey regions are larger clusters, blue arrows point to nearest neighbors, and the red bar indicates the current chain. For visual simplicity, when a merge empties the chain, the animation continues the chain from the most recently merged cluster.]]
Intuitively, the nearest neighbor chain algorithm repeatedly follows a chain of clusters {{math|''A'' → ''B'' → ''C'' → ...}} where each cluster is the [[nearest neighbor]] of the previous one, until reaching a pair of clusters that are mutual nearest neighbors.<ref name="murtagh-tcj">{{citation
| last = Murtagh | first = Fionn
Line 93 ⟶ 94:
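The chain-following procedure described above can be expressed compactly in code. The following is an illustrative Python sketch (not taken from the article or from any particular implementation); single linkage serves as an example of a reducible distance, and ties are broken in favor of the previous chain element so that mutual nearest neighbors are detected:

```python
def nn_chain(clusters, dist):
    """Sketch of the nearest-neighbor chain algorithm: follow a chain of
    nearest neighbors until two clusters are mutual nearest neighbors,
    merge them, and repeat. Returns the merges in the order performed."""
    clusters = [frozenset(c) for c in clusters]
    chain = []   # stack: each cluster is the nearest neighbor of the one below it
    merges = []
    while len(clusters) > 1:
        if not chain:
            chain.append(clusters[0])
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest neighbor of the top of the chain, preferring the previous
        # chain element on ties so mutual nearest neighbors are detected.
        nearest = min((c for c in clusters if c != top),
                      key=lambda c: (dist(top, c), c != prev))
        if nearest == prev:
            # Mutual nearest neighbors: merge them and pop both from the chain.
            chain.pop()
            chain.pop()
            clusters.remove(top)
            clusters.remove(nearest)
            clusters.append(top | nearest)
            merges.append((top, nearest))
        else:
            chain.append(nearest)
    return merges

def single_linkage(a, b):
    """Single-linkage distance between clusters of numbers (reducible)."""
    return min(abs(p - q) for p in a for q in b)

merges = nn_chain([{0}, {1}, {3}, {7}], single_linkage)
```

On this input the chain first finds {0} and {1} as mutual nearest neighbors, merges them, and continues until a single cluster remains after three merges.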
 
==Correctness==
The correctness of this algorithm relies on a property of its distance function called ''reducibility'', identified by {{harvtxt|Bruynooghe|1977}} in connection with an earlier clustering method that used mutual nearest neighbor pairs but not chains of nearest neighbors.<ref name="b77">{{citation|first=Michel|last=Bruynooghe|title=Méthodes nouvelles en classification automatique de données taxinomiques nombreuses|journal=Statistique et Analyse des Données|volume=3|pages=24–42|year=1977|url=http://www.numdam.org/item?id=SAD_1977__2_3_24_0}}.</ref> A distance function {{mvar|d}} on clusters is defined to be reducible if, for every three clusters {{mvar|A}}, {{mvar|B}} and {{mvar|C}} in the greedy hierarchical clustering such that {{mvar|A}} and {{mvar|B}} are mutual nearest neighbors, the following inequality holds:
:{{math|''d''(''A'' ∪ ''B'', ''C'') ≥ min(''d''(''A'',''C''), ''d''(''B'',''C''))}}.
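Complete linkage, for example, satisfies this inequality for ''every'' triple of clusters (not only mutual nearest neighbors), since the distance from the merged cluster equals the larger of the two original distances. A minimal numeric check, as an illustrative sketch not taken from the article:

```python
import itertools
import random

def complete_linkage(a, b):
    """Complete-linkage distance: maximum pairwise distance between clusters."""
    return max(abs(p - q) for p in a for q in b)

def is_reducible_on(clusters, dist):
    """Check d(A ∪ B, C) >= min(d(A, C), d(B, C)) for all ordered triples."""
    for a, b, c in itertools.permutations(clusters, 3):
        merged = a + b
        if dist(merged, c) < min(dist(a, c), dist(b, c)) - 1e-12:
            return False
    return True

random.seed(0)
# Five small clusters of random 1-D points.
clusters = [[random.random() for _ in range(3)] for _ in range(5)]
ok = is_reducible_on(clusters, complete_linkage)  # True: complete linkage is reducible
```

The check passes for any point set, because {{math|max(''d''(''A'',''C''), ''d''(''B'',''C''))}} can never be smaller than the minimum of the same two values.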
 
Line 128 ⟶ 129:
| series = Wiley Series in Computational Statistics
| title = Data Mining and Statistics for Decision Making
| year = 2011}}.</ref> Alternatively, this distance can be seen as the difference in [[:en:k-means clustering|k-means cost]] between the new cluster and the two old clusters.
 
Ward's distance is also reducible, as can be seen more easily from a different formula of Lance–Williams type for calculating the distance of a merged cluster from the distances of the clusters it was merged from:<ref name="mirkin"/><ref name="lance-williams">{{citation
Line 180 ⟶ 181:
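The Lance–Williams-type recurrence for Ward's distance can be checked numerically. The following is a minimal sketch (illustrative, not from the article), assuming the squared-Euclidean form of Ward's distance, {{math|''d''(''A'',''B'') {{=}} {{sfrac|{{abs|''A''}}{{abs|''B''}}|{{abs|''A''}} + {{abs|''B''}}}} ‖''c<sub>A</sub>'' − ''c<sub>B</sub>''‖<sup>2</sup>}}, and the standard Lance–Williams update for Ward's method:

```python
def ward(a, b):
    """Ward's distance for clusters of 1-D points:
    (|A||B| / (|A| + |B|)) * (difference of centroids)^2."""
    ca, cb = sum(a) / len(a), sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (ca - cb) ** 2

def lance_williams_ward(d_ac, d_bc, d_ab, na, nb, nc):
    """Lance-Williams update for Ward's method: the distance from the merged
    cluster A ∪ B to C, computed from pairwise distances and sizes only."""
    return ((na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab) / (na + nb + nc)

A, B, C = [0.0], [2.0], [5.0]
direct = ward(A + B, C)   # 32/3, computed from the points themselves
updated = lance_williams_ward(ward(A, C), ward(B, C), ward(A, B),
                              len(A), len(B), len(C))
```

Note that the update is symmetric in {{mvar|A}} and {{mvar|B}}, so the resulting distance does not depend on the order in which the two clusters were merged.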
===Centroid distance===
Another distance measure commonly used in agglomerative clustering is the distance between the centroids of pairs of clusters, also known as the weighted group method.<ref name="mirkin"/><ref name="lance-williams"/> It can be calculated easily in constant time per distance calculation. However, it is not reducible: for instance, if the input forms the set of three points of a unit equilateral triangle, merging two of these points places the centroid of the merged cluster at the midpoint of their shared edge, at distance {{math|√3/2 ≈ 0.866}} from the third point, smaller than the unit distance before the merge, a violation of reducibility. Therefore, the nearest-neighbor chain algorithm will not necessarily find the same clustering as the greedy algorithm. A different algorithm by Day and Edelsbrunner can be used to find the clustering in {{math|''O''(''n''<sup>2</sup>)}} time for this distance measure.<ref name="day-edels"/>
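The equilateral-triangle counterexample is easy to verify directly. A minimal sketch (illustrative code, not from the article):

```python
import math

def centroid(points):
    """Centroid of a cluster of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def centroid_distance(a, b):
    """Distance between the centroids of two clusters."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return math.hypot(ax - bx, ay - by)

# Vertices of a unit equilateral triangle: every pair is at distance 1.
A = [(0.0, 0.0)]
B = [(1.0, 0.0)]
C = [(0.5, math.sqrt(3) / 2)]

before = min(centroid_distance(A, C), centroid_distance(B, C))  # 1.0
after = centroid_distance(A + B, C)  # sqrt(3)/2 ≈ 0.866 < 1.0
```

Since the distance from the merged cluster to {{mvar|C}} drops below both original distances, reducibility fails for the centroid distance.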
 
===Distances sensitive to merge order===
The above presentation explicitly disallowed distances sensitive to merge order; indeed, allowing such distances can cause problems. In particular, there exist order-sensitive cluster distances satisfying reducibility for which the above algorithm nevertheless returns a hierarchy with suboptimal costs.<ref>{{cite arxiv
| last= Müllner
| first=Daniel
| eprint=1109.2378v1
| title=Modern hierarchical, agglomerative clustering algorithms
| class=stat.ML
| year=2011 }}.</ref> As discussed earlier, cluster distances may be defined recursively, with earlier distance computations memoized to speed up later ones; with such definitions, care must be taken that the recursion does not use the hierarchy in a way that is sensitive to merge order.
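As a toy illustration of the pitfall (a hypothetical distance invented for this example, not from the cited sources), consider a memoized recursive definition that weights the first-listed child of a merged cluster more heavily than the second. The same pair of clusters then receives different distances depending on the order in which they were merged:

```python
def order_sensitive_distance(cluster, other, memo):
    """Toy recursive cluster distance that is *sensitive to merge order*:
    a merged cluster (first, second) weights its first child double.
    Leaf clusters are numbers; merged clusters are (first, second) pairs."""
    key = (cluster, other)
    if key not in memo:
        if isinstance(cluster, tuple):
            first, second = cluster
            memo[key] = (2 * order_sensitive_distance(first, other, memo)
                         + order_sensitive_distance(second, other, memo)) / 3
        else:
            memo[key] = abs(cluster - other)
    return memo[key]

memo = {}
ab = order_sensitive_distance((0, 3), 9, memo)  # (2*9 + 6) / 3 = 8.0
ba = order_sensitive_distance((3, 0), 9, memo)  # (2*6 + 9) / 3 = 7.0
```

Here {{math|(0, 3)}} and {{math|(3, 0)}} represent the same merged cluster, yet the memoized distances to the third cluster differ, so a hierarchy built with such a definition depends on merge order.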
 
==References==