Revision as of 23:51, 29 October 2016 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,864 edits Explain Lance–Williams nomenclature ← Previous edit		Revision as of 23:51, 29 October 2016 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,864 edits →Correctness: again, use Murtagh as source Next edit →
Line 87: For the algorithm to be correct, it must be the case that popping and merging the top two clusters from the algorithm's stack preserves the property that the remaining clusters on the stack form a chain of nearest neighbors. Additionally, it should be the case that all of the clusters produced during the algorithm are the same as the clusters produced by a [[greedy algorithm]] that always merges the closest two clusters, even though the greedy algorithm will in general perform its merges in a different order than the nearest-neighbor chain algorithm. Both of these properties depend on the specific choice of how to measure the distance between clusters.<ref name="murtagh-tcj"/> The correctness of this algorithm relies on a property of its distance function called ''reducibility'', identified by {{harvtxt\|Bruynooghe\|1977}} in connection with an earlier clustering method that used mutual nearest neighbor pairs but not chains of nearest neighbors.<ref name="b77">{{citation\|first=Michel\|last=Bruynooghe\|title=Méthodes nouvelles en classification automatique de données taxinomiqes nombreuses\|journal=Statistique et Analyse des Données\|volume=3\|pages=24–42\|year=1977\|url=http://www.numdam.org/item?id=SAD_1977__2_3_24_0}}.</ref> A distance function {{mvar\|d}} on clusters is defined to be reducible if, for every three clusters {{mvar\|A}}, {{mvar\|B}} and {{mvar\|C}} in the greedy hierarchical clustering such that {{mvar\|A}} and {{mvar\|B}} are mutual nearest neighbors, the following inequality holds:<ref name="murtagh-tcj"/>

Nearest-neighbor chain algorithm: Difference between revisions