Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
m Time and space analysis: missing period
Correctness: add a more general overview paragraph
Line 79:
 
==Correctness==
For the algorithm to be correct, it must be the case that popping and merging the top two clusters from the algorithm's stack preserves the property that the remaining clusters on the stack form a chain of nearest neighbors.
Additionally, it should be the case that all of the clusters produced during the algorithm are the same as the clusters produced by a [[greedy algorithm]] that always merges the closest two clusters, even though the greedy algorithm
will in general perform its merges in a different order than the nearest-neighbor chain algorithm. Both of these properties depend on the specific choice of how to measure the distance between clusters.
 
The correctness of this algorithm relies on a property of its distance function called ''reducibility'', identified by {{harvtxt|Bruynooghe|1977}} in connection with an earlier clustering method that used mutual nearest neighbor pairs but not chains of nearest neighbors.<ref name="b77">{{citation|first=Michel|last=Bruynooghe|title=Méthodes nouvelles en classification automatique de données taxinomiqes nombreuses|journal=Statistique et Analyse des Données|volume=3|pages=24–42|year=1977|url=http://www.numdam.org/item?id=SAD_1977__2_3_24_0}}.</ref> A distance function {{mvar|d}} on clusters is defined to be reducible if, for every three clusters {{mvar|A}}, {{mvar|B}} and {{mvar|C}} in the greedy hierarchical clustering such that {{mvar|A}} and {{mvar|B}} are mutual nearest neighbors, the following inequality holds:
:{{math|''d''(''A'' ∪ ''B'', ''C'') ≥ min(d(''A'',''C''), d(''B'',''C''))}}.