Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
m grammar
Tags: Reverted Visual edit
Revert old vandalism
Line 1:
{{good article}}
{{Short description|Yuvraj Singh is an Indian manister and writers}}{{good article}}
In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is an [[algorithm]] that can speed up several methods for [[agglomerative hierarchical clustering]]. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include [[Ward's method]], [[complete-linkage clustering]], and [[single-linkage clustering]]; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called ''reducible'' and are characterized by a simple inequality among certain cluster distances.
 
Line 92:
 
==Time and space analysis==
Each iteration of the loop performs a single search for the nearest neighbor of a cluster, and either adds one cluster to the stack or removes two clusters from it. EachEvery cluster is only ever added once to the stack, because when it is removed again it is immediately made inactive and merged. There are a total of {{math|2''n'' &minus; 2}} clusters that ever get added to the stack: {{math|''n''}} single-point clusters in the initial set, and {{math|''n'' &minus; 2}} internal nodes other than the root in the binary tree representing the clustering. Therefore, the algorithm performs {{math|2''n'' &minus; 2}} pushing iterations and {{math|''n'' &minus; 1}} popping iterations.<ref name="murtagh-tcj"/>
 
Each of these iterations may spend time scanning as many as {{math|''n'' &minus; 1}} inter-cluster distances to find the nearest neighbor.
Line 202:
 
===Distances sensitive to merge order===
The above presentation explicitly disallowed distances sensitive to merge order. Indeed, allowing such distances can cause problems. In particular, there exist order-sensitive cluster distances which satisfy reducibility, but for which the above algorithm will return a hierarchy with suboptimal costs. Therefore, when cluster distances are defined by a recursive formula (as some of the ones discussed above are), care must be taken so that they do not use the hierarchy in a way which is sensitive to merge order.<ref>{{citation
| last=Müllner
| first=Daniel