Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
Line 144:
[[Complete-linkage clustering|Complete-linkage]] or furthest-neighbor clustering is a form of agglomerative clustering that defines the dissimilarity between clusters to be the maximum distance between any two points from the two clusters. Similarly, average-distance clustering uses the average pairwise distance as the dissimilarity. Like Ward's distance, these two forms of clustering obey a formula of Lance-Williams type. In complete linkage, the distance <math>d(A\cup B,C)</math> is the maximum of the two distances <math>d(A,C)</math> and <math>d(B,C)</math>. Therefore, it is at least equal to the minimum of these two distances, the requirement for being reducible. For average distance, <math>d(A\cup B,C)</math> is just a weighted average of the distances <math>d(A,C)</math> and <math>d(B,C)</math>. Again, this is at least as large as the minimum of the two distances. Thus, in both of these cases, the distance is reducible.<ref name="mirkin"/><ref name="lance-williams"/>
 
Unlike Ward's method, these two forms of clustering do not have a constant-time method for computing distances between pairs of clusters. Instead it is possible to maintain an array of distances between all pairs of clusters,. usingWhenever thetwo Lance–Williamsclusters are merged, the formula can be used to updatecompute the arraydistance asbetween pairsthe ofmerged clusterscluster areand mergedall other clusters. Maintaining this array over the course of the clustering algorithm takes time and space {{math|''O''(''n''<sup>2</sup>)}}. The nearest-neighbor chain algorithm may be used in conjunction with this array of distances to find the same clustering as the greedy algorithm for these cases. Its total time and space, using this array, is also {{math|''O''(''n''<sup>2</sup>)}}.<ref name="gm07"/>
 
The same {{math|''O''(''n''<sup>2</sup>)}} time and space bounds can also be achieved in a different way,