Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
Line 142:
 
===Complete linkage and average distance===
[[Complete-linkage clustering|Complete-linkage]] or furthest-neighbor clustering is a form of agglomerative clustering that defines the dissimilarity between clusters to be the maximum distance between any two points from the two clusters. Similarly, average-distance clustering uses the average pairwise distance as the dissimilarity. Like Ward's distance, these two forms of clustering obey a formula of Lance-Williams type. In complete linkage, the distance <math>d(A\cup B,C)</math> is greater than the average of the distances <math>d(A,C)</math> and <math>d(B,C)</math>. Therefore, it can be represented as the average plus a positive correction term. For average distance, <math>d(A\cup B,C)</math> is just a weighted average of the distances <math>d(A,C)</math> and <math>d(B,C)</math>.<ref name="mirkin"/><ref name="lance-williams"/> Thus, in both of these cases, the distance is reducible.
 
Unlike Ward's method, these two forms of clustering do not have a constant-time method for computing distances between pairs of clusters. Instead it is possible to maintain an array of distances between all pairs of clusters, using the Lance–Williams formula to update the array as pairs of clusters are merged. Maintaining this array over the course of the clustering algorithm takes time and space {{math|''O''(''n''<sup>2</sup>)}}. The nearest-neighbor chain algorithm may be used in conjunction with this array of distances to find the same clustering as the greedy algorithm for these cases. Its total time and space, using this array, is also {{math|''O''(''n''<sup>2</sup>)}}.<ref name="gm07"/>