Nearest-neighbor chain algorithm: Difference between revisions

Content deleted Content added
m changed the width field on the image to reflect the new version's size.
m Complete linkage and average distance: more consistent math formatting
Line 147:
 
===Complete linkage and average distance===
[[Complete-linkage clustering|Complete-linkage]] or furthest-neighbor clustering is a form of agglomerative clustering that uses the maximum distance between any two points from the two clusters as the dissimilarity, and similarly average-distance clustering uses the average pairwise distance. Like Ward's distance, these forms of clustering obey a formula of Lance-Williams type: in complete linkage, the distance {{<math|>d(A\cup B,C)}}</math> is the average of the distances <math>d(A,C)</math> and <math>d(B,C)</math> plus a positive correction term, while for average distance it is just a weighted average of the distances <math>d(A,C)</math> and <math>d(B,C)</math>.<ref name="mirkin"/><ref name="lance-williams"/> Thus, in both of these cases, the distance is reducible.
 
Unlike Ward's method, these two forms of clustering do not have a constant-time method for computing distances between pairs of clusters. Instead it is possible to maintain an array of distances between all pairs of clusters, using the Lance–Williams formula to update the array as pairs of clusters are merged, in time and space {{math|''O''(''n''<sup>2</sup>)}}. The nearest-neighbor chain algorithm may be used in conjunction with this array of distances to find the same clustering as the greedy algorithm for these cases in total time and space {{math|''O''(''n''<sup>2</sup>)}}. The same {{math|''O''(''n''<sup>2</sup>)}} time and space bounds can also be achieved by a different and more general technique that overlays a [[quadtree]]-based priority queue data structure on top of the distance matrix and uses it to perform the standard greedy clustering algorithm, avoiding the need for reducibility,<ref name="e-jea"/> but the nearest-neighbor chain algorithm matches its time and space bounds while using simpler data structures.<ref>{{citation