Revision as of 11:13, 14 May 2016 edit Fmadd (talk \| contribs) Extended confirmed users 10,468 edits increasing link information, agglomerative vs divisive hierarchical clustering ← Previous edit		Revision as of 18:26, 14 May 2016 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,864 edits expand lead; split off history Next edit →
Line 1: In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is a method that can be used to perform several types of [[agglomerative hierarchical clustering]], ~~using~~in anwhich ~~amount~~a ofhierarchy ~~memory~~of ~~that~~clusters is ~~linear~~created inby ~~the~~repeatedly ~~number~~merging pairs of ~~points~~smaller clusters to form larger clusters. In particular it can be ~~clustered~~used for [[Ward's method]], [[complete-linkage clustering]], and an[[single-linkage ~~amount~~clustering]], ofwhich ~~time~~all ~~linear~~work inby merging the ~~number~~closest oftwo ~~distinct~~clusters ~~distances~~under ~~between~~different ~~pairs~~definitions of ~~points.<ref~~the ~~name="murtagh-hmds">{{citation~~distance between clusters. \| last = Murtagh \| first = Fionn▼ ~~\| year = 2002}}.</ref>~~ The main idea of the algorithm is to find pairs of clusters to merge by following paths in the [[nearest neighbor graph]] of the clusters until the paths terminate in pairs of mutual nearest neighbors. ~~The~~That ~~algorithm~~pair ~~was~~of ~~developed~~clusters ~~and~~is ~~implemented~~then inchosen ~~1982~~as the pair to merge by J.the Palgorithm. ~~Benzécri<ref>{{citation~~The resulting system of merges occurs in a different order than in a naive implementation of the same clustering algorithms, but can be shown to generate the same hierarchy of clusters.▼ \| editor1-last = Abello \| editor1-first = James M.▼ \| editor2-last = Pardalos \| editor2-first = Panos M.▼ The nearest-neighbor chain algorithm uses an amount of memory that is linear in the number of points to be clustered, and determines a clustering in time quadratic in the number of points (linear in the input size, when the input is given as an explicit distance matrix). \| editor3-last = Resende \| editor3-first = Mauricio G. C.▼ \| contribution = Clustering in massive data sets▼ \| isbn = 978-1-4020-0489-6▼ \| pages = 513–516▼ \| publisher = Springer▼ \| series = Massive Computing▼ \| title = Handbook of massive data sets▼ \| url = http://books.google.com/books?id=_VI0LITp3ecC&pg=PA513▼ \| volume = 4▼ ▲ \| year = 2002}}.</ref> The main idea of the algorithm is to find pairs of clusters to merge by following paths in the [[nearest neighbor graph]] of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri<ref>{{citation \| last = Benzécri \| first = J.-P.▼ \| issue = 2▼ \| journal = Les Cahiers de l'Analyse des Données▼ \| pages = 209–218▼ \| title = Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques▼ \| url = http://www.numdam.org/item?id=CAD_1982__7_2_209_0▼ \| volume = 7▼ \| year = 1982}}.</ref> and J. Juan,<ref>{{citation▼ \| last = Juan \| first = J.▼ \| issue = 2▼ \| journal = Les Cahiers de l'Analyse des Données▼ \| pages = 219–225▼ \| title = Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques▼ \| url = http://www.numdam.org/item?id=CAD_1982__7_2_219_0▼ \| volume = 7▼ \| year = 1982}}.</ref> based on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.<ref name="b77"/><ref>{{citation▼ \| last = de Rham \| first = C.▼ \| issue = 2▼ \| journal = Les Cahiers de l'Analyse des Données▼ \| pages = 135–144▼ \| title = La classification hiérarchique ascendante selon la méthode des voisins réciproques▼ \| url = http://www.numdam.org/item?id=CAD_1980__5_2_135_0▼ \| volume = 5▼ \| year = 1980}}.</ref>▼ ==Background== Line 77 ⟶ 44: \| doi = 10.1093/comjnl/26.4.354}}.</ref> More formally, the algorithm performs the following steps:<ref name="murtagh-~~hmds~~tcj"/><ref name="murtagh-~~tcj~~hmds"/>{{citation ▲ \| last = Murtagh \| first = Fionn ▲ \| editor1-last = Abello \| editor1-first = James M. ▲ \| editor2-last = Pardalos \| editor2-first = Panos M. ▲ \| editor3-last = Resende \| editor3-first = Mauricio G. C. ▲ \| contribution = Clustering in massive data sets ▲ \| isbn = 978-1-4020-0489-6 ▲ \| pages = 513–516 ▲ \| publisher = Springer ▲ \| series = Massive Computing ▲ \| title = Handbook of massive data sets ▲ \| url = http://books.google.com/books?id=_VI0LITp3ecC&pg=PA513 ▲ \| volume = 4 \| year = 2002}}.</ref> Initialize the set of active clusters to consist of {{mvar\|n}} one-point clusters, one for each input point. Let {{mvar\|S}} be a [[Stack (data structure)\|stack data structure]], initially empty, the elements of which will be active clusters. Line 190 ⟶ 170: \| year=2011 }}.</ref> Following the earlier discussion of the value of defining cluster distances recursively (so that [[memoization]] can be used to greatly speed up distance computations), care must be taken with recursively defined distances so that they are not using the hierarchy in a way which is sensitive to merge order. ==History== The algorithm was developed and implemented in 1982 by J. P. Benzécri<ref>{{citation ▲ \| last = Benzécri \| first = J.-P. ▲ \| issue = 2 ▲ \| journal = Les Cahiers de l'Analyse des Données ▲ \| pages = 209–218 ▲ \| title = Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques ▲ \| url = http://www.numdam.org/item?id=CAD_1982__7_2_209_0 ▲ \| volume = 7 ▲ \| year = 1982}}.</ref> and J. Juan,.<ref>{{citation ▲ \| last = Juan \| first = J. ▲ \| issue = 2 ▲ \| journal = Les Cahiers de l'Analyse des Données ▲ \| pages = 219–225 ▲ \| title = Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques ▲ \| url = http://www.numdam.org/item?id=CAD_1982__7_2_219_0 ▲ \| volume = 7 ▲ \| year = 1982}}.</ref> They based this algorithm on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.<ref name="b77"/><ref>{{citation ▲ \| last = de Rham \| first = C. ▲ \| issue = 2 ▲ \| journal = Les Cahiers de l'Analyse des Données ▲ \| pages = 135–144 ▲ \| title = La classification hiérarchique ascendante selon la méthode des voisins réciproques ▲ \| url = http://www.numdam.org/item?id=CAD_1980__5_2_135_0 ▲ \| volume = 5 ▲ \| year = 1980}}.</ref> ==References==

Nearest-neighbor chain algorithm: Difference between revisions