Content deleted Content added
increasing link information, agglomerative vs divisive hierarchical clustering |
expand lead; split off history |
||
Line 1:
In the theory of [[cluster analysis]], the '''nearest-neighbor chain algorithm''' is a method that can be used to perform several types of [[agglomerative hierarchical clustering]],
| last = Murtagh | first = Fionn▼
| editor1-last = Abello | editor1-first = James M.▼
| editor2-last = Pardalos | editor2-first = Panos M.▼
The nearest-neighbor chain algorithm uses an amount of memory that is linear in the number of points to be clustered, and determines a clustering in time quadratic in the number of points (linear in the input size, when the input is given as an explicit distance matrix).
| editor3-last = Resende | editor3-first = Mauricio G. C.▼
| contribution = Clustering in massive data sets▼
| isbn = 978-1-4020-0489-6▼
| pages = 513–516▼
| publisher = Springer▼
| series = Massive Computing▼
| title = Handbook of massive data sets▼
| url = http://books.google.com/books?id=_VI0LITp3ecC&pg=PA513▼
| volume = 4▼
▲ | year = 2002}}.</ref> The main idea of the algorithm is to find pairs of clusters to merge by following paths in the [[nearest neighbor graph]] of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri<ref>{{citation
| last = Benzécri | first = J.-P.▼
| issue = 2▼
| journal = Les Cahiers de l'Analyse des Données▼
| pages = 209–218▼
| title = Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques▼
| url = http://www.numdam.org/item?id=CAD_1982__7_2_209_0▼
| volume = 7▼
| year = 1982}}.</ref> and J. Juan,<ref>{{citation▼
| last = Juan | first = J.▼
| issue = 2▼
| journal = Les Cahiers de l'Analyse des Données▼
| pages = 219–225▼
| title = Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques▼
| url = http://www.numdam.org/item?id=CAD_1982__7_2_219_0▼
| volume = 7▼
| year = 1982}}.</ref> based on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.<ref name="b77"/><ref>{{citation▼
| last = de Rham | first = C.▼
| issue = 2▼
| journal = Les Cahiers de l'Analyse des Données▼
| pages = 135–144▼
| title = La classification hiérarchique ascendante selon la méthode des voisins réciproques▼
| url = http://www.numdam.org/item?id=CAD_1980__5_2_135_0▼
| volume = 5▼
| year = 1980}}.</ref>▼
==Background==
Line 77 ⟶ 44:
| doi = 10.1093/comjnl/26.4.354}}.</ref>
More formally, the algorithm performs the following steps:<ref name="murtagh-
▲ | last = Murtagh | first = Fionn
▲ | editor1-last = Abello | editor1-first = James M.
▲ | editor2-last = Pardalos | editor2-first = Panos M.
▲ | editor3-last = Resende | editor3-first = Mauricio G. C.
▲ | contribution = Clustering in massive data sets
▲ | isbn = 978-1-4020-0489-6
▲ | pages = 513–516
▲ | publisher = Springer
▲ | series = Massive Computing
▲ | title = Handbook of massive data sets
▲ | url = http://books.google.com/books?id=_VI0LITp3ecC&pg=PA513
▲ | volume = 4
| year = 2002}}.</ref>
*Initialize the set of active clusters to consist of {{mvar|n}} one-point clusters, one for each input point.
*Let {{mvar|S}} be a [[Stack (data structure)|stack data structure]], initially empty, the elements of which will be active clusters.
Line 190 ⟶ 170:
| year=2011 }}.</ref> Following the earlier discussion of the value of defining cluster distances recursively (so that [[memoization]] can be used
to greatly speed up distance computations), care must be taken with recursively defined distances so that they are not using the hierarchy in a way which is sensitive to merge order.
==History==
The algorithm was developed and implemented in 1982 by J. P. Benzécri<ref>{{citation
▲ | last = Benzécri | first = J.-P.
▲ | issue = 2
▲ | journal = Les Cahiers de l'Analyse des Données
▲ | pages = 209–218
▲ | title = Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques
▲ | url = http://www.numdam.org/item?id=CAD_1982__7_2_209_0
▲ | volume = 7
▲ | last = Juan | first = J.
▲ | issue = 2
▲ | journal = Les Cahiers de l'Analyse des Données
▲ | pages = 219–225
▲ | title = Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques
▲ | url = http://www.numdam.org/item?id=CAD_1982__7_2_219_0
▲ | volume = 7
▲ | year = 1982}}.</ref> They based this algorithm on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.<ref name="b77"/><ref>{{citation
▲ | last = de Rham | first = C.
▲ | issue = 2
▲ | journal = Les Cahiers de l'Analyse des Données
▲ | pages = 135–144
▲ | title = La classification hiérarchique ascendante selon la méthode des voisins réciproques
▲ | url = http://www.numdam.org/item?id=CAD_1980__5_2_135_0
▲ | volume = 5
▲ | year = 1980}}.</ref>
==References==
|