{{Machine learning|Clustering}}
In [[data mining]] and [[statistics]], '''hierarchical clustering'''<ref name="Nielsen2016">{{cite book |last=Nielsen |first=Frank |year=2016 |title=Introduction to HPC with MPI for Data Science |publisher=Springer |chapter=8. Hierarchical Clustering |url=https://www.springer.com/gp/book/9783319219028 |chapter-url=https://www.researchgate.net/publication/314700681 }}</ref> (also called '''hierarchical cluster analysis''' or '''HCA''') is a method of [[cluster analysis]] that seeks to build a [[hierarchy]] of clusters. Strategies for hierarchical clustering generally fall into two categories:
* '''Agglomerative''': Agglomerative clustering, often referred to as a "bottom-up" approach, begins with each data point as an individual cluster. At each step, the algorithm merges the two most similar clusters based on a chosen distance metric (e.g., Euclidean distance) and linkage criterion (e.g., single-linkage, complete-linkage).<ref name=":4">{{Cite journal |last1=Murtagh |first1=Fionn |last2=Contreras |first2=Pedro |date=2012 |title=Algorithms for hierarchical clustering: an overview |url=https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.53 |journal=WIREs Data Mining and Knowledge Discovery |language=en |volume=2 |issue=1 |pages=86–97 |doi=10.1002/widm.53 |issn=1942-4795|url-access=subscription }}</ref> This process continues until all data points are combined into a single cluster or a stopping criterion is met. Agglomerative methods are more commonly used due to their simplicity and computational efficiency for small to medium-sized datasets.<ref>{{Cite journal |last=Mojena |first=R. |date=1977-04-01 |title=Hierarchical grouping methods and stopping rules: an evaluation |url=https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/20.4.359 |journal=The Computer Journal |language=en |volume=20 |issue=4 |pages=359–363 |doi=10.1093/comjnl/20.4.359 |issn=0010-4620}}</ref>
* '''Divisive''': Divisive clustering, known as a "top-down" approach, starts with all data points in a single cluster and recursively splits the cluster into smaller ones. At each step, the algorithm selects a cluster and divides it into two or more subsets, often using a criterion such as maximizing the distance between resulting clusters. Divisive methods are less common but can be useful when the goal is to identify large, distinct clusters first.
In general, the merges and splits are determined in a [[greedy algorithm|greedy]] manner. The results of hierarchical clustering<ref name="Nielsen2016" /> are usually presented in a [[dendrogram]].
Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a [[distance matrix|matrix of distances]]. On the other hand, except for the special case of single linkage, none of the algorithms (aside from an exhaustive search in <math>\mathcal{O}(2^n)</math>) can be guaranteed to find the optimum solution.{{cn|date=October 2024}}
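For illustration, an agglomerative ("bottom-up") clustering can be computed with the SciPy library, which needs only the pairwise distances between observations. The data points and the choice of average linkage in the following sketch are arbitrary illustrations rather than part of any referenced method.

<syntaxhighlight lang="python">
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy data: two loose groups of points in the plane (illustrative values only).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

# Only the matrix of pairwise distances is needed; here the Euclidean distance.
D = pdist(X, metric="euclidean")

# Bottom-up merging with average linkage; Z is a merge table encoding the dendrogram.
Z = linkage(D, method="average")

# Cut the hierarchy into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]
</syntaxhighlight>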
== Complexity ==
The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a [[time complexity]] of <math>\mathcal{O}(n^3)</math> and requires <math>\Omega(n^2)</math> memory, which makes it too slow for even medium data sets. However, for some special cases, optimally efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: '''SLINK'''<!--boldface per WP:R#PLA--><ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30–34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30| doi-access=free }}</ref> for [[Single-linkage clustering|single-linkage]] and CLINK<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364–6 | year=1977 | publisher=British Computer Society | doi=10.1093/comjnl/20.4.364| doi-access= }}</ref> for [[complete-linkage clustering]]. With a [[heap (data structure)|heap]], the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math>, an improvement on the aforementioned bound of <math>\mathcal{O}(n^3)</math>, at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable. Methods exist which use [[quadtree]]s that demonstrate <math>\mathcal{O}(n^2)</math> total running time with <math>\mathcal{O}(n)</math> space.
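The cubic running time of the standard algorithm comes from rescanning the distance matrix for the closest pair after each of the <math>n-1</math> merges. The following sketch of a naive single-linkage agglomeration is purely illustrative (the function name and data layout are assumptions, and it implements neither SLINK nor the heap-based variants):

<syntaxhighlight lang="python">
import numpy as np

def naive_single_linkage(D):
    """Naive agglomerative clustering on a full n-by-n distance matrix D.

    Each of the n - 1 merges scans O(n^2) candidate pairs, giving O(n^3)
    time overall, and the matrix itself already needs Theta(n^2) memory.
    """
    n = D.shape[0]
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)      # ignore self-distances
    active = list(range(n))          # clusters that have not been merged away
    merges = []                      # list of (cluster_i, cluster_j, distance)
    while len(active) > 1:
        # O(n^2) scan for the closest pair of active clusters.
        best = (np.inf, None, None)
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                i, j = active[a], active[b]
                if D[i, j] < best[0]:
                    best = (D[i, j], i, j)
        d, i, j = best
        merges.append((i, j, d))
        # Single linkage: the distance from any cluster k to the merged
        # cluster is the minimum of its distances to the two parts.
        for k in active:
            if k not in (i, j):
                D[i, k] = D[k, i] = min(D[i, k], D[j, k])
        active.remove(j)             # cluster i now represents the merged cluster
    return merges
</syntaxhighlight>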
Divisive clustering with an exhaustive search is <math>\mathcal{O}(2^n)</math>, but it is common to use faster heuristics to choose splits, such as [[k-means clustering|''k''-means]].
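One such heuristic is bisecting ''k''-means: repeatedly pick a cluster and split it in two with 2-means until the desired number of clusters is reached. The sketch below, written with scikit-learn's <code>KMeans</code>, is only one possible realisation; the rule of always splitting the largest cluster is an arbitrary choice made here for illustration.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_clusters, random_state=0):
    """Top-down (divisive) clustering: start with one cluster containing all
    points and repeatedly bisect the largest remaining cluster with 2-means."""
    clusters = [np.arange(len(X))]                    # one cluster of all row indices
    while len(clusters) < n_clusters:
        # Heuristic: split the cluster with the most members.
        largest = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        members = clusters.pop(largest)
        labels = KMeans(n_clusters=2, n_init=10,
                        random_state=random_state).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters                                    # list of index arrays
</syntaxhighlight>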
== Cluster linkage ==
In order to decide which clusters should be combined (for agglomerative clustering), or where a cluster should be split (for divisive clustering), a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by using an appropriate distance <math>d</math> between single observations of the data set (such as the Euclidean distance), together with a linkage criterion.
The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.
{|
! Names
! Formula
|-
|Median linkage (WPGMC)
|<math>d(i\cup j, k) = d(m_{i\cup j}, m_k)</math> where <math>m_{i\cup j} = \tfrac{1}{2}\left(m_i + m_j\right)</math>
|-
|Versatile linkage clustering<ref name=":6">{{cite journal | doi=10.1007/s00357-019-09339-z | last1=Fernández | first1=Alberto | last2=Gómez | first2=Sergio | title=Versatile linkage: a family of space-conserving strategies for agglomerative hierarchical clustering | journal=Journal of Classification | volume=37 | year=2020 | issue=3 | pages=584–597| arxiv=1906.09222 | s2cid=195317052 }}</ref>
| <math>\sqrt[p]{\frac{1}{|A|\cdot|B|} \sum_{a \in A }\sum_{ b \in B} d(a,b)^p}, p\neq 0</math>
|-
|[[Ward's method|Ward linkage]],<ref name="wards method">{{cite journal |last=Ward |first=Joe H. |year=1963 |title=Hierarchical Grouping to Optimize an Objective Function |journal=Journal of the American Statistical Association |volume=58 |issue=301 |pages=236–244 |doi=10.2307/2282967 |jstor=2282967 |mr=0148188}}</ref> Minimum Increase of Sum of Squares (MISSQ)<ref name=":0">{{Citation |last=Podani |first=János |title=New combinatorial clustering methods |date=1989 |url=https://doi.org/10.1007/978-94-009-2432-1_5 |work=Numerical syntaxonomy |pages=61–77 |editor-last=Mucina |editor-first=L. |place=Dordrecht |publisher=Springer Netherlands |language=en |doi=10.1007/978-94-009-2432-1_5 |isbn=978-94-009-2432-1 |access-date=2022-11-04 |editor2-last=Dale |editor2-first=M. B.|url-access=subscription }}</ref>
|<math>\frac{|A|\cdot|B|}{|A\cup B|} \lVert \mu_A - \mu_B \rVert ^2
 = \sum_{x\in A\cup B} \lVert x - \mu_{A\cup B} \rVert^2
 - \sum_{x\in A} \lVert x - \mu_A \rVert^2
 - \sum_{x\in B} \lVert x - \mu_B \rVert^2</math>
|-
|Minimum Sum Increase Medoid clustering
|<math>\min_{m\in A\cup B} \sum_{y\in A\cup B} d(m,y)
 - \min_{m\in A} \sum_{y\in A} d(m,y)
 - \min_{m\in B} \sum_{y\in B} d(m,y)</math>
|-
|Medoid linkage<ref>{{Cite conference |last1=Miyamoto |first1=Sadaaki |last2=Kaizu |first2=Yousuke |last3=Endo |first3=Yasunori |date=2016 |title=Hierarchical and Non-Hierarchical Medoid Clustering Using Asymmetric Similarity Measures }}</ref>
|<math>d(m_A, m_B)</math> where <math>m_A</math>, <math>m_B</math> are the medoids of the previous clusters
|-
|}

Other linkage criteria include:
* The probability that candidate clusters spawn from the same distribution function (V-linkage).
* The product of in-degree and out-degree on a k-nearest-neighbour graph (graph degree linkage).<ref>{{Cite book|last1=Zhang|first1=Wei|last2=Wang|first2=Xiaogang|last3=Zhao|first3=Deli|last4=Tang|first4=Xiaoou|title=Computer Vision – ECCV 2012 |chapter=Graph Degree Linkage: Agglomerative Clustering on a Directed Graph |date=2012|editor-last=Fitzgibbon|editor-first=Andrew|editor2-last=Lazebnik|editor2-first=Svetlana|editor2-link= Svetlana Lazebnik |editor3-last=Perona|editor3-first=Pietro|editor4-last=Sato|editor4-first=Yoichi|editor5-last=Schmid|editor5-first=Cordelia|series=Lecture Notes in Computer Science|language=en|publisher=Springer Berlin Heidelberg|volume=7572|pages=428–441|doi=10.1007/978-3-642-33718-5_31|isbn=9783642337185|arxiv=1208.5092|bibcode=2012arXiv1208.5092Z|s2cid=14751}} See also: https://github.com/waynezhanghk/gacluster</ref>
* The increment of some cluster descriptor (i.e., a quantity defined for measuring the quality of a cluster) after merging two clusters.<ref>{{cite journal |first1=W. |last1=Zhang |first2=D. |last2=Zhao |first3=X. |last3=Wang |title=Agglomerative clustering via maximum incremental path integral |journal=Pattern Recognition |volume=46 |issue=11 |pages=3056–65 |date=2013 |doi=10.1016/j.patcog.2013.04.013 |citeseerx=10.1.1.719.5355 |bibcode=2013PatRe..46.3056Z}}</ref><ref>{{cite book |last1=Zhao |first1=D. |last2=Tang |first2=X. |chapter=Cyclizing clusters via zeta function of a graph |chapter-url= |title=NIPS'08: Proceedings of the 21st International Conference on Neural Information Processing Systems |date=2008 |isbn=9781605609492 |pages=1953–60 |publisher=Curran |citeseerx=10.1.1.945.1649}}</ref><ref>{{cite journal |first1=Y. |last1=Ma |first2=H. |last2=Derksen |first3=W. |last3=Hong |first4=J. |last4=Wright |title=Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=29 |issue=9 |pages=1546–62 |date=2007 |doi=10.1109/TPAMI.2007.1085 |pmid=17627043 |bibcode=2007ITPAM..29.1546M |hdl=2142/99597 |s2cid=4591894 |hdl-access=free }}</ref>
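To make the formulas above concrete, the following snippet evaluates the versatile-linkage and Ward criteria for two small clusters directly from their definitions; the example points are arbitrary and the code is an illustration of the table entries, not an implementation taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np
from scipy.spatial.distance import cdist

# Two illustrative clusters of points in the plane (arbitrary example data).
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[4.0, 4.0], [5.0, 4.0]])

d = cdist(A, B)  # |A| x |B| matrix of pairwise Euclidean distances d(a, b)

def versatile_linkage(d, p):
    """Generalised mean of the pairwise distances (p != 0); p = 1 gives average
    linkage, and large positive or negative p approach complete or single linkage."""
    return np.mean(d ** p) ** (1.0 / p)

def ward_increase(A, B):
    """Increase of the sum of squared deviations caused by merging A and B:
    |A|*|B| / |A union B| * ||mu_A - mu_B||^2."""
    mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
    return len(A) * len(B) / (len(A) + len(B)) * np.sum((mu_A - mu_B) ** 2)

print(versatile_linkage(d, p=1))    # average of all pairwise distances
print(versatile_linkage(d, p=10))   # close to the largest pairwise distance
print(ward_increase(A, B))
</syntaxhighlight>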
== Agglomerative clustering example ==
== Divisive clustering ==
The dendrogram of DIANA can be constructed by letting the splinter group <math>C_\textrm{new}</math> be a child of the hollowed-out cluster <math>C_*</math> each time. This constructs a tree with <math>C_0</math> as its root and <math>n</math> unique single-object clusters as its leaves.
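A minimal sketch of this construction is given below. It assumes the classical DIANA splitting rule (the object with the largest average dissimilarity to the rest of its cluster seeds the splinter group, and objects migrate to it while they are on average closer to the splinter than to what remains), represents each split simply as a pair (splinter group, hollowed-out remainder), and omits the usual rule of choosing the cluster of largest diameter to split next; all names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def diana_split(D, members):
    """Split one cluster (a list of object indices) into a splinter group and
    the hollowed-out remainder, following the classical DIANA heuristic."""
    members = list(members)
    # Seed: the object with the largest average dissimilarity to the rest.
    avg = [np.mean([D[i, j] for j in members if j != i]) for i in members]
    splinter = [members.pop(int(np.argmax(avg)))]
    moved = True
    while moved and len(members) > 1:
        moved = False
        # Positive difference means the object is on average closer to the splinter.
        diffs = [np.mean([D[i, j] for j in members if j != i])
                 - np.mean([D[i, j] for j in splinter]) for i in members]
        k = int(np.argmax(diffs))
        if diffs[k] > 0:
            splinter.append(members.pop(k))
            moved = True
    return splinter, members

def diana_tree(D, members=None):
    """Recursively split until only singletons remain; the dendrogram is returned
    as nested tuples whose leaves are the object indices."""
    if members is None:
        members = list(range(D.shape[0]))
    if len(members) == 1:
        return members[0]
    c_new, c_rest = diana_split(D, members)
    return (diana_tree(D, c_new), diana_tree(D, c_rest))
</syntaxhighlight>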
== Software ==