Hierarchical clustering: Difference between revisions

Content deleted Content added
AnomieBOT (talk | contribs)
Rescuing orphaned refs (":5" from rev 1299220194)
Citation bot (talk | contribs)
Add: bibcode, authors 1-1. Removed URL that duplicated identifier. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Abductive | Category:Cluster analysis algorithms | #UCB_Category 17/42
 
(3 intermediate revisions by 3 users not shown)
Line 5:
In [[data mining]] and [[statistics]], '''hierarchical clustering'''<ref name="HC">{{cite book |first=Frank |last=Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |isbn=978-3-319-21903-5 |pages=195–211
|chapter=8. Hierarchical Clustering | url=https://www.springer.com/gp/book/9783319219028 |chapter-url=https://www.researchgate.net/publication/314700681 }}</ref> (also called '''hierarchical cluster analysis''' or '''HCA''') is a method of [[cluster analysis]] that seeks to build a [[hierarchy]] of clusters. Strategies for hierarchical clustering generally fall into two categories:
* '''Agglomerative''': Agglomerative clustering, often referred to as a "bottom-up" approach, begins with each data point as an individual cluster. At each step, the algorithm merges the two most similar clusters based on a chosen distance metric (e.g., Euclidean distance) and linkage criterion (e.g., single-linkage, complete-linkage).<ref name=":4">{{Cite journal |last1=Murtagh |first1=Fionn |last2=Contreras |first2=Pedro |date=2012 |title=Algorithms for hierarchical clustering: an overview |url=https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.53 |journal=WIREs Data Mining and Knowledge Discovery |language=en |volume=2 |issue=1 |pages=86–97 |doi=10.1002/widm.53 |issn=1942-4795|url-access=subscription }}</ref> This process continues until all data points are combined into a single cluster or a stopping criterion is met (see the illustrative sketch following this list). Agglomerative methods are more commonly used due to their simplicity and computational efficiency for small to medium-sized datasets.<ref>{{Cite journal |last=Mojena |first=R. |date=1977-04-01 |title=Hierarchical grouping methods and stopping rules: an evaluation |url=https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/20.4.359 |journal=The Computer Journal |language=en |volume=20 |issue=4 |pages=359–363 |doi=10.1093/comjnl/20.4.359 |issn=0010-4620}}</ref>
* '''Divisive''': Divisive clustering, known as a "top-down" approach, starts with all data points in a single cluster and recursively splits the cluster into smaller ones. At each step, the algorithm selects a cluster and divides it into two or more subsets, often using a criterion such as maximizing the distance between resulting clusters. Divisive methods are less common but can be useful when the goal is to identify large, distinct clusters first.
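
The following is a minimal, illustrative Python sketch of the agglomerative procedure described above, assuming Euclidean distance and single linkage; the function names and toy data are examples rather than part of any standard library.

<syntaxhighlight lang="python">
# Minimal sketch of agglomerative ("bottom-up") clustering with single linkage.
# Not optimized: the naive search for the closest pair is O(n^2) per merge step.
from math import dist  # Euclidean distance between two points (Python 3.8+)

def single_linkage(a, b):
    """Dissimilarity of two clusters: the smallest pairwise point distance."""
    return min(dist(x, y) for x in a for y in b)

def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] += clusters.pop(j)  # merge the closest pair into one cluster
    return clusters

data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9), (9.0, 1.0)]
print(agglomerative(data, k=3))
# [[(1.0, 1.0), (1.2, 0.8)], [(5.0, 5.0), (5.1, 4.9)], [(9.0, 1.0)]]
</syntaxhighlight>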
 
Line 18:
 
== Cluster linkage ==
To decide which clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by using an appropriate [[distance]] ''d'', such as the Euclidean distance, between ''single'' observations of the data set, and a linkage criterion, which specifies the dissimilarity of ''sets'' as a function of the pairwise distances of observations in the sets. The choice of metric as well as linkage can have a major impact on the result of the clustering: the lower-level metric determines which objects are most [[similarity measure|similar]], whereas the linkage criterion influences the shape of the clusters.<ref name=":5">{{Cite journal |last=Wani |first=Aasim Ayaz |date=2024-08-29 |title=Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions |journal=PeerJ Computer Science |language=en |volume=10 |pages=e2286 |doi=10.7717/peerj-cs.2286 |issn=2376-5992 |pmc=11419652 |pmid=39314716 |doi-access=free}}</ref> For example, complete-linkage tends to produce more spherical clusters than single-linkage.
 
The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.
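
As an illustration of how the choice of linkage criterion changes the dissimilarity assigned to the same pair of clusters, the short sketch below compares single and complete linkage on two toy clusters; the data and variable names are only examples.

<syntaxhighlight lang="python">
# Compare single and complete linkage on the same pair of toy clusters.
from math import dist  # Euclidean distance between two points

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0), (8.0, 0.0)]

pairwise = [dist(a, b) for a in A for b in B]

single_linkage = min(pairwise)    # distance of the closest pair of observations
complete_linkage = max(pairwise)  # distance of the farthest pair of observations

print(single_linkage)    # 3.0
print(complete_linkage)  # about 8.06
</syntaxhighlight>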
Line 76:
- \min_{m\in B} \sum_{y\in B} d(m,y)</math>
|-
|Medoid linkage<ref>{{Cite conference |last1=Miyamoto |first1=Sadaaki |last2=Kaizu |first2=Yousuke |last3=Endo |first3=Yasunori |date=2016 |title=Hierarchical and Non-Hierarchical Medoid Clustering Using Asymmetric Similarity Measures |url=https://ieeexplore.ieee.org/document/7801678 |conference=2016 Joint 8th International Conference on Soft Computing and Intelligent Systems (SCIS) and 17th International Symposium on Advanced Intelligent Systems (ISIS) |pages=400–403 |doi=10.1109/SCIS-ISIS.2016.0091|url-access=subscription }}</ref><ref>{{Cite conference |date=2016 |title=Visual Clutter Reduction through Hierarchy-based Projection of High-dimensional Labeled Data| conference=Graphics Interface |url=https://graphicsinterface.org/wp-content/uploads/gi2016-14.pdf | first1=Dominik|last1=Herr|first2=Qi|last2=Han|first3=Steffen|last3=Lohmann| first4=Thomas |last4=Ertl |access-date=2022-11-04 |website=Graphics Interface |language=en-CA |doi=10.20380/gi2016.14}}</ref>
|<math>d(m_A, m_B)</math> where <math>m_A</math>, <math>m_B</math> are the medoids of the previous clusters
|-
Line 86:
* The probability that candidate clusters spawn from the same distribution function (V-linkage).
* The product of in-degree and out-degree on a k-nearest-neighbour graph (graph degree linkage).<ref>{{Cite book|last1=Zhang|first1=Wei|last2=Wang|first2=Xiaogang|last3=Zhao|first3=Deli|last4=Tang|first4=Xiaoou|title=Computer Vision – ECCV 2012 |chapter=Graph Degree Linkage: Agglomerative Clustering on a Directed Graph |date=2012|editor-last=Fitzgibbon|editor-first=Andrew|editor2-last=Lazebnik|editor2-first=Svetlana|editor2-link= Svetlana Lazebnik |editor3-last=Perona|editor3-first=Pietro|editor4-last=Sato|editor4-first=Yoichi|editor5-last=Schmid|editor5-first=Cordelia|series=Lecture Notes in Computer Science|language=en|publisher=Springer Berlin Heidelberg|volume=7572|pages=428–441|doi=10.1007/978-3-642-33718-5_31|isbn=9783642337185|arxiv=1208.5092|bibcode=2012arXiv1208.5092Z|s2cid=14751}} See also: https://github.com/waynezhanghk/gacluster</ref>
* The increment of some cluster descriptor (i.e., a quantity defined for measuring the quality of a cluster) after merging two clusters.<ref>{{cite journal |first1=W. |last1=Zhang |first2=D. |last2=Zhao |first3=X. |last3=Wang |title=Agglomerative clustering via maximum incremental path integral |journal=Pattern Recognition |volume=46 |issue=11 |pages=3056–65 |date=2013 |doi=10.1016/j.patcog.2013.04.013 |citeseerx=10.1.1.719.5355 |bibcode=2013PatRe..46.3056Z}}</ref><ref>{{cite book |last1=Zhao |first1=D. |last2=Tang |first2=X. |chapter=Cyclizing clusters via zeta function of a graph |chapter-url= |title=NIPS'08: Proceedings of the 21st International Conference on Neural Information Processing Systems |date=2008 |isbn=9781605609492 |pages=1953–60 |publisher=Curran |citeseerx=10.1.1.945.1649}}</ref><ref>{{cite journal |first1=Y. |last1=Ma |first2=H. |last2=Derksen |first3=W. |last3=Hong |first4=J. |last4=Wright |title=Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=29 |issue=9 |pages=1546–62 |date=2007 |doi=10.1109/TPAMI.2007.1085 |pmid=17627043 |bibcode=2007ITPAM..29.1546M |hdl=2142/99597 |s2cid=4591894 |hdl-access=free }}</ref>
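
As an illustrative sketch of the medoid linkage listed in the table above, the following computes the medoid of each cluster (the member with the smallest summed distance to all other members) and returns the distance between the two medoids; Euclidean distance and the toy data are assumptions made for the example.

<syntaxhighlight lang="python">
# Sketch of medoid linkage: the dissimilarity of two clusters is d(m_A, m_B),
# the distance between their medoids.
from math import dist  # Euclidean distance between two points

def medoid(cluster):
    """The member of the cluster minimising the summed distance to all members."""
    return min(cluster, key=lambda m: sum(dist(m, y) for y in cluster))

A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(5.0, 5.0), (6.0, 5.0)]

print(dist(medoid(A), medoid(B)))  # d(m_A, m_B), about 7.07 here
</syntaxhighlight>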
 
== Agglomerative clustering example ==