{{Machine learning|Clustering}}
In [[data mining]] and [[statistics]], '''hierarchical clustering'''<ref name="HC">{{cite book |first=Frank |last=Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |isbn=978-3-319-21903-5 |pages=195–211
|chapter=8. Hierarchical Clustering | url=https://www.springer.com/gp/book/9783319219028 |chapter-url=https://www.researchgate.net/publication/314700681 }}</ref> (also called '''hierarchical cluster analysis''' or '''HCA''') is a method of [[cluster analysis]] that seeks to build a [[hierarchy]] of clusters. Strategies for hierarchical clustering generally fall into two categories:
* '''Agglomerative''': Agglomerative clustering, often referred to as a "bottom-up" approach, begins with each data point as an individual cluster. At each step, the algorithm merges the two most similar clusters based on a chosen distance metric (e.g., Euclidean distance) and linkage criterion (e.g., single-linkage, complete-linkage).<ref name=":4">{{Cite journal |last=Murtagh |first=Fionn |last2=Contreras |first2=Pedro |date=2012 |title=Algorithms for hierarchical clustering: an overview |url=https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.53 |journal=WIREs Data Mining and Knowledge Discovery |language=en |volume=2 |issue=1 |pages=86–97 |doi=10.1002/widm.53 |issn=1942-4795}}</ref> This process continues until all data points are combined into a single cluster or a stopping criterion is met. Agglomerative methods are more commonly used due to their simplicity and computational efficiency for small to medium-sized datasets<ref>{{Cite journal |last=Mojena |first=R. |date=1977-04-01 |title=Hierarchical grouping methods and stopping rules: an evaluation |url=https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/20.4.359 |journal=The Computer Journal |language=en |volume=20 |issue=4 |pages=359–363 |doi=10.1093/comjnl/20.4.359 |issn=0010-4620}}</ref> (see the code sketch after this list).
* '''Divisive''': Divisive clustering, known as a "top-down" approach, starts with all data points in a single cluster and recursively splits the cluster into smaller ones. At each step, the algorithm selects a cluster and divides it into two or more subsets, often using a criterion such as maximizing the distance between resulting clusters. Divisive methods are less common but can be useful when the goal is to identify large, distinct clusters first.
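A minimal sketch of the agglomerative procedure, using SciPy's <code>linkage</code> and <code>fcluster</code> functions; the library choice and the toy data points are illustrative assumptions, not part of the methods described above:

<syntaxhighlight lang="python">
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D observations (illustrative data): two well-separated pairs.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])

# Agglomerative ("bottom-up") clustering: each point starts as its own
# cluster, and the two closest clusters are merged at every step, here
# using Euclidean distance and the complete-linkage criterion.
Z = linkage(X, method='complete', metric='euclidean')

# Cut the resulting hierarchy to obtain a flat assignment into two clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # e.g. [1 1 2 2]
</syntaxhighlight>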
In general, the merges and splits are determined in a [[greedy algorithm|greedy]] manner. The results of hierarchical clustering are usually presented in a [[dendrogram]].
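A short sketch of how such a dendrogram can be drawn, again assuming SciPy and Matplotlib as illustrative tools (the random data is a placeholder):

<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.random((8, 2))            # illustrative random observations

# linkage() records the greedy sequence of merges; dendrogram() draws
# the tree, where the height of each link is the distance at which the
# corresponding merge occurred.
Z = linkage(X, method='average')
dendrogram(Z)
plt.show()
</syntaxhighlight>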
Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a [[distance matrix|matrix of distances]]. On the other hand, apart from the special case of single-linkage distance, none of the algorithms (short of exhaustive search in <math>\mathcal{O}(2^n)</math>) can be guaranteed to find the optimum solution.{{cn|date=October 2024}}
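To illustrate that only a matrix of distances is needed, the sketch below clusters directly from a precomputed distance matrix; SciPy's <code>squareform</code> conversion and the numeric values are illustrative assumptions:

<syntaxhighlight lang="python">
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# A precomputed symmetric matrix of pairwise distances; the original
# observations are never needed (values here are illustrative).
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.5],
              [4.0, 3.5, 0.0]])

# linkage() expects the condensed (upper-triangular) form of the matrix.
Z = linkage(squareform(D), method='single')
print(Z)  # each row records one merge: cluster ids, distance, new size
</syntaxhighlight>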