Revision as of 17:47, 7 July 2025 by MrOllie. Revision as of 05:45, 9 July 2025 by WikiCleanerBot (bot): m v2.05b - Bot T20 CW#61 - Fix errors for CW project (Reference before punctuation). Tag: WPCleaner
In [[data mining]] and [[statistics]], '''hierarchical clustering'''<ref name="HC">{{cite book \|first=Frank \|last=Nielsen \| title=Introduction to HPC with MPI for Data Science \| year=2016 \| publisher=Springer \|isbn=978-3-319-21903-5 \|pages=195–211 \|chapter=8. Hierarchical Clustering \| url=https://www.springer.com/gp/book/9783319219028 \|chapter-url=https://www.researchgate.net/publication/314700681 }}</ref> (also called '''hierarchical cluster analysis''' or '''HCA''') is a method of [[cluster analysis]] that seeks to build a [[hierarchy]] of clusters. Strategies for hierarchical clustering generally fall into two categories:
* '''Agglomerative''': Agglomerative clustering, often referred to as a "bottom-up" approach, begins with each data point as an individual cluster. At each step, the algorithm merges the two most similar clusters based on a chosen distance metric (e.g., Euclidean distance) and linkage criterion (e.g., single-linkage, complete-linkage).<ref name=":4">{{Cite journal \|last=Murtagh \|first=Fionn \|last2=Contreras \|first2=Pedro \|date=2012 \|title=Algorithms for hierarchical clustering: an overview \|url=https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.53 \|journal=WIREs Data Mining and Knowledge Discovery \|language=en \|volume=2 \|issue=1 \|pages=86–97 \|doi=10.1002/widm.53 \|issn=1942-4795\|url-access=subscription }}</ref> This process continues until all data points are combined into a single cluster or a stopping criterion is met. Agglomerative methods are more commonly used due to their simplicity and computational efficiency for small to medium-sized datasets.<ref>{{Cite journal \|last=Mojena \|first=R. \|date=1977-04-01 \|title=Hierarchical grouping methods and stopping rules: an evaluation \|url=https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/20.4.359 \|journal=The Computer Journal \|language=en \|volume=20 \|issue=4 \|pages=359–363 \|doi=10.1093/comjnl/20.4.359 \|issn=0010-4620}}</ref>
* '''Divisive''': Divisive clustering, known as a "top-down" approach, starts with all data points in a single cluster and recursively splits the cluster into smaller ones. At each step, the algorithm selects a cluster and divides it into two or more subsets, often using a criterion such as maximizing the distance between resulting clusters. Divisive methods are less common but can be useful when the goal is to identify large, distinct clusters first.
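The agglomerative procedure described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a naive search over all cluster pairs. The names <code>agglomerative</code>, <code>linkage_distance</code>, and <code>n_clusters</code> are illustrative choices and do not come from the cited sources. The <code>n_clusters</code> argument stands in for one possible stopping criterion. Practical implementations use considerably more efficient algorithms.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def linkage_distance(a, b, points, linkage="single"):
    """Distance between clusters a and b, given as lists of point indices."""
    pairwise = [dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, n_clusters=1, linkage="single"):
    """Naive bottom-up clustering (illustrative, not optimized).

    Returns the final clusters and the merge history as a list of
    (cluster_a, cluster_b, distance) tuples.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]
    history = []
    # Hypothetical stopping criterion: a target number of clusters.
    while len(clusters) > n_clusters:
        # Evaluate every pair of clusters and keep the closest one.
        candidates = (
            (p, q, linkage_distance(clusters[p], clusters[q], points, linkage))
            for p, q in combinations(range(len(clusters)), 2)
        )
        p, q, d = min(candidates, key=lambda item: item[2])
        history.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        # Delete the higher index first (q > p) so index p stays valid.
        del clusters[q]
        del clusters[p]
        clusters.append(merged)
    return clusters, history


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    final_clusters, merges = agglomerative(sample, n_clusters=2)
    print(final_clusters)
    for left, right, distance in merges:
        print(left, right, round(distance, 2))
```

With <code>n_clusters=1</code> the merge history records the full hierarchy, which is the information a dendrogram would display. Switching <code>linkage</code> to <code>"complete"</code> changes only how the distance between two clusters is measured, not the merging loop itself.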

Hierarchical clustering: Difference between revisions