In [[data mining]] and [[statistics]], '''hierarchical clustering'''<ref name="HC">{{cite book |first=Frank |last=Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |isbn=978-3-319-21903-5 |pages=195–211
|chapter=8. Hierarchical Clustering | url=https://www.springer.com/gp/book/9783319219028 |chapter-url=https://www.researchgate.net/publication/314700681 }}</ref> (also called '''hierarchical cluster analysis''' or '''HCA''') is a method of [[cluster analysis]] that seeks to build a [[hierarchy]] of clusters. Strategies for hierarchical clustering generally fall into two categories:
* '''Agglomerative''': Agglomerative clustering, often referred to as a "bottom-up" approach, begins with each data point as an individual cluster. At each step, the algorithm merges the two most similar clusters based on a chosen distance metric (e.g., Euclidean distance) and linkage criterion (e.g., single-linkage, complete-linkage).<ref name=":4">{{Cite journal |last=Murtagh |first=Fionn |last2=Contreras |first2=Pedro |date=2012 |title=Algorithms for hierarchical clustering: an overview |url=https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.53 |journal=WIREs Data Mining and Knowledge Discovery |language=en |volume=2 |issue=1 |pages=86–97 |doi=10.1002/widm.53 |issn=1942-4795|url-access=subscription }}</ref>
* '''Divisive''': Divisive clustering, known as a "top-down" approach, starts with all data points in a single cluster and recursively splits the cluster into smaller ones. At each step, the algorithm selects a cluster and divides it into two or more subsets, often using a criterion such as maximizing the distance between resulting clusters. Divisive methods are less common but can be useful when the goal is to identify large, distinct clusters first.
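
The agglomerative strategy described above can be illustrated with a minimal Python sketch. It is a naive illustration under stated assumptions, not a reference implementation: the function name <code>agglomerative</code>, its parameters, and the sample points are hypothetical, Euclidean distance is assumed as the metric, and only single and complete linkage are covered. Each pass recomputes every pairwise cluster distance, so it is suitable only for small inputs.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def agglomerative(points, num_clusters=1, linkage="single"):
    """Naive bottom-up clustering (hypothetical helper for illustration).

    Returns (clusters, merges): clusters are lists of point indices, and
    merges records each merge as (cluster_a, cluster_b, distance).
    """
    # Single linkage uses the closest pair of members between two clusters,
    # complete linkage uses the farthest pair.
    reduce_fn = {"single": min, "complete": max}[linkage]

    def cluster_distance(a, b):
        return reduce_fn(dist(points[i], points[j]) for i in a for j in b)

    # Start with every data point in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        d = cluster_distance(clusters[a], clusters[b])
        merges.append((clusters[a], clusters[b], d))
        # Replace the two clusters with their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]

    return clusters, merges


# Illustrative data: two tight pairs and one isolated point.
sample_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(sample_points, num_clusters=3, linkage="single")
```

The <code>merges</code> list records the order and distance of each merge, which is the information a dendrogram of the hierarchy would display. Stopping at <code>num_clusters=1</code> builds the full hierarchy.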