Revision as of 19:37, 12 December 2023 by 128.197.29.227: Update prior complexity edit to better fit in the text; add citation. (Tag: Visual edit)
Revision as of 19:37, 12 December 2023 by 128.197.29.227: Minor grammar edit (Tag: Visual edit)
Line 15:
== Complexity ==
The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a [[time complexity]] of <math>\mathcal{O}(n^3)</math> and requires <math>\Omega(n^2)</math> memory, which makes it too slow for even medium-sized data sets. However, for some special cases, optimally efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: '''SLINK'''<!--boldface per WP:R#PLA--><ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30–34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30| doi-access=free }}</ref> for [[Single-linkage clustering|single-linkage]] and CLINK<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364–366 | year=1977 | publisher=British Computer Society | doi=10.1093/comjnl/20.4.364| doi-access= }}</ref> for [[complete-linkage clustering]]. With a [[heap (data structure)|heap]], the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math>, an improvement on the aforementioned bound of <math>\mathcal{O}(n^3)</math>, at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable. 
Methods exist which use [[Quadtree|quadtrees]] that demonstrate <math>\mathcal{O}(n^2)</math> total running time with <math>\mathcal{O}(n)</math> space.<ref>{{Cite journal |last=Eppstein |first=David |date=2001-12-31 |title=Fast hierarchical clustering and other applications of dynamic closest pairs |url=https://dl.acm.org/doi/10.1145/351827.351829 |journal=ACM Journal of Experimental Algorithmics |volume=5 |pages=1–es |doi=10.1145/351827.351829 |issn=1084-6654}}</ref> Divisive clustering with an exhaustive search is <math>\mathcal{O}(2^n)</math>, but it is common to use faster heuristics to choose splits, such as [[k-means clustering|''k''-means]].
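
The cubic time and quadratic memory of the standard algorithm can be seen in a minimal sketch. The following Python code is an illustration only, not taken from the cited sources: the function name <code>naive_single_linkage</code> and its return format are hypothetical, and single-linkage is assumed as the merge criterion. It stores a full pairwise distance matrix (the <math>n^2</math> memory) and, for each of the <math>n-1</math> merges, rescans all remaining pairs (an <math>\mathcal{O}(n^2)</math> scan), which gives <math>\mathcal{O}(n^3)</math> time overall.

```python
import math


def naive_single_linkage(points):
    """Naive agglomerative clustering with single linkage (illustrative sketch).

    Returns a list of merges as (members_a, members_b, distance) tuples,
    where members_* are sorted lists of original point indices.
    """
    n = len(points)
    # Quadratic memory: the full pairwise distance matrix.
    dist = [[math.dist(p, q) for q in points] for p in points]
    active = list(range(n))               # cluster ids still in play
    members = {i: [i] for i in range(n)}  # cluster id -> original point indices
    merges = []

    while len(active) > 1:                # exactly n - 1 merges
        # Quadratic scan over all remaining pairs for the closest one.
        best_d, best_a, best_b = math.inf, -1, -1
        for pos, a in enumerate(active):
            for b in active[pos + 1:]:
                if dist[a][b] < best_d:
                    best_d, best_a, best_b = dist[a][b], a, b

        a, b = best_a, best_b
        merges.append((sorted(members[a]), sorted(members[b]), best_d))

        # Single-linkage update: the distance from the merged cluster to any
        # other cluster is the minimum of the two old distances.
        for c in active:
            if c != a and c != b:
                dist[a][c] = dist[c][a] = min(dist[a][c], dist[b][c])

        # Cluster a absorbs cluster b.
        members[a] = members[a] + members.pop(b)
        active.remove(b)

    return merges
```

For the four one-dimensional points <code>[(0.0,), (1.0,), (5.0,), (7.0,)]</code>, this sketch merges points 0 and 1 at distance 1.0, then points 2 and 3 at distance 2.0, and finally the two resulting clusters at distance 4.0. The repeated full scan is what a heap or the SLINK and CLINK methods avoid.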

Hierarchical clustering: Difference between revisions