== Complexity ==
The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a [[time complexity]] of <math>\mathcal{O}(n^3)</math> and requires <math>\Omega(n^2)</math> memory, which makes it too slow for even medium-sized data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: '''SLINK'''<!--boldface per WP:R#PLA--><ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30–34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30| doi-access=free }}</ref> for [[Single-linkage clustering|single-linkage]] and CLINK<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364–6 | year=1977 | publisher=British Computer Society | doi=10.1093/comjnl/20.4.364| doi-access= }}</ref> for [[complete-linkage clustering]]. With a [[heap (data structure)|heap]], the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math>, an improvement on the aforementioned bound of <math>\mathcal{O}(n^3)</math>, at the cost of further increasing the memory requirements. In many cases, the memory overhead of this approach is too large for it to be practically usable. Methods based on [[quadtree]]s achieve <math>\mathcal{O}(n^2)</math> total running time with <math>\mathcal{O}(n)</math> space.<ref name=DE>{{Cite journal |last=Eppstein |first=David |date=2001-12-31 |title=Fast hierarchical clustering and other applications of dynamic closest pairs |url=https://dl.acm.org/doi/10.1145/351827.351829 |journal=ACM Journal of Experimental Algorithmics |volume=5 |pages=1–es |doi=10.1145/351827.351829 |issn=1084-6654|arxiv=cs/9912014 }}</ref>
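The cubic bound of the standard algorithm can be seen directly in a naive implementation: each of the <math>n-1</math> merge steps scans an <math>\mathcal{O}(n^2)</math> distance matrix for the closest pair of clusters. The sketch below is purely illustrative; the function name, the single-linkage merge rule, and the NumPy-based distance matrix are assumptions rather than details taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def naive_agglomerative(points, num_clusters=1):
    """Illustrative single-linkage HAC by repeated closest-pair search
    (O(n^3) time, O(n^2) memory)."""
    clusters = [[i] for i in range(len(points))]   # start with one singleton cluster per point
    # O(n^2) memory: pairwise distance matrix between the current clusters
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    while len(clusters) > num_clusters:
        # O(n^2) scan for the closest pair of clusters; repeated n - 1 times -> O(n^3) overall
        best_i, best_j = 0, 1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if dist[i, j] < dist[best_i, best_j]:
                    best_i, best_j = i, j
        # merge cluster best_j into best_i; single linkage keeps the minimum distance
        merged = np.minimum(dist[best_i], dist[best_j])
        dist[best_i, :] = merged
        dist[:, best_i] = merged
        dist = np.delete(np.delete(dist, best_j, axis=0), best_j, axis=1)
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]
    return clusters
</syntaxhighlight>

For example, <code>naive_agglomerative(np.random.rand(200, 2), 5)</code> groups 200 random planar points into 5 clusters; the SLINK algorithm cited above produces the same single-linkage hierarchy in <math>\mathcal{O}(n^2)</math> time without this repeated scanning.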
Divisive clustering with an exhaustive search runs in <math>\mathcal{O}(2^n)</math> time, but it is common to use faster heuristics to choose splits, such as [[k-means clustering|''k''-means]].
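A common concrete instance of this heuristic is bisecting ''k''-means: starting from a single cluster containing all points, one cluster is repeatedly split in two with ''k''-means until the desired number of clusters is reached. The sketch below is illustrative only; the function name, the rule of always splitting the largest cluster, and the use of scikit-learn's <code>KMeans</code> are assumptions, not details from a cited source.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(points, num_clusters):
    """Illustrative divisive (top-down) clustering via repeated 2-means splits."""
    # assumes num_clusters <= len(points), so every chosen cluster can be split
    clusters = [np.arange(len(points))]        # start with one cluster holding every point
    while len(clusters) < num_clusters:
        # heuristic split choice: bisect the cluster that currently has the most points
        largest = max(range(len(clusters)), key=lambda k: len(clusters[k]))
        members = clusters.pop(largest)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(points[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters
</syntaxhighlight>

Each split costs only the running time of a single 2-means call on the affected cluster, which is what makes this heuristic practical compared with the exhaustive <math>\mathcal{O}(2^n)</math> search.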