== Complexity ==
The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a [[time complexity]] of <math>\mathcal{O}(n^3)</math> and requires <math>\Omega(n^2)</math> memory, which makes it too slow for even medium-sized data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: '''SLINK'''<!--boldface per WP:R#PLA--><ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30–34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30| doi-access=free }}</ref> for [[Single-linkage clustering|single-linkage]] and CLINK<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364–6 | year=1977 | publisher=British Computer Society | doi=10.1093/comjnl/20.4.364| doi-access= }}</ref> for [[complete-linkage clustering]]. With a [[heap (data structure)|heap]], the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math>, an improvement on the aforementioned bound of <math>\mathcal{O}(n^3)</math>, at the cost of further increasing the memory requirements. In many cases, the memory overhead of this approach is too large for it to be practically usable. Methods based on [[quadtree]]s achieve <math>\mathcal{O}(n^2)</math> total running time with <math>\mathcal{O}(n)</math> space.<ref name=DE>{{Cite journal |last=Eppstein |first=David |date=2001-12-31 |title=Fast hierarchical clustering and other applications of dynamic closest pairs |url=https://dl.acm.org/doi/10.1145/351827.351829 |journal=ACM Journal of Experimental Algorithmics |volume=5 |pages=1–es |doi=10.1145/351827.351829 |issn=1084-6654|arxiv=cs/9912014 }}</ref>
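The cubic bound of the standard algorithm can be seen directly in a naive implementation: each of the <math>n-1</math> merge steps scans an <math>\mathcal{O}(n^2)</math> distance matrix for the closest pair of clusters. The sketch below is purely illustrative; the function name, the single-linkage merge rule, and the NumPy-based distance matrix are assumptions rather than details taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def naive_agglomerative(points, num_clusters=1):
    """Illustrative single-linkage HAC by repeated closest-pair search
    (O(n^3) time, O(n^2) memory)."""
    clusters = [[i] for i in range(len(points))]   # start with one singleton cluster per point
    # O(n^2) memory: pairwise distance matrix between the current clusters
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    while len(clusters) > num_clusters:
        # O(n^2) scan for the closest pair of clusters; repeated n - 1 times -> O(n^3) overall
        best_i, best_j = 0, 1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if dist[i, j] < dist[best_i, best_j]:
                    best_i, best_j = i, j
        # merge cluster best_j into best_i; single linkage keeps the minimum distance
        merged = np.minimum(dist[best_i], dist[best_j])
        dist[best_i, :] = merged
        dist[:, best_i] = merged
        dist = np.delete(np.delete(dist, best_j, axis=0), best_j, axis=1)
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]
    return clusters
</syntaxhighlight>

For example, <code>naive_agglomerative(np.random.rand(200, 2), 5)</code> groups 200 random planar points into 5 clusters; the SLINK algorithm cited above produces the same single-linkage hierarchy in <math>\mathcal{O}(n^2)</math> time without this repeated scanning.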
Divisive clustering with an exhaustive search runs in <math>\mathcal{O}(2^n)</math> time, but it is common to use faster heuristics to choose splits, such as [[k-means clustering|''k''-means]].
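A common concrete instance of this heuristic is bisecting ''k''-means: starting from a single cluster containing all points, one cluster is repeatedly split in two with ''k''-means until the desired number of clusters is reached. The sketch below is illustrative only; the function name, the rule of always splitting the largest cluster, and the use of scikit-learn's <code>KMeans</code> are assumptions, not details from a cited source.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(points, num_clusters):
    """Illustrative divisive (top-down) clustering via repeated 2-means splits."""
    # assumes num_clusters <= len(points), so every chosen cluster can be split
    clusters = [np.arange(len(points))]        # start with one cluster holding every point
    while len(clusters) < num_clusters:
        # heuristic split choice: bisect the cluster that currently has the most points
        largest = max(range(len(clusters)), key=lambda k: len(clusters[k]))
        members = clusters.pop(largest)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(points[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters
</syntaxhighlight>

Each split costs only the running time of a single 2-means call on the affected cluster, which is what makes this heuristic practical compared with the exhaustive <math>\mathcal{O}(2^n)</math> search.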