Content deleted Content added
No edit summary |
m WP:CHECKWIKI error #61 fixed + general fixes using AWB (8884) |
||
Line 43:
The problem with the Small-Space is that the number of subsets <math>\ell</math> that we partition ''S'' into is limited, since it has to store in memory the intermediate medians in ''X'''. So, if ''M'' is the size of memory, we need to partition ''S'' into <math>\ell</math> subsets such that each subset fits in memory, (n/<math>\ell</math>) and so that the weighted <math>\ell</math>''k'' centers also fit in memory, <math>\ell</math>''k<M''. But such an <math>\ell</math> may not always exist.
The STREAM algorithm solves the problem of storing intermediate medians and achieves better running time and space requirements. The algorithm works as follows:<ref name=cds />
<ol>
<li> Input the first ''m'' points; using the randomized algorithm presented in <ref name=cds /> reduce these to ''O(k)'' (say ''2k'') points.
Line 55:
Other well-known algorithms used for data stream clustering are:
* [[BIRCH (data clustering)|BIRCH]]:<ref>T. Zhang, R. Ramakrishnan, M. Linvy. [http://doi.acm.org/10.1145/235968.233324 BIRCH: An Efficient Data Clustering Method for Very Large Databases], Proceedings of the ACM SIGMOD Conference on Management of Data, 1996</ref>
* [[Cobweb (clustering)|COBWEB]]:<ref>D.H. Fisher [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.9914 Iterative Optimization and Simplification of Hierarchical Clusterings]. Journal of AI Research, Vol 4, 1996</ref>
== References ==
|