Data stream clustering: Difference between revisions

Content deleted Content added
No edit summary
Yobot (talk | contribs)
m WP:CHECKWIKI error #61 fixed + general fixes using AWB (8884)
Line 43:
The problem with the Small-Space is that the number of subsets <math>\ell</math> that we partition ''S'' into is limited, since it has to store in memory the intermediate medians in ''X'''. So, if ''M'' is the size of memory, we need to partition ''S'' into <math>\ell</math> subsets such that each subset fits in memory, (n/<math>\ell</math>) and so that the weighted <math>\ell</math>''k'' centers also fit in memory, <math>\ell</math>''k<M''. But such an <math>\ell</math> may not always exist.
 
The STREAM algorithm solves the problem of storing intermediate medians and achieves better running time and space requirements. The algorithm works as follows:<ref name=cds />:
<ol>
<li> Input the first ''m'' points; using the randomized algorithm presented in <ref name=cds /> reduce these to ''O(k)'' (say ''2k'') points.
Line 55:
 
Other well-known algorithms used for data stream clustering are:
* [[BIRCH (data clustering)|BIRCH]]:<ref>T. Zhang, R. Ramakrishnan, M. Linvy. [http://doi.acm.org/10.1145/235968.233324 BIRCH: An Efficient Data Clustering Method for Very Large Databases], Proceedings of the ACM SIGMOD Conference on Management of Data, 1996</ref> : builds a hierarchical data structure to incrementally cluster the incoming points using the available memory and minimizing the amount of I/O required. The complexity of the algorithm is ''O(N)'' since one pass suffices to get a good clustering (though, results can be improved by allowing several passes).
* [[Cobweb (clustering)|COBWEB]]:<ref>D.H. Fisher [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.9914 Iterative Optimization and Simplification of Hierarchical Clusterings]. Journal of AI Research, Vol 4, 1996</ref> : is an incremental clustering technique that keeps a hierarchical clustering model in the form of a [[Decision tree learning|classification tree]]. For each new point. COBWEB descends the tree, updates the nodes along the way and looks for the best node to put the point on (using a [[Category utility| category utility function]]).
 
== References ==