Content deleted Content added
No edit summary |
|||
Line 17:
=== STREAM ===
STREAM is an algorithm for clustering data streams described by Guha, Mishra, Motwani and O'Callaghan <ref name=
'''''Theorem''': STREAM can solve the k-Median problem on a data stream with time O(n<sup>1+e</sup>) and space θ(n<sup>ε</sup>) up to a factor 2<sup>O(1/e)</sup>, where n the number of points and e<1/2''.
Line 45:
The actual STREAM algorithm works as follows:
<ol>
<li> Input the first ''O(M/k)'' points, where ''M'' denotes bounded memory and ''k'' number of centers. Using the randomized algorithm presented in <ref name=
<li> Use a local search algorithm to cluster'' O(M)'' intermediate medians of level ''i'' to ''2k'' medians of level ''i+1''.
<li> Use the primal dual algorithm <ref> K. Jain and V. Vazirani. [http://portal.acm.org/citation.cfm?id=796509 Primal-dual approximation algorithms for metric facility ___location and k-median problems.] Proc. FOCS, 1999</ref> to cluster the ''O(M)'' medians to k medians.
Line 53:
Other well-known algorithms used for data stream clustering are:
* [[BIRCH (data clustering) | BIRCH]]<ref>T. Zhang, R. Ramakrishnan, M. Linvy. [http://doi.acm.org/10.1145/235968.233324 BIRCH: An Efficient Data Clustering Method for Very Large Databases], Proceedings of the ACM SIGMOD Conference on Management of Data, 1996</ref> : builds a hierarchical data structure to incrementally cluster the incoming points using the available memory and minimizing the amount of I/O required. The complexity of the algorithm is O(N) since one pass suffices to get a good clustering (though, results can be improved by allowing several passes).
* COBWEB
== References ==
<blockquote>
<references> <references/>
</blockquote>
{{Reflist}}
== Notes ==
|