Data stream clustering: Difference between revisions

Content deleted Content added
No edit summary
Line 20:
=== STREAM ===
 
STREAM is an algorithm for clustering data streams described by Guha, Mishra, Motwani and O'Callaghan <ref name=cds > S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.1927 Clustering Data Streams]. Proceedings of the Annual Symposium on Foundations of Computer Science, 2000</ref> which achieves a [[approximation algorithm | constant factor approximation]] for the k-Median problem in a single pass and using small space.
 
'''''Theorem''': STREAM can solve the ''k''-Median problem on a data stream in a single pass, with time ''O(n<sup>1+e</sup>)'' and space ''θ(n<sup>ε</sup>)'' up to a factor ''2<sup>O(1/e)</sup>'', where ''n'' the number of points and ''e<1/2''.
Line 51:
<ol>
<li> Input the first ''O(M / k)'' points, where ''M'' denotes bounded memory and ''k'' number of centers. Using the randomized algorithm presented in <ref name=cds /> cluster this to ''2k'' intermediate median points.
<li> Use a [[Local search (optimization) | local search]] algorithm to cluster'' O(M)'' intermediate medians of level ''i'' to ''2k'' medians of level ''i+1''.
<li> Use the primal dual algorithm <ref> K. Jain and V. Vazirani. [http://portal.acm.org/citation.cfm?id=796509 Primal-dual approximation algorithms for metric facility ___location and k-median problems.] Proc. FOCS, 1999</ref> to cluster the ''O(M)'' medians to k medians.
</ol>