Revision as of 20:24, 19 February 2010 edit 128.148.33.5 (talk) No edit summary ← Previous edit		Revision as of 15:44, 9 March 2010 edit undo 72.221.66.121 (talk) No edit summary Next edit →
Line 22: STREAM is an algorithm for clustering data streams described by Guha, Mishra, Motwani and O'Callaghan <ref name=cds > S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.1927 Clustering Data Streams]. Proceedings of the Annual Symposium on Foundations of Computer Science, 2000</ref> which achieves a constant factor approximation for the k-Median problem in a single pass and using small space. '''''Theorem''': STREAM can solve the ''k''-Median problem on a data stream in a single pass, with time ''O(n<sup>1+e</sup>)'' and space ''θ(n<sup>ε</sup>)'' up to a factor ''2<sup>O(1/e)</sup>'', where ''n'' the number of points and ''e<1/2''. To understand STREAM, the first step is to show that clustering can take place in small space (not caring about the number of passes). Small-Space is a [[divide-and-conquer algorithm]] that divides the data, ''S'', into <math>\ell</math> pieces, clusters each one of them (using ''k''-means) and then clusters the centers obtained. Line 32: <ol> <li> Divide ''S'' into ~~''l''~~<math>\ell</math> disjoint pieces ''X''<sub>1</sub>,...,''X''<sub><math>_{\ell}</math></sub>''.</li> <li> For each ''i'', find ''O(k)'' centers in ''X<sub>i</sub>''. Assign each point in ''X<sub>i</sub>'' to its closest center.</li>

Data stream clustering: Difference between revisions