Revision as of 19:09, 29 January 2010 edit Bfoteini (talk \| contribs) 70 edits No edit summary ← Previous edit		Revision as of 19:26, 29 January 2010 edit undo Bfoteini (talk \| contribs) 70 edits →Algorithms Next edit →
Line 14: These algorithms have many similarities with [[online algorithms]] but they are not identical. Unlike online algorithms, algorithms for data stream clustering have only a bounded amount of memory available and they may be able to take action after a group of points arrives while online algorithms are required to take action after each point arrives. Since data stream algorithms have limited memory available, the first goal is to show that clustering can take place in small space (not caring about the number of passes). Small-Space<ref>S. Guha, A. Meyerson, N. Mishra, R. Motwani. ''[http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1198387&tag=1 Clustering Data Streams: Theory and Practice]''. IEEE Transactions on Knowledge and Data Engineering, Volume 15, Issue 3, 2003 </ref> is a [[divide-and-conquer algorithm]] that divides the data into pieces, clusters each one of them (using k-means) and then clusters the centers obtained (where each center is weighted by the number of points assigned to it). [[File:Small-Space.jpg \| thumb \| center \| 520x260px \| Small-Space Algorithm representation]] The approximation factor of this algorithm is <math>2c(1+2b)+2b</math> and if we generalize it so that it recursively calls itself, <math>i</math> times on a successively smaller set of weighted centers then it gives a constant factor approximation to the k-median problem, which, as expected, worsens with each successive reclustering. The STREAM algorithm for clustering data streams described in Line 24 ⟶ 28: * COBWEB * STREAM == References ==

Data stream clustering: Difference between revisions