Revision as of 19:40, 29 January 2010 by Bfoteini (no edit summary)
Revision as of 19:07, 30 January 2010 by Bfoteini (no edit summary)
Line 5:
== History ==
The problem of data stream clustering has recently attracted much attention because of its applicability to emerging applications that involve large amounts of streaming data, such as network flows, sensor data, and web click streams. One of the first results on data streams was due to Munro and Paterson <ref>J. Munro and M. Paterson. [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4567985 Selection and Sorting with Limited Storage]. ''Theoretical Computer Science'', pages 315-323, 1980</ref>, but the model was formalized much later by Henzinger, Raghavan, and Rajagopalan <ref>M. Henzinger, P. Raghavan, and S. Rajagopalan. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.9554 Computing on Data Streams]. ''Digital Equipment Corporation, TR-1998-011'', August 1998.</ref>. [[k-means clustering|K-means]] is a widely used clustering heuristic, but alternative clustering algorithms have also been developed, such as [[k-Medoids]], [[CURE data clustering algorithm|CURE]] and the popular [[BIRCH(data clustering)|BIRCH]].

== Algorithms ==
Line 23:
STREAM is an algorithm for clustering data streams described by Guha, Mishra, Motwani and O'Callaghan <ref>S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.1927 Clustering Data Streams]. Proceedings of the Annual Symposium on Foundations of Computer Science, 2000</ref> which achieves a constant-factor approximation for the k-Median problem in a single pass.

~~Some others~~Other well-known algorithms used for data stream clustering ~~include~~are:
* BIRCH
* COBWEB

Data stream clustering: Difference between revisions