Data stream clustering

In computer science, data stream clustering is the clustering of data that arrive continuously, such as telephone records, multimedia data, and financial transactions. It is usually studied under the data stream model of computation. The objective is, given a sequence of points, to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time.

History

The problem of data stream clustering has attracted much attention because it applies to emerging applications that involve large amounts of streaming data, such as network flows, sensor data, and web click streams. One of the first results on data streams was due to Munro and Paterson [1], but the model was formalized much later by Henzinger, Raghavan, and Rajagopalan [2]. One commonly used method for data stream clustering is k-means.

Algorithms

Many algorithms have been proposed for the data stream clustering problem. A basic requirement is that the computation be carried out in a small amount of memory relative to the length of the stream.
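To make the small-space requirement concrete, the following is a minimal sketch of sequential (online) k-means, a simple one-pass variant of the k-means method mentioned above. It is an illustration under stated assumptions, not the algorithm of any particular cited work: the function name sequential_kmeans is hypothetical, the first k points are assumed to serve as initial centers, and distance is assumed to be squared Euclidean. Only the k centers and their counts are stored, so memory does not grow with the length of the stream.

```python
def sequential_kmeans(stream, k):
    """One-pass online k-means sketch: stores only k centers and k counts."""
    centers = []  # current cluster centers, each a list of floats
    counts = []   # number of points assigned to each center so far
    for point in stream:
        if len(centers) < k:
            # Assumption: seed the centers with the first k points seen.
            centers.append([float(x) for x in point])
            counts.append(1)
            continue
        # Find the nearest center by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], point)),
        )
        counts[j] += 1
        # Move that center toward the point so it stays the running mean
        # of all points assigned to it.
        centers[j] = [c + (x - c) / counts[j] for c, x in zip(centers[j], point)]
    return centers, counts


# Usage: the stream is consumed once and never stored.
points = iter([(0, 0), (10, 10), (0, 2), (10, 12), (2, 0)])
centers, counts = sequential_kmeans(points, k=2)
```

Each point is examined once and then discarded, which matches the streaming setting, but the result depends on the arrival order and on the initial centers. This sketch makes no claim about the quality guarantees that dedicated data stream clustering algorithms aim for.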

References

[1] J. Munro and M. Paterson. Selection and Sorting with Limited Storage. Theoretical Computer Science, pages 315-323, 1980.
[2] M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on Data Streams. Digital Equipment Corporation, TR-1998-011, August 1998.

Notes

http://www.cc.gatech.edu/projects/disl/Courses/cs4440/07Fall/project/proposals/Team5Proposal_final.pdf

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5222900&isnumber=5222860&tag=1

http://www.springerlink.com/content/uc06wwfpc8wl04wf/fulltext.pdf
