Spectral clustering

Took (talk | contribs)
 
Split content from Cluster analysis.
Given a set of data points <math>A</math>, the [[similarity matrix]] may be defined as a matrix <math>S</math> where <math>S_{ij}</math> represents a measure of the similarity between points <math>i, j\in A</math>. Spectral clustering techniques make use of the [[Spectrum of a matrix|spectrum]] of the similarity matrix of the data to perform [[dimensionality reduction]] before clustering in fewer dimensions.
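As a minimal sketch, assuming each data point is a vector in Euclidean space and using the Gaussian (RBF) kernel <math>S_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)</math> as the similarity measure (one common choice among many), a similarity matrix can be built with NumPy as follows. The function name <code>gaussian_similarity</code> and the bandwidth parameter <code>sigma</code> are illustrative and not part of any particular library.

```python
import numpy as np

def gaussian_similarity(points: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Return S with S[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The Gaussian kernel is an assumed choice of similarity measure;
    any symmetric measure of similarity could be used instead.
    """
    # Pairwise differences via broadcasting: shape (n, n, dim)
    diff = points[:, None, :] - points[None, :, :]
    # Squared Euclidean distances: shape (n, n)
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical data: four points on a line forming two well-separated groups
A = np.array([[0.0], [0.1], [5.0], [5.1]])
S = gaussian_similarity(A, sigma=1.0)
print(np.round(S, 3))
```

Points in the same group receive similarities close to 1 and points in different groups receive values close to 0; <code>sigma</code> controls how quickly similarity decays with distance.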
 
One such technique is the '''[[Segmentation_based_object_categorization#Normalized_Cuts|Normalized Cuts algorithm]]''' or ''Shi–Malik algorithm'' introduced by Jianbo Shi and Jitendra Malik,<ref>Jianbo Shi and Jitendra Malik, [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf "Normalized Cuts and Image Segmentation"], IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.</ref> commonly used for [[segmentation (image processing)|image segmentation]]. It partitions points into two sets <math>(S_1,S_2)</math> based on the [[eigenvector]] <math>v</math> corresponding to the second-smallest [[eigenvalue]] of the [[Laplacian matrix]]
 
:<math>L = I - D^{-1/2}SD^{-1/2}</math>
 
of <math>S</math>, where <math>D</math> is the diagonal matrix
 
:<math>D_{ii} = \sum_{j} S_{ij}.</math>
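Both definitions translate directly into code. The following NumPy sketch computes <math>L</math> and the eigenvector <math>v</math> for the second-smallest eigenvalue; it assumes <math>S</math> is symmetric and that every point has a positive degree <math>D_{ii}</math> (otherwise <math>D^{-1/2}</math> is undefined). The function names and the small similarity matrix are hypothetical.

```python
import numpy as np

def normalized_laplacian(S: np.ndarray) -> np.ndarray:
    """Return L = I - D^{-1/2} S D^{-1/2}, where D_ii = sum_j S_ij."""
    d = S.sum(axis=1)                  # diagonal entries of D
    d_inv_sqrt = 1.0 / np.sqrt(d)      # assumes every degree is positive
    # Entry (i, j) of D^{-1/2} S D^{-1/2} equals S_ij / sqrt(d_i * d_j)
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def second_smallest_eigenvector(S: np.ndarray) -> np.ndarray:
    """Eigenvector v of L belonging to its second-smallest eigenvalue."""
    L = normalized_laplacian(S)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1]

# Hypothetical similarity matrix: points {0, 1} and {2, 3} are strongly linked
S = np.array([[1.0, 0.9, 0.01, 0.0],
              [0.9, 1.0, 0.0, 0.01],
              [0.01, 0.0, 1.0, 0.9],
              [0.0, 0.01, 0.9, 1.0]])
v = second_smallest_eigenvector(S)
print(np.round(v, 3))  # the overall sign of an eigenvector is arbitrary
```

For this matrix <math>v</math> comes out as approximately (0.5, 0.5, -0.5, -0.5) up to an overall sign, so its sign pattern separates points {0, 1} from {2, 3}. The smallest eigenvalue of <math>L</math> is always 0, with an eigenvector proportional to <math>D^{1/2}\mathbf{1}</math>, which carries no partition information; this is why the second-smallest eigenvalue is used.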
 
This partitioning may be done in various ways, such as by computing the median <math>m</math> of the components of <math>v</math> and placing all points whose component in <math>v</math> is greater than <math>m</math> in <math>S_1</math> and the rest in <math>S_2</math>. The algorithm can be used for hierarchical clustering by repeatedly partitioning the resulting subsets in this fashion.
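A sketch of the median split and its recursive use for hierarchical clustering, assuming NumPy and a symmetric similarity matrix with positive degrees. The fixed recursion <code>depth</code> is a hypothetical stopping rule chosen for illustration (a practical implementation might instead stop based on the quality of the cut), and all helper names are hypothetical.

```python
import numpy as np

def second_smallest_eigenvector(S: np.ndarray) -> np.ndarray:
    """Eigenvector of L = I - D^{-1/2} S D^{-1/2} for its second-smallest eigenvalue."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))   # assumes positive degrees
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)[1][:, 1]            # eigh sorts eigenvalues ascending

def median_split(S: np.ndarray):
    """Indices whose component in v exceeds the median m form S1; the rest form S2."""
    v = second_smallest_eigenvector(S)
    m = np.median(v)
    return np.flatnonzero(v > m), np.flatnonzero(v <= m)

def hierarchical_split(S: np.ndarray, indices=None, depth: int = 1) -> list:
    """Recursively bipartition the points; returns a list of index arrays."""
    if indices is None:
        indices = np.arange(len(S))
    if depth == 0 or len(indices) < 2:
        return [indices]
    # Restrict the similarity matrix to the current subset before splitting it
    s1, s2 = median_split(S[np.ix_(indices, indices)])
    if len(s1) == 0 or len(s2) == 0:   # no usable split, e.g. all components equal
        return [indices]
    return (hierarchical_split(S, indices[s1], depth - 1)
            + hierarchical_split(S, indices[s2], depth - 1))

# Hypothetical similarity matrix with two strongly linked pairs
S = np.array([[1.0, 0.9, 0.01, 0.0],
              [0.9, 1.0, 0.0, 0.01],
              [0.01, 0.0, 1.0, 0.9],
              [0.0, 0.01, 0.9, 1.0]])
print([c.tolist() for c in hierarchical_split(S, depth=1)])
```

With <code>depth=1</code> this yields the two groups {0, 1} and {2, 3} (in either order, since the sign of <math>v</math> is arbitrary); larger depths keep splitting each subset using the Laplacian of its own restricted similarity matrix.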
 
A related algorithm is the '''[[Meila–Shi algorithm]]''',<ref>Marina Meilă and Jianbo Shi, [http://www.citeulike.org/user/mpotamias/article/498897 "Learning Segmentation by Random Walks"], ''Advances in Neural Information Processing Systems 13'' (NIPS 2000), 2001, pp. 873–879.</ref> which takes the [[eigenvector]]s corresponding to the ''k'' largest [[eigenvalue]]s of the random-walk matrix <math>P = D^{-1}S</math> for some ''k'', and then invokes another algorithm (e.g. ''k''-means) to cluster the points by their respective ''k'' components in these eigenvectors.
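A minimal NumPy sketch of this procedure, assuming a symmetric similarity matrix with positive degrees. It uses the row-stochastic random-walk matrix <math>D^{-1}S</math>; the form <math>SD^{-1}</math> has the same eigenvalues, and its eigenvectors are those of <math>D^{-1}S</math> multiplied by <math>D</math>, which only rescales each point's coordinates. The plain ''k''-means with deterministic farthest-first seeding stands in for "another algorithm" and is an assumption, as are all function names.

```python
import numpy as np

def top_k_random_walk_eigenvectors(S: np.ndarray, k: int) -> np.ndarray:
    """Columns are eigenvectors of P = D^{-1} S for its k largest eigenvalues."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))   # assumes positive degrees
    # P is similar to the symmetric M = D^{-1/2} S D^{-1/2}: if M z = lam z,
    # then P (D^{-1/2} z) = lam (D^{-1/2} z), so the stable eigh can be used.
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, Z = np.linalg.eigh(M)                    # eigenvalues in ascending order
    return d_inv_sqrt[:, None] * Z[:, -k:]      # keep the k largest, map back

def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the row farthest from all seeds chosen so far
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every row to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        # Recompute centers, keeping the old one if a cluster becomes empty
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def meila_shi(S: np.ndarray, k: int) -> np.ndarray:
    # Row i of the embedding holds point i's k components in the eigenvectors
    X = top_k_random_walk_eigenvectors(S, k)
    return kmeans(X, k)

# Hypothetical similarity matrix with two strongly linked pairs
S = np.array([[1.0, 0.9, 0.01, 0.0],
              [0.9, 1.0, 0.0, 0.01],
              [0.01, 0.0, 1.0, 0.9],
              [0.0, 0.01, 0.9, 1.0]])
labels = meila_shi(S, k=2)
print(labels)
```

Unlike the recursive two-way split, this produces ''k'' clusters in a single pass; the largest eigenvalue of <math>D^{-1}S</math> is always 1 with a constant eigenvector, so the remaining ''k'' - 1 eigenvectors are the ones that actually separate the groups.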
 
== References ==
<references />
 
== See also ==
* [[Kernel principal component analysis]]
* [[Cluster analysis]]
 
[[Category:Data clustering algorithms]]