Spectral clustering

Given a set of data points $A$, the similarity matrix may be defined as a matrix $S$, where $S_{ij}$ represents a measure of the similarity between points $i,j\in A$. Spectral clustering techniques use the spectrum of this similarity matrix to perform dimensionality reduction, so that the points can then be clustered in fewer dimensions.
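The text does not fix a particular similarity measure. As an illustration only, the sketch below builds $S$ with a Gaussian (RBF) kernel, one common choice; the function name `gaussian_similarity`, the bandwidth parameter `sigma`, and the sample points are assumptions made for this example.

```python
import numpy as np

def gaussian_similarity(points, sigma=1.0):
    """Return S with S[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The Gaussian kernel is an assumed choice; any symmetric,
    non-negative similarity measure could be used instead.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, dim).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Four points on a line: two near 0 and two near 5.
A = np.array([[0.0], [0.1], [5.0], [5.1]])
S = gaussian_similarity(A, sigma=1.0)
```

With this choice $S$ is symmetric, has ones on its diagonal, and assigns nearby points a higher similarity than distant ones.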

One such technique is the Normalized Cuts algorithm, or Shi–Malik algorithm, introduced by Jianbo Shi and Jitendra Malik^[1] and commonly used for image segmentation. It partitions points into two sets $(S_{1},S_{2})$ based on the eigenvector $v$ corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian matrix

$$L=I-D^{-1/2}SD^{-1/2}$$

of $S$, where $D$ is the diagonal degree matrix with entries

$$D_{ii}=\sum_{j}S_{ij}.$$

This partitioning may be done in various ways, for example by taking the median $m$ of the components of $v$ and placing all points whose component in $v$ is greater than $m$ in $S_{1}$ and the rest in $S_{2}$. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
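The median-based split described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a symmetric similarity matrix in which every point has positive total similarity, and the function name `shi_malik_bipartition` and the example matrix are hypothetical.

```python
import numpy as np

def shi_malik_bipartition(S):
    """Split points in two using the eigenvector of
    L = I - D^{-1/2} S D^{-1/2} for its second-smallest eigenvalue."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                 # D_ii = sum_j S_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)     # assumes every d_i > 0
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so column 1
    # belongs to the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(L)
    v = eigenvectors[:, 1]
    m = np.median(v)
    return v > m                      # True -> first set, False -> second set

# Hypothetical similarities: points 0, 1 are alike, as are points 2, 3.
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
in_first_set = shi_malik_bipartition(S)
```

Because the sign of an eigenvector is arbitrary, which of the two groups ends up in the first set may vary; only the grouping itself is meaningful. Applying the same function again to the submatrix of each group gives the hierarchical variant.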

A related algorithm is the Meila–Shi algorithm,^[2] which takes the eigenvectors corresponding to the $k$ largest eigenvalues of the matrix $P=D^{-1}S$ for some $k$, and then invokes another algorithm (e.g. k-means) to cluster the points by their respective $k$ components in these eigenvectors.
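A rough sketch of this two-stage procedure is given below. It assumes a symmetric $S$ and works with the row-stochastic form $D^{-1}S$, whose eigenvalues coincide with those of its transpose $SD^{-1}$. The eigenvectors are obtained from the symmetric matrix $D^{-1/2}SD^{-1/2}$ and rescaled, which is a numerical convenience rather than part of the algorithm's definition. The names `meila_shi_embedding` and `kmeans`, the farthest-point initialisation, and the example matrix are all assumptions made for illustration; any k-means implementation could be substituted.

```python
import numpy as np

def meila_shi_embedding(S, k):
    """Rows are the k-component coordinates of each point, taken from
    eigenvectors of P = D^{-1} S for its k largest eigenvalues."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)     # assumes every d_i > 0
    # If M u = lambda u for M = D^{-1/2} S D^{-1/2}, then
    # P (D^{-1/2} u) = lambda (D^{-1/2} u).
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, U = np.linalg.eigh(M)          # eigenvalues in ascending order
    return d_inv_sqrt[:, None] * U[:, -k:]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialisation
    (a simple deterministic stand-in for a library k-means)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical similarities: three pairs of mutually similar points.
S = np.full((6, 6), 0.05)
for start in (0, 2, 4):
    S[start:start + 2, start:start + 2] = 0.9
np.fill_diagonal(S, 1.0)

labels = kmeans(meila_shi_embedding(S, k=3), k=3)
```

In this example the two points of each pair receive the same label, and the three pairs receive three different labels; the label numbering itself is arbitrary.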

References

[1] Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, Aug 2000.
[2] Marina Meilă and Jianbo Shi, "Learning Segmentation by Random Walks", Advances in Neural Information Processing Systems 13 (NIPS 2000), 2001, pp. 873-879.
