Spectral clustering

This is an old revision of this page, as edited by Tokekar (talk | contribs) at 02:03, 15 December 2011 (Relationship with K-Means). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Given a set of data points A, the similarity matrix may be defined as a matrix , where represents a measure of the similarity between points . Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction for clustering in fewer dimensions.

Algorithms

One such technique is the Normalized Cuts algorithm or Shi–Malik algorithm introduced by Jianbo Shi and Jitendra Malik,[1] commonly used for image segmentation. It partitions points into two sets   based on the eigenvector   corresponding to the second-smallest eigenvalue of the Laplacian matrix

 

of  , where   is the diagonal matrix

 

This partitioning may be done in various ways, such as by taking the median   of the components in  , and placing all points whose component in   is greater than   in  , and the rest in  . The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.

A related algorithm is the Meila–Shi algorithm[2], which takes the eigenvectors corresponding to the k largest eigenvalues of the matrix   for some k, and then invokes another algorithm (e.g. k-means) to cluster points by their respective k components in these eigenvectors.

Relationship with K-Means

The Kernel K-Means problem is an extension of the K-Means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function  . The weighted kernel K-Means problem further extends this problem by defining a weight   for each cluster,

 

Suppose   is a matrix of the normalizing coefficients for each point for each cluster   if   and zero otherwise. Suppose   is the kernel matrix for all points. The Weighted Kernel K-Means problem with n points and k clusters is given as,

 

such that,

 
 
 
 
 

This problem can be recast as,

 

This problem is equivalent to the Spectral Clustering problem when the identity constraints on F are relaxed.

References

  1. ^ Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation", IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.
  2. ^ Marina Meilă & Jianbo Shi, "Learning Segmentation by Random Walks", Processing Systems 13 (NIPS 2000), 2001, pp. 873-879.

See also