Spectral clustering

This is an old revision of this page, as edited by 188.104.93.153 (talk) at 18:53, 3 May 2014 (rm pseudo papercore reference.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In multivariate statistics and the clustering of data, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

File:K-means v.s. Spectral Clustering.png
A figure showing the relative strengths of K-means and spectral clustering.[1]

In application to image segmentation, spectral clustering is known as segmentation-based object categorization.

Algorithms

Given an enumerated set of data points, the similarity matrix may be defined as a symmetric matrix  , where   represents a measure of the similarity between data points with indexes   and  .

One spectral clustering technique is the normalized cuts algorithm or Shi–Malik algorithm introduced by Jianbo Shi and Jitendra Malik,[2] commonly used for image segmentation. It partitions points into two sets   based on the eigenvector   corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian defined as

 ,

where   is the diagonal matrix

 

A mathematically equivalent algorithm [3] takes the eigenvector corresponding to the largest eigenvalue of the random walk normalized Laplacian matrix  .

Another possibility is to use the Laplacian matrix defined as

 

rather than the symmetric normalized Laplacian matrix.

Partitioning may be done in various ways, such as by taking the median   of the components in  , and placing all points whose component in   is greater than   in  , and the rest in  . The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.

Alternatively to computing just one eigenvector, k eigenvectors for some k, are computed, and then another algorithm (e.g. k-means clustering) is used to cluster points by their respective k components in these eigenvectors.

An efficiency of spectral clustering may be improved if the solve of the corresponding eigenvalue problem is performed in a matrix-free fashion, i.e., without explicitly manipulating or even computing the similarity matrix, as, e.g., in the Lanczos algorithm.

For large-size graphs, the second eigenvalue of the (normalized) graph Laplacian matrix is often ill-conditioned, leading to slow convergence of iterative eigenvalue solvers. Preconditioning is a key technology accelerating the convergence, e.g., in the matrix-free LOBPCG method.

Spectral clustering is closely related to Nonlinear dimensionality reduction, and dimension reduction techniques such as locally-linear embedding can be used to reduce errors from noise or outliers.[4]

Relationship with k-means

The kernel k-means problem is an extension of the k-means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function  . The weighted kernel k-means problem further extends this problem by defining a weight   for each cluster as the reciprocal of the number of elements in the cluster,

 

Suppose   is a matrix of the normalizing coefficients for each point for each cluster   if   and zero otherwise. Suppose   is the kernel matrix for all points. The weighted kernel k-means problem with n points and k clusters is given as,

 

such that,

 
 

such that  . In addition, there are identity constrains on   given by,

 

where   represents a vector of ones.

 

This problem can be recast as,

 

This problem is equivalent to the spectral clustering problem when the identity constraints on   are relaxed. In particular, the weighted kernel k-means problem can be reformulated as a spectral clustering (graph partitioning) problem and vice-versa. The output of the algorithms are eigenvectors which do not satisfy the identity requirements for indicator variables defined by  . Hence, post-processing of the eigenvectors is required for the equivalence between the problems.[5] Transforming the spectral clustering problem into a weighted kernel k-means problem greatly reduces the computational burden.[6]

See also

References

  1. ^ Martin, Charles (October 9, 2012), Spectral Clustering: A quick overview
  2. ^ Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation", IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.
  3. ^ Marina Meilă & Jianbo Shi, "Learning Segmentation by Random Walks", Neural Information Processing Systems 13 (NIPS 2000), 2001, pp. 873–879.
  4. ^ Arias-Castro, E. and Chen, G. and Lerman, G. (2011), "Spectral clustering based on local linear approximations.", Electronic Journal of Statistics, 5: 1537-1587{{citation}}: CS1 maint: multiple names: authors list (link)
  5. ^ Dhillon, I.S. and Guan, Y. and Kulis, B. (2004). "Kernel k-means: spectral clustering and normalized cuts". Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 551–556. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)CS1 maint: multiple names: authors list (link)
  6. ^ Dhillon, Inderjit (November 2007). "Weighted Graph Cuts without Eigenvectors: A Multilevel Approach". IEEE Transactions on Pattern Analysis and Machine Intelligence. 29 (11): 1–14. {{cite journal}}: Unknown parameter |coauthors= ignored (|author= suggested) (help)CS1 maint: date and year (link)