Spectral clustering

Given a set of data points A, the [[similarity matrix]] may be defined as a matrix <math>S</math>, where <math>S_{ij}</math> represents a measure of the similarity between points <math>i, j\in A</math>. Spectral clustering techniques make use of the [[Spectrum of a matrix|spectrum]] of the similarity matrix of the data to perform [[dimensionality reduction]] for clustering in fewer dimensions.
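For illustration, one common (though not mandated) choice is to build the similarity matrix from a Gaussian (RBF) kernel on pairwise distances; a minimal NumPy sketch, where the function name and the bandwidth parameter ''sigma'' are illustrative choices:

```python
import numpy as np

def similarity_matrix(points, sigma=1.0):
    """Gaussian (RBF) similarity: S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Two nearby points and one far-away point
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
S = similarity_matrix(pts)
# S is symmetric with unit diagonal; nearby points get similarity near 1
```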
 
== Algorithms ==
One such technique is the '''[[Segmentation_based_object_categorization#Normalized_Cuts|Normalized Cuts algorithm]]''' or ''Shi–Malik algorithm'' introduced by Jianbo Shi and Jitendra Malik,<ref>Jianbo Shi and Jitendra Malik, [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf "Normalized Cuts and Image Segmentation"], IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.</ref> commonly used for [[segmentation (image processing)|image segmentation]]. It partitions points into two sets <math>(S_1,S_2)</math> based on the [[eigenvector]] <math>v</math> corresponding to the second-smallest [[eigenvalue]] of the [[Laplacian matrix]].
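A minimal NumPy sketch of this bipartition step (an illustrative reading, not the authors' implementation): it solves the generalized eigenproblem <math>Lv = \lambda D v</math> with <math>L = D - S</math> via the symmetric normalized Laplacian, then thresholds the resulting eigenvector at zero; Shi and Malik also consider searching over splitting thresholds.

```python
import numpy as np

def shi_malik_bipartition(S):
    """Split points in two by the sign of the eigenvector belonging to the
    second-smallest generalized eigenvalue of L v = lambda D v, L = D - S."""
    d = S.sum(axis=1)
    L = np.diag(d) - S
    d_isqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: D^{-1/2} L D^{-1/2}
    L_sym = d_isqrt[:, None] * L * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    fiedler = d_isqrt * vecs[:, 1]           # back-transform to L v = lambda D v
    return fiedler >= 0                      # boolean cluster mask

# Two tightly connected pairs, weakly linked to each other
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
mask = shi_malik_bipartition(S)
# mask separates {0, 1} from {2, 3}
```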
 
A related algorithm is the '''[[Meila–Shi algorithm]]''',<ref>Marina Meilă & Jianbo Shi, "[http://www.citeulike.org/user/mpotamias/article/498897 Learning Segmentation by Random Walks]", Advances in Neural Information Processing Systems 13 (NIPS 2000), 2001, pp. 873–879.</ref> which takes the [[eigenvector]]s corresponding to the ''k'' largest [[eigenvalue]]s of the random-walk matrix <math>P = D^{-1}S</math> for some ''k'', and then invokes another algorithm (e.g. ''k''-means) to cluster points by their respective ''k'' components in these eigenvectors.
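This step can be sketched in NumPy under the row-stochastic convention <math>P = D^{-1}S</math> (for symmetric <math>S</math> this is the transpose of <math>SD^{-1}</math> and has the same eigenvalues); the final ''k''-means step is omitted, and the function name is illustrative:

```python
import numpy as np

def meila_shi_embedding(S, k):
    """Rows of the k leading eigenvectors of P = D^{-1} S form a k-dimensional
    embedding that is then clustered by, e.g., k-means (omitted here)."""
    d = S.sum(axis=1)
    P = S / d[:, None]                       # row-stochastic random-walk matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)           # indices of the largest eigenvalues
    return vecs[:, order[:k]].real

# Two tightly connected pairs, weakly linked to each other
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
E = meila_shi_embedding(S, 2)
# Rows 0 and 1 coincide, as do rows 2 and 3; the two groups differ
```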
 
== Relationship with K-Means ==
The Kernel K-Means problem is an extension of the K-Means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function <math>k(x_i,x_j) = \phi^T(x_i)\phi(x_j)</math>. The weighted kernel K-Means problem further extends this problem by defining a weight <math>w_r</math> for each cluster,
:<math>
\max_{C_i} \sum_{r=1}^{k} w_r \sum_{x_i,x_j \in C_r} k(x_i,x_j).
</math>
Suppose <math>F</math> is the matrix of normalizing coefficients for each point in each cluster, with <math>F_{ij} = w_r</math> if <math>i,j \in C_r</math> and <math>F_{ij} = 0</math> otherwise, and let <math>K</math> be the kernel matrix of all points. The weighted kernel K-Means problem with ''n'' points and ''k'' clusters is then given as,
:<math>
\max_F \operatorname{trace}(KF)
</math>
such that,
:<math>
\begin{align}
F &= G_{n\times k}G_{n\times k}^T, \\
\operatorname{rank}(G) &= k, \\
G^TG &= I, \\
F\cdot 1 &= 1, \\
F^T1 &= 1.
\end{align}
</math>
This problem can be recast as,
:<math>
\max_G \operatorname{trace}(G^TKG).
</math>
This problem is equivalent to the spectral clustering problem when the identity constraints on <math>F</math> are relaxed.

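When the discreteness constraints on <math>F</math> are dropped and only <math>G^TG = I</math> is kept, the trace maximization is solved by the ''k'' leading eigenvectors of <math>K</math> (a consequence of the Ky Fan theorem); a minimal sketch, with illustrative names:

```python
import numpy as np

def relaxed_kernel_kmeans(K, k):
    """Relaxation: max_G trace(G^T K G) subject to G^T G = I is attained by
    the k leading eigenvectors of K; the optimum equals the sum of the k
    largest eigenvalues (Ky Fan theorem)."""
    vals, vecs = np.linalg.eigh(K)           # ascending eigenvalues
    G = vecs[:, -k:]                         # k leading eigenvectors
    return G, vals[-k:].sum()

# A small positive semidefinite kernel matrix, K = X X^T
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
K = X @ X.T
G, opt = relaxed_kernel_kmeans(K, 2)
# G has orthonormal columns; opt is the sum of the two largest eigenvalues of K
```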
== References ==
<references />