Spectral clustering: Difference between revisions

Content deleted Content added
Citation bot (talk | contribs)
Alter: url. URLs might have been anonymized. Add: url, date, title. Changed bare reference to CS1/2. | Use this bot. Report bugs. | Suggested by BrownHairedGirl | Linked from User:BrownHairedGirl/Articles_with_bare_links | #UCB_webform_linked 1968/2195
Definitions: added a link to the spring system and a picture with an explanation
Line 8:
Given an enumerated set of data points, the [[similarity matrix]] may be defined as a symmetric matrix <math>A</math>, where <math>A_{ij}\geq 0</math> represents a measure of the similarity between data points with indices <math>i</math> and <math>j</math>. The general approach to spectral clustering is to use a standard [[Cluster analysis|clustering]] method (there are many such methods, ''k''-means is discussed [[#Relationship with k-means|below]]) on relevant [[eigenvector]]s of a [[Laplacian matrix]] of <math>A</math>. There are many different ways to define a Laplacian which have different mathematical interpretations, and so the clustering will also have different interpretations. The eigenvectors that are relevant are the ones that correspond to smallest several eigenvalues of the Laplacian except for the smallest eigenvalue which will have a value of 0. For computational efficiency, these eigenvectors are often computed as the eigenvectors corresponding to the largest several eigenvalues of a function of the Laplacian.
 
[[Image:elastic network model.png|thumb|A 2-dimensional spring system.]]
Spectral clustering is well known to relate to partitioning of a mass-spring system, where each mass is associated with a data point and each spring stiffness corresponds to a weight of an edge describing a similarity of the two related data points, as in the [[spring system]]. Specifically, the classical reference <ref>J. Demmel, [https://people.eecs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html], CS267: Notes for Lecture 23, April 9, 1999, Graph Partitioning, Part 2</ref> explains that the eigenvalue problem describing transversal vibration modes of a mass-spring system is exactly the same as the eigenvalue problem for the graph [[Laplacian matrix]] defined as
: <math>L:=D-A</math>,
where <math>D</math> is the diagonal matrix
:<math>D_{ii} = \sum_j A_{ij}.</math>
The masses that are tightly connected by the springs in the mass-spring system evidently move together from the equilibrium position in low-frequency vibration modes, so that the components of the eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian can be used for meaningful clustering of the masses. For example, assuming that all the springs and the masses are identical in the 2-dimensional spring system pictured, one would intuitively expect that the loosest connected masses on the right-hand side of the system would move with the largest amplitude and in the opposite direction to the rest of the masses when the system is shaken — and this expectation will be confirmed by analyzing components of the eigenvectors of the graph Laplacian corresponding to the smallest eigenvalues, i.e., the smallest [[Vibration#Multiple_degrees_of_freedom_systems_and_mode_shapes|vibration frequencies]].
 
A popular related spectral clustering technique is the '''[[Segmentation based object categorization#Normalized cuts|normalized cuts algorithm]]''' or ''Shi–Malik algorithm'' introduced by Jianbo Shi and Jitendra Malik,<ref name=":0">Jianbo Shi and Jitendra Malik, [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf "Normalized Cuts and Image Segmentation"], IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.</ref> commonly used for [[segmentation (image processing)|image segmentation]]. It partitions points into two sets <math>(B_1,B_2)</math> based on the [[eigenvector]] <math>v</math> corresponding to the second-smallest [[eigenvalue]] of the [[Laplacian matrix#Symmetric normalized Laplacian|symmetric normalized Laplacian]] defined as