Because of this equivalence, '''spectral clustering can be viewed as performing kernel k-means in the eigenspace defined by the graph Laplacian'''. This theoretical insight has practical implications: the final clustering step in spectral clustering typically involves running the '''standard k-means algorithm''' on the rows of the matrix formed by the first k eigenvectors of the Laplacian. These rows can be thought of as embedding each data point or node in a low-dimensional space in which the clusters are better separated and hence easier for k-means to detect.
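This pipeline can be illustrated with a minimal sketch in Python. The toy graph (two triangles joined by a single bridge edge), the use of the unnormalized Laplacian, and the farthest-first k-means initialization are illustrative choices here, not prescribed by the text:

```python
import numpy as np

def spectral_embed(A, k):
    """Rows of the first k eigenvectors of the unnormalized
    Laplacian L = D - A give a low-dimensional node embedding."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)  # eigenvalues ascending for symmetric L
    return vecs[:, :k]

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-first
    initialization (illustrative; any standard k-means works here)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two triangles joined by a single bridge edge (2, 3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(kmeans(spectral_embed(A, 2), 2))  # → [0 0 0 1 1 1]
```

The second eigenvector (the Fiedler vector) takes values of opposite sign on the two triangles, so k-means on the embedded rows splits the graph across the bridge edge.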
Additionally, '''multi-level methods''' have been developed to directly optimize this shared objective function. These methods work by iteratively '''coarsening''' the graph to reduce the problem size, solving the problem on the coarse graph, and then '''refining''' the solution on successively finer graphs. This leads to more efficient optimization for large-scale problems, while still capturing the global structure preserved by the spectral embedding.
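The coarsening step can be sketched as follows. Heavy-edge matching, in which each node is merged with its heaviest not-yet-matched neighbour, is one common coarsening scheme in multilevel partitioners; the specific graph and tie-breaking here are illustrative assumptions:

```python
import numpy as np

def coarsen(A):
    """One level of heavy-edge matching coarsening (illustrative sketch)."""
    n = len(A)
    match = -np.ones(n, dtype=int)
    for i in range(n):
        if match[i] != -1:
            continue
        free = [j for j in np.flatnonzero(A[i]) if match[j] == -1]
        if free:
            j = max(free, key=lambda j: A[i, j])  # heaviest incident edge
            match[i], match[j] = j, i
        else:
            match[i] = i                          # unmatched singleton
    # assign one coarse index per matched pair / singleton
    cmap, coarse = np.zeros(n, dtype=int), {}
    for i in range(n):
        cmap[i] = coarse.setdefault(tuple(sorted((i, match[i]))), len(coarse))
    # sum edge weights between coarse super-nodes
    Ac = np.zeros((len(coarse), len(coarse)))
    for i in range(n):
        for j in range(n):
            if cmap[i] != cmap[j]:
                Ac[cmap[i], cmap[j]] += A[i, j]
    return Ac, cmap

# Two triangles joined by a bridge: 6 nodes coarsen to 3 super-nodes
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
Ac, cmap = coarsen(A)
print(Ac.shape)  # → (3, 3)
```

A multilevel method would repeat this until the graph is small, cluster the coarsest graph, then map labels back through `cmap` at each level while locally refining them.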
=== Relationship to DBSCAN ===
Spectral clustering is also conceptually related to '''DBSCAN''' (Density-Based Spatial Clustering of Applications with Noise), particularly in the special case where the spectral method is used to identify [[Connected component (graph theory)|'''connected graph components''']] of a graph. In this trivial case—where the goal is to identify subsets of nodes with '''no interconnecting edges''' between them—the spectral method effectively reduces to a connectivity-based clustering approach, much like DBSCAN.<ref>{{Cite conference |last1=Schubert |first1=Erich |last2=Hess |first2=Sibylle |last3=Morik |first3=Katharina |date=2018 |title=The Relationship of DBSCAN to Matrix Factorization and Spectral Clustering |url=http://ceur-ws.org/Vol-2191/paper38.pdf |conference=LWDA |pages=330–334}}</ref>
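In this special case the spectral computation is particularly transparent: the multiplicity of the Laplacian eigenvalue 0 equals the number of connected components, and the corresponding eigenvectors are constant on each component. A small numpy sketch (the toy graph is an illustrative assumption):

```python
import numpy as np

# Toy graph with two connected components: {0, 1, 2} and {3, 4}
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1

L = np.diag(A.sum(axis=1)) - A              # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

# Multiplicity of eigenvalue 0 = number of connected components
n_components = int(np.sum(eigvals < 1e-10))
print(n_components)  # → 2

# Every null-space eigenvector is constant on each component, so rows of
# the null-space basis coincide within a component; grouping identical
# (rounded) rows recovers the components
U = np.round(eigvecs[:, :n_components], 8)
_, labels = np.unique(U, axis=0, return_inverse=True)
print(labels)
```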
DBSCAN operates by identifying '''density-connected regions''' in the input space: points that are reachable from one another via a chain of neighboring points, where each step lies within a specified radius (ε) and each intermediate point's neighborhood contains at least a minimum number of points (minPts). The algorithm excels at discovering clusters of arbitrary shape and separating out noise, without needing to specify the number of clusters in advance.
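The density-reachability idea can be sketched in a minimal DBSCAN implementation (the example data and parameter values are illustrative; a production implementation would use a spatial index rather than a full distance matrix):

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=3):
    """Minimal DBSCAN sketch: grow clusters from core points; -1 = noise."""
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])  # incl. self
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster          # start a new cluster at this core point
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster   # density-reachable point
                    if core[q]:
                        stack.append(q)   # only core points keep expanding
        cluster += 1
    return labels

# Two dense groups plus one isolated noise point
X = np.array([[0, 0], [0, 0.3], [0.3, 0],
              [5, 5], [5, 5.3], [5.3, 5],
              [10, 10]])
print(dbscan(X, eps=0.5, min_pts=3))  # → [ 0  0  0  1  1  1 -1]
```

The isolated point has too few neighbors to be a core point and is never reached from one, so it stays labeled as noise.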