The cost of computing the <math>n</math>-by-<math>k</math> (with <math>k\ll n</math>) matrix of selected eigenvectors of the graph Laplacian is normally proportional to the cost of multiplying the <math>n</math>-by-<math>n</math> graph Laplacian matrix by a vector, which varies greatly depending on whether the graph Laplacian matrix is dense or sparse. In the dense case the cost is thus <math>O(n^2)</math>. The cost <math>O(n^3)</math>, very commonly cited in the literature, comes from choosing <math>k=n</math> and is clearly misleading, since, e.g., hierarchical spectral clustering needs only <math>k=1</math>, as determined by the [[Fiedler vector]].
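
For illustration, the following is a minimal Python sketch of a single bisection step (the <math>k=1</math> case); the toy graph and all numerical values are hypothetical, and the dense eigensolver <code>numpy.linalg.eigh</code> is used only for simplicity, even though it computes all <math>n</math> eigenvectors (the <math>k=n</math> case).

<syntaxhighlight lang="python">
import numpy as np

# Toy graph: two triangles joined by a single edge (adjacency is illustrative).
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W   # graph Laplacian L = D - W

# eigh computes all n eigenvectors (the k = n case, hence the O(n^3) cost);
# bisection actually needs only the single Fiedler vector, i.e. k = 1.
eigenvalues, eigenvectors = np.linalg.eigh(L)
fiedler_vector = eigenvectors[:, 1]   # eigenvector of the 2nd-smallest eigenvalue

# The sign pattern of the Fiedler vector bisects the graph; recursing on
# each part yields hierarchical spectral clustering.
partition = fiedler_vector > 0        # the two triangles separate
</syntaxhighlight>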
 
In the sparse case of the <math>n</math>-by-<math>n</math> graph Laplacian matrix with <math>O(n)</math> non-zero entries, the cost of the matrix-vector product, and thus of computing the <math>n</math>-by-<math>k</math> (with <math>k\ll n</math>) matrix of selected eigenvectors, is <math>O(n)</math>, with the memory footprint also only <math>O(n)</math>; both are optimal lower bounds on the complexity of clustering <math>n</math> data points. Moreover, matrix-free eigenvalue solvers such as [[LOBPCG]] can run efficiently in parallel, e.g., on multiple [[GPUs]] with distributed memory, resulting not only in the high-quality clusters for which spectral clustering is famous, but also in top performance.<ref name="msw2014">{{Cite journal|last1=Acer|first1=Seher|last2=Boman|first2=Erik G.|last3=Glusa|first3=Christian A.|last4=Rajamanickam|first4=Sivasankaran|year=2021|title=Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems|journal=Parallel Computing|volume=106|page=102769 |doi=10.1016/j.parco.2021.102769|s2cid=233481603 |arxiv=2105.00578}}</ref>
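
A minimal sketch of the sparse case, assuming a chain graph and illustrative values of <math>n</math>, <math>k</math>, and the solver tolerances, is the SciPy-based fragment below; [[LOBPCG]] accesses the Laplacian only through matrix-vector products, each costing <math>O(n)</math>.

<syntaxhighlight lang="python">
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

# Chain graph on n vertices: the adjacency has O(n) non-zero entries.
n = 1000
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], offsets=[-1, 1], format="csr")

# Sparse graph Laplacian L = D - W; one matrix-vector product costs O(n).
degrees = np.asarray(W.sum(axis=1)).ravel()
L = (sp.diags(degrees) - W).tocsr()

# k << n eigenvectors; LOBPCG touches L only via matrix-vector products,
# so both time and memory stay O(n) for a fixed k.
k = 4
rng = np.random.default_rng(0)
X = rng.standard_normal((n, k))   # random initial approximation
eigenvalues, eigenvectors = lobpcg(L, X, largest=False, tol=1e-6, maxiter=500)
</syntaxhighlight>

In practice, a preconditioner (e.g., multigrid) can be supplied through the <code>M</code> argument of <code>lobpcg</code> to accelerate convergence.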
 
==Software==
Free software implementing spectral clustering is available in large open source projects such as [[scikit-learn]]<ref>{{Cite web|url=http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering|title = 2.3. Clustering}}</ref> using [[LOBPCG]]<ref>{{Cite conference | url = https://www.researchgate.net/publication/343531874 | title = Modern preconditioned eigensolvers for spectral image segmentation and graph bisection | conference = Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003) Melbourne, Florida: IEEE Computer Society| editor = Boley| editor2 = Dhillon| editor3 = Ghosh| editor4 = Kogan | pages = 59–62| year = 2003| last1 = Knyazev| first1 = Andrew V.}}</ref> with [[multigrid]] [[preconditioning]]<ref name="spectralmultigrid2006">{{Cite conference | url = https://www.researchgate.net/publication/354448354 | title = Multiscale Spectral Image Segmentation Multiscale preconditioning for computing eigenvalues of graph Laplacians in image segmentation | conference = Fast Manifold Learning Workshop, WM Williamsburg, VA| year = 2006| last1 = Knyazev| first1 = Andrew V. | doi=10.13140/RG.2.2.35280.02565}}</ref><ref>{{Cite conference | url = https://www.researchgate.net/publication/343531874 | title = Multiscale Spectral Graph Partitioning and Image Segmentation | conference = Workshop on Algorithms for Modern Massive Datasets Stanford University and Yahoo! Research| year = 2006| last1 = Knyazev| first1 = Andrew V.}}</ref> or [[ARPACK]], [[Apache Spark#MLlib Machine Learning Library|MLlib]] for pseudo-eigenvector clustering using the [[power iteration]] method,<ref>{{Cite web|url=http://spark.apache.org/docs/latest/mllib-clustering.html#power-iteration-clustering-pic|title = Clustering - RDD-based API - Spark 3.2.0 Documentation}}</ref> and [[R (programming language)|R]].<ref>{{Cite web|url=https://cran.r-project.org/web/packages/kernlab|title = Kernlab: Kernel-Based Machine Learning Lab|date = 12 November 2019}}</ref>
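
A minimal usage sketch of the scikit-learn implementation cited above follows; the toy data, the number of clusters, and the choice of <code>eigen_solver</code> are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy data: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

# eigen_solver may be 'arpack', 'lobpcg', or 'amg'
# ('amg' uses LOBPCG with algebraic-multigrid preconditioning and needs pyamg).
model = SpectralClustering(n_clusters=2, eigen_solver="lobpcg",
                           affinity="rbf", random_state=0)
labels = model.fit_predict(X)   # one cluster label per data point
</syntaxhighlight>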
 
== Relationship with other clustering methods ==