Spectral clustering: Difference between revisions

Content deleted Content added
Line 79:
The cost of computing the <math>n</math>-by-<math>k</math> (with <math>k\ll n</math>) matrix of selected eigenvectors of the graph Laplacian is normally proportional to the cost of multiplication of the <math>n</math>-by-<math>n</math> graph Laplacian matrix by a vector, which varies greatly whether the graph Laplacian matrix is dense or sparse. For the dense case the cost thus is <math>O(n^2)</math>. The very commonly cited in the literature cost <math>O(n^3)</math> comes from choosing <math>k=n</math> and is clearly misleading, since, e.g., in a hierarchical spectral clustering <math>k=1</math> as determined by the [[Fiedler vector]].
 
In the sparse case of the <math>n</math>-by-<math>n</math> graph Laplacian matrix with <math>O(n)</math> non-zero entries, the cost of the matrix-vector product and thus of computing the <math>n</math>-by-<math>k</math> with <math>k\ll n</math> matrix of selected eigenvectors is <math>O(n)</math> AO, with the memory footprint also only <math>O(n)</math> — both are the optimal low bounds of complexity of clustering <math>n</math> data points. Moreover, matrix-free eigenvalue solvers such as [[LOBPCG]] can efficiently run in parallel, e.g., on multiple [[GPUs]] with distributed memory, resulting not only in high quality clusters, which spectral clustering is famous for, but also top performance. <ref name="msw2014">{{Cite journal|last1=Acer|first1=Seher|last2=Boman|first2=Erik G.|last3=Glusa|first3=Christian A.|last4=Rajamanickam|first4=Sivasankaran|year=2021|title=Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems|journal=Parallel Computing|volume=106|doi=10.1016/j.parco.2021.102769}}</ref>
 
==Software==