Spectral clustering

 
==Costs==
Denoting the number of data points by <math>n</math>, it is important to estimate the memory footprint and compute time, i.e., the number of arithmetic operations (AO) performed, as a function of <math>n</math>. Regardless of the specific spectral clustering algorithm, the two main costly items are the construction of the graph Laplacian and the computation of its <math>k</math> eigenvectors for the spectral embedding. The last step, determining the labels from the <math>n</math>-by-<math>k</math> matrix of eigenvectors, is typically the least expensive, requiring only <math>kn</math> AO and creating just an <math>n</math>-by-<math>1</math> vector of the labels in memory.
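A minimal sketch of this final labeling step, assuming the rows of the <math>n</math>-by-<math>k</math> eigenvector matrix are assigned to the nearest of <math>k</math> candidate centroids (one assignment sweep of a k-means-style procedure; the random data here is purely illustrative):

```python
import numpy as np

# Hypothetical sketch of the final labeling step of spectral clustering:
# rows of the n-by-k eigenvector matrix are treated as points in R^k and
# each is assigned to the nearest of k centroids, costing on the order of
# kn arithmetic operations per sweep.
rng = np.random.default_rng(0)
n, k = 1000, 3
embedding = rng.standard_normal((n, k))  # stand-in for the spectral embedding
centroids = embedding[rng.choice(n, k, replace=False)]

# one assignment sweep over all n rows against the k centroids
dist = np.linalg.norm(embedding[:, None, :] - centroids[None, :, :], axis=2)
labels = dist.argmin(axis=1)             # the n-by-1 vector of labels
print(labels.shape)  # (1000,)
```

Only the length-<math>n</math> label vector survives this step; the embedding itself can be discarded afterwards.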
 
The need to construct the graph Laplacian is common to all distance- or correlation-based clustering methods. Computing the eigenvectors is specific to spectral clustering.
 
===Constructing graph Laplacian===
The graph Laplacian can be, and commonly is, constructed from the adjacency matrix. The construction can be performed matrix-free, i.e., without explicitly forming the matrix of the graph Laplacian and with no AO. It can also be performed in place of the adjacency matrix, without increasing the memory footprint. Either way, the cost of constructing the graph Laplacian is essentially determined by the cost of constructing the <math>n</math>-by-<math>n</math> graph adjacency matrix.
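As a sketch of the in-place construction, the unnormalized Laplacian <math>L = D - W</math> can overwrite a symmetric adjacency matrix <math>W</math> directly, so no second <math>n</math>-by-<math>n</math> array is ever allocated (the small example matrix below is purely illustrative):

```python
import numpy as np

# Assumed symmetric adjacency matrix with zero diagonal (no self-loops).
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])

degrees = W.sum(axis=1)                 # diagonal of the degree matrix D
W *= -1.0                               # off-diagonal entries of L are -w_ij
W[np.diag_indices_from(W)] = degrees    # diagonal entries of L are the degrees
L = W                                   # same memory as the original adjacency
```

Each row of the resulting Laplacian sums to zero, which is a standard sanity check for this construction.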
 
Naive constructions of the graph adjacency matrix make it dense, thus requiring <math>n^2</math> memory and <math>n^2</math> AO, since each of the <math>n^2</math> entries of the matrix must be computed.
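A sketch of such a naive dense construction, assuming a Gaussian (RBF) similarity between points (the kernel width <code>sigma</code> and the random data are illustrative choices, not prescribed by any particular algorithm):

```python
import numpy as np

# Naive dense adjacency: a Gaussian similarity is evaluated for every
# pair of points, so both memory and arithmetic scale as n^2.
rng = np.random.default_rng(1)
n, sigma = 500, 1.0
X = rng.standard_normal((n, 2))         # n data points in the plane

# all n^2 pairwise squared distances at once -> an n-by-n dense array
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
A = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(A, 0.0)                # no self-loops
print(A.shape)  # (500, 500)
```

Sparsifying this matrix, e.g., by keeping only each point's nearest neighbors, is the usual way to avoid the quadratic memory footprint.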
 
===Computing eigenvectors===
The cost of computing the <math>n</math>-by-<math>k</math> (with <math>k \ll n</math>) matrix of selected eigenvectors of the graph Laplacian is normally proportional to the cost of multiplying the graph Laplacian matrix by a vector, which varies greatly depending on whether the graph Laplacian matrix is dense or sparse. In the dense case the cost is thus <math>O(n^2)</math>. The cost <math>O(n^3)</math>, very commonly cited in the literature, comes from choosing <math>k=n</math> and is clearly misleading, since, e.g., in hierarchical spectral clustering <math>k=1</math>, as determined by the [[Fiedler vector]].
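A sketch of the matrix-vector-product view, assuming SciPy's Lanczos-type solver <code>eigsh</code> applied matrix-free through a <code>LinearOperator</code>: the solver only ever requests the action <math>L x</math>, so each iteration costs one Laplacian-vector product rather than <math>O(n^2)</math> dense work (the path graph below is an illustrative structured example):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Path graph on n vertices: interior degrees are 2, endpoint degrees are 1.
n = 100

def laplacian_matvec(x):
    # apply L = D - W matrix-free: y = D x - W x, with W x gathering
    # the values of the two path neighbors of each vertex
    y = 2.0 * x
    y[0] -= x[0]
    y[-1] -= x[-1]
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

L = LinearOperator((n, n), matvec=laplacian_matvec)

# the k = 2 smallest eigenpairs; the eigenvector for the second-smallest
# eigenvalue is the Fiedler vector used in hierarchical bipartitioning
vals, vecs = eigsh(L, k=2, which='SM', tol=1e-6, maxiter=20000)
order = np.argsort(vals)
vals, vecs = vals[order], vecs[:, order]
```

For a graph with <math>m</math> edges, each such product costs <math>O(m)</math> AO, which for sparse graphs is far below the dense <math>O(n^2)</math> per iteration.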
 
==Software==