CUR matrix approximation: Difference between revisions

Content deleted Content added
mNo edit summary
ws
 
(15 intermediate revisions by 7 users not shown)
Line 1:
A '''CUR matrix approximation''' is a set of three [[matrix (mathematics)|matrices]] that, when multiplied together, closely approximate a given matrix.<ref name=mahoney>{{cite webjournal|title=CUR matrix decompositions for improved data analysis|urlauthor=http://wwwMichael W.pnas.org/content/ Mahoney|author2=Petros Drineas|journal=Proceedings of the National Academy of Sciences |date=2009 |volume=106/ |issue=3 |pages=697–702 |doi=10.1073/697pnas.full0803205106 |accessdatedoi-access=26free June|pmid=19139392 2012|authorpmc=Michael2630100 W. Mahoney|author2bibcode=Petros2009PNAS..106..697M Drineas}}</ref><ref>{{cite conference|title=Optimal CUR matrix decompositions| conference = STOC '14 Proceedings of the forty-sixth annual ACM symposium on Theory of Computing|last1= Boutsidis |first1= Christos |last2=Woodruff|first2=David P.|year=2014}}</ref><ref>{{cite conference|title=Low Rank Approximation with Entrywise L1-Norm Error| conference = STOC '17 Proceedings of the forty-ninth annual ACM symposium on Theory of Computing|last1=Song|first1=Zhao|last2=Woodruff|first2=David P.|last3=Zhong|first3=Peilin|year=2017| arxiv = 1611.00898}}</ref> A CUR approximation can be used in the same way as the [[low-rank approximation]] of the [[singular value decomposition]] (SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix (rather than left and right singular vectors):
 
* There are methods to calculate it with lower asymptotic time complexity versus the SVD.
Line 7:
 
The CUR matrix approximation is often {{citation needed|date=November 2012}} used in place of the low-rank approximation of the SVD in [[principal component analysis]]. The CUR is less accurate, but the columns of the matrix ''C'' are taken from ''A'' and the rows of ''R'' are taken from ''A''. In PCA, each column of ''A'' contains a data sample; thus, the matrix ''C'' is made of a subset of data samples. This is much easier to interpret than the SVD's left singular vectors, which represent the data in a rotated space. Similarly, the matrix ''R'' is made of a subset of variables measured for each data sample. This is easier to comprehend than the SVD's right singular vectors, which are another rotations of the data in space.
==Matrix CUR==
Hamm and Huang<ref>Keaton Hamm and Longxiu Huang. Perspectives on CUR decompositions. Applied and Computational Harmonic Analysis, 48(3):1088–1099, 2020.</ref> givesand theAldroubi followinget theoremal.<ref>Aldroubi, describingAkram and Hamm, Keaton and Koku, Ahmet Bugra and Sekmen, Ali. CUR decompositions, similarity matrices, and subspace clustering. Frontiers in Applied Mathematics and Statistics, 2019, Frontiers Media SA</ref> describe the basicsfollowing theorem, which ofoutlines a CUR decomposition of a matrix <math>L</math> with rank <math>r</math>:
 
Theorem: Consider row and column indices <math>I, J \subseteq [n]</math> with <math>|I|, |J| \ge r</math>. Denote submatrices <math>C = L_{:,J},</math> <math>U = L_{I,J}</math> and <math>R = L_{I,:}</math>. If rank(<math>U</math>) = rank(<math>L</math>), then <math>L = CU^+R</math>, where <math>(\cdot)^+</math> denotes the [[Moore–Penrose pseudoinverse]].
==Mathematical definition==
 
Hamm and Huang<ref>Keaton Hamm and Longxiu Huang. Perspectives on CUR decompositions. Applied and Computational Harmonic Analysis, 48(3):1088–1099, 2020.</ref> gives the following theorem describing the basics of a CUR decomposition of a matrix <math>L</math> with rank <math>r</math>:
 
Theorem: Consider row and column indices <math>I, J \subseteq [n]</math> with <math>|I|, |J| \ge r</math>.
Denote submatrices <math>C = L_{:,J},</math> <math>U = L_{I,J}</math> and <math>R = L_{I,:}</math>.
If rank(<math>U</math>) = rank(<math>L</math>), then <math>L = CU^+R</math>, where <math>(\cdot)^+</math> denotes the [[Moore–Penrose pseudoinverse]].
 
In other words, if <math>L</math> has low rank, we can take a sub-matrix <math>U = L_{I,J}</math> of the same rank, together with some rows <math>R</math> and columns <math>C</math> of <math>L</math> and use them to reconstruct <math>L</math>.
==Tensor CUR==
 
Tensor-CURT decomposition<ref>{{cite arXiv|title=Relative Error Tensor Low Rank Approximation|eprint = 1704.08246|last1=Song|first1=Zhao|last2=Woodruff|first2=David P.|last3=Zhong|first3=Peilin|class=cs.DS|year=2017}}</ref>
is a generalization of matrix-CUR decomposition. Formally, a CURT tensor approximation of a tensor ''A'' is three matrices and a (core-)tensor ''C'', ''R'', ''T'' and ''U'' such that ''C'' is made from columns of ''A'', ''R'' is made from rows of ''A'', ''T'' is made from tubes of ''A'' and that the product ''U(C,R,T)'' (where the <math>i,j,l</math>-th entry of it is <math>\sum_{i',j',l'}U_{i',j',l'}C_{i,i'}R_{j,j'}T_{l,l'} </math>) closely approximates ''A''. Usually the CURT is selected to be a [[Rank (linear algebra)|rank]]-''k'' approximation, which means that ''C'' contains ''k'' columns of ''A'', ''R'' contains ''k'' rows of ''A'', ''T'' contains tubes of ''A'' and ''U'' is a ''k''-by-''k''-by-''k'' (core-)tensor.
==Algorithms==
 
The CUR matrix approximation is not unique and there are multiple algorithms for computing one. One is ALGORITHMCUR.<ref name=mahoney />
 
The "Linear Time CUR" algorithm <ref>{{Cite journal |last1=Drineas |first1=Petros |last2=Kannan |first2=Ravi |last3=Mahoney |first3=Michael W. |date=2006-01-01 |title=Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication |url=https://epubs.siam.org/doi/abs/10.1137/S0097539704442684 |journal=SIAM Journal on Computing |volume=36 |issue=1 |pages=132–157 |doi=10.1137/S0097539704442684 |issn=0097-5397|url-access=subscription }}</ref> simply picks ''J'' by sampling columns randomly (with replacement) with probability proportional to the squared column norms, <math>\|L_{:,j}\|_2^2</math>; and similarly sampling ''I'' proportional to the squared row norms, <math>\|L_{i}\|_2^2</math>.
==Tensor==
The authors show that
Tensor-CURT decomposition<ref>{{cite arXiv|title=Relative Error Tensor Low Rank Approximation|eprint = 1704.08246|last1=Song|first1=Zhao|last2=Woodruff|first2=David P.|last3=Zhong|first3=Peilin|class=cs.DS|year=2017}}</ref>
taking <math>|J| \approx k /\varepsilon^4</math> and <math>|I| \approx k / \varepsilon^2</math> where <math>0 \le \varepsilon</math>, the algorithm achieves Frobenius error bound <math>\|A - CUR\|_F \le \|A - A_k\|_F + \varepsilon \|A\|_F</math>, where <math>A_k</math> is the optimal rank ''k'' approximation.
is a generalization of matrix-CUR decomposition. Formally, a CURT tensor approximation of a tensor ''A'' is three matrices and a (core-)tensor ''C'', ''R'', ''T'' and ''U'' such that ''C'' is made from columns of ''A'', ''R'' is made from rows of ''A'', ''T'' is made from tubes of ''A'' and that the product ''U(C,R,T)'' (where the <math>i,j,l</math>-th entry of it is <math>\sum_{i',j',l'}U_{i',j',l'}C_{i,i'}R_{j,j'}T_{l,l'} </math>) closely approximates ''A''. Usually the CURT is selected to be a [[Rank (linear algebra)|rank]]-''k'' approximation, which means that ''C'' contains ''k'' columns of ''A'', ''R'' contains ''k'' rows of ''A'', ''T'' contains tubes of ''A'' and ''U'' is a ''k''-by-''k''-by-''k'' (core-)tensor.
 
==See also==
 
* [[dimensionalityDimensionality reduction]]
 
==References==
Line 34 ⟶ 32:
<references />
 
[[Category:Matrices (mathematics)]]
[[Category:Matrix decompositions]]