CUR matrix approximation
 
* There are methods to calculate it with lower asymptotic time complexity versus the SVD.
* The matrices are more interpretable; the meanings of rows and columns in the decomposed matrices are essentially the same as their meanings in the original matrix.
 
Formally, a CUR matrix approximation of a matrix ''A'' consists of three matrices ''C'', ''U'', and ''R'' such that ''C'' is made from columns of ''A'', ''R'' is made from rows of ''A'', and the product ''CUR'' closely approximates ''A''. Usually the CUR is selected to be a [[Rank (linear algebra)|rank]]-''k'' approximation, which means that ''C'' contains ''k'' columns of ''A'', ''R'' contains ''k'' rows of ''A'', and ''U'' is a ''k''-by-''k'' matrix. There are many possible CUR matrix approximations, even for a given rank.
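The construction above can be sketched in a few lines of numpy. This is an illustrative example, not a specific published algorithm: it samples ''k'' columns and rows uniformly at random and sets ''U'' = ''C''⁺''A''''R''⁺ using pseudoinverses, one common way to choose the linking matrix.

```python
import numpy as np

# Minimal CUR sketch: uniform column/row sampling, U = pinv(C) A pinv(R).
# Illustrative only; practical algorithms choose columns/rows more carefully.
rng = np.random.default_rng(0)

# Build an exactly rank-3 test matrix so the approximation can be exact.
k = 3
A = rng.standard_normal((6, k)) @ rng.standard_normal((k, 8))

cols = rng.choice(A.shape[1], size=k, replace=False)  # k column indices
rows = rng.choice(A.shape[0], size=k, replace=False)  # k row indices
C = A[:, cols]                                 # actual columns of A
R = A[rows, :]                                 # actual rows of A
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # k-by-k linking matrix

A_cur = C @ U @ R  # the rank-k CUR approximation of A
```

Because ''A'' here has rank exactly ''k'' and randomly sampled Gaussian columns and rows span its column and row spaces (almost surely), the product ''CUR'' reconstructs ''A'' exactly; for a general matrix it is only an approximation.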
 
The CUR matrix approximation is often{{Citation needed|date=November 2012}} used in place of the low-rank approximation of the SVD in [[Principal components analysis]]. The CUR is less accurate, but the columns of the matrix ''C'' and the rows of the matrix ''R'' are taken directly from ''A''. In PCA, each column of ''A'' contains a data sample; thus, the matrix ''C'' is made of a subset of data samples. This is much easier to interpret than the SVD's left singular vectors, which represent the data in a rotated space. Similarly, the matrix ''R'' is made of a subset of variables measured for each data sample. This is easier to comprehend than the SVD's right singular vectors, which are another rotation of the data in space.
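The accuracy trade-off can be demonstrated numerically. The sketch below (an illustration, not part of the article) compares the Frobenius-norm error of the truncated SVD with that of a CUR approximation built from uniformly sampled columns and rows; by the Eckart–Young theorem, the truncated SVD is the optimal rank-''k'' approximation, so its error is never larger.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 30))   # full-rank synthetic "data" matrix
k = 5

# Best rank-k approximation, from the truncated SVD (Eckart-Young optimal).
W, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = (W[:, :k] * s[:k]) @ Vt[:k, :]

# CUR with uniformly sampled columns and rows (an illustrative choice;
# practical CUR algorithms sample with data-dependent probabilities).
cols = rng.choice(A.shape[1], size=k, replace=False)
rows = rng.choice(A.shape[0], size=k, replace=False)
C, R = A[:, cols], A[rows, :]
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)

err_svd = np.linalg.norm(A - A_svd)      # Frobenius-norm errors
err_cur = np.linalg.norm(A - C @ U @ R)
```

Since ''CUR'' has rank at most ''k'', `err_svd` is guaranteed to be at most `err_cur`; the gap is the price paid for the interpretability of using actual rows and columns of ''A''.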
 
==Algorithms==