Principal component analysis

:<math>\mathbf{T}_L = \mathbf{X} \mathbf{W}_L</math>
 
where the matrix '''T'''<sub>L</sub> now has ''n'' rows but only ''L'' columns. In other words, PCA learns a linear transformation <math> t = W_L^\mathsf{T} x, x \in \mathbb{R}^p, t \in \mathbb{R}^L,</math> where the columns of {{math|''p'' × ''L''}} matrix <math>W_L</math> form an orthogonal basis for the ''L'' features (the components of representation ''t'') that are decorrelated.<ref>{{Cite journal |author=Bengio, Y.|year=2013|title=Representation Learning: A Review and New Perspectives |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=35 |issue=8 |pages=1798–1828 |doi=10.1109/TPAMI.2013.50|pmid=23787338|display-authors=etal|arxiv=1206.5538|s2cid=393948}}</ref> By construction, of all the transformed data matrices with only ''L'' columns, this score matrix maximises the variance in the original data that has been preserved, while minimising the total squared reconstruction error <math>\|\mathbf{T}\mathbf{W}^\mathsf{T} - \mathbf{T}_L\mathbf{W}^\mathsf{T}_L\|_2^2</math> or <math>\|\mathbf{X} - \mathbf{X}_L\|_2^2</math>.<ref>{{Cite book |author1=Holmes, M. |title=Introduction to Scientific Computing and Data Analysis, 2nd Ed |year=2023 |publisher=Springer |isbn=978-3-031-22429-4}}</ref>
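The construction above can be sketched numerically. The following is an illustrative example (not part of the article; the data and variable names are invented for the demonstration) that builds the truncated loading matrix <math>W_L</math> from the singular value decomposition of a centred data matrix, forms the score matrix <math>\mathbf{T}_L = \mathbf{X} \mathbf{W}_L</math>, and checks that the reconstruction error <math>\|\mathbf{X} - \mathbf{X}_L\|_2^2</math> equals the sum of the discarded squared singular values:

```python
import numpy as np

# Synthetic data: n = 100 observations of p = 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)            # PCA assumes column-centred data

L = 2                             # number of components to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)

W_L = Vt[:L].T                    # p x L matrix of leading principal directions
T_L = X @ W_L                     # n x L score matrix  T_L = X W_L
X_L = T_L @ W_L.T                 # rank-L reconstruction of X

# The squared reconstruction error ||X - X_L||^2 equals the sum of
# the discarded squared singular values (Eckart-Young theorem).
err = np.sum((X - X_L) ** 2)
assert np.isclose(err, np.sum(s[L:] ** 2))
```

Because the columns of <math>W_L</math> are orthonormal rows of <math>V^\mathsf{T}</math>, no other rank-''L'' projection yields a smaller value of `err`.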
 
[[File:PCA of Haplogroup J using 37 STRs.png|thumb|right|A principal components analysis scatterplot of [[Y-STR]] [[haplotype]]s calculated from repeat-count values for 37 Y-chromosomal STR markers from 354 individuals.<br /> PCA has successfully found linear combinations of the markers that separate out different clusters corresponding to different lines of individuals' Y-chromosomal genetic descent.]]