Another limitation is the mean-removal process before constructing the covariance matrix for PCA. In fields such as astronomy, all the signals are non-negative, and the mean-removal process will force the mean of some astrophysical exposures to be zero, which consequently creates unphysical negative fluxes,<ref name="soummer12"/> and forward modeling has to be performed to recover the true magnitude of the signals.<ref name="pueyo16">{{Cite journal|arxiv= 1604.06097 |last1= Pueyo|first1= Laurent |title= Detection and Characterization of Exoplanets using Projections on Karhunen Loeve Eigenimages: Forward Modeling |journal= The Astrophysical Journal |volume= 824|issue= 2|pages= 117|year= 2016|doi= 10.3847/0004-637X/824/2/117|bibcode = 2016ApJ...824..117P|s2cid= 118349503}}</ref> As an alternative method, [[non-negative matrix factorization]], which focuses only on the non-negative elements in the matrices, is well-suited for astrophysical observations.<ref name="blantonRoweis07"/><ref name="zhu16"/><ref name="ren18"/> See more at [[#Non-negative matrix factorization|Relation between PCA and Non-negative Matrix Factorization]].
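The contrast described above can be sketched in a few lines. This is an illustrative example only (the random "flux" data and the choice of two components are assumptions, not taken from any cited study): a low-rank PCA reconstruction of strictly non-negative data may contain negative entries because of mean removal, whereas an NMF reconstruction is non-negative by construction.

```python
# Illustrative sketch: PCA mean removal vs. NMF on non-negative data.
import numpy as np
from sklearn.decomposition import PCA, NMF

rng = np.random.default_rng(0)
X = rng.random((60, 8))  # strictly non-negative data, e.g. hypothetical fluxes

# PCA subtracts the mean, so a rank-2 reconstruction can dip below zero
# even though every input entry is non-negative.
pca = PCA(n_components=2).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))

# NMF constrains both factors W and H to be non-negative, so the
# reconstruction W @ H can never contain negative entries.
nmf = NMF(n_components=2, init="nndsvda", max_iter=500)
W = nmf.fit_transform(X)
X_nmf = W @ nmf.components_

print("PCA reconstruction min:", X_pca.min())
print("NMF reconstruction min:", X_nmf.min())  # >= 0 by construction
```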
If the data have not been standardized before applying PCA, variables measured on larger scales will dominate the resulting components. In addition, PCA transforms the original variables into principal components that are linear combinations of those variables, so the components generally cannot be interpreted in the same way as the originals. Retaining too few components, or applying the method carelessly, can also lead to a substantial loss of information.<ref>(What Are the Pros and Cons of the PCA? | I2tutorials, n.d., p. 2)</ref> For these reasons, applying and interpreting PCA requires care.
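The effect of standardization can be demonstrated with a small sketch (the synthetic correlated features and scales here are assumptions for illustration, not from the cited source): without standardization, the leading component simply tracks the feature with the largest variance, while after standardization both features contribute comparably.

```python
# Illustrative sketch: scaling changes which direction PCA finds first.
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(0.0, 1.0, 500)
# Two correlated features on very different scales (e.g. metres vs millimetres).
X = np.column_stack([z, 1000.0 * z + rng.normal(0.0, 100.0, 500)])

def first_pc(A):
    """Leading principal-component direction via SVD of the centred data."""
    A = A - A.mean(axis=0)
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[0]

raw = first_pc(X)                  # dominated by the large-scale feature
std = first_pc(X / X.std(axis=0))  # both features contribute comparably

print(abs(raw[1]))  # close to 1: PC1 aligns with the high-variance axis
print(abs(std[0]))  # close to 1/sqrt(2): balanced loading after scaling
```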
PCA also relies on a linear model. If a dataset contains a nonlinear pattern, PCA may fail to capture it and can even misdirect the analysis.<ref>(Abbott, 2014)</ref> Researchers at Kansas State University found that sampling error in their experiments affected the bias of PCA results: "If the number of subjects or blocks is smaller than 30, and/or the researcher is interested in PC's beyond the first, it may be better to first correct for the serial correlation, before PCA is conducted".<ref>(Jiang and Eskridge, 2000)</ref> The same researchers also found that PCA could be "seriously biased if the autocorrelation structure of the data is not correctly handled".<ref>(Jiang & Eskridge, 2000)</ref>
=== PCA and information theory ===