Principal component analysis: Difference between revisions

WikiCleanerBot (talk | contribs)
m v2.04b - Bot T20 CW#61 - Fix errors for CW project (Reference before punctuation)
Limitations: fix refs; rewrite para for encyclopedic tone
Line 264:
Another limitation is the mean-removal process before constructing the covariance matrix for PCA. In fields such as astronomy, all the signals are non-negative, and the mean-removal process will force the mean of some astrophysical exposures to be zero, which consequently creates unphysical negative fluxes,<ref name="soummer12"/> and forward modeling has to be performed to recover the true magnitude of the signals.<ref name="pueyo16">{{Cite journal|arxiv= 1604.06097 |last1= Pueyo|first1= Laurent |title= Detection and Characterization of Exoplanets using Projections on Karhunen Loeve Eigenimages: Forward Modeling |journal= The Astrophysical Journal |volume= 824|issue= 2|pages= 117|year= 2016|doi= 10.3847/0004-637X/824/2/117|bibcode = 2016ApJ...824..117P|s2cid= 118349503}}</ref> As an alternative method, [[non-negative matrix factorization]] focuses only on the non-negative elements in the matrices and is well suited for astrophysical observations.<ref name="blantonRoweis07"/><ref name="zhu16"/><ref name="ren18"/> See more at [[#Non-negative matrix factorization|Relation between PCA and Non-negative Matrix Factorization]].
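The effect of mean removal can be illustrated with a short numerical sketch. The example below is not taken from the cited sources; it uses the Python library scikit-learn, and the synthetic "flux" data, the number of components, and the NMF settings are assumptions chosen only for illustration.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import PCA, NMF

# Toy non-negative data standing in for astrophysical fluxes:
# 100 exposures of 5 strictly positive signals.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(100, 5))

# PCA removes the mean of each column, so a low-rank reconstruction
# can dip below zero even though the original data are non-negative.
pca = PCA(n_components=2)
X_pca = pca.inverse_transform(pca.fit_transform(X))
print("PCA reconstruction minimum:", X_pca.min())    # typically negative

# Non-negative matrix factorization keeps both factors non-negative,
# so the reconstruction cannot contain negative values.
nmf = NMF(n_components=2, init="nndsvda", max_iter=500)
W = nmf.fit_transform(X)
X_nmf = W @ nmf.components_
print("NMF reconstruction minimum:", X_nmf.min())     # always >= 0
</syntaxhighlight>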
 
PCA is at a disadvantage if the data have not been standardized before the algorithm is applied. PCA transforms the original data into data that is relevant to the principal components of that data, which means that the new data variables cannot be interpreted in the same ways that the original ones were: they are linear combinations of the original variables. Also, if PCA is not performed properly, there is a high likelihood of information loss.<ref>{{cite web |title=What are the Pros and cons of the PCA? |website=i2tutorials |date=September 1, 2019 |url=https://www.i2tutorials.com/what-are-the-pros-and-cons-of-the-pca/ |access-date=June 4, 2021}}</ref> For these reasons, applying PCA correctly can be challenging in practice.
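As an illustration of the standardization issue, the sketch below (not drawn from the cited source) uses the Python library scikit-learn; the two-variable dataset and its scales are assumptions chosen only to make the effect visible.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Two correlated variables measured on very different scales.
x1 = rng.normal(0.0, 1.0, 500)
x2 = 50.0 * x1 + rng.normal(0.0, 50.0, 500)
X = np.column_stack([x1, x2])

# Without standardization the first principal component is dominated
# by the large-scale variable (loadings roughly [0.01, 1.00]).
print(PCA(n_components=1).fit(X).components_)

# After standardization both variables contribute comparably
# (loadings roughly [0.71, 0.71]).
X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_std).components_)
</syntaxhighlight>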
PCA relies on a linear model. If a dataset has a pattern hidden inside it that is nonlinear, then PCA can actually steer the analysis in the complete opposite direction of progress.<ref name=abbott>{{cite book |title=Applied Predictive Analytics |last=Abbott |first=Dean |isbn=9781118727966 |date=May 2014 |publisher=Wiley}}</ref>{{Page needed|date=June 2021}} Researchers at Kansas State University discovered that the sampling error in their experiments impacted the bias of PCA results. "If the number of subjects or blocks is smaller than 30, and/or the researcher is interested in PC's beyond the first, it may be better to first correct for the serial correlation, before PCA is conducted".<ref name=jiang>{{cite web |title=Bias in Principal Components Analysis Due to Correlated Observations |last1=Jiang |first1=Hong |last2=Eskridge |first2=Kent M. |year=2000 |url=https://newprairiepress.org/agstatconference/2000/proceedings/13/ |publisher=Conference on Applied Statistics in Agriculture |issn=2475-7772 |doi=10.4148/2475-7772.1247}}</ref> The researchers at Kansas State also found that PCA could be "seriously biased if the autocorrelation structure of the data is not correctly handled".<ref name=jiang/>
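The linear-model limitation can be made concrete with a small sketch. The example below is not from the cited sources: it uses the Python library scikit-learn and compares ordinary PCA with kernel PCA (a nonlinear extension of PCA) on a synthetic dataset of two concentric rings; the dataset and kernel parameters are assumptions chosen only for illustration.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric rings: the structure (inner vs. outer ring) is nonlinear.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA can only rotate and rescale the axes, so neither
# principal component separates the two rings.
Z_lin = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel works in a nonlinear feature space;
# in this toy case its leading component separates the rings well.
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

for name, Z in [("linear PCA", Z_lin), ("kernel PCA", Z_rbf)]:
    # Fraction of points whose first-component sign matches the ring label.
    acc = max(np.mean((Z[:, 0] > 0) == y), np.mean((Z[:, 0] > 0) == (1 - y)))
    print(f"{name}: separation by first component = {acc:.2f}")
</syntaxhighlight>

In this sketch the linear projection scores near chance level, while the kernelized variant recovers the nonlinear ring structure.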
 
=== PCA and information theory ===