Added a nice interdisciplinary reference
No edit summary
Line 8:
The '''principal components''' of a collection of points in a [[real coordinate space]] are a sequence of <math>p</math> [[unit vector]]s, where the <math>i</math>-th vector is the direction of a line that best fits the data while being [[orthogonal]] to the first <math>i-1</math> vectors. Here, a best-fitting line is defined as one that minimizes the average squared [[perpendicular distance|perpendicular]] [[Distance from a point to a line|distance from the points to the line]]. These directions (i.e., principal components) constitute an [[orthonormal basis]] in which the individual dimensions of the data are [[Linear correlation|linearly uncorrelated]]. Many studies use the first two principal components to plot the data in two dimensions and to visually identify clusters of closely related data points.<ref>{{Cite journal |last1=Jolliffe |first1=Ian T. |last2=Cadima |first2=Jorge |date=2016-04-13 |title=Principal component analysis: a review and recent developments |journal=Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |volume=374 |issue=2065 |pages=20150202 |bibcode=2016RSPTA.37450202J |doi=10.1098/rsta.2015.0202 |pmc=4792409 |pmid=26953178}}</ref>
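The definition above can be illustrated with a minimal sketch, assuming NumPy and a synthetic data set. The function name <code>principal_components</code> and the mixing matrix are hypothetical and chosen only for illustration. The directions are obtained here from the [[singular value decomposition]] of the mean-centered data, which is one common way to compute them, not the only one.

```python
import numpy as np

def principal_components(X):
    """Return principal directions (unit vectors, one per row) and the
    variance of the data along each direction. Hypothetical helper."""
    # Center the data so the best-fitting lines pass through the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing
    # singular value, i.e. by decreasing variance along the direction.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return Vt, variances

# Synthetic correlated data: 200 points in 3 dimensions (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

components, variances = principal_components(X)

# Coordinates of each point in the principal-component basis.
scores = (X - X.mean(axis=0)) @ components.T

# Keeping only the first two columns gives the usual 2-D plot coordinates.
scores_2d = scores[:, :2]
```

In this sketch the rows of <code>components</code> form an orthonormal basis, and the sample covariance matrix of <code>scores</code> is diagonal up to rounding error, which is the "linearly uncorrelated" property described above.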
Principal component analysis has applications in many fields such as [[population genetics]], [[microbiome]] studies, and [[atmospheric science]].<ref>{{cite journal |last1=Gewers |first1=Felipe L. |last2=Ferreira |first2=Gustavo R. |last3=Arruda |first3=Henrique F. De |last4=Silva |first4=Filipi N. |last5=Comin |first5=Cesar H. |last6=Amancio |first6=Diego R. |last7=Costa |first7=Luciano Da F. |title=Principal Component Analysis: A Natural Approach to Data Exploration |journal=ACM Comput. Surv. |date=24 May 2021 |volume=54 |issue=4 |pages=70:1–70:34 |doi=
== Overview ==