The data is linearly transformed onto a new [[coordinate system]] such that the directions (principal components) capturing the largest variation in the data can be easily identified.
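One standard way to compute this transformation is to center the data and take the eigenvectors of its sample covariance matrix, sorted by decreasing eigenvalue. Each eigenvalue is then the variance of the data along the corresponding new axis. Below is a minimal Python sketch of that approach, assuming NumPy. The function name <code>pca_transform</code> and the synthetic two-dimensional data set are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def pca_transform(X):
    """Hypothetical helper: return (scores, components, variances) for an n x p data matrix X.

    Minimal sketch: center the data, eigendecompose the sample covariance
    matrix, and order the new axes by decreasing variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature at zero
    cov = np.cov(Xc, rowvar=False)           # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1]        # put the largest-variance direction first
    variances = eigvals[order]
    components = eigvecs[:, order]           # columns are the principal components (unit vectors)
    scores = Xc @ components                 # coordinates of each point in the new system
    return scores, components, variances

# Hypothetical correlated 2-D data: most of the spread lies along the direction (1, 1).
rng = np.random.default_rng(0)
n = 500
t = rng.normal(size=n)
X = np.column_stack([t + 0.1 * rng.normal(size=n),
                     t + 0.1 * rng.normal(size=n)])

scores, components, variances = pca_transform(X)
print("variance along each new axis:", variances)      # first entry is much larger than the second
print("first principal component:", components[:, 0])  # approximately +/-(0.707, 0.707)
```

Because the eigenvectors of a symmetric matrix are orthonormal, this is a rotation (plus a shift to the mean) of the original coordinates, so no information is lost until low-variance axes are dropped.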
The '''principal components''' of a collection of points in a [[real coordinate space]] are a sequence of <math>p</math> [[unit vector]]s, one for each dimension of the space, where the <math>i</math>-th vector is the direction of a line that best fits the data while being [[orthogonal]] to the first <math>i-1</math> vectors. Here, a best-fitting line is defined as one that minimizes the average squared [[perpendicular distance|perpendicular]] [[Distance from a point to a line|distance from the points to the line]]. These directions (i.e., principal components) constitute an [[orthonormal basis]] in which the individual dimensions of the data are [[Linear correlation|linearly uncorrelated]]. Many studies use the first two principal components to plot the data in two dimensions and to visually identify clusters of closely related data points.<ref>{{cite journal |last1=Gewers |first1=Felipe L. |last2=Ferreira |first2=Gustavo R. |last3=Arruda |first3=Henrique F. De |last4=Silva |first4=Filipi N. |last5=Comin |first5=Cesar H. |last6=Amancio |first6=Diego R. |last7=Costa |first7=Luciano Da F. |title=Principal Component Analysis: A Natural Approach to Data Exploration |journal=ACM Comput. Surv. |date=24 May 2021 |volume=54 |issue=4 |pages=70:1–70:34 |doi=10.1145/3447755 |url=https://dl.acm.org/doi/abs/10.1145/3447755}}</ref>
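The best-fit, orthogonality and decorrelation properties in this definition can be checked numerically. The sketch below assumes NumPy and a synthetic three-dimensional point cloud. It obtains the principal components from the [[singular value decomposition]] of the centered data (equivalent to eigendecomposing the covariance matrix), measures the average squared perpendicular distance from the points to lines through their centroid, and compares each component against many random directions. The helper <code>mean_sq_perp_dist</code> and the data are hypothetical illustrations, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3-D point cloud: independent normals mixed so the spread differs by direction,
# then shifted away from the origin.
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.5, 0.0],
                   [0.5, 0.2, 0.3]])
X = rng.normal(size=(400, 3)) @ mixing + np.array([5.0, -2.0, 1.0])
Xc = X - X.mean(axis=0)  # lines through the centroid become lines through the origin

# Rows of Vt are orthonormal principal components, ordered by decreasing singular value.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

def mean_sq_perp_dist(Xc, u):
    """Hypothetical helper: average squared perpendicular distance from the
    centered points to the line through the origin spanned by unit vector u."""
    proj = Xc @ u
    return np.mean(np.sum(Xc**2, axis=1) - proj**2)

# Random unit directions to compare against.
dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# 1st component: no random direction gives a better-fitting line.
best1 = mean_sq_perp_dist(Xc, Vt[0])
others1 = np.array([mean_sq_perp_dist(Xc, u) for u in dirs])

# 2nd component: best-fitting line among directions orthogonal to the first.
ortho = dirs - np.outer(dirs @ Vt[0], Vt[0])
ortho /= np.linalg.norm(ortho, axis=1, keepdims=True)
best2 = mean_sq_perp_dist(Xc, Vt[1])
others2 = np.array([mean_sq_perp_dist(Xc, u) for u in ortho])

# Coordinates in the principal-component basis are linearly uncorrelated.
scores = Xc @ Vt.T
corr = np.corrcoef(scores, rowvar=False)

# The first two coordinates are what a two-dimensional PCA plot displays.
plot_xy = scores[:, :2]

print("1st component beats all random directions:", best1 <= others1.min())
print("2nd component beats all orthogonal directions:", best2 <= others2.min())
print("largest off-diagonal correlation:", np.abs(corr - np.diag(np.diag(corr))).max())
```

Restricting the comparison to lines through the centroid loses no generality, since the line minimizing the average squared perpendicular distance always passes through the mean of the points.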
Principal component analysis has applications in many fields such as [[population genetics]], [[microbiome]] studies, and [[atmospheric science]].<ref>{{cite journal |last1=Jolliffe |first1=Ian T. |last2=Cadima |first2=Jorge |date=2016-04-13 |title=Principal component analysis: a review and recent developments |journal=Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |volume=374 |issue=2065 |pages=20150202 |bibcode=2016RSPTA.37450202J |doi=10.1098/rsta.2015.0202 |pmc=4792409 |pmid=26953178}}</ref>