<math>E(\alpha) = \frac{1}{1 + \alpha^2} \sum_{i=1}^{n} (\alpha p_i - q_i)^2</math>; setting the derivative of this error function to zero, <math>E'(\alpha) = 0</math>, gives the quadratic <math>\alpha^2 + \lambda \alpha - 1 = 0</math>, whose roots are <math>\alpha = \frac{1}{2} \left( -\lambda \pm \sqrt{\lambda^2 + 4} \right)</math>, where <math>\lambda = \frac{p \cdot p - q \cdot q}{p \cdot q}</math>.<ref name="Holmes2023" />
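The closed form can be checked numerically. The sketch below assumes that p and q are mean-centred data vectors with p · q ≠ 0, and it uses the squared perpendicular distance (αp<sub>i</sub> − q<sub>i</sub>)<sup>2</sup>/(1 + α<sup>2</sup>) from each point to the line q = αp. The two roots give perpendicular lines (their product is −1), so the code evaluates the error at both and keeps the smaller one. It then compares that slope with the leading eigenvector of the 2 × 2 scatter matrix, i.e. the first principal component. The helper names <code>perpendicular_error</code> and <code>best_fit_slope</code> and the synthetic data are hypothetical.

```python
import numpy as np


def perpendicular_error(alpha, p, q):
    """Sum of squared perpendicular distances from the points (p_i, q_i) to the line q = alpha * p."""
    return np.sum((alpha * p - q) ** 2) / (1.0 + alpha ** 2)


def best_fit_slope(p, q):
    """Evaluate both closed-form stationary points of E(alpha) and keep the one with the smaller error."""
    lam = (p @ p - q @ q) / (p @ q)  # assumes p . q != 0
    root = np.sqrt(lam ** 2 + 4.0)
    candidates = [(-lam + root) / 2.0, (-lam - root) / 2.0]
    return min(candidates, key=lambda a: perpendicular_error(a, p, q))


# Hypothetical correlated data, mean-centred so the fitted line passes through the origin.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = x + 0.3 * rng.normal(size=200)
q = 2.0 * x + 0.3 * rng.normal(size=200)
p = p - p.mean()
q = q - q.mean()

alpha = best_fit_slope(p, q)

# Cross-check: slope of the leading eigenvector of the 2x2 scatter matrix (the first principal component).
scatter = np.array([[p @ p, p @ q], [p @ q, q @ q]])
eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues are returned in ascending order
v = eigvecs[:, -1]
pc1_slope = v[1] / v[0]
print(alpha, pc1_slope)  # the two slopes agree up to rounding error
```

Picking the root by direct evaluation avoids having to reason about which sign of the square root applies, which depends on the sign of p · q.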
[[File:PCA of Haplogroup J using 37 STRs.png|thumb|right|A principal components analysis scatterplot of [[Y-STR]] [[haplotype]]s calculated from repeat-count values for 37 Y-chromosomal STR markers from 354 individuals.<br /> PCA has successfully found linear combinations of the markers that separate
Such [[dimensionality reduction]] can be a very useful step for visualising and processing high-dimensional datasets, while still retaining as much of the variance in the dataset as possible. For example, selecting ''L'' = 2 and keeping only the first two principal components finds the two-dimensional plane through the high-dimensional dataset in which the data is most spread out, so if the data contains [[Cluster analysis|clusters]] these too may be most spread out, and therefore most visible to be plotted
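Choosing ''L'' = 2 amounts to centring the data, taking the first two right singular vectors of the centred data matrix, and projecting onto them. The sketch below assumes an SVD-based computation. It reports the fraction of total variance that the retained components keep. The function name <code>project_to_principal_plane</code> and the two-cluster data are hypothetical and serve only as illustration.

```python
import numpy as np


def project_to_principal_plane(X, L=2):
    """Project the rows of X onto the first L principal components."""
    Xc = X - X.mean(axis=0)  # centre each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:L].T  # n x L coordinates in the principal subspace
    retained = (S[:L] ** 2).sum() / (S ** 2).sum()  # fraction of total variance kept
    return scores, retained


# Hypothetical data: two clusters of 50 points each in 10 dimensions.
rng = np.random.default_rng(1)
centres = np.zeros((2, 10))
centres[1, :] = 3.0
X = np.vstack([c + rng.normal(size=(50, 10)) for c in centres])

scores, retained = project_to_principal_plane(X, L=2)
print(scores.shape, round(retained, 3))
# Along the first component the two cluster means lie far apart.
print(scores[:50, 0].mean(), scores[50:, 0].mean())
```

Because most of the spread in this example comes from the gap between the clusters, the first component lines up with that gap, and the clusters separate in a two-dimensional scatterplot of the scores.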
Similarly, in [[regression analysis]], the larger the number of [[explanatory variable]]s allowed, the greater the chance of [[overfitting]] the model, producing conclusions that fail to generalise to other datasets. One approach, especially when there are strong correlations between different possible explanatory variables, is to reduce them to a few principal components and then run the regression against them, a method called [[principal component regression]].
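A minimal sketch of principal component regression follows. It assumes ordinary least squares on the first ''k'' component scores of the centred explanatory variables, with the fitted coefficients mapped back to the original variables. The function name <code>principal_component_regression</code> and the synthetic, strongly correlated data are hypothetical.

```python
import numpy as np


def principal_component_regression(X, y, k):
    """Regress y on the first k principal component scores of X; return intercept and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T  # p x k matrix of component loadings
    Z = Xc @ V_k  # n x k component scores
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V_k @ gamma  # coefficients expressed in terms of the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Hypothetical setting: five nearly collinear explanatory variables driven by one latent factor.
rng = np.random.default_rng(2)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.05 * rng.normal(size=n) for _ in range(5)])
y = 3.0 * latent + 0.1 * rng.normal(size=n)

intercept, beta = principal_component_regression(X, y, k=1)
pred = intercept + X @ beta
print(beta, np.corrcoef(pred, y)[0, 1])
```

With a single component the fit spreads the effect evenly across the five correlated variables. Ordinary least squares on all five would have to separate them using only their small independent noise.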
Market research has made extensive use of PCA. It is used to develop customer satisfaction or customer loyalty scores for products and, together with clustering, to develop market segments that may be targeted with advertising campaigns, in much the same way as factorial ecology locates geographical areas with similar characteristics.<ref>{{Cite journal |last1=DeSarbo |first1=Wayne |last2=Hausmann |first2=Robert |last3=Kukitz |first3=Jeffrey |date=2007 |title=Restricted principal components analysis for marketing research |url=https://www.researchgate.net/publication/247623679 |journal=Journal of Marketing in Management |volume=2 |pages=305–328 |via=ResearchGate}}</ref>
PCA transforms large amounts of data into a smaller number of easier-to-digest variables that can be analyzed more rapidly and readily. A typical consumer questionnaire contains series of questions designed to elicit consumer attitudes, and principal components seek
In another example, Joe Flood (2008) extracted an attitudinal index toward housing from 28 attitude questions in a national survey of 2697 households in Australia. The first principal component represented a general attitude toward property and home ownership. The index, or the attitude questions it embodied, could then be fed into a General Linear Model of tenure choice. By far the strongest determinant of private renting was the attitude index, ahead of income, marital status or household type.<ref>{{Cite journal |last=Flood |first=Joe |date=2008 |title=Multinomial Analysis for Housing Careers Survey |url=https://www.academia.edu/33218811 |access-date=6 May 2022 |website=Paper to the European Network for Housing Research Conference, Dublin}}</ref>
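The general idea of an attitudinal index built from the first principal component can be sketched as follows. The survey's exact procedure is not described here, so this sketch assumes PCA on the item correlation matrix (standardised items) and assumes that all items are keyed so that higher answers mean more of the attitude. That second assumption fixes the arbitrary sign of the component. The responses are synthetic 1 to 5 answers to 28 hypothetical items, not the survey data, and the function name <code>first_component_index</code> is hypothetical.

```python
import numpy as np


def first_component_index(responses):
    """Score each respondent on the first principal component of standardised item responses."""
    Z = (responses - responses.mean(axis=0)) / responses.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)  # item-by-item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1]  # leading eigenvector
    if loadings.sum() < 0:  # assumes positively keyed items: flip so a higher index means more of the attitude
        loadings = -loadings
    index = Z @ loadings
    share = eigvals[-1] / eigvals.sum()  # share of standardised variance captured by the first component
    return index, loadings, share


# Synthetic example: one hypothetical latent attitude drives answers to 28 items on a 1-5 scale.
rng = np.random.default_rng(3)
n_respondents, n_items = 500, 28
attitude = rng.normal(size=n_respondents)
raw = attitude[:, None] + rng.normal(size=(n_respondents, n_items))
responses = np.clip(np.rint(3 + raw), 1, 5)

index, loadings, share = first_component_index(responses)
print(round(share, 2), np.corrcoef(index, attitude)[0, 1])
```

The resulting index is a single numeric score per respondent, so it can be used as an explanatory variable in a subsequent model alongside variables such as income or household type.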