Revision as of 03:05, 30 June 2025, by OAbot (Bots, 644,036 edits); minor edit: Open access bot: url-access=subscription updated in citation with #oabot.
Revision as of 11:53, 20 July 2025, by Mepperelf (Extended confirmed users, 1,367 edits); minor edit: changed common redundancy "separate out" to "separate". Tag: Reverted.
Line 121:
<math>E(\alpha) = \frac{1}{1 + \alpha^2} \sum_{i=1}^{n} (\alpha p_i - q_i)^2</math>; such that setting the derivative of the error function to zero <math>(E'(\alpha) = 0)</math> yields: <math>\alpha = \frac{1}{2} \left( -\lambda \pm \sqrt{\lambda^2 + 4} \right)</math> where <math>\lambda = \frac{p \cdot p - q \cdot q}{p \cdot q}</math>.<ref name="Holmes2023" />

[[File:PCA of Haplogroup J using 37 STRs.png|thumb|right|A principal components analysis scatterplot of [[Y-STR]] [[haplotype]]s calculated from repeat-count values for 37 Y-chromosomal STR markers from 354 individuals.<br /> PCA has successfully found linear combinations of the markers that separate ~~out~~ different clusters corresponding to different lines of individuals' Y-chromosomal genetic descent.]]

Such [[dimensionality reduction]] can be a very useful step for visualising and processing high-dimensional datasets, while still retaining as much of the variance in the dataset as possible. For example, selecting ''L'' = 2 and keeping only the first two principal components finds the two-dimensional plane through the high-dimensional dataset in which the data is most spread out, so if the data contains [[Cluster analysis|clusters]] these too may be most spread out, and therefore most visible to be plotted ~~out~~ in a two-dimensional diagram; whereas if two directions through the data (or two of the original variables) are chosen at random, the clusters may be much less spread apart from each other, and may in fact be much more likely to substantially overlay each other, making them indistinguishable.

Similarly, in [[regression analysis]], the larger the number of [[explanatory variable]]s allowed, the greater is the chance of [[overfitting]] the model, producing conclusions that fail to generalise to other datasets.
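The ''L'' = 2 projection described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper name <code>pca_project</code>, the synthetic two-cluster data and the random seed are assumptions introduced here, not part of the article. It compares the variance kept by the first two principal components with the most that any two original variables could keep.

```python
import numpy as np

def pca_project(X, L=2):
    """Project the rows of X onto the first L principal components (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:L].T                       # n x L coordinates in the PC plane
    explained = s[:L] ** 2 / (len(X) - 1)        # sample variance along each kept component
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical data: two clusters of 100 points each in 10 dimensions
centres = rng.normal(scale=3.0, size=(2, 10))
X = np.vstack([c + rng.normal(size=(100, 10)) for c in centres])

scores, explained = pca_project(X, L=2)

# Largest variance that any pair of the original variables retains
per_variable = X.var(axis=0, ddof=1)
best_pair = np.sort(per_variable)[-2:].sum()
```

Here <code>explained.sum()</code> is never smaller than <code>best_pair</code>, because the leading principal components maximise the variance retained among all orthonormal pairs of directions; plotting the two columns of <code>scores</code> against each other gives the kind of scatterplot shown in the figure.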
One approach, especially when there are strong correlations between different possible explanatory variables, is to reduce them to a few principal components and then run the regression against them, a method called [[principal component regression]].

Line 540:
Market research has been an extensive user of PCA. It is used to develop customer satisfaction or customer loyalty scores for products, and with clustering, to develop market segments that may be targeted with advertising campaigns, in much the same way as factorial ecology will locate geographical areas with similar characteristics.<ref>{{Cite journal |last1=DeSarbo |first1=Wayne |last2=Hausmann |first2=Robert |last3=Kukitz |first3=Jeffrey |date=2007 |title=Restricted principal components analysis for marketing research |url=https://www.researchgate.net/publication/247623679 |journal=Journal of Marketing in Management |volume=2 |pages=305–328 |via=ResearchGate}}</ref> PCA rapidly transforms large amounts of data into a smaller number of easier-to-digest variables that can be more rapidly and readily analyzed. In any consumer questionnaire, there is a series of questions designed to elicit consumer attitudes, and principal components seek ~~out~~ latent variables underlying these attitudes. For example, the Oxford Internet Survey in 2013 asked 2000 people about their attitudes and beliefs, and from these analysts extracted four principal component dimensions, which they identified as 'escape', 'social networking', 'efficiency', and 'problem creating'.<ref>{{Cite book |last1=Dutton |first1=William H |url=http://oxis.oii.ox.ac.uk/wp-content/uploads/2014/11/OxIS-2013.pdf |title=Cultures of the Internet: The Internet in Britain |last2=Blank |first2=Grant |publisher=Oxford Internet Institute |year=2013 |pages=6}}</ref> Another example from Joe Flood in 2008 extracted an attitudinal index toward housing from 28 attitude questions in a national survey of 2697 households in Australia.
The first principal component represented a general attitude toward property and home ownership. The index, or the attitude questions it embodied, could be fed into a General Linear Model of tenure choice. The strongest determinant of private renting by far was the attitude index, rather than income, marital status or household type.<ref>{{Cite journal |last=Flood |first=Joe |date=2008 |title=Multinomial Analysis for Housing Careers Survey |url=https://www.academia.edu/33218811 |access-date=6 May 2022 |website=Paper to the European Network for Housing Research Conference, Dublin}}</ref>
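
As a rough illustration of how a first principal component can serve as an attitudinal index of the kind described above, the following sketch builds one from simulated questionnaire answers. Everything in it is an assumption made for illustration: the single-latent-attitude model, the loadings, the sample size and the variable names are hypothetical and are not taken from the surveys cited.

```python
import numpy as np

rng = np.random.default_rng(1)
n_respondents, n_questions = 500, 28   # hypothetical survey dimensions

# Hypothetical model: one latent attitude drives every answer, plus independent noise
latent = rng.normal(size=n_respondents)
loadings = rng.uniform(0.5, 1.0, size=n_questions)
answers = np.outer(latent, loadings) + rng.normal(size=(n_respondents, n_questions))

# Standardise each question so that no item dominates through its scale alone
Z = (answers - answers.mean(axis=0)) / answers.std(axis=0, ddof=1)

# The first right singular vector holds the weights of the first principal component
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
weights = Vt[0]
index = Z @ weights                    # one index score per respondent

share = s[0] ** 2 / np.sum(s ** 2)     # fraction of total variance the index captures
corr = abs(np.corrcoef(index, latent)[0, 1])  # absolute value: a component's sign is arbitrary
```

In this simulated setting the index tracks the latent attitude closely, and a vector like <code>index</code> could then be used as a single explanatory variable in a downstream regression model. With real survey data no latent variable is observed, so the interpretation of the component rests on inspecting <code>weights</code>.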

Principal component analysis: Difference between revisions