Revision as of 14:51, 6 May 2022 by Sebaxion (no edit summary; tag: Visual edit) → Revision as of 05:43, 30 October 2022 by Citation bot (edit summary: Add: s2cid, authors 1-1. Removed parameters. Some additions/deletions were parameter name changes. Suggested by Abductive. #UCB_webform 2395/3850)
Line 1: {{Short description\|Statistical method in psychology}} [[File:Exploratory Factor Analysis(EFA).png\|thumb\|Exploratory Factor Analysis Model]] In [[multivariate statistics]], '''exploratory factor analysis''' ('''EFA''') is a statistical method used to uncover the underlying structure of a relatively large set of [[Variable (research)\|variables]]. EFA is a technique within [[factor analysis]] whose overarching goal is to identify the underlying relationships between measured variables.<ref name=Norris>{{cite journal\|last=Norris\|first=Megan\|author2=Lecavalier, Luc\|title=Evaluating the Use of Exploratory Factor Analysis in Developmental Disability Psychological Research\|journal=Journal of Autism and Developmental Disorders\|date=17 July 2009\|volume=40\|issue=1\|pages=8–20\|doi=10.1007/s10803-009-0816-2\|pmid=19609833\|s2cid=45751299 }}</ref> It is commonly used by researchers when developing a scale (a ''scale'' is a collection of questions used to measure a particular research topic) and serves to identify a set of [[Latent variable\|latent constructs]] underlying a battery of measured variables.<ref name=Fabrigar>{{cite journal\|last=Fabrigar\|first=Leandre R.\|author2=Wegener, Duane T. \|author3=MacCallum, Robert C. \|author4=Strahan, Erin J. \|title=Evaluating the use of exploratory factor analysis in psychological research.\|journal=Psychological Methods\|date=1 January 1999\|volume=4\|issue=3\|pages=272–299\|doi=10.1037/1082-989X.4.3.272\|url=http://www.statpower.net/Content/312/Handout/Fabrigar1999.pdf}}</ref> It should be used when the researcher has no ''a priori'' hypothesis about factors or patterns of measured variables.<ref name=Finch>{{cite journal \| last1 = Finch \| first1 = J. F. \| last2 = West \| first2 = S. G. 
\| year = 1997 \| title = The investigation of personality structure: Statistical models \| journal = Journal of Research in Personality \| volume = 31 \| issue = 4\| pages = 439–485 \| doi=10.1006/jrpe.1997.2194}}</ref> ''Measured variables'' are any one of several attributes of people that may be observed and measured. Examples of measured variables could be the physical height, weight, and pulse rate of a human being. Usually, researchers would have a large number of measured variables, which are assumed to be related to a smaller number of "unobserved" factors. Researchers must carefully consider the number of measured variables to include in the analysis.<ref name =Fabrigar/> EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. EFA is based on the common factor model.<ref name =Norris/> In this model, manifest variables are expressed as a function of common factors, unique factors, and errors of measurement. Each unique factor influences only one manifest variable, and does not explain correlations between manifest variables. Common factors influence more than one manifest variable and "factor loadings" are measures of the influence of a common factor on a manifest variable.<ref name =Norris/> For the EFA procedure, we are more interested in identifying the common factors and the related manifest variables. EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to [[confirmatory factor analysis]] (CFA).<ref name=worthington>{{cite journal\|last=Worthington\|first=Roger L.\|author2= Whittaker, Tiffany A J. 
\|title=Scale development research: A content analysis and recommendations for best practices.\|journal=The Counseling Psychologist\|date=1 January 2006\|volume=34\|issue=6\|pages=806–838\|doi=10.1177/0011000006288127\|s2cid=146284440 }}</ref> EFA is essential to determine underlying factors/constructs for a set of measured variables, while CFA allows the researcher to test the hypothesis that a relationship between the observed variables and their underlying latent {{Not a typo\|factor(s)/construct(s)}} exists.<ref>Suhr, D. D. (2006). Exploratory or confirmatory factor analysis? (pp. 1-17). Cary: SAS Institute.</ref> EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method. Line 22: }} Mistakes in factor extraction may consist of extracting too few or too many factors. A comprehensive review of the state of the art and a proposal of criteria for choosing the number of factors are presented by Iantovics, Rotar, and Morar (2018).<ref>{{Cite journal \|last1=Iantovics \|first1=Laszlo Barna \|last2=Rotar \|first2=Corina \|last3=Morar \|first3=Florica \|date=2018-12-04 \|title=Survey on establishing the optimal number of factors in exploratory factor analysis applied to data mining \|url=https://doi.org/10.1002/widm.1294 \|journal=WIREs Data Mining and Knowledge Discovery \|volume=9 \|issue=2 \|doi=10.1002/widm.1294 \|s2cid=69358367 \|issn=1942-4787}}</ref> When selecting how many factors to include in a model, researchers must try to balance [[Occam's razor\|parsimony]] (a model with relatively few factors) and plausibility (that there are enough factors to adequately account for correlations among measured variables).<ref>{{cite book\|last=Fabrigar\|first=Leandre R.\|title=Exploratory factor analysis\|publisher=Oxford University Press\|location=Oxford\|isbn=978-0-19-973417-7\|author2=Wegener, Duane T.\|date=2012-01-12}}</ref> Line 30: ''Underfactoring'' occurs when too few factors are included in a model. 
If not enough factors are included in a model, there is likely to be substantial error. Measured variables that load onto a factor not included in the model can falsely load on factors that are included, altering true factor loadings. This can result in rotated solutions in which two factors are combined into a single factor, obscuring the true factor structure. There are a number of procedures designed to determine the optimal number of factors to retain in EFA. These include Kaiser's (1960) eigenvalue-greater-than-one rule (or K1 rule),<ref>{{cite journal\|last=Kaiser\|first=H.F.\|title=The application of electronic computers to factor analysis\|journal=Educational and Psychological Measurement\|year=1960\|volume=20\|pages=141–151\|doi=10.1177/001316446002000116\|s2cid=146138712 }}</ref> Cattell's (1966) [[scree plot]],<ref name="Cattell, R. B. 1966">Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.</ref> Revelle and Rocklin's (1979) very simple structure criterion,<ref>{{cite journal \| last1 = Revelle \| first1 = W. \| last2 = Rocklin \| first2 = T. \| year = 1979 \| title = Very simple structure-alternative procedure for estimating the optimal number of interpretable factors \| journal = Multivariate Behavioral Research \| volume = 14 \| issue = 4\| pages = 403–414 \| doi = 10.1207/s15327906mbr1404_2 \| pmid = 26804437 }}</ref> model comparison techniques,<ref>{{cite journal \| last1 = Fabrigar \| first1 = Leandre R. \| last2 = Wegener \| first2 = Duane T. \| last3 = MacCallum \| first3 = Robert C. \| last4 = Strahan \| first4 = Erin J. \| year = 1999 \| title = Evaluating the use of exploratory factor analysis in psychological research. \| journal = Psychological Methods \| volume = 4 \| issue = 3\| pages = 272–299 \| doi = 10.1037/1082-989X.4.3.272 }}</ref> Raiche, Riopel, and Blais's (2006) acceleration factor and optimal coordinates,<ref>Raiche, G., Riopel, M., & Blais, J. 
G.\|Non graphical solutions for the Cattell’s scree test. Paper presented at The International Annual Meeting of the Psychometric Society, Montreal\|date=2006\|Retrieved December 10, 2012 from {{cite web \|url=https://ppw.kuleuven.be/okp/_pdf/Raiche2013NGSFC.pdf \|title=Archived copy \|access-date=2013-05-03 \|url-status=live \|archive-url=https://web.archive.org/web/20131021052759/https://ppw.kuleuven.be/okp/_pdf/Raiche2013NGSFC.pdf \|archive-date=2013-10-21 }}</ref> Velicer's (1976) minimum average partial,<ref name=Velicer>{{cite journal\|last=Velicer\|first=W.F.\|title=Determining the number of components from the matrix of partial correlations\|journal=Psychometrika\|year=1976\|volume=41\|issue=3\|pages=321–327\|doi=10.1007/bf02293557\|s2cid=122907389 }}</ref> Horn's (1965) [[parallel analysis]], and Ruscio and Roche's (2012) comparison data.<ref name =Ruscio>{{cite journal\|last=Ruscio\|first=J.\|author2=Roche, B.\|title=Determining the number of factors to retain in an exploratory factor analysis using comparison data of a known factorial structure\|journal=Psychological Assessment\|year=2012\|volume=24\|issue=2\|pages=282–292\|doi=10.1037/a0025697\|pmid=21966933}}</ref> Recent simulation studies assessing the robustness of such techniques suggest that the latter five can better assist practitioners to judiciously model data.<ref name =Ruscio/> These five modern techniques are now easily accessible through integrated use of IBM SPSS Statistics software (SPSS) and R (R Development Core Team, 2011). See Courtney (2013)<ref name="pareonline.net">Courtney, M. G. R. (2013). Determining the number of factors to retain in EFA: Using the SPSS R-Menu v2.0 to make more judicious estimations. ''Practical Assessment, Research and Evaluation'', 18(8). 
Available online: {{cite web \|url=http://pareonline.net/getvn.asp?v=18&n=8 \|title=Archived copy \|access-date=2014-06-08 \|url-status=live \|archive-url=https://web.archive.org/web/20150317145450/http://pareonline.net/getvn.asp?v=18&n=8 \|archive-date=2015-03-17 }}</ref> for guidance on how to carry out these procedures for continuous, ordinal, and heterogeneous (continuous and ordinal) data. Line 44: {{Main\|Scree plot}} [[File:Scree Plot.png\|thumb\|SPSS output of Scree Plot]] Compute the eigenvalues for the correlation matrix and plot the values from largest to smallest. Examine the graph to determine the last substantial drop in the magnitude of eigenvalues. The number of plotted points before the last drop is the number of factors to include in the model.<ref name="Cattell, R. B. 1966"/> This method has been criticized because of its subjective nature (i.e., there is no clear objective definition of what constitutes a substantial drop).<ref>{{cite journal \| last1 = Kaiser \| first1 = H. F. \| year = 1970 \| title = A second generation little jiffy \| journal = Psychometrika \| volume = 35 \| issue = 4 \| pages = 401–415 \| doi = 10.1007/bf02291817 \| s2cid = 121850294 }}</ref> As this procedure is subjective, Courtney (2013) does not recommend it.<ref name="pareonline.net"/> ===Revelle and Rocklin (1979) very simple structure=== Line 54: There are different methods that can be used to assess model fit:<ref name =Fabrigar/> '''Likelihood ratio statistic:'''<ref>Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60A, 64–82.</ref> Used to test the null hypothesis that a model has perfect model fit. It should be applied to models with an increasing number of factors until the result is nonsignificant, indicating that the model is not rejected as a good fit to the population. This statistic should be used with a large sample size and normally distributed data. 
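The stopping rule described above can be sketched in Python as follows. This is only an illustrative sketch: the function name <code>first_nonrejected_model</code>, the default significance level of 0.05, and the example p-values are assumptions made here, and the p-values themselves are presumed to come from likelihood ratio tests already run in a statistics package.

```python
from typing import Optional, Sequence


def first_nonrejected_model(p_values: Sequence[float],
                            alpha: float = 0.05) -> Optional[int]:
    """Return the smallest number of factors whose model is not rejected.

    p_values[i] is assumed to be the likelihood ratio test p-value for the
    model with i + 1 factors (hypothetical input, computed elsewhere).
    Returns None if every model in the sequence is rejected.
    """
    for n_factors, p in enumerate(p_values, start=1):
        # A nonsignificant result means perfect fit is not rejected,
        # so the sequence stops at this number of factors.
        if p >= alpha:
            return n_factors
    return None


# Made-up p-values for models with 1, 2, 3 and 4 factors.
chosen = first_nonrejected_model([0.0001, 0.003, 0.21, 0.64])
print(chosen)
```

With these made-up values the one- and two-factor models are rejected and the three-factor model is the first that is not, so the sketch stops at three factors.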
There are some drawbacks to the likelihood ratio test. First, when there is a large sample size, even small discrepancies between the model and the data result in model rejection.<ref name =Humphreys/><ref>{{cite journal \| last1 = Hakstian \| first1 = A. R. \| last2 = Rogers \| first2 = W. T. \| last3 = Cattell \| first3 = R. B. \| year = 1982 \| title = The behavior of number-of-factors rules with simulated data \| journal = Multivariate Behavioral Research \| volume = 17 \| issue = 2\| pages = 193–219 \| doi = 10.1207/s15327906mbr1702_3 \| pmid = 26810948 }}</ref><ref>{{cite journal\|last=Harris\|first=M. L.\|author2=Harris, C. W.\|title=A Factor Analytic Interpretation Strategy\|journal=Educational and Psychological Measurement\|date=1 October 1971\|volume=31\|issue=3\|pages=589–606\|doi=10.1177/001316447103100301\|s2cid=143515527 }}</ref> When there is a small sample size, even large discrepancies between the model and data may not be significant, which leads to underfactoring.<ref name =Humphreys/> Another disadvantage of the likelihood ratio test is that the null hypothesis of perfect fit is an unrealistic standard.<ref name=Maccallum>{{cite journal \| last1 = Maccallum \| first1 = R. C. \| year = 1990 \| title = The need for alternative measures of fit in covariance structure modeling \| journal = Multivariate Behavioral Research \| volume = 25 \| issue = 2\| pages = 157–162 \| doi=10.1207/s15327906mbr2502_2\| pmid = 26794477 }}</ref><ref name=Browne>{{cite journal \| last1 = Browne \| first1 = M. W. \| last2 = Cudeck \| first2 = R. \| year = 1992 \| title = Alternative ways of assessing model fit \| journal = Sociological Methods and Research \| volume = 21 \| issue = 2 \| pages = 230–258 \| doi = 10.1177/0049124192021002005 \| s2cid = 120166447 }}</ref> '''Root mean square error of approximation (RMSEA) fit index:''' RMSEA is an estimate of the discrepancy between the model and the data per degree of freedom for the model. 
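A point estimate of this kind can be sketched in Python as follows, assuming the model's chi-square statistic, its degrees of freedom, and the sample size are already available. The formula used, sqrt(max(χ² − df, 0) / (df · (N − 1))), is one common convention and is not taken from the sources cited here (some software divides by N rather than N − 1). The function names are hypothetical, and the labels follow the cut-off values quoted in this section, with the handling of values exactly on a boundary being an assumption.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA under the convention described above."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Misfit beyond what is expected by chance, floored at zero,
    # scaled per degree of freedom and per (n - 1) observations.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_fit(value: float) -> str:
    """Label an RMSEA value using the cut-offs quoted in this section."""
    if value < 0.05:
        return "good"
    if value <= 0.08:   # boundary handling is an assumption
        return "acceptable"
    if value <= 0.10:   # boundary handling is an assumption
        return "marginal"
    return "poor"


# Made-up example: chi-square of 100 on 50 degrees of freedom, n = 201.
estimate = rmsea(100.0, 50, 201)
print(round(estimate, 4), describe_fit(estimate))
```

In this made-up example the estimate is about 0.0707, which falls in the range labelled acceptable.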
Values less than 0.05 constitute good fit, values between 0.05 and 0.08 constitute acceptable fit, values between 0.08 and 0.10 constitute marginal fit, and values greater than 0.10 indicate poor fit.<ref name =Browne/><ref>Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: SYSTAT</ref> An advantage of the RMSEA fit index is that it provides confidence intervals which allow researchers to compare a series of models with varying numbers of factors. Line 65: ===Parallel analysis=== {{Main\|Parallel analysis}} To carry out the PA test, users compute the eigenvalues for the correlation matrix and plot the values from largest to smallest and then plot a set of random eigenvalues. The number of eigenvalues before the intersection point indicates how many factors to include in the model.<ref name=Humphreys>{{cite journal \| last1 = Humphreys \| first1 = L. G. \| last2 = Montanelli \| first2 = R. G. Jr \| year = 1975 \| title = An investigation of the parallel analysis criterion for determining the number of common factors \| journal = Multivariate Behavioral Research \| volume = 10 \| issue = 2\| pages = 193–205 \| doi = 10.1207/s15327906mbr1002_5 }}</ref><ref>{{cite journal\|last=Horn\|first=John L.\|title=A rationale and test for the number of factors in factor analysis\|journal=Psychometrika\|date=1 June 1965\|volume=30\|issue=2\|pages=179–185\|doi=10.1007/BF02289447\|pmid=14306381\|s2cid=19663974 }}</ref><ref>{{cite journal\|last=Humphreys\|first=L. G.\|author2=Ilgen, D. R.\|title=Note On a Criterion for the Number of Common Factors\|journal=Educational and Psychological Measurement\|date=1 October 1969\|volume=29\|issue=3\|pages=571–578\|doi=10.1177/001316446902900303\|s2cid=145258601 }}</ref> This procedure can be somewhat arbitrary (i.e. 
a factor just meeting the cutoff will be included and one just below will not).<ref name =Fabrigar/> Moreover, the method is very sensitive to sample size, with PA suggesting more factors in datasets with larger sample sizes.<ref>{{cite journal \| last1 = Warne \| first1 = R. G. \| last2 = Larsen \| first2 = R. \| year = 2014 \| title = Evaluating a proposed modification of the Guttman rule for determining the number of factors in an exploratory factor analysis \| journal = Psychological Test and Assessment Modeling \| volume = 56 \| pages = 104–123 }}</ref> Despite its shortcomings, this procedure performs very well in simulation studies and is one of Courtney's recommended procedures.<ref name="pareonline.net"/> PA has been [[Parallel_analysis#Implementation\|implemented]] in a number of commonly used statistics programs such as R and SPSS. ===Ruscio and Roche's comparison data=== Line 78: ==Factor rotation== Factor rotation is a commonly employed step in EFA, used to aid interpretation of factor matrices.<ref name="Browne2001">{{cite journal \|last1=Browne \|first1=Michael W. \|title=An Overview of Analytic Rotation in Exploratory Factor Analysis \|journal=Multivariate Behavioral Research \|date=January 2001 \|volume=36 \|issue=1 \|pages=111–150 \|doi=10.1207/S15327906MBR3601_05\|s2cid=9598774 }}</ref><ref name="Sass2010">{{cite journal \|last1=Sass \|first1=Daniel A. \|last2=Schmitt \|first2=Thomas A. \|title=A Comparative Investigation of Rotation Criteria Within Exploratory Factor Analysis \|journal=Multivariate Behavioral Research \|date=29 January 2010 \|volume=45 \|issue=1 \|pages=73–103 \|doi=10.1080/00273170903504810\|pmid=26789085 \|s2cid=6458980 }}</ref><ref name="Schmitt2011">{{cite journal \|last1=Schmitt \|first1=Thomas A. \|last2=Sass \|first2=Daniel A. 
\|title=Rotation Criteria and Hypothesis Testing for Exploratory Factor Analysis: Implications for Factor Pattern Loadings and Interfactor Correlations \|journal=Educational and Psychological Measurement \|date=February 2011 \|volume=71 \|issue=1 \|pages=95–113 \|doi=10.1177/0013164410387348\|s2cid=120709021 }}</ref> For any solution with two or more factors there are an infinite number of orientations of the factors that will explain the data equally well. Because there is no unique solution, a researcher must select a single solution from the infinite possibilities. The goal of factor rotation is to [[Rotation of axes\|rotate]] factors in multidimensional space to arrive at a solution with best simple structure. There are two main types of factor rotation: [[Orthogonality\|orthogonal]] and [[Angle#Types of angles\|oblique]] rotation. ===Orthogonal rotation=== Line 98: Common factor analysis software is capable of producing an unrotated solution. This refers to the result of a [[#Principal_axis_factoring_(PAF)\|principal axis factoring]] with no further rotation. The so-called unrotated solution is in fact an orthogonal rotation that maximizes the variance of the first factors. The unrotated solution tends to give a general factor with loadings for most of the variables. This may be useful if many variables are correlated with each other, as revealed by one or a few dominating [[eigenvalue\|eigenvalues]] on a [[scree plot]]. The usefulness of an unrotated solution was emphasized by a [[meta analysis]] of studies of cultural differences. This revealed that many published studies of cultural differences have given similar factor analysis results, but rotated differently. Factor rotation has obscured the similarity between the results of different studies and the existence of a strong general factor, while the unrotated solutions were much more similar.<ref name="Fog2020">{{cite journal\|last=Fog\|first=A. 
\|title=A Test of the Reproducibility of the Clustering of Cultural Variables \|journal=Cross-Cultural Research \|year=2020 \|volume=55 \|pages=29–57 \|doi=10.1177/1069397120956948\|s2cid=224909443 }}</ref><ref>{{Cite journal\|title=Examining Factors in 2015 TIMSS Australian Grade 4 Student Questionnaire Regarding Attitudes Towards Science Using Exploratory Factor Analysis (EFA)\|url=https://twasp.info/journal/gi93583P/examining-factors-in-2015-timss-australian-grade-4-student-questionnaire-regarding-attitudes-towards-science-using-exploratory-factor-analysis-efa\|journal=North American Academic Research\|volume=3}}</ref> ==Factor interpretation==

Exploratory factor analysis: Difference between revisions