{{Short description|Statistical method in psychology}}
[[File:Exploratory Factor Analysis(EFA).png|thumb|Exploratory Factor Analysis Model]]
In [[multivariate statistics]], '''exploratory factor analysis''' ('''EFA''') is a statistical method used to uncover the underlying structure of a relatively large set of [[Variable (research)|variables]]. EFA is a technique within [[factor analysis]] whose overarching goal is to identify the underlying relationships between measured variables.<ref name=Norris>{{cite journal|last=Norris|first=Megan|author2=Lecavalier, Luc|title=Evaluating the Use of Exploratory Factor Analysis in Developmental Disability Psychological Research|journal=Journal of Autism and Developmental Disorders|date=17 July 2009|volume=40|issue=1|pages=8–20|doi=10.1007/s10803-009-0816-2|pmid=19609833|s2cid=45751299 }}</ref> It is commonly used by researchers when developing a scale (a ''scale'' is a collection of questions used to measure a particular research topic) and serves to identify a set of [[Latent variable|latent constructs]] underlying a battery of measured variables.<ref name=Fabrigar>{{cite journal|last=Fabrigar|first=Leandre R.|author2=Wegener, Duane T. |author3=MacCallum, Robert C. |author4=Strahan, Erin J. |title=Evaluating the use of exploratory factor analysis in psychological research.|journal=Psychological Methods|date=1 January 1999|volume=4|issue=3|pages=272–299|doi=10.1037/1082-989X.4.3.272|url=http://www.statpower.net/Content/312/Handout/Fabrigar1999.pdf}}</ref> It should be used when the researcher has no ''a priori'' hypothesis about factors or patterns of measured variables.<ref name=Finch>{{cite journal | last1 = Finch | first1 = J. F. | last2 = West | first2 = S. G. | year = 1997 | title = The investigation of personality structure: Statistical models | journal = Journal of Research in Personality | volume = 31 | issue = 4 | pages = 439–485 | doi = 10.1006/jrpe.1997.2194}}</ref> ''Measured variables'' are any one of several attributes of people that may be observed and measured. Examples of measured variables could be the physical height, weight, and pulse rate of a human being. Usually, researchers would have a large number of measured variables, which are assumed to be related to a smaller number of "unobserved" factors. Researchers must carefully consider the number of measured variables to include in the analysis.<ref name=Fabrigar/> EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.
EFA is based on the common factor model.<ref name =Norris/> In this model, manifest variables are expressed as a function of common factors, unique factors, and errors of measurement. Each unique factor influences only one manifest variable, and does not explain correlations between manifest variables. Common factors influence more than one manifest variable and "factor loadings" are measures of the influence of a common factor on a manifest variable.<ref name =Norris/> The EFA procedure is primarily concerned with identifying the common factors and the related manifest variables.
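In matrix form, the common factor model described above can be written as follows (a standard textbook formulation; the symbols are conventional notation rather than taken from a source cited in this article):

: <math>\mathbf{x} = \boldsymbol{\Lambda}\mathbf{f} + \boldsymbol{\varepsilon},</math>

where <math>\mathbf{x}</math> is the vector of ''p'' manifest variables, <math>\boldsymbol{\Lambda}</math> is the ''p''&nbsp;×&nbsp;''m'' matrix of factor loadings, <math>\mathbf{f}</math> is the vector of ''m'' common factors, and <math>\boldsymbol{\varepsilon}</math> collects the unique factors and errors of measurement. Under the usual assumptions, this implies the covariance structure <math>\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\mathsf{T}} + \boldsymbol{\Psi}</math>, where <math>\boldsymbol{\Phi}</math> is the factor correlation matrix and <math>\boldsymbol{\Psi}</math> is the diagonal matrix of unique variances.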
EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to [[confirmatory factor analysis]].
EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
==Fitting procedures==
Fitting procedures are used to estimate the factor loadings and unique variances of the model. Two of the most commonly used fitting procedures are described below.
===Maximum likelihood (ML)===
The maximum likelihood method has many advantages: it allows researchers to compute a wide range of indexes of the [[goodness of fit]] of the model, to test the [[statistical significance]] of factor loadings, to calculate correlations among factors, and to compute [[confidence interval]]s for these parameters.<ref>{{cite journal | last1 = Cudeck | first1 = R. | last2 = O'Dell | first2 = L. L. | year = 1994 | title = Applications of standard error estimates in unrestricted factor analysis: Significance tests for factor loadings and correlations | journal = Psychological Bulletin | volume = 115 | issue = 3 | pages = 475–487 | doi = 10.1037/0033-2909.115.3.475 }}</ref> The method does, however, assume multivariate normality of the data.
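As an illustrative sketch (not drawn from the cited sources), maximum likelihood extraction can be run with scikit-learn's <code>FactorAnalysis</code> estimator; the data below are simulated, and all variable names are hypothetical:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 300 respondents on 6 measured variables driven by 2 common
# factors plus unique noise (purely hypothetical data for illustration).
true_loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.0],
                          [0.0, 0.8], [0.1, 0.7], [0.0, 0.6]])
factors = rng.standard_normal((300, 2))
X = factors @ true_loadings.T + 0.5 * rng.standard_normal((300, 6))

# Fit a 2-factor model by maximum likelihood and inspect the estimates.
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
print(fa.components_.T)    # estimated loadings (variables x factors)
print(fa.noise_variance_)  # estimated unique variances
</syntaxhighlight>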
===Principal axis factoring (PAF)===
Principal axis factoring fits the common factor model using the correlation matrix with communality estimates placed on the diagonal. Factors are extracted so that the first factor accounts for as much common variance as possible, the second for the next largest amount, and so on. Unlike maximum likelihood, PAF does not require the assumption of multivariate normality.
==Selecting the appropriate number of factors==
{{More citations needed|date=June 2017}}
When selecting how many factors to include in a model, researchers must try to balance [[Occam's razor|parsimony]] (a model with relatively few factors) and plausibility (that there are enough factors to adequately account for correlations among measured variables).<ref>{{cite book|last=Fabrigar|first=Leandre R.|title=Exploratory factor analysis|publisher=Oxford University Press|___location=Oxford|isbn=978-0-19-973417-7|author2=Wegener, Duane T.|date=2012-01-12}}</ref>
''Overfactoring'' occurs when too many factors are included in a model; it reduces parsimony and can lead researchers to interpret minor or spurious factors as substantively meaningful.
''Underfactoring'' occurs when too few factors are included in a model; variance that belongs to omitted factors is then absorbed by the retained factors, distorting the loading estimates.
There are a number of procedures designed to determine the optimal number of factors to retain in EFA. These include Kaiser's (1960) eigenvalue-greater-than-one rule, Cattell's (1966) scree plot, Revelle and Rocklin's (1979) very simple structure criterion, model comparison techniques, the optimal coordinate and acceleration factor procedures, Velicer's (1976) minimum average partial test, Horn's (1965) parallel analysis, and Ruscio and Roche's (2012) comparison data procedure.<ref name="pareonline.net">{{cite web |url=http://pareonline.net/getvn.asp?v=18&n=8 |title=Determining the Number of Factors to Retain in EFA: Using the SPSS R-Menu v2.0 to Make More Judicious Estimations |last=Courtney |first=M. G. R. |year=2013 |work=Practical Assessment, Research and Evaluation}}</ref>
With the exception of Revelle and Rocklin's (1979) very simple structure criterion, model comparison techniques, and Velicer's (1976) minimum average partial, all other procedures rely on the analysis of eigenvalues. The ''eigenvalue'' of a factor represents the amount of variance of the variables accounted for by that factor. The lower the eigenvalue, the less that factor contributes to explaining the variance of the variables.<ref name =Norris/>
A short description of each of the nine procedures mentioned above is provided below.
{{anchor|Kaiser criterion}}
===Kaiser's (1960) eigenvalue-greater-than-one rule (K1 or Kaiser criterion)===
Compute the eigenvalues for the correlation matrix and determine how many of these eigenvalues are greater than 1. This number is the number of factors to include in the model. A disadvantage of this procedure is that it is quite arbitrary (e.g., an eigenvalue of 1.01 is included whereas an eigenvalue of .99 is not). This procedure often leads to overfactoring and sometimes underfactoring. Therefore, this procedure should not be used.<ref name =Fabrigar /> A variation of the K1 criterion has been created to lessen the severity of the criterion's problems, in which a researcher calculates [[confidence interval]]s for each eigenvalue and retains only factors whose entire confidence interval is greater than 1.0.<ref>{{cite journal | last1 = Larsen | first1 = R. | last2 = Warne | first2 = R. T. | year = 2010 | title = Estimating confidence intervals for eigenvalues in exploratory factor analysis | journal = Behavior Research Methods | volume = 42 | issue = 3 | pages = 871–876 | doi = 10.3758/BRM.42.3.871 }}</ref>
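A minimal NumPy sketch of the K1 rule is shown below (illustrative only; the function name is invented for the example):

<syntaxhighlight lang="python">
import numpy as np

def kaiser_criterion(X):
    """Count eigenvalues of the correlation matrix greater than 1.

    X is an (observations x variables) data matrix."""
    R = np.corrcoef(X, rowvar=False)     # sample correlation matrix
    eigenvalues = np.linalg.eigvalsh(R)  # eigenvalues in ascending order
    return int(np.sum(eigenvalues > 1.0))
</syntaxhighlight>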
===Cattell's (1966) scree plot===
{{Main|Scree plot}}
[[File:Scree Plot.png|thumb|SPSS output of Scree Plot]]
Compute the eigenvalues for the correlation matrix and plot the values from largest to smallest. Examine the graph to determine the last substantial drop in the magnitude of eigenvalues. The number of plotted points before the last drop is the number of factors to include in the model.<ref name="Cattell, R. B. 1966"/> This method has been criticized because of its subjective nature (i.e., there is no clear objective definition of what constitutes a substantial drop).<ref>{{cite journal | last1 = Kaiser | first1 = H. F. | year = 1970 | title = A second generation little jiffy | journal = Psychometrika | volume = 35 | issue = 4 | pages = 401–415 | doi = 10.1007/BF02291817 }}</ref>
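The same eigenvalues can be plotted directly; the sketch below (assuming NumPy and matplotlib) produces a basic scree plot for visual inspection of the "elbow":

<syntaxhighlight lang="python">
import matplotlib.pyplot as plt
import numpy as np

def scree_plot(X):
    """Plot correlation-matrix eigenvalues from largest to smallest."""
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
    plt.plot(np.arange(1, len(eigenvalues) + 1), eigenvalues, "o-")
    plt.xlabel("Factor number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()
</syntaxhighlight>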
===Revelle and Rocklin (1979) very simple structure===
Revelle and Rocklin's very simple structure (VSS) criterion compares how well the original correlation matrix is reproduced by a simplified version of the factor loading matrix in which only each item's largest loading (or largest few loadings) is retained; the number of factors that maximizes this VSS index is retained.

===Model comparison===
Under this approach, a series of models with increasing numbers of factors is fit and compared, and the most parsimonious model that adequately accounts for the correlations among measured variables is chosen.
There are different methods that can be used to assess model fit:<ref name =Fabrigar/>
*'''Likelihood ratio statistic:'''<ref>Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60A, 64–82.</ref> Used to test the null hypothesis that a model has perfect model fit. It should be applied to models with an increasing number of factors until the result is nonsignificant, indicating that the model cannot be rejected as fitting well in the population. This statistic should be used with a large sample size and normally distributed data. There are some drawbacks to the likelihood ratio test. First, when there is a large sample size, even small discrepancies between the model and the data result in model rejection.<ref name =Humphreys/><ref>{{cite journal | last1 = Hakstian | first1 = A. R. | last2 = Rogers | first2 = W. T. | last3 = Cattell | first3 = R. B. | year = 1982 | title = The behavior of number-of-factors rules with simulated data | journal = Multivariate Behavioral Research | volume = 17 | issue = 2 | pages = 193–219 }}</ref> Second, when there is a small sample size, even large discrepancies between the model and the data may not be significant, which leads to underfactoring.
*'''Root mean square error of approximation (RMSEA) fit index:''' RMSEA is an estimate of the discrepancy between the model and the data per degree of freedom for the model. Values less than 0.05 constitute good fit, values between 0.05 and 0.08 constitute acceptable fit, values between 0.08 and 0.10 constitute marginal fit, and values greater than 0.10 indicate poor fit (see the formulas after this list).<ref name =Browne/><ref>Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: SYSTAT.</ref> An advantage of the RMSEA fit index is that it provides confidence intervals which allow researchers to compare a series of models with varying numbers of factors.
*'''Information criteria:''' Information criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC)<ref>Neath, A. A., & Cavanaugh, J. E. (2012). The Bayesian information criterion: Background, derivation, and applications. Wiley Interdisciplinary Reviews: Computational Statistics, 4(2), 199–203.</ref> can be used to trade off model fit against model complexity and select an optimal number of factors (see the formulas after this list).
*'''Out-of-sample Prediction Errors (PE):''' Using the connection between model-implied covariance matrices and standardized regression weights, the number of factors can be selected using out-of-sample prediction errors.<ref name=":0" /> In other words, the PE approach tests the ability of a factor model with ''k'' factors to predict scores on ''p'' items in held-out respondents, using the model-implied covariance structure to derive item-level regressions (e.g., predicting item ''i'' as a linear combination of all other items, with coefficients given by the inverse covariance matrix), selecting the value of ''k'' that best predicts out-of-sample item scores. In an extensive 2022 simulation study, Haslbeck and van Bork<ref name=":0" /> found that the PE method compares favorably with the best-performing existing methods (e.g., parallel analysis, exploratory graph analysis, AIC).
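For reference, the RMSEA and the information criteria mentioned above have simple closed forms; the expressions below are standard textbook formulations rather than formulas reproduced from the cited papers. Here <math>\chi^2</math> is the model chi-square with <math>df</math> degrees of freedom, <math>N</math> is the sample size, <math>\hat{L}</math> is the maximized likelihood, and <math>q</math> is the number of free parameters:

: <math>\text{RMSEA} = \sqrt{\max\!\left(\frac{\chi^{2} - df}{df\,(N-1)},\; 0\right)}</math>
: <math>\text{AIC} = 2q - 2\ln\hat{L}, \qquad \text{BIC} = q\ln N - 2\ln\hat{L}</math>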
===Optimal coordinate and acceleration factor===
Both indices were developed to make Cattell's scree test less subjective. The optimal coordinate (OC) estimates the ___location of the scree by comparing each observed eigenvalue with a value extrapolated from the surrounding eigenvalues, while the acceleration factor (AF) identifies the point where the slope of the eigenvalue curve changes most abruptly (the "elbow").
===Velicer's minimum average partial test (MAP)===
Velicer's (1976) MAP test<ref name=Velicer/> “involves a complete principal components analysis followed by the examination of a series of matrices of partial correlations” (p. 397). The squared correlation for Step “0” is the average squared off-diagonal correlation for the unpartialed correlation matrix. On Step 1, the first principal component and its associated items are partialed out. Thereafter, the average squared off-diagonal correlation for the subsequent correlation matrix is computed for Step 1. On Step 2, the first two principal components are partialed out and the resultant average squared off-diagonal correlation is again computed. The computations are carried out for ''k'' minus one steps (''k'' representing the total number of variables in the matrix). Finally, the average squared correlations for all steps are lined up, and the step number that resulted in the lowest average squared partial correlation determines the number of components or factors to retain (Velicer, 1976). By this method, components are retained as long as the variance in the correlation matrix represents systematic variance, as opposed to residual or error variance. Although methodologically akin to principal components analysis, the MAP technique has been shown to perform quite well in determining the number of factors to retain in multiple simulation studies.<ref name =Ruscio/><ref name=Garrido>Garrido, L. E., Abad, F. J., & Ponsoda, V. (2012). A new look at Horn's parallel analysis with ordinal variables. Psychological Methods. Advance online publication. {{doi|10.1037/a0030005}}</ref>
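The following NumPy sketch follows the verbal description of the MAP procedure above; it is an illustration under the assumption of a well-conditioned correlation matrix, not a reference implementation:

<syntaxhighlight lang="python">
import numpy as np

def velicer_map(R):
    """Velicer's MAP test on a (k x k) correlation matrix R.

    Returns the step (number of components) with the lowest average
    squared off-diagonal partial correlation."""
    k = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                 # sort descending
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    off_diag = ~np.eye(k, dtype=bool)
    avg_sq = [np.mean(R[off_diag] ** 2)]              # step 0: unpartialed R
    for m in range(1, k - 1):                         # k - 1 steps in total
        residual = R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.diag(residual))
        partial = residual / np.outer(d, d)           # rescale to correlations
        avg_sq.append(np.mean(partial[off_diag] ** 2))
    return int(np.argmin(avg_sq))                     # components to retain
</syntaxhighlight>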
===Parallel analysis===
{{Main|Parallel analysis}}
To carry out parallel analysis, users compute the eigenvalues for the correlation matrix and plot the values from largest to smallest, and then plot a set of eigenvalues obtained from random data of the same dimensions. The number of eigenvalues before the intersection point indicates how many factors to include in the model.<ref name=Humphreys>{{cite journal | last1 = Humphreys | first1 = L. G. | last2 = Montanelli | first2 = R. G. Jr | year = 1975 | title = An investigation of the parallel analysis criterion for determining the number of common factors | journal = Multivariate Behavioral Research | volume = 10 | issue = 2 | pages = 193–205 }}</ref>
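A minimal NumPy sketch of this comparison is given below (illustrative; it uses the mean of the random eigenvalues, whereas some implementations use the 95th percentile):

<syntaxhighlight lang="python">
import numpy as np

def parallel_analysis(X, n_sims=100, seed=0):
    """Horn's parallel analysis for an (observations x variables) matrix X.

    Retains factors up to the first point at which the observed
    eigenvalues fall below the mean eigenvalues from random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        Z = rng.standard_normal((n, p))
        random_eigs[i] = np.sort(
            np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    below = observed <= random_eigs.mean(axis=0)   # where the curves cross
    return int(np.argmax(below)) if below.any() else p
</syntaxhighlight>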
===Ruscio and Roche's comparison data===
In 2012 Ruscio and Roche<ref name =Ruscio/> introduced the comparison data (CD) procedure in an attempt to improve upon the PA method. The authors state that "rather than generating random datasets, which only take into account sampling error, multiple datasets with known factorial structures are analyzed to determine which best reproduces the profile of eigenvalues for the actual data" (p. 258). The strength of the procedure is its ability to incorporate not only sampling error, but also the factorial structure and multivariate distribution of the items. Ruscio and Roche's (2012) simulation study<ref name =Ruscio/> determined that the CD procedure outperformed many other methods aimed at determining the correct number of factors to retain. In that study, the CD technique, making use of Pearson correlations, accurately predicted the correct number of factors 87.14% of the time. However, the simulation study never involved more than five factors. Therefore, the applicability of the CD procedure to estimate factorial structures beyond five factors is yet to be tested. Courtney includes this procedure in his recommended list and gives guidelines showing how it can be easily carried out from within SPSS's user interface.<ref name="pareonline.net"/>
In 2023, Goretzko and Ruscio proposed the Comparison Data Forest as an extension of the CD approach.<ref>{{Cite journal |last1=Goretzko |first1=David |last2=Ruscio |first2=John |date=2023-06-15 |title=The comparison data forest: A new comparison data approach to determine the number of factors in exploratory factor analysis |journal=Behavior Research Methods |volume=56 |issue=3 |pages=1838–1851 |doi=10.3758/s13428-023-02122-4 |issn=1554-3528 |pmc=10991039 |pmid=37382813}}</ref>
===Convergence of multiple tests===
Given the varying strengths and weaknesses of these procedures, it is often recommended that researchers apply several of them and retain the number of factors on which the results converge.
==Factor rotation==
Factor rotation is a commonly employed step in EFA used to aid the interpretation of factor solutions. For any solution with two or more factors there are infinitely many orientations of the factors that explain the data equally well, so a researcher must select a single solution from among these possibilities. The goal of rotation is ''simple structure'', which Thurstone characterized by the following criteria (for a solution with ''m'' factors):
# Each row (denoting the loadings of a single item on all ''m'' factors) contains at least one zero
# Each column (denoting the loadings of all items on a single factor) contains at least ''m'' zeros
# All pairs of columns (i.e., factors) have several rows (i.e., items) with a zero loading in one column but not the other (i.e., all pairs of factors have several items that can differentiate the factors)
# If ''m'' ≥ 4, all pairs of columns should have several rows with zeros in both columns
# All pairs of columns should have few rows with non-zero loadings in both columns (i.e., there should be few items with cross-loadings)
There are two main types of factor rotation: [[Orthogonality|orthogonal]] and [[Angle#Types of angles|oblique]] rotation.
===Orthogonal rotation===
Orthogonal rotations constrain factors to be perpendicular to each other and hence uncorrelated.
[[Varimax rotation]] is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is the most common orthogonal rotation option.<ref name =Fabrigar/>
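The varimax criterion can be maximized with a short SVD-based iteration; the sketch below is a common textbook formulation in NumPy, offered as an illustration rather than the algorithm of any particular statistics package:

<syntaxhighlight lang="python">
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (variables x factors) loading matrix.

    gamma = 1 gives the varimax criterion; gamma = 0 gives quartimax."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD of the criterion gradient yields the next rotation matrix.
        grad = loadings.T @ (rotated ** 3 - (gamma / p) * rotated
                             @ np.diag(np.sum(rotated ** 2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break                                 # converged
        criterion = s.sum()
    return loadings @ rotation
</syntaxhighlight>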
Quartimax rotation is an orthogonal rotation that maximizes the squared loadings for each variable rather than each factor. This minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium degree.<ref name=Neuhaus>{{cite journal|last=Neuhaus|first=Jack O|author2=Wrigley, C.|title=The Quartimax Method|journal=British Journal of Statistical Psychology|date=1954|volume=7|issue=2|pages=81–91|doi=10.1111/j.2044-8317.1954.tb00147.x}}</ref>
Equimax rotation is a compromise between varimax and quartimax criteria.
===Oblique rotation===
Oblique rotations permit correlations among factors. If the factors truly are uncorrelated, an oblique rotation produces a solution close to that of an orthogonal rotation; because latent constructs are rarely uncorrelated in practice, oblique rotation is often recommended as the more general choice.<ref name =Fabrigar/>
Several oblique rotation procedures are commonly used.
Direct oblimin rotation is the standard oblique rotation method. Promax rotation is often seen in older literature because it is easier to calculate than oblimin. Other oblique methods include direct quartimin rotation and Harris-Kaiser orthoblique rotation.<ref name =Fabrigar/>
===Unrotated solution===
Common factor analysis software is capable of producing an unrotated solution. This refers to the result of a [[#Principal_axis_factoring_(PAF)|principal axis factoring]] with no further rotation. The so-called unrotated solution is in fact an orthogonal rotation that maximizes the variance of the first factors. The unrotated solution tends to give a general factor with loadings for most of the variables. This may be useful if many variables are correlated with each other, as revealed by one or a few dominating [[eigenvalue]]s on a [[scree plot]].
The usefulness of an unrotated solution was emphasized by a [[meta analysis]] of studies of cultural differences. This revealed that many published studies of cultural differences have given similar factor analysis results, but rotated differently. Factor rotation has obscured the similarity between the results of different studies and the existence of a strong general factor, while the unrotated solutions were much more similar.<ref name="Fog2020">{{cite journal|last=Fog|first=A. |title=A Test of the Reproducibility of the Clustering of Cultural Variables |journal=Cross-Cultural Research |year=2020 |volume=55 |pages=29–57 |doi=10.1177/1069397120956948|s2cid=224909443 }}</ref><ref>{{Cite journal|title=Examining Factors in 2015 TIMSS Australian Grade 4 Student Questionnaire Regarding Attitudes Towards Science Using Exploratory Factor Analysis (EFA)|url=https://twasp.info/journal/gi93583P/examining-factors-in-2015-timss-australian-grade-4-student-questionnaire-regarding-attitudes-towards-science-using-exploratory-factor-analysis-efa|journal=North American Academic Research|volume=3}}</ref>
==Factor interpretation==
Factor loadings are numerical values that indicate the strength and direction of a factor's influence on a measured variable. To label the factors in the model, researchers should examine the factor pattern to see which items load highly on which factors and then determine what those items have in common.<ref name =Fabrigar/> Whatever the items have in common will indicate the meaning of the factor. Interpretation has long been noted as an important, but difficult, part of the analytic process.<ref>{{Cite journal |last=Copeland |first=Herman A. |date=March 1935 |title=A note on "The Vectors of Mind." |url=https://doi.apa.org/doi/10.1037/h0057026 |journal=Psychological Review |volume=42 |issue=2 |pages=216–218 |doi=10.1037/h0057026 |issn=1939-1471|url-access=subscription }}</ref>
However, while exploratory factor analysis is a powerful tool for uncovering underlying structures among variables, it is crucial to avoid reliance on it without adequate theorizing. Armstrong's<ref>{{cite journal |last1=Armstrong |first1=J. Scott |title=Derivation of Theory by Means of Factor Analysis or Tom Swift and His Electric Factor Analysis Machine |journal=The American Statistician |date=December 1967 |volume=21 |issue=5 |pages=17–21 |doi=10.1080/00031305.1967.10479849|hdl=1721.1/47256 |hdl-access=free }}</ref> critique highlights that EFA, when conducted without a theoretical framework, can lead to misleading interpretations. For instance, in a hypothetical case study involving the analysis of various physical properties of metals, the results of EFA failed to identify the true underlying factors, instead producing an "over-factored" model that obscured the simplicity of the relationships amongst the observed variables. Similarly, poorly designed survey items can lead to spurious factor structures.<ref>{{cite journal |last1=Maul |first1=Andrew |title=Rethinking Traditional Methods of Survey Validation |journal=Measurement: Interdisciplinary Research and Perspectives |date=3 April 2017 |volume=15 |issue=2 |pages=51–69 |doi=10.1080/15366367.2017.1348108}}</ref>
==See also==
*[[Confirmatory factor analysis]]
*[[Factor analysis#Exploratory factor analysis (EFA) versus principal components analysis (PCA)|Exploratory factor analysis versus principal components analysis]]
*[[v:Exploratory factor analysis|Exploratory factor analysis]] (Wikiversity)
*[[Factor analysis]]
==External links==
*[http://pareonline.net/pdf/v10n7.pdf Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis]
*[http://en.wikiversity.org/wiki/Exploratory_factor_analysis Wikiversity: Exploratory Factor Analysis]
*[http://inis.jinr.ru/sl/M_Mathematics/MV_Probability/MVas_Applied%20statistics/Tucker%20L.R.,%20MacCallum%20R.C.%20Exploratory%20factor%20analysis%20(1997)(459s).pdf Tucker and MacCallum: Exploratory Factor Analysis] (PDF)
[[Category:Factor analysis]]