In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables.[1] It is commonly used by researchers when developing a scale (a set of items used to measure a particular construct) and serves to identify a set of latent constructs underlying a battery of measured variables.[2] It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables.[3] Measured variables are any one of several attributes of people that may be observed and measured; an example of a measured variable would be one item on a scale. Researchers must carefully consider the number of measured variables to include in the analysis.[2] EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis; there should be at least three to five measured variables per factor.[4]
EFA is based on the common factor model. Within the common factor model, measured variables are expressed as a function of common factors, unique factors, and errors of measurement. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables.[1]
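The common factor model can be sketched with simulated data. The loading values and sample size below are hypothetical illustration choices, not values from the literature:

```python
import numpy as np

# A minimal sketch of the common factor model: each measured variable is a
# weighted sum of one common factor plus a unique factor (including error).
# All loading values here are hypothetical illustration choices.
rng = np.random.default_rng(0)
n = 1000                               # number of observations
loadings = np.array([0.8, 0.7, 0.6])   # influence of the common factor

common = rng.normal(size=n)            # one common factor, shared by all items
unique = rng.normal(size=(n, 3)) * np.sqrt(1 - loadings**2)  # unique factors

X = common[:, None] * loadings + unique

# Variables that share a common factor should correlate with each other
R = np.corrcoef(X, rowvar=False)
print(np.round(R, 2))
```

Because all three simulated variables share the common factor, the off-diagonal correlations are substantial even though the unique factors are independent.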
An assumption of EFA is that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
Fitting procedures
Fitting procedures are used to estimate the factor loadings and unique variances of the model (factor loadings are the regression coefficients between items and factors and measure the influence of a common factor on a measured variable). There are several factor analysis fitting methods to choose from; however, there is little information on their relative strengths and weaknesses, and many do not even have an exact name that is used consistently. Principal axis factoring (PAF) and maximum likelihood (ML) are two extraction methods that are generally recommended. In general, ML or PAF gives the best results, depending on whether the data are normally distributed or the assumption of normality has been violated.[2]
Maximum likelihood (ML)
The maximum likelihood method has many advantages: it allows researchers to compute a wide range of indexes of the goodness of fit of the model, to test the statistical significance of factor loadings, to calculate correlations among factors, and to compute confidence intervals for these parameters.[5] ML is the best choice when data are normally distributed because it "allows for the computation of a wide range of indexes of the goodness of fit of the model [and] permits statistical significance testing of factor loadings and correlations among factors and the computation of confidence intervals".[6] ML should not be used if the data are not normally distributed.
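As an illustration, scikit-learn's FactorAnalysis estimator fits the common factor model by maximum likelihood; the simulated data and loading matrix below are hypothetical:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Hypothetical data: two factors, each driving three of six variables
F = rng.normal(size=(500, 2))
L = np.array([[0.8, 0.7, 0.6, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.8, 0.7, 0.6]])
X = F @ L + rng.normal(scale=0.5, size=(500, 6))

fa = FactorAnalysis(n_components=2).fit(X)   # fitted by maximum likelihood

print(np.round(fa.components_, 2))      # estimated factor loadings
print(np.round(fa.noise_variance_, 2))  # estimated unique variances
print(fa.score(X))                      # average log-likelihood of the data
```

The average log-likelihood reported by `score` is one of the fit indexes that ML estimation makes available; PAF provides no such likelihood-based quantity.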
Principal axis factoring (PAF)
This method is called "principal" axis factoring because the first factor accounts for as much common variance as possible, the second factor for the next most variance, and so on. PAF is a descriptive procedure, so it is best used when the focus is just on the sample at hand and there is no intention to generalize the results beyond that sample. An advantage of PAF is that it can be used when the assumption of normality has been violated.[6] Another advantage of PAF is that it is less likely than ML to produce improper solutions.[3] A downside of PAF is that it provides a limited range of goodness-of-fit indexes compared to ML and does not allow for the computation of confidence intervals and significance tests.
Selecting the appropriate number of factors
When selecting how many factors to include in a model, researchers must try to balance parsimony (a model with relatively few factors) and plausibility (a model with enough factors to adequately account for correlations among measured variables).[7] It is better to include too many factors (overfactoring) than too few (underfactoring).
Overfactoring occurs when too many factors are included in a model. It is not as bad as underfactoring because major factors will usually be accurately represented and extra factors will have no measured variables load onto them. Still, it should be avoided because overfactoring may lead researchers to put forward constructs with little theoretical value.
Underfactoring occurs when too few factors are included in a model. This is considered a greater error than overfactoring. If not enough factors are included, there is likely to be substantial error: measured variables that load onto a factor not included in the model may falsely load on factors that are included, altering the true factor loadings. This can result in rotated solutions in which two factors are combined into a single factor, obscuring the true factor structure.
There are a number of procedures for determining the best number of factors, including the scree plot, parallel analysis, the Kaiser criterion, and model comparison. The first three measures rely on eigenvalues. The eigenvalue of a factor represents the amount of variance of the variables accounted for by that factor. The lower the eigenvalue, the less that factor contributes to the explanation of variances in the variables.[1]
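The eigenvalue computation these procedures share can be sketched with NumPy; the data below are hypothetical, with a common influence built into the first three variables:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 300 observations on six measured variables, where the
# first three variables share a common influence
X = rng.normal(size=(300, 6))
X[:, :3] += rng.normal(size=(300, 1))

R = np.corrcoef(X, rowvar=False)           # correlation matrix
eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted from largest to smallest
print(np.round(eigenvalues, 2))

# The eigenvalues sum to the number of variables; each eigenvalue is the
# amount of (standardized) variance accounted for by that factor
```

Here the first eigenvalue is clearly larger than the rest, reflecting the single common influence built into the simulated data.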
Scree plot
Compute the eigenvalues for the correlation matrix and plot the values from largest to smallest. Examine the graph to determine the last substantial drop in the magnitude of the eigenvalues. The number of plotted points before the last drop is the number of factors to include in the model.[8] This method has been criticized because of its subjective nature (i.e., there is no clear objective definition of what constitutes a substantial drop).[9]
Parallel analysis
Compute the eigenvalues for the correlation matrix and plot the values from largest to smallest, then plot a set of eigenvalues obtained from random data of the same size. The number of observed eigenvalues before the intersection point of the two curves indicates how many factors to include in the model.[10][11][12] This procedure can be somewhat arbitrary (i.e., a factor just meeting the cutoff will be included and one just below will not).
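Parallel analysis can be sketched as follows. This sketch uses the common variant that retains factors whose observed eigenvalues exceed the mean eigenvalues of random data; the function name and simulated data are illustrative:

```python
import numpy as np

def parallel_analysis(X, n_sims=100, seed=0):
    """Retain factors whose observed eigenvalue exceeds the mean eigenvalue
    of random normal data of the same size (one common variant)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        Z = rng.normal(size=(n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    threshold = random_eigs.mean(axis=0)   # mean random eigenvalue, per rank
    return int(np.sum(observed > threshold))

# Example with simulated two-factor data (hypothetical loadings)
rng = np.random.default_rng(4)
F = rng.normal(size=(500, 2))
L = np.array([[0.8, 0.8, 0.8, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.8, 0.8, 0.8]])
X = F @ L + rng.normal(scale=0.5, size=(500, 6))
print(parallel_analysis(X))
```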
Kaiser criterion
Compute the eigenvalues for the correlation matrix and determine how many of these eigenvalues are greater than 1. This number is the number of factors to include in the model. A disadvantage of this procedure is that it is quite arbitrary (e.g., an eigenvalue of 1.01 is included whereas an eigenvalue of 0.99 is not). This procedure often leads to overfactoring and sometimes underfactoring; therefore, it should not be used.[2]
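The rule itself is simple to state in code, shown here only to illustrate the cutoff given the caveats above (the function name is illustrative):

```python
import numpy as np

def kaiser_criterion(R):
    """Number of eigenvalues of the correlation matrix R greater than 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

# Three variables correlating 0.6 with each other: the eigenvalues are
# 2.2, 0.4, and 0.4, so only one factor is retained
R = np.array([[1.0, 0.6, 0.6],
              [0.6, 1.0, 0.6],
              [0.6, 0.6, 1.0]])
print(kaiser_criterion(R))
```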
Model comparison
Choose the best model from a series of models that differ in complexity. Researchers use goodness-of-fit measures, beginning with a model with zero factors and gradually increasing the number of factors. The goal is ultimately to choose a model that explains the data significantly better than simpler models (with fewer factors) and explains the data as well as more complex models (with more factors).
There are different methods to assess model fit:
- Likelihood ratio statistic:[13] Used to test the null hypothesis that a model has perfect fit. It should be applied to models with an increasing number of factors until the result is nonsignificant, indicating that the model is not rejected as a good fit to the population. This statistic should be used with a large sample size and normally distributed data. There are some drawbacks to the likelihood ratio test. First, when there is a large sample size, even small discrepancies between the model and the data result in model rejection.[14][12][15] When there is a small sample size, even large discrepancies between the model and the data may not be significant, which leads to underfactoring.[12] Another disadvantage of the likelihood ratio test is that the null hypothesis of perfect fit is an unrealistic standard.[16][17][18]
- Root mean square error of approximation (RMSEA) fit index: RMSEA is an estimate of the discrepancy between the model and the data per degree of freedom for the model. Values less than 0.05 constitute good fit, values between 0.05 and 0.08 acceptable fit, values between 0.08 and 0.10 marginal fit, and values greater than 0.10 poor fit.[17][19] An advantage of the RMSEA fit index is that it provides confidence intervals, which allow researchers to compare a series of models with varying numbers of factors.
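The RMSEA cutoffs above can be expressed as a small helper (the function name is illustrative):

```python
def rmsea_fit(rmsea):
    """Classify an RMSEA value using the conventional cutoffs."""
    if rmsea < 0.05:
        return "good"
    if rmsea <= 0.08:
        return "acceptable"
    if rmsea <= 0.10:
        return "marginal"
    return "poor"

for value in (0.03, 0.06, 0.09, 0.12):
    print(value, rmsea_fit(value))
```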
Factor rotation
Factor rotation is the process used to interpret factor matrices. For any solution with two or more factors, there are an infinite number of orientations of the factors that explain the data equally well. Because there is no unique solution, a researcher must select a single solution from the infinite possibilities. The goal of factor rotation is to rotate factors in multidimensional space to arrive at a solution with the best simple structure. There are two main types of factor rotation: orthogonal and oblique.
Orthogonal rotation
Orthogonal rotations constrain factors to be uncorrelated. Varimax is considered the best orthogonal rotation and consequently is used most often in psychology research. An advantage of orthogonal rotation is its simplicity and conceptual clarity, although it has several disadvantages. In the social sciences there is often a theoretical basis for expecting constructs to be correlated, so orthogonal rotations may not be very realistic because they ignore this possibility. Also, because orthogonal rotations require factors to be oriented at 90° angles from one another (because the factors are uncorrelated), they are more likely to produce solutions with poor simple structure.
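As a sketch, scikit-learn's FactorAnalysis supports a varimax rotation (available in scikit-learn 0.24 and later); the simulated two-factor data below are hypothetical:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)

# Hypothetical two-factor data: each factor drives three of six variables
F = rng.normal(size=(400, 2))
L = np.array([[0.8, 0.8, 0.8, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.8, 0.8, 0.8]])
X = F @ L + rng.normal(scale=0.5, size=(400, 6))

# Varimax keeps the factors orthogonal while maximizing the variance of the
# squared loadings, pushing each variable toward a single dominant factor
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print(np.round(fa.components_, 2))
```

After rotation, each variable should load strongly on one factor and near zero on the other, which is the simple structure the text describes.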
Oblique rotation
Oblique rotations permit correlations among factors, though they do not require that factors be correlated. If the factors turn out not to be correlated, an oblique rotation will produce correlation estimates close to zero and a solution similar to that of an orthogonal rotation. Several oblique rotation procedures are commonly used, such as direct quartimin rotation, promax rotation, and the Harris-Kaiser orthoblique rotation. An advantage of oblique rotation is that it produces solutions with better simple structure because it allows factors to be oriented at different angles. Another advantage is that it produces estimates of the correlations among factors.
Factor interpretation
Factor loadings are numerical values that indicate the strength and direction of a factor's influence on a measured variable. To label the factors in the model, researchers should examine the factor pattern to see which items load highly on which factors and then determine what those items have in common. Whatever the items have in common will indicate the meaning of the factor.
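This process can be sketched with a hypothetical rotated loading matrix; the item names, loading values, and the 0.40 cutoff are all invented for illustration:

```python
import numpy as np

# Hypothetical rotated factor loadings for six items on two factors
items = ["anxious", "worried", "tense", "cheerful", "lively", "upbeat"]
loadings = np.array([[0.78, 0.05],
                     [0.71, 0.10],
                     [0.66, 0.02],
                     [0.04, 0.74],
                     [0.08, 0.69],
                     [0.01, 0.63]])

# Group items by the factor they load highly on (|loading| >= 0.40 here)
for f in range(loadings.shape[1]):
    high = [item for item, row in zip(items, loadings) if abs(row[f]) >= 0.40]
    print(f"Factor {f + 1}: {', '.join(high)}")
```

In this invented example, the first factor's items all describe negative affect and the second factor's items positive affect, which is the kind of shared meaning a researcher would use to label the factors.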
References
- ^ a b c Norris, M. (2009). "Evaluating the use of exploratory factor analysis in developmental disability psychological research". Journal of Autism and Developmental Disorders, 40(1), 8–20. doi:10.1007/s10803-009-0816-2.
- ^ a b c d Fabrigar, L. R. (1999). "Evaluating the use of exploratory factor analysis in psychological research". Psychological Methods, 4(3), 272–299. doi:10.1037/1082-989X.4.3.272.
- ^ a b Finch, J. F., & West, S. G. (1997). "The investigation of personality structure: Statistical models". Journal of Research in Personality, 31(4), 439–485.
- ^ MacCallum, R. C. (1990). "The need for alternative measures of fit in covariance structure modeling". Multivariate Behavioral Research, 25(2), 157–162.
- ^ Cudeck, R., & O'Dell, L. L. (1994). "Applications of standard error estimates in unrestricted factor analysis: Significance tests for factor loadings and correlations". Psychological Bulletin, 115, 475–487. doi:10.1037/0033-2909.115.3.475.
- ^ a b Fabrigar, L. R., & Petty, R. E. (1999). "The role of the affective and cognitive bases of attitudes in susceptibility to affectively and cognitively based persuasion". Personality and Social Psychology Bulletin, 25, 91–109.
- ^ Fabrigar, L. R. Exploratory Factor Analysis. Oxford: Oxford University Press. ISBN 9780199734177.
- ^ Cattell, R. B. (1966). "The scree test for the number of factors". Multivariate Behavioral Research, 1, 245–276.
- ^ Kaiser, H. F. (1970). "A second generation little jiffy". Psychometrika, 35, 401–415.
- ^ Horn, J. L. (1965). "A rationale and test for the number of factors in factor analysis". Psychometrika, 30(2), 179–185. doi:10.1007/BF02289447.
- ^ Humphreys, L. G., & Ilgen, D. R. (1969). "Note on a criterion for the number of common factors". Educational and Psychological Measurement, 29(3), 571–578. doi:10.1177/001316446902900303.
- ^ a b c Humphreys, L. G., & Montanelli, R. G., Jr. (1975). "An investigation of the parallel analysis criterion for determining the number of common factors". Multivariate Behavioral Research, 10(2), 193–205.
- ^ Lawley, D. N. (1940). "The estimation of factor loadings by the method of maximum likelihood". Proceedings of the Royal Society of Edinburgh, 60A, 64–82.
- ^ Hakstian, A. R., Rogers, W. T., & Cattell, R. B. (1982). "The behavior of number-of-factors rules with simulated data". Multivariate Behavioral Research, 17(2), 193–219.
- ^ Harris, M. L. (1971). "A factor analytic interpretation strategy". Educational and Psychological Measurement, 31(3), 589–606. doi:10.1177/001316447103100301.
- ^ MacCallum, R. C. (1990). "The need for alternative measures of fit in covariance structure modeling". Multivariate Behavioral Research, 25, 157–162.
- ^ a b Browne, M. W., & Cudeck, R. (1992). "Alternative ways of assessing model fit". Sociological Methods and Research, 21, 230–258.
- ^ Cudeck, R., & Henly, S. J. (1991). "Model selection in covariance structures analysis and the 'problem' of sample size: A clarification". Psychological Bulletin, 109(3), 512–519. doi:10.1037/0033-2909.109.3.512.
- ^ Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: SYSTAT.