Exploratory factor analysis

{{context|date=April 2012}}
In [[multivariate statistics]], '''exploratory factor analysis''' ('''EFA''') is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables.<ref name=Norris>{{cite journal|last=Norris|first=Megan|coauthors=Lecavalier, Luc|title=Evaluating the Use of Exploratory Factor Analysis in Developmental Disability Psychological Research|journal=Journal of Autism and Developmental Disorders|date=17 July 2009|volume=40|issue=1|pages=8–20|doi=10.1007/s10803-009-0816-2}}</ref> It is commonly used by researchers when developing a scale{{clarify|reason=undefined technical term|date=April 2012}} and serves to identify a set of [[Latent variable|latent constructs]] underlying a battery of measured variables.<ref name=Fabrigar>{{cite journal|last=Fabrigar|first=Leandre R.|coauthors=Wegener, Duane T., MacCallum, Robert C., Strahan, Erin J.|title=Evaluating the use of exploratory factor analysis in psychological research.|journal=Psychological Methods|date=1 January 1999|volume=4|issue=3|pages=272–299|doi=10.1037/1082-989X.4.3.272}}</ref> It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables.<ref name=Finch>Finch, J. F., & West, S. G. (1997). "The investigation of personality structure: Statistical models". ''Journal of Research in Personality'', 31 (4), 439–485.</ref> ''Measured variables'' are any one of several attributes of people that may be observed and measured. An example of a measured variable would be a single item on a scale. Researchers must carefully consider the number of measured variables to include in the analysis.<ref name=Fabrigar /> EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis; there should be at least three to five measured variables per factor.<ref>MacCallum, R. C. (1990). "The need for alternative measures of fit in covariance structure modeling". ''Multivariate Behavioral Research'', 25 (2), 157–162.</ref>
 
EFA is based on the common factor model. Within the common factor model, measured variables are expressed as a function of common factors, unique factors, and errors of measurement. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables.<ref name=Norris/>
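In one standard presentation of the common factor model (the notation here is generic rather than taken from a particular source), each measured variable is written as a weighted combination of the common factors plus its unique factor and measurement error:

:<math>x_j = \lambda_{j1} F_1 + \lambda_{j2} F_2 + \cdots + \lambda_{jm} F_m + u_j + e_j,</math>

where <math>x_j</math> is the ''j''th measured variable, <math>F_1, \ldots, F_m</math> are the common factors, <math>\lambda_{jk}</math> is the loading of variable ''j'' on factor ''k'', <math>u_j</math> is the unique factor for variable ''j'', and <math>e_j</math> is its error of measurement.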
 
An assumption of EFA is that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to [[confirmatory factor analysis]] (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
 
===Principal axis factoring (PAF)===
PAF is called “principal” axis factoring because the first factor accounts for as much common variance as possible, the second factor accounts for the next largest share of variance, and so on. PAF is a descriptive procedure, so it is best used when the focus is on the sample at hand and the results will not be generalized beyond that sample. One advantage of PAF is that it can be used when the assumption of normality has been violated.<ref name=FabrigarPetty/> Another advantage is that it is less likely than ML to produce improper solutions.<ref name=Finch /> A downside of PAF is that it provides a limited range of goodness-of-fit indexes compared to ML and does not allow for the computation of confidence intervals and significance tests.
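A minimal sketch of iterated principal axis factoring using only NumPy is shown below. The function name, the starting communality estimates (squared multiple correlations), and the convergence settings are illustrative assumptions rather than part of any single published procedure:

<syntaxhighlight lang="python">
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communality estimates: squared multiple correlations,
    # i.e. 1 - 1/diag(R^{-1}).
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, communalities)  # factor only the common variance
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
        vals, vecs = eigvals[order], eigvecs[:, order]
        loadings = vecs * np.sqrt(np.clip(vals, 0.0, None))
        new_communalities = np.sum(loadings ** 2, axis=1)
        if np.max(np.abs(new_communalities - communalities)) < tol:
            communalities = new_communalities
            break
        communalities = new_communalities
    return loadings, communalities
</syntaxhighlight>

Replacing the diagonal of the correlation matrix with communality estimates means that only common variance is factored, which is what distinguishes principal axis factoring from a principal component analysis of the full correlation matrix.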
 
==Selecting the appropriate number of factors==
 
===Kaiser criterion===
The Kaiser criterion is computed by finding the eigenvalues of the correlation matrix and counting how many of them are greater than 1; that count is the number of factors to include in the model. A disadvantage of this procedure is that the cut-off is quite arbitrary (e.g., a factor with an eigenvalue of 1.01 is retained whereas one with an eigenvalue of 0.99 is not). The procedure often leads to overfactoring and sometimes to underfactoring; therefore, it should not be used.<ref name=Fabrigar />
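The rule itself is simple to compute. A minimal NumPy sketch follows; the function name and the assumption that observations are in rows and variables in columns are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def kaiser_number_of_factors(data):
    """Count the eigenvalues of the correlation matrix that exceed 1."""
    R = np.corrcoef(data, rowvar=False)  # correlation matrix, variables in columns
    eigenvalues = np.linalg.eigvalsh(R)  # eigenvalues of a symmetric matrix
    return int(np.sum(eigenvalues > 1.0))
</syntaxhighlight>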
 
===Model comparison===
There are different methods to assess model fit:{{cn|date=April 2012}}
 
*'''Likelihood ratio statistic:'''<ref>Lawley, D. N. (1940). "The estimation of factor loadings by the method of maximum likelihood". ''Proceedings of the Royal Society of Edinburgh'', 60A, 64–82.</ref> Used to test the null hypothesis that a model has perfect fit (a common form of the statistic is sketched after this list). It should be applied to models with an increasing number of factors until the result is nonsignificant, indicating that the model is not rejected as fitting the population well. The statistic should be used with a large sample size and normally distributed data. The likelihood ratio test has several drawbacks. First, when the sample size is large, even small discrepancies between the model and the data lead to rejection of the model.<ref>Hakstian, A. R., Rogers, W. T., & Cattell, R. B. (1982). "The behavior of number-of-factors rules with simulated data". ''Multivariate Behavioral Research'', 17 (2), 193–219.</ref><ref name=Humphreys/><ref>{{cite journal|last=Harris|first=M. L.|coauthors=Harris, C. W.|title=A Factor Analytic Interpretation Strategy|journal=Educational and Psychological Measurement|date=1 October 1971|volume=31|issue=3|pages=589–606|doi=10.1177/001316447103100301}}</ref> When the sample size is small, even large discrepancies between the model and the data may not be significant, which leads to underfactoring.<ref name=Humphreys/> Another disadvantage of the likelihood ratio test is that the null hypothesis of perfect fit is an unrealistic standard.<ref>MacCallum, R. C. (1990). "The need for alternative measures of fit in covariance structure modeling". ''Multivariate Behavioral Research'', 25, 157–162.</ref><ref name=Browne>Browne, M. W., & Cudeck, R. (1992). "Alternative ways of assessing model fit". ''Sociological Methods and Research'', 21, 230–258.</ref><ref>Cudeck, R., & Henly, S. J. (1991). "Model selection in covariance structures analysis and the 'problem' of sample size: A clarification". ''Psychological Bulletin'', 109 (3), 512–519. {{doi|10.1037/0033-2909.109.3.512}}</ref>
 
*'''Root mean square error of approximation (RMSEA) fit index:''' RMSEA is an estimate of the discrepancy between the model and the data per degree of freedom for the model (one common formulation is given after this list). Values less than 0.05 constitute good fit, values between 0.05 and 0.08 constitute acceptable fit, values between 0.08 and 0.10 constitute marginal fit, and values greater than 0.10 indicate poor fit.<ref name=Browne/><ref>Steiger, J. H. (1989). ''EzPATH: A supplementary module for SYSTAT and SYGRAPH''. Evanston, IL: SYSTAT.</ref> An advantage of the RMSEA fit index is that it provides confidence intervals, which allow researchers to compare a series of models with varying numbers of factors.
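For reference, one common form of the likelihood ratio statistic for a maximum-likelihood factor model with ''m'' factors fitted to ''p'' measured variables (the notation is generic, and the Bartlett correction factor is omitted) is

:<math>T = (N - 1)\, F_{\min}, \qquad df = \tfrac{1}{2}\left[(p - m)^2 - (p + m)\right],</math>

where ''N'' is the sample size and <math>F_{\min}</math> is the minimized maximum-likelihood discrepancy function; ''T'' is referred to an approximate chi-squared distribution with ''df'' degrees of freedom.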
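One common formulation of RMSEA in terms of the model chi-square statistic (some programs use ''N'' rather than ''N'' − 1 in the denominator) is

:<math>\mathrm{RMSEA} = \sqrt{\max\!\left(\frac{\chi^2 - df}{df\,(N - 1)},\ 0\right)},</math>

where <math>\chi^2</math> is the model test statistic, ''df'' is its degrees of freedom, and ''N'' is the sample size.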