Structural equation modeling: Difference between revisions

Altered template type. Add: isbn, date, title, authors 1-2. | Use this tool. Report bugs. | #UCB_Gadget
m ce
Line 17:
== History ==
 
Structural equation modeling (SEM) began differentiating itself from correlation and regression when [[Sewall Wright]] provided explicit causal interpretations for a set of regression-style equations based on a solid understanding of the physical and physiological mechanisms producing direct and indirect effects among his observed variables.<ref name="Wright21">{{cite journal |last1=Wright |first1=Sewall |date=1921 |title=Correlation and causation |journal=Journal of Agricultural Research |volume=20 |pages=557–585 }}</ref><ref name="Wright34">{{cite journal | doi=10.1214/aoms/1177732676 | title=The Method of Path Coefficients | date=1934 | last1=Wright | first1=Sewall | journal=The Annals of Mathematical Statistics | volume=5 | issue=3 | pages=161–215 }}</ref><ref name="Wolfle99">{{cite journal |last1=Wolfle |first1=Lee M. |title=Sewall Wright on the method of path coefficients: An annotated bibliography |journal=Structural Equation Modeling: A Multidisciplinary Journal |date=January 1999 |volume=6 |issue=3 |pages=280–291 |doi=10.1080/10705519909540134 }}</ref> The equations were estimated like ordinary regression equations, but the substantive context for the measured variables permitted clear causal, not merely predictive, understandings. O. D. Duncan introduced SEM to the social sciences in his 1975 book,<ref name="Duncan75">Duncan, Otis Dudley. (1975). Introduction to Structural Equation Models. New York: Academic Press. ISBN 0-12-224150-9.{{pn|date=June 2025}}</ref> and SEM blossomed in the late 1970s and 1980s when increasing computing power permitted practical model estimation. In 1987 Hayduk<ref name="Hayduk87"/> provided the first book-length introduction to structural equation modeling with latent variables, and this was soon followed by Bollen's popular text (1989).<ref name="Bollen89">Bollen, K. (1989). Structural Equations with Latent Variables. New York, Wiley. ISBN 0-471-01171-1.{{pn|date=June 2025}}</ref>
 
Different yet mathematically related modeling approaches developed in psychology, sociology, and economics. Early [[Cowles Foundation|Cowles Commission]] work on [[Simultaneous equations model|simultaneous equations]] estimation centered on Koopmans and Hood's (1953) algorithms from [[transport economics]] and optimal routing, with [[maximum likelihood estimation]] and closed-form algebraic calculations, as iterative solution search techniques were limited in the days before computers. The convergence of two of these developmental streams (factor analysis from psychology, and path analysis from sociology via Duncan) produced the current core of SEM. One of several programs Karl Jöreskog developed at Educational Testing Service, LISREL<ref name="JGvT70">Jöreskog, Karl; Gruvaeus, Gunnar T.; van Thillo, Marielle. (1970) ACOVS: A General Computer Program for Analysis of Covariance Structures. Princeton, N.J.; Educational Testing Services.{{pn|date=June 2025}}</ref><ref name=":0">{{cite journal |last1=Jöreskog |first1=Karl G. |last2=van Thillo |first2=Marielle |title=Lisrel a General Computer Program for Estimating a Linear Structural Equation System Involving Multiple Indicators of Unmeasured Variables |journal=ETS Research Bulletin Series |date=December 1972 |volume=1972 |issue=2 |id={{ERIC|ED073122}} |doi=10.1002/j.2333-8504.1972.tb00827.x }}</ref><ref name="JS76">Jöreskog, Karl; Sorbom, Dag. (1976) LISREL III: Estimation of Linear Structural Equation Systems by Maximum Likelihood Methods. Chicago: National Educational Resources, Inc.{{pn|date=June 2025}}</ref> embedded latent variables (which psychologists knew as the latent factors from factor analysis) within path-analysis-style equations (which sociologists inherited from Wright and Duncan). 
The factor-structured portion of the model incorporated measurement errors, which permitted adjustment for measurement error, though not necessarily error-free estimation, of the effects connecting different postulated latent variables.
 
Traces of the historical convergence of the factor analytic and path analytic traditions persist as the distinction between the measurement and structural portions of models, and as continuing disagreements over model testing and over whether measurement should precede or accompany structural estimates.<ref name="HG00a">{{cite journal |last1=Hayduk |first1=Leslie A. |last2=Glaser |first2=Dale N. |title=Jiving the Four-Step, Waltzing Around Factor Analysis, and Other Serious Fun |journal=Structural Equation Modeling: A Multidisciplinary Journal |date=January 2000 |volume=7 |issue=1 |pages=1–35 |doi=10.1207/s15328007sem0701_01 }}</ref><ref name="HG00b">{{cite journal |last1=Hayduk |first1=Leslie A. |last2=Glaser |first2=Dale N. |title=Doing the Four-Step, Right-2-3, Wrong-2-3: A Brief Reply to Mulaik and Millsap; Bollen; Bentler; and Herting and Costner |journal=Structural Equation Modeling: A Multidisciplinary Journal |date=January 2000 |volume=7 |issue=1 |pages=111–123 |doi=10.1207/S15328007SEM0701_06 }}</ref> Viewing factor analysis as a data-reduction technique deemphasizes testing, which contrasts with path analytic appreciation for testing postulated causal connections – where the test result might signal model misspecification. The friction between factor analytic and path analytic traditions continues to surface in the literature.
 
Wright's path analysis influenced Hermann Wold, Wold's student Karl Jöreskog, and Jöreskog's student Claes Fornell, but SEM never gained a large following among U.S. econometricians, possibly due to fundamental differences in modeling objectives and typical data structures. The prolonged separation of SEM's economic branch led to procedural and terminological differences, though deep mathematical and statistical connections remain.<ref name="Westland15">{{cite book |doi=10.1007/978-3-030-12508-0 |title=Structural Equation Models |series=Studies in Systems, Decision and Control |date=2019 |volume=22 |isbn=978-3-030-12507-3 }}{{pn|date=June 2025}}</ref><ref>{{cite journal |last1=Christ |first1=Carl F. |title=The Cowles Commission's Contributions to Econometrics at Chicago, 1939-1955 |journal=Journal of Economic Literature |date=1994 |volume=32 |issue=1 |pages=30–59 |jstor=2728422 }}</ref> Disciplinary differences in approaches can be seen in SEMNET discussions of endogeneity, and in discussions on causality via directed acyclic graphs (DAGs).<ref name="Pearl09"/> Discussions comparing and contrasting various SEM approaches are available<ref name="Imbens20">{{cite journal |last1=Imbens |first1=Guido W. |title=Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics |journal=Journal of Economic Literature |date=December 2020 |volume=58 |issue=4 |pages=1129–1179 |doi=10.1257/jel.20191597 }}</ref><ref name="BP13">{{cite book | doi=10.1007/978-94-007-6094-3_15 | chapter=Eight Myths About Causality and Structural Equation Models | title=Handbook of Causal Analysis for Social Research | series=Handbooks of Sociology and Social Research | date=2013 | last1=Bollen | first1=Kenneth A. | last2=Pearl | first2=Judea | pages=301–328 | isbn=978-94-007-6093-6 }}</ref> highlighting disciplinary differences in data structures and the concerns motivating economic models.
Line 194:
=== Interpretation ===
 
Causal interpretations of SE models are the clearest and most understandable, but those interpretations will be wrong if the model’s structure does not correspond to the world’s causal structure. Consequently, interpretation should address the overall status and structure of the model, not merely the model’s estimated coefficients. Whether a model fits the data, and how a model came to fit the data, are paramount for interpretation. Data fit obtained by exploring, or by following successive modification indices, does not guarantee the model is wrong, but it raises serious doubts because these approaches are prone to incorrectly modeling data features. For example, exploring to see how many factors are required preempts finding that the data are not factor structured, especially if the factor model has been “persuaded” to fit via inclusion of measurement error covariances. Data’s ability to speak against a postulated model is progressively eroded with each unwarranted inclusion of a “modification index suggested” effect or error covariance. It becomes exceedingly difficult to recover a proper model if the initial model contains several misspecifications.<ref name="HC00">{{cite journal |last1=Herting |first1=Jerald R. |last2=Costner |first2=Herbert L. |title=Another Perspective on 'The Proper Number of Factors' and the Appropriate Number of Steps |journal=Structural Equation Modeling: A Multidisciplinary Journal |date=January 2000 |volume=7 |issue=1 |pages=92–110 |doi=10.1207/S15328007SEM0701_05 }}</ref>
 
Direct-effect estimates are interpreted in parallel to the interpretation of coefficients in regression equations, but with causal commitment. Each unit increase in a causal variable’s value is viewed as producing a change of the estimated magnitude in the dependent variable’s value, given control or adjustment for all the other modeled causal mechanisms. Indirect effects are interpreted similarly, with the magnitude of a specific indirect effect equaling the product of the series of direct effects comprising that indirect effect. The units involved are the real scales of observed variables’ values, and the assigned scale values for latent variables. A fixed 1.0 effect of a latent on a specific indicator coordinates that indicator’s scale with the latent variable’s scale. The presumption that the remainder of the model remains constant or unchanging may require discounting indirect effects that might, in the real world, be simultaneously prompted by a real unit increase. Moreover, the unit increase itself might be inconsistent with what is possible in the real world because there may be no known way to change the causal variable’s value. If a model adjusts for measurement errors, the adjustment permits interpreting latent-level effects as referring to variations in true scores.<ref name="BMvH03"/>
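
The product rule for indirect effects described above can be illustrated with a minimal sketch. The three-variable model (''x'' → ''m'' → ''y'' plus a direct ''x'' → ''y'' effect) and the coefficient values are hypothetical, chosen only for illustration, and the matrix formula assumes a recursive (loop-free) linear model.

```python
import numpy as np

# Hypothetical direct effects (illustrative values, not estimates from any study).
a = 0.5  # x -> m
b = 0.4  # m -> y
c = 0.3  # x -> y (direct)

# A specific indirect effect is the product of the direct effects along its path.
indirect = a * b
total = c + indirect

# Same result from the matrix of direct effects.
# B[i, j] is the direct effect of variable j on variable i, ordered (x, m, y).
B = np.array([
    [0.0, 0.0, 0.0],
    [a,   0.0, 0.0],
    [c,   b,   0.0],
])
# For a recursive linear model, total effects are (I - B)^-1 - I.
T = np.linalg.inv(np.eye(3) - B) - np.eye(3)

print("indirect x->m->y:", indirect)
print("total x->y:", T[2, 0])
```

Both routes give the same total effect of ''x'' on ''y'' because, in a loop-free model, the matrix inverse simply sums the products of direct effects along every path.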
Line 202:
SE model interpretation should connect specific model causal segments to their variance and covariance implications. A single direct effect reports that the variance in the independent variable produces a specific amount of variation in the dependent variable’s values, but the causal details of precisely what makes this happen remain unspecified because a single effect coefficient does not contain sub-components available for integration into a structured story of how that effect arises. A more fine-grained SE model incorporating variables intervening between the cause and effect would be required to provide features constituting a story about how any one effect functions. Until such a model arrives, each estimated direct effect retains a tinge of the unknown, thereby invoking the essence of a theory. A parallel essential unknownness would accompany each estimated coefficient in even the more fine-grained model, so the sense of fundamental mystery is never fully eradicated from SE models.
Even if each modeled effect is unknown beyond the identity of the variables involved and the estimated magnitude of the effect, the structures linking multiple modeled effects provide opportunities to express how things function to coordinate the observed variables – thereby providing useful interpretation possibilities. For example, a common cause contributes to the covariance or correlation between two affected variables, because if the value of the cause goes up, the values of both effects should also go up (assuming positive effects) even if we do not know the full story underlying each cause.<ref name="Duncan75"/> (A correlation is the covariance between two variables that have both been standardized to have variance 1.0.) Another interpretive contribution might be made by expressing how two causal variables can both explain variance in a dependent variable, as well as how covariance between two such causes can increase or decrease explained variance in the dependent variable. That is, interpretation may involve explaining how a pattern of effects and covariances can contribute to decreasing a dependent variable’s variance.<ref name="Hayduk87p20">Hayduk, L. (1987) Structural Equation Modeling with LISREL: Essentials and Advances, page 20. Baltimore, Johns Hopkins University Press. 
ISBN 0-8018-3478-3 Page 20</ref> Understanding causal implications implicitly connects to understanding “controlling”, and potentially explaining why some variables, but not others, should be controlled.<ref name="Pearl09"/><ref name="HCSNGDGP-R03">{{cite journal |last1=Hayduk |first1=Leslie |last2=Cummings |first2=Greta |last3=Stratkotter |first3=Rainer |last4=Nimmo |first4=Melanie |last5=Grygoryev |first5=Kostyantyn |last6=Dosman |first6=Donna |last7=Gillespie |first7=Michael |last8=Pazderka-Robinson |first8=Hannah |last9=Boadu |first9=Kwame |title=Pearl's D-Separation: One More Step Into Causal Thinking |journal=Structural Equation Modeling: A Multidisciplinary Journal |date=April 2003 |volume=10 |issue=2 |pages=289–311 |doi=10.1207/S15328007SEM1002_8 }}</ref> As models become more complex these fundamental components can combine in non-intuitive ways, such as explaining how there can be no correlation (zero covariance) between two variables despite the variables being connected by a direct non-zero causal effect.<ref name="Duncan75"/><ref name="Bollen89"/><ref name="Hayduk87"/><ref name="Hayduk96"/>
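
Two of the covariance implications mentioned above, a common cause producing covariance and a non-zero direct effect coexisting with zero covariance, can be checked numerically. The sketch below uses the standard model-implied covariance expression for a linear model with observed variables only, Σ = (I − B)<sup>−1</sup> Ψ (I − B)<sup>−T</sup>. The helper name <code>implied_cov</code> and all coefficient and variance values are hypothetical.

```python
import numpy as np

def implied_cov(B, Psi):
    """Model-implied covariance matrix (I - B)^-1 Psi (I - B)^-T.

    B[i, j] is the direct effect of variable j on variable i;
    Psi holds the variances/covariances of the exogenous and error terms.
    """
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A @ Psi @ A.T

# Case 1: common cause z -> x1 (0.6) and z -> x2 (0.5), ordered (z, x1, x2).
B_cc = np.zeros((3, 3))
B_cc[1, 0] = 0.6
B_cc[2, 0] = 0.5
Psi_cc = np.diag([1.0, 0.64, 0.75])  # error variances chosen so var(x1) = var(x2) = 1
S_cc = implied_cov(B_cc, Psi_cc)
# x1 and x2 covary only through z: 0.6 * 0.5 * var(z).
print("cov(x1, x2) from common cause:", S_cc[1, 2])

# Case 2: x -> m (0.5), m -> y (0.6), x -> y (-0.3), ordered (x, m, y).
# The direct effect -0.3 is exactly offset by the indirect effect 0.5 * 0.6.
B_cx = np.zeros((3, 3))
B_cx[1, 0] = 0.5
B_cx[2, 1] = 0.6
B_cx[2, 0] = -0.3
Psi_cx = np.diag([1.0, 0.75, 0.5])
S_cx = implied_cov(B_cx, Psi_cx)
print("cov(x, y) despite direct effect of -0.3:", S_cx[0, 2])
```

In the second case the zero covariance arises only because the hypothetical values were chosen to cancel exactly; with estimated coefficients such cancellation would typically be approximate.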
 
The statistical insignificance of an effect estimate indicates the estimate could rather easily arise from random sampling variation around a null/zero effect, so interpreting the estimate as a real effect becomes equivocal. As in regression, the proportion of each dependent variable’s variance explained by variations in the modeled causes is provided by ''R''<sup>2</sup>, though the Blocked-Error ''R''<sup>2</sup> should be used if the dependent variable is involved in reciprocal or looped effects, or if it has an error variable correlated with any predictor’s error variable.<ref name="Hayduk06">{{cite journal |last1=Hayduk |first1=Leslie A. |title=Blocked-Error-R 2: A Conceptually Improved Definition of the Proportion of Explained Variance in Models Containing Loops or Correlated Residuals |journal=Quality & Quantity |date=August 2006 |volume=40 |issue=4 |pages=629–649 |doi=10.1007/s11135-005-1095-4 }}</ref>
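
As a rough illustration of the ordinary ''R''<sup>2</sup> (not the Blocked-Error ''R''<sup>2</sup>, which is not implemented here), the sketch below computes the explained proportion of variance for one dependent variable with two causes. It assumes no loops and an error term uncorrelated with the causes; the function name <code>r_squared</code> and all numeric values are hypothetical.

```python
import numpy as np

def r_squared(beta, Phi, psi):
    """Ordinary R^2 for y = beta' x + e, with cov(x) = Phi and var(e) = psi.

    Assumes e is uncorrelated with x and y is not part of any loop.
    """
    explained = beta @ Phi @ beta  # variance in y produced by the modeled causes
    return explained / (explained + psi)

beta = np.array([0.5, 0.4])  # hypothetical direct effects of two causes on y
psi = 0.5                    # hypothetical error variance

# Positively covarying causes (with same-sign effects) add to explained variance ...
Phi_pos = np.array([[1.0, 0.2], [0.2, 1.0]])
# ... while negatively covarying causes reduce it.
Phi_neg = np.array([[1.0, -0.2], [-0.2, 1.0]])

print("R^2 with positive covariance:", r_squared(beta, Phi_pos, psi))
print("R^2 with negative covariance:", r_squared(beta, Phi_neg, psi))
```

The comparison also shows how covariance between two causes can raise or lower the explained variance in a dependent variable, holding the direct effects fixed.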
Line 251:
* Random intercepts models {{citation needed|date=July 2023}}
* Structural Equation Model Trees {{citation needed|date=July 2023}}
* Structural Equation [[Multidimensional scaling]]<ref>{{cite journal |last1=Vera |first1=José Fernando |last2=Mair |first2=Patrick |title=SEMDS: An R Package for Structural Equation Multidimensional Scaling |journal=Structural Equation Modeling: A Multidisciplinary Journal |date=3 September 2019 |volume=26 |issue=5 |pages=803–818 |doi=10.1080/10705511.2018.1561292 }}</ref>
 
== Software ==