{{Distinguish|text=[[Multiple linear regression]], [[Generalized linear model]] or [[General linear methods]]}} {{Regression bar}}
The '''general linear model''' or '''general multivariate regression model''' is a compact way of simultaneously writing several [[multiple linear regression]] models. In that sense it is not a separate statistical [[linear model]]. The various multiple linear regression models may be compactly written as
: <math>\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},</math>
where '''Y''' is a [[Matrix (mathematics)|matrix]] with series of multivariate measurements (each column being a set of measurements on one of the [[dependent variable]]s), '''X''' is a matrix of observations on [[Dependent and independent variables|independent variables]] that might be a [[design matrix]] (each column being a set of observations on one of the independent variables), '''B''' is a matrix containing parameters that are usually to be estimated and '''U''' is a matrix containing [[Errors and residuals|errors]] (noise). The errors are usually assumed to be uncorrelated across measurements and to follow a [[multivariate normal distribution]].
The general linear model incorporates a number of different statistical models: [[ANOVA]], [[ANCOVA]], [[MANOVA]], [[MANCOVA]], ordinary [[linear regression]], [[Student's t-test|''t''-test]] and [[F-test|''F''-test]]. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If '''Y''', '''B''', and '''U''' were [[column vector]]s, the matrix equation above would represent multiple linear regression.
Hypothesis tests with the general linear model can be made in two ways: [[multivariate statistics|multivariate]] or as several independent [[univariate]] tests. In multivariate tests the columns of '''Y''' are tested together, whereas in univariate tests the columns of '''Y''' are tested independently, i.e., as multiple univariate tests with the same design matrix.
== Comparison to multiple linear regression ==
Multiple linear regression is a generalization of [[simple linear regression]] to the case of more than one independent variable, and a [[special case]] of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
:<math> Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \epsilon_i</math> or more compactly <math>Y_i = \beta_0 + \sum \limits_{k=1}^{p} {\beta_k X_{ik}} + \epsilon_i</math>
for each observation ''i'' = 1, ... , ''n''.
In the formula above we consider ''n'' observations of one dependent variable and ''p'' independent variables. Thus, ''Y''<sub>''i''</sub> is the ''i''<sup>th</sup> observation of the dependent variable, ''X''<sub>''ik''</sub> is the ''i''<sup>th</sup> observation of the ''k''<sup>th</sup> independent variable, with ''k'' = 1, 2, ..., ''p''. The values ''β''<sub>''k''</sub> represent parameters to be estimated, and ''ε''<sub>''i''</sub> is the ''i''<sup>th</sup> independent, identically distributed normal error.
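As a minimal sketch of the model above (using simulated, illustrative data and NumPy's least-squares solver; the variable names and data are not from any particular source), the parameters ''β''<sub>0</sub>, ..., ''β''<sub>''p''</sub> can be estimated by prepending a column of ones to the predictor matrix and solving the resulting linear system:

```python
import numpy as np

# Illustrative data: n = 100 observations, p = 2 independent variables.
rng = np.random.default_rng(0)
n, p = 100, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])          # [b0, b1, b2], chosen arbitrarily
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept b0 is estimated jointly
# with the slope coefficients.
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # close to beta_true
```

With the small noise level used here, the least-squares estimates recover the generating coefficients closely.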
In the more general multivariate linear regression, there is one equation of the above form for each of ''m'' > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:
:<math> Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \beta_{2j}X_{i2} + \ldots + \beta_{pj} X_{ip} + \epsilon_{ij}</math> or more compactly <math>Y_{ij} = \beta_{0j} + \sum \limits_{k=1}^{p} { \beta_{kj} X_{ik}} + \epsilon_{ij}</math>
for all observations indexed as ''i'' = 1, ... , ''n'' and for all dependent variables indexed as ''j = 1'', ''...'' , ''m''.
Note that, since each dependent variable has its own set of regression parameters to be fitted, from a computational point of view the general multivariate regression is simply a sequence of standard multiple linear regressions using the same explanatory variables.
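This equivalence can be sketched numerically (with made-up data; NumPy's `lstsq` is used purely for illustration): fitting all ''m'' dependent variables at once against a shared design matrix yields the same coefficient matrix as ''m'' separate single-response regressions.

```python
import numpy as np

# Illustrative data: one design matrix shared by m = 4 dependent variables.
rng = np.random.default_rng(1)
n, p, m = 50, 3, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # includes intercept
B_true = rng.normal(size=(p + 1, m))
Y = X @ B_true + rng.normal(scale=0.1, size=(n, m))

# One simultaneous fit for all m responses ...
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# ... equals m independent column-by-column fits.
B_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
print(np.allclose(B_joint, B_cols))  # True
```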
== Comparison to generalized linear model ==
The general linear model (GLM)<ref>Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). ''Applied linear statistical models'' (Vol. 4, p. 318). Chicago: Irwin.</ref><ref name=":1">Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). ''Applied multiple regression/correlation analysis for the behavioral sciences''.</ref> and the [[Generalized linear model|generalized linear model (GLiM)]]<ref name=":0">{{Citation|last=McCullagh|first=P.|last2=Nelder|first2=J. A.|title=An outline of generalized linear models|date=1989|work=Generalized Linear Models|pages=21–47|publisher=Springer US|isbn=9780412317606|doi=10.1007/978-1-4899-3242-6_2}}</ref><ref>Fox, J. (2015). ''Applied regression analysis and generalized linear models''. Sage Publications.</ref> are two commonly used families of [[Statistics|statistical methods]] to relate some number of continuous and/or categorical [[Dependent and independent variables|predictors]] to a single [[Dependent and independent variables|outcome variable]].
The main difference between the two approaches is that the GLM strictly assumes that the [[Errors and residuals|residuals]] will follow a [[Conditional probability distribution|conditionally]] [[normal distribution]],<ref name=":1" /> while the GLiM loosens this assumption and allows for a variety of other [[Distribution (mathematics)|distributions]] from the [[exponential family]] for the residuals.<ref name=":0" /> Of note, the GLM is a special case of the GLiM in which the distribution of the residuals follows a conditionally normal distribution.
The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLiM family. Commonly used models in the GLiM family include [[Logistic regression|binary logistic regression]]<ref>Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). ''Applied logistic regression'' (Vol. 398). John Wiley & Sons.</ref> for binary or [[dichotomy|dichotomous]] outcomes, [[Poisson regression]]<ref>Gardner, W., Mulvey, E. P., & Shaw, E. C. (1995). Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. ''Psychological Bulletin'', ''118''(3), 392.</ref> for count outcomes, and [[linear regression]] for continuous, normally distributed outcomes. This means that GLiM may be spoken of as a general family of statistical models or as specific models for specific outcome types.
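As a sketch of how the GLiM relaxes the normal-residual assumption, the following hand-rolled iteratively reweighted least squares (IRLS) loop fits a Poisson regression with a log link to simulated count data. (The data are made up, and in practice one would use a library routine such as R's <code>glm()</code> or statsmodels' <code>GLM</code>; this is illustrative only.)

```python
import numpy as np

# Simulated count outcome: y ~ Poisson(exp(X @ beta_true)).
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 0.8, -0.4])
y = rng.poisson(np.exp(X @ beta_true))           # counts, not normal residuals

# Iteratively reweighted least squares for the Poisson/log-link GLiM.
beta = np.zeros(X.shape[1])
for _ in range(50):
    eta = X @ beta                               # linear predictor
    mu = np.exp(eta)                             # inverse log link
    z = eta + (y - mu) / mu                      # working response
    W = mu                                       # Poisson variance equals mean
    XtW = X.T * W                                # X' W  (W applied column-wise)
    beta = np.linalg.solve(XtW @ X, XtW @ z)     # weighted least-squares step

print(beta)  # close to beta_true
```

At convergence the estimate satisfies the Poisson score equations X′(y − μ) = 0, the GLiM analogue of the normal equations of ordinary least squares.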
{| class="wikitable"
!
!General linear model
![[Generalized linear model]]
|-
|Examples
|[[ANOVA]], [[ANCOVA]], [[linear regression]]
|[[linear regression]], [[logistic regression]], [[Poisson regression]], gamma regression<ref name=":0" />
|-
|Extensions and related methods
|-
|[[R (programming language)|R]] package and function
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html lm()] in stats package (base R)
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html glm()] in stats package (base R)
|-
|[[MATLAB]] function
|mvregress()
|glmfit()
|-
|[[SAS (software)|SAS]] procedure
|[https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#glm_toc.htm PROC GLM], [https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#reg_toc.htm PROC REG]
|[https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#genmod_toc.htm PROC GENMOD], [https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#logistic_toc.htm PROC LOGISTIC] (for binary & ordered or unordered categorical outcomes)
|-
|[[Wolfram Language]] & [[Mathematica]] function
|LinearModelFit[]<ref>[http://reference.wolfram.com/language/ref/LinearModelFit.html LinearModelFit], Wolfram Language Documentation Center.</ref>
|GeneralizedLinearModelFit[]<ref>[http://reference.wolfram.com/language/ref/GeneralizedLinearModelFit.html GeneralizedLinearModelFit], Wolfram Language Documentation Center.</ref>
|-
|[[EViews]] command
|ls<ref>[http://www.eviews.com/help/helpintro.html#page/content%2Fcommandcmd-ls.html ls], EViews Help.</ref>
|glm<ref>[http://www.eviews.com/help/helpintro.html#page/content%2Fcommandcmd-glm.html glm], EViews Help.</ref>
|-
|statsmodels Python Package
|[https://www.statsmodels.org/dev/user-guide.html#regression-and-linear-models regression-and-linear-models]
|[https://www.statsmodels.org/dev/glm.html GLM]
|}
== Applications ==
An application of the general linear model appears in the analysis of multiple [[brain scan]]s in scientific experiments where '''Y''' contains data from brain scanners and '''X''' contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as ''mass-univariate'' in this setting) and is often referred to as [[statistical parametric mapping]].
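The mass-univariate approach can be sketched as follows (the data, the on/off regressor, and the voxel count are invented for illustration): one design matrix is fitted to thousands of response columns at once, and a ''t''-statistic for a single contrast is computed per column in one vectorized pass.

```python
import numpy as np

# Toy mass-univariate setup: one design matrix X shared by v "voxel" columns.
rng = np.random.default_rng(3)
n, v = 120, 5000                                  # time points, voxels
task = np.tile([0.0, 1.0], n // 2)                # toy on/off task regressor
X = np.column_stack([np.ones(n), task])
B_true = np.vstack([rng.normal(size=v), rng.normal(scale=0.5, size=v)])
Y = X @ B_true + rng.normal(size=(n, v))

# Fit every voxel simultaneously, then form per-voxel t-statistics
# for the contrast c (the task effect).
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # shape (2, v)
resid = Y - X @ B_hat
dof = n - X.shape[1]
sigma2 = (resid ** 2).sum(axis=0) / dof           # per-voxel error variance

c = np.array([0.0, 1.0])
XtX_inv = np.linalg.inv(X.T @ X)
t = (c @ B_hat) / np.sqrt(sigma2 * (c @ XtX_inv @ c))
print(t.shape)  # (5000,)
```

Each entry of <code>t</code> is exactly the statistic a separate univariate regression on that voxel would give, which is the sense in which the test is "multiple univariate tests with the same design matrix".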
== See also ==
* [[Bayesian multivariate linear regression]]
* [[F-test]]
* [[t-test]]
== Notes ==
{{Reflist}}
== References ==
* {{cite book |last1=Christensen |first1=Ronald |year=2020 |title=Plane Answers to Complex Questions: The Theory of Linear Models |___location=New York |publisher=Springer}}
* {{cite book |last=Wichura |first=Michael J. |title=The coordinate-free approach to linear models |series=Cambridge Series in Statistical and Probabilistic Mathematics |publisher=Cambridge University Press |___location=Cambridge |year=2006 |pages=xiv+199 |isbn=978-0-521-86842-6 |mr=2283455}}
* {{cite book |editor1-last=Rawlings |editor1-first=John O. |editor2-last=Pantula |editor2-first=Sastry G. |editor3-last=Dickey |editor3-first=David A. |title=Applied Regression Analysis |series=Springer Texts in Statistics |year=1998 |doi=10.1007/b98890 |isbn=0-387-98454-2}}
[[Category:Regression models]]