General linear model: Difference between revisions

{{Short description|Statistical linear model}}
{{Distinguish|text=[[Multiple linear regression]], [[Generalized linear model]] or [[General linear methods]]}}
{{Regression bar}}
The '''general linear model''' or '''general multivariate regression model''' is a compact way of simultaneously writing several [[multiple linear regression]] models. In that sense it is not a separate statistical [[linear model]]. The various multiple linear regression models may be compactly written as<ref name="MardiaK1979Multivariate">{{Cite book |last1=Mardia |first1=K. V. |author1-link=Kanti Mardia |last2=Kent |first2=J. T. |last3=Bibby |first3=J. M. |year=1979 |title=Multivariate Analysis |publisher=[[Academic Press]] |isbn=0-12-471252-5}}</ref>
: <math>\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},</math>
 
where '''Y''' is a [[Matrix (mathematics)|matrix]] with series of multivariate measurements (each column being a set of measurements on one of the [[dependent variable]]s), '''X''' is a matrix of observations on [[independent variable]]s that might be a [[design matrix]] (each column being a set of observations on one of the independent variables), '''B''' is a matrix containing parameters that are usually to be estimated and '''U''' is a matrix containing [[Errors and residuals in statistics|errors]] (noise). The errors are usually assumed to be uncorrelated across measurements, and follow a [[multivariate normal distribution]]. If the errors do not follow a multivariate normal distribution, [[generalized linear model]]s may be used to relax assumptions about '''Y''' and '''U'''.
 
 
The general linear model encompasses several statistical models, including [[Analysis of variance|ANOVA]], [[Analysis of covariance|ANCOVA]], [[Multivariate analysis of variance|MANOVA]], [[Multivariate analysis of covariance|MANCOVA]], and ordinary [[linear regression]]. Within this framework, both [[t-test|''t''-test]]s and [[F-test|''F''-test]]s can be applied. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If '''Y''', '''B''', and '''U''' were [[column vector]]s, the matrix equation above would represent multiple linear regression.
 
Hypothesis tests with the general linear model can be made in two ways: [[multivariate statistics|multivariate]] or as several independent [[univariate]] tests. In multivariate tests the columns of '''Y''' are tested together, whereas in univariate tests the columns of '''Y''' are tested independently, i.e., as multiple univariate tests with the same design matrix.
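The equivalence between the multivariate model and several column-wise multiple regressions can be illustrated numerically. The following sketch (not from the article; it assumes numpy and uses simulated data) fits '''B''' in '''Y''' = '''XB''' + '''U''' once by least squares and again one dependent variable at a time, and the two estimates agree:

```python
import numpy as np

# Simulated data: n observations, p predictors, m dependent variables.
rng = np.random.default_rng(0)
n, p, m = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
B_true = rng.normal(size=(p + 1, m))
Y = X @ B_true + 0.1 * rng.normal(size=(n, m))

# One least-squares solve estimates all m columns of B simultaneously ...
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# ... which equals m separate multiple linear regressions, one per column of Y.
B_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
assert np.allclose(B_hat, B_cols)
```

This is why univariate tests with a shared design matrix can be run column by column: the point estimates themselves are identical whether the columns of '''Y''' are fitted jointly or separately.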
 
== Comparison to multiple linear regression ==
{{Further|Multiple linear regression}}
Multiple linear regression is a generalization of [[simple linear regression]] to the case of more than one independent variable, and a [[special case]] of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
 
for each observation ''i'' = 1, ... , ''n''.
 
In the formula above we consider ''n'' observations of one dependent variable and ''p'' independent variables. Thus, ''Y''<sub>''i''</sub> is the ''i''<sup>th</sup> observation of the dependent variable, ''X''<sub>''ik''</sub> is the ''i''<sup>th</sup> observation of the ''k''<sup>th</sup> independent variable, ''k'' = 1, 2, ..., ''p''. The values ''β''<sub>''k''</sub> represent parameters to be estimated, and ''ε''<sub>''i''</sub> is the ''i''<sup>th</sup> independent identically distributed normal error.
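The indexing convention can be made concrete with a small numerical sketch (hypothetical toy numbers, numpy assumed; errors are set to zero purely for illustration):

```python
import numpy as np

# n = 4 observations of p = 2 independent variables:
# X[i, k] is the i-th observation of the k-th independent variable.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
beta0 = 1.0                           # intercept
beta = np.array([2.0, -1.0])          # beta_1, beta_2
eps = np.zeros(4)                     # errors, zero here for illustration

# Y_i = beta_0 + sum_k beta_k * X_ik + eps_i, for i = 1, ..., n
Y = beta0 + X @ beta + eps            # -> [1.0, 4.5, 5.5, 6.0]
```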
 
In the more general multivariate linear regression, there is one equation of the above form for each of ''m'' > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:
 
== Comparison to generalized linear model ==
The general linear model and the [[Generalized linear model|generalized linear model]] (GLM)<ref name=":0">{{Cite book |last1=McCullagh |first1=P. |author1-link=Peter McCullagh |last2=Nelder |first2=J. A. |author2-link=John Nelder |year=1989 |chapter=An outline of generalized linear models |title=Generalized Linear Models |pages=21–47 |publisher=Springer US |isbn=9780412317606 |doi=10.1007/978-1-4899-3242-6_2 |doi-broken-date=12 July 2025}}</ref><ref>Fox, J. (2015). ''Applied regression analysis and generalized linear models''. Sage Publications.</ref> are two commonly used families of [[Statistics|statistical methods]] to relate some number of continuous and/or categorical [[Dependent and independent variables|predictors]] to a single [[Dependent and independent variables|outcome variable]].
 
The main difference between the two approaches is that the general linear model strictly assumes that the [[Errors and residuals|residuals]] will follow a [[Conditional probability distribution|conditionally]] [[normal distribution]],<ref name=":1">{{cite book |last1=Cohen |first1=J. |last2=Cohen |first2=P. |last3=West |first3=S. G. |last4=Aiken |first4=L. S. |author4-link=Leona S. Aiken |date=2003 |title=Applied multiple regression/correlation analysis for the behavioral sciences}}</ref> while the GLM loosens this assumption and allows for a variety of other [[Distribution (mathematics)|distributions]] from the [[exponential family]] for the residuals.<ref name=":0" /> The general linear model is a special case of the GLM in which the distribution of the residuals follows a conditionally normal distribution.
 
The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLM family. Commonly used models in the GLM family include [[Logistic regression|binary logistic regression]]<ref>Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). ''Applied logistic regression'' (Vol. 398). John Wiley & Sons.</ref> for binary or dichotomous outcomes, [[Poisson regression]]<ref>{{cite journal |last1=Gardner |first1=W. |last2=Mulvey |first2=E. P. |last3=Shaw |first3=E. C. |date=1995 |title=Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. |journal=Psychological Bulletin |volume=118 |issue=3 |pages=392–404 |doi=10.1037/0033-2909.118.3.392 |pmid=7501743}}</ref> for count outcomes, and [[linear regression]] for continuous, normally distributed outcomes. This means that GLM may be spoken of as a general family of statistical models or as specific models for specific outcome types.
 
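As a sketch of how such a GLM is fitted in practice, the following is a minimal numpy implementation (not from the article; numpy and simulated data assumed) of Poisson regression with a log link via iteratively reweighted least squares, the standard fitting algorithm for GLMs:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit Poisson regression (log link) by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta              # linear predictor
        mu = np.exp(eta)            # mean under the log link
        z = eta + (y - mu) / mu     # working response
        W = mu                      # working weights (Var(Y) = mu for Poisson)
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted least squares step
    return beta

# Simulated count data with known coefficients.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_true = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = poisson_irls(X, y)       # recovers roughly [1.0, 0.5]
```

Each IRLS step is itself a weighted general-linear-model fit, which is one way to see the general linear model sitting inside the GLM family as the identity-link, normal-error special case.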
{| class="wikitable"
!
|Examples
|[[ANOVA]], [[ANCOVA]], [[linear regression]]
|[[linear regression]], [[logistic regression]], [[Poisson regression]], gamma regression,<ref name=":02">{{cite book |last1=McCullagh |first1=Peter |author1-link=Peter McCullagh |last2=Nelder |first2=John |author2-link=John Nelder |year=1989 |title=Generalized Linear Models |edition=2nd |___location=Boca Raton |publisher=Chapman and Hall/CRC |isbn=978-0-412-31760-6 |ref=McCullagh1989}}</ref> general linear model
|-
|Extensions and related methods
|[[R (programming language)|R]] package and function
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html lm()] in stats package (base R)
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html glm()] in stats package (base R)
|-
|[[Matlab (programming language)|MATLAB]] function
|mvregress()
|glmfit()
|-
|[[SAS (software)|SAS]] procedures
|[https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#glm_toc.htm PROC GLM], [https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#reg_toc.htm PROC REG]
|[https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#genmod_toc.htm PROC GENMOD], [https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#logistic_toc.htm PROC LOGISTIC] (for binary & ordered or unordered categorical outcomes)
|[[Wolfram Language]] & [[Mathematica]] function
|LinearModelFit[]<ref>[http://reference.wolfram.com/language/ref/LinearModelFit.html LinearModelFit], Wolfram Language Documentation Center.</ref>
|GeneralizedLinearModelFit[]<ref>[http://reference.wolfram.com/language/ref/GeneralizedLinearModelFit.html GeneralizedLinearModelFit], Wolfram Language Documentation Center.</ref>
|-
|[[EViews]] command
|ls<ref>[http://www.eviews.com/help/helpintro.html#page/content%2Fcommandcmd-ls.html ls], EViews Help.</ref>
|glm<ref>[http://www.eviews.com/help/helpintro.html#page/content%2Fcommandcmd-glm.html glm], EViews Help.</ref>
|-
|statsmodels Python Package
 
== Applications ==
An application of the general linear model appears in the analysis of multiple [[brain scan]]s in scientific experiments where {{var|Y}} contains data from brain scanners and {{var|X}} contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as ''mass-univariate'' in this setting) and is often referred to as [[statistical parametric mapping]].<ref>{{Cite journal |last1=Friston |first1=K.J. |last2=Holmes |first2=A.P. |last3=Worsley |first3=K.J. |last4=Poline |first4=J.-B. |last5=Frith |first5=C.D. |last6=Frackowiak |first6=R.S.J. |year=1995 |title=Statistical Parametric Maps in functional imaging: A general linear approach |journal=Human Brain Mapping |volume=2 |issue=4 |pages=189–210 |doi=10.1002/hbm.460020402 |s2cid=9898609}}</ref>
 
== See also ==
* [[Bayesian multivariate linear regression]]
* [[F-test]]
* [[t-test]]
 
 
== References ==
* {{cite book |last1=Christensen |first1=Ronald |year=2020 |title=Plane Answers to Complex Questions: The Theory of Linear Models |edition=5th |___location=New York |publisher=Springer |isbn=978-3-030-32096-6}}
* {{cite book |last1=Wichura |first1=Michael J. |year=2006 |title=The coordinate-free approach to linear models |series=Cambridge Series in Statistical and Probabilistic Mathematics |___location=Cambridge |publisher=Cambridge University Press |pages=xiv+199 |isbn=978-0-521-86842-6 |mr=2283455}}
* {{Cite book |editor1-last=Rawlings |editor1-first=John O. |editor2-last=Pantula |editor2-first=Sastry G. |editor3-last=Dickey |editor3-first=David A. |year=1998 |title=Applied Regression Analysis |series=Springer Texts in Statistics |isbn=0-387-98454-2 |doi=10.1007/b98890}}
 
{{Statistics}}
 
[[Category:Regression models]]