{{Distinguish|text=[[Multiple linear regression]], [[Generalized linear model]] or [[General linear methods]]}} {{Regression bar}}
The '''general linear model''' or '''general multivariate regression model''' is a compact way of simultaneously writing several [[multiple linear regression]] models. In that sense it is not a separate statistical [[linear model]]. The various multiple linear regression models may be compactly written as
: <math>\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},</math>
where '''Y''' is a [[Matrix (mathematics)|matrix]] with series of multivariate measurements (each column being a set of measurements on one of the [[dependent variable]]s), '''X''' is a matrix of observations on [[Dependent and independent variables|independent variables]] that might be a [[design matrix]] (each column being a set of observations on one of the independent variables), '''B''' is a matrix containing parameters that are usually to be estimated and '''U''' is a matrix containing [[Errors and residuals|errors]] (noise). The errors are usually assumed to be uncorrelated across measurements and to follow a [[multivariate normal distribution]].
The general linear model incorporates a number of different statistical models: [[ANOVA]], [[ANCOVA]], [[MANOVA]], [[MANCOVA]], ordinary [[linear regression]], [[Student's t-test|''t''-test]] and [[F-test|''F''-test]]. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If '''Y''', '''B''', and '''U''' were [[column vector]]s, the matrix equation above would represent multiple linear regression.
Hypothesis tests with the general linear model can be made in two ways: [[multivariate statistics|multivariate]] or as several independent [[univariate]] tests. In multivariate tests the columns of '''Y''' are tested together, whereas in univariate tests the columns of '''Y''' are tested independently, i.e., as multiple univariate tests with the same design matrix.
== Comparison to multiple linear regression ==
Multiple linear regression is a generalization of [[simple linear regression]] to the case of more than one independent variable, and a [[special case]] of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
:<math> Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \epsilon_i</math> or more compactly <math>Y_i = \beta_0 + \sum \limits_{k=1}^{p} {\beta_k X_{ik}} + \epsilon_i</math>
for each observation ''i'' = 1, ... , ''n''.
In the formula above we consider ''n'' observations of one dependent variable and ''p'' independent variables. Thus, ''Y''<sub>''i''</sub> is the ''i''<sup>th</sup> observation of the dependent variable, ''X''<sub>''ik''</sub> is the ''i''<sup>th</sup> observation of the ''k''<sup>th</sup> independent variable, with ''k'' = 1, 2, ..., ''p''. The values ''β''<sub>''k''</sub> represent parameters to be estimated, and ''ε''<sub>''i''</sub> is the ''i''<sup>th</sup> independent, identically distributed normal error.
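As a minimal sketch of the model above (using simulated, illustrative data and NumPy's least-squares solver; the variable names and data are not from any particular source), the parameters ''β''<sub>0</sub>, ..., ''β''<sub>''p''</sub> can be estimated by prepending a column of ones to the predictor matrix and solving the resulting linear system:

```python
import numpy as np

# Illustrative data: n = 100 observations, p = 2 independent variables.
rng = np.random.default_rng(0)
n, p = 100, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])          # [b0, b1, b2], chosen arbitrarily
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept b0 is estimated jointly
# with the slope coefficients.
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # close to beta_true
```

With the small noise level used here, the least-squares estimates recover the generating coefficients closely.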
In the more general multivariate linear regression, there is one equation of the above form for each of ''m'' > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:
:<math> Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \beta_{2j}X_{i2} + \ldots + \beta_{pj} X_{ip} + \epsilon_{ij}</math> or more compactly <math>Y_{ij} = \beta_{0j} + \sum \limits_{k=1}^{p} { \beta_{kj} X_{ik}} + \epsilon_{ij}</math>
for all observations indexed as ''i'' = 1, ... , ''n'' and for all dependent variables indexed as ''j = 1'', ''...'' , ''m''.
Note that, since each dependent variable has its own set of regression parameters to be fitted, from a computational point of view the general multivariate regression is simply a sequence of standard multiple linear regressions using the same explanatory variables.
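This equivalence can be sketched numerically (with made-up data; NumPy's `lstsq` is used purely for illustration): fitting all ''m'' dependent variables at once against a shared design matrix yields the same coefficient matrix as ''m'' separate single-response regressions.

```python
import numpy as np

# Illustrative data: one design matrix shared by m = 4 dependent variables.
rng = np.random.default_rng(1)
n, p, m = 50, 3, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # includes intercept
B_true = rng.normal(size=(p + 1, m))
Y = X @ B_true + rng.normal(scale=0.1, size=(n, m))

# One simultaneous fit for all m responses ...
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# ... equals m independent column-by-column fits.
B_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
print(np.allclose(B_joint, B_cols))  # True
```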
== Comparison to generalized linear model ==
The general linear model (GLM)<ref>Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). ''Applied linear statistical models'' (Vol. 4, p. 318). Chicago: Irwin.</ref><ref name=":1">Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). ''Applied multiple regression/correlation analysis for the behavioral sciences''.</ref> and the [[Generalized linear model|generalized linear model (GLiM)]]<ref name=":0">{{Citation|last=McCullagh|first=P.|last2=Nelder|first2=J. A.|title=An outline of generalized linear models|date=1989|work=Generalized Linear Models|pages=21–47|publisher=Springer US|isbn=9780412317606|doi=10.1007/978-1-4899-3242-6_2}}</ref><ref>Fox, J. (2015). ''Applied regression analysis and generalized linear models''. Sage Publications.</ref> are two commonly used families of [[Statistics|statistical methods]] to relate some number of continuous and/or categorical [[Dependent and independent variables|predictors]] to a single [[Dependent and independent variables|outcome variable]].
The main difference between the two approaches is that the GLM strictly assumes that the [[Errors and residuals|residuals]] will follow a [[Conditional probability distribution|conditionally]] [[normal distribution]],<ref name=":1" /> while the GLiM loosens this assumption and allows for a variety of other [[Distribution (mathematics)|distributions]] from the [[exponential family]] for the residuals.<ref name=":0" /> Of note, the GLM is a special case of the GLiM in which the distribution of the residuals follows a conditionally normal distribution.
The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLiM family. Commonly used models in the GLiM family include [[Logistic regression|binary logistic regression]]<ref>Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). ''Applied logistic regression'' (Vol. 398). John Wiley & Sons.</ref> for binary or [[dichotomy|dichotomous]] outcomes, [[Poisson regression]]<ref>Gardner, W., Mulvey, E. P., & Shaw, E. C. (1995). Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. ''Psychological Bulletin'', ''118''(3), 392.</ref> for count outcomes, and [[linear regression]] for continuous, normally distributed outcomes. This means that GLiM may be spoken of as a general family of statistical models or as specific models for specific outcome types.
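As a sketch of how the GLiM relaxes the normal-residual assumption, the following hand-rolled iteratively reweighted least squares (IRLS) loop fits a Poisson regression with a log link to simulated count data. (The data are made up, and in practice one would use a library routine such as R's <code>glm()</code> or statsmodels' <code>GLM</code>; this is illustrative only.)

```python
import numpy as np

# Simulated count outcome: y ~ Poisson(exp(X @ beta_true)).
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 0.8, -0.4])
y = rng.poisson(np.exp(X @ beta_true))           # counts, not normal residuals

# Iteratively reweighted least squares for the Poisson/log-link GLiM.
beta = np.zeros(X.shape[1])
for _ in range(50):
    eta = X @ beta                               # linear predictor
    mu = np.exp(eta)                             # inverse log link
    z = eta + (y - mu) / mu                      # working response
    W = mu                                       # Poisson variance equals mean
    XtW = X.T * W                                # X' W  (W applied column-wise)
    beta = np.linalg.solve(XtW @ X, XtW @ z)     # weighted least-squares step

print(beta)  # close to beta_true
```

At convergence the estimate satisfies the Poisson score equations X′(y − μ) = 0, the GLiM analogue of the normal equations of ordinary least squares.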
{| class="wikitable"
!
!General linear model
![[Generalized linear model]]
|-
|Examples
|[[ANOVA]], [[ANCOVA]], [[linear regression]]
|[[linear regression]], [[logistic regression]], [[Poisson regression]], gamma regression<ref name=":0" />
|-
|Extensions and related methods
|-
|[[R (programming language)|R]] package and function
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html lm()] in stats package (base R)
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html glm()] in stats package (base R)
|-
|[[MATLAB]] function
|mvregress()
|glmfit()
|-
|[[SAS (software)|SAS]] procedure
|[https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#glm_toc.htm PROC GLM], [https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#reg_toc.htm PROC REG]
|[https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#genmod_toc.htm PROC GENMOD], [https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#logistic_toc.htm PROC LOGISTIC] (for binary & ordered or unordered categorical outcomes)
|-
|[[Wolfram Language]] & [[Mathematica]] function
|LinearModelFit[]<ref>[http://reference.wolfram.com/language/ref/LinearModelFit.html LinearModelFit], Wolfram Language Documentation Center.</ref>
|GeneralizedLinearModelFit[]<ref>[http://reference.wolfram.com/language/ref/GeneralizedLinearModelFit.html GeneralizedLinearModelFit], Wolfram Language Documentation Center.</ref>
|-
|[[EViews]] command
|ls<ref>[http://www.eviews.com/help/helpintro.html#page/content%2Fcommandcmd-ls.html ls], EViews Help.</ref>
|glm<ref>[http://www.eviews.com/help/helpintro.html#page/content%2Fcommandcmd-glm.html glm], EViews Help.</ref>
|-
|statsmodels Python Package
|[https://www.statsmodels.org/dev/user-guide.html#regression-and-linear-models regression-and-linear-models]
|[https://www.statsmodels.org/dev/glm.html GLM]
|}
== Applications ==
An application of the general linear model appears in the analysis of multiple [[brain scan]]s in scientific experiments where '''Y''' contains data from brain scanners and '''X''' contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as ''mass-univariate'' in this setting) and is often referred to as [[statistical parametric mapping]].
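The mass-univariate approach can be sketched as follows (the data, the on/off regressor, and the voxel count are invented for illustration): one design matrix is fitted to thousands of response columns at once, and a ''t''-statistic for a single contrast is computed per column in one vectorized pass.

```python
import numpy as np

# Toy mass-univariate setup: one design matrix X shared by v "voxel" columns.
rng = np.random.default_rng(3)
n, v = 120, 5000                                  # time points, voxels
task = np.tile([0.0, 1.0], n // 2)                # toy on/off task regressor
X = np.column_stack([np.ones(n), task])
B_true = np.vstack([rng.normal(size=v), rng.normal(scale=0.5, size=v)])
Y = X @ B_true + rng.normal(size=(n, v))

# Fit every voxel simultaneously, then form per-voxel t-statistics
# for the contrast c (the task effect).
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # shape (2, v)
resid = Y - X @ B_hat
dof = n - X.shape[1]
sigma2 = (resid ** 2).sum(axis=0) / dof           # per-voxel error variance

c = np.array([0.0, 1.0])
XtX_inv = np.linalg.inv(X.T @ X)
t = (c @ B_hat) / np.sqrt(sigma2 * (c @ XtX_inv @ c))
print(t.shape)  # (5000,)
```

Each entry of <code>t</code> is exactly the statistic a separate univariate regression on that voxel would give, which is the sense in which the test is "multiple univariate tests with the same design matrix".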
== See also ==
* [[Bayesian multivariate linear regression]]
* [[F-test]]
* [[t-test]]
== Notes ==
{{Reflist}}
== References ==
* {{cite book |last1=Christensen |first1=Ronald |year=2020 |title=Plane Answers to Complex Questions: The Theory of Linear Models |___location=New York |publisher=Springer}}
* {{cite book |last=Wichura |first=Michael J. |title=The coordinate-free approach to linear models |series=Cambridge Series in Statistical and Probabilistic Mathematics |publisher=Cambridge University Press |___location=Cambridge |year=2006 |pages=xiv+199 |isbn=978-0-521-86842-6 |mr=2283455}}
* {{cite book |editor1-last=Rawlings |editor1-first=John O. |editor2-last=Pantula |editor2-first=Sastry G. |editor3-last=Dickey |editor3-first=David A. |title=Applied Regression Analysis |series=Springer Texts in Statistics |year=1998 |doi=10.1007/b98890 |isbn=0-387-98454-2}}
[[Category:Regression models]]