General linear model: Difference between revisions

 
where '''Y''' is a [[Matrix (mathematics)|matrix]] containing a series of multivariate measurements (each column being a set of measurements on one of the [[dependent variable]]s), '''X''' is a matrix of observations on [[independent variable]]s that might be a [[design matrix]] (each column being a set of observations on one of the independent variables), '''B''' is a matrix containing parameters that are usually to be estimated, and '''U''' is a matrix containing [[Errors and residuals in statistics|errors]] (noise). The errors are usually assumed to be uncorrelated across measurements and to follow a [[multivariate normal distribution]]. If the errors do not follow a multivariate normal distribution, [[generalized linear model]]s may be used to relax assumptions about '''Y''' and '''U'''.
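The model can be illustrated with a short simulation. The sketch below is plain Python with no external libraries. The following are all assumptions made for illustration:
* the parameter values;
* the helper names (<code>transpose</code>, <code>matmul</code>, <code>solve</code>);
* the choice of ordinary least squares, <math>\hat{\mathbf{B}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Y}</math>, as the estimator.

Each row of '''U''' is drawn independently of the other rows. The two errors within a row are bivariate normal and correlated.

```python
import math
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    # (n x k) times (k x m) -> (n x m)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B (A: k x k, B: k x m) by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("X'X is singular: columns of X are linearly dependent")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]

random.seed(0)
n = 2000
# Hypothetical true parameters: B is p x m with p = 2 (intercept, slope) and m = 2 responses.
B_true = [[1.0, -2.0],
          [0.5, 3.0]]
# Design matrix X (n x p): a column of ones and one predictor.
X = [[1.0, random.uniform(0.0, 10.0)] for _ in range(n)]

# Rows of U are independent across observations; within a row the two errors are
# bivariate normal with standard deviations s1, s2 and correlation rho (hypothetical),
# generated from the Cholesky factor of their 2 x 2 covariance matrix.
s1, s2, rho = 1.0, 0.5, 0.6
U = []
for _ in range(n):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    U.append([s1 * z1, s2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)])

# Y = X B + U
Y = [[xb + u for xb, u in zip(xb_row, u_row)] for xb_row, u_row in zip(matmul(X, B_true), U)]

# Least-squares estimate B_hat = (X^T X)^{-1} X^T Y: one column per dependent variable.
Xt = transpose(X)
B_hat = solve(matmul(Xt, X), matmul(Xt, Y))
```

With this many observations, the entries of <code>B_hat</code> should lie close to <code>B_true</code>. They will not match exactly, because the errors are random.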
 
 
The general linear model encompasses several statistical models, including [[Analysis of variance|ANOVA]], [[Analysis of covariance|ANCOVA]], [[Multivariate analysis of variance|MANOVA]], [[Multivariate analysis of covariance|MANCOVA]], and ordinary [[linear regression]]. Within this framework, both [[t-test|''t''-tests]] and [[F-test|''F''-tests]] can be applied. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If '''Y''', '''B''', and '''U''' were [[column vector]]s, the matrix equation above would represent multiple linear regression.
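The sketch below shows how ANOVA fits this framework by writing a one-way ANOVA as a single-response general linear model. The design matrix '''X''' holds one indicator column per group ("cell-means" coding), so the least-squares coefficients are the group means. The ANOVA ''F''-statistic is then obtained by comparing the residual sum of squares with that of an intercept-only model. The data, the coding scheme and the helper names are hypothetical, and ordinary least squares is assumed as the estimator.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B (A: k x k, B: k x m) by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("X'X is singular: columns of X are linearly dependent")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]

def fit_rss(X, y):
    """Least-squares fit of y (length n) on X (n x p); return (coefficients, residual sum of squares)."""
    Y = [[v] for v in y]
    Xt = transpose(X)
    B_hat = solve(matmul(Xt, X), matmul(Xt, Y))
    fitted = matmul(X, B_hat)
    rss = sum((Y[i][0] - fitted[i][0]) ** 2 for i in range(len(y)))
    return [row[0] for row in B_hat], rss

# Hypothetical data: three groups with two or three observations each.
groups = ["a", "a", "b", "b", "b", "c", "c"]
y = [4.0, 6.0, 7.0, 8.0, 9.0, 1.0, 3.0]
levels = sorted(set(groups))
n, g = len(y), len(levels)

# Full model: one indicator (dummy) column per group, no separate intercept.
X_full = [[1.0 if grp == lev else 0.0 for lev in levels] for grp in groups]
# Reduced model under H0 (all group means equal): intercept only.
X_null = [[1.0] for _ in groups]

coef, rss_full = fit_rss(X_full, y)  # coef[j] estimates the mean of group levels[j]
_, rss_null = fit_rss(X_null, y)

# Extra-sum-of-squares F statistic with (g - 1, n - g) degrees of freedom,
# identical to the classical one-way ANOVA F = (SSB / (g - 1)) / (SSW / (n - g)).
F = ((rss_null - rss_full) / (g - 1)) / (rss_full / (n - g))
```

Other codings, such as an intercept plus ''g'' − 1 indicator columns, change the meaning of the individual coefficients. They span the same column space, however, so they give the same fitted values and the same ''F''-statistic.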
 
Hypothesis tests with the general linear model can be made in two ways: [[multivariate statistics|multivariate]] or as several independent [[univariate]] tests. In multivariate tests the columns of '''Y''' are tested together, whereas in univariate tests the columns of '''Y''' are tested independently, i.e., as multiple univariate tests with the same design matrix.
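The sketch below shows the two approaches side by side for a test that a slope is zero. It uses two dependent variables and the same design matrix for both. The univariate route computes one ''F''-statistic per column of '''Y''' from that column's residual sums of squares alone. The multivariate route uses [[Wilks's lambda distribution|Wilks' lambda]], det('''E''') / det('''E''' + '''H'''), which also uses the cross-products between the columns.

Several assumptions apply:
* Wilks' lambda is only one of several multivariate test statistics (others include Pillai's trace and the Hotelling–Lawley trace); choosing it here is an assumption.
* The data and helper names are hypothetical.
* Ordinary least squares is the assumed estimator.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B (A: k x k, B: k x m) by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("X'X is singular: columns of X are linearly dependent")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [[float(v) for v in row] for row in A]
    k = len(M)
    d = 1.0
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if M[piv][c] == 0.0:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= M[c][c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d

def residual_sscp(X, Y):
    """Residual sums-of-squares-and-cross-products matrix (m x m) after regressing Y on X."""
    Xt = transpose(X)
    B_hat = solve(matmul(Xt, X), matmul(Xt, Y))
    fitted = matmul(X, B_hat)
    R = [[y - f for y, f in zip(y_row, f_row)] for y_row, f_row in zip(Y, fitted)]
    return matmul(transpose(R), R)

# Hypothetical data: n = 6 observations, one predictor x, m = 2 dependent variables.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y = [[1.1, 2.0], [1.9, 1.7], [3.2, 1.1], [3.8, 0.9], [5.1, 0.2], [6.0, -0.1]]
X_full = [[1.0, xi] for xi in x]  # the same design matrix is used for every column of Y
X_null = [[1.0] for _ in x]       # H0: the slope is zero
n, p, q = len(x), 2, 1            # q = number of coefficients constrained by H0

E = residual_sscp(X_full, Y)      # error SSCP under the full model
E0 = residual_sscp(X_null, Y)     # error SSCP under H0, so that H = E0 - E

# Univariate: one F test per column, using only that column's diagonal entries.
univariate_F = [((E0[j][j] - E[j][j]) / q) / (E[j][j] / (n - p)) for j in range(len(E))]

# Multivariate: Wilks' lambda tests both columns jointly, including their cross-products.
wilks = det(E) / det(E0)
```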
for each observation ''i'' = 1, ... , ''n''.
 
In the formula above we consider ''n'' observations of one dependent variable and ''p'' independent variables. Thus, ''Y''<sub>''i''</sub> is the ''i''<sup>th</sup> observation of the dependent variable, ''X''<sub>''ik''</sub> is the ''i''<sup>th</sup> observation of the ''k''<sup>th</sup> independent variable, ''k'' = 1, 2, ..., ''p''. The values ''β''<sub>''k''</sub> represent parameters to be estimated, and ''ε''<sub>''i''</sub> is the ''i''<sup>th</sup> independent identically distributed normal error.
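Stacking the ''n'' equations gives the matrix form of the model. The derivation below assumes two things:
* the model includes an intercept <math>\beta_0</math>, so that each row of the design matrix starts with a 1;
* <math>\mathbf{X}^\mathsf{T}\mathbf{X}</math> is invertible.

Under these assumptions, the ordinary least-squares estimate of the parameter vector is:

<math display="block">
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\qquad
\hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Y}
</math>

Here '''Y''' is the vector of the ''Y''<sub>''i''</sub> and '''X''' is the ''n'' × (''p'' + 1) matrix shown. This is the single-column case of the matrix equation '''Y''' = '''XB''' + '''U'''.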
 
In the more general multivariate linear regression, there is one equation of the above form for each of ''m'' > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:
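Every equation shares the same explanatory variables. As a result, the least-squares estimate <math>\hat{\mathbf{B}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Y}</math> can be computed for all ''m'' dependent variables at once, and its columns coincide with those obtained by fitting each equation separately. What the joint treatment adds is an estimate of the ''m'' × ''m'' error covariance matrix. Its off-diagonal entries describe how the errors of different equations co-vary.

The sketch below checks the equivalence numerically. It is plain Python; the data and helper names are hypothetical, and ordinary least squares is assumed as the estimator.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B (A: k x k, B: k x m) by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("X'X is singular: columns of X are linearly dependent")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]

def ols(X, Y):
    """Least-squares coefficients (p x m) for design X (n x p) and responses Y (n x m)."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

# Hypothetical data: n = 5 observations, intercept plus one predictor, m = 3 dependent variables.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
Y = [[0.9, 2.1, 5.0],
     [2.1, 1.8, 4.1],
     [2.9, 1.2, 2.9],
     [4.2, 0.8, 2.2],
     [5.0, 0.1, 0.8]]
n, p, m = len(X), len(X[0]), len(Y[0])

# Joint estimate: all m equations in one solve.
B_joint = ols(X, Y)

# Separate estimates: one regression per dependent variable with the same X.
B_separate = [[0.0] * m for _ in range(p)]
for j in range(m):
    column_fit = ols(X, [[row[j]] for row in Y])  # p x 1
    for k in range(p):
        B_separate[k][j] = column_fit[k][0]

# Unbiased estimate of the m x m error covariance matrix from the joint residuals.
fitted = matmul(X, B_joint)
U_hat = [[y - f for y, f in zip(y_row, f_row)] for y_row, f_row in zip(Y, fitted)]
Sigma_hat = [[v / (n - p) for v in row] for row in matmul(transpose(U_hat), U_hat)]
```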
 
== Comparison to generalized linear model ==
The general linear model and the [[generalized linear model]] (GLM)<ref name=":0">{{Cite book |last1=McCullagh |first1=P. |author1-link=Peter McCullagh |last2=Nelder |first2=J. A. |author2-link=John Nelder |date=January 1, 1983 |chapter=An outline of generalized linear models |title=Generalized Linear Models |pages=21–47 |publisher=Springer US |isbn=9780412317606 |doi=10.1007/978-1-4899-3242-6_2 |doi-broken-date=12 July 2025}}</ref><ref>Fox, J. (2015). ''Applied regression analysis and generalized linear models''. Sage Publications.</ref> are two commonly used families of [[Statistics|statistical methods]] to relate one or more continuous and/or categorical [[Dependent and independent variables|predictors]] to a single [[Dependent and independent variables|outcome variable]].
 
The main difference between the two approaches is that the general linear model strictly assumes that the [[Errors and residuals|residuals]] will follow a [[Conditional probability distribution|conditionally]] [[normal distribution]],<ref name=":1">{{cite report |last1=Cohen |first1=J. |last2=Cohen |first2=P. |last3=West |first3=S. G. |last4=Aiken |first4=L. S. |author4-link=Leona S. Aiken |date=2003 |title=Applied multiple regression/correlation analysis for the behavioral sciences}}</ref> while the GLM loosens this assumption and allows for a variety of other [[Distribution (mathematics)|distributions]] from the [[exponential family]] for the residuals.<ref name=":0"/> The general linear model is a special case of the GLM in which the residuals follow a conditionally normal distribution and the link function is the identity.
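The difference also shows up in how the two models are fitted. The sketch below uses iteratively reweighted least squares (IRLS), a standard algorithm for fitting generalized linear models. The family table, the function names and the data are hypothetical.

The two cases behave differently:
* With a normal distribution and the identity link, the working response equals the data and every weight equals one. The first iteration therefore already returns the ordinary least-squares estimate of the general linear model.
* With a Poisson distribution and a log link, the weights depend on the fitted means, so several iterations are needed.

```python
import math

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [row[n] for row in M]

# Hypothetical family table: (link g, inverse link, derivative g'(mu), variance V(mu), starting mu).
FAMILIES = {
    "gaussian": (lambda mu: mu, lambda eta: eta, lambda mu: 1.0, lambda mu: 1.0, lambda y: y),
    "poisson": (math.log, math.exp, lambda mu: 1.0 / mu, lambda mu: mu, lambda y: y + 0.5),
}

def irls(X, y, family, tol=1e-10, max_iter=100):
    """Fit a GLM by iteratively reweighted least squares; return the coefficient list."""
    link, inv_link, dlink, variance, start = FAMILIES[family]
    n, p = len(X), len(X[0])
    mu = [start(v) for v in y]
    eta = [link(m) for m in mu]
    beta = None
    for _ in range(max_iter):
        # Working response z and working weights w for the current fit.
        z = [e + (v - m) * dlink(m) for e, v, m in zip(eta, y, mu)]
        w = [1.0 / (variance(m) * dlink(m) ** 2) for m in mu]
        # Weighted least-squares step: solve (X^T W X) beta = X^T W z.
        XtWX = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
        XtWz = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(p)]
        new_beta = solve(XtWX, XtWz)
        eta = [sum(xij * bj for xij, bj in zip(row, new_beta)) for row in X]
        mu = [inv_link(e) for e in eta]
        if beta is not None and max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical count data: one predictor plus an intercept column.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 2.0, 4.0, 7.0, 11.0]
X = [[1.0, xi] for xi in x]

beta_normal = irls(X, y, "gaussian")  # normal errors, identity link: the general linear model
beta_poisson = irls(X, y, "poisson")  # Poisson response, log link: outside that special case
```

Production implementations such as <code>glm()</code> in R add safeguards that this sketch omits, such as step-halving and convergence diagnostics.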
|[[R (programming language)|R]] package and function
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html lm()] in stats package (base R)
|[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html glm()] in stats package (base R)
|-
|[[MATLAB]] function