In [[statistics]], '''multivariate analysis of variance''' ('''MANOVA''') is a procedure for comparing [[multivariate random variable|multivariate]] sample means. As a multivariate procedure, it is used when there are two or more [[dependent variables]],<ref name="Warne2014">{{cite journal |last=Warne |first=R. T. |year=2014 |title=A primer on multivariate analysis of variance (MANOVA) for behavioral scientists |journal=Practical Assessment, Research & Evaluation |volume=19 |issue=17 |pages=1–10 |url=https://scholarworks.umass.edu/pare/vol19/iss1/17/ }}</ref> and is often followed by significance tests involving individual dependent variables separately.<ref>Stevens, J. P. (2002). ''Applied multivariate statistics for the social sciences.'' Mahwah, NJ: Lawrence Erlbaum.</ref>
For example, the dependent variables may be ''k'' life satisfaction scores measured at sequential time points.
== Model ==
Assume <math display="inline">n</math> <math display="inline">q</math>-dimensional observations, where the <math display="inline">i</math>-th observation <math display="inline">y_i</math> is assigned to the group <math display="inline">g(i)\in \{1,\dots,m\}</math> and is distributed around the group center <math display="inline">\mu^{(g(i))}\in \mathbb R^q</math> with [[Multivariate normal distribution|multivariate Gaussian]] noise:
<math display="block">
y_i = \mu^{(g(i))} + \varepsilon_i\quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal N_q (0, \Sigma) \quad \text{ for } i=1,\dots, n
</math> where <math display="inline">\Sigma</math> is the [[covariance matrix]]. Then we formulate our [[null hypothesis]] as
<math display="block">H_0\!:\;\mu^{(1)}=\mu^{(2)}=\dots =\mu^{(m)}.</math>
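This model can be sketched numerically. The following minimal simulation in Python draws data from it; the group count, group centers, sample sizes and identity covariance below are illustrative assumptions, not values from the article (here the centers are equal, so the null hypothesis holds):

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, n_per_group = 3, 2, 40                 # groups, dimensions, observations per group
mu = np.zeros((m, q))                        # group centers mu^(1), ..., mu^(m); equal => H0 true
Sigma = np.eye(q)                            # common covariance matrix
g = np.repeat(np.arange(m), n_per_group)     # group assignment g(i)
eps = rng.multivariate_normal(np.zeros(q), Sigma, size=g.size)
Y = mu[g] + eps                              # y_i = mu^(g(i)) + eps_i,  i = 1, ..., n
```

Each row of `Y` is one observation <math display="inline">y_i</math>; changing `mu` so the rows differ produces data for which the null hypothesis is false.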
==Relationship with ANOVA==
Where [[Partition of sums of squares|sums of squares]] appear in univariate analysis of variance, in multivariate analysis of variance certain [[positive-definite matrix|positive-definite matrices]] appear. The diagonal entries are the same kinds of sums of squares that appear in univariate ANOVA. The off-diagonal entries are corresponding sums of products. Under normality assumptions about [[errors and residuals in statistics|error]] distributions, the counterpart of the sum of squares due to error has a [[Wishart distribution]].
== Hypothesis testing ==
First, define the following <math display="inline">n\times q</math> matrices:
* <math display="inline">Y</math>: where the <math display="inline">i</math>-th row is equal to <math display="inline">y_i</math>
* <math display="inline">\hat Y</math>: where the <math display="inline">i</math>-th row is the best prediction given the group membership <math display="inline">g(i)</math>; that is, the mean over all observations in group <math display="inline">g(i)</math>: <math display="inline">\frac{1}{\text{size of group }g(i)}\sum_{k: g(k)=g(i)}y_k</math>.
* <math display="inline">\bar Y</math>: where the <math display="inline">i</math>-th row is the best prediction given no information; that is, the [[Sample mean and covariance|empirical mean]] over all <math display="inline">n</math> observations: <math display="inline">\frac{1}{n}\sum_{k=1}^n y_k</math>.
Then the matrix <math display="inline">S_{\text{model}} := (\hat Y - \bar Y)^T(\hat Y - \bar Y)</math> is a generalization of the sum of squares explained by the group, and <math display="inline">S_{\text{res}} := (Y - \hat Y)^T(Y - \hat Y)</math> is a generalization of the [[residual sum of squares]].<ref name="Anderson1994">{{cite book |last=Anderson |first=T. W. |title=An Introduction to Multivariate Statistical Analysis |year=1994 |publisher=Wiley}}</ref>
Note that one could alternatively speak of covariances: scaling the above-mentioned matrices by <math display="inline">1/(n-1)</math> does not affect the subsequent test statistics, since they are unchanged by multiplying <math display="inline">S_{\text{model}}</math> and <math display="inline">S_{\text{res}}</math> by the same non-zero constant.
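These definitions translate directly into code. The sketch below (with small illustrative data; the names `Y_hat` and `Y_bar` follow the notation above) also verifies that the model and residual scatter matrices sum to the total scatter about the grand mean:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(12, 2))           # n = 12 observations of dimension q = 2
g = np.repeat([0, 1, 2], 4)            # three groups of four observations

Y_bar = np.tile(Y.mean(axis=0), (len(Y), 1))              # grand mean in every row
Y_hat = np.vstack([Y[g == gi].mean(axis=0) for gi in g])  # mean of group g(i) in row i

S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)   # scatter explained by the grouping
S_res = (Y - Y_hat).T @ (Y - Y_hat)             # residual scatter

# Sanity check: model + residual scatter equals total scatter about the grand mean.
S_total = (Y - Y_bar).T @ (Y - Y_bar)
print(np.allclose(S_model + S_res, S_total))    # True
```

The final check holds because, within each group, the residuals about the group mean sum to zero, so the cross term vanishes.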
The most common<ref name="Warne2014" /> statistics are summaries based on the roots (or eigenvalues) <math display="inline">\lambda_p</math> of the matrix <math display="inline">A:=S_\text{model}S_\text{res}^{-1}</math>:
* [[Samuel Stanley Wilks]]' <math>\Lambda_\text{Wilks} = \prod_{1,\ldots,p}(1/(1 + \lambda_{p})) = \det(I + A)^{-1} = \det(S_\text{res})/\det(S_\text{res} + S_\text{model})</math> distributed as [[Wilks' lambda distribution|lambda]] (Λ)
* the [[K. C. Sreedharan Pillai]]–[[M. S. Bartlett]] [[trace of a matrix|trace]], <math>\Lambda_\text{Pillai} = \sum_{1,\ldots,p}(\lambda_p/(1 + \lambda_p)) = \operatorname{tr}(A(I + A)^{-1})</math><ref>{{cite web|url=http://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-basic-concepts/|title=MANOVA Basic Concepts – Real Statistics Using Excel|website=www.real-statistics.com|access-date=5 April 2018}}</ref>
* the Lawley–Hotelling trace, <math>\Lambda_\text{LH} = \sum_{1,\ldots,p}(\lambda_{p}) = \operatorname{tr}(A)</math>
* [[Roy's greatest root]] (also called ''Roy's largest root''), <math>\Lambda_\text{Roy} = \max_p(\lambda_p) </math>
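All four statistics can be computed directly from the eigenvalues <math display="inline">\lambda_p</math> of <math display="inline">A</math>. A minimal Python sketch (the function name and example matrices are illustrative):

```python
import numpy as np

def manova_statistics(S_model, S_res):
    # Eigenvalues of A = S_model @ inv(S_res); solve() avoids an explicit inverse,
    # and similar matrices share the same spectrum.
    lam = np.linalg.eigvals(np.linalg.solve(S_res, S_model)).real
    lam = np.clip(lam, 0.0, None)       # clip tiny negative round-off
    return {
        "Wilks": float(np.prod(1.0 / (1.0 + lam))),
        "Pillai": float(np.sum(lam / (1.0 + lam))),
        "Lawley-Hotelling": float(np.sum(lam)),
        "Roy": float(lam.max()),
    }

# Example with simple positive-definite scatter matrices, giving lambda = (2, 1):
stats = manova_statistics(S_model=np.diag([2.0, 1.0]), S_res=np.eye(2))
# Wilks = (1/3)(1/2) = 1/6, Pillai = 2/3 + 1/2 = 7/6, LH trace = 3, Roy = 2.
```

Note that all four are functions of the same eigenvalues; they differ only in how the roots are summarized.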
Discussion continues over the merits of each,<ref name="Warne2014" /> although the greatest root leads only to a bound on significance, which is not generally of practical interest. A further complication is that, except for Roy's greatest root, the distribution of these statistics under the [[null hypothesis]] is not straightforward and can only be approximated except in a few low-dimensional cases.
An algorithm for the distribution of Roy's largest root under the [[null hypothesis]] was derived by Chiani.<ref>{{Citation |last=Chiani |first=M. |year=2016 |title=Distribution of the largest root of a matrix for Roy's test in multivariate analysis of variance |journal=Journal of Multivariate Analysis}}</ref>
In the case of two groups, all the statistics are equivalent and the test reduces to [[Hotelling's T-square]].
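This equivalence can be seen numerically: with two groups, <math display="inline">\hat Y - \bar Y</math> has rank one, so <math display="inline">A</math> has a single non-zero eigenvalue and every statistic above is a monotone function of it. A sketch with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(20, 3))            # n = 20 observations, q = 3
g = np.repeat([0, 1], 10)               # two groups of ten

Y_bar = np.tile(Y.mean(axis=0), (len(Y), 1))
Y_hat = np.vstack([Y[g == gi].mean(axis=0) for gi in g])
S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)
S_res = (Y - Y_hat).T @ (Y - Y_hat)

lam = np.sort(np.linalg.eigvals(np.linalg.solve(S_res, S_model)).real)[::-1]
# With m = 2 groups, Y_hat - Y_bar has rank 1, so only lam[0] is non-zero.
print(np.allclose(lam[1:], 0.0, atol=1e-10))   # True
```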
== Introducing covariates (MANCOVA) ==
{{main|Multivariate analysis of covariance}}
One can also test if there is a group effect after adjusting for covariates. For this, follow the procedure above but substitute <math display="inline">\hat Y</math> with the predictions of the [[general linear model]], containing the group and the covariates, and substitute <math display="inline">\bar Y</math> with the predictions of the general linear model containing only the covariates (and an intercept). Then <math display="inline">S_{\text{model}}</math> are the additional sum of squares explained by adding the grouping information and <math display="inline">S_{\text{res}}</math> is the residual sum of squares of the model containing the grouping and the covariates.<ref name="Krzanowski1988" />
Note that in the case of unbalanced data, the order in which the covariates are added matters.
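The covariate adjustment described above can be sketched as follows; the data, the single covariate, and the dummy coding of the groups are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 30, 2
g = np.repeat([0, 1, 2], 10)                  # three groups
x = rng.normal(size=n)                        # one covariate
Y = np.outer(x, [0.5, -0.2]) + rng.normal(size=(n, q))

dummies = np.eye(3)[g][:, 1:]                 # group dummies, first group as baseline
X_full = np.column_stack([np.ones(n), x, dummies])   # intercept + covariate + grouping
X_null = np.column_stack([np.ones(n), x])            # intercept + covariate only

def fitted(X, Y):
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # least-squares fit of the GLM
    return X @ beta

Y_hat, Y_bar = fitted(X_full, Y), fitted(X_null, Y)
S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)   # extra scatter explained by grouping
S_res = (Y - Y_hat).T @ (Y - Y_hat)             # residual scatter of the full model
```

From here the test statistics are computed from <math display="inline">S_\text{model} S_\text{res}^{-1}</math> exactly as in the unadjusted case.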
==Correlation of dependent variables==
==External links==
{{wikiversity}}
{{Statistics}}
{{Experimental design}}