Multivariate analysis of variance

In [[statistics]], '''multivariate analysis of variance''' ('''MANOVA''') is a procedure for comparing [[multivariate random variable|multivariate]] sample means. As a multivariate procedure, it is used when there are two or more [[dependent variables]],<ref name="Warne2014">{{cite journal |last=Warne |first=R. T. |year=2014 |title=A primer on multivariate analysis of variance (MANOVA) for behavioral scientists |journal=Practical Assessment, Research & Evaluation |volume=19 |issue=17 |pages=1–10 |url=https://scholarworks.umass.edu/pare/vol19/iss1/17/ }}</ref> and is often followed by significance tests involving individual dependent variables separately.<ref>Stevens, J. P. (2002). ''Applied multivariate statistics for the social sciences.'' Mahwah, NJ: Lawrence Erlbaum.</ref>
 
For example, the dependent variables may be ''k'' life satisfaction scores measured at sequential [[time point]]s and ''p'' job satisfaction scores measured at the same time points. In this case there are ''k'' + ''p'' dependent variables whose [[linear combination]] is assumed to follow a multivariate [[normal distribution]]. The analysis further assumes homogeneity of the variance–covariance matrices across groups, linear relationships among the dependent variables, no multicollinearity, and no outliers.
 
== Model ==
 
Where [[Partition of sums of squares|sums of squares]] appear in univariate analysis of variance, in multivariate analysis of variance certain [[positive-definite matrix|positive-definite matrices]] appear. The diagonal entries are the same kinds of sums of squares that appear in univariate ANOVA. The off-diagonal entries are corresponding sums of products. Under normality assumptions about [[errors and residuals in statistics|error]] distributions, the counterpart of the sum of squares due to error has a [[Wishart distribution]].
 
 
== Hypothesis testing ==
First, define the following <math display="inline">n\times q</math> matrices:
* <math display="inline">\bar Y</math>: where the <math display="inline">i</math>-th row is the best prediction given no information, that is, the [[Sample mean and covariance|empirical mean]] over all <math display="inline">n</math> observations, <math display="inline">\frac{1}{n}\sum_{k=1}^n y_k</math>
 
Then the matrix <math display="inline">S_{\text{model}} := (\hat Y - \bar Y)^T(\hat Y - \bar Y)</math> is a generalization of the sum of squares explained by the group, and <math display="inline">S_{\text{res}} := (Y - \hat Y)^T(Y - \hat Y)</math> is a generalization of the [[residual sum of squares]].<ref name="Anderson1994">{{cite book |last=Anderson |first=T. W. |title=An Introduction to Multivariate Statistical Analysis |year=1994 |publisher=Wiley}}</ref> <ref name="Krzanowski1988">{{cite book |last=Krzanowski |first=W. J. |title=Principles of Multivariate Analysis. A User's Perspective |year=1988 |publisher=Oxford University Press}}</ref>
Note that, alternatively, one could also speak about covariances by scaling the above matrices by <math display="inline">1/(n-1)</math>, since the subsequent test statistics do not change when <math display="inline">S_{\text{model}}</math> and <math display="inline">S_{\text{res}}</math> are multiplied by the same non-zero constant.
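A minimal NumPy sketch of how <math display="inline">S_{\text{model}}</math> and <math display="inline">S_{\text{res}}</math> can be computed for a one-way design (the toy data and group labels below are illustrative, not from the article):

```python
import numpy as np

# Hypothetical toy data: n = 6 observations, q = 2 dependent variables,
# two groups of 3 observations each (made-up numbers for illustration).
Y = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [3.0, 4.0], [3.1, 3.9], [2.9, 4.1]])
groups = np.array([0, 0, 0, 1, 1, 1])

n = Y.shape[0]
Y_bar = np.tile(Y.mean(axis=0), (n, 1))          # each row: grand mean
Y_hat = np.vstack([Y[groups == g].mean(axis=0)   # each row: its group mean
                   for g in groups])

S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)    # explained sums of squares/products
S_res = (Y - Y_hat).T @ (Y - Y_hat)              # residual sums of squares/products

# Sanity check: the total SSP matrix decomposes into model + residual parts.
S_total = (Y - Y_bar).T @ (Y - Y_bar)
assert np.allclose(S_total, S_model + S_res)
```

The diagonal entries of these matrices are the univariate sums of squares; the off-diagonal entries are the corresponding sums of products.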
 
The most common<ref>{{cite web|last=Garson|first=G. David|title=Multivariate GLM, MANOVA, and MANCOVA|url=http://faculty.chass.ncsu.edu/garson/PA765/manova.htm|access-date=2011-03-22}}</ref><ref>{{cite web|last=UCLA: Academic Technology Services, Statistical Consulting Group.|title=Stata Annotated Output – MANOVA|url=https://stats.oarc.ucla.edu/stata/output/manova/|access-date=2024-02-10}}</ref> statistics are summaries based on the roots (or eigenvalues) <math display="inline">\lambda_p</math> of the matrix <math display="inline">A:= S_{\text{model}}S_{\text{res}}^{-1}</math>:
 
* [[Samuel Stanley Wilks]]' <math>\Lambda_\text{Wilks} = \prod_{i=1}^p \frac{1}{1 + \lambda_i} = \det(I + A)^{-1} = \det(S_\text{res})/\det(S_\text{res} + S_\text{model})</math> distributed as [[Wilks' lambda distribution|lambda]] (Λ)
* the [[K. C. Sreedharan Pillai]]–[[M. S. Bartlett]] [[trace of a matrix|trace]], <math>\Lambda_\text{Pillai} = \sum_{i=1}^p \frac{\lambda_i}{1 + \lambda_i} = \operatorname{tr}(A(I + A)^{-1})</math><ref>{{cite web|url=http://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-basic-concepts/|title=MANOVA Basic Concepts – Real Statistics Using Excel|website=www.real-statistics.com|access-date=5 April 2018}}</ref>
* the [[Derrick Norman Lawley|Lawley]]–[[Harold Hotelling|Hotelling]] trace, <math>\Lambda_\text{LH} = \sum_{i=1}^p \lambda_i = \operatorname{tr}(A)</math>
* [[Roy's greatest root]] (also called ''Roy's largest root''), <math>\Lambda_\text{Roy} = \max_i(\lambda_i)</math>
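For illustration, all four statistics can be computed directly from the eigenvalues; the eigenvalues below are made up for the example:

```python
import numpy as np

# Made-up eigenvalues of A = S_model @ inv(S_res), for illustration only.
lam = np.array([2.0, 0.5])

wilks = np.prod(1.0 / (1.0 + lam))         # = det(I + A)^{-1}
pillai = np.sum(lam / (1.0 + lam))         # = tr(A (I + A)^{-1})
lawley_hotelling = np.sum(lam)             # = tr(A)
roy = lam.max()                            # largest root

# Cross-check the eigenvalue forms against the matrix forms, taking A = diag(lam).
A = np.diag(lam)
assert np.isclose(wilks, 1.0 / np.linalg.det(np.eye(2) + A))
assert np.isclose(pillai, np.trace(A @ np.linalg.inv(np.eye(2) + A)))
```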
 
In the case of two groups, all the statistics are equivalent and the test reduces to [[Hotelling's T-square]].
 
== Introducing covariates (MANCOVA) ==
{{main|Multivariate analysis of covariance}}
 
One can also test if there is a group effect after adjusting for covariates. For this, follow the procedure above but substitute <math display="inline">\hat Y</math> with the predictions of the [[general linear model]] containing the group and the covariates, and substitute <math display="inline">\bar Y</math> with the predictions of the general linear model containing only the covariates (and an intercept). Then <math display="inline">S_{\text{model}}</math> is the additional sum of squares explained by adding the grouping information and <math display="inline">S_{\text{res}}</math> is the residual sum of squares of the model containing the grouping and the covariates.<ref name="Krzanowski1988" />
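A sketch of this adjustment with NumPy least squares (the simulated data, design matrices, and the helper <code>predict</code> are illustrative assumptions, not from the article):

```python
import numpy as np

# Simulated data: one covariate x, a binary group indicator g, q = 2 responses.
rng = np.random.default_rng(0)
n, q = 12, 2
x = rng.normal(size=(n, 1))                    # covariate
g = np.repeat([0.0, 1.0], n // 2)[:, None]     # group indicator
Y = np.hstack([x + g, 2 * x - g]) + 0.1 * rng.normal(size=(n, q))

ones = np.ones((n, 1))
X_full = np.hstack([ones, x, g])               # intercept + covariate + group
X_cov = np.hstack([ones, x])                   # intercept + covariate only

def predict(X, Y):
    """Least-squares predictions of the general linear model Y ~ X @ beta."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ beta

Y_hat = predict(X_full, Y)                     # plays the role of Y-hat above
Y_bar = predict(X_cov, Y)                      # plays the role of Y-bar above

S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)  # extra SSP explained by the group
S_res = (Y - Y_hat).T @ (Y - Y_hat)            # residual SSP of the full model
```

Because the covariate-only design is nested in the full design, the residual SSP of the covariate-only model decomposes exactly into <code>S_model + S_res</code>.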
 
Note that in the case of unbalanced data, the order of adding the covariates matters.
 
==Correlation of dependent variables==