In [[statistics]], '''multivariate analysis of variance''' ('''MANOVA''') is a procedure for comparing [[multivariate random variable|multivariate]] sample means. As a multivariate procedure, it is used when there are two or more [[dependent variables]],<ref name="Warne2014">{{cite journal |last=Warne |first=R. T. |year=2014 |title=A primer on multivariate analysis of variance (MANOVA) for behavioral scientists |journal=Practical Assessment, Research & Evaluation |volume=19 |issue=17 |pages=1–10 |url=https://scholarworks.umass.edu/pare/vol19/iss1/17/ }}</ref> and is often followed by significance tests involving individual dependent variables separately.<ref>Stevens, J. P. (2002). ''Applied multivariate statistics for the social sciences.'' Mahwah, NJ: Lawrence Erlbaum.</ref>
For example, the dependent variables may be ''k'' life satisfaction scores measured at sequential time points.
== Model ==
Assume <math display="inline">n</math> <math display="inline">q</math>-dimensional observations, where the <math display="inline">i</math>-th observation <math display="inline">y_i</math> is assigned to the group <math display="inline">g(i)\in \{1,\dots,m\}</math> and is distributed around the group center <math display="inline">\mu^{(g(i))}\in \mathbb R^q</math> with [[Multivariate normal distribution|multivariate Gaussian]] noise:
<math display="block">
y_i = \mu^{(g(i))} + \varepsilon_i\quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal N_q (0, \Sigma) \quad \text{ for } i=1,\dots, n
</math> where <math display="inline">\Sigma</math> is the [[covariance matrix]]. Then we formulate our [[null hypothesis]] as
<math display="block">H_0\!:\;\mu^{(1)}=\mu^{(2)}=\dots =\mu^{(m)}.</math>
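This model can be sketched numerically. The following minimal simulation in Python draws data from it; the group count, group centers, sample sizes and identity covariance below are illustrative assumptions, not values from the article (here the centers are equal, so the null hypothesis holds):

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, n_per_group = 3, 2, 40                 # groups, dimensions, observations per group
mu = np.zeros((m, q))                        # group centers mu^(1), ..., mu^(m); equal => H0 true
Sigma = np.eye(q)                            # common covariance matrix
g = np.repeat(np.arange(m), n_per_group)     # group assignment g(i)
eps = rng.multivariate_normal(np.zeros(q), Sigma, size=g.size)
Y = mu[g] + eps                              # y_i = mu^(g(i)) + eps_i,  i = 1, ..., n
```

Each row of `Y` is one observation <math display="inline">y_i</math>; changing `mu` so the rows differ produces data for which the null hypothesis is false.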
==Relationship with ANOVA==
Where [[Partition of sums of squares|sums of squares]] appear in univariate analysis of variance, in multivariate analysis of variance certain [[positive-definite matrix|positive-definite matrices]] appear. The diagonal entries are the same kinds of sums of squares that appear in univariate ANOVA. The off-diagonal entries are corresponding sums of products. Under normality assumptions about [[errors and residuals in statistics|error]] distributions, the counterpart of the sum of squares due to error has a [[Wishart distribution]].
== Hypothesis testing ==
First, define the following <math display="inline">n\times q</math> matrices:
* <math display="inline">Y</math>: where the <math display="inline">i</math>-th row is equal to <math display="inline">y_i</math>
* <math display="inline">\hat Y</math>: where the <math display="inline">i</math>-th row is the best prediction given the group membership <math display="inline">g(i)</math>; that is, the mean over all observations in group <math display="inline">g(i)</math>: <math display="inline">\frac{1}{\text{size of group }g(i)}\sum_{k: g(k)=g(i)}y_k</math>.
* <math display="inline">\bar Y</math>: where the <math display="inline">i</math>-th row is the best prediction given no information; that is, the [[Sample mean and covariance|empirical mean]] over all <math display="inline">n</math> observations: <math display="inline">\frac{1}{n}\sum_{k=1}^n y_k</math>.
Then the matrix <math display="inline">S_{\text{model}} := (\hat Y - \bar Y)^T(\hat Y - \bar Y)</math> is a generalization of the sum of squares explained by the group, and <math display="inline">S_{\text{res}} := (Y - \hat Y)^T(Y - \hat Y)</math> is a generalization of the [[residual sum of squares]].<ref name="Anderson1994">{{cite book |last=Anderson |first=T. W. |title=An Introduction to Multivariate Statistical Analysis |year=1994 |publisher=Wiley}}</ref>
Note that one could alternatively speak of covariances: scaling the above-mentioned matrices by <math display="inline">1/(n-1)</math> does not affect the subsequent test statistics, since they are unchanged by multiplying <math display="inline">S_{\text{model}}</math> and <math display="inline">S_{\text{res}}</math> by the same non-zero constant.
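These definitions translate directly into code. The sketch below (with small illustrative data; the names `Y_hat` and `Y_bar` follow the notation above) also verifies that the model and residual scatter matrices sum to the total scatter about the grand mean:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(12, 2))           # n = 12 observations of dimension q = 2
g = np.repeat([0, 1, 2], 4)            # three groups of four observations

Y_bar = np.tile(Y.mean(axis=0), (len(Y), 1))              # grand mean in every row
Y_hat = np.vstack([Y[g == gi].mean(axis=0) for gi in g])  # mean of group g(i) in row i

S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)   # scatter explained by the grouping
S_res = (Y - Y_hat).T @ (Y - Y_hat)             # residual scatter

# Sanity check: model + residual scatter equals total scatter about the grand mean.
S_total = (Y - Y_bar).T @ (Y - Y_bar)
print(np.allclose(S_model + S_res, S_total))    # True
```

The final check holds because, within each group, the residuals about the group mean sum to zero, so the cross term vanishes.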
The most common<ref name="Warne2014" /> statistics are summaries based on the roots (or eigenvalues) <math display="inline">\lambda_p</math> of the matrix <math display="inline">A:=S_\text{model}S_\text{res}^{-1}</math>:
* [[Samuel Stanley Wilks]]' <math>\Lambda_\text{Wilks} = \prod_{1,\ldots,p}(1/(1 + \lambda_{p})) = \det(I + A)^{-1} = \det(S_\text{res})/\det(S_\text{res} + S_\text{model})</math> distributed as [[Wilks' lambda distribution|lambda]] (Λ)
* the [[K. C. Sreedharan Pillai]]–[[M. S. Bartlett]] [[trace of a matrix|trace]], <math>\Lambda_\text{Pillai} = \sum_{1,\ldots,p}(\lambda_p/(1 + \lambda_p)) = \operatorname{tr}(A(I + A)^{-1})</math><ref>{{cite web|url=http://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-basic-concepts/|title=MANOVA Basic Concepts – Real Statistics Using Excel|website=www.real-statistics.com|access-date=5 April 2018}}</ref>
* the Lawley–Hotelling trace, <math>\Lambda_\text{LH} = \sum_{1,\ldots,p}(\lambda_{p}) = \operatorname{tr}(A)</math>
* [[Roy's greatest root]] (also called ''Roy's largest root''), <math>\Lambda_\text{Roy} = \max_p(\lambda_p) </math>
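All four statistics can be computed directly from the eigenvalues <math display="inline">\lambda_p</math> of <math display="inline">A</math>. A minimal Python sketch (the function name and example matrices are illustrative):

```python
import numpy as np

def manova_statistics(S_model, S_res):
    # Eigenvalues of A = S_model @ inv(S_res); solve() avoids an explicit inverse,
    # and similar matrices share the same spectrum.
    lam = np.linalg.eigvals(np.linalg.solve(S_res, S_model)).real
    lam = np.clip(lam, 0.0, None)       # clip tiny negative round-off
    return {
        "Wilks": float(np.prod(1.0 / (1.0 + lam))),
        "Pillai": float(np.sum(lam / (1.0 + lam))),
        "Lawley-Hotelling": float(np.sum(lam)),
        "Roy": float(lam.max()),
    }

# Example with simple positive-definite scatter matrices, giving lambda = (2, 1):
stats = manova_statistics(S_model=np.diag([2.0, 1.0]), S_res=np.eye(2))
# Wilks = (1/3)(1/2) = 1/6, Pillai = 2/3 + 1/2 = 7/6, LH trace = 3, Roy = 2.
```

Note that all four are functions of the same eigenvalues; they differ only in how the roots are summarized.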
Discussion continues over the merits of each,<ref name="Warne2014" /> although the greatest root leads only to a bound on significance, which is not generally of practical interest. A further complication is that, except for Roy's greatest root, the distribution of these statistics under the [[null hypothesis]] is not straightforward and can only be approximated except in a few low-dimensional cases.
An algorithm for the distribution of Roy's largest root under the [[null hypothesis]] was derived by Chiani.<ref>{{Citation |last=Chiani |first=M. |year=2016 |title=Distribution of the largest root of a matrix for Roy's test in multivariate analysis of variance |journal=Journal of Multivariate Analysis}}</ref>
In the case of two groups, all the statistics are equivalent and the test reduces to [[Hotelling's T-square]].
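This equivalence can be seen numerically: with two groups, <math display="inline">\hat Y - \bar Y</math> has rank one, so <math display="inline">A</math> has a single non-zero eigenvalue and every statistic above is a monotone function of it. A sketch with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(20, 3))            # n = 20 observations, q = 3
g = np.repeat([0, 1], 10)               # two groups of ten

Y_bar = np.tile(Y.mean(axis=0), (len(Y), 1))
Y_hat = np.vstack([Y[g == gi].mean(axis=0) for gi in g])
S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)
S_res = (Y - Y_hat).T @ (Y - Y_hat)

lam = np.sort(np.linalg.eigvals(np.linalg.solve(S_res, S_model)).real)[::-1]
# With m = 2 groups, Y_hat - Y_bar has rank 1, so only lam[0] is non-zero.
print(np.allclose(lam[1:], 0.0, atol=1e-10))   # True
```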
== Introducing covariates (MANCOVA) ==
{{main|Multivariate analysis of covariance}}
One can also test if there is a group effect after adjusting for covariates. For this, follow the procedure above but substitute <math display="inline">\hat Y</math> with the predictions of the [[general linear model]], containing the group and the covariates, and substitute <math display="inline">\bar Y</math> with the predictions of the general linear model containing only the covariates (and an intercept). Then <math display="inline">S_{\text{model}}</math> are the additional sum of squares explained by adding the grouping information and <math display="inline">S_{\text{res}}</math> is the residual sum of squares of the model containing the grouping and the covariates.<ref name="Krzanowski1988" />
Note that in the case of unbalanced data, the order in which the covariates are added matters.
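The covariate adjustment described above can be sketched as follows; the data, the single covariate, and the dummy coding of the groups are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 30, 2
g = np.repeat([0, 1, 2], 10)                  # three groups
x = rng.normal(size=n)                        # one covariate
Y = np.outer(x, [0.5, -0.2]) + rng.normal(size=(n, q))

dummies = np.eye(3)[g][:, 1:]                 # group dummies, first group as baseline
X_full = np.column_stack([np.ones(n), x, dummies])   # intercept + covariate + grouping
X_null = np.column_stack([np.ones(n), x])            # intercept + covariate only

def fitted(X, Y):
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # least-squares fit of the GLM
    return X @ beta

Y_hat, Y_bar = fitted(X_full, Y), fitted(X_null, Y)
S_model = (Y_hat - Y_bar).T @ (Y_hat - Y_bar)   # extra scatter explained by grouping
S_res = (Y - Y_hat).T @ (Y - Y_hat)             # residual scatter of the full model
```

From here the test statistics are computed from <math display="inline">S_\text{model} S_\text{res}^{-1}</math> exactly as in the unadjusted case.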
==Correlation of dependent variables==
==External links==
{{wikiversity}}
{{Statistics}}
{{Experimental design}}