{{short description|Statistical test examining influence of two categorical variables on one continuous variable}}
In [[statistics]], the '''two-way analysis of variance''' ('''ANOVA''') is an extension of the [[One-way analysis of variance|one-way ANOVA]] that examines the influence of two different [[Categorical variable|categorical]] [[independent variables]] on one [[Continuous function|continuous]] [[dependent variable]]. The two-way ANOVA aims to assess not only the [[main effect]] of each independent variable but also whether there is any [[Interaction (statistics)|interaction]] between them.
==History==
In 1925, [[Ronald Fisher]] mentioned the two-way ANOVA in his celebrated book, ''[[Statistical Methods for Research Workers]]'' (chapters 7 and 8). In 1934, [[Frank Yates]] published procedures for the unbalanced case.<ref>{{cite journal |last=Yates |first=Frank |date=March 1934 |title=The analysis of multiple classifications with unequal numbers in the different classes |jstor=2278459 |journal=Journal of the American Statistical Association |volume=29 |issue=185 |pages=51–66 |doi=10.1080/01621459.1934.10502686}}</ref> Since then, an extensive literature has been produced. The topic was reviewed in 1993 by [[Yasunori Fujikoshi]].<ref>{{cite journal |last=Fujikoshi |first=Yasunori |date=1993 |title=Two-way ANOVA models with unbalanced data |journal=Discrete Mathematics |volume=116 |issue=1 |pages=315–334 |doi=10.1016/0012-365X(93)90410-U |doi-access=free }}</ref> In 2005, [[Andrew Gelman]] proposed a different approach to ANOVA, viewed as a [[multilevel model]].<ref>{{cite journal |last=Gelman |first=Andrew |date=February 2005 |title=Analysis of variance? why it is more important than ever |journal=The Annals of Statistics |volume=33 |issue=1 |pages=1–53 | arxiv=math/
==Data set==
<math>\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}</math>,
where <math>\mu</math> is the grand mean, <math>\alpha_i</math> is the additive main effect of level <math>i </math> from the first factor (''i''-th row in the
An equivalent way of describing the two-way ANOVA is to note that, besides the variation explained by the factors, there remains some [[statistical noise]]. This unexplained variation is handled by introducing one random variable per data point, <math>\epsilon_{ijk}</math>, called the [[Errors and residuals in statistics|error]]. These <math>n</math> random variables are seen as deviations from the means, and are assumed to be independent and normally distributed:
==Assumptions==
Following [[Andrew Gelman|Gelman]] and [[Jennifer Hill|Hill]], the assumptions of the ANOVA, and more generally the [[general linear model]], are, in decreasing order of importance:<ref>{{cite book |
# the data points are relevant with respect to the scientific question under investigation;
# the mean of the response variable is influenced additively (if there is no interaction term) and linearly by the factors;
Testing whether the interaction term is significant can be difficult because of the potentially large number of [[degrees of freedom (statistics)|degrees of freedom]].<ref>{{cite journal |author=Yi-An Ko|date=September 2013 |title=Novel Likelihood Ratio Tests for Screening Gene-Gene and Gene-Environment Interactions with Unbalanced Repeated-Measures Data |journal=Genetic Epidemiology |volume=37 |issue=6 |pages=581–591 |doi=10.1002/gepi.21744 |pmid=23798480 |display-authors=etal|pmc=4009698}}</ref>
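As a rough illustration of how quickly that number grows, the sketch below uses the standard count for a fully crossed design in which every cell is observed: a factor with <math>I</math> levels crossed with a factor with <math>J</math> levels gives <math>(I-1)(J-1)</math> interaction degrees of freedom. The function name and the level counts are hypothetical and chosen only for illustration.

```python
def interaction_degrees_of_freedom(levels_a, levels_b):
    """Return (I - 1) * (J - 1), assuming a fully crossed design with no empty cells."""
    return (levels_a - 1) * (levels_b - 1)

# Hypothetical factor sizes: a small design and two larger ones.
for levels_a, levels_b in [(3, 2), (5, 5), (10, 20)]:
    df = interaction_degrees_of_freedom(levels_a, levels_b)
    print(f"{levels_a} x {levels_b} levels: {df} interaction degrees of freedom")
```

A design with 3 and 2 levels has only 2 interaction degrees of freedom, whereas one with 10 and 20 levels has 171.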
==Example==
The following hypothetical example gives the yields of 15 plants, each grown under one of two environmental variations and one of three fertiliser treatments.
{| class="wikitable"
|-
!
! Extra CO<sub>2</sub>
! Extra humidity
|-
| No fertiliser
| 7, 2, 1
| 7, 6
|-
| Nitrate
| 11, 6
| 10, 7, 3
|-
| Phosphate
| 5, 3, 4
| 11, 4
|}
Five sums of squares are calculated:
{| class="wikitable"
|-
! Factor
! Calculation
! Sum
! ''N''
|-
| Individual
| <math>7^2+2^2+1^2 + 7^2+6^2 + 11^2+6^2 + 10^2+7^2+3^2 + 5^2+3^2+4^2 + 11^2+4^2</math>
| 641
| 15
|-
| Fertiliser × Environment
| <math>\frac{(7+2+1)^2}{3} + \frac{(7+6)^2}{2} + \frac{(11+6)^2}{2} + \frac{(10+7+3)^2}{3} + \frac{(5+3+4)^2}{3} + \frac{(11+4)^2}{2}</math>
| 556.1667
| 6
|-
| Fertiliser
| <math>\frac{(7+2+1+7+6)^2}{5} + \frac{(11+6+10+7+3)^2}{5} + \frac{(5+3+4+11+4)^2}{5}</math>
| 525.4
| 3
|-
| Environment
| <math>\frac{(7+2+1+11+6+5+3+4)^2}{8} + \frac{(7+6+10+7+3+11+4)^2}{7} </math>
| 519.2679
| 2
|-
| Composite
| <math>\frac{(7+2+1+11+6+5+3+4+7+6+10+7+3+11+4)^2}{15} </math>
| 504.6
| 1
|}
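The five sums in the table can be reproduced with a short script. This is a minimal sketch in plain Python; the dictionary layout and the helper names <code>group_sum_of_squares</code> and <code>pool</code> are illustrative assumptions, not part of any library. Each sum is the total over groups of the squared group total divided by the group size, where the grouping ranges from single plants to the whole data set.

```python
# Yields from the example table, keyed by (fertiliser, environment).
yields = {
    ("No fertiliser", "Extra CO2"): [7, 2, 1],
    ("No fertiliser", "Extra humidity"): [7, 6],
    ("Nitrate", "Extra CO2"): [11, 6],
    ("Nitrate", "Extra humidity"): [10, 7, 3],
    ("Phosphate", "Extra CO2"): [5, 3, 4],
    ("Phosphate", "Extra humidity"): [11, 4],
}


def group_sum_of_squares(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)


def pool(key_index):
    """Pool cells sharing a level of one factor (0 = fertiliser, 1 = environment)."""
    pooled = {}
    for key, values in yields.items():
        pooled.setdefault(key[key_index], []).extend(values)
    return list(pooled.values())


all_values = [y for cell in yields.values() for y in cell]

sums = {
    "Individual": group_sum_of_squares([[y] for y in all_values]),
    "Fertiliser × Environment": group_sum_of_squares(yields.values()),
    "Fertiliser": group_sum_of_squares(pool(0)),
    "Environment": group_sum_of_squares(pool(1)),
    "Composite": group_sum_of_squares([all_values]),
}

for name, value in sums.items():
    print(f"{name}: {value:.4f}")
```

The number of groups in each case (15, 6, 3, 2 and 1) corresponds to the ''N'' column of the table.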
Finally, the sums of squared deviations required for the [[analysis of variance]] can be calculated.<ref>{{cite book|last=Mecklin|first=Christopher|title=STA 265 Notes (Methods of Statistics and Data Science)|date=20 October 2020|access-date=6 December 2024|chapter-url=https://bookdown.org/cmecklin/sta265notes/anova-with-interaction.html|chapter=Chapter 7: ANOVA with Interaction|via=bookdown.org}}</ref>
{| class="wikitable"
|-
! Factor
! Sum
! ''N''
! Total
! Environment
! Fertiliser
! Fertiliser × Environment
! Residual
|-
| Individual
| 641
| 15
| 1
|
|
|
| 1
|-
| Fertiliser × Environment
| 556.1667
| 6
|
|
|
| 1
| −1
|-
| Fertiliser
| 525.4
| 3
|
|
| 1
| −1
|
|-
| Environment
| 519.2679
| 2
|
| 1
|
| −1
|
|-
| Composite (correction factor<ref>{{cite book|chapter-url=https://iastate.pressbooks.pub/quantitativeplantbreeding/chapter/the-analysis-of-variance-anova/|title=Quantitative Methods for Plant Breeding|chapter=Chapter 8: The Analysis of Variance (ANOVA)|last1=Moore|first1=Ken|last2=Mowers|first2=Ron|last3=Harbur|first3=M.L.|last4=Merrick|first4=Laura|last5=Mahama|first5=Anthony Assibi|publisher=Iowa State University Digital Press|editor-last1=Suza|editor-first1=W.P.|editor-last2=Lamkey|editor-first2=K.R.|year=2023|access-date=6 December 2024}}</ref>)
| 504.6
| 1
| −1
| −1
| −1
| 1
|
|-
|
|
|
|
|
|
|
|
|-
| Sum of squared deviations
|
|
| 136.4
| 14.668
| 20.8
| 16.099
| 84.833
|-
| Degrees of freedom
|
|
| 14
| 1
| 2
| 2
| 9
|-
| Mean square
|
|
|
| 14.668
| 10.4
| 8.0495
| 9.426
|}
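The bottom rows of the table follow from the signed coefficients in its columns. The sketch below, in plain Python with illustrative names (<code>coefficients</code>, <code>combine</code>), applies each column of +1 and −1 entries to the ''Sum'' column to obtain the sum of squared deviations. In this example, applying the same coefficients to the ''N'' column reproduces the degrees of freedom shown in the table, and the mean square is the ratio of the two.

```python
# "Sum" and "N" columns from the table; "cell" stands for Fertiliser × Environment.
sums = {"individual": 641.0, "cell": 556.1667, "fertiliser": 525.4,
        "environment": 519.2679, "composite": 504.6}
counts = {"individual": 15, "cell": 6, "fertiliser": 3,
          "environment": 2, "composite": 1}

# Signed coefficients copied from the columns of the table (blank entries omitted).
coefficients = {
    "Total": {"individual": 1, "composite": -1},
    "Environment": {"environment": 1, "composite": -1},
    "Fertiliser": {"fertiliser": 1, "composite": -1},
    "Fertiliser × Environment": {"cell": 1, "fertiliser": -1,
                                 "environment": -1, "composite": 1},
    "Residual": {"individual": 1, "cell": -1},
}


def combine(values, weights):
    """Signed combination of table rows for one column of coefficients."""
    return sum(w * values[name] for name, w in weights.items())


results = {}
for source, weights in coefficients.items():
    ss = combine(sums, weights)      # sum of squared deviations
    df = combine(counts, weights)    # degrees of freedom
    results[source] = (ss, df, ss / df)

for source, (ss, df, ms) in results.items():
    print(f"{source}: SS = {ss:.3f}, df = {df}, mean square = {ms:.4f}")
```

The script also prints a mean square for the Total column, which the table leaves blank.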
==See also==
* [[Analysis of variance]]
* [[F
* [[Mixed model]]
* [[Multivariate analysis of variance|Multivariate analysis of variance (MANOVA)]]
==Notes==
{{Reflist}}
==References==
* {{cite book |author=George Casella |date=18 April 2008 |title=Statistical design |url=https://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-75964-7 |publisher=[[Springer Science+Business Media|Springer]] |isbn=978-0-387-75965-4 |series=Springer Texts in Statistics |author-link=George Casella }}