{{short description|Statistical test examining influence of two categorical variables on one continuous variable}}
In [[statistics]], the '''two-way
==History==
In 1925, [[Ronald Fisher]] mentions the two-way ANOVA in his celebrated book
==Data set==
==Model==
Upon observing variation among all <math>n</math> data points, for instance via a [[histogram]], "[[Probability theory|probability]] may be used to describe such variation".<ref>{{cite journal |last=Kass |first=Robert E |date=1 February 2011 |title=Statistical inference: The big picture
<math>Y_{ijk} \, | \, \mu_{ij}, \sigma^2 \; \overset{\mathrm{i.i.d.}}{\sim} \; \mathcal{N}(\mu_{ij}, \sigma^2)</math>.
Specifically, the mean of the response variable is modeled as a [[linear combination]] of the explanatory variables:
<math>\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}</math>,
where <math>\mu</math> is the grand mean, <math>\alpha_i</math> is the additive main effect of level <math>i </math> from the first factor (''i''-th row in the
<math>Y_{ijk} = \mu_{ij} + \epsilon_{ijk} \text{ with } \epsilon_{ijk} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)</math>.
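The data-generating model above can be illustrated with a short simulation. This is only a sketch: the function name <code>simulate_two_way</code> and all parameter values are hypothetical, chosen for illustration, and only the Python standard library is used.

```python
import random


def simulate_two_way(mu, alpha, beta, gamma, sigma, n_per_cell, seed=0):
    """Draw Y_ijk = mu + alpha_i + beta_j + gamma_ij + eps_ijk,
    with eps_ijk i.i.d. N(0, sigma^2), for every cell (i, j)."""
    rng = random.Random(seed)
    data = {}
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            cell_mean = mu + a + b + gamma[i][j]  # mu_ij
            data[(i, j)] = [cell_mean + rng.gauss(0.0, sigma)
                            for _ in range(n_per_cell)]
    return data


# Hypothetical parameter values (3 levels of the first factor, 2 of the second).
mu = 6.0
alpha = [-1.0, 1.0, 0.0]
beta = [-1.0, 1.0]
gamma = [[-1.0, 1.0], [2.0, -2.0], [-1.0, 1.0]]

sample = simulate_two_way(mu, alpha, beta, gamma, sigma=1.0, n_per_cell=1000)

# With many replicates per cell, each sample mean should lie close to mu_ij.
cell_means = {cell: sum(ys) / len(ys) for cell, ys in sample.items()}
```

Setting <code>sigma=0.0</code> makes every observation equal to its cell mean, which is a quick way to check the parameterisation.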
==Assumptions==
Following [[Andrew Gelman|Gelman]] and [[Jennifer Hill|Hill]], the assumptions of the ANOVA, and more generally the [[general linear model]], are, in decreasing order of importance:<ref>{{cite book |
# the data points are relevant with respect to the scientific question under investigation;
# the mean of the response variable is influenced additively (if there is no interaction term) and linearly by the factors;
To ensure [[identifiability]] of parameters, we can add the following "sum-to-zero" constraints:
<math>\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0</math>.
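As an illustration of the sum-to-zero constraints, the following sketch decomposes a table of cell means <math>\mu_{ij}</math> into a grand mean, main effects and interaction effects, using unweighted averages over rows and columns. The function name <code>decompose</code> and the cell means are hypothetical; exact fractions are used so that the constraints can be checked without rounding error.

```python
from fractions import Fraction


def decompose(cell_means):
    """Split cell means mu_ij into mu, alpha_i, beta_j and gamma_ij using
    unweighted row and column averages, so the effects sum to zero."""
    rows, cols = len(cell_means), len(cell_means[0])
    m = [[Fraction(v) for v in row] for row in cell_means]
    mu = sum(sum(row) for row in m) / (rows * cols)
    alpha = [sum(m[i]) / cols - mu for i in range(rows)]
    beta = [sum(m[i][j] for i in range(rows)) / rows - mu
            for j in range(cols)]
    gamma = [[m[i][j] - mu - alpha[i] - beta[j] for j in range(cols)]
             for i in range(rows)]
    return mu, alpha, beta, gamma


# Hypothetical 3 x 2 table of cell means.
mu, alpha, beta, gamma = decompose([[3, 7], [8, 6], [4, 8]])

# The constraints hold by construction.
assert sum(alpha) == 0 and sum(beta) == 0
assert all(sum(row) == 0 for row in gamma)        # sum over j of gamma_ij
assert all(sum(col) == 0 for col in zip(*gamma))  # sum over i of gamma_ij
```

For this table the decomposition gives <code>mu = 6</code>, <code>alpha = [-1, 1, 0]</code> and <code>beta = [-1, 1]</code>; without the constraints, many other parameter values would reproduce the same cell means.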
==Hypothesis testing==
-->
Testing if the interaction term is significant can be difficult because of the potentially-large number of [[degrees of freedom (statistics)|degrees of freedom]].<ref>{{cite journal |author=Yi-An Ko|date=September 2013 |title=Novel Likelihood Ratio Tests for Screening Gene-Gene and Gene-Environment Interactions with Unbalanced Repeated-Measures Data |journal=Genetic
==Example==
The following hypothetical example gives the yields of 15 plants, each subject to one of two environmental variations and one of three fertilisers.
{| class="wikitable"
|-
!
! Extra CO<sub>2</sub>
! Extra humidity
|-
| No fertiliser
| 7, 2, 1
| 7, 6
|-
| Nitrate
| 11, 6
| 10, 7, 3
|-
| Phosphate
| 5, 3, 4
| 11, 4
|}
Five sums of squares are calculated:
{| class="wikitable"
|-
! Factor
! Calculation
! Sum
! ''N''
|-
| Individual
| <math>7^2+2^2+1^2 + 7^2+6^2 + 11^2+6^2 + 10^2+7^2+3^2 + 5^2+3^2+4^2 + 11^2+4^2</math>
| 641
| 15
|-
| Fertiliser × Environment
| <math>\frac{(7+2+1)^2}{3} + \frac{(7+6)^2}{2} + \frac{(11+6)^2}{2} + \frac{(10+7+3)^2}{3} + \frac{(5+3+4)^2}{3} + \frac{(11+4)^2}{2}</math>
| 556.1667
| 6
|-
| Fertiliser
| <math>\frac{(7+2+1+7+6)^2}{5} + \frac{(11+6+10+7+3)^2}{5} + \frac{(5+3+4+11+4)^2}{5}</math>
| 525.4
| 3
|-
| Environment
| <math>\frac{(7+2+1+11+6+5+3+4)^2}{8} + \frac{(7+6+10+7+3+11+4)^2}{7} </math>
| 519.2679
| 2
|-
| Composite
| <math>\frac{(7+2+1+11+6+5+3+4+7+6+10+7+3+11+4)^2}{15} </math>
| 504.6
| 1
|}
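The five sums in the table above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper names <code>uncorrected_ss</code> and <code>group_by</code> are illustrative and not part of any package.

```python
# Yields from the example, keyed by (fertiliser, environment).
yields = {
    ("No fertiliser", "Extra CO2"): [7, 2, 1],
    ("No fertiliser", "Extra humidity"): [7, 6],
    ("Nitrate", "Extra CO2"): [11, 6],
    ("Nitrate", "Extra humidity"): [10, 7, 3],
    ("Phosphate", "Extra CO2"): [5, 3, 4],
    ("Phosphate", "Extra humidity"): [11, 4],
}


def uncorrected_ss(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)


def group_by(data, key):
    """Pool the observations of all cells that share the same key."""
    groups = {}
    for cell, values in data.items():
        groups.setdefault(key(cell), []).extend(values)
    return list(groups.values())


ss_individual = uncorrected_ss([[y] for ys in yields.values() for y in ys])
ss_cells = uncorrected_ss(list(yields.values()))
ss_fertiliser = uncorrected_ss(group_by(yields, lambda cell: cell[0]))
ss_environment = uncorrected_ss(group_by(yields, lambda cell: cell[1]))
ss_composite = uncorrected_ss(group_by(yields, lambda cell: None))

for name, value in [("Individual", ss_individual),
                    ("Fertiliser x Environment", ss_cells),
                    ("Fertiliser", ss_fertiliser),
                    ("Environment", ss_environment),
                    ("Composite", ss_composite)]:
    print(f"{name}: {value:.4f}")
```

Each row of the table uses the same formula; only the grouping changes, from one group per observation (Individual) to a single group holding all 15 observations (Composite).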
Finally, the sums of squared deviations required for the [[analysis of variance]] can be calculated.<ref>{{cite book|last=Mecklin|first=Christopher|title=STA 265 Notes (Methods of Statistics and Data Science)|date=20 October 2020|access-date=6 December 2024|chapter-url=https://bookdown.org/cmecklin/sta265notes/anova-with-interaction.html|chapter=Chapter 7: ANOVA with Interaction|via=bookdown.org}}</ref>
{| class="wikitable"
|-
! Factor
! Sum
! ''N''
! Total
! Environment
! Fertiliser
! Fertiliser × Environment
! Residual
|-
| Individual
| 641
| 15
| 1
|
|
|
| 1
|-
| Fertiliser × Environment
| 556.1667
| 6
|
|
|
| 1
| −1
|-
| Fertiliser
| 525.4
| 3
|
|
| 1
| −1
|
|-
| Environment
| 519.2679
| 2
|
| 1
|
| −1
|
|-
| Composite (correction factor<ref>{{cite book|chapter-url=https://iastate.pressbooks.pub/quantitativeplantbreeding/chapter/the-analysis-of-variance-anova/|title=Quantitative Methods for Plant Breeding|chapter=Chapter 8: The Analysis of Variance (ANOVA)|last1=Moore|first1=Ken|last2=Mowers|first2=Ron|last3=Harbur|first3=M.L.|last4=Merrick|first4=Laura|last5=Mahama|first5=Anthony Assibi|publisher=Iowa State University Digital Press|editor-last1=Suza|editor-first1=W.P.|editor-last2=Lamkey|editor-first2=K.R.|year=2023|access-date=6 December 2024}}</ref>)
| 504.6
| 1
| −1
| −1
| −1
| 1
|
|-
|
|
|
|
|
|
|
|
|-
| Sum of squared deviations
|
|
| 136.4
| 14.668
| 20.8
| 16.099
| 84.833
|-
| Degrees of freedom
|
|
| 14
| 1
| 2
| 2
| 9
|-
| Mean square
|
|
|
| 14.668
| 10.4
| 8.0495
| 9.426
|}
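The last three rows of the table can be checked with the sketch below, which combines the five sums using the +1 and −1 coefficients shown in each column and then divides by the degrees of freedom. It is self-contained and uses only the standard library; the variable and helper names are illustrative.

```python
# Yields from the example, keyed by (fertiliser, environment).
cells = {
    ("none", "co2"): [7, 2, 1], ("none", "humidity"): [7, 6],
    ("nitrate", "co2"): [11, 6], ("nitrate", "humidity"): [10, 7, 3],
    ("phosphate", "co2"): [5, 3, 4], ("phosphate", "humidity"): [11, 4],
}


def ss(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)


def pooled(index):
    """Pool observations by fertiliser (index 0) or environment (index 1)."""
    groups = {}
    for cell, ys in cells.items():
        groups.setdefault(cell[index], []).extend(ys)
    return list(groups.values())


all_values = [y for ys in cells.values() for y in ys]
individual = ss([[y] for y in all_values])
cellwise = ss(list(cells.values()))
fertiliser = ss(pooled(0))
environment = ss(pooled(1))
composite = ss([all_values])

n, a, b = len(all_values), 3, 2  # observations, fertilisers, environments

# (sum of squared deviations, degrees of freedom) for each column.
table = {
    "Total": (individual - composite, n - 1),
    "Environment": (environment - composite, b - 1),
    "Fertiliser": (fertiliser - composite, a - 1),
    "Fertiliser x Environment": (
        cellwise - fertiliser - environment + composite, (a - 1) * (b - 1)),
    "Residual": (individual - cellwise, n - a * b),
}

mean_squares = {name: s / df
                for name, (s, df) in table.items() if name != "Total"}

for name, (s, df) in table.items():
    print(f"{name}: SS = {s:.3f}, df = {df}")
```

The four component sums of squared deviations add up to the total (136.4), and their degrees of freedom add up to 14.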
==See also==
* [[Analysis of variance]]
* [[F-test]] (''Includes a one-way ANOVA example'')
* [[Mixed model]]
* [[Multivariate analysis of variance|Multivariate analysis of variance (MANOVA)]]
* [[One-way ANOVA]]
* [[Repeated measures#Repeated measures ANOVA|Repeated measures ANOVA]]
* [[Tukey's test of additivity]]
==Notes==
{{Reflist}}
== References ==
* {{cite book |author=George Casella |date=18 April 2008 |title=Statistical design |url=https://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-75964-7 |publisher=[[Springer Science+Business Media|Springer]] |isbn=978-0-387-75965-4 |series=Springer Texts in Statistics |author-link=George Casella }}
[[Category:Analysis of variance]]