Two-way analysis of variance: Difference between revisions

OAbot (talk | contribs)
m Open access bot: add arxiv identifier to citation with #oabot.
Citation bot (talk | contribs)
m Alter: isbn, journal. Add: series, pmc, pmid. Removed accessdate with no specified URL. Removed parameters. | User-activated.
Line 3:
 
==History==
In 1925, [[Ronald Fisher]] mentioned the two-way ANOVA in his celebrated book ''[[Statistical Methods for Research Workers]]'' (chapters 7 and 8). In 1934, [[Frank Yates]] published procedures for the unbalanced case.<ref>{{cite journal |last=Yates |first=Frank |date=March 1934 |title=The analysis of multiple classifications with unequal numbers in the different classes |jstor=2278459 |journal=Journal of the American Statistical Association |publisher=American Statistical Association |volume=29 |issue=185 |pages=51–66 |doi=10.1080/01621459.1934.10502686}}</ref> Since then, an extensive literature has been produced. The topic was reviewed in 1993 by [[Yasunori Fujikoshi]].<ref>{{cite journal |last=Fujikoshi |first=Yasunori |date=1993 |title=Two-way ANOVA models with unbalanced data |url=http://www.sciencedirect.com/science/article/pii/0012365X9390410U |journal=Discrete Mathematics |publisher=Elsevier |volume=116 |issue=1 |pages=315–334 |doi=10.1016/0012-365X(93)90410-U |accessdate=19 June 2014}}</ref> In 2005, [[Andrew Gelman]] proposed a different approach to ANOVA, viewed as a [[multilevel model]].<ref>{{cite journal |last=Gelman |first=Andrew |date=February 2005 |title=Analysis of variance? why it is more important than ever |journal=The Annals of Statistics |volume=33 |issue=1 |pages=1–53 |doi=10.1214/009053604000001048 |url=http://projecteuclid.org/euclid.aos/1112967698 |accessdate=19 June 2014}}</ref>
 
==Data set==
Line 13:
 
==Model==
Upon observing variation among all <math>n</math> data points, for instance via a [[histogram]], "[[Probability theory|probability]] may be used to describe such variation".<ref>{{cite journal |last=Kass |first=Robert E |date=1 February 2011 |title=Statistical inference: The big picture |url=http://projecteuclid.org/euclid.ss/1307626554 |journal=[[Statistical Science]] |publisher=[[Institute of Mathematical Statistics]] |volume=26 |issue=1 |pages=1–9 |doi=10.1214/10-sts337|pmid=21841892 |pmc=3153074 |arxiv=1106.2895 }}</ref> Let us hence denote by <math>Y_{ijk}</math> the [[random variable]] whose observed value <math>y_{ijk}</math> is the <math>k</math>-th measurement for treatment <math>(i,j)</math>. The '''two-way ANOVA''' models all these variables as varying [[Independence (probability theory)|independently]] and [[Normal distribution|normally]] around a mean, <math>\mu_{ij}</math>, with a constant variance, <math>\sigma^2</math> ([[homoscedasticity]]):
 
<math>Y_{ijk} \, | \, \mu_{ij}, \sigma^2 \; \overset{i.i.d.}{\sim} \; \mathcal{N}(\mu_{ij}, \sigma^2)</math>.
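As a minimal sketch of what this model says, the following Python snippet draws synthetic observations <math>y_{ijk}</math> independently from a normal distribution centred on each cell mean <math>\mu_{ij}</math> with a common standard deviation <math>\sigma</math>. The function name <code>simulate_two_way</code>, the cell means and the number of replicates are hypothetical values chosen only for illustration; they do not come from any cited source.

```python
import random
import statistics


def simulate_two_way(mu, sigma, n_rep, seed=0):
    """Draw y[i][j][k] ~ N(mu[i][j], sigma^2), independently for every cell and replicate."""
    rng = random.Random(seed)
    return [[[rng.gauss(m, sigma) for _ in range(n_rep)] for m in row] for row in mu]


# Hypothetical cell means for a 2 x 3 design (illustrative values only).
mu = [[10.0, 12.0, 14.0],
      [11.0, 15.0, 13.0]]

# Same variance in every cell (homoscedasticity); sigma = 1.0 is an assumption.
y = simulate_two_way(mu, sigma=1.0, n_rep=2000)

# With many replicates, each cell's sample mean should lie close to mu[i][j].
cell_means = [[statistics.fmean(cell) for cell in row] for row in y]
```

Comparing <code>cell_means</code> with <code>mu</code> gives a quick check that the simulated data behave as the model describes.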
Line 28:
 
==Assumptions==
Following Gelman and Hill, the assumptions of the ANOVA, and more generally the [[general linear model]], are, in decreasing order of importance:<ref>{{cite book |last=Gelman |first=Andrew |last2=Hill |first2=Jennifer |date=18 December 2006 |title= Data Analysis Using Regression and Multilevel/Hierarchical Models |url=http://www.cambridge.org/us/academic/subjects/statistics-probability/statistical-theory-and-methods/data-analysis-using-regression-and-multilevelhierarchical-models |publisher=[[Cambridge University Press]] |pages=45–46 |isbn=978-0521867061 }}</ref>
# the data points are relevant with respect to the scientific question under investigation;
# the mean of the response variable is influenced additively (if there is no interaction term) and linearly by the factors;
Line 48:
-->
 
Testing if the interaction term is significant can be difficult because of the potentially large number of [[degrees of freedom (statistics)|degrees of freedom]].<ref>{{cite journal |author=Yi-An Ko|date=September 2013 |title=Novel Likelihood Ratio Tests for Screening Gene-Gene and Gene-Environment Interactions with Unbalanced Repeated-Measures Data |journal=Genetic Epidemiology |volume=37 |issue=6 |pages=581–591 |doi=10.1002/gepi.21744 |pmid=23798480 |display-authors=etal|pmc=4009698}}</ref>
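
To illustrate where these degrees of freedom come from, the sketch below computes the classical F-statistic for the interaction term in a balanced fixed-effects design with <math>I</math> and <math>J</math> factor levels and <math>K</math> replicates per cell. In that setting the interaction has <math>(I-1)(J-1)</math> degrees of freedom, so the count grows with the product of the numbers of levels. This is the textbook sum-of-squares decomposition, not the likelihood ratio tests of the cited paper; the function name <code>interaction_f</code> and the data are hypothetical.

```python
from statistics import fmean


def interaction_f(y):
    """Return (F, df_interaction, df_error) for a balanced layout y[i][j][k].

    Assumes every cell has the same number K >= 2 of replicates and that
    the within-cell variation is not exactly zero.
    """
    n_i, n_j, n_k = len(y), len(y[0]), len(y[0][0])
    cell = [[fmean(y[i][j]) for j in range(n_j)] for i in range(n_i)]
    row = [fmean(cell[i]) for i in range(n_i)]
    col = [fmean([cell[i][j] for i in range(n_i)]) for j in range(n_j)]
    grand = fmean(row)  # valid because the design is balanced

    # Interaction sum of squares: what remains of the cell means after
    # removing additive row and column effects.
    ss_int = n_k * sum(
        (cell[i][j] - row[i] - col[j] + grand) ** 2
        for i in range(n_i) for j in range(n_j)
    )
    # Error sum of squares: variation of replicates around their cell mean.
    ss_err = sum(
        (v - cell[i][j]) ** 2
        for i in range(n_i) for j in range(n_j) for v in y[i][j]
    )
    df_int = (n_i - 1) * (n_j - 1)
    df_err = n_i * n_j * (n_k - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical 2 x 3 design with K = 2 replicates per cell.
y = [[[12, 14], [11, 13], [13, 15]],
     [[10, 12], [12, 14], [14, 16]]]
f_stat, df_int, df_err = interaction_f(y)
```

The resulting statistic would be compared with an F distribution on <code>df_int</code> and <code>df_err</code> degrees of freedom; obtaining a p-value requires an F-distribution routine that the Python standard library does not provide.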
 
==See also==
Line 62:
{{Reflist}}
== References ==
* {{cite book |author=[[George Casella]] |date=18 April 2008 |title=Statistical design |url=https://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-75964-7 |publisher=[[Springer Science+Business Media|Springer]] |isbn=978-0-387-75965-4 |series=Springer Texts in Statistics }}
 
[[Category:Analysis of variance]]