A common problem in [[statistical inference]] is to use data to decide which of two or more competing models best accounts for the observations. Frequentist statistics addresses this with [[hypothesis test]]s. There are several [[Bayesian]] approaches; one is through [[Bayes factor]]s.
The posterior probability of a model given data, Pr(''H''|''D''), is given by [[Bayes' theorem]]:
:<math>\Pr(H|D) = \frac{\Pr(D|H)\Pr(H)}{\Pr(D)}</math>
The key data-dependent term Pr(''D''|''H'') is a [[likelihood function|likelihood]], sometimes called the evidence or [[marginal likelihood]] for model ''H''; evaluating it correctly is the key to Bayesian model comparison.
The evidence is usually the [[normalizing constant]] or [[partition function]] of another inference, namely the inference of the parameters of model ''H'' given the data ''D''.
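As a minimal sketch of this idea (assuming a Bernoulli model with a uniform prior on the coin's bias, chosen purely for illustration and not drawn from the article), the evidence can be approximated by direct numerical integration over the parameter:

```python
import math

def evidence_bernoulli(heads, tails, grid=100000):
    """Evidence Pr(D|H) = integral of Pr(theta|H) * Pr(D|theta, H) d(theta)
    for a Bernoulli model with a Uniform(0, 1) prior on the bias theta,
    approximated with the midpoint rule."""
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid                         # midpoint of each sub-interval
        total += theta ** heads * (1 - theta) ** tails   # uniform prior density = 1
    return total / grid

# For this model the integral has the closed form
# heads! * tails! / (n + 1)!  (a Beta function), so the quadrature can be checked:
approx = evidence_bernoulli(3, 7)
exact = math.factorial(3) * math.factorial(7) / math.factorial(11)
```

The quadrature agrees with the Beta-function closed form to high accuracy, which illustrates the evidence's role as the normalising integral of the within-model parameter inference.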
The relative plausibility of two different models ''H''<sub>1</sub> and ''H''<sub>2</sub>, parametrised by model parameter vectors <math>\theta_1</math> and <math>\theta_2</math>, is assessed by the [[Bayes factor]], given by
:<math> \frac{\Pr(D|H_2)}{\Pr(D|H_1)}
= \frac{\int \Pr(\theta_2|H_2)\Pr(D|\theta_2,H_2)\,d\theta_2}
{\int \Pr(\theta_1|H_1)\Pr(D|\theta_1,H_1)\,d\theta_1
}.
</math>
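To make the formula concrete, consider a hypothetical comparison between ''H''<sub>1</sub>, a fair coin with no free parameters (so its evidence is simply 2<sup>&minus;''n''</sup>), and ''H''<sub>2</sub>, a coin of unknown bias with a uniform prior, for which the integral reduces to a Beta function. A sketch under these assumed models:

```python
from math import factorial

def bayes_factor(heads, tails):
    """Bayes factor Pr(D|H2) / Pr(D|H1) for
    H1: fair coin, theta fixed at 1/2 -> evidence = 0.5 ** n
    H2: unknown bias, theta ~ Uniform(0, 1)
        -> evidence = heads! * tails! / (n + 1)!  (the Beta integral)."""
    n = heads + tails
    ev_h1 = 0.5 ** n
    ev_h2 = factorial(heads) * factorial(tails) / factorial(n + 1)
    return ev_h2 / ev_h1

bf_balanced = bayes_factor(5, 5)   # balanced data: factor < 1, favours H1
bf_skewed = bayes_factor(10, 0)    # one-sided data: factor > 1, favours H2
```

With five heads in ten tosses the factor falls below 1, so the simpler fair-coin model is preferred; ten heads in a row push it well above 1 in favour of the biased-coin model.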
Thus Bayesian model comparison does not depend on any single choice of parameter values: each model's evidence integrates over all possible parameter values, weighted by the prior. Alternatively, the [[maximum likelihood estimate]] could be substituted for each parameter, yielding a likelihood-ratio comparison instead.
An advantage of [[Bayes factors]] is that they automatically, and quite naturally, include a penalty for including too much model structure: a model that spreads its prior over many parameter values that fit the data poorly receives lower evidence. This guards against [[overfitting]].
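One way to see this penalty (discussed, for example, in the MacKay reference below) is to divide the evidence by the maximised likelihood, giving an "Occam factor" smaller than 1. A sketch, again assuming the illustrative Bernoulli model with a uniform prior:

```python
from math import factorial

def occam_factor(heads, tails):
    """Evidence divided by the maximised likelihood: a number below 1
    measuring how much the model is penalised for its parameter freedom."""
    n = heads + tails
    evidence = factorial(heads) * factorial(tails) / factorial(n + 1)
    theta_hat = heads / n                                  # maximum likelihood estimate
    best_fit = theta_hat ** heads * (1 - theta_hat) ** tails
    return evidence / best_fit

small_data = occam_factor(3, 2)    # 5 tosses
more_data = occam_factor(30, 20)   # 50 tosses, same proportion of heads
```

The factor lies below 1 and shrinks as the data grow, so a model with adjustable parameters must earn its flexibility through a genuinely better fit before its evidence can exceed that of a simpler rival.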
Another approach is to treat model comparison as a [[Decision theory#Choice under uncertainty|decision problem]], computing the expected value or cost of each model choice.
Another approach is to use [[Minimum Message Length]] (MML).
== See also ==
*[[Akaike information criterion]]
*Schwarz's [[Bayesian information criterion]]
*[[Conditional predictive ordinate]]
*[[Chris Wallace (computer scientist)|Wallace]]'s [[Minimum Message Length]] (MML)
*[[Model selection]]
== References ==
* Gelman, A., Carlin, J., Stern, H. and Rubin, D. (1995) ''Bayesian Data Analysis''. Chapman and Hall/CRC.
* Bernardo, J. and Smith, A.F.M. (1994) ''Bayesian Theory''. John Wiley.
* Lee, P.M. (1989) ''Bayesian Statistics''. Arnold.
* Denison, D.G.T., Holmes, C.C., Mallick, B.K. and Smith, A.F.M. (2002) ''Bayesian Methods for Nonlinear Classification and Regression''. John Wiley.
* Richard O. Duda, Peter E. Hart and David G. Stork (2000) ''Pattern Classification'' (2nd edition), Section 9.6.5, pp. 487&ndash;489, Wiley, ISBN 0-471-05669-3
* Chapter 24 in [http://omega.math.albany.edu:8008/JaynesBook.html ''Probability Theory: The Logic of Science''] by [[Edwin Thompson Jaynes|E. T. Jaynes]], 1994.
* [[David J.C. MacKay]] (2003) ''Information Theory, Inference and Learning Algorithms'', CUP, ISBN 0-521-64298-1 (also [http://www.inference.phy.cam.ac.uk/mackay/itila/book.html available online])
== External links ==
* [http://www.inference.phy.cam.ac.uk/mackay/itila/ The on-line textbook: Information Theory, Inference, and Learning Algorithms], by [[David J.C. MacKay]], discusses Bayesian model comparison in Chapter 28, p. 343.
[[Category:Bayesian statistics]]
[[Category:Probability theory]]