'''Approximate Bayesian computation (ABC)''' constitutes a class of [[wp:Computational science|computational methods]] rooted in [[wp:Bayesian statistics|Bayesian statistics]]. In all model-based [[wp:statistical inference|statistical inference]], the [[wp:likelihood|likelihood function]] is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate.
'''Approximate Bayesian computation (ABC)''' is a family of computational techniques in [[Bayesian statistics]]. These simulation techniques operate on summary data (such as population mean, or variance) to make broad inferences with less computation than might be required if all available data were analyzed in detail. They are especially useful in situations where evaluation of the likelihood is computationally prohibitive, or whenever suitable likelihoods are not available.
ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of [[wp:Estimation Theory|parameter estimation]] and [[wp:Model selection|model selection]].
ABC methods originated in population and evolutionary genetics <ref name=Pritchard1999>{{cite journal|last = Pritchard|first = J. K.|authorlink=Jonathan K. Pritchard|coauthors = Seielstad, M. T., Perez-Lezaun, A., and Feldman, M. T.|title = Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites|journal = Mol. Biol. Evol.|volume = 16|year = 1999|pages = 1791–1798|pmid = 10605120|issue = 12}}</ref><ref name=Beaumont>{{cite journal|last = Beaumont|first = M. A.|coauthors = Zhang, W. and [[David Balding|Balding, D. J.]]|title = Approximate Bayesian computation in population genetics|journal = Genetics|volume = 162|pages = 2025–2035|url = http://www.genetics.org/cgi/content/abstract/162/4/2025|pmid = 12524368|issue = 4|date = December 1, 2002|pmc = 1462356 }}</ref> but have recently also been introduced to the analysis of complex and stochastic [[dynamical systems]].<ref name=Toni2009>{{cite journal |author = Toni, T.; Welch, D.; Strelkowa, N.; Ipsen, A.; Stumpf, M.P.H. |year = 2009 |title = Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems | journal = Journal of the Royal Society Interface |volume = 6 |issue = 31 |pages = 187–202 |doi = 10.1098/rsif.2008.0172 |url=http://rsif.royalsocietypublishing.org/content/6/31/187.abstract}}</ref>
ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems arising in the [[wp:Biology|biological sciences]], e.g. in [[wp:Population genetics|population genetics]], [[wp:ecology|ecology]], [[wp:epidemiology|epidemiology]], and [[wp:systems biology|systems biology]].
==History==
The first ABC-related ideas date back to the 1980s. [[wp:Donald_Rubin|Donald Rubin]], when discussing the interpretation of Bayesian statements in 1984<ref name="Rubin" />, described a hypothetical sampling mechanism that yields a sample from the [[wp:Posterior_probability|posterior distribution]]. This scheme was more of a conceptual [[wp:thought experiment|thought experiment]] to demonstrate what type of manipulations are done when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the [[#The ABC rejection algorithm|ABC-rejection scheme]], and this paper can be considered to be the first to describe approximate Bayesian computation. Another prescient point was made when Rubin argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC.
In 1984, Peter Diggle and Richard Gratton<ref name="Diggle" /> suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is [[wp:Intractability_(complexity)|intractable]]. Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new,<ref name="Bartlett63" /><ref name="Hoel71" /> Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under a circumstance where the likelihood is intractable.
Although Diggle and Gratton’s approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. A paper by Simon Tavaré et al.<ref name="Tavare" /> was the first to propose an ABC algorithm for posterior inference. In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of determining the posterior distribution of the time to the [[wp:Most recent common ancestor|most recent common ancestor]] of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in the human Y chromosome by Jonathan K. Pritchard et al.<ref name="Pritchard1999" /> using the ABC method. Finally, the term Approximate Bayesian Computation was established by Mark Beaumont et al.,<ref name="Beaumont2002" /> extending further the ABC methodology and discussing the suitability of the ABC-approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, or [[wp:Phylogeography|phylogeography]].
==Overview==
In standard Bayesian inference the [[posterior distribution]] is given by
:<math>P(\theta|D)\propto P(D|\theta) \pi(\theta)</math>
where <math>\theta</math> are the parameters of a probability model, <math>D</math> are the observed data, and <math>\pi(\theta)</math> is the [[prior distribution]] of the parameters <math>\theta</math>. <math>P(D|\theta)</math> is the [[likelihood]] of <math>\theta</math>, that is, the probability of observing the data <math>D</math> given the model with parameter <math>\theta</math>.
The explicit evaluation of the likelihood <math>P(D|\theta)</math> is avoided in ABC approaches by considering distances between the observed data and data simulated from a model with parameter <math>\theta</math>. For sufficiently complex models and large data sets, the probability that a simulation run yields precisely the same dataset as the one observed will be very small, often unacceptably so. So rather than comparing the data directly, one considers a summary statistic of the data, <math>S(D)</math>, and uses a distance <math>\Delta(S(D),S(X))</math> between the summary statistics of the real and simulated data, <math>D</math> and <math>X</math>, respectively.
The generic ABC approach to infer the posterior probability distribution of a parameter <math>\theta</math> is as follows:
:# Sample a candidate parameter vector <math>\theta^\ast</math> from the prior distribution <math>\pi(\theta)</math>.
:# Simulate a dataset <math>X</math> from the model with parameter <math>\theta^\ast</math>.
:# If <math>\Delta(S(D),S(X))<\epsilon</math>, accept <math>\theta^\ast</math> as a sample from the approximate posterior; otherwise reject it.
For <math>\epsilon</math> sufficiently small the ABC procedure should deliver a good approximation to the true posterior, in particular if the summary statistic <math>S</math> is a [[sufficient statistic]] of the probability model. If sufficient statistics do not exist or are hard to come by, setting up a satisfying and efficient ABC approach can be challenging.
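The three steps above can be illustrated with a minimal Python sketch. It is not taken from any of the cited software packages: the function name <code>abc_rejection</code>, the toy normal model with known standard deviation, the uniform prior on [−5, 5], the tolerance and the sample sizes are all assumptions chosen for illustration. The sample mean is used as the summary statistic because it is sufficient for the mean of a normal model with known variance.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulate, summary, distance,
                  epsilon, n_samples, rng):
    """Collect n_samples accepted parameter values by ABC rejection."""
    s_obs = summary(observed)
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sampler(rng)            # step 1: candidate from the prior
        simulated = simulate(theta, rng)      # step 2: simulate a dataset X
        if distance(s_obs, summary(simulated)) < epsilon:   # step 3: compare
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: normal data, unknown mean, known sd of 1.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_sample = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    distance=lambda a, b: abs(a - b),
    epsilon=0.1,
    n_samples=200,
    rng=rng,
)
print(statistics.fmean(posterior_sample), statistics.stdev(posterior_sample))
```

In this sketch most candidates are rejected, because the prior is much wider than the region of parameter space that reproduces the observed summary. Lowering <code>epsilon</code> tightens the approximation at the cost of more simulations.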
The generic procedure outlined above can be computationally inefficient but ABC and likelihood-free inferential procedures can be combined with the standard computational approaches used in [[Bayesian inference]] such as [[Markov chain Monte Carlo]] <ref name=Marjoram>{{cite journal|last = Marjoram|first = P.|coauthors = Molitor, J., Plagnol, V. and Tavaré, S.|title = Markov chain Monte Carlo without likelihoods|journal = P Natl Acad Sci USA|volume = 100|issue = 26|year = 2003|pages = 15324–15328|doi = 10.1073/pnas.0306899100|pmid = 14663152|pmc = 307566}}</ref><ref name=Plagnol>{{cite journal|last = Plagnol|first = V.|coauthors = Tavaré, S.|title = Approximate Bayesian computation and MCMC|journal = Monte Carlo and Quasi-Monte Carlo Methods 2002|year = 2004|url = http://www-gene.cimr.cam.ac.uk/vplagnol/papers/vpst-web.pdf|format=PDF}} (The link is to a preprint.)</ref> and [[Sequential Monte Carlo method]] <ref name=Toni2009 /> approaches. In these frameworks ABC can be used to tackle otherwise computationally intractable problems.
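As an illustration of how the likelihood-free idea combines with Markov chain Monte Carlo, the following Python sketch replaces the likelihood ratio in a Metropolis–Hastings step with the distance check on simulated data. It assumes a symmetric Gaussian random-walk proposal, so the acceptance probability reduces to the prior ratio. The function names, the toy normal model, the uniform prior and all numerical settings are hypothetical, and the sketch is not the implementation of any cited package.

```python
import math
import random
import statistics


def abc_mcmc(s_obs, simulate_summary, log_prior, theta0, step, epsilon,
             n_iter, rng):
    """Likelihood-free Metropolis-Hastings with a symmetric proposal."""
    chain = [theta0]
    theta = theta0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_prior(proposal)
        if lp_new > -math.inf:
            # The simulated summary must fall within epsilon of the observed one.
            close = abs(simulate_summary(proposal, rng) - s_obs) < epsilon
            # Symmetric proposal: the ratio involves the prior only.
            accept_prob = math.exp(min(0.0, lp_new - log_prior(theta)))
            if close and rng.random() < accept_prob:
                theta = proposal
        chain.append(theta)   # a rejected move repeats the current state
    return chain


def log_uniform_prior(theta, low=-5.0, high=5.0):
    """Log-density of a uniform prior (bounds are illustrative)."""
    return -math.log(high - low) if low <= theta <= high else -math.inf


def simulated_mean(theta, rng, n=50):
    """Summary statistic (sample mean) of one simulated normal dataset."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


rng = random.Random(1)
s_obs = 2.0   # hypothetical observed summary statistic
chain = abc_mcmc(s_obs, simulated_mean, log_uniform_prior,
                 theta0=s_obs, step=0.3, epsilon=0.1, n_iter=5000, rng=rng)
print(statistics.fmean(chain[1000:]))
```

Because proposals are made near the current state rather than drawn from the whole prior, fewer simulations are wasted than in the rejection scheme. The chain can, however, stall if it is started far from the region that reproduces the observed summary, which is why this sketch starts it at <code>s_obs</code>.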
While ABC and related likelihood-free methods have overwhelmingly been employed for parameter estimation, they can also be used for [[model selection]], as the whole apparatus of Bayesian model selection can be adapted to the ABC framework.<ref name= Toni2009b>{{cite journal |author = Toni, T.; Stumpf, M.P.H. |year = 2010 |title = Simulation-based model selection for dynamical systems in systems and population biology | journal = Bioinformatics |volume = 26|pages = 104–10 |doi = 10.1093/bioinformatics/btp619 |url=http://bioinformatics.oxfordjournals.org/cgi/reprint/26/1/104.pdf|format=PDF |pmid = 19880371 |issue = 1 |pmc = 2796821 }}</ref>
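One simple way to adapt the rejection scheme to model selection is to treat the model index as an additional parameter: draw a model from a prior over models, draw its parameters, simulate, and estimate posterior model probabilities from the share of accepted simulations that came from each model. The Python sketch below illustrates this under stated assumptions; the two toy models (a normal mean fixed at zero versus a free mean with a uniform prior), the function names and all numerical values are hypothetical. It is not the algorithm of the cited paper.

```python
import random
import statistics


def abc_model_choice(s_obs, models, model_prior, epsilon, n_accept, rng):
    """Estimate posterior model probabilities from acceptance frequencies.

    models is a list of (prior_sampler, simulate_summary) pairs.
    """
    counts = [0] * len(models)
    indices = range(len(models))
    while sum(counts) < n_accept:
        m = rng.choices(indices, weights=model_prior)[0]   # draw a model
        prior_sampler, simulate_summary = models[m]
        theta = prior_sampler(rng)                         # draw its parameter
        if abs(simulate_summary(theta, rng) - s_obs) < epsilon:
            counts[m] += 1
    return [c / n_accept for c in counts]


def mean_of_normals(mu, rng, n=20):
    """Sample mean of n simulated normal observations with sd 1."""
    return statistics.fmean([rng.gauss(mu, 1.0) for _ in range(n)])


models = [
    (lambda r: 0.0, mean_of_normals),                    # model 0: mean fixed at 0
    (lambda r: r.uniform(-5.0, 5.0), mean_of_normals),   # model 1: free mean
]

rng = random.Random(2)
probs = abc_model_choice(0.3, models, model_prior=[0.5, 0.5],
                         epsilon=0.05, n_accept=500, rng=rng)
print(probs)
```

The estimate depends on the summary statistic as well as on the tolerance, so a statistic that works well for parameter estimation within one model is not guaranteed to discriminate reliably between models.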
An increasing number of software implementations of ABC approaches exist.<ref name=Cornuet>{{cite journal|last = Cornuet|first = J-M.|coauthors = Santos, F., Beaumont, M. A., Robert, C. P., Marin, J-M., [[David Balding|Balding, D. J.]], Guillemaud, T. and Estoup, A.|title = Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation|journal = Bioinformatics|year = 2008|url = http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn514|pmid = 18842597|doi = 10.1093/bioinformatics/btn514|volume = 24|pages = 2713–9|issue = 23|pmc = 2639274}}</ref><ref name=Liepe>{{cite journal|author = Liepe, J.; Barnes, C.; Cule, E.; Erguler, K.; Kirk, P.; Toni, T.; Stumpf, M.P.H.|year=2010|title=ABC-SysBio—approximate Bayesian computation in Python with GPU support|journal=Bioinformatics|volume=26|pages=1797–9|doi=10.1093/bioinformatics/btq278|url=http://bioinformatics.oxfordjournals.org/cgi/content/full/26/14/1797|issue=14|pmid = 20591907|pmc = 2894518}}</ref><ref name=Wegmann>{{cite journal|author=Wegmann, D.; Leuenberger, C.; Neuenschwander, S.; Excoffier, L.|year=2010|title=ABCtoolbox: a versatile toolkit for approximate Bayesian computations|journal=BMC Bioinformatics|volume=11|pages=116|doi=10.1186/1471-2105-11-116|url=http://www.biomedcentral.com/1471-2105/11/116|pmid=20202215|pmc=2848233}}</ref>
Recent advances in ABC methodology, computational implementations and applications are discussed at the '''ABC in ...''' meetings:
* In 2009 [http://www.ceremade.dauphine.fr/~xian/ABCinParis.html ABC in Paris] started this series at [http://www.dauphine.fr/ Université Paris Dauphine].
* In 2011 [http://www.bioinformatics.ic.ac.uk/abcil/index.html ABC in London] was held on the [http://www.bioinformatics.ic.ac.uk/abcil/programme.html 5th of May] at [http://www.imperial.ac.uk Imperial College London].
* In 2013 the event was held in Rome at [http://www.uniroma3.it/ Università degli Studi Roma Tre].
==See also==
==References==
<!-- #{{reflist}} -->
==Software==
<!--
[http://www1.montpellier.inra.fr/CBGP/diyabc/ DIYABC] : "Do it yourself ABC".
[http://www.cmpg.iee.unibe.ch/content/softwares__services/computer_programs/abctoolbox/index_eng.html ABC Toolbox]: Inference for [[Population Genetics]].
-->
{{DEFAULTSORT:Approximate Bayesian Computation}}