Approximate Bayesian computation

{{Bayesian statistics}}
 
'''Approximate Bayesian computation''' ('''ABC''') constitutes a class of [[Computational science|computational methods]] rooted in [[Bayesian statistics]] that can be used to estimate the posterior distributions of model parameters.
 
In all model-based [[statistical inference]], the [[likelihood|likelihood function]] is of central importance, since it expresses the probability of the observed data under a particular [[statistical model]], and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate.
Although Diggle and Gratton's approach had opened a new frontier, their method was not yet identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article by [[Simon Tavaré]] and co-authors was the first to propose an ABC algorithm for posterior inference.<ref name="Tavare" /> In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of determining the posterior distribution of the time to the [[most recent common ancestor]] of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting or rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling variation in the human Y chromosome by [[Jonathan K. Pritchard]] and co-authors using the ABC method.<ref name="Pritchard1999" /> Finally, the term approximate Bayesian computation was established by Mark Beaumont and co-authors,<ref name="Beaumont2002" /> who further extended the ABC methodology and discussed the suitability of the ABC approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and [[phylogeography]].
 
Approximate Bayesian computation can be understood as a kind of Bayesian version of [[indirect inference]].<ref>{{cite arXiv | eprint=1803.01999 | author1=Christopher C Drovandi | title=ABC and Indirect Inference | date=2018 | class=stat.CO }}</ref><ref name="Peters 2009">{{Cite journal |last=Peters |first=Gareth |date=2009 |title=Advances in Approximate Bayesian Computation and Trans-Dimensional Sampling Methodology |url=https://www.ssrn.com/abstract=3785580 |journal=SSRN Electronic Journal |language=en |doi=10.2139/ssrn.3785580 |issn=1556-5068|hdl=1959.4/50086 |hdl-access=free }}</ref>
 
Several efficient Monte Carlo based approaches have been developed for sampling from the ABC posterior distribution for estimation and prediction problems. A popular choice is the SMC Samplers algorithm<ref>{{Cite journal |last1=Del Moral |first1=Pierre |last2=Doucet |first2=Arnaud |last3=Jasra |first3=Ajay |date=2006 |title=Sequential Monte Carlo Samplers |url=https://www.jstor.org/stable/3879283 |journal=Journal of the Royal Statistical Society. Series B (Statistical Methodology) |volume=68 |issue=3 |pages=411–436 |doi=10.1111/j.1467-9868.2006.00553.x |jstor=3879283 |issn=1369-7412|arxiv=cond-mat/0212648 }}</ref><ref>{{Cite journal |last1=Del Moral |first1=Pierre |last2=Doucet |first2=Arnaud |last3=Peters |first3=Gareth |date=2004 |title=Sequential Monte Carlo Samplers CUED Technical Report |url=https://www.ssrn.com/abstract=3841065 |journal=SSRN Electronic Journal |language=en |doi=10.2139/ssrn.3841065 |issn=1556-5068}}</ref><ref>{{Cite journal |last=Peters |first=Gareth |date=2005 |title=Topics in Sequential Monte Carlo Samplers |url=https://www.ssrn.com/abstract=3785582 |journal=SSRN Electronic Journal |language=en |doi=10.2139/ssrn.3785582 |issn=1556-5068}}</ref> adapted to the ABC context in the SMC-ABC method.<ref>{{Cite journal |last1=Sisson |first1=S. A. |last2=Fan |first2=Y. |last3=Tanaka |first3=Mark M. |date=2007-02-06 |title=Sequential Monte Carlo without likelihoods |journal=Proceedings of the National Academy of Sciences |language=en |volume=104 |issue=6 |pages=1760–1765 |doi=10.1073/pnas.0607208104 |doi-access=free |issn=0027-8424 |pmc=1794282 |pmid=17264216|bibcode=2007PNAS..104.1760S }}</ref><ref name="Peters 2009" /><ref>{{Cite journal |last1=Peters |first1=G. W. |last2=Sisson |first2=S. A. |last3=Fan |first3=Y. |date=2012-11-01 |title=Likelihood-free Bayesian inference for α-stable models |url=https://www.sciencedirect.com/science/article/pii/S0167947310003786 |journal=Computational Statistics & Data Analysis |series=1st issue of the Annals of Computational and Financial Econometrics |volume=56 |issue=11 |pages=3743–3756 |doi=10.1016/j.csda.2010.10.004 |issn=0167-9473}}</ref><ref>{{Cite journal |last1=Peters |first1=Gareth W. |last2=Wüthrich |first2=Mario V. |last3=Shevchenko |first3=Pavel V. |date=2010-08-01 |title=Chain ladder method: Bayesian bootstrap versus classical bootstrap |url=https://www.sciencedirect.com/science/article/pii/S0167668710000351 |journal=Insurance: Mathematics and Economics |volume=47 |issue=1 |pages=36–51 |doi=10.1016/j.insmatheco.2010.03.007 |arxiv=1004.2548 |issn=0167-6687}}</ref>
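
As a rough illustration (a minimal sketch, not the exact algorithm of any one of the cited papers), the following shows one common population-based SMC-ABC scheme for a scalar parameter. The tolerance schedule, the Gaussian perturbation kernel, and the model functions <code>prior_sample</code>, <code>prior_pdf</code>, <code>simulate</code>, and <code>summary</code> are hypothetical placeholders to be supplied by the user.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def smc_abc(observed, prior_sample, prior_pdf, simulate, summary,
            epsilons, n_particles):
    """Propagate a particle population through a decreasing sequence of
    tolerances; importance weights correct for the perturbation kernel."""
    s_obs = summary(observed)

    # Generation 0: plain rejection ABC at the loosest tolerance.
    particles = []
    while len(particles) < n_particles:
        theta = prior_sample()
        if abs(summary(simulate(theta)) - s_obs) <= epsilons[0]:
            particles.append(theta)
    particles = np.array(particles)
    weights = np.full(n_particles, 1.0 / n_particles)

    for eps in epsilons[1:]:
        # Gaussian perturbation kernel scaled to the current population.
        sigma = 2.0 * np.sqrt(np.cov(particles, aweights=weights))
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            theta_star = rng.choice(particles, p=weights)  # resample
            theta = theta_star + rng.normal(0.0, sigma)    # perturb
            if prior_pdf(theta) == 0.0:
                continue  # reject draws outside the prior's support
            if abs(summary(simulate(theta)) - s_obs) <= eps:
                # Weight: prior density over the kernel mixture density.
                kernel = np.exp(-0.5 * ((theta - particles) / sigma) ** 2)
                new_particles.append(theta)
                new_weights.append(prior_pdf(theta) / np.sum(weights * kernel))
        particles = np.array(new_particles)
        weights = np.array(new_weights)
        weights /= weights.sum()

    return particles, weights
</syntaxhighlight>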
 
==Method==
By [[Bayes' theorem]], the posterior distribution of a parameter value <math>\theta</math> given data <math>D</math> is

:<math>p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)},</math>
where <math>p(\theta|D)</math> denotes the posterior, <math>p(D|\theta)</math> the likelihood, <math>p(\theta)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the [[marginal likelihood]] or the prior predictive probability of the data). Note that the denominator <math>p(D)</math> normalizes the posterior density <math>p(\theta|D)</math> so that it integrates to one, and can be calculated as <math>p(D)=\int p(D|\theta)\,p(\theta)\,d\theta</math>.
 
The prior represents beliefs or knowledge (such as physical constraints) about <math>\theta</math> before <math>D</math> is available. Because the prior narrows down the uncertainty, posterior estimates have lower variance, but may be biased. For convenience, the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of <math>\theta</math> are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior <math>p(\theta)</math> using a factorization of the joint distribution of all the elements of <math>\theta</math> in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of <math>\theta</math>, the evidence <math>p(D)</math> can be ignored, as it constitutes a [[Normalizing constant|normalising constant]], which cancels for any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood <math>p(D|\theta)</math> and the prior <math>p(\theta)</math>. For numerous applications, it is [[computationally expensive]], or even completely infeasible, to evaluate the likelihood,<ref name="Busetto2009a" /> which motivates the use of ABC to circumvent this issue.
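
For example, for two candidate parameter values <math>\theta_1</math> and <math>\theta_2</math>, the evidence cancels in the ratio of their posterior probabilities:

:<math>\frac{p(\theta_1|D)}{p(\theta_2|D)} = \frac{p(D|\theta_1)\,p(\theta_1)}{p(D|\theta_2)\,p(\theta_2)}.</math>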
 
===The ABC rejection algorithm===
All ABC-based methods approximate the likelihood function by simulations, the outcomes of which are compared with the observed data.<ref>{{Cite journal |last=Hunter |first=Dawn |date=2006-12-08 |title=Bayesian inference, Monte Carlo sampling and operational risk |url=https://www.risk.net/journal-of-operational-risk/2160915/bayesian-inference-monte-carlo-sampling-and-operational-risk |journal=Journal of Operational Risk |volume=1 |issue=3 |pages=27–50 |language=en |doi=10.21314/jop.2006.014}}</ref><ref name="Peters 2009" /><ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" /> More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point <math>\hat{\theta}</math>, a data set <math>\hat{D}</math> is then simulated under the statistical model <math>M</math> specified by <math>\hat{\theta}</math>. If the generated <math>\hat{D}</math> is too different from the observed data <math>D</math>, the sampled parameter value is discarded. In precise terms, <math>\hat{D}</math> is accepted with tolerance <math>\epsilon \ge 0</math> if:
 
:<math>\rho (\hat{D},D)\le\epsilon</math>,

where the distance measure <math>\rho(\hat{D},D)</math> quantifies the discrepancy between <math>\hat{D}</math> and <math>D</math> under a given metric.
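
To make the rejection scheme concrete, the following is a minimal sketch for a toy model in which data are normally distributed with unknown mean <math>\theta</math>; the uniform prior, the tolerance, and the use of the sample mean as summary statistic are illustrative assumptions, not part of any particular published implementation.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Observed data, here synthetic: 100 draws from a normal model with mean 2.
observed = rng.normal(loc=2.0, scale=1.0, size=100)

def summary(data):
    """Summary statistic: the sample mean."""
    return data.mean()

def distance(s_sim, s_obs):
    """Distance rho between simulated and observed summaries."""
    return abs(s_sim - s_obs)

def abc_rejection(n_samples, epsilon):
    """Draw from p(theta | rho(D_hat, D) <= epsilon) by rejection."""
    s_obs = summary(observed)
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-10.0, 10.0)              # sample from the prior
        simulated = rng.normal(theta, 1.0, size=100)  # simulate D_hat under theta
        if distance(summary(simulated), s_obs) <= epsilon:
            accepted.append(theta)                    # keep theta if close enough
    return np.array(accepted)

posterior_samples = abc_rejection(n_samples=500, epsilon=0.1)
print(posterior_samples.mean(), posterior_samples.std())
</syntaxhighlight>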
 
===Approximation of the posterior===
A non-negligible <math>\epsilon</math> comes with the price that one samples from <math>p(\theta|\rho(\hat{D},D)\le\epsilon)</math> instead of the true posterior <math>p(\theta|D)</math>. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution <math>p(\theta|\rho(\hat{D},D)\le\epsilon)</math> should often approximate the actual target distribution <math>p(\theta|D)</math> reasonably well. On the other hand, a tolerance that is large enough that every point in the parameter space becomes accepted will yield a replica of the prior distribution. There are empirical studies of the difference between <math>p(\theta|\rho(\hat{D},D)\le\epsilon)</math> and <math>p(\theta|D)</math> as a function of <math>\epsilon</math>,<ref name="Sisson" /><ref name="Peters 2009" /> and theoretical results for an upper <math>\epsilon</math>-dependent bound for the error in parameter estimates.<ref name="Dean" /> The accuracy of the posterior (defined as the expected quadratic loss) delivered by ABC as a function of <math>\epsilon</math> has also been investigated.<ref name="Fearnhead" /> However, the convergence of the distributions when <math>\epsilon</math> approaches zero, and how it depends on the distance measure used, is an important topic that has yet to be investigated in greater detail. In particular, it remains difficult to disentangle errors introduced by this approximation from errors due to model mis-specification.<ref name="Beaumont2010" />
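
Continuing the toy sketch above (with its hypothetical <code>abc_rejection</code> function), this dependence on <math>\epsilon</math> can be seen numerically: a very large tolerance accepts every proposal and simply reproduces the uniform prior, while a small tolerance concentrates the accepted draws near the true posterior.

<syntaxhighlight lang="python">
loose = abc_rejection(n_samples=500, epsilon=100.0)  # accepts every draw
tight = abc_rejection(n_samples=500, epsilon=0.05)
print(loose.std())  # close to the prior's spread (uniform on [-10, 10])
print(tight.std())  # much smaller: concentrated around the posterior mode
</syntaxhighlight>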
 
As an attempt to correct some of the error due to a non-zero <math>\epsilon</math>, the use of local linear weighted regression with ABC has been suggested to reduce the variance of the posterior estimates.<ref name="Beaumont2002" /> The method assigns weights to the parameters according to how well the simulated summaries adhere to the observed ones and performs linear regression between the summaries and the weighted parameters in the vicinity of the observed summaries. The obtained regression coefficients are used to correct sampled parameters in the direction of the observed summaries. An improvement was suggested in the form of nonlinear regression using a feed-forward neural network model.<ref name="Blum2010" /> However, it has been shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, which led to a reformulation of the regression adjustment that respects the prior distribution.<ref name="Leuenberger2009" />
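
The following is a minimal sketch of such a local linear regression adjustment for a scalar parameter and a scalar summary statistic, in the spirit of the method of Beaumont and co-authors; the Epanechnikov kernel choice and the variable names are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def regression_adjust(theta, s_sim, s_obs, epsilon):
    """Adjust accepted ABC draws toward the observed summary.

    theta : accepted parameter draws, shape (n,)
    s_sim : corresponding simulated summaries, shape (n,)
    s_obs : observed summary statistic (scalar)
    """
    d = np.abs(s_sim - s_obs)
    # Epanechnikov kernel: summaries closer to the observed one get
    # larger weight.
    w = np.where(d < epsilon, 1.0 - (d / epsilon) ** 2, 0.0)

    # Weighted least squares of theta on (s_sim - s_obs).
    X = np.column_stack([np.ones_like(s_sim), s_sim - s_obs])
    W = np.diag(w)
    coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ theta)
    beta = coef[1]  # slope of the local linear fit

    # Shift each draw as if its summary had equalled the observed one.
    return theta - beta * (s_sim - s_obs), w
</syntaxhighlight>

The adjusted draws, taken together with the weights <code>w</code>, then form a weighted sample from the corrected posterior approximation.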
==Software==
{| class="wikitable"
! Software !! Keywords and features !! Reference
|-
| [https://abcpy.readthedocs.io/en/latest/ ABCpy]
| Python package for ABC and other likelihood-free inference schemes. Several state-of-the-art algorithms available. Provides a quick way to integrate existing generative models (from C++, R, etc.), user-friendly parallelization using MPI or Spark, and summary statistics learning (with neural networks or linear regression).
| <ref>{{cite journal |title=ABCpy: A High-Performance Computing Perspective to Approximate Bayesian Computation |last1=Dutta |first1=R |last2=Schoengens |first2=M |last3=Pacchiardi |first3=L |last4=Ummadisingu |first4=A |last5=Widmer |first5=N |last6=Onnela |first6=J. P. |last7=Mira |first7=A|journal=Journal of Statistical Software |author7-link=Antonietta Mira |year=2021|volume=100 |issue=7 |doi=10.18637/jss.v100.i07 |doi-access=free|arxiv=1711.04694 |s2cid=88516340 }}</ref>
|}
The suitability of individual software packages depends on the specific application at hand, the computer system environment, and the algorithms required.