Approximate Bayesian computation: Difference between revisions

Bender the Bot (talk | contribs)
m Software: HTTP to HTTPS for SourceForge
 
(4 intermediate revisions by 4 users not shown)
Line 19:
Approximate Bayesian computation can be understood as a kind of Bayesian version of [[indirect inference]].<ref>{{cite arXiv | eprint=1803.01999 | author1=Christopher C Drovandi | title=ABC and Indirect Inference | date=2018 | class=stat.CO }}</ref><ref name="Peters 2009">{{Cite journal |last=Peters |first=Gareth |date=2009 |title=Advances in Approximate Bayesian Computation and Trans-Dimensional Sampling Methodology |url=https://www.ssrn.com/abstract=3785580 |journal=SSRN Electronic Journal |language=en |doi=10.2139/ssrn.3785580 |issn=1556-5068|hdl=1959.4/50086 |hdl-access=free }}</ref>
 
Several efficient Monte Carlo-based approaches have been developed to sample from the ABC posterior distribution for estimation and prediction. A popular choice is the SMC Samplers algorithm,<ref>{{Cite journal |last1=Del Moral |first1=Pierre |last2=Doucet |first2=Arnaud |last3=Jasra |first3=Ajay |date=2006 |title=Sequential Monte Carlo Samplers |url=https://www.jstor.org/stable/3879283 |journal=Journal of the Royal Statistical Society. Series B (Statistical Methodology) |volume=68 |issue=3 |pages=411–436 |doi=10.1111/j.1467-9868.2006.00553.x |jstor=3879283 |issn=1369-7412|arxiv=cond-mat/0212648 }}</ref><ref>{{Cite journal |last1=Del Moral |first1=Pierre |last2=Doucet |first2=Arnaud |last3=Peters |first3=Gareth |date=2004 |title=Sequential Monte Carlo Samplers CUED Technical Report |url=https://www.ssrn.com/abstract=3841065 |journal=SSRN Electronic Journal |language=en |doi=10.2139/ssrn.3841065 |issn=1556-5068|url-access=subscription }}</ref><ref>{{Cite journal |last=Peters |first=Gareth |date=2005 |title=Topics in Sequential Monte Carlo Samplers |url=https://www.ssrn.com/abstract=3785582 |journal=SSRN Electronic Journal |language=en |doi=10.2139/ssrn.3785582 |issn=1556-5068|url-access=subscription }}</ref> adapted to the ABC context as SMC-ABC.<ref>{{Cite journal |last1=Sisson |first1=S. A. |last2=Fan |first2=Y. |last3=Tanaka |first3=Mark M. |date=2007-02-06 |title=Sequential Monte Carlo without likelihoods |journal=Proceedings of the National Academy of Sciences |language=en |volume=104 |issue=6 |pages=1760–1765 |doi=10.1073/pnas.0607208104 |doi-access=free |issn=0027-8424 |pmc=1794282 |pmid=17264216|bibcode=2007PNAS..104.1760S }}</ref><ref name="Peters 2009"/><ref>{{Cite journal |last1=Peters |first1=G. W. |last2=Sisson |first2=S. A. |last3=Fan |first3=Y. 
|date=2012-11-01 |title=Likelihood-free Bayesian inference for α-stable models |url=https://www.sciencedirect.com/science/article/pii/S0167947310003786 |journal=Computational Statistics & Data Analysis |series=1st issue of the Annals of Computational and Financial Econometrics |volume=56 |issue=11 |pages=3743–3756 |doi=10.1016/j.csda.2010.10.004 |issn=0167-9473|url-access=subscription }}</ref><ref>{{Cite journal |last1=Peters |first1=Gareth W. |last2=Wüthrich |first2=Mario V. |last3=Shevchenko |first3=Pavel V. |date=2010-08-01 |title=Chain ladder method: Bayesian bootstrap versus classical bootstrap |url=https://www.sciencedirect.com/science/article/pii/S0167668710000351 |journal=Insurance: Mathematics and Economics |volume=47 |issue=1 |pages=36–51 |doi=10.1016/j.insmatheco.2010.03.007 |arxiv=1004.2548 |issn=0167-6687}}</ref>
 
==Method==
Line 30:
where <math>p(\theta|D)</math> denotes the posterior, <math>p(D|\theta)</math> the likelihood, <math>p(\theta)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the [[marginal likelihood]] or the prior predictive probability of the data). The denominator <math>p(D)</math> normalizes the posterior density <math>p(\theta|D)</math> so that its total probability is one, and it can be calculated from that condition.
 
The prior represents beliefs or knowledge (such as physical constraints) about <math>\theta</math> before <math>D</math> is available. Since the prior narrows down uncertainty, the posterior estimates have less variance, but might be biased. For convenience the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of <math>\theta</math> are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior <math>p(\theta)</math> using a factorization of the joint distribution of all the elements of <math>\theta</math> in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of <math>\theta</math>, the evidence <math>p(D)</math> can be ignored, as it constitutes a [[Normalizing constant|normalising constant]], which cancels for any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood <math>p(D|\theta)</math> and the prior <math>p(\theta)</math>. For numerous applications, it is [[computationally expensive]], or even completely infeasible, to evaluate the likelihood,<ref name="Busetto2009a" /> which motivates the use of ABC to circumvent this issue.
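
As a short worked illustration of why the evidence cancels in posterior ratios, consider two parameter values <math>\theta_1</math> and <math>\theta_2</math> (labels introduced here only for illustration). Applying Bayes' rule to each gives:

:<math>\frac{p(\theta_1|D)}{p(\theta_2|D)} = \frac{p(D|\theta_1)\,p(\theta_1)/p(D)}{p(D|\theta_2)\,p(\theta_2)/p(D)} = \frac{p(D|\theta_1)\,p(\theta_1)}{p(D|\theta_2)\,p(\theta_2)}</math>

The ratio depends only on the likelihood and the prior, so it is the likelihood evaluation, not the evidence, that remains the obstacle.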
 
===The ABC rejection algorithm===
All ABC-based methods approximate the likelihood function by simulations, the outcomes of which are compared with the observed data.<ref>{{Cite journal |last=Hunter |first=Dawn |date=2006-12-08 |title=Bayesian inference, Monte Carlo sampling and operational risk |url=https://www.risk.net/journal-of-operational-risk/2160915/bayesian-inference-monte-carlo-sampling-and-operational-risk |journal=Journal of Operational Risk |volume=1 |issue=3 |pages=27–50 |language=en |doi=10.21314/jop.2006.014|url-access=subscription }}</ref><ref name="Peters 2009"/><ref name="Beaumont2010" /><ref name="Bertorelle" /><ref name="Csillery" /> More specifically, with the ABC rejection algorithm — the most basic form of ABC — a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point <math>\hat{\theta}</math>, a data set <math>\hat{D}</math> is then simulated under the statistical model <math>M</math> specified by <math>\hat{\theta}</math>. If the generated <math>\hat{D}</math> is too different from the observed data <math>D</math>, the sampled parameter value is discarded. In precise terms, <math>\hat{D}</math> is accepted with tolerance <math>\epsilon \ge 0</math> if:
 
:<math>\rho (\hat{D},D)\le\epsilon</math>,
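
The rejection scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy model (normal data with unknown mean and unit variance), the uniform prior on [-5, 5], the choice of distance (absolute difference of sample means), and all function names are hypothetical choices made here for the example.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Toy model M (assumed for illustration): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def rho(d_hat, d):
    """Illustrative distance: absolute difference between the sample means."""
    return abs(statistics.fmean(d_hat) - statistics.fmean(d))

def abc_rejection(observed, n_draws, epsilon, rng):
    """Return the prior draws whose simulated data lie within epsilon of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta_hat = rng.uniform(-5.0, 5.0)               # sample from the assumed uniform prior
        d_hat = simulate(theta_hat, len(observed), rng)  # simulate a data set under theta_hat
        if rho(d_hat, observed) <= epsilon:              # keep theta_hat if rho(D_hat, D) <= epsilon
            accepted.append(theta_hat)
    return accepted

rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "observed" data generated with theta = 2
posterior_sample = abc_rejection(observed, n_draws=5000, epsilon=0.1, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted values approximate draws from the posterior. Shrinking `epsilon` tightens the approximation but lowers the acceptance rate, so more prior draws are needed for the same sample size.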
Line 296:
| <ref name="Wegmann2010" />
|-
| [https://msbayes.sourceforge.net/ msBayes]
| Open source software package consisting of several C and R programs that are run with a Perl "front-end". Hierarchical coalescent models. Population genetic data from multiple co-distributed species.
| <ref name="Hickerson07" />
Line 338:
<ref name="Bharti">{{cite journal | last1 = Bharti | first1 = A | last2 = Briol | first2 = F.-X. | last3 = Pedersen | first3 = T | year = 2021 | title = A General Method for Calibrating Stochastic Radio Channel Models with Kernels | journal = IEEE Transactions on Antennas and Propagation | volume = 70 | issue = 6 | pages = 3986–4001 | doi=10.1109/TAP.2021.3083761| arxiv = 2012.09612 | s2cid = 233880538 }}</ref>
<ref name="Bertorelle">{{cite journal | last1 = Bertorelle | first1 = G | last2 = Benazzo | first2 = A | last3 = Mona | first3 = S | year = 2010 | title = ABC as a flexible framework to estimate demography over space and time: some cons, many pros | journal = Molecular Ecology | volume = 19 | issue = 13| pages = 2609–2625 | doi=10.1111/j.1365-294x.2010.04690.x| pmid = 20561199 | bibcode = 2010MolEc..19.2609B | s2cid = 12129604 | doi-access = free }}</ref>
<ref name="Csillery">{{cite journal | last1 = Csilléry | first1 = K | last2 = Blum | first2 = MGB | last3 = Gaggiotti | first3 = OE | last4 = François | first4 = O | year = 2010 | title = Approximate Bayesian Computation (ABC) in practice | journal = Trends in Ecology & Evolution | volume = 25 | issue = 7| pages = 410–418 | doi=10.1016/j.tree.2010.04.001| pmid = 20488578 | bibcode = 2010TEcoE..25..410C | s2cid = 13957079 }}</ref>
<ref name="Rubin">{{cite journal | last1 = Rubin | first1 = DB | year = 1984 | title = Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician | journal = The Annals of Statistics | volume = 12 | issue = 4| pages = 1151–1172 | doi=10.1214/aos/1176346785| doi-access = free }}</ref>
<ref name="Marjoram">{{cite journal | last1 = Marjoram | first1 = P | last2 = Molitor | first2 = J | last3 = Plagnol | first3 = V | last4 = Tavare | first4 = S | year = 2003 | title = Markov chain Monte Carlo without likelihoods | journal = Proc Natl Acad Sci U S A | volume = 100 | issue = 26| pages = 15324–15328 | doi=10.1073/pnas.0306899100| pmid = 14663152 | pmc = 307566 | bibcode = 2003PNAS..10015324M | doi-access = free }}</ref>
Line 368:
<ref name="Templeton2010">{{cite journal | last1 = Templeton | first1 = AR | year = 2010 | title = Coherent and incoherent inference in phylogeography and human evolution | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 107 | issue = 14| pages = 6376–6381 | doi=10.1073/pnas.0910647107| pmid = 20308555 | pmc = 2851988 | bibcode = 2010PNAS..107.6376T| doi-access = free }}</ref>
<!--<ref name="Fagundes">{{cite journal | last1 = Fagundes | first1 = NJR | last2 = Ray | first2 = N | last3 = Beaumont | first3 = M | last4 = Neuenschwander | first4 = S | last5 = Salzano | first5 = FM |display-authors=et al | year = 2007 | title = Statistical evaluation of alternative models of human evolution | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 104 | pages = 17614–17619 | doi=10.1073/pnas.0708280104 | pmid=17978179 | pmc=2077041}}</ref>-->
<!-- <ref name="Gelfand">{{cite journal | last1 = Gelfand | first1 = AE | last2 = Dey | first2 = DK | year = 1994 | title = Bayesian model choice: Asymptotics and exact calculations | journal = J R Stat Soc Ser B | volume = 56 | pages = 501–514 }}</ref> -->
<!-- <ref name="Bernardo">Bernardo JM, Smith AFM (1994) Bayesian Theory: John Wiley.</ref> -->
<!-- <ref name="Box">Box G, Draper NR (1987) Empirical Model-Building and Response Surfaces: John Wiley and Sons, Oxford.</ref> -->
Line 382:
<ref name="Dean">{{cite arXiv | eprint=1103.5399 | last1=Dean | first1=Thomas A. | last2=Singh | first2=Sumeetpal S. | last3=Jasra | first3=Ajay | last4=Peters | first4=Gareth W. | title=Parameter Estimation for Hidden Markov Models with Intractable Likelihoods | date=2011 | class=math.ST }}</ref>
<ref name="Fearnhead">{{cite arXiv | eprint=1004.1112 | last1=Fearnhead | first1=Paul | last2=Prangle | first2=Dennis | title=Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC | date=2010 | class=stat.ME }}</ref>
<ref name="Wilkinson">{{cite journal | arxiv=0811.3355 | doi=10.1515/sagmb-2013-0010 | title=Approximate Bayesian computation (ABC) gives exact results under the assumption of model error | date=2013 | last1=Wilkinson | first1=Richard David | journal=Statistical Applications in Genetics and Molecular Biology | volume=12 | issue=2 | pmid=23652634 }}</ref>
<ref name="Nunes">{{cite journal | last1 = Nunes | first1 = MA | last2 = Balding | first2 = DJ | year = 2010 | title = On optimal selection of summary statistics for approximate Bayesian computation | journal = Stat Appl Genet Mol Biol | volume = 9 | page = Article 34 | doi=10.2202/1544-6115.1576| pmid = 20887273 | s2cid = 207319754 }}</ref>
<ref name="Joyce">{{cite journal | last1 = Joyce | first1 = P | last2 = Marjoram | first2 = P | year = 2008 | title = Approximately sufficient statistics and bayesian computation | journal = Stat Appl Genet Mol Biol | volume = 7 | issue = 1| page = Article 26 | doi=10.2202/1544-6115.1389| pmid = 18764775 | s2cid = 38232110 }}</ref>
Line 389:
<ref name="Toni">{{cite journal | last1 = Toni | first1 = T | last2 = Welch | first2 = D | last3 = Strelkowa | first3 = N | last4 = Ipsen | first4 = A | last5 = Stumpf | first5 = M | year = 2009 | title = Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems | journal = J R Soc Interface | volume = 6 | issue = 31| pages = 187–202 | pmid = 19205079 | pmc = 2658655 | doi = 10.1098/rsif.2008.0172 }}</ref>
<ref name="Tavare">{{cite journal | last1 = Tavaré | first1 = S | last2 = Balding | first2 = DJ | last3 = Griffiths | first3 = RC | last4 = Donnelly | first4 = P | year = 1997 | title = Inferring Coalescence Times from DNA Sequence Data | journal = Genetics | volume = 145 | issue = 2 | pages = 505–518 | doi = 10.1093/genetics/145.2.505 | pmc = 1207814 | pmid=9071603}}</ref>
<ref name="Toni2010">{{cite journal | doi=10.1093/bioinformatics/btp619 | title=Simulation-based model selection for dynamical systems in systems and population biology | date=2010 | last1=Toni | first1=Tina | last2=Stumpf | first2=Michael P. H. | journal=Bioinformatics | volume=26 | issue=1 | pages=104–110 | pmid=19880371 | pmc=2796821 | arxiv=0911.1705 }}</ref>
<ref name="Pritchard1999">{{cite journal | last1 = Pritchard | first1 = JK | last2 = Seielstad | first2 = MT | last3 = Perez-Lezaun | first3 = A |display-authors=et al | year = 1999 | title = Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites | journal = Molecular Biology and Evolution | volume = 16 | issue = 12| pages = 1791–1798 | doi=10.1093/oxfordjournals.molbev.a026091| pmid = 10605120 | doi-access = free }}</ref>
<ref name="Diggle">{{cite journal | last1 = Diggle | first1 = PJ | year = 1984 | title = Monte Carlo Methods of Inference for Implicit Statistical Models | journal = Journal of the Royal Statistical Society, Series B | volume = 46 | issue = 2 | pages = 193–227 | doi = 10.1111/j.2517-6161.1984.tb01290.x }}</ref>
Line 413:
<ref name="Klinger2017">Klinger, E.; Rickert, D.; Hasenauer, J. (2017). pyABC: distributed, likelihood-free inference.</ref>
<ref name="Salvatier2016">{{cite journal | doi=10.7717/peerj-cs.55 | doi-access=free | title=Probabilistic programming in Python using PyMC3 | date=2016 | last1=Salvatier | first1=John | last2=Wiecki | first2=Thomas V. | last3=Fonnesbeck | first3=Christopher | journal=PeerJ Computer Science | volume=2 | pages=e55 | arxiv=1507.08050 }}</ref>
<ref name="Prangle">{{cite journal | doi=10.1515/sagmb-2013-0012 | title=Semi-automatic selection of summary statistics for ABC model choice| date=2014 | last1=Prangle | first1=Dennis | last2=Fearnhead | first2=Paul | last3=Cox | first3=Murray P. | last4=Biggs | first4=Patrick J. | last5=French | first5=Nigel P. | journal=Stat Appl Genet Mol Biol | volume=13| issue=1| pages=67–82 | pmid=24323893| arxiv=1302.5624 }}</ref>
}}