The first ABC-related ideas date back to the 1980s. [[Donald Rubin]], when discussing the interpretation of Bayesian statements in 1984,<ref name="Rubin" /> described a hypothetical sampling mechanism that yields a sample from the [[Posterior probability|posterior distribution]]. This scheme was more of a conceptual [[thought experiment]] to demonstrate the type of manipulations carried out when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the [[#The ABC rejection algorithm|ABC-rejection scheme]], and this article can be considered the first to describe approximate Bayesian computation. However, a two-stage [[quincunx]] was constructed by [[Francis Galton]] in the late 1800s, which can be seen as a physical implementation of an [[#The ABC rejection algorithm|ABC-rejection scheme]] for a single unknown (parameter) and a single observation.<ref name="Stigler2010">see figure 5 in {{cite journal|last1=Stigler|first1=Stephen M.|title=Darwin, Galton and the Statistical Enlightenment|journal=Journal of the Royal Statistical Society. Series A (Statistics in Society)|volume=173|issue=3|year=2010|pages=469–482|issn=0964-1998|doi=10.1111/j.1467-985X.2010.00643.x}}</ref> Another prescient point was made by Rubin when he argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but should instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC.
In 1984, [[Peter Diggle]] and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is [[Intractability (complexity)|intractable]].<ref name="Diggle" /> Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new,<ref name="Bartlett63" /><ref name="Hoel71" /> Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under a circumstance where the likelihood is intractable.
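The following sketch illustrates the general flavour of such a grid-based, simulation-smoothed likelihood approximation on a deliberately simple toy model (i.i.d. normal observations summarised by their sample mean). The model, the grid, the kernel bandwidth, and the smoothing step are illustrative assumptions; this is not a reconstruction of Diggle and Gratton's original procedure.

<syntaxhighlight lang="python">
# Minimal sketch (assumed toy model, not the original method) of a grid-based,
# simulation-smoothed likelihood approximation: the data are n i.i.d.
# Normal(theta, 1) draws, summarised by the sample mean.
import numpy as np

rng = np.random.default_rng(0)

n_obs = 50
theta_true = 1.3
observed = rng.normal(theta_true, 1.0, size=n_obs)
s_obs = observed.mean()                      # observed summary statistic

theta_grid = np.linspace(-1.0, 3.0, 81)      # grid over the parameter space
n_sims = 500                                 # simulations per grid point
bandwidth = 0.05                             # kernel bandwidth (illustrative choice)

def simulated_likelihood(theta):
    """Estimate the density of the summary statistic at s_obs for a given theta
    by simulating the model repeatedly and applying a Gaussian kernel."""
    sims = rng.normal(theta, 1.0, size=(n_sims, n_obs)).mean(axis=1)
    kernel = np.exp(-0.5 * ((sims - s_obs) / bandwidth) ** 2)
    return kernel.mean() / (bandwidth * np.sqrt(2.0 * np.pi))

raw = np.array([simulated_likelihood(t) for t in theta_grid])

# Smooth the noisy grid estimates with a simple moving average,
# in the spirit of the smoothing step described above.
smoothed = np.convolve(raw, np.ones(5) / 5, mode="same")

print("approximate likelihood maximiser:", theta_grid[np.argmax(smoothed)])
</syntaxhighlight>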
Although Diggle and Gratton's approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article by [[Simon Tavaré]] and co-authors<ref name="Tavare" /> was the first to propose an ABC algorithm for posterior inference. In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of inferring the posterior distribution of the time to the [[most recent common ancestor]] of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting or rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in the human Y chromosome by [[Jonathan K. Pritchard]] and co-authors<ref name="Pritchard1999" /> using the ABC method. Finally, the term approximate Bayesian computation was established by Mark Beaumont and co-authors,<ref name="Beaumont2002" /> who further extended the ABC methodology and discussed the suitability of the ABC approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and [[phylogeography]].
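The sketch below shows an ABC rejection scheme of the kind described above for a simplified neutral coalescent with infinite-sites mutation, in which the scaled mutation rate is the single unknown. The prior, the sample size, and the observed number of segregating sites are illustrative assumptions rather than values taken from the cited studies.

<syntaxhighlight lang="python">
# Minimal ABC-rejection sketch in the spirit of the scheme described above,
# assuming a standard neutral coalescent with infinite-sites mutation and the
# scaled mutation rate theta as the single unknown. All numeric values are
# illustrative assumptions, not taken from the cited studies.
import numpy as np

rng = np.random.default_rng(1)

n_sample = 10        # number of sampled sequences
s_observed = 12      # observed number of segregating sites (illustrative)

def simulate_segregating_sites(theta):
    """Simulate the number of segregating sites under the standard coalescent:
    draw exponential coalescence times, accumulate total branch length, then
    place Poisson-distributed mutations on the tree (infinite-sites model)."""
    total_length = 0.0
    for k in range(n_sample, 1, -1):
        t_k = rng.exponential(2.0 / (k * (k - 1)))   # time while k lineages remain
        total_length += k * t_k
    return rng.poisson(theta * total_length / 2.0)

accepted = []
for _ in range(100_000):
    theta = rng.uniform(0.0, 20.0)                   # draw from a uniform prior
    if simulate_segregating_sites(theta) == s_observed:
        accepted.append(theta)                       # exact match: keep the draw

accepted = np.array(accepted)
print("accepted", accepted.size, "draws; posterior mean:", round(accepted.mean(), 2))
</syntaxhighlight>

Here proposals are accepted only when the simulated number of segregating sites matches the observed count exactly; replacing the exact match with a tolerance on a distance between summary statistics gives the more general [[#The ABC rejection algorithm|ABC-rejection scheme]].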