where <math>p(\theta|D)</math> denotes the posterior, <math>p(D|\theta)</math> the likelihood, <math>p(\theta)</math> the prior, and <math>p(D)</math> the evidence (also referred to as the [[marginal likelihood]] or the prior predictive probability of the data). Note that the denominator <math>p(D)</math> normalizes the total probability of the posterior density <math>p(\theta|D)</math> to one and can be calculated accordingly.
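Concretely, since the posterior must integrate to one, the evidence equals the likelihood averaged over the prior (for a continuous parameter <math>\theta</math>):
:<math>p(D) = \int p(D\mid\theta)\,p(\theta)\,\mathrm{d}\theta.</math>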
The prior represents beliefs or knowledge (such as, e.g., physical constraints) about <math>\theta</math> before <math>D</math> is available. Since the prior narrows down uncertainty, the posterior estimates have less variance, but might be biased. For convenience, the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of <math>\theta</math> are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior <math>p(\theta)</math> using a factorization of the joint distribution of all the elements of <math>\theta</math> in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of <math>\theta</math>, the evidence <math>p(D)</math> can be ignored, as it constitutes a [[Normalizing constant|normalising constant]], which cancels in any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood <math>p(D|\theta)</math> and the prior <math>p(\theta)</math>. For numerous applications, it is [[computationally expensive]], or even completely infeasible, to evaluate the likelihood,<ref name="Busetto2009a" /> which motivates the use of ABC to circumvent this issue.
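The point of ABC is that the likelihood <math>p(D|\theta)</math> never has to be evaluated: it suffices to be able to simulate data from the model. The following is a minimal, hypothetical sketch of the basic rejection-ABC idea in Python (the toy Bernoulli model, function names, and tolerance <code>epsilon</code> are illustrative assumptions, not part of any particular ABC implementation):

```python
import random

def simulate(theta, n, rng):
    # Generative model (toy example): number of successes in n Bernoulli(theta) trials.
    # This is the only access to the model that ABC requires.
    return sum(1 for _ in range(n) if rng.random() < theta)

def rejection_abc(observed, n, num_draws, epsilon, seed=0):
    """Rejection ABC: draw theta from the prior, simulate pseudo-data,
    and keep theta when the pseudo-data fall within epsilon of the observations.
    The accepted values approximate a sample from the posterior p(theta | D)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_draws):
        theta = rng.random()              # draw from a Uniform(0, 1) prior p(theta)
        pseudo = simulate(theta, n, rng)  # simulate D' ~ p(. | theta), no likelihood needed
        if abs(pseudo - observed) <= epsilon:
            accepted.append(theta)
    return accepted

# Observing 7 successes out of 10 trials; epsilon = 0 requires an exact match.
samples = rejection_abc(observed=7, n=10, num_draws=20000, epsilon=0)
print(len(samples), sum(samples) / len(samples))
```

With <code>epsilon=0</code> and discrete data, the accepted values are exact posterior draws; in realistic continuous settings an exact match has probability zero, so a summary statistic and a positive tolerance are used instead, at the cost of approximation error.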