Metropolis–Hastings algorithm

 
==Intuition==
The Metropolis–Hastings algorithm can draw samples from any [[probability distribution]] with [[probability density]] <math>P(x)</math>, provided that a function <math>f(x)</math> proportional to the [[Probability density function|density]] <math>P</math> is known and the values of <math>f(x)</math> can be calculated. The requirement that <math>f(x)</math> need only be proportional to the density, rather than exactly equal to it, makes the Metropolis–Hastings algorithm particularly useful, because calculating the necessary normalization factor is often extremely difficult in practice.
 
The Metropolis–Hastings algorithm works by generating a sequence of sample values in such a way that, as more and more sample values are produced, the distribution of values more closely approximates the desired distribution. These sample values are produced iteratively, with the distribution of the next sample depending only on the current sample value (thus making the sequence of samples a [[Markov chain]]). Specifically, at each iteration, the algorithm picks a candidate for the next sample value based on the current sample value. Then, with some probability, the candidate is either accepted (in which case the candidate value is used in the next iteration) or rejected (in which case the candidate value is discarded and the current value is reused in the next iteration). The probability of acceptance is determined by comparing the values of the function <math>f(x)</math> at the current and candidate sample values with respect to the desired distribution.

The algorithm itself proceeds as follows:

# ''Initialization'': Choose an arbitrary point <math>x_0</math> as the first observation in the sample, and choose a proposal density <math>g(x' \mid x)</math> that suggests a candidate for the next sample value <math>x'</math> given the previous sample value <math>x</math>.
# For each iteration ''t'':
#* ''Generate'' a candidate <math>x'</math> for the next sample by picking from the distribution <math>g(x'\mid x_t)</math>.
#* ''Calculate'' the ''acceptance ratio'' <math>\alpha = f(x')/f(x_t)</math>, which will be used to decide whether to accept or reject the candidate{{efn|In the original paper by Metropolis et al. (1953), <math>f</math> was actually the [[Boltzmann distribution]], as it was applied to physical systems in the context of [[statistical mechanics]] (e.g., a maximal-entropy distribution of microstates for a given temperature at thermal equilibrium). Consequently, the acceptance ratio was itself an exponential of the difference in the parameters of the numerator and denominator of this ratio.}}. Because ''f'' is proportional to the density of ''P'', it follows that <math>\alpha = f(x')/f(x_t) = P(x')/P(x_t)</math>.
#* ''Accept or reject'':
#** Generate a uniform random number <math>u \in [0, 1]</math>.
#** If <math>u \le \alpha</math>, then ''accept'' the candidate and set <math>x_{t+1} = x'</math>.
#** If <math>u > \alpha</math>, then ''reject'' the candidate and set <math>x_{t+1} = x_t</math> instead.
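
The iteration above can be illustrated with a short program. The following is a minimal sketch, not drawn from the sources cited here: it assumes a one-dimensional target and a symmetric Gaussian proposal <math>g(x' \mid x_t)</math> (so that the simple ratio <math>\alpha = f(x')/f(x_t)</math> applies), and the function and parameter names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def metropolis(f, x0, n_samples, proposal_width=1.0, seed=None):
    """Sample from a density proportional to f, using a symmetric
    Gaussian proposal centered at the current state."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x_t = x0
    for t in range(n_samples):
        # Generate a candidate x' from the proposal g(x' | x_t).
        x_prime = rng.normal(x_t, proposal_width)
        # Acceptance ratio alpha = f(x') / f(x_t) = P(x') / P(x_t).
        alpha = f(x_prime) / f(x_t)
        # Accept with probability min(1, alpha); otherwise stay at x_t.
        if rng.uniform() <= alpha:
            x_t = x_prime
        samples[t] = x_t
    return samples

# Example: sample from an unnormalized standard normal density,
# f(x) = exp(-x^2 / 2), whose normalization constant is never needed.
samples = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, n_samples=10_000)
</syntaxhighlight>

Because candidates in higher-density regions of <math>P(x)</math> are always accepted, the chain produced by this loop gravitates toward regions where the target density is large, as described below.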
 
This algorithm proceeds by randomly attempting to move about the sample space, sometimes accepting the moves and sometimes remaining in place. The acceptance ratio <math>\alpha</math> indicates how probable the new proposed sample is with respect to the current sample, according to the distribution whose density is <math>P(x)</math>. If an attempt is made to move to a point that is more probable than the existing point (i.e. a point in a higher-density region of <math>P(x)</math>, corresponding to an <math>\alpha > 1 \ge u</math>), the move is always accepted. However, if an attempt is made to move to a less probable point, the move is sometimes rejected, and the larger the relative drop in probability, the more likely the new point is to be rejected. Thus, the algorithm tends to stay in (and return large numbers of samples from) high-density regions of <math>P(x)</math>, while only occasionally visiting low-density regions. Intuitively, this is the reason that this algorithm works and returns samples that follow the desired distribution with density <math>P(x)</math>.
 
Compared with an algorithm like [[adaptive rejection sampling]]<ref name=":0">{{Cite journal |last=Gilks |first=W. R. |last2=Wild |first2=P. |date=1992-01-01 |title=Adaptive Rejection Sampling for Gibbs Sampling |journal=Journal of the Royal Statistical Society. Series C (Applied Statistics) |volume=41 |issue=2 |pages=337–348 |doi=10.2307/2347565 |jstor=2347565}}</ref> that directly generates independent samples from a distribution, Metropolis–Hastings and other MCMC algorithms have a number of disadvantages:
* Samples are [[autocorrelation|autocorrelated]]. Even though over the long term they correctly follow <math>P(x)</math>, a set of nearby samples will be correlated with each other and will not correctly reflect the distribution. This means that effective sample sizes can be significantly lower than the number of samples actually taken.
* Although the Markov chain eventually converges to the desired distribution, the initial samples may follow a very different distribution, especially if the starting point is in a region of low density. As a result, a ''burn-in'' period is typically necessary, in which an initial number of samples is discarded.

==Formal derivation==
The algorithm is designed so that the resulting Markov chain satisfies [[detailed balance]] with respect to <math>P(x)</math>,

: <math>P(x' \mid x)\,P(x) = P(x \mid x')\,P(x'),</math>

where <math>P(x' \mid x)</math> denotes the transition probability of the chain. The transition is separated into two sub-steps: a proposal drawn from the distribution <math>g(x' \mid x)</math>, followed by an acceptance–rejection step with acceptance probability <math>A(x', x)</math>, so that
: <math>P(x'\mid x) = g(x' \mid x) A(x', x).</math>
 
Inserting this relation into the previous equation provides
 
: <math>\frac{A(x', x)}{A(x, x')} = \frac{P(x')}{P(x)}\frac{g(x \mid x')}{g(x' \mid x)}.</math>
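
A common choice for the acceptance probability that satisfies this condition is the ''Metropolis choice'',

: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x \mid x')}{g(x' \mid x)}\right),</math>

under which a proposed move to a more probable point is always accepted, while a move to a less probable point is accepted with a probability equal to the ratio inside the minimum.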
 
==Step-by-step instructions==
Given that the most recent value sampled is <math>x_t</math>, the Metropolis–Hastings algorithm next draws a new proposal state <math>x'</math> with probability density <math>g(x' \mid x_t)</math> and calculates a value
 
: <math>a = a_1 a_2,</math>
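
where

: <math>a_1 = \frac{f(x')}{f(x_t)} = \frac{P(x')}{P(x_t)}</math>

is the ratio of the target densities at the proposed and current samples, and

: <math>a_2 = \frac{g(x_t \mid x')}{g(x' \mid x_t)}</math>

is the ratio of the proposal densities in the two directions (from <math>x_t</math> to <math>x'</math> and conversely). This second factor equals 1 when the proposal density is symmetric, in which case the acceptance value reduces to the simple ratio <math>a = f(x')/f(x_t)</math> used above.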