Metropolis–Hastings algorithm

In mathematics and physics, the Metropolis–Hastings algorithm is a method for generating a sequence of samples from the probability distribution of one or more variables. Such a sequence can be used to approximate the distribution (as with a histogram) or to compute an integral (such as an expected value). The algorithm is an example of a Markov chain Monte Carlo method. It was proposed by Hastings (citation below) as a generalization of the Metropolis algorithm. Gibbs sampling is a special case of the Metropolis–Hastings algorithm.

The Metropolis–Hastings algorithm can draw samples from any probability distribution $p(x)$, requiring only that the density, or a function proportional to it, can be evaluated at $x$. The algorithm generates a sequence of states $x^{t}$ that forms a Markov chain, because each state $x^{t}$ depends only on the previous state $x^{t-1}$. The algorithm relies on a proposal density $Q(x';x^{t})$, which depends on the current state $x^{t}$ and from which a new proposed sample $x'$ can be drawn. For example, the proposal density could be a Gaussian function centred on the current state $x^{t}$

Q(x';x^{t})\sim N(x^{t},\sigma ^{2}I)

where $Q(x';x^{t})$ is read as the probability density of generating $x'$ given the previous value $x^{t}$.
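As an illustration, drawing from a Gaussian proposal of this form can be sketched in plain Python. The function name `propose` and the parameter name `sigma` are hypothetical, chosen only for this example. Because the covariance is $\sigma^{2}I$, each coordinate of the current state is perturbed independently.

```python
import random

def propose(x_current, sigma, rng=random):
    """Draw x' from N(x_current, sigma^2 I): independent Gaussian noise per coordinate."""
    return [xi + rng.gauss(0.0, sigma) for xi in x_current]

rng = random.Random(0)            # seeded generator so the run is repeatable
x_t = [0.0, 0.0]                  # current state (two-dimensional here)
x_prime = propose(x_t, sigma=0.5, rng=rng)
```

A larger `sigma` yields proposals farther from the current state, and a smaller one keeps them close.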

This proposal density generates samples centred on the current state with covariance $\sigma ^{2}I$. We draw a new proposal state $x'$ with probability density $Q(x';x^{t})$ and then calculate a value

a=a_{1}a_{2}\,

where

a_{1}={\frac {p(x')}{p(x^{t})}}

is the ratio of the target density at the proposed sample $x'$ to that at the previous sample $x^{t}$ (the likelihood ratio), and

a_{2}={\frac {Q(x^{t};x')}{Q(x';x^{t})}}

is the ratio of the proposal density in the two directions (from $x^{t}$ to $x'$ and vice versa). The ratio $a_{2}$ is equal to 1 if the proposal density is symmetric. The new state $x^{t+1}$ is then chosen with the rule

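x^{t+1}=\left\{{\begin{matrix}x'&{\mbox{if }}a\geq 1\\x'{\mbox{ with probability }}a{\mbox{, otherwise }}x^{t}&{\mbox{if }}a<1\end{matrix}}\right.

That is, the proposal is always accepted when $a\geq 1$; otherwise it is accepted with probability $a$, and if it is rejected the chain stays where it is, so $x^{t+1}=x^{t}$.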
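A single update can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `mh_step`, `q_sample`, `q_density`, `target`, `gauss_sample` and `gauss_density` are hypothetical, the target is assumed to be a one-dimensional unnormalised standard normal, and the current state is assumed to lie where the target density is non-zero. A rejected proposal leaves the chain at its current state.

```python
import math
import random

def mh_step(x, p, q_sample, q_density, rng):
    """One Metropolis-Hastings update from state x.

    p:         target density, possibly known only up to a constant
    q_sample:  q_sample(x, rng) draws a proposal x' given x
    q_density: q_density(x_to, x_from) evaluates Q(x_to; x_from)
    """
    x_new = q_sample(x, rng)
    a1 = p(x_new) / p(x)                             # ratio of target densities
    a2 = q_density(x, x_new) / q_density(x_new, x)   # Q(x^t; x') / Q(x'; x^t)
    a = a1 * a2
    if a >= 1 or rng.random() < a:
        return x_new, True        # proposal accepted
    return x, False               # proposal rejected: the chain stays put

def target(x):
    return math.exp(-0.5 * x * x)  # unnormalised standard normal (assumed target)

def gauss_sample(x, rng, sigma=1.0):
    return x + rng.gauss(0.0, sigma)

def gauss_density(x_to, x_from, sigma=1.0):
    z = (x_to - x_from) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(42)
x_next, accepted = mh_step(0.0, target, gauss_sample, gauss_density, rng)
```

With this symmetric Gaussian proposal `a2` evaluates to 1, so the decision depends only on `a1`. The general form is kept so that an asymmetric proposal could be substituted.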

The Markov chain is started from an arbitrary initial value $x^{0}$ and the algorithm is run for a number of iterations (often a few thousand) so that this initial state is "forgotten". These early samples, which are discarded, are known as burn-in.

The algorithm works best if the proposal density matches the shape of the target distribution $p(x)$, that is $Q(x';x^{t})\approx p(x')$, but in most cases the shape of the target is unknown. If a Gaussian proposal is used, the variance parameter $\sigma ^{2}$ has to be tuned during the burn-in period. This is usually done by calculating the acceptance rate, which is the fraction of proposed samples accepted in a window of the last $N$ samples, and adjusting $\sigma ^{2}$ until that rate is close to a chosen target. The target is often quoted as around 60%, although analyses of Gaussian random-walk proposals suggest lower values, roughly 44% for a one-dimensional target and about 23% in high dimensions.

If the proposal steps are too small, nearly every proposal is accepted but the chain mixes slowly (i.e., it moves around the space slowly and converges slowly to $p(x)$). If the proposal steps are too large, the acceptance rate will be very low, because the proposals are likely to land in regions of much lower probability density, so $a_{1}$ will be very small.
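The burn-in and tuning procedure can be sketched as below for a one-dimensional target with a symmetric Gaussian proposal. The function name `random_walk_metropolis`, the multiplicative tuning rule (scale `sigma` by 1.1 or 0.9 after each window) and all default values, including `target_rate`, are assumptions made for illustration. They are not prescribed by the algorithm, and `target_rate` can be set to whatever acceptance rate is desired.

```python
import math
import random

def random_walk_metropolis(p, x0, n_samples, burn_in=2000, window=100,
                           target_rate=0.4, sigma=1.0, seed=0):
    """Random-walk sampler with a symmetric Gaussian proposal (so a_2 = 1).

    During burn-in, sigma is rescaled after every `window` proposals so that
    the windowed acceptance rate drifts toward `target_rate`.
    """
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for t in range(burn_in + n_samples):
        x_new = x + rng.gauss(0.0, sigma)
        a = p(x_new) / p(x)              # a_1 only, since the proposal is symmetric
        if a >= 1 or rng.random() < a:
            x = x_new
            accepted += 1
        if t < burn_in:
            if (t + 1) % window == 0:
                rate = accepted / window
                # high acceptance suggests steps are too small, so widen; else narrow
                sigma *= 1.1 if rate > target_rate else 0.9
                accepted = 0
        else:
            samples.append(x)            # keep only post-burn-in states
    return samples, sigma

def target(x):
    return math.exp(-0.5 * (x - 3.0) ** 2)   # unnormalised N(3, 1), assumed target

samples, tuned_sigma = random_walk_metropolis(target, x0=0.0, n_samples=20000)
mean_estimate = sum(samples) / len(samples)  # Monte Carlo estimate of the mean
```

Freezing `sigma` once burn-in ends keeps the retained samples coming from a single fixed proposal, which is why the adjustment is confined to the discarded portion of the chain.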

See also