==Step-by-step instructions==
Suppose that the most recent value sampled is <math>x_t\,</math>. To follow the Metropolis–Hastings algorithm, we next draw a new proposal state <math>x'\,</math> with probability density <math>g(x' | x_t)\,</math>, and calculate a value
:<math>a = a_1 a_2,</math>
where
:<math>a_1 = \frac{P(x')}{P(x_t)} \,\!</math>
is the probability (e.g., Bayesian posterior) ratio between the proposed sample <math>x'\,</math> and the previous sample <math>x_t\,</math>, and
:<math>a_2 = \frac{g(x_t | x')}{g(x' | x_t)}</math>
is the ratio of the proposal density in two directions (from <math>x_t\,</math> to <math>x'\,</math> and conversely).
This is equal to 1 if the proposal density is symmetric.
Then the new state <math>\displaystyle x_{t+1}</math> is chosen according to the following rules.
:<math>
\begin{matrix}
\mbox{If } a \geq 1: & \\
& x_{t+1} = x', \\
\mbox{else} & \\
& x_{t+1} = \left\{
\begin{array}{lr}
x' & \mbox{ with probability } a, \\
x_t & \mbox{ with probability } 1 - a.
\end{array}
\right.
\end{matrix}
</math>
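The update rule above can be sketched in Python. This is an illustrative implementation, not part of the original article; the function and parameter names (<code>log_p</code>, <code>proposal_sample</code>, <code>proposal_logpdf</code>) are hypothetical, and the code works with log-densities to avoid numerical underflow, which is equivalent to comparing <math>a</math> against a uniform draw.

```python
import numpy as np

def metropolis_hastings_step(x_t, log_p, proposal_sample, proposal_logpdf, rng):
    """One Metropolis-Hastings update from state x_t (illustrative sketch).

    log_p(x): log of the (possibly unnormalized) target density P(x)
    proposal_sample(x, rng): draws x' from g(x' | x)
    proposal_logpdf(x_new, x_old): log g(x_new | x_old)
    """
    x_prop = proposal_sample(x_t, rng)
    # log a = log a_1 + log a_2, matching the definitions above
    log_a = (log_p(x_prop) - log_p(x_t)
             + proposal_logpdf(x_t, x_prop) - proposal_logpdf(x_prop, x_t))
    # Accept x' with probability min(1, a); otherwise keep x_t
    if np.log(rng.random()) < log_a:
        return x_prop
    return x_t
```

For a symmetric proposal the two <code>proposal_logpdf</code> terms cancel, recovering the simpler Metropolis rule with <math>a = a_1</math>.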
The Markov chain is started from an arbitrary initial value <math>\displaystyle x_0</math>, and the algorithm is run for many iterations until this initial state is "forgotten".
These samples, which are discarded, are known as ''burn-in''. The remaining set of accepted values of <math>x</math> represent a [[Sample (statistics)|sample]] from the distribution <math>P(x)</math>.
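The full procedure, including the burn-in discard, might be sketched as follows. This is a hypothetical illustration assuming a symmetric Gaussian random-walk proposal (so <math>a_2 = 1</math>); the function name and parameters are not from the article.

```python
import numpy as np

def sample_with_burn_in(log_p, x0, sigma, n_samples, n_burn, seed=0):
    """Run a random-walk Metropolis chain from x0, discarding burn-in (sketch)."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for i in range(n_burn + n_samples):
        x_prop = x + rng.normal(0.0, sigma)  # symmetric g, so a = a_1
        if np.log(rng.random()) < log_p(x_prop) - log_p(x):
            x = x_prop
        if i >= n_burn:                      # keep only post-burn-in states
            samples.append(x)
    return np.array(samples)
```

Note that when a proposal is rejected, the current state is recorded again; repeated values are part of the sample, not an error.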
The algorithm works best if the proposal density matches the shape of the target distribution <math>\displaystyle P(x)</math> (from which direct sampling is difficult), that is, <math>g(x' | x_t) \approx P(x') \,\!</math>.
If a Gaussian proposal density <math>\displaystyle g</math> is used, the variance parameter <math>\displaystyle \sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', which is the fraction of proposed samples that are accepted within a window of the last <math>\displaystyle N</math> samples.
The desired acceptance rate depends on the target distribution; however, it has been shown theoretically that the ideal acceptance rate for a one-dimensional Gaussian distribution is approximately 50%, decreasing to approximately 23% for an <math>\displaystyle N</math>-dimensional Gaussian target distribution.<ref name=Roberts/>
If <math>\displaystyle \sigma^2</math> is too small, the chain will ''mix slowly'' (i.e., the acceptance rate will be high, but successive samples will move around the space slowly, and the chain will converge only slowly to <math>\displaystyle P(x)</math>). On the other hand,
if <math>\displaystyle \sigma^2</math> is too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density, so <math>\displaystyle a_1</math> will be very small, and again the chain will converge very slowly. One typically tunes the proposal distribution so that the algorithm accepts on the order of 30% of all samples, in line with the theoretical estimates mentioned in the previous paragraph.
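A simple tuning scheme of this kind can be sketched as follows: after each window of proposals during burn-in, widen <math>\sigma</math> when the acceptance rate is above the target and shrink it otherwise. This is a hypothetical helper (the name, the 1.1/0.9 adjustment factors, and the window size are illustrative choices, not from the article), again assuming a symmetric Gaussian proposal.

```python
import numpy as np

def tune_random_walk_sigma(log_p, x0, sigma0=1.0, n_windows=50, window=100,
                           target_rate=0.3, seed=0):
    """Adapt the Gaussian proposal scale during burn-in (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x, sigma, rate = x0, sigma0, 0.0
    for _ in range(n_windows):
        accepted = 0
        for _ in range(window):
            x_prop = x + rng.normal(0.0, sigma)
            # symmetric proposal: a = a_1 = P(x') / P(x_t)
            if np.log(rng.random()) < log_p(x_prop) - log_p(x):
                x, accepted = x_prop, accepted + 1
        rate = accepted / window
        # widen sigma if accepting too often, shrink it if too rarely
        sigma *= 1.1 if rate > target_rate else 0.9
    return sigma, rate
```

Such adaptation should be confined to the burn-in phase: continually changing the proposal during the main run breaks the Markov property of the chain.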
[[Image:3dRosenbrock.png|thumb|350px|The result of three [[Markov chain]]s running on the 3D [[Rosenbrock function]] using the Metropolis–Hastings algorithm. The algorithm samples from regions where the [[posterior probability]] is high, and the chains begin to mix in these regions. The approximate position of the maximum has been illuminated. Note that the red points are the ones that remain after the burn-in process; the earlier ones have been discarded.]]
==See also==