Metropolis–Hastings algorithm

[[Image:Metropolis_hastings_algorithm.png|thumb|350px|The proposal [[probability distribution|distribution]] <math>Q</math> determines the next point to move to in the [[random walk]].]]
In [[mathematics]] and [[physics]],
the '''Metropolis–Hastings algorithm''' is an [[algorithm]] used to generate a sequence of samples from the [[joint distribution]] of two or more variables.
The purpose of such a sequence is to approximate the joint distribution (as with a histogram), or to compute an integral (such as an expected value).
This algorithm is an example of a [[Markov chain Monte Carlo]] algorithm.
It is a generalization, due to Hastings (1970; see the references below), of the original Metropolis algorithm.
The [[Gibbs sampling]] algorithm is a special case of the Metropolis–Hastings algorithm.
 
The Metropolis–Hastings algorithm can draw samples from any [[probability distribution]]
<math>p(x)</math>, requiring only that the density (or a function proportional to it) can be calculated at <math>x</math>.
The algorithm generates a sequence of states <math>x^t</math> which forms a [[Markov chain]], because each state <math>x^t</math> depends only on the previous state <math>x^{t-1}</math>.
The algorithm depends on the creation of a ''proposal density'' <math>Q(x'; x^t)</math>, which depends on the current state <math>x^t</math> and which can generate a new proposed sample <math>x'</math>.
For example, the proposal density could be a [[Gaussian function]] centred on the current state <math>x^t</math>:
 
:<math>Q(x'; x^t) \sim N(x^t, \sigma^2 I)</math>
 
reading <math>Q(x'; x^t)</math> as the probability of generating <math>x'</math> given the previous value <math>x^t</math>.
 
This proposal density would generate samples centred on the current state with variance <math>\sigma^2 I</math>.
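
For illustration, a minimal sketch of drawing from such a proposal in Python (assuming NumPy; the function name <code>gaussian_proposal</code> is purely illustrative, not part of the algorithm):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def gaussian_proposal(x_t, sigma):
    # Draw x' ~ N(x_t, sigma^2 I): independent Gaussian noise on each
    # coordinate, so Q(x'; x_t) = Q(x_t; x') (the proposal is symmetric).
    return x_t + sigma * rng.standard_normal(x_t.shape)
</syntaxhighlight>
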
So we draw a new proposal state <math>x'</math> with probability <math>Q(x'; x^t)</math> and then calculate a value
 
:<math>a = a_1 a_2,</math>

where

:<math>a_1 = \frac{p(x')}{p(x^t)}</math>
 
is the likelihood ratio between the proposed sample <math>x'</math> and the previous sample <math>x^t</math>, and
 
:<math>a_2 = \frac{Q(x^t; x')}{Q(x'; x^t)}</math>
 
is the ratio of the proposal density in two directions (from <math>x^t</math> to <math>x'</math> and ''vice versa'').
This is equal to 1 if the proposal density is symmetric.
Then the new state <math>x^{t+1}</math> is chosen according to the rule
 
:<math>
x^{t+1} = \left\{ \begin{matrix}
x' & \mbox{if } a \ge 1 \\
x' \mbox{ with probability } a & \mbox{if } a < 1
\end{matrix} \right.
</math>

If the proposed sample is rejected, the chain remains at the current state, so <math>x^{t+1} = x^t</math>.
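
Putting the pieces together, here is a minimal sketch of the algorithm in Python (a hypothetical <code>metropolis_hastings</code> using NumPy; it assumes the symmetric Gaussian proposal above, so <math>a_2 = 1</math> and the acceptance value reduces to <math>a = a_1</math>):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def metropolis_hastings(p, x0, n_samples, sigma=1.0):
    # p: any function proportional to the target density p(x).
    # With a symmetric proposal, a_2 = 1 and a = a_1 = p(x') / p(x^t).
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for t in range(n_samples):
        x_prop = x + sigma * rng.standard_normal(x.shape)  # draw x' from Q(x'; x^t)
        a = p(x_prop) / p(x)                               # acceptance value a
        if a >= 1 or rng.random() < a:                     # accept with probability min(1, a)
            x = x_prop
        samples[t] = x                                     # if rejected, x^{t+1} = x^t
    return samples

# Example: sample a standard Gaussian (known only up to a constant)
# and estimate an expected value from the chain after burn-in.
chain = metropolis_hastings(lambda x: np.exp(-0.5 * np.sum(x**2)), [0.0], 20000)
print(chain[5000:].mean())  # close to 0, the mean of the target
</syntaxhighlight>
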
 
The Markov chain is started from a random initial value <math>x^0</math>, and the algorithm is run for a few thousand iterations so that this initial state is "forgotten".
These early samples, which are discarded, are known as the ''burn-in''.
The algorithm works best if the proposal density matches the shape of the target distribution <math>p(x)</math>, that is <math>Q(x'; x^t) \approx p(x')</math>, but in most cases this is unknown.
If a Gaussian proposal density is used, the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', the fraction of proposed samples that were accepted in a window of the last <math>N</math> samples.
The target acceptance rate is usually around 60%; a heuristic sketch of such tuning is given below.
If the proposal steps are too small, the chain will ''mix slowly'' (i.e., it will move around the space slowly and converge slowly to <math>p(x)</math>).
If the proposal steps are too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density, so <math>a_1</math> will be very small.
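
A heuristic sketch of this tuning in Python (the 60% target and the 1.1/0.9 adjustment factors are illustrative assumptions, not prescribed by the algorithm):

<syntaxhighlight lang="python">
def tune_sigma(sigma, n_accepted, window, target_rate=0.6):
    # Acceptance rate over the last `window` proposals.
    rate = n_accepted / window
    if rate > target_rate:
        return sigma * 1.1   # too many acceptances: steps too timid, enlarge
    if rate < target_rate:
        return sigma * 0.9   # too few acceptances: steps too bold, shrink
    return sigma
</syntaxhighlight>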
 
== References ==
 
* Chib, Siddhartha and Edward Greenberg. "Understanding the Metropolis–Hastings Algorithm". ''The American Statistician'', 49(4):327–335, 1995.
* W. K. Hastings. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications". ''Biometrika'', 57:97–109, 1970.
* N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". ''Journal of Chemical Physics'', 21:1087–1092, 1953.
 
[[Category:Algorithms]]
[[Category:Probability and statistics]]
[[Category:Physics]]
[[Category:Sampling techniques]]