Metropolis–Hastings algorithm

[[Image:Metropolis_hastings_algorithm.png|thumb|350px|The proposal [[probability distribution|distribution]] <math>Q</math> determines the next point to move to in the [[random walk]].]]
In [[mathematics]] and [[physics]],
the '''Metropolis–Hastings algorithm''' is an [[algorithm]] used to generate a sequence of samples from the [[joint distribution]] of two or more variables.
The purpose of such a sequence is to approximate the joint distribution (as with a histogram), or to compute an integral (such as an expected value).
This algorithm is an example of a [[Markov chain Monte Carlo]] algorithm.
It is a generalization, due to Hastings (1970; see the references below), of the original Metropolis algorithm.
The [[Gibbs sampling]] algorithm is a special case of the Metropolis–Hastings algorithm.
 
The Metropolis–Hastings algorithm can draw samples from any [[probability distribution]]
<math>p(x)</math>, requiring only that the density (or a function proportional to it) can be calculated at <math>x</math>.
The algorithm generates a sequence of states <math>x^t</math> which forms a [[Markov chain]], because each state <math>x^t</math> depends only on the previous state <math>x^{t-1}</math>.
The algorithm depends on the creation of a ''proposal density'' <math>Q(x'; x^t)</math>, which depends on the current state <math>x^t</math> and which can generate a new proposed sample <math>x'</math>.
For example, the proposal density could be a [[Gaussian function]] centred on the current state <math>x^t</math>:
 
:<math>Q(x'; x^t) \sim N(x^t, \sigma^2 I)</math>
 
reading <math>Q(x'; x^t)</math> as the probability of generating <math>x'</math> given the previous value <math>x^t</math>.
 
This proposal density would generate samples centred on the current state with variance <math>\sigma^2 I</math>.
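
For illustration, a minimal sketch of drawing from such a proposal in Python (assuming NumPy; the function name <code>gaussian_proposal</code> is purely illustrative, not part of the algorithm):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def gaussian_proposal(x_t, sigma):
    # Draw x' ~ N(x_t, sigma^2 I): independent Gaussian noise on each
    # coordinate, so Q(x'; x_t) = Q(x_t; x') (the proposal is symmetric).
    return x_t + sigma * rng.standard_normal(x_t.shape)
</syntaxhighlight>
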
So we draw a new proposal state <math>x'</math> with probability <math>Q(x'; x^t)</math> and then calculate a value
 
:<math>a = a_1 a_2,</math>

where

:<math>a_1 = \frac{p(x')}{p(x^t)}</math>
 
is the likelihood ratio between the proposed sample <math>x'</math> and the previous sample <math>x^t</math>, and
 
:<math>a_2 = \frac{Q(x^t; x')}{Q(x'; x^t)}</math>
 
is the ratio of the proposal density in two directions (from <math>x^t</math> to <math>x'</math> and ''vice versa'').
This is equal to 1 if the proposal density is symmetric.
Then the new state <math>x^{t+1}</math> is chosen according to the rule
 
:<math>
x^{t+1} = \left\{ \begin{matrix}
x' & \mbox{if } a \ge 1 \\
x' \mbox{ with probability } a & \mbox{if } a < 1
\end{matrix} \right.
</math>

If the proposed sample is rejected, the chain remains at the current state, so <math>x^{t+1} = x^t</math>.
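
Putting the pieces together, here is a minimal sketch of the algorithm in Python (a hypothetical <code>metropolis_hastings</code> using NumPy; it assumes the symmetric Gaussian proposal above, so <math>a_2 = 1</math> and the acceptance value reduces to <math>a = a_1</math>):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def metropolis_hastings(p, x0, n_samples, sigma=1.0):
    # p: any function proportional to the target density p(x).
    # With a symmetric proposal, a_2 = 1 and a = a_1 = p(x') / p(x^t).
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for t in range(n_samples):
        x_prop = x + sigma * rng.standard_normal(x.shape)  # draw x' from Q(x'; x^t)
        a = p(x_prop) / p(x)                               # acceptance value a
        if a >= 1 or rng.random() < a:                     # accept with probability min(1, a)
            x = x_prop
        samples[t] = x                                     # if rejected, x^{t+1} = x^t
    return samples

# Example: sample a standard Gaussian (known only up to a constant)
# and estimate an expected value from the chain after burn-in.
chain = metropolis_hastings(lambda x: np.exp(-0.5 * np.sum(x**2)), [0.0], 20000)
print(chain[5000:].mean())  # close to 0, the mean of the target
</syntaxhighlight>
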
 
The Markov chain is started from a random initial value <math>x^0</math>, and the algorithm is run for a few thousand iterations so that this initial state is "forgotten".
These early samples, which are discarded, are known as the ''burn-in''.
The algorithm works best if the proposal density matches the shape of the target distribution <math>p(x)</math>, that is <math>Q(x'; x^t) \approx p(x')</math>, but in most cases this is unknown.
If a Gaussian proposal density is used, the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', the fraction of proposed samples that were accepted in a window of the last <math>N</math> samples.
The target acceptance rate is usually around 60%; a heuristic sketch of such tuning is given below.
If the proposal steps are too small, the chain will ''mix slowly'' (i.e., it will move around the space slowly and converge slowly to <math>p(x)</math>).
If the proposal steps are too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density, so <math>a_1</math> will be very small.
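
A heuristic sketch of this tuning in Python (the 60% target and the 1.1/0.9 adjustment factors are illustrative assumptions, not prescribed by the algorithm):

<syntaxhighlight lang="python">
def tune_sigma(sigma, n_accepted, window, target_rate=0.6):
    # Acceptance rate over the last `window` proposals.
    rate = n_accepted / window
    if rate > target_rate:
        return sigma * 1.1   # too many acceptances: steps too timid, enlarge
    if rate < target_rate:
        return sigma * 0.9   # too few acceptances: steps too bold, shrink
    return sigma
</syntaxhighlight>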
 
== References ==
 
* Chib, Siddhartha and Edward Greenberg. "Understanding the Metropolis–Hastings Algorithm". ''The American Statistician'', 49(4):327–335, 1995.
* W. K. Hastings. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications". ''Biometrika'', 57:97–109, 1970.
* N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". ''Journal of Chemical Physics'', 21:1087–1092, 1953.
 
[[Category:Algorithms]]
[[Category:Probability and statistics]]
[[Category:Physics]]
[[Category:Sampling techniques]]