Revision as of 13:04, 25 March 2024 edit Justinkunimune (talk \| contribs) Extended confirmed users 2,307 edits move images around. the 1D flowchart most clearly and succinctly illustrates the algorithm in my opinion, so it should go first. the one that was previously at the top is unclear; I can't figure out what proposal method is being used to generate these dashed curves and I doubt most readers will be able to either. ← Previous edit		Revision as of 13:47, 25 March 2024 edit undo Justinkunimune (talk \| contribs) Extended confirmed users 2,307 edits add information about proposal functions Next edit →
Line 2: [[File:Flowchart-of-Metropolis-Hastings-M-H-algorithm-for-the-parameter-estimation-using-the.png\|thumb\|300px\|The Metropolis-Hastings algorithm sampling a [[Normal distribution\|normal]] one-dimensional [[Posterior probability\|posterior]] probability distribution.]] In [[statistics]] and [[statistical physics]], the '''Metropolis–Hastings algorithm''' is a [[Markov chain Monte Carlo]] (MCMC) method for obtaining a sequence of [[pseudo-random number sampling\|random samples]] from a [[probability distribution]] from which direct sampling is difficult. ~~This~~New samples are added to the sequence in two steps: first a new sample is proposed based on the previous sample, then the proposed sample is either added to the sequence or rejected depending on the value of the probability distribution at that point. The resulting sequence can be used to approximate the distribution (e.g. to generate a [[histogram]]) or to [[Monte Carlo integration\|compute an integral]] (e.g. an [[expected value]]). Metropolis–Hastings and other MCMC algorithms are generally used for sampling from multi-dimensional distributions, especially when the number of dimensions is high. For single-dimensional distributions, there are usually other methods (e.g. [[adaptive rejection sampling]]) that can directly return independent samples from the distribution, and these are free from the problem of [[autocorrelation\|autocorrelated]] samples that is inherent in MCMC methods. ==History== Line 11 ⟶ 13: This contradicts an account by Edward Teller, who states in his memoirs that the five authors of the 1953 article worked together for "days (and nights)".<ref name=Teller/> In contrast, the detailed account by Rosenbluth credits Teller with a crucial but early suggestion to "take advantage of [[statistical mechanics]] and take ensemble averages instead of following detailed [[kinematics]]". This, says Rosenbluth, started him thinking about the generalized Monte Carlo approach – a topic which he says he had discussed often with [[John von Neumann\|John Von Neumann]]. Arianna Rosenbluth recounted (to Gubernatis in 2003) that Augusta Teller started the computer work, but that Arianna herself took it over and wrote the code from scratch. In an oral history recorded shortly before his death,<ref name=Barth/> Rosenbluth again credits Teller with posing the original problem, himself with solving it, and Arianna with programming the computer. ==~~Intuition~~Description== The Metropolis–Hastings algorithm can draw samples from any [[probability distribution]] with [[probability density]] <math>P(x)</math>, provided that we know a function <math>f(x)</math> proportional to the [[Probability density function\|density]] <math>P</math> and the values of <math>f(x)</math> can be calculated. The requirement that <math>f(x)</math> must only be proportional to the density, rather than exactly equal to it, makes the Metropolis–Hastings algorithm particularly useful, because ~~calculating~~it removes the ~~necessary~~need to calculate the density's normalization factor, which is often extremely difficult in practice. The Metropolis–Hastings algorithm generates a sequence of sample values in such a way that, as more and more sample values are produced, the distribution of values more closely approximates the desired distribution. These sample values are produced iteratively, with the distribution of the next sample being dependent only on the current sample value, thus making the sequence of samples into a [[Markov chain]]. Specifically, at each iteration, the algorithm ~~picks~~proposes a candidate for the next sample value based on the current sample value. Then, with some probability, the candidate is either accepted, in which case the candidate value is used in the next iteration, or it is rejected in which case the candidate value is discarded, and current value is reused in the next iteration. The probability of acceptance is determined by comparing the values of the function <math>f(x)</math> of the current and candidate sample values with respect to the desired distribution.▼ The method used to propose new candidates is characterized by the probability distribution <math>g(x\mid y)</math> (sometimes written <math>Q(x\mid y)</math>) of a new proposed sample <math>x</math> given the previous sample <math>y</math>. This is called the ''proposal density'', ''proposal function'', or ''jumping distribution''. A common choice for <math>g(x\mid y)</math> is a [[Gaussian distribution]] centered at <math>y</math>, so that points closer to <math>y</math> are more likely to be visited next, making the sequence of samples into a [[Gaussian random walk]]. In the original paper by Metropolis et al. (1953), <math>g(x\mid y)</math> was suggested to be a uniform distribution limited to some maximum distance from <math>y</math>. More complicated proposal functions are also possible, such as those of [[Hamiltonian Monte Carlo]], [[Langevin Monte Carlo]], or [[preconditioned Crank–Nicolson]]. ▲The Metropolis–Hastings algorithm generates a sequence of sample values in such a way that, as more and more sample values are produced, the distribution of values more closely approximates the desired distribution. These sample values are produced iteratively, with the distribution of the next sample being dependent only on the current sample value, thus making the sequence of samples into a [[Markov chain]]. Specifically, at each iteration, the algorithm picks a candidate for the next sample value based on the current sample value. Then, with some probability, the candidate is either accepted, in which case the candidate value is used in the next iteration, or it is rejected in which case the candidate value is discarded, and current value is reused in the next iteration. The probability of acceptance is determined by comparing the values of the function <math>f(x)</math> of the current and candidate sample values with respect to the desired distribution. For the purpose of illustration, the Metropolis algorithm, a special case of the Metropolis–Hastings algorithm where the proposal function is symmetric, is described below. Line 22 ⟶ 26: ;Metropolis algorithm (symmetric proposal distribution): Let <math>f(x)</math> be a function that is proportional to the desired probability density function <math>P(x)</math> (a.k.a. a target distribution){{efn\|In the original paper by Metropolis et al. (1953), <math>f</math> was taken to be the [[Boltzmann distribution]] as the specific application considered was [[Monte Carlo integration]] of [[equation of state\|equations of state]] in [[physical chemistry]]; the extension by Hastings generalized to an arbitrary distribution <math>f</math>.}}. # Initialization: Choose an arbitrary point <math>x_t</math> to be the first observation in the sample and choose ana ~~arbitrary~~proposal ~~probability density~~function <math>g(x\mid y)~~</math> (sometimes written <math>Q(x\mid y)</math>) that suggests a candidate for the next sample value <math>x</math>, given the previous sample value <math>y~~</math>. In this section, <math>g</math> is assumed to be symmetric; in other words, it must satisfy <math>g(x\mid y) = g(y\mid x)</math>. A usual choice is to let <math>g(x\mid y)</math> be a [[Gaussian distribution]] centered at <math>y</math>, so that points closer to <math>y</math> are more likely to be visited next, making the sequence of samples into a [[random walk]]{{efn\|In the original paper by Metropolis et al. (1953), <math>g(x\mid y)</math> was suggested to be a random translation with uniform density over some prescribed range.}}. The function <math>g</math> is referred to as the ''proposal density'' or ''jumping distribution''. # For each iteration ''t'': #* ''~~Generate~~Propose'' a candidate <math>x'</math> for the next sample by picking from the distribution <math>g(x'\mid x_t)</math>. #* ''Calculate'' the ''acceptance ratio'' <math>\alpha = f(x')/f(x_t)</math>, which will be used to decide whether to accept or reject the candidate{{efn\|In the original paper by Metropolis et al. (1953), <math>f</math> was actually the [[Boltzmann distribution]], as it was applied to physical systems in the context of [[statistical mechanics]] (e.g., a maximal-entropy distribution of microstates for a given temperature at thermal equilibrium). Consequently, the acceptance ratio was itself an exponential of the difference in the parameters of the numerator and denominator of this ratio.}}. Because ''f'' is proportional to the density of ''P'', we have that <math>\alpha = f(x')/f(x_t) = P(x')/P(x_t)</math>. #* ''Accept or reject'': Line 154 ⟶ 158: * [[Genetic algorithm]]s * [[Gibbs sampling]] * [[Hamiltonian Monte Carlo]] * [[Mean-field particle methods]] * [[Metropolis-adjusted Langevin algorithm]] * [[Metropolis light transport]] * [[Multiple-try Metropolis]] * [[Parallel tempering]] * [[Preconditioned Crank–Nicolson algorithm]] * [[Particle filter\|Sequential Monte Carlo]] * [[Simulated annealing]]

Metropolis–Hastings algorithm: Difference between revisions