In [[statistics]] and [[statistical physics]], the '''Metropolis–Hastings algorithm''' is a [[Markov chain Monte Carlo]] (MCMC) method for obtaining a sequence of [[pseudo-random number sampling|random samples]] from a [[probability distribution]] from which direct sampling is difficult. New samples are added to the sequence in two steps: first a new sample is proposed based on the previous sample, then the proposed sample is either added to the sequence or rejected depending on the value of the probability distribution at that point. The resulting sequence can be used to approximate the distribution (e.g. to generate a [[histogram]]) or to [[Monte Carlo integration|compute an integral]] (e.g. an [[expected value]]).
Metropolis–Hastings and other MCMC algorithms are generally used for sampling from multi-dimensional distributions, especially when the number of dimensions is high.
==History==
The algorithm is named in part for [[Nicholas Metropolis]], the first coauthor of a 1953 paper, entitled ''[[Equation of State Calculations by Fast Computing Machines]]'', with [[Arianna W. Rosenbluth]], [[Marshall Rosenbluth]], [[Augusta H. Teller]] and [[Edward Teller]]. For many years the algorithm was known simply as the ''Metropolis algorithm''.<ref>{{Cite book |last1=Kalos |first1=Malvin H. |title=Monte Carlo Methods Volume I: Basics |last2=Whitlock |first2=Paula A. |publisher=Wiley |year=1986 |___location=New York |pages=78–88}}</ref><ref>{{Cite journal |last=Tierney |first=Luke |date=1994 |title=Markov chains for exploring posterior distributions |url=https://projecteuclid.org/journals/annals-of-statistics/volume-22/issue-4/Markov-Chains-for-Exploring-Posterior-Distributions/10.1214/aos/1176325750.full |journal=The Annals of Statistics |volume=22 |issue=4 |pages=1701–1762|doi=10.1214/aos/1176325750 }}</ref> The paper proposed the algorithm for the case of symmetrical proposal distributions, but in 1970, [[W.K. Hastings]] extended it to the more general case.<ref name=Hastings/> The generalized method was eventually identified by both names, although the first use of the term "Metropolis–Hastings algorithm" is unclear.
Some controversy exists over credit for the development of the Metropolis algorithm. Metropolis, who was familiar with the computational aspects of the method, had coined the term "Monte Carlo" in an earlier article with [[Stanisław Ulam]], and led the group in the Theoretical Division that designed and built the [[MANIAC I]] computer used in the experiments in 1952.
This contradicts an account by Edward Teller, who states in his memoirs that the five authors of the 1953 article worked together for "days (and nights)".<ref name=Teller/> In contrast, the detailed account by Rosenbluth credits Teller with a crucial but early suggestion to "take advantage of [[statistical mechanics]] and take ensemble averages instead of following detailed [[kinematics]]".
==Description==
#* ''Accept or reject'':
#** Generate a uniform random number <math>u \in [0, 1]</math>.
#** If <math>u \le \alpha</math>, then ''accept'' the candidate and set <math>x_{t+1} = x'</math>.
#** If <math>u > \alpha</math>, then ''reject'' the candidate and set <math>x_{t+1} = x_t</math> instead.
This algorithm proceeds by randomly attempting to move about the sample space, sometimes accepting the moves and sometimes remaining in place. The number of iterations the algorithm spends at a given point <math>x</math> is proportional to the target density <math>P(x)</math> at that point. Note that the acceptance ratio <math>\alpha</math> indicates how probable the new proposed sample is with respect to the current sample, according to the distribution whose density is <math>P(x)</math>.
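For illustration, the following is a minimal Python sketch of this procedure (the function and parameter names are illustrative, not from any standard library). It assumes a one-dimensional unnormalized target density and a symmetric Gaussian random-walk proposal, for which the correction factor <math>g(x_t \mid x')/g(x' \mid x_t)</math> cancels and the acceptance ratio reduces to <math>P(x')/P(x_t)</math>:

<syntaxhighlight lang="python">
import numpy as np

def metropolis_hastings(p, x0, n_samples, sigma=1.0, rng=None):
    # Random-walk Metropolis: the proposal is symmetric, so the
    # Hastings correction cancels and alpha = p(x') / p(x_t).
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_samples)
    x = x0
    for t in range(n_samples):
        x_new = x + rng.normal(0.0, sigma)  # candidate x' ~ g(x' | x_t)
        alpha = p(x_new) / p(x)             # acceptance ratio
        if rng.uniform() <= alpha:          # accept with probability min(1, alpha)
            x = x_new
        samples[t] = x                      # on rejection the chain repeats x_t
    return samples

# Example: sample from an unnormalized standard normal density.
chain = metropolis_hastings(lambda x: np.exp(-0.5 * x**2), x0=0.0, n_samples=10_000)
</syntaxhighlight>

The resulting <code>chain</code> can then be histogrammed to approximate the target distribution, or averaged to estimate an expected value.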
Compared with an algorithm like [[adaptive rejection sampling]]<ref name=":0">{{Cite journal |last1=Gilks |first1=W. R. |last2=Wild |first2=P. |date=1992-01-01 |title=Adaptive Rejection Sampling for Gibbs Sampling |journal=Journal of the Royal Statistical Society. Series C (Applied Statistics) |volume=41 |issue=2 |pages=337–348 |doi=10.2307/2347565 |jstor=2347565}}</ref> that directly generates independent samples from a distribution, Metropolis–Hastings and other MCMC algorithms have a number of disadvantages:
* The samples are [[autocorrelation|autocorrelated]].
* Although the Markov chain eventually converges to the desired distribution, the initial samples may follow a very different distribution, especially if the starting point is in a region of low density. As a result, a ''burn-in'' period is typically necessary,<ref>{{Cite book |title=Bayesian data analysis |date=2004 |publisher=Chapman & Hall / CRC |others=Gelman, Andrew |isbn=978-1584883883 |edition=2nd |___location=Boca Raton, Fla. |oclc=51991499}}</ref> where an initial number of samples are thrown away.
On the other hand, most simple [[rejection sampling]] methods suffer from the "[[curse of dimensionality]]", where the probability of rejection increases exponentially as a function of the number of dimensions.
In [[multivariate distribution|multivariate]] distributions, the classic Metropolis–Hastings algorithm as described above involves choosing a new multi-dimensional sample point.
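As an illustrative sketch (again with hypothetical function names), a multivariate random-walk variant proposes a whole new vector at each step; working with log densities avoids the numerical underflow that becomes severe in high dimensions:

<syntaxhighlight lang="python">
import numpy as np

def metropolis_hastings_nd(log_p, x0, n_samples, cov, rng=None):
    # Multivariate random-walk Metropolis: perturb the entire state
    # vector at once with a symmetric Gaussian step of covariance cov.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for t in range(n_samples):
        x_new = rng.multivariate_normal(x, cov)  # symmetric proposal
        # Accept with probability min(1, P(x')/P(x)), computed in log space.
        if np.log(rng.uniform()) <= log_p(x_new) - log_p(x):
            x = x_new
        samples[t] = x
    return samples
</syntaxhighlight>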
==Formal derivation==
## ''Increment'': set <math>t = t + 1</math>.
Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to effectively estimate <math>P(x)</math> depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" ''In Bayesian Statistics 4''. 1992.</ref>
In a general problem it is not clear which proposal distribution <math>g(x' \mid x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
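One common diagnostic for judging whether enough iterations have been run is the effective sample size, sketched below for a scalar chain (an illustrative helper, not a standard routine): it discounts the chain length by the integrated autocorrelation, and a value much smaller than the chain length indicates that more iterations, or a better proposal, are needed.

<syntaxhighlight lang="python">
import numpy as np

def effective_sample_size(chain, max_lag=1000):
    # ESS ~ T / (1 + 2 * sum of autocorrelations), truncating the sum
    # at the first non-positive lag to reduce noise.
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, min(max_lag, len(x) - 1)):
        rho = np.dot(x[:-lag], x[lag:]) / (len(x) * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return len(x) / tau
</syntaxhighlight>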
The Markov chain is started from an arbitrary initial value <math>x_0</math>, and the algorithm is run for many iterations until this initial state is "forgotten".
The algorithm works best if the proposal density matches the shape of the target distribution <math>P(x)</math>, from which direct sampling is difficult, that is, <math>g(x' \mid x_t) \approx P(x')</math>.
The desired acceptance rate depends on the target distribution; however, it has been shown theoretically that the ideal acceptance rate for a one-dimensional Gaussian distribution is about 50%, decreasing to about 23% for an <math>N</math>-dimensional Gaussian target distribution.<ref name=Roberts/> These guidelines can work well when sampling from sufficiently regular Bayesian posteriors, as they often follow a multivariate normal distribution, as can be established using the [[Bernstein–von Mises theorem]].<ref>{{Cite journal |last1=Schmon |first1=Sebastian M. |last2=Gagnon |first2=Philippe |date=2022-04-15 |title=Optimal scaling of random walk Metropolis algorithms using Bayesian large-sample asymptotics |journal=Statistics and Computing |language=en |volume=32 |issue=2 |pages=28 |doi=10.1007/s11222-022-10080-8 |issn=0960-3174 |pmc=8924149 |pmid=35310543}}</ref>
If <math>\sigma^2</math> is too small, the chain will ''mix slowly'' (i.e., the acceptance rate will be high, but successive samples will move around the space slowly, and the chain will converge only slowly to <math>P(x)</math>).
If <math>\sigma^2</math> is too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density, so <math>a_1</math> will be very small, and again the chain will converge very slowly. One typically tunes the proposal distribution so that the algorithm accepts on the order of 30% of all samples – in line with the theoretical estimates mentioned in the previous paragraph.
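A common tuning heuristic, sketched below with illustrative names, adjusts <math>\sigma</math> from the observed acceptance rate during a burn-in phase; adaptation must stop before the samples that are kept, since adapting the proposal while sampling breaks the Markov property of the chain.

<syntaxhighlight lang="python">
import numpy as np

def tune_proposal_scale(p, x0, target_rate=0.3, n_tune=5000, sigma=1.0, rng=None):
    # Every 100 proposals, widen sigma if the acceptance rate is above
    # the target (steps too timid) and shrink it if below (steps too bold).
    rng = np.random.default_rng() if rng is None else rng
    x, accepted = x0, 0
    for t in range(1, n_tune + 1):
        x_new = x + rng.normal(0.0, sigma)
        if rng.uniform() <= p(x_new) / p(x):
            x, accepted = x_new, accepted + 1
        if t % 100 == 0:
            rate = accepted / 100
            sigma *= np.exp(rate - target_rate)
            accepted = 0
    return sigma
</syntaxhighlight>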
<ref name="Teller">Teller, Edward. ''Memoirs: A Twentieth-Century Journey in Science and Politics''. [[Perseus Publishing]], 2001, p. 328</ref>
<ref name="Barth">Rosenbluth, Marshall. [https://www.aip.org/history-programs/niels-bohr-library/oral-histories/28636-1 "Oral History Transcript"]. American Institute of Physics</ref>
<ref name="Gubernatis">{{Cite journal |last=J.E. Gubernatis |year=2005 |title=Marshall Rosenbluth and the Metropolis Algorithm |url=https://zenodo.org/record/1231899 |journal=[[Physics of Plasmas]] |volume=12 |issue=5 |
<ref name="Rosenbluth">{{Cite journal |last=M.N. Rosenbluth |year=2003 |title=Genesis of the Monte Carlo Algorithm for Statistical Mechanics |journal=[[AIP Conference Proceedings]] |volume=690 |pages=22–30 |bibcode=2003AIPC..690...22R |doi=10.1063/1.1632112}}</ref>
<!--<ref name="Dyson">{{Cite journal |last=F. Dyson |year=2006 |title=Marshall N. Rosenbluth |journal=[[Proceedings of the American Philosophical Society]] |volume=250 |pages=404}}</ref>-->