Metropolis–Hastings algorithm: Difference between revisions

History: This is an original argument: nothing here says that Rosenbluth had such a reputation for truth that the consensus of reliable sources is that Rosenbluth's account must be correct. That's an inference made by a wikipedia editor based on an irrelevant character reference
Reference edited with ProveIt. Commented out unused ref, please check if it should be included
Line 35:
This algorithm proceeds by randomly attempting to move about the sample space, sometimes accepting the moves and sometimes remaining in place. Note that the acceptance ratio <math>\alpha</math> indicates how probable the new proposed sample is with respect to the current sample, according to the distribution whose density is <math>P(x)</math>. If we attempt to move to a point that is more probable than the existing point (i.e. a point in a higher-density region of <math>P(x)</math> corresponding to an <math>\alpha > 1 \ge u</math>), we will always accept the move. However, if we attempt to move to a less probable point, we will sometimes reject the move, and the larger the relative drop in probability, the more likely we are to reject the new point. Thus, we will tend to stay in (and return large numbers of samples from) high-density regions of <math>P(x)</math>, while only occasionally visiting low-density regions. Intuitively, this is why this algorithm works and returns samples that follow the desired distribution with density <math>P(x)</math>.
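
The accept/reject step described above can be made concrete with a short, self-contained sketch. The Python code below is illustrative only: it assumes a symmetric Gaussian proposal (so the Hastings correction ratio <math>g(x\mid x')/g(x'\mid x)</math> cancels) and a hypothetical unnormalized one-dimensional target density <code>p</code>, and is not drawn from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def metropolis_sample(p, x0, n_samples, sigma=1.0, seed=0):
    """Random-walk Metropolis sketch for an unnormalized 1-D density p(x)."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = x + rng.normal(0.0, sigma)   # propose a move near the current point
        alpha = p(x_prop) / p(x)              # acceptance ratio (symmetric proposal)
        if rng.uniform() < alpha:             # accept with probability min(1, alpha)
            x = x_prop
        # otherwise remain in place; the current point is recorded again
        samples[i] = x
    return samples

# Hypothetical usage: sample from an unnormalized standard normal density
samples = metropolis_sample(lambda x: np.exp(-0.5 * x ** 2), x0=0.0, n_samples=10_000)
</syntaxhighlight>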
 
Compared with an algorithm like [[adaptive rejection sampling]]<ref name=":0">{{Cite journal |last1=Gilks |first1=W. R. |last2=Wild |first2=P. |date=1992-01-01 |title=Adaptive Rejection Sampling for Gibbs Sampling |journal=Journal of the Royal Statistical Society. Series C (Applied Statistics) |volume=41 |issue=2 |pages=337–348 |doi=10.2307/2347565 |jstor=2347565}}</ref> that directly generates independent samples from a distribution, Metropolis–Hastings and other MCMC algorithms have a number of disadvantages:
* The samples are correlated. Even though over the long term they do correctly follow <math>P(x)</math>, a set of nearby samples will be correlated with each other and not correctly reflect the distribution. This means that effective sample sizes can be significantly lower than the number of samples actually taken, leading to large errors (see the sketch following this list).
* Although the Markov chain eventually converges to the desired distribution, the initial samples may follow a very different distribution, especially if the starting point is in a region of low density. As a result, a ''burn-in'' period is typically necessary,<ref>{{Cite book |title=Bayesian data analysis |date=2004 |publisher=Chapman & Hall / CRC |others=Gelman, Andrew |isbn=978-1584883883 |edition= 2nd |___location=Boca Raton, Fla. |oclc=51991499}}</ref> where an initial number of samples are thrown away.
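
The first point can be quantified with an autocorrelation-based effective sample size. The sketch below uses the common estimator <math>N_\text{eff} = N / (1 + 2\sum_{k\ge 1}\rho_k)</math>, truncating the sum at the first non-positive autocorrelation; both the estimator and the truncation rule are one convention among several, not taken from the sources cited above.

<syntaxhighlight lang="python">
import numpy as np

def effective_sample_size(chain):
    """Rough effective sample size of a 1-D MCMC chain via its autocorrelations."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    # Empirical autocorrelation at lags 0, 1, ..., n-1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0                     # integrated autocorrelation factor 1 + 2*sum(rho_k)
    for rho in acf[1:]:
        if rho <= 0:              # truncate the sum at the first non-positive lag
            break
        tau += 2.0 * rho
    return n / tau
</syntaxhighlight>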
 
On the other hand, most simple [[rejection sampling]] methods suffer from the "[[curse of dimensionality]]", where the probability of rejection increases exponentially as a function of the number of dimensions. Metropolis–Hastings and other MCMC methods do not suffer from this problem to the same degree, and thus are often the only solutions available when the number of dimensions of the distribution to be sampled is high. As a result, MCMC methods are often the methods of choice for producing samples from [[hierarchical Bayesian model]]s and other high-dimensional statistical models now used in many disciplines.
 
In [[multivariate distribution|multivariate]] distributions, the classic Metropolis–Hastings algorithm as described above involves choosing a new multi-dimensional sample point. When the number of dimensions is high, finding a suitable jumping distribution can be difficult, as the individual dimensions behave in very different ways and the jumping width (see above) must be "just right" for all dimensions at once to avoid excessively slow mixing. An alternative approach that often works better in such situations, known as [[Gibbs sampling]], involves choosing a new sample for each dimension separately from the others, rather than choosing a sample for all dimensions at once. That way, the problem of sampling from a potentially high-dimensional space is reduced to a collection of lower-dimensional sampling problems.<ref>{{Cite journal |last=Lee |first=Se Yoon |year=2021 |title=Gibbs sampler and coordinate ascent variational inference: A set-theoretical review |journal=Communications in Statistics - Theory and Methods |volume=51 |issue=6 |pages=1549–1568 |arxiv=2008.01006 |doi=10.1080/03610926.2021.1921214 |s2cid=220935477}}</ref> This is especially applicable when the multivariate distribution is composed of a set of individual [[random variable]]s in which each variable is conditioned on only a small number of other variables, as is the case in most typical [[hierarchical Bayesian model|hierarchical models]]. The individual variables are then sampled one at a time, with each variable conditioned on the most recent values of all the others. Various algorithms can be used to choose these individual samples, depending on the exact form of the multivariate distribution: some possibilities are the [[adaptive rejection sampling]] methods,<ref name=":0" /> the adaptive rejection Metropolis sampling algorithm,<ref>{{Cite journal |last1=Gilks |first1=W. R. |last2=Best |first2=N. G. |author-link2=Nicky Best |last3=Tan |first3=K. K. C. |date=1995-01-01 |title=Adaptive Rejection Metropolis Sampling within Gibbs Sampling |journal=Journal of the Royal Statistical Society. Series C (Applied Statistics) |volume=44 |issue=4 |pages=455–472 |doi=10.2307/2986138 |jstor=2986138}}</ref> a simple one-dimensional Metropolis–Hastings step, or [[slice sampling]].
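
A minimal sketch of this coordinate-wise strategy, using a simple one-dimensional Metropolis–Hastings step for each coordinate (often called Metropolis-within-Gibbs); the unnormalized joint density <code>p</code> and the shared proposal scale are assumptions made only for illustration:

<syntaxhighlight lang="python">
import numpy as np

def metropolis_within_gibbs(p, x0, n_samples, sigma=0.5, seed=0):
    """Update one coordinate at a time with a 1-D symmetric Metropolis step."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        for j in range(x.size):                   # sweep over the coordinates
            x_prop = x.copy()
            x_prop[j] += rng.normal(0.0, sigma)   # perturb coordinate j only
            if rng.uniform() < p(x_prop) / p(x):  # accept/reject the 1-D move
                x = x_prop
        samples[i] = x
    return samples
</syntaxhighlight>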
 
==Formal derivation==
Line 138:
If a Gaussian proposal density <math>g</math> is used, the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', which is the fraction of proposed samples that are accepted in a window of the last <math>N</math> samples.
The desired acceptance rate depends on the target distribution; however, it has been shown theoretically that the ideal acceptance rate for a one-dimensional Gaussian distribution is about 50%, decreasing to about 23% for an <math>N</math>-dimensional Gaussian target distribution.<ref name=Roberts/> These guidelines can work well when sampling from sufficiently regular Bayesian posteriors, as these often follow a multivariate normal distribution, as can be established using the [[Bernstein-von Mises theorem]].<ref>{{Cite journal |last1=Schmon |first1=Sebastian M. |last2=Gagnon |first2=Philippe |date=2022-04-15 |title=Optimal scaling of random walk Metropolis algorithms using Bayesian large-sample asymptotics |journal=Statistics and Computing |language=en |volume=32 |issue=2 |pages=28 |doi=10.1007/s11222-022-10080-8 |issn=0960-3174 |pmc=8924149 |pmid=35310543}}</ref>
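
As an illustration, the acceptance rate can be monitored over windows of proposals during burn-in and <math>\sigma</math> adjusted toward a target rate chosen from the guidelines above (roughly 50% in one dimension, about 23% in high dimension). The multiplicative adjustment rule in the following Python sketch is a simple common heuristic, not a rule prescribed by the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def tune_proposal_scale(p, x0, sigma=1.0, target_rate=0.5,
                        n_windows=50, window=100, seed=0):
    """Crude burn-in tuning of the random-walk proposal scale sigma (1-D target)."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(n_windows):
        accepted = 0
        for _ in range(window):
            x_prop = x + rng.normal(0.0, sigma)
            if rng.uniform() < p(x_prop) / p(x):
                x = x_prop
                accepted += 1
        rate = accepted / window                  # acceptance rate over this window
        # Too many acceptances -> proposals too timid, widen them; too few -> shrink.
        sigma *= 1.1 if rate > target_rate else 0.9
    return sigma, x
</syntaxhighlight>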
 
If <math>\sigma^2</math> is too small, the chain will ''mix slowly'' (i.e., the acceptance rate will be high, but successive samples will move around the space slowly, and the chain will converge only slowly to <math>P(x)</math>). On the other hand,
Line 162:
 
==References==
{{Reflist|refs=
<ref name="Hastings">{{Cite journal |last=Hastings |first=W.K. |year=1970 |title=Monte Carlo Sampling Methods Using Markov Chains and Their Applications |journal=[[Biometrika]] |volume=57 |issue=1 |pages=97&ndash;109 |bibcode=1970Bimka..57...97H |doi=10.1093/biomet/57.1.97 |jstor=2334940 |zbl=0219.65008}}</ref>
<ref name="Teller">Teller, Edward. ''Memoirs: A Twentieth-Century Journey in Science and Politics''. [[Perseus Publishing]], 2001, p. 328</ref>
<ref name="Barth">Rosenbluth, Marshall. [https://www.aip.org/history-programs/niels-bohr-library/oral-histories/28636-1 "Oral History Transcript"]. American Institute of Physics</ref>
<ref name="Gubernatis">{{Cite journal |last=Gubernatis |first=J.E. |year=2005 |title=Marshall Rosenbluth and the Metropolis Algorithm |url=https://zenodo.org/record/1231899 |journal=[[Physics of Plasmas]] |volume=12 |issue=5 |pages=057303 |bibcode=2005PhPl...12e7303G |doi=10.1063/1.1887186}}</ref>
<ref name="Rosenbluth">{{Cite journal |last=Rosenbluth |first=M.N. |year=2003 |title=Genesis of the Monte Carlo Algorithm for Statistical Mechanics |journal=[[AIP Conference Proceedings]] |volume=690 |pages=22–30 |bibcode=2003AIPC..690...22R |doi=10.1063/1.1632112}}</ref>
<!--<ref name="Dyson">{{Cite journal |last=Dyson |first=F. |year=2006 |title=Marshall N. Rosenbluth |journal=[[Proceedings of the American Philosophical Society]] |volume=250 |pages=404}}</ref>-->
<ref name="Roberts">{{Cite journal |last1=Roberts |first1=G.O. |last2=Gelman |first2=A. |last3=Gilks |first3=W.R. |year=1997 |title=Weak convergence and optimal scaling of random walk Metropolis algorithms |url=http://www.stat.columbia.edu/~gelman/research/published/theory7.ps |journal=[[Ann. Appl. Probab.]] |volume=7 |issue=1 |pages=110&ndash;120 |citeseerx=10.1.1.717.2582 |doi=10.1214/aoap/1034625254}}</ref>
<ref name="Roberts_Casella">{{Cite book |last1=Robert |first1=Christian |last2=Casella |first2=George |year=2004 |title=Monte Carlo Statistical Methods |url=https://archive.org/details/springer_10.1007-978-1-4757-4145-2 |publisher=Springer |isbn=978-0387212395}}</ref>
<ref name="Newman_Barkema">{{Cite book |last1=Newman |first1=M. E. J. |last2=Barkema |first2=G. T. |year=1999 |title=Monte Carlo Methods in Statistical Physics |publisher=Oxford University Press |___location=USA |isbn=978-0198517979}}</ref>
}}
==Notes==