Compared with an algorithm like [[adaptive rejection sampling]]<ref name=":0">{{Cite journal |title = Adaptive Rejection Sampling for Gibbs Sampling |jstor = 2347565 |journal = Journal of the Royal Statistical Society. Series C (Applied Statistics) |date = 1992-01-01 |pages = 337–348 |volume = 41 |issue = 2 |doi = 10.2307/2347565 |first1 = W. R. |last1 = Gilks |first2 = P. |last2 = Wild}}</ref> that directly generates independent samples from a distribution, Metropolis–Hastings and other MCMC algorithms have a number of disadvantages:
* The samples are correlated. Even though over the long term they do correctly follow <math>P(x)</math>, a set of nearby samples will be correlated with each other and not correctly reflect the distribution. This means that effective sample sizes can be significantly lower than the number of samples actually taken, leading to large statistical errors. If a set of independent samples is desired, the majority of samples must be thrown away, keeping only every ''n''th sample for some value of ''n'' (typically determined by examining the [[autocorrelation]] between adjacent samples). Autocorrelation can be reduced by increasing the ''jumping width'' (the average size of a jump, which is related to the variance of the jumping distribution), but this will also increase the likelihood of rejection of the proposed jump. Too large or too small a jumping size leads to a ''slow-mixing'' Markov chain, i.e. a highly correlated set of samples, so that a very large number of samples will be needed to obtain a reasonable estimate of any desired property of the distribution (see the sketch after this list).
* Although the Markov chain eventually converges to the desired distribution, the initial samples may follow a very different distribution, especially if the starting point is in a region of low density. As a result, a ''burn-in'' period is typically necessary,<ref>{{Cite book |title=Bayesian data analysis |date=2004 |publisher=Chapman & Hall / CRC |others=Gelman, Andrew |isbn=978-1584883883 |edition=2nd |___location=Boca Raton, Fla. |oclc=51991499}}</ref> in which an initial number of samples (e.g. the first 1000 or so) is thrown away.
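
Both of these issues can be seen in a minimal random-walk Metropolis sampler. The following sketch (which assumes, purely for illustration, a standard normal target, a starting point deliberately placed in a low-density region, and arbitrary choices of jumping width, burn-in length and thinning interval ''n'') discards a burn-in period, thins the remaining chain, and compares the lag-1 autocorrelation before and after thinning:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log-density of the target P(x); a standard normal,
    # chosen purely for illustration.
    return -0.5 * x**2

def random_walk_metropolis(n_samples, x0=10.0, jump_width=1.0):
    # x0 is deliberately placed in a low-density region so that the
    # initial samples follow a very different distribution (burn-in).
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        proposal = x + jump_width * rng.normal()  # symmetric jumping distribution
        # Metropolis acceptance rule for a symmetric proposal:
        # accept with probability min(1, P(proposal) / P(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

chain = random_walk_metropolis(50_000)

burn_in = 1_000   # discard the initial transient
n = 10            # keep every n-th sample to reduce autocorrelation
usable = chain[burn_in::n]

def lag1_autocorr(x):
    # Sample lag-1 autocorrelation, used here to choose and validate n.
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print(lag1_autocorr(chain[burn_in:]))  # high: adjacent samples are correlated
print(lag1_autocorr(usable))           # much lower after thinning
</syntaxhighlight>

Increasing <code>jump_width</code> in this sketch lowers the autocorrelation but raises the rejection rate, which is exactly the trade-off described above.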
 
On the other hand, most simple [[rejection sampling]] methods suffer from the "[[curse of dimensionality]]": the probability of rejection increases exponentially as a function of the number of dimensions. Metropolis–Hastings, along with other MCMC methods, does not suffer from this problem to the same degree, and is thus often the only solution available when the number of dimensions of the distribution to be sampled is high. As a result, MCMC methods are often the methods of choice for producing samples from [[hierarchical Bayesian model]]s and other high-dimensional statistical models now used in many disciplines.
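
A small numerical experiment makes this contrast concrete. The sketch below (a toy example in which the target is the uniform distribution on the unit ball and proposals are drawn uniformly from the enclosing hypercube) estimates the acceptance rate of naive rejection sampling as the dimension grows:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def acceptance_rate(dim, n_trials=100_000):
    # Rejection sampling of the uniform distribution on the unit ball,
    # proposing uniformly from the enclosing hypercube [-1, 1]^dim.
    points = rng.uniform(-1.0, 1.0, size=(n_trials, dim))
    inside = (points ** 2).sum(axis=1) <= 1.0
    return inside.mean()

for dim in (1, 2, 5, 10, 20):
    print(dim, acceptance_rate(dim))
# The acceptance rate equals the volume ratio of the ball to the cube,
# which decays faster than exponentially in the dimension; by dim = 20
# essentially every proposal is rejected.
</syntaxhighlight>

A random-walk MCMC sampler, by contrast, makes local moves whose acceptance probability does not collapse with dimension in this way, although its mixing time still grows with the difficulty of the problem.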