Metropolis–Hastings algorithm: Difference between revisions

Content deleted Content added
m fixed spelling typo
clarify, grammar, link
Line 15:
 
==Intuition==
The Metropolis–Hastings algorithm can draw samples from any [[probability distribution]] <math>P(x)</math>, provided that thewe value ofknow a function <math>f(x)</math> proportional to the [[Probability density function|density]] of <math>P</math> canand bethe calculated{{clarify|reason=Whatvalues exactly isof <math>f(x)?</math> Howcan dobe we determine it? How is it related to the P(x)?|date=May 2019}}calculated. The requirement that <math>f(x)</math> must only be proportional to the density, rather than exactly equal to it, makes the Metropolis–Hastings algorithm particularly useful, because calculating the necessary normalization factor is often extremely difficult in practice{{citation needed|date=May 2019}}.
 
The Metropolis–Hastings algorithm works by generating a sequence of sample values in such a way that, as more and more sample values are produced, the distribution of values more closely approximates the desired distribution <math>P(x)</math>. These sample values are produced iteratively, with the distribution of the next sample being dependent only on the current sample value (thus making the sequence of samples into a [[Markov chain]]). Specifically, at each iteration, the algorithm picks a candidate for the next sample value based on the current sample value. Then, with some probability, the candidate is either accepted (in which case the candidate value is used in the next iteration) or rejected (in which case the candidate value is discarded, and current value is reused in the next iteration)—the probability of acceptance is determined by comparing the values of the function <math>f(x)</math> of the current and candidate sample values with respect to the desired distribution <math>P(x)</math>.
Line 43:
On the other hand, most simple [[rejection sampling]] methods suffer from the "[[curse of dimensionality]]", where the probability of rejection increases exponentially as a function of the number of dimensions. Metropolis–Hastings, along with other MCMC methods, do not have this problem to such a degree, and thus are often the only solutions available when the number of dimensions of the distribution to be sampled is high. As a result, MCMC methods are often the methods of choice for producing samples from [[hierarchical Bayesian model]]s and other high-dimensional statistical models used nowadays in many disciplines.
 
In [[multivariate distribution|multivariate]] distributions, the classic Metropolis–Hastings algorithm as described above involves choosing a new multi-dimensional sample point. When the number of dimensions is high, finding the right jumping distribution to use can be difficult, as the different individual dimensions behave in very different ways, and the jumping width (see above) must be "just right" for all dimensions at once to avoid excessively slow mixing. An alternative approach that often works better in such situations, known as [[Gibbs sampling]], involves choosing a new sample for each dimension separately from the others, rather than choosing a sample for all dimensions at once. This is especially applicable when the multivariate distribution is composed out of a set of individual [[random variable]]s in which each variable is conditioned on only a small number of other variables, as is the case in most typical [[hierarchical Bayesian model|hierarchical model]]s. The individual variables are then sampled one at a time, with each variable conditioned on the most recent values of all the others. Various algorithms can be used to choose these individual samples, depending on the exact form of the multivariate distribution: some possibilities are the [[adaptive rejection sampling]] methods,<ref name=":0" /><ref>{{Cite journal|title = Concave-Convex Adaptive Rejection Sampling|journal = Journal of Computational and Graphical Statistics|date = 2011-01-01|issn = 1061-8600|pages = 670–691|volume = 20|issue = 3|doi = 10.1198/jcgs.2011.09058|first = Dilan|last = Görür|first2 = Yee Whye|last2 = Teh}}</ref><ref>{{Cite journal|title = A Rejection Technique for Sampling from T-concave Distributions|journal = ACM Trans. Math. Softw.|date = 1995-06-01|issn = 0098-3500|pages = 182–193|volume = 21|issue = 2|doi = 10.1145/203082.203089|first = Wolfgang|last = Hörmann|citeseerx = 10.1.1.56.6055}}</ref><ref>{{Cite journal|title = A generalization of the adaptive rejection sampling algorithm|journal = Statistics and Computing|date = 2010-08-25|issn = 0960-3174|pages = 633–647|volume = 21|issue = 4|doi = 10.1007/s11222-010-9197-9|first = Luca|last = Martino|first2 = Joaquín|last2 = Míguez|hdl = 10016/16624}}</ref> the adaptive rejection Metropolis sampling algorithm<ref>{{Cite journal|title = Adaptive Rejection Metropolis Sampling within Gibbs Sampling|jstor = 2986138|journal = Journal of the Royal Statistical Society. Series C (Applied Statistics)|date = 1995-01-01|pages = 455–472|volume = 44|issue = 4|doi = 10.2307/2986138|first = W. R.|last = Gilks|first2 = N. G.|last2 = Best|author2-link= Nicky Best |first3 = K. K. C.|last3 = Tan}}</ref> or its improvements<ref>{{Cite journal|title = Independent Doubly Adaptive Rejection Metropolis Sampling Within Gibbs Sampling|journal = IEEE Transactions on Signal Processing|date = 2015-06-01|issn = 1053-587X|pages = 3123–3138|volume = 63|issue = 12|doi = 10.1109/TSP.2015.2420537|first = L.|last = Martino|first2 = J.|last2 = Read|first3 = D.|last3 = Luengo|arxiv = 1205.5494|bibcode = 2015ITSP...63.3123M}}</ref><ref>{{Cite journal|title = Adaptive rejection Metropolis sampling using Lagrange interpolation polynomials of degree 2|journal = Computational Statistics & Data Analysis|date = 2008-03-15|pages = 3408–3423|volume = 52|issue = 7|doi = 10.1016/j.csda.2008.01.005|first = Renate|last = Meyer|first2 = Bo|last2 = Cai|first3 = François|last3 = Perron}}</ref> (see [http://a2rms.sourceforge.net matlab code]), a simple one-dimensional Metropolis–Hastings step, or [[slice sampling]].
 
==Formal derivation==