==Formal derivation==
The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a desired distribution <math>P(x)</math>. To accomplish this, the algorithm uses a [[Markov process]], which asymptotically reaches a unique [[Markov chain#Steady-state analysis and limiting distributions|stationary distribution]] <math>\pi(x)</math> such that <math>\pi(x) = P(x)</math>.<ref name=Roberts_Casella/>
 
A Markov process is uniquely defined by its transition probabilities, <math>P(x' | x)</math>, the probability of transitioning from any given state, <math>x</math>, to any other given state, <math>x'</math>. It has a unique stationary distribution <math>\pi(x)</math> when the following two conditions are met:<ref name=Roberts_Casella/>
# '''Existence of stationary distribution''': there must exist a stationary distribution <math>\pi(x)</math>. A sufficient but not necessary condition is [[Markov chain#Reversible Markov chain|detailed balance]], which requires that each transition <math>x \to x'</math> is reversible: for every pair of states <math>x, x'</math>, the probability of being in state <math>x</math> and transitioning to state <math>x'</math> must be equal to the probability of being in state <math>x'</math> and transitioning to state <math>x</math>, <math>\pi(x) P(x' | x) = \pi(x') P(x | x')</math>.
# '''Uniqueness of stationary distribution''': the stationary distribution <math>\pi(x)</math> must be unique. This is guaranteed by [[Markov Chain#Ergodicity|ergodicity]] of the Markov process, which requires that every state must (1) be aperiodic—the system does not return to the same state at fixed intervals; and (2) be positive recurrent—the expected number of steps for returning to the same state is finite.
 
The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition probabilities) that fulfills the two above conditions, such that its stationary distribution <math>\pi(x)</math> is chosen to be <math>P(x)</math>. The derivation of the algorithm starts with the condition of detailed balance:
 
: <math>P(x' | x) P(x) = P(x | x') P(x'),</math>
 
which is re-written as
 
: <math>\frac{P(x' | x)}{P(x | x')} = \frac{P(x')}{P(x)}.</math>
 
The approach is to separate the transition into two sub-steps: the proposal and the acceptance-rejection. The '''proposal distribution''' <math>\displaystyle g(x' | x)</math> is the conditional probability of proposing a state <math>x'</math> given <math>x</math>, and the '''acceptance ratio''' <math>\displaystyle A(x' , x)</math> is the probability of accepting the proposed state <math>x'</math>. The transition probability can be written as the product of them:
 
: <math>P(x'|x) = g(x' | x) A(x' , x).</math>
 
Inserting this relation into the previous equation, we have
 
: <math>\frac{A(x' , x)}{A(x , x')} = \frac{P(x')}{P(x)}\frac{g(x | x')}{g(x' | x)}.</math>
 
The next step in the derivation is to choose an acceptance ratio that fulfills the condition above. One common choice is the Metropolis choice:
 
: <math>A(x' , x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x | x')}{g(x' | x)}\right).</math>
 
For this Metropolis acceptance ratio <math>A</math>, either <math>A(x', x) = 1</math> or <math>A(x, x') = 1</math> and, either way, the condition is satisfied.
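The case check behind this statement can be written out explicitly. Suppose, without loss of generality, that <math>P(x')\,g(x|x') \ge P(x)\,g(x'|x)</math>; the sketch below (in plain LaTeX rather than wiki markup) then recovers the detailed-balance ratio derived above:

```latex
% Case check for the Metropolis choice.
% Assume P(x') g(x|x') >= P(x) g(x'|x). Then the ratio inside the min
% is at least 1 for the forward move, so A(x', x) = 1, while the
% reverse move takes the reciprocal ratio:
A(x', x) = 1, \qquad
A(x, x') = \frac{P(x)}{P(x')}\,\frac{g(x'|x)}{g(x|x')} \le 1.
% Dividing the two recovers exactly the required condition:
\frac{A(x', x)}{A(x, x')}
  = \frac{P(x')}{P(x)}\,\frac{g(x|x')}{g(x'|x)}.
% The opposite case follows by exchanging x and x'.
```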
 
The Metropolis–Hastings algorithm thus consists of the following:
 
# Initialise
## Pick an initial state <math>x_0</math>.
## Set <math>t = 0</math>.
# Iterate
## '''Generate''': generate a random candidate state <math>x'</math> according to <math>g(x' | x_t)</math>.
## '''Calculate''': calculate the acceptance probability <math display="inline">A(x' , x_t) = \min\left(1, \frac{P(x')}{P(x_t)} \frac{g(x_t | x')}{g(x' | x_t)}\right)</math>.
## '''Accept or reject''':
### generate a uniform random number <math>u \in [0, 1]</math>;
### if <math>u \le A(x' , x_t)</math>, then ''accept'' the new state and set <math>x_{t+1} = x'</math>;
### if <math>u > A(x' , x_t)</math>, then ''reject'' the new state, and copy the old state forward <math>x_{t+1} = x_{t}</math>.
## '''Increment''': set <math display="inline">t = t + 1</math>.
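The steps above can be sketched in a few lines of code. This is a minimal illustration, not part of the article: it assumes a standard normal target <math>P(x)</math> (known only up to a normalising constant) and a symmetric Gaussian random-walk proposal, so the proposal ratio <math>g(x_t | x')/g(x' | x_t)</math> cancels to 1.

```python
import math
import random

def target(x):
    # Unnormalised density of the standard normal target P(x);
    # the normalising constant cancels in the acceptance ratio.
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Generate: propose x' ~ g(x' | x_t), here a symmetric Gaussian step.
        x_new = x + rng.gauss(0.0, step)
        # Calculate: acceptance probability A(x', x_t); the proposal
        # ratio is 1 because the Gaussian step is symmetric.
        a = min(1.0, target(x_new) / target(x))
        # Accept or reject with a uniform random number u.
        if rng.random() <= a:
            x = x_new  # accept
        # On rejection the old state is copied forward.
        samples.append(x)
    return samples

samples = metropolis_hastings(50_000)
```

Because the chain only ever evaluates ratios of <math>P</math>, the target may be known up to a constant, which is what makes the method useful in Bayesian inference.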
 
Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to effectively estimate <math>P(x)</math> depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" ''In Bayesian Statistics 4''. 1992.</ref> For distributions on discrete state spaces, it has to be of the order of the [[autocorrelation]] time of the Markov process.<ref name=Newman_Barkema/>
 
It is important to notice that, in a general problem, it is not clear which distribution <math>\displaystyle g(x' | x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
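A hypothetical illustration of this tuning problem (not from the article): for a random-walk proposal, the width of the proposal distribution directly controls the acceptance rate. The snippet below, assuming a standard normal target, compares a very narrow and a very wide Gaussian proposal.

```python
import math
import random

def acceptance_rate(step, n=20_000, seed=1):
    # Fraction of accepted moves for a Gaussian random-walk proposal
    # of width `step`, with a standard normal target.
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        x_new = x + rng.gauss(0.0, step)
        # Acceptance ratio for the symmetric proposal:
        # P(x') / P(x) = exp(-(x'^2 - x^2) / 2).
        if rng.random() <= min(1.0, math.exp(0.5 * (x * x - x_new * x_new))):
            x = x_new
            accepted += 1
    return accepted / n

small = acceptance_rate(0.1)   # tiny steps: almost always accepted
large = acceptance_rate(10.0)  # huge steps: mostly rejected
```

Tiny steps are almost always accepted but explore the space slowly; huge steps are mostly rejected, so the chain stalls. Either extreme inflates the autocorrelation time, which is why the proposal width must be tuned to the problem.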
 
== Use in numerical integration ==