==Formal derivation==
The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a desired distribution <math>P(x)</math>. To accomplish this, the algorithm uses a [[Markov process]], which asymptotically reaches a unique [[Markov chain#Steady-state analysis and limiting distributions|stationary distribution]] <math>\pi(x)</math> such that <math>\pi(x) = P(x)</math>.<ref name=Roberts_Casella/>
 
A Markov process is uniquely defined by its transition probabilities, <math>P(x' | x)</math>, the probability of transitioning from any given state, <math>x</math>, to any other given state, <math>x'</math>. It has a unique stationary distribution <math>\pi(x)</math> when the following two conditions are met:<ref name=Roberts_Casella/>
# '''Existence of stationary distribution''': there must exist a stationary distribution <math>\pi(x)</math>. A sufficient but not necessary condition is [[Markov chain#Reversible Markov chain|detailed balance]], which requires that each transition <math>x \to x'</math> is reversible: for every pair of states <math>x, x'</math>, the probability of being in state <math>x</math> and transitioning to state <math>x'</math> must be equal to the probability of being in state <math>x'</math> and transitioning to state <math>x</math>, <math>\pi(x) P(x' | x) = \pi(x') P(x | x')</math>.
# '''Uniqueness of stationary distribution''': the stationary distribution <math>\pi(x)</math> must be unique. This is guaranteed by [[Markov Chain#Ergodicity|ergodicity]] of the Markov process, which requires that every state must (1) be aperiodic—the system does not return to the same state at fixed intervals; and (2) be positive recurrent—the expected number of steps for returning to the same state is finite.
 
The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition probabilities) that fulfills the two above conditions, such that its stationary distribution <math>\pi(x)</math> is chosen to be <math>P(x)</math>. The derivation of the algorithm starts with the condition of detailed balance:
 
: <math>P(x' | x) P(x) = P(x | x') P(x'),</math>
 
which is re-written as
 
: <math>\frac{P(x' | x)}{P(x | x')} = \frac{P(x')}{P(x)}.</math>
 
The approach is to separate the transition into two sub-steps: the proposal and the acceptance-rejection. The '''proposal distribution''' <math>\displaystyle g(x' | x)</math> is the conditional probability of proposing a state <math>x'</math> given <math>x</math>, and the '''acceptance ratio''' <math>\displaystyle A(x' , x)</math> is the probability of accepting the proposed state <math>x'</math>. The transition probability can be written as the product of them:
 
: <math>P(x'|x) = g(x' | x) A(x' , x).</math>
 
Inserting this relation into the previous equation, we have
 
: <math>\frac{A(x' , x)}{A(x , x')} = \frac{P(x')}{P(x)}\frac{g(x | x')}{g(x' | x)}.</math>
 
The next step in the derivation is to choose an acceptance ratio that fulfills the condition above. One common choice is the Metropolis choice:
 
: <math>A(x' , x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x | x')}{g(x' | x)}\right).</math>
 
For this Metropolis acceptance ratio <math>A</math>, either <math>A(x', x) = 1</math> or <math>A(x, x') = 1</math> and, either way, the condition is satisfied.
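The case check behind this statement can be written out explicitly. Suppose, without loss of generality, that <math>P(x')\,g(x|x') \ge P(x)\,g(x'|x)</math>; the sketch below (in plain LaTeX rather than wiki markup) then recovers the detailed-balance ratio derived above:

```latex
% Case check for the Metropolis choice.
% Assume P(x') g(x|x') >= P(x) g(x'|x). Then the ratio inside the min
% is at least 1 for the forward move, so A(x', x) = 1, while the
% reverse move takes the reciprocal ratio:
A(x', x) = 1, \qquad
A(x, x') = \frac{P(x)}{P(x')}\,\frac{g(x'|x)}{g(x|x')} \le 1.
% Dividing the two recovers exactly the required condition:
\frac{A(x', x)}{A(x, x')}
  = \frac{P(x')}{P(x)}\,\frac{g(x|x')}{g(x'|x)}.
% The opposite case follows by exchanging x and x'.
```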
 
The Metropolis–Hastings algorithm thus consists of the following:
 
# Initialise
## Pick an initial state <math>x_0</math>.
## Set <math>t = 0</math>.
# Iterate
## '''Generate''': generate a random candidate state <math>x'</math> according to <math>g(x' | x_t)</math>.
## '''Calculate''': calculate the acceptance probability <math display="inline">A(x' , x_t) = \min\left(1, \frac{P(x')}{P(x_t)} \frac{g(x_t | x')}{g(x' | x_t)}\right)</math>.
## '''Accept or reject''':
### generate a uniform random number <math>u \in [0, 1]</math>;
### if <math>u \le A(x' , x_t)</math>, then ''accept'' the new state and set <math>x_{t+1} = x'</math>;
### if <math>u > A(x' , x_t)</math>, then ''reject'' the new state, and copy the old state forward <math>x_{t+1} = x_{t}</math>.
## '''Increment''': set <math display="inline">t = t + 1</math>.
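The steps above can be sketched in a few lines of code. This is a minimal illustration, not part of the article: it assumes a standard normal target <math>P(x)</math> (known only up to a normalising constant) and a symmetric Gaussian random-walk proposal, so the proposal ratio <math>g(x_t | x')/g(x' | x_t)</math> cancels to 1.

```python
import math
import random

def target(x):
    # Unnormalised density of the standard normal target P(x);
    # the normalising constant cancels in the acceptance ratio.
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Generate: propose x' ~ g(x' | x_t), here a symmetric Gaussian step.
        x_new = x + rng.gauss(0.0, step)
        # Calculate: acceptance probability A(x', x_t); the proposal
        # ratio is 1 because the Gaussian step is symmetric.
        a = min(1.0, target(x_new) / target(x))
        # Accept or reject with a uniform random number u.
        if rng.random() <= a:
            x = x_new  # accept
        # On rejection the old state is copied forward.
        samples.append(x)
    return samples

samples = metropolis_hastings(50_000)
```

Because the chain only ever evaluates ratios of <math>P</math>, the target may be known up to a constant, which is what makes the method useful in Bayesian inference.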
 
Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to effectively estimate <math>P(x)</math> depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" ''In Bayesian Statistics 4''. 1992.</ref> For distributions on discrete state spaces, it has to be of the order of the [[autocorrelation]] time of the Markov process.<ref name=Newman_Barkema/>
 
It is important to notice that, in a general problem, it is not clear which distribution <math>\displaystyle g(x' | x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
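A hypothetical illustration of this tuning problem (not from the article): for a random-walk proposal, the width of the proposal distribution directly controls the acceptance rate. The snippet below, assuming a standard normal target, compares a very narrow and a very wide Gaussian proposal.

```python
import math
import random

def acceptance_rate(step, n=20_000, seed=1):
    # Fraction of accepted moves for a Gaussian random-walk proposal
    # of width `step`, with a standard normal target.
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        x_new = x + rng.gauss(0.0, step)
        # Acceptance ratio for the symmetric proposal:
        # P(x') / P(x) = exp(-(x'^2 - x^2) / 2).
        if rng.random() <= min(1.0, math.exp(0.5 * (x * x - x_new * x_new))):
            x = x_new
            accepted += 1
    return accepted / n

small = acceptance_rate(0.1)   # tiny steps: almost always accepted
large = acceptance_rate(10.0)  # huge steps: mostly rejected
```

Tiny steps are almost always accepted but explore the space slowly; huge steps are mostly rejected, so the chain stalls. Either extreme inflates the autocorrelation time, which is why the proposal width must be tuned to the problem.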
 
== Use in numerical integration ==