# Initialization: Choose an arbitrary point <math>x_t</math> to be the first sample and choose an arbitrary probability density <math>g(x|y)</math> (sometimes written <math>Q(x|y)</math>) that suggests a candidate for the next sample value <math>x</math>, given the previous sample value <math>y</math>. In this section, <math>g</math> is assumed to be symmetric; in other words, it must satisfy <math>g(x|y) = g(y|x)</math>. A usual choice is to let <math>g(x|y)</math> be a [[Gaussian distribution]] centered at <math>y</math>, so that points closer to <math>y</math> are more likely to be visited next, making the sequence of samples into a [[random walk]]. The function <math>g</math> is referred to as the ''proposal density'' or ''jumping distribution''.
# For each iteration ''t'':
#* ''Generate'' a candidate <math>x'</math> for the next sample by picking from the distribution <math>g(x'|x_t)</math>.
#* ''Calculate'' the ''acceptance ratio'' <math>\alpha = f(x')/f(x_t)</math>, which will be used to decide whether to accept or reject the candidate. Because ''f'' is proportional to the density of ''P'', we have that <math>\alpha = f(x')/f(x_t) = P(x')/P(x_t)</math>.
#* ''Accept or reject'':
#** ''Generate'' a uniform random number <math>u \in [0, 1]</math>;
#** If <math>u \le \alpha</math>, ''accept'' the candidate by setting <math>x_{t+1} = x'</math>;
#** If <math>u > \alpha</math>, ''reject'' the candidate and set <math>x_{t+1} = x_t</math> instead.
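The iteration above can be sketched in Python. This is a minimal illustration, not a reference implementation; the standard-normal target, the parameter values, and the function names are assumptions made for the example:

```python
import math
import random

random.seed(0)

def metropolis(log_f, x0, sigma, n_samples):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.

    log_f returns the log of a function f proportional to the target
    density P; working with logs avoids numerical under/overflow, and the
    acceptance ratio alpha = f(x')/f(x_t) becomes
    exp(log_f(x') - log_f(x_t)).
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        # Generate: candidate drawn from a Gaussian centred at the
        # current state, so g(x'|x) = g(x|x') (symmetric proposal).
        x_new = x + random.gauss(0.0, sigma)
        # Calculate: acceptance ratio alpha = f(x')/f(x_t), in log form.
        log_alpha = log_f(x_new) - log_f(x)
        # Accept or reject: accept with probability min(1, alpha).
        if random.random() < math.exp(min(0.0, log_alpha)):
            x = x_new
        samples.append(x)
    return samples

# Example target (an assumption for the demo): a standard normal,
# f(x) ∝ exp(-x²/2), so log f(x) = -x²/2 up to an additive constant.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, sigma=1.0, n_samples=20000)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
```

Because the proposal here is symmetric, the proposal densities cancel and only the ratio <math>f(x')/f(x_t)</math> is needed; the unknown normalising constant of <math>P</math> cancels in the same way.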
The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a desired distribution <math>P(x)</math>. To accomplish this, the algorithm uses a [[Markov process]], which asymptotically reaches a unique [[Markov chain#Steady-state analysis and limiting distributions|stationary distribution]] <math>\pi(x)</math> such that <math>\pi(x) = P(x)</math>.<ref name=Roberts_Casella/>
A Markov process is uniquely defined by its transition probabilities <math>P(x'|x)</math>, the probability of transitioning from any given state <math>x</math> to any other given state <math>x'</math>. It has a unique stationary distribution <math>\pi(x)</math> when the following two conditions are met:
# ''Existence of stationary distribution'': there must exist a stationary distribution <math>\pi(x)</math>. A sufficient but not necessary condition is [[Markov chain#Reversible Markov chain|detailed balance]], which requires that each transition <math>x \to x'</math> is reversible: for every pair of states <math>x, x'</math>, the probability of being in state <math>x</math> and transitioning to state <math>x'</math> must be equal to the probability of being in state <math>x'</math> and transitioning to state <math>x</math>, <math>\pi(x) P(x'|x) = \pi(x') P(x|x')</math>.
# ''Uniqueness of stationary distribution'': the stationary distribution <math>\pi(x)</math> must be unique. This is guaranteed by [[Markov Chain#Ergodicity|ergodicity]] of the Markov process, which requires that every state must (1) be aperiodic—the system does not return to the same state at fixed intervals; and (2) be positive recurrent—the expected number of steps for returning to the same state is finite.
The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition probabilities) that fulfills the two above conditions, such that its stationary distribution <math>\pi(x)</math> is chosen to be <math>P(x)</math>. The derivation of the algorithm starts with the condition of detailed balance:
: <math>P(x'|x) P(x) = P(x|x') P(x'),</math>
which is re-written as
: <math>\frac{P(x'|x)}{P(x|x')} = \frac{P(x')}{P(x)}.</math>
The approach is to separate the transition into two sub-steps: the proposal and the acceptance-rejection. The ''proposal distribution'' <math>g(x'|x)</math> is the conditional probability of proposing a state <math>x'</math> given <math>x</math>, and the ''acceptance distribution'' <math>A(x', x)</math> is the probability of accepting the proposed state <math>x'</math>. The transition probability can be written as their product:
: <math>P(x'|x) = g(x'|x) A(x', x).</math>
Inserting this relation in the previous equation, we have
: <math>\frac{A(x', x)}{A(x, x')} = \frac{P(x')}{P(x)}\frac{g(x|x')}{g(x'|x)}.</math>
The next step in the derivation is to choose an acceptance ratio that fulfills the condition above. One common choice is the Metropolis choice:
: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x|x')}{g(x'|x)}\right).</math>
For this Metropolis acceptance ratio <math>A</math>, either <math>A(x', x) = 1</math> or <math>A(x, x') = 1</math> and, either way, the condition is satisfied.
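That the Metropolis choice satisfies detailed balance can also be checked numerically. The sketch below uses an arbitrary, made-up three-state target <math>P</math> and proposal <math>g</math> (both assumptions for the example) and verifies <math>P(x)\,g(x'|x)\,A(x',x) = P(x')\,g(x|x')\,A(x,x')</math> for every pair of states:

```python
from itertools import product

# Made-up three-state example: target probabilities P and a
# row-stochastic proposal matrix g[x][x_new] (arbitrary assumptions).
P = [0.2, 0.5, 0.3]
g = [[0.10, 0.60, 0.30],
     [0.40, 0.20, 0.40],
     [0.50, 0.25, 0.25]]

def A(x_new, x):
    # Metropolis choice: A(x', x) = min(1, [P(x')/P(x)] [g(x|x')/g(x'|x)])
    return min(1.0, (P[x_new] / P[x]) * (g[x_new][x] / g[x][x_new]))

# Detailed balance: P(x) g(x'|x) A(x', x) = P(x') g(x|x') A(x, x')
# must hold for every ordered pair of states.
for x, x_new in product(range(3), repeat=2):
    lhs = P[x] * g[x][x_new] * A(x_new, x)
    rhs = P[x_new] * g[x_new][x] * A(x, x_new)
    assert abs(lhs - rhs) < 1e-12
```

The loop also illustrates the remark above: for each pair, at least one of <math>A(x', x)</math> and <math>A(x, x')</math> equals 1.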
## Set <math>t = 0</math>.
# Iterate
## ''Generate'' a random candidate state <math>x'</math> according to <math>g(x'|x_t)</math>;
## ''Calculate'' the acceptance probability <math>A(x', x_t) = \min\left(1, \frac{P(x')}{P(x_t)} \frac{g(x_t|x')}{g(x'|x_t)}\right)</math>;
## ''Accept or reject'':
### generate a uniform random number <math>u \in [0, 1]</math>;
### if <math>u \le A(x', x_t)</math>, ''accept'' the new state and set <math>x_{t+1} = x'</math>;
### if <math>u > A(x', x_t)</math>, ''reject'' the new state, and copy the old state forward: <math>x_{t+1} = x_t</math>.
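When the proposal is not symmetric, the correction factor <math>g(x_t|x')/g(x'|x_t)</math> matters. The sketch below uses a log-normal proposal on the positive reals (the exponential target, the proposal, and all parameter values are assumptions for the example); for this proposal the Gaussian kernel in log-space is symmetric, so the correction reduces to the Jacobian ratio <math>x'/x_t</math>:

```python
import math
import random

random.seed(1)

def mh_lognormal(log_f, x0, sigma, n_samples):
    """Metropolis–Hastings with an asymmetric log-normal proposal on (0, ∞).

    g(x'|x) is log-normal centred at log x; the Gaussian kernel in
    log-space cancels in the ratio g(x_t|x')/g(x'|x_t), leaving only
    the Jacobian factor x'/x_t.
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = x * math.exp(random.gauss(0.0, sigma))  # always positive
        # log A = [log f(x') - log f(x)] + log(g(x|x')/g(x'|x))
        log_a = log_f(x_new) - log_f(x) + math.log(x_new / x)
        if random.random() < math.exp(min(0.0, log_a)):
            x = x_new
        samples.append(x)
    return samples

# Example target (an assumption): exponential, f(x) ∝ exp(-x) on x > 0,
# whose mean is 1; the sample mean of the chain should approach it.
chain = mh_lognormal(lambda x: -x, x0=1.0, sigma=0.8, n_samples=30000)
mean = sum(chain) / len(chain)
```

Omitting the <math>x'/x_t</math> term would bias the sampler toward small values, which is why the Hastings correction is required for asymmetric proposals.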
Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to effectively estimate <math>P(x)</math> depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" ''In Bayesian Statistics 4''. 1992.</ref> For distributions on discrete state spaces, it must be of the order of the [[autocorrelation]] time of the Markov process.<ref name=Newman_Barkema/>
Note that, in a general problem, it is not clear which distribution <math>g(x'|x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
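The autocorrelation time mentioned above can be estimated directly from a chain. The sketch below uses a simple truncated-sum heuristic (an assumption for illustration; more careful estimators exist) and demonstrates it on a synthetic AR(1) series whose true integrated autocorrelation time is (1 + 0.9)/(1 - 0.9) = 19:

```python
import random

def integrated_autocorr_time(chain, max_lag=200):
    """Crude integrated autocorrelation time: tau = 1 + 2 * sum of rho(k),
    truncating the sum at the first non-positive autocorrelation
    (a simple heuristic, not a rigorous estimator)."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain) / n
    tau = 1.0
    for k in range(1, min(max_lag, n - 1)):
        # Empirical lag-k autocorrelation rho(k).
        rho = sum((chain[i] - mean) * (chain[i + k] - mean)
                  for i in range(n - k)) / ((n - k) * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return tau

# Synthetic correlated series (AR(1) with coefficient 0.9, an assumption
# for illustration); roughly tau consecutive samples carry the
# information of one independent draw.
random.seed(4)
series = [0.0]
for _ in range(5000):
    series.append(0.9 * series[-1] + random.gauss(0.0, 1.0))
tau = integrated_autocorr_time(series)
```

For independent samples the estimate is close to 1; the larger <math>\tau</math> is, the more iterations are needed for a given estimation accuracy.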
== Use in numerical integration ==
: <math>
P(E) = \int_\Omega P(E|x) P(x) \, dx,
</math>
and, thus, estimating <math>P(E)</math> can be accomplished by estimating the expected value of the [[indicator function]] <math>A_E(x) \equiv \mathbf{1}_E(x)</math>, which is 1 when <math>E(x) \in [E, E + \Delta E]</math> and zero otherwise.
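Concretely, the indicator-function estimate of <math>P(E)</math> is the fraction of samples whose energy falls in the interval. In the sketch below, the standard-normal samples, the energy function <math>E(x) = x^2</math>, and the interval are assumptions made for the example; in practice the samples would come from a Metropolis–Hastings chain:

```python
import random

random.seed(2)

# Stand-in samples from P(x): a standard normal, drawn directly here for
# brevity (in practice these would be Metropolis–Hastings samples).
samples = [random.gauss(0.0, 1.0) for _ in range(100000)]

def indicator(x, e_lo, e_hi):
    # 1_E(x): equals 1 when the "energy" E(x) = x² lies in [e_lo, e_hi).
    return 1.0 if e_lo <= x * x < e_hi else 0.0

# Monte Carlo estimate of P(E): the sample mean of the indicator.
e_lo, e_hi = 1.0, 2.0
p_hat = sum(indicator(x, e_lo, e_hi) for x in samples) / len(samples)
```

For this particular target the exact value is <math>2(\Phi(\sqrt{2}) - \Phi(1)) \approx 0.160</math>, so the estimate should land close to 0.16.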
==Step-by-step instructions==
Suppose that the most recent value sampled is <math>x_t</math>. To follow the Metropolis–Hastings algorithm, we next draw a new proposal state <math>x'</math> with probability density <math>g(x'|x_t)</math> and calculate a value
: <math>a = a_1 a_2,</math>
where
: <math>a_1 = \frac{P(x')}{P(x_t)}</math>
is the probability (e.g., Bayesian posterior) ratio between the proposed sample <math>x'</math> and the previous sample <math>x_t</math>, and
: <math>a_2 = \frac{g(x_t|x')}{g(x'|x_t)}</math>
is the ratio of the proposal density in two directions (from <math>x_t</math> to <math>x'</math> and conversely).
Then the new state <math>x_{t+1}</math> is chosen according to the following rules.
: If <math>a \geq 1{:}</math>
:: <math>x_{t+1} = x',</math>
: else:
:: <math>x_{t+1} = \begin{cases} x' & \text{with probability } a, \\ x_t & \text{with probability } 1-a. \end{cases}</math>
These samples, which are discarded, are known as ''burn-in''. The remaining set of accepted values of <math>x</math> represent a [[Sample (statistics)|sample]] from the distribution <math>P(x)</math>.
The algorithm works best if the proposal density matches the shape of the target distribution <math>P(x)</math>, from which direct sampling is difficult, that is <math>g(x'|x_t) \approx P(x')</math>.
If a Gaussian proposal density <math>g</math> is used, the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', which is the fraction of proposed samples that is accepted in a window of the last <math>N</math> samples.
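One simple way to automate this tuning during burn-in is to adjust <math>\sigma</math> after each window of proposals based on the observed acceptance rate. In the sketch below, the target acceptance rate of roughly 0.44 for one-dimensional targets is a common rule of thumb, and the multiplicative update is an assumed heuristic, not part of the algorithm itself:

```python
import math
import random

random.seed(3)

def tune_sigma(log_f, x0, sigma0, target=0.44, window=200, rounds=25):
    """Burn-in tuning of the Gaussian proposal scale sigma.

    After each window of proposals, sigma grows if the acceptance rate
    exceeds the target and shrinks otherwise (a crude heuristic sketch,
    not a full adaptive-MCMC scheme).
    """
    x, sigma = x0, sigma0
    for _ in range(rounds):
        accepted = 0
        for _ in range(window):
            # Ordinary Metropolis step with the current sigma.
            x_new = x + random.gauss(0.0, sigma)
            if random.random() < math.exp(min(0.0, log_f(x_new) - log_f(x))):
                x = x_new
                accepted += 1
        # Acceptance rate over the last `window` proposals.
        rate = accepted / window
        sigma *= math.exp(rate - target)  # multiplicative update
    return sigma

# Example (an assumption): start far too wide for a standard-normal
# target; sigma should shrink toward the well-mixing range.
sigma = tune_sigma(lambda x: -0.5 * x * x, x0=0.0, sigma0=20.0)
```

A proposal that is too wide is rejected almost always (the chain stalls), while one that is too narrow is accepted almost always but moves slowly; targeting an intermediate acceptance rate balances the two.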