Metropolis–Hastings algorithm
# Initialization: Choose an arbitrary point <math>x_t</math> to be the first sample and choose an arbitrary probability density <math>g(x|y)</math> (sometimes written <math>Q(x|y)</math>) that suggests a candidate for the next sample value <math>x</math>, given the previous sample value <math>y</math>. In this section, <math>g</math> is assumed to be symmetric; in other words, it must satisfy <math>g(x|y) = g(y|x)</math>. A usual choice is to let <math>g(x|y)</math> be a [[Gaussian distribution]] centered at <math>y</math>, so that points closer to <math>y</math> are more likely to be visited next, making the sequence of samples into a [[random walk]]. The function <math>g</math> is referred to as the ''proposal density'' or ''jumping distribution''.
# For each iteration ''t'':
#* ''Generate'' a candidate <math>x'</math> for the next sample by picking from the distribution <math>g(x' \mid x_t)</math>.
#* ''Calculate'' the ''acceptance ratio'' <math>\alpha = f(x')/f(x_t)</math>, which will be used to decide whether to accept or reject the candidate. Because ''f'' is proportional to the density of ''P'', we have that <math>\alpha = f(x')/f(x_t) = P(x')/P(x_t)</math>.
#* ''Accept or reject'':
#** Generate a uniform random number <math>u \in [0, 1]</math>.
#** If <math>u \le \alpha</math>, ''accept'' the candidate by setting <math>x_{t+1} = x'</math>.
#** If <math>u > \alpha</math>, ''reject'' the candidate and set <math>x_{t+1} = x_t</math> instead.
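The iteration above can be sketched in Python. This is a minimal illustration, not a reference implementation: the unnormalized Gaussian target, the starting point, and the proposal width are all hypothetical choices, and the proposal is symmetric so the ratio of proposal densities cancels.

```python
import math
import random

def metropolis(f, x0, sigma, n_samples):
    """Metropolis sampling with a symmetric Gaussian proposal.

    f      : unnormalized target density (proportional to P)
    x0     : arbitrary starting point
    sigma  : proposal standard deviation (a tuning parameter)
    """
    x = x0
    samples = [x]
    for _ in range(n_samples):
        # Propose from g(x'|x) = Normal(x, sigma); g is symmetric, so it cancels.
        x_new = random.gauss(x, sigma)
        # Acceptance ratio alpha = f(x')/f(x) = P(x')/P(x).
        alpha = f(x_new) / f(x)
        # Accept with probability min(1, alpha); this comparison does exactly that.
        if random.random() < alpha:
            x = x_new
        samples.append(x)
    return samples

# Hypothetical example: sample from an unnormalized standard normal density.
random.seed(0)
target = lambda x: math.exp(-0.5 * x * x)
chain = metropolis(target, x0=0.0, sigma=1.0, n_samples=10000)
```

The sample mean and variance of `chain` should approximate those of the standard normal (0 and 1), up to Monte Carlo error.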
The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a desired distribution <math>P(x)</math>. To accomplish this, the algorithm uses a [[Markov process]], which asymptotically reaches a unique [[Markov chain#Steady-state analysis and limiting distributions|stationary distribution]] <math>\pi(x)</math> such that <math>\pi(x) = P(x)</math>.<ref name=Roberts_Casella/>
 
A Markov process is uniquely defined by its transition probabilities <math>P(x' \mid x)</math>, the probability of transitioning from any given state <math>x</math> to any other given state <math>x'</math>. It has a unique stationary distribution <math>\pi(x)</math> when the following two conditions are met:<ref name=Roberts_Casella/>
# ''Existence of stationary distribution'': there must exist a stationary distribution <math>\pi(x)</math>. A sufficient but not necessary condition is [[Markov chain#Reversible Markov chain|detailed balance]], which requires that each transition <math>x \to x'</math> is reversible: for every pair of states <math>x, x'</math>, the probability of being in state <math>x</math> and transitioning to state <math>x'</math> must be equal to the probability of being in state <math>x'</math> and transitioning to state <math>x</math>, <math>\pi(x) P(x' \mid x) = \pi(x') P(x \mid x')</math>.
# ''Uniqueness of stationary distribution'': the stationary distribution <math>\pi(x)</math> must be unique. This is guaranteed by [[Markov Chain#Ergodicity|ergodicity]] of the Markov process, which requires that every state must (1) be aperiodic—the system does not return to the same state at fixed intervals; and (2) be positive recurrent—the expected number of steps for returning to the same state is finite.
 
The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition probabilities) that fulfills the two above conditions, such that its stationary distribution <math>\pi(x)</math> is chosen to be <math>P(x)</math>. The derivation of the algorithm starts with the condition of detailed balance:
 
: <math>P(x' \mid x) P(x) = P(x \mid x') P(x'),</math>
 
which is re-written as
 
: <math>\frac{P(x' \mid x)}{P(x \mid x')} = \frac{P(x')}{P(x)}.</math>
 
The approach is to separate the transition into two sub-steps: the proposal and the acceptance-rejection. The ''proposal distribution'' <math>g(x' \mid x)</math> is the conditional probability of proposing a state <math>x'</math> given <math>x</math>, and the ''acceptance ratio'' <math>A(x', x)</math> is the probability to accept the proposed state <math>x'</math>. The transition probability can be written as their product:
 
: <math>P(x' \mid x) = g(x' \mid x) A(x', x).</math>
 
Inserting this relation in the previous equation, we have
 
: <math>\frac{A(x', x)}{A(x, x')} = \frac{P(x')}{P(x)}\frac{g(x \mid x')}{g(x' \mid x)}.</math>
 
The next step in the derivation is to choose an acceptance ratio that fulfills the condition above. One common choice is the Metropolis choice:
 
: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x \mid x')}{g(x' \mid x)}\right).</math>
 
For this Metropolis acceptance ratio <math>A</math>, either <math>A(x', x) = 1</math> or <math>A(x, x') = 1</math> and, either way, the condition is satisfied.
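This property can be checked numerically on a small discrete chain: with the Metropolis choice of <math>A</math>, the resulting transition probabilities satisfy detailed balance for any proposal. The three-state target and proposal matrix below are arbitrary illustrative values, not anything from the article.

```python
import itertools

# Hypothetical three-state target distribution pi = P and proposal matrix g,
# where g[x][x_new] is the probability of proposing state x_new from state x.
P = [0.2, 0.5, 0.3]
g = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.25, 0.25]]

def A(x_new, x):
    """Metropolis acceptance ratio min(1, P(x')g(x|x') / (P(x)g(x'|x)))."""
    return min(1.0, P[x_new] * g[x_new][x] / (P[x] * g[x][x_new]))

def T(x_new, x):
    """Off-diagonal transition probability: propose x_new, then accept it."""
    return g[x][x_new] * A(x_new, x)

# Detailed balance: pi(x) T(x'|x) == pi(x') T(x|x') for every pair of states.
for x, x_new in itertools.permutations(range(3), 2):
    lhs = P[x] * T(x_new, x)
    rhs = P[x_new] * T(x, x_new)
    assert abs(lhs - rhs) < 1e-12
```

The check passes because, for any pair, one direction has acceptance probability 1 while the other carries exactly the compensating ratio.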
## Set <math>t = 0</math>.
# Iterate
## ''Generate'' a random candidate state <math>x'</math> according to <math>g(x' \mid x_t)</math>.
## ''Calculate'' the acceptance probability <math>A(x', x_t) = \min\left(1, \frac{P(x')}{P(x_t)} \frac{g(x_t \mid x')}{g(x' \mid x_t)}\right)</math>.
## ''Accept or reject'':
### generate a uniform random number <math>u \in [0, 1]</math>;
### if <math>u \le A(x', x_t)</math>, ''accept'' the candidate by setting <math>x_{t+1} = x'</math>;
### if <math>u > A(x', x_t)</math>, ''reject'' the candidate and set <math>x_{t+1} = x_t</math> instead.
Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to effectively estimate <math>P(x)</math> depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" ''In Bayesian Statistics 4''. 1992.</ref> For distributions on discrete state spaces, it has to be of the order of the [[autocorrelation]] time of the Markov process.<ref name=Newman_Barkema/>
 
It is important to notice that, in a general problem, it is not clear which distribution <math>g(x' \mid x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
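The general (asymmetric-proposal) algorithm above can be sketched compactly in log space, which avoids overflow for small densities. Everything concrete here is an assumption for illustration: the unnormalized Exponential(1) target, the multiplicative log-normal random-walk proposal, and the chain length.

```python
import math
import random

def metropolis_hastings(log_f, propose, log_g, x0, n_samples):
    """One generic MH chain.

    Accepts x' with probability min(1, f(x') g(x|x') / (f(x) g(x'|x))),
    computed in log space. log_g(a, b) is log g(a | b), up to constants.
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = propose(x)
        log_alpha = (log_f(x_new) - log_f(x)
                     + log_g(x, x_new) - log_g(x_new, x))
        if math.log(random.random()) < log_alpha:
            x = x_new
        samples.append(x)
    return samples

# Hypothetical example: unnormalized Exponential(1) target on x > 0, with
# an asymmetric multiplicative (log-normal) random-walk proposal, so the
# Hastings correction g(x|x')/g(x'|x) matters.
s = 0.5
log_f = lambda x: -x                                   # log f(x) for x > 0
propose = lambda x: x * math.exp(random.gauss(0, s))   # x' = x * e^N(0, s)
# log of the log-normal density of a centered at log b (constants dropped)
log_g = lambda a, b: (-(math.log(a) - math.log(b)) ** 2 / (2 * s * s)
                      - math.log(a))

random.seed(1)
chain = metropolis_hastings(log_f, propose, log_g, x0=1.0, n_samples=20000)
```

The sample mean of `chain` should approach 1, the mean of Exponential(1), up to Monte Carlo error.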
 
== Use in numerical integration ==
 
: <math>
P(E) = \int_\Omega P(E \mid x) P(x) \,dx = \int_\Omega \delta\big(E - E(x)\big) P(x) \,dx = E_X\big(P(E \mid X)\big)
</math>
and, thus, estimating <math>P(E)</math> can be accomplished by estimating the expected value of the [[indicator function]] <math>A_E(x) \equiv \mathbf{1}_E(x)</math>, which is 1 when <math>E(x) \in [E, E + \Delta E]</math> and zero otherwise.
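Concretely, the estimate is just the sample mean of the indicator over draws from <math>P(x)</math>. In the sketch below the target is a standard normal and the "energy" is <math>E(x) = x^2</math>, both hypothetical choices; the draws are i.i.d. for simplicity, where in practice they would come from a Metropolis–Hastings chain.

```python
import math
import random

# Estimate P(E(x) in [E0, E0 + dE]) as the sample mean of the indicator
# 1_E(x) over draws from P(x). Target and energy are illustrative.
random.seed(2)
E_fn = lambda x: x * x                  # hypothetical energy function
E0, dE = 0.0, 1.0                       # interval [0, 1]
indicator = lambda x: 1.0 if E0 <= E_fn(x) <= E0 + dE else 0.0

draws = [random.gauss(0, 1) for _ in range(100000)]
p_hat = sum(indicator(x) for x in draws) / len(draws)

# The exact value here is P(x^2 <= 1) = P(-1 <= x <= 1) = erf(1/sqrt(2)).
p_true = math.erf(1 / math.sqrt(2))
```

`p_hat` should agree with `p_true` (about 0.683) to within Monte Carlo error.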
==Step-by-step instructions==
 
Suppose that the most recent value sampled is <math>x_t</math>. To follow the Metropolis–Hastings algorithm, we next draw a new proposal state <math>x'</math> with probability density <math>g(x' \mid x_t)</math> and calculate a value
 
: <math>a = a_1 a_2,</math>
is the probability (e.g., Bayesian posterior) ratio between the proposed sample <math>x'</math> and the previous sample <math>x_t</math>, and
 
: <math>a_2 = \frac{g(x_t \mid x')}{g(x' \mid x_t)}</math>
 
is the ratio of the proposal density in two directions (from <math>x_t</math> to <math>x'</math> and conversely).
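The two factors can be computed explicitly for a single proposed move. The unnormalized Gaussian target, the Gaussian proposal, and the particular values of <math>x_t</math> and <math>x'</math> below are all hypothetical; with this symmetric proposal, <math>a_2 = 1</math> exactly.

```python
import math

# One MH decision value a = a1 * a2 for a single proposed move.
f = lambda x: math.exp(-0.5 * x * x)    # unnormalized target, f ∝ P

def g(a, b, sigma=1.0):
    """Proposal density g(a | b): Normal pdf with mean b and std sigma."""
    return (math.exp(-0.5 * ((a - b) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

x_t, x_new = 0.5, 1.0                   # hypothetical current and proposed states
a1 = f(x_new) / f(x_t)                  # target ratio P(x') / P(x_t)
a2 = g(x_t, x_new) / g(x_new, x_t)      # proposal ratio; = 1 for symmetric g
a = a1 * a2
```

Here <math>a_1 = e^{-0.375} \approx 0.687</math> and <math>a_2 = 1</math>, so the move is accepted with probability about 0.687.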
Then the new state <math>x_{t+1}</math> is chosen according to the following rules.
 
: If <math>a \geq 1</math>:
:: <math>x_{t+1} = x',</math>
: else:
:: <math>x_{t+1} = \begin{cases} x' & \text{with probability } a, \\ x_t & \text{with probability } 1 - a. \end{cases}</math>
These samples, which are discarded, are known as ''burn-in''. The remaining set of accepted values of <math>x</math> represent a [[Sample (statistics)|sample]] from the distribution <math>P(x)</math>.
 
The algorithm works best if the proposal density matches the shape of the target distribution <math>P(x)</math>, from which direct sampling is difficult; that is, <math>g(x' \mid x_t) \approx P(x')</math>.
If a Gaussian proposal density <math>g</math> is used, the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', which is the fraction of proposed samples that are accepted in a window of the last <math>N</math> samples.
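One simple way to carry out this tuning is to adjust <math>\sigma</math> after each window of proposals, nudging it toward a target acceptance rate. The sketch below is one such heuristic, not a prescribed method: the Gaussian target, the window size, the 0.234 target rate (a common rule of thumb for random-walk proposals in high dimension), and the multiplicative update factors are all assumptions.

```python
import math
import random

def tune_sigma(f, x0, sigma, n_burn=5000, window=100, target_rate=0.234):
    """Run a burn-in Metropolis chain, adapting sigma from the acceptance rate.

    After every `window` proposals, widen the steps if accepting too often
    (rate above target_rate) and shrink them if accepting too rarely.
    """
    x, accepted = x0, 0
    for i in range(1, n_burn + 1):
        x_new = random.gauss(x, sigma)
        if random.random() < min(1.0, f(x_new) / f(x)):
            x, accepted = x_new, accepted + 1
        if i % window == 0:
            rate = accepted / window
            sigma *= 1.1 if rate > target_rate else 0.9
            accepted = 0
    return x, sigma

random.seed(3)
f = lambda x: math.exp(-0.5 * x * x)    # hypothetical target ∝ N(0, 1)
x_end, sigma_tuned = tune_sigma(f, x0=0.0, sigma=10.0)
```

Starting from a deliberately too-wide <math>\sigma = 10</math>, the acceptance rate is low at first, so the adaptation shrinks <math>\sigma</math> toward a value whose acceptance rate hovers near the target. Adaptation is typically frozen after burn-in, since continually adapting proposals can bias the stationary distribution.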