{{Short description|Mathematical methods used in Bayesian inference and machine learning}}
{{For|the method of approximation in quantum mechanics|Variational method (quantum mechanics)}}
As the ''log-[[model evidence|evidence]]'' <math>\log P(\mathbf{X})</math> is fixed with respect to <math>Q</math>, maximizing the final term <math>\mathcal{L}(Q)</math> minimizes the KL divergence of <math>Q</math> from <math>P</math>. By appropriate choice of <math>Q</math>, <math>\mathcal{L}(Q)</math> becomes tractable to compute and to maximize. Hence we have both an analytical approximation <math>Q</math> for the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>, and a lower bound <math>\mathcal{L}(Q)</math> for the log-evidence <math>\log P(\mathbf{X})</math> (since the KL-divergence is non-negative).
The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''', in analogy with [[thermodynamic free energy]], because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as the '''Evidence Lower BOund''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
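The decomposition above can be checked numerically in the discrete case. The following sketch (not part of the article; the toy joint distribution is invented for illustration) verifies that <math>\log P(\mathbf{X}) = \mathcal{L}(Q) + D_{\mathrm{KL}}(Q \parallel P)</math> for an arbitrary <math>Q</math>, so that <math>\mathcal{L}(Q)</math> is indeed a lower bound on the log-evidence:

```python
import numpy as np

# Toy joint P(Z, X = x) over a discrete latent Z with 4 states and one
# fixed observation x.  The numbers are arbitrary, chosen only to
# illustrate the identity  log P(X) = L(Q) + KL(Q || P(Z|X)).
rng = np.random.default_rng(0)
joint = rng.random(4) / 10.0      # unnormalized, keeps the evidence below 1

evidence = joint.sum()            # P(X = x) = sum_Z P(Z, X = x)
posterior = joint / evidence      # exact posterior P(Z | X = x)

q = rng.random(4)                 # an arbitrary variational distribution Q(Z)
q /= q.sum()

# L(Q) = E_Q[log P(Z, X)] - E_Q[log Q(Z)]  (negative energy plus entropy)
elbo = np.sum(q * (np.log(joint) - np.log(q)))
# KL(Q || P(Z | X)) >= 0
kl = np.sum(q * (np.log(q) - np.log(posterior)))

assert np.isclose(np.log(evidence), elbo + kl)   # the exact decomposition
assert elbo <= np.log(evidence)                  # hence ELBO is a lower bound
```

The check passes for any choice of <math>Q</math> with full support, which is the content of the decomposition: the gap between the log-evidence and the ELBO is exactly the KL divergence.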
=== Proofs ===
:<math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \frac{e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}}{\int e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}\, d\mathbf{Z}_j}</math>
where <math>\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]</math> is the [[expected value|expectation]] of the logarithm of the [[joint probability]] of the data and latent variables, taken over all variables not in the partition.
In practice, we usually work in terms of logarithms, i.e.:
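These coordinate updates can be sketched for the classic univariate Gaussian example: data <math>x_i \sim \mathcal{N}(\mu, \tau^{-1})</math> with conjugate prior <math>\mu \mid \tau \sim \mathcal{N}(\mu_0, (\lambda_0\tau)^{-1})</math>, <math>\tau \sim \operatorname{Gamma}(a_0, b_0)</math>, and factorization <math>q(\mu,\tau) = q(\mu)\,q(\tau)</math>. The hyperparameter names and values below are conventional choices for illustration, not taken from this excerpt:

```python
import numpy as np

# CAVI for x_i ~ N(mu, 1/tau) with conjugate Normal-Gamma prior:
#   mu | tau ~ N(mu0, 1/(lam0*tau)),   tau ~ Gamma(a0, b0).
# With q(mu, tau) = q(mu) q(tau), each factor has a closed-form update
# obtained by exponentiating the expected log-joint over the other factor.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=500)   # true mu = 2, true tau = 4
N, xbar = len(x), x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0         # illustrative hyperparameters
E_tau = 1.0                                     # initial guess for E_q[tau]

for _ in range(50):
    # q(mu) = N(mu_N, 1/lam_N): depends on q(tau) only through E[tau].
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Expected squared deviations under q(mu), using Var_q(mu) = 1/lam_N.
    E_mu, V_mu = mu_N, 1.0 / lam_N
    sq_data = np.sum((x - E_mu) ** 2) + N * V_mu
    sq_prior = lam0 * ((E_mu - mu0) ** 2 + V_mu)
    # q(tau) = Gamma(a_N, b_N).
    a_N = a0 + (N + 1) / 2
    b_N = b0 + 0.5 * (sq_data + sq_prior)
    E_tau = a_N / b_N

# mu_N and E_tau should land near the true values mu = 2 and tau = 4.
print(mu_N, E_tau)
```

Each update exponentiates the expected log-joint over the other factor, exactly as in the general formula above; working with logarithms keeps the updates numerically simple because the conjugate model makes the expected log-joint quadratic in <math>\mu</math> and linear in <math>\tau, \ln\tau</math>.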
==A duality formula for variational inference==
[[File:CAVI algorithm explain.jpg|600px|thumb|right|Pictorial illustration of the coordinate ascent variational inference algorithm by the duality formula.]]
The following theorem is referred to as a duality formula for variational inference.<ref name=Yoon2021/> It explains some important properties of the variational distributions used in variational Bayes methods.
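In the finite discrete case, a duality of this type can be illustrated with the Gibbs (Donsker–Varadhan) variational formula <math>\log \operatorname{E}_p[e^{f}] = \max_q \{\operatorname{E}_q[f] - D_{\mathrm{KL}}(q \parallel p)\}</math>, attained at <math>q^* \propto p\,e^{f}</math>. The sketch below assumes that formulation (the cited theorem's exact statement is not reproduced in this excerpt), and checks both tightness at <math>q^*</math> and the bound for an arbitrary <math>q</math>:

```python
import numpy as np

# Gibbs / Donsker-Varadhan duality on a finite space:
#   log E_p[exp(f)] = max_q { E_q[f] - KL(q || p) },  maximizer q* ∝ p·exp(f).
rng = np.random.default_rng(2)
p = rng.random(6); p /= p.sum()   # reference distribution
f = rng.normal(size=6)            # arbitrary bounded function

lhs = np.log(np.sum(p * np.exp(f)))

q_star = p * np.exp(f)            # tilted distribution attains the supremum
q_star /= q_star.sum()

def objective(q):
    """E_q[f] - KL(q || p), the dual objective for a given q."""
    return np.sum(q * f) - np.sum(q * np.log(q / p))

assert np.isclose(lhs, objective(q_star))   # the bound is tight at q*
q = rng.random(6); q /= q.sum()             # any other q falls below
assert objective(q) <= lhs + 1e-12
```

The gap for a general <math>q</math> is exactly <math>D_{\mathrm{KL}}(q \parallel q^*)</math>, which is why maximizing the dual objective drives the variational distribution toward the tilted distribution <math>q^*</math>.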
* [[Generalized filtering]]: a variational filtering scheme for nonlinear state space models.
* [[Calculus of variations]]: the field of mathematical analysis that deals with maximizing or minimizing functionals.
* [[Maximum entropy discrimination]]: a variational inference framework that allows additional large-margin constraints to be introduced and accounted for.
==References==