Variational Bayesian methods

{{Short description|Mathematical methods used in Bayesian inference and machine learning}}
{{For|the method of approximation in quantum mechanics|Variational method (quantum mechanics)}}
 
As the ''log-[[model evidence|evidence]]'' <math>\log P(\mathbf{X})</math> is fixed with respect to <math>Q</math>, maximizing the final term <math>\mathcal{L}(Q)</math> minimizes the KL divergence of <math>Q</math> from <math>P</math>. By appropriate choice of <math>Q</math>, <math>\mathcal{L}(Q)</math> becomes tractable to compute and to maximize. Hence we have both an analytical approximation <math>Q</math> for the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>, and a lower bound <math>\mathcal{L}(Q)</math> for the log-evidence <math>\log P(\mathbf{X})</math> (since the KL-divergence is non-negative).
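Explicitly, these quantities are related by

:<math>\log P(\mathbf{X}) = \mathcal{L}(Q) + D_{\mathrm{KL}}(Q \parallel P), \qquad \mathcal{L}(Q) = \operatorname{E}_{Q}\!\left[\log \frac{P(\mathbf{Z},\mathbf{X})}{Q(\mathbf{Z})}\right],</math>

where <math>D_{\mathrm{KL}}(Q \parallel P) \ge 0</math> denotes the KL divergence of <math>Q(\mathbf{Z})</math> from the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>. Because the left-hand side does not depend on <math>Q</math>, any increase in <math>\mathcal{L}(Q)</math> decreases the divergence by the same amount.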
 
The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]] because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as '''Evidence Lower BOund''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
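In symbols, the free-energy form of the bound reads

:<math>\mathcal{L}(Q) = \underbrace{\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]}_{\text{negative expected energy}} + \underbrace{\bigl(-\operatorname{E}_{Q}[\log Q(\mathbf{Z})]\bigr)}_{\text{entropy of } Q},</math>

so that maximizing <math>\mathcal{L}(Q)</math> trades off fitting the joint distribution against keeping <math>Q</math> spread out.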
 
=== Proofs ===
:<math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \frac{e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}}{\int e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}\, d\mathbf{Z}_j}</math>
 
where <math>\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]</math> is the [[expected value|expectation]] of the logarithm of the [[joint probability]] of the data and latent variables, taken with respect to the factors <math>q_i</math> for all <math>i \neq j</math>, i.e. over all latent variables outside the <math>j</math>-th group of the partition; see<ref name=Yoon2021>{{Cite journal |last=Lee |first=Se Yoon |title=Gibbs sampler and coordinate ascent variational inference: A set-theoretical review |journal=Communications in Statistics - Theory and Methods |year=2021 |pages=1–21 |doi=10.1080/03610926.2021.1921214 |arxiv=2008.01006 |s2cid=220935477}}</ref> for a derivation of the distribution <math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X})</math>.
 
In practice, we usually work in terms of logarithms, i.e.:
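:<math>\ln q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})] + \text{constant},</math>

where the constant absorbs the normalizing denominator above and can be restored at the end by normalizing the resulting distribution.

As a concrete illustration of how these coordinate updates are iterated, the following sketch (a toy example with arbitrarily chosen numbers, not taken from this article) applies them to a correlated bivariate Gaussian target, for which each optimal factor <math>q_j^{*}</math> is itself Gaussian with a mean that depends on the current estimate of the other factor:

<syntaxhighlight lang="python">
import numpy as np

# Coordinate-ascent (mean-field) updates for a toy target: approximate a
# correlated bivariate Gaussian N(mu, Sigma) by a factorized q(z1) q(z2).
# For a Gaussian joint, ln q_j* = E_{i != j}[ln p(z)] + const is again
# Gaussian, with fixed variance 1/Lam[j, j] and a mean that depends on the
# current mean of the other factor.

mu = np.array([1.0, -1.0])              # target mean (illustrative values)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])          # strongly correlated target covariance
Lam = np.linalg.inv(Sigma)              # precision matrix of the target

m = np.zeros(2)                         # initial means of the two factors
for _ in range(50):                     # alternate the two coordinate updates
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print("factor means:    ", m)                    # converge to the target mean
print("factor variances:", 1.0 / np.diag(Lam))   # smaller than the true marginals
</syntaxhighlight>

In this example the factor variances come out as <math>1/\Lambda_{jj}</math>, which understates the true marginal variances <math>\Sigma_{jj}</math>; this tendency of mean-field approximations to be over-confident is a well-known limitation.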
 
==A duality formula for variational inference==
[[File:CAVI algorithm explain.jpg|600px|thumb|right|Pictorial illustration of the coordinate ascent variational inference (CAVI) algorithm by the duality formula.<ref name=Yoon2021/>]]
 
The following theorem is referred to as a duality formula for variational inference.<ref name=Yoon2021/> It explains some important properties of the variational distributions used in variational Bayes methods.

==See also==
* [[Generalized filtering]]: a variational filtering scheme for nonlinear state space models.
* [[Calculus of variations]]: the field of mathematical analysis that deals with maximizing or minimizing functionals.
* [[Maximum entropy discrimination]]: a variational inference framework that incorporates additional large-margin constraints.<ref>Sotirios P. Chatzis, “[http://proceedings.mlr.press/v28/chatzis13.pdf Infinite Markov-Switching Maximum Entropy Discrimination Machines],” Proc. 30th International Conference on Machine Learning (ICML), Journal of Machine Learning Research: Workshop and Conference Proceedings, vol. 28, no. 3, pp. 729–737, June 2013.</ref>
 
==References==