Variational Bayesian methods

{{Short description|Mathematical methods used in Bayesian inference and machine learning}}
{{For|the method of approximation in quantum mechanics|Variational method (quantum mechanics)}}
 
As the ''log-[[model evidence|evidence]]'' <math>\log P(\mathbf{X})</math> is fixed with respect to <math>Q</math>, maximizing the final term <math>\mathcal{L}(Q)</math> minimizes the KL divergence of <math>Q</math> from <math>P</math>. By appropriate choice of <math>Q</math>, <math>\mathcal{L}(Q)</math> becomes tractable to compute and to maximize. Hence we have both an analytical approximation <math>Q</math> for the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>, and a lower bound <math>\mathcal{L}(Q)</math> for the log-evidence <math>\log P(\mathbf{X})</math> (since the KL-divergence is non-negative).
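Explicitly, these quantities are related by

:<math>\log P(\mathbf{X}) = \mathcal{L}(Q) + D_{\mathrm{KL}}(Q \parallel P), \qquad \mathcal{L}(Q) = \operatorname{E}_{Q}\!\left[\log \frac{P(\mathbf{Z},\mathbf{X})}{Q(\mathbf{Z})}\right],</math>

where <math>D_{\mathrm{KL}}(Q \parallel P) \ge 0</math> denotes the KL divergence of <math>Q(\mathbf{Z})</math> from the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>. Because the left-hand side does not depend on <math>Q</math>, any increase in <math>\mathcal{L}(Q)</math> decreases the divergence by the same amount.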
 
The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]] because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as '''Evidence Lower BOund''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
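In symbols, the free-energy form of the bound reads

:<math>\mathcal{L}(Q) = \underbrace{\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]}_{\text{negative expected energy}} + \underbrace{\bigl(-\operatorname{E}_{Q}[\log Q(\mathbf{Z})]\bigr)}_{\text{entropy of } Q},</math>

so that maximizing <math>\mathcal{L}(Q)</math> trades off fitting the joint distribution against keeping <math>Q</math> spread out.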
 
=== Proofs ===
:<math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \frac{e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}}{\int e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}\, d\mathbf{Z}_j}</math>
 
where <math>\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]</math> is the [[expected value|expectation]] of the logarithm of the [[joint probability]] of the data and latent variables, taken with respect to the factors <math>q_i</math> for all <math>i \neq j</math>, i.e. over all latent variables outside the <math>j</math>-th group of the partition; see<ref name=Yoon2021>{{Cite journal |last=Lee |first=Se Yoon |title=Gibbs sampler and coordinate ascent variational inference: A set-theoretical review |journal=Communications in Statistics - Theory and Methods |year=2021 |pages=1–21 |doi=10.1080/03610926.2021.1921214 |arxiv=2008.01006 |s2cid=220935477}}</ref> for a derivation of the distribution <math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X})</math>.
 
In practice, we usually work in terms of logarithms, i.e.:
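:<math>\ln q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})] + \text{constant},</math>

where the constant absorbs the normalizing denominator above and can be restored at the end by normalizing the resulting distribution.

As a concrete illustration of how these coordinate updates are iterated, the following sketch (a toy example with arbitrarily chosen numbers, not taken from this article) applies them to a correlated bivariate Gaussian target, for which each optimal factor <math>q_j^{*}</math> is itself Gaussian with a mean that depends on the current estimate of the other factor:

<syntaxhighlight lang="python">
import numpy as np

# Coordinate-ascent (mean-field) updates for a toy target: approximate a
# correlated bivariate Gaussian N(mu, Sigma) by a factorized q(z1) q(z2).
# For a Gaussian joint, ln q_j* = E_{i != j}[ln p(z)] + const is again
# Gaussian, with fixed variance 1/Lam[j, j] and a mean that depends on the
# current mean of the other factor.

mu = np.array([1.0, -1.0])              # target mean (illustrative values)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])          # strongly correlated target covariance
Lam = np.linalg.inv(Sigma)              # precision matrix of the target

m = np.zeros(2)                         # initial means of the two factors
for _ in range(50):                     # alternate the two coordinate updates
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print("factor means:    ", m)                    # converge to the target mean
print("factor variances:", 1.0 / np.diag(Lam))   # smaller than the true marginals
</syntaxhighlight>

In this example the factor variances come out as <math>1/\Lambda_{jj}</math>, which understates the true marginal variances <math>\Sigma_{jj}</math>; this tendency of mean-field approximations to be over-confident is a well-known limitation.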
 
==A duality formula for variational inference==
[[File:CAVI algorithm explain.jpg|600px|thumb|right|Pictorial illustration of the coordinate ascent variational inference (CAVI) algorithm by the duality formula.<ref name=Yoon2021/>]]
 
The following theorem is referred to as a duality formula for variational inference.<ref name=Yoon2021/> It explains some important properties of the variational distributions used in variational Bayes methods.

==See also==
* [[Generalized filtering]]: a variational filtering scheme for nonlinear state space models.
* [[Calculus of variations]]: the field of mathematical analysis that deals with maximizing or minimizing functionals.
* [[Maximum entropy discrimination]]: a variational inference framework that incorporates additional large-margin constraints.<ref>Sotirios P. Chatzis, “[http://proceedings.mlr.press/v28/chatzis13.pdf Infinite Markov-Switching Maximum Entropy Discrimination Machines],” Proc. 30th International Conference on Machine Learning (ICML), Journal of Machine Learning Research: Workshop and Conference Proceedings, vol. 28, no. 3, pp. 729–737, June 2013.</ref>
 
==References==