Variational Bayesian methods: Difference between revisions

Line 112:
:<math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \frac{e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}}{\int e^{\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]}\, d\mathbf{Z}_j}</math>
 
where <math>\operatorname{E}_{i \neq j} [\ln p(\mathbf{Z}, \mathbf{X})]</math> is the [[expected value|expectation]] of the logarithm of the [[joint probability]] of the data and latent variables, taken over all variables not in the partition <math>\mathbf{Z}_j</math>. A derivation of the distribution <math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X})</math> is given by Lee (2021).<ref name=Yoon2021>{{Cite journal |last=Lee|first=Se Yoon| title = Gibbs sampler and coordinate ascent variational inference: A set-theoretical review|journal=Communications in Statistics - Theory and Methods|year=2021|pages=1–21|doi=10.1080/03610926.2021.1921214|arxiv=2008.01006|s2cid=220935477}}</ref>
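As an illustration of the update formula above, the following is a minimal sketch in Python for a toy model with two discrete latent variables, where the integral in the denominator becomes a finite sum. The table <code>log_joint</code> and the function names (<code>normalize_exp</code>, <code>cavi_update_q1</code>, <code>cavi_update_q2</code>, <code>cavi</code>) are hypothetical and chosen for this example only; they are not taken from the cited reference or from any library.

```python
import math


def normalize_exp(log_weights):
    """Exponentiate and normalise so the result sums to 1.

    The maximum is subtracted first for numerical stability; this does
    not change the normalised result.
    """
    m = max(log_weights)
    weights = [math.exp(v - m) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def cavi_update_q1(log_joint, q2):
    """q1*(a) proportional to exp(E_{q2}[ln p(Z1=a, Z2, X)])."""
    expected = [sum(q2[b] * row[b] for b in range(len(q2))) for row in log_joint]
    return normalize_exp(expected)


def cavi_update_q2(log_joint, q1):
    """q2*(b) proportional to exp(E_{q1}[ln p(Z1, Z2=b, X)])."""
    n2 = len(log_joint[0])
    expected = [
        sum(q1[a] * log_joint[a][b] for a in range(len(q1))) for b in range(n2)
    ]
    return normalize_exp(expected)


def cavi(log_joint, n_iter=50):
    """Alternate the two updates, starting from uniform factors.

    log_joint[a][b] is assumed to hold ln p(Z1=a, Z2=b, X) for the
    observed X (an unnormalised table is fine).
    """
    n1, n2 = len(log_joint), len(log_joint[0])
    q1 = [1.0 / n1] * n1
    q2 = [1.0 / n2] * n2
    for _ in range(n_iter):
        q1 = cavi_update_q1(log_joint, q2)
        q2 = cavi_update_q2(log_joint, q1)
    return q1, q2


# Toy joint that factorises exactly as f(a) * g(b); the numbers are
# made up for illustration.
f = [0.2, 0.8]
g = [0.5, 0.3, 0.2]
log_joint = [[math.log(fa * gb) for gb in g] for fa in f]
q1, q2 = cavi(log_joint)
```

Because the toy joint factorises, the mean-field family contains the exact posterior, so <code>q1</code> and <code>q2</code> agree with <code>f</code> and <code>g</code> up to floating-point error. For a joint that does not factorise, the same loop would return only an approximation.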
 
In practice, we usually work in terms of logarithms, i.e.:
Line 125:
 
==A duality formula for variational inference==
[[File:CAVI algorithm explain.jpg|600px|thumb|right|Pictorial illustration of the coordinate ascent variational inference algorithm by the duality formula.<ref name=Yoon2021 />]]
 
The following theorem is referred to as a duality formula for variational inference.<ref name=Yoon2021 /> It explains some important properties of the variational distributions used in variational Bayes methods.
 
{{EquationRef|3|Theorem}} Consider two [[probability spaces]] <math>(\Theta,\mathcal{F},P)</math> and <math>(\Theta,\mathcal{F},Q)</math> with <math>Q \ll P</math>. Assume that there is a common dominating [[probability measure]] <math>\lambda</math> such that <math>P \ll \lambda</math> and <math>Q \ll \lambda</math>. Let <math>h</math> denote any real-valued [[random variable]] on <math>(\Theta,\mathcal{F},P)</math> that satisfies <math>h \in L_1(P)</math>. Then the following equality holds