The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]] because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as the '''evidence lower bound''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
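Both readings follow directly from the definition of <math>\mathcal{L}(Q)</math> given above (a short verification, writing <math>H(Q) = -\operatorname{E}_{Q}[\log Q(\mathbf{Z})]</math> for the entropy):
:<math>
\mathcal{L}(Q) = \operatorname{E}_{Q}\left[\log\frac{P(\mathbf{Z},\mathbf{X})}{Q(\mathbf{Z})}\right] = \operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})] + H(Q),
</math>
and, since <math>D_{\mathrm{KL}}(Q\parallel P)\geq 0</math>,
:<math>
\log P(\mathbf{X}) = \mathcal{L}(Q) + D_{\mathrm{KL}}(Q\parallel P) \geq \mathcal{L}(Q).
</math>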
=== Proofs ===
By the generalized Pythagorean theorem of [[Bregman divergence]], of which KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arXiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]]<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381|bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free }}</ref>]]
:<math>
D_{\mathrm{KL}}(Q\parallel P) \geq D_{\mathrm{KL}}(Q\parallel Q^{*}) + D_{\mathrm{KL}}(Q^{*}\parallel P), \quad \forall Q \in\mathcal{C},
</math>
where <math>\mathcal{C}</math> is a convex set and the equality holds if:
:<math> Q = Q^{*} \triangleq \arg\min_{Q\in\mathcal{C}}D_{\mathrm{KL}}(Q\parallel P). </math>
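For discrete distributions the inequality can be checked numerically. The following sketch (illustrative only: the target <math>P</math>, the convex constraint set, and the grid-search resolution are arbitrary choices, not part of the cited construction) approximates the projection <math>Q^{*}</math> on a three-outcome simplex and verifies the bound for random members of <math>\mathcal{C}</math>:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def kl(q, p):
    """KL divergence D(q || p) between discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

# Target distribution P over three outcomes (arbitrary illustrative values).
p = np.array([0.7, 0.2, 0.1])

# Convex constraint set C: distributions q on the simplex with q[0] <= 0.3.
def in_C(q):
    return q[0] <= 0.3

# Approximate the projection Q* = argmin_{q in C} D(q || p) by grid search.
q_star, best = None, np.inf
ticks = np.linspace(1e-3, 1.0 - 1e-3, 300)
for q0 in ticks[ticks <= 0.3]:
    for q1 in ticks:
        q2 = 1.0 - q0 - q1
        if q2 <= 0.0:
            continue
        q = np.array([q0, q1, q2])
        if kl(q, p) < best:
            q_star, best = q, kl(q, p)

# Verify D(q||p) >= D(q||q*) + D(q*||p) for random members of C.
for _ in range(5):
    q = rng.dirichlet(np.ones(3))
    while not in_C(q):
        q = rng.dirichlet(np.ones(3))
    lhs = kl(q, p)
    rhs = kl(q, q_star) + kl(q_star, p)
    print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}, holds: {lhs >= rhs - 1e-3}")
</syntaxhighlight>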
In this case, the global minimizer <math>Q^{*}(\mathbf{Z}) = q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)q^{*}(\mathbf{Z}_2) = q^{*}(\mathbf{Z}_2\mid\mathbf{Z}_1)q^{*}(\mathbf{Z}_1),</math> with <math>\mathbf{Z}=\{\mathbf{Z}_1,\mathbf{Z}_2\},</math> can be found as follows:<ref name=Tran2018/>
:<math> q^{*}(\mathbf{Z}_2)
= \frac{P(\mathbf{X})}{\zeta(\mathbf{X})}\frac{P(\mathbf{Z}_2\mid\mathbf{X})}{\exp(D_{\mathrm{KL}}(q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)\parallel P(\mathbf{Z}_1\mid\mathbf{Z}_2,\mathbf{X})))}
= \frac{1}{\zeta(\mathbf{X})}\exp\mathbb{E}_{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\left(\log\frac{P(\mathbf{Z},\mathbf{X})}{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\right),</math>
in which the normalizing constant is:
:<math>\zeta(\mathbf{X})
=P(\mathbf{X})\int_{\mathbf{Z}_2}\frac{P(\mathbf{Z}_2\mid\mathbf{X})}{\exp(D_{\mathrm{KL}}(q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)\parallel P(\mathbf{Z}_1\mid\mathbf{Z}_2,\mathbf{X})))}
= \int_{\mathbf{Z}_{2}}\exp\mathbb{E}_{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\left(\log\frac{P(\mathbf{Z},\mathbf{X})}{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\right).</math>
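To see why <math>q^{*}(\mathbf{Z}_2)</math> takes this exponentially tilted form, fix <math>q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)</math> and write <math>Q(\mathbf{Z}) = q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)\,q(\mathbf{Z}_2)</math>; a sketch of the standard calculation then gives
:<math>
D_{\mathrm{KL}}(Q\parallel P)
= \operatorname{E}_{q(\mathbf{Z}_2)}\left[\log q(\mathbf{Z}_2) - \operatorname{E}_{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\left(\log\frac{P(\mathbf{Z},\mathbf{X})}{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\right)\right] + \log P(\mathbf{X})
= D_{\mathrm{KL}}(q(\mathbf{Z}_2)\parallel q^{*}(\mathbf{Z}_2)) + \log\frac{P(\mathbf{X})}{\zeta(\mathbf{X})},
</math>
so the unique minimizer over <math>q(\mathbf{Z}_2)</math> is <math>q^{*}(\mathbf{Z}_2)</math> itself.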
The term <math>\zeta(\mathbf{X})</math> is often called the [[model evidence|evidence]] lower bound ('''ELBO''') in practice, since <math>P(\mathbf{X})\geq\zeta(\mathbf{X})=\exp(\mathcal{L}(Q^{*}))</math>,<ref name=Tran2018/> as shown above.
By interchanging the roles of <math>\mathbf{Z}_1</math> and <math>\mathbf{Z}_2,</math> we can iteratively compute the approximations <math>q^{*}(\mathbf{Z}_1)</math> and <math>q^{*}(\mathbf{Z}_2)</math> to the true model's posterior marginals <math>P(\mathbf{Z}_1\mid\mathbf{X})</math> and <math>P(\mathbf{Z}_2\mid\mathbf{X}),</math> respectively. Although this iterative scheme is guaranteed to converge monotonically,<ref name=Tran2018/> the converged <math>Q^{*}</math> is only a local minimizer of <math>D_{\mathrm{KL}}(Q\parallel P)</math>.
If the constrained space <math>\mathcal{C}</math> is restricted to distributions under which <math>\mathbf{Z}_1</math> and <math>\mathbf{Z}_2</math> are independent, i.e. <math>q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2) = q^{*}(\mathbf{Z}_1),</math> the above iterative scheme becomes the so-called mean field approximation <math>Q^{*}(\mathbf{Z}) = q^{*}(\mathbf{Z}_1)q^{*}(\mathbf{Z}_2),</math> as shown below.
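As a concrete preview of the alternating updates in this factorized case, the following minimal sketch (the 2×2 joint table, the uniform initialization, and the iteration count are assumptions made purely for the example) runs the two mean-field updates on a toy discrete posterior and prints the monotonically decreasing <math>D_{\mathrm{KL}}(Q\parallel P)</math>:
<syntaxhighlight lang="python">
import numpy as np

# Toy joint posterior P(Z1, Z2 | X) as a normalized 2x2 table.
p = np.array([[0.40, 0.10],
              [0.05, 0.45]])
log_p = np.log(p)

def kl(q, p):
    """KL divergence D(q || p) between discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

# Mean-field factors q1(Z1), q2(Z2), initialized uniformly.
q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])

for it in range(8):
    # q1(z1) is proportional to exp( E_{q2}[ log p(z1, z2) ] ).
    q1 = np.exp(log_p @ q2)
    q1 /= q1.sum()
    # q2(z2) is proportional to exp( E_{q1}[ log p(z1, z2) ] ).
    q2 = np.exp(q1 @ log_p)
    q2 /= q2.sum()
    # The KL of the factorized approximation never increases between iterations.
    print(f"iteration {it}: KL = {kl(np.outer(q1, q2), p):.6f}")
</syntaxhighlight>
Each update holds one factor fixed and globally minimizes the KL divergence over the other, which is why the objective can never increase.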
==Mean field approximation==