The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]] because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as the '''evidence lower bound''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
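Both readings follow directly from the definition of <math>\mathcal{L}(Q)</math> given above (a short verification, writing <math>H(Q) = -\operatorname{E}_{Q}[\log Q(\mathbf{Z})]</math> for the entropy):
:<math>
\mathcal{L}(Q) = \operatorname{E}_{Q}\left[\log\frac{P(\mathbf{Z},\mathbf{X})}{Q(\mathbf{Z})}\right] = \operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})] + H(Q),
</math>
and, since <math>D_{\mathrm{KL}}(Q\parallel P)\geq 0</math>,
:<math>
\log P(\mathbf{X}) = \mathcal{L}(Q) + D_{\mathrm{KL}}(Q\parallel P) \geq \mathcal{L}(Q).
</math>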
=== Proofs ===
By the generalized Pythagorean theorem of [[Bregman divergence]], of which KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arXiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]]<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381|bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free }}</ref>]]
:<math>
D_{\mathrm{KL}}(Q\parallel P) \geq D_{\mathrm{KL}}(Q\parallel Q^{*}) + D_{\mathrm{KL}}(Q^{*}\parallel P), \quad \forall Q \in\mathcal{C},
</math>
where <math>\mathcal{C}</math> is a convex set and the equality holds if:
:<math> Q = Q^{*} \triangleq \arg\min_{Q\in\mathcal{C}}D_{\mathrm{KL}}(Q\parallel P). </math>
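For discrete distributions the inequality can be checked numerically. The following sketch (illustrative only: the target <math>P</math>, the convex constraint set, and the grid-search resolution are arbitrary choices, not part of the cited construction) approximates the projection <math>Q^{*}</math> on a three-outcome simplex and verifies the bound for random members of <math>\mathcal{C}</math>:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def kl(q, p):
    """KL divergence D(q || p) between discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

# Target distribution P over three outcomes (arbitrary illustrative values).
p = np.array([0.7, 0.2, 0.1])

# Convex constraint set C: distributions q on the simplex with q[0] <= 0.3.
def in_C(q):
    return q[0] <= 0.3

# Approximate the projection Q* = argmin_{q in C} D(q || p) by grid search.
q_star, best = None, np.inf
ticks = np.linspace(1e-3, 1.0 - 1e-3, 300)
for q0 in ticks[ticks <= 0.3]:
    for q1 in ticks:
        q2 = 1.0 - q0 - q1
        if q2 <= 0.0:
            continue
        q = np.array([q0, q1, q2])
        if kl(q, p) < best:
            q_star, best = q, kl(q, p)

# Verify D(q||p) >= D(q||q*) + D(q*||p) for random members of C.
for _ in range(5):
    q = rng.dirichlet(np.ones(3))
    while not in_C(q):
        q = rng.dirichlet(np.ones(3))
    lhs = kl(q, p)
    rhs = kl(q, q_star) + kl(q_star, p)
    print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}, holds: {lhs >= rhs - 1e-3}")
</syntaxhighlight>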
In this case, the global minimizer <math>Q^{*}(\mathbf{Z}) = q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)q^{*}(\mathbf{Z}_2) = q^{*}(\mathbf{Z}_2\mid\mathbf{Z}_1)q^{*}(\mathbf{Z}_1),</math> with <math>\mathbf{Z}=\{\mathbf{Z}_1,\mathbf{Z}_2\},</math> can be found as follows:<ref name=Tran2018/>
:<math> q^{*}(\mathbf{Z}_2)
= \frac{P(\mathbf{X})}{\zeta(\mathbf{X})}\frac{P(\mathbf{Z}_2\mid\mathbf{X})}{\exp(D_{\mathrm{KL}}(q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)\parallel P(\mathbf{Z}_1\mid\mathbf{Z}_2,\mathbf{X})))}
= \frac{1}{\zeta(\mathbf{X})}\exp\mathbb{E}_{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\left(\log\frac{P(\mathbf{Z},\mathbf{X})}{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\right),</math>
in which the normalizing constant is:
:<math>\zeta(\mathbf{X})
=P(\mathbf{X})\int_{\mathbf{Z}_2}\frac{P(\mathbf{Z}_2\mid\mathbf{X})}{\exp(D_{\mathrm{KL}}(q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)\parallel P(\mathbf{Z}_1\mid\mathbf{Z}_2,\mathbf{X})))}
= \int_{\mathbf{Z}_{2}}\exp\mathbb{E}_{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\left(\log\frac{P(\mathbf{Z},\mathbf{X})}{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\right).</math>
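To see why <math>q^{*}(\mathbf{Z}_2)</math> takes this exponentially tilted form, fix <math>q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)</math> and write <math>Q(\mathbf{Z}) = q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)\,q(\mathbf{Z}_2)</math>; a sketch of the standard calculation then gives
:<math>
D_{\mathrm{KL}}(Q\parallel P)
= \operatorname{E}_{q(\mathbf{Z}_2)}\left[\log q(\mathbf{Z}_2) - \operatorname{E}_{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\left(\log\frac{P(\mathbf{Z},\mathbf{X})}{q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2)}\right)\right] + \log P(\mathbf{X})
= D_{\mathrm{KL}}(q(\mathbf{Z}_2)\parallel q^{*}(\mathbf{Z}_2)) + \log\frac{P(\mathbf{X})}{\zeta(\mathbf{X})},
</math>
so the unique minimizer over <math>q(\mathbf{Z}_2)</math> is <math>q^{*}(\mathbf{Z}_2)</math> itself.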
The term <math>\zeta(\mathbf{X})</math> is often called the [[model evidence|evidence]] lower bound ('''ELBO''') in practice, since <math>P(\mathbf{X})\geq\zeta(\mathbf{X})=\exp(\mathcal{L}(Q^{*}))</math>,<ref name=Tran2018/> as shown above.
By interchanging the roles of <math>\mathbf{Z}_1</math> and <math>\mathbf{Z}_2,</math> we can iteratively compute the approximations <math>q^{*}(\mathbf{Z}_1)</math> and <math>q^{*}(\mathbf{Z}_2)</math> to the true model's posterior marginals <math>P(\mathbf{Z}_1\mid\mathbf{X})</math> and <math>P(\mathbf{Z}_2\mid\mathbf{X}),</math> respectively. Although this iterative scheme is guaranteed to converge monotonically,<ref name=Tran2018/> the converged <math>Q^{*}</math> is only a local minimizer of <math>D_{\mathrm{KL}}(Q\parallel P)</math>.
If the constrained space <math>\mathcal{C}</math> is restricted to distributions under which <math>\mathbf{Z}_1</math> and <math>\mathbf{Z}_2</math> are independent, i.e. <math>q^{*}(\mathbf{Z}_1\mid\mathbf{Z}_2) = q^{*}(\mathbf{Z}_1),</math> the above iterative scheme becomes the so-called mean field approximation <math>Q^{*}(\mathbf{Z}) = q^{*}(\mathbf{Z}_1)q^{*}(\mathbf{Z}_2),</math> as shown below.
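As a concrete preview of the alternating updates in this factorized case, the following minimal sketch (the 2×2 joint table, the uniform initialization, and the iteration count are assumptions made purely for the example) runs the two mean-field updates on a toy discrete posterior and prints the monotonically decreasing <math>D_{\mathrm{KL}}(Q\parallel P)</math>:
<syntaxhighlight lang="python">
import numpy as np

# Toy joint posterior P(Z1, Z2 | X) as a normalized 2x2 table.
p = np.array([[0.40, 0.10],
              [0.05, 0.45]])
log_p = np.log(p)

def kl(q, p):
    """KL divergence D(q || p) between discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

# Mean-field factors q1(Z1), q2(Z2), initialized uniformly.
q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])

for it in range(8):
    # q1(z1) is proportional to exp( E_{q2}[ log p(z1, z2) ] ).
    q1 = np.exp(log_p @ q2)
    q1 /= q1.sum()
    # q2(z2) is proportional to exp( E_{q1}[ log p(z1, z2) ] ).
    q2 = np.exp(q1 @ log_p)
    q2 /= q2.sum()
    # The KL of the factorized approximation never increases between iterations.
    print(f"iteration {it}: KL = {kl(np.outer(q1, q2), p):.6f}")
</syntaxhighlight>
Each update holds one factor fixed and globally minimizes the KL divergence over the other, which is why the objective can never increase.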
==Mean field approximation==