===Proofs===
By the generalized [[Pythagorean theorem]] of [[Bregman divergence]], of which KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arXiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]]<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381|bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free }}</ref>]]
:<math>D_\text{KL}(Q \parallel P) \geq D_\text{KL}(Q \parallel Q^*) + D_\text{KL}(Q^* \parallel P), \qquad \forall Q \in \mathcal{C},</math>

where <math>\mathcal{C}</math> is a convex set of probability distributions and <math>Q^* = \arg\min_{Q \in \mathcal{C}} D_\text{KL}(Q \parallel P)</math>, so that equality holds when <math>Q = Q^*</math>.
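This projection inequality can be checked numerically. The sketch below is illustrative only (the discrete support, the choice of <math>P</math>, and all variable names are ad hoc): it takes <math>\mathcal{C}</math> to be the distributions on <math>\{0,1,2\}</math> with mean one, which is a convex (in fact linear) family, computes <math>Q^*</math> by exponential tilting of <math>P</math>, and evaluates both sides of the inequality.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import brentq

xs = np.arange(3)                      # support {0, 1, 2}
p = np.array([0.5, 0.3, 0.2])          # an arbitrary reference distribution P

def kl(q, p):
    """D_KL(Q || P) for strictly positive discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

def member(t):
    """Family C = {Q : E_Q[x] = 1}; solving the two constraints gives Q = (t, 1-2t, t)."""
    return np.array([t, 1.0 - 2.0 * t, t])

# The minimizer of D_KL(Q || P) over a linear family is an exponential
# tilting Q*(x) ∝ P(x) exp(λx), with λ chosen so that E_{Q*}[x] = 1.
def tilt(lam):
    w = p * np.exp(lam * xs)
    return w / w.sum()

lam = brentq(lambda l: tilt(l) @ xs - 1.0, -20.0, 20.0)
q_star = tilt(lam)

for t in (0.05, 0.15, 0.25, 0.35, 0.45):
    q = member(t)
    lhs = kl(q, p)
    rhs = kl(q, q_star) + kl(q_star, p)
    print(f"t={t:.2f}:  D(Q||P) = {lhs:.6f} >= {rhs:.6f} = D(Q||Q*) + D(Q*||P)")
</syntaxhighlight>

For a linear family the Pythagorean relation holds with equality, so the two printed values coincide up to floating-point error; for a general convex set only the inequality is guaranteed.

Variational inference is also connected to the following duality formula: for a probability measure <math>P</math> and a measurable function <math>h</math> with <math>E_P[\exp h] < \infty</math>,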
:<math> \log E_P[\exp h] = \sup_{Q \ll P} \{ E_Q[h] - D_\text{KL}(Q \parallel P)\}.</math>
 
Further, the supremum on the right-hand side is attained [[if and only if]]
 
:<math> \frac{q(\theta)}{p(\theta)} = \frac{\exp h(\theta)}{E_P[\exp h]},</math>

where <math>q(\theta)</math> and <math>p(\theta)</math> denote the densities of <math>Q</math> and <math>P</math> with respect to a common dominating measure.
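Both claims are easy to verify in the discrete case. The following sketch is illustrative (the distribution, the function <math>h</math>, and all names are ad hoc): it builds the claimed maximizer <math>q^* \propto p \exp h</math> and checks that it attains the supremum while randomly drawn <math>Q</math> fall short.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def kl(q, p):
    """D_KL(Q || P) for strictly positive discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

n = 6
p = rng.dirichlet(np.ones(n))          # a random reference distribution P
h = rng.normal(size=n)                 # a bounded function h on the support

lhs = np.log(p @ np.exp(h))            # log E_P[exp h]

q_star = p * np.exp(h)
q_star /= q_star.sum()                 # the claimed optimizer q* = p exp(h) / E_P[exp h]
attained = q_star @ h - kl(q_star, p)

print(f"log E_P[exp h]    = {lhs:.12f}")
print(f"value at Q = Q*   = {attained:.12f}")   # matches lhs up to rounding

for _ in range(5):                     # any other Q gives a strictly smaller value
    q = rng.dirichlet(np.ones(n))
    print(f"value at random Q = {q @ h - kl(q, p):.12f}")
</syntaxhighlight>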
As a basic example, let the data <math>\mathbf{X} = \{x_1, \ldots, x_N\}</math> be independent draws from a Gaussian distribution with unknown mean <math>\mu</math> and precision <math>\tau</math>, with conjugate priors <math>\mu \mid \tau \sim \mathcal{N}(\mu_0, (\lambda_0\tau)^{-1})</math> and <math>\tau \sim \operatorname{Gamma}(a_0, b_0)</math>, and posit the factorized approximation <math>q(\mu, \tau) = q_\mu(\mu)\,q_\tau(\tau)</math>. The optimal factor for <math>\mu</math> is then derived as follows:

:<math>
\begin{align}
\ln q_\mu^*(\mu) &= \operatorname{E}_{\tau}[\ln p(\mathbf{X}\mid \mu,\tau) + \ln p(\mu\mid \tau) + \ln p(\tau)] + C \\
&= \operatorname{E}_{\tau}[\ln p(\mathbf{X}\mid \mu,\tau)] + \operatorname{E}_{\tau}[\ln p(\mu\mid \tau)] + \operatorname{E}_{\tau}[\ln p(\tau)] + C \\
&= \operatorname{E}_{\tau}[\ln p(\mathbf{X}\mid \mu,\tau)] + \operatorname{E}_{\tau}[\ln p(\mu\mid \tau)] + C_2 \\
&= \operatorname{E}_{\tau}\left[\ln \prod_{n=1}^N \mathcal{N}\left(x_n \mid \mu, \tau^{-1}\right)\right] + \operatorname{E}_{\tau}\left[\ln \mathcal{N}\left(\mu \mid \mu_0, (\lambda_0\tau)^{-1}\right)\right] + C_2 \\
&= \operatorname{E}_{\tau}\left[\sum_{n=1}^N \left(\frac{1}{2}\ln\frac{\tau}{2\pi} - \frac{\tau}{2}(x_n - \mu)^2\right)\right] + \operatorname{E}_{\tau}\left[\frac{1}{2}\ln\frac{\lambda_0\tau}{2\pi} - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2\right] + C_2 \\
&= \frac{N+1}{2}\operatorname{E}_{\tau}[\ln\tau] - \frac{N+1}{2}\ln 2\pi + \frac{1}{2}\ln\lambda_0 - \frac{\operatorname{E}_{\tau}[\tau]}{2}\sum_{n=1}^N (x_n - \mu)^2 - \frac{\lambda_0\operatorname{E}_{\tau}[\tau]}{2}(\mu - \mu_0)^2 + C_2 \\
&= -\frac{\operatorname{E}_{\tau}[\tau]}{2}\left\{\lambda_0(\mu - \mu_0)^2 + \sum_{n=1}^N (x_n - \mu)^2\right\} + C_3
\end{align}
</math>
 
In the above derivation, <math>C</math>, <math>C_2</math> and <math>C_3</math> denote quantities that are constant with respect to <math>\mu</math>. In particular, the term <math>\operatorname{E}_{\tau}[\ln p(\tau)]</math> does not depend on <math>\mu</math>, so in line 3 it can be absorbed into the [[constant term]] at the end. The same is done in line 7, where every remaining term not involving <math>\mu</math> is absorbed into <math>C_3</math>.
 
The last line is simply a quadratic polynomial in <math>\mu</math>. Since this is the logarithm of <math>q_\mu^*(\mu)</math>, we can see that <math>q_\mu^*(\mu)</math> itself is a [[Gaussian distribution]].
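Completing the square over <math>\mu</math> makes this explicit. The manipulation is standard for this conjugate model and is spelled out here for concreteness (the symbols <math>\mu_N</math>, <math>\lambda_N</math> and <math>\bar{x}</math> are introduced for this step):

:<math>\ln q_\mu^*(\mu) = -\frac{(\lambda_0 + N)\operatorname{E}_{\tau}[\tau]}{2}\left(\mu - \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N}\right)^2 + \text{const.},</math>

so that

:<math>q_\mu^*(\mu) = \mathcal{N}\left(\mu \mid \mu_N, \lambda_N^{-1}\right), \qquad \mu_N = \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N}, \qquad \lambda_N = (\lambda_0 + N)\operatorname{E}_{\tau}[\tau],</math>

where <math>\bar{x} = \frac{1}{N}\sum_{n=1}^N x_n</math> denotes the sample mean.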