===Proofs===
By the generalized [[Pythagorean theorem]] of [[Bregman divergence]], of which KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arXiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]]<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381|bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free }}</ref>]]
:<math>D_\text{KL}(Q \parallel P) \geq D_\text{KL}(Q \parallel Q^*) + D_\text{KL}(Q^* \parallel P), \qquad \forall Q \in \mathcal{C},</math>

where <math>\mathcal{C}</math> is a convex set of probability distributions and <math>Q^* = \arg\min_{Q \in \mathcal{C}} D_\text{KL}(Q \parallel P)</math>, so that equality holds when <math>Q = Q^*</math>.
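This projection inequality can be checked numerically. The sketch below is illustrative only (the discrete support, the choice of <math>P</math>, and all variable names are ad hoc): it takes <math>\mathcal{C}</math> to be the distributions on <math>\{0,1,2\}</math> with mean one, which is a convex (in fact linear) family, computes <math>Q^*</math> by exponential tilting of <math>P</math>, and evaluates both sides of the inequality.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import brentq

xs = np.arange(3)                      # support {0, 1, 2}
p = np.array([0.5, 0.3, 0.2])          # an arbitrary reference distribution P

def kl(q, p):
    """D_KL(Q || P) for strictly positive discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

def member(t):
    """Family C = {Q : E_Q[x] = 1}; solving the two constraints gives Q = (t, 1-2t, t)."""
    return np.array([t, 1.0 - 2.0 * t, t])

# The minimizer of D_KL(Q || P) over a linear family is an exponential
# tilting Q*(x) ∝ P(x) exp(λx), with λ chosen so that E_{Q*}[x] = 1.
def tilt(lam):
    w = p * np.exp(lam * xs)
    return w / w.sum()

lam = brentq(lambda l: tilt(l) @ xs - 1.0, -20.0, 20.0)
q_star = tilt(lam)

for t in (0.05, 0.15, 0.25, 0.35, 0.45):
    q = member(t)
    lhs = kl(q, p)
    rhs = kl(q, q_star) + kl(q_star, p)
    print(f"t={t:.2f}:  D(Q||P) = {lhs:.6f} >= {rhs:.6f} = D(Q||Q*) + D(Q*||P)")
</syntaxhighlight>

For a linear family the Pythagorean relation holds with equality, so the two printed values coincide up to floating-point error; for a general convex set only the inequality is guaranteed.

Variational inference is also connected to the following duality formula: for a probability measure <math>P</math> and a measurable function <math>h</math> with <math>E_P[\exp h] < \infty</math>,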
:<math> \log E_P[\exp h] = \sup_{Q \ll P} \{ E_Q[h] - D_\text{KL}(Q \parallel P)\}.</math>
 
Further, the supremum on the right-hand side is attained [[if and only if]]
 
:<math> \frac{q(\theta)}{p(\theta)} = \frac{\exp h(\theta)}{E_P[\exp h]},</math>

where <math>q(\theta)</math> and <math>p(\theta)</math> denote the densities of <math>Q</math> and <math>P</math> with respect to a common dominating measure.
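Both claims are easy to verify in the discrete case. The following sketch is illustrative (the distribution, the function <math>h</math>, and all names are ad hoc): it builds the claimed maximizer <math>q^* \propto p \exp h</math> and checks that it attains the supremum while randomly drawn <math>Q</math> fall short.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def kl(q, p):
    """D_KL(Q || P) for strictly positive discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

n = 6
p = rng.dirichlet(np.ones(n))          # a random reference distribution P
h = rng.normal(size=n)                 # a bounded function h on the support

lhs = np.log(p @ np.exp(h))            # log E_P[exp h]

q_star = p * np.exp(h)
q_star /= q_star.sum()                 # the claimed optimizer q* = p exp(h) / E_P[exp h]
attained = q_star @ h - kl(q_star, p)

print(f"log E_P[exp h]    = {lhs:.12f}")
print(f"value at Q = Q*   = {attained:.12f}")   # matches lhs up to rounding

for _ in range(5):                     # any other Q gives a strictly smaller value
    q = rng.dirichlet(np.ones(n))
    print(f"value at random Q = {q @ h - kl(q, p):.12f}")
</syntaxhighlight>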
As a basic example, let the data <math>\mathbf{X} = \{x_1, \ldots, x_N\}</math> be independent draws from a Gaussian distribution with unknown mean <math>\mu</math> and precision <math>\tau</math>, with conjugate priors <math>\mu \mid \tau \sim \mathcal{N}(\mu_0, (\lambda_0\tau)^{-1})</math> and <math>\tau \sim \operatorname{Gamma}(a_0, b_0)</math>, and posit the factorized approximation <math>q(\mu, \tau) = q_\mu(\mu)\,q_\tau(\tau)</math>. The optimal factor for <math>\mu</math> is then derived as follows:

:<math>
\begin{align}
\ln q_\mu^*(\mu) &= \operatorname{E}_{\tau}[\ln p(\mathbf{X}\mid \mu,\tau) + \ln p(\mu\mid \tau) + \ln p(\tau)] + C \\
&= \operatorname{E}_{\tau}[\ln p(\mathbf{X}\mid \mu,\tau)] + \operatorname{E}_{\tau}[\ln p(\mu\mid \tau)] + \operatorname{E}_{\tau}[\ln p(\tau)] + C \\
&= \operatorname{E}_{\tau}[\ln p(\mathbf{X}\mid \mu,\tau)] + \operatorname{E}_{\tau}[\ln p(\mu\mid \tau)] + C_2 \\
&= \operatorname{E}_{\tau}\left[\ln \prod_{n=1}^N \mathcal{N}\left(x_n \mid \mu, \tau^{-1}\right)\right] + \operatorname{E}_{\tau}\left[\ln \mathcal{N}\left(\mu \mid \mu_0, (\lambda_0\tau)^{-1}\right)\right] + C_2 \\
&= \operatorname{E}_{\tau}\left[\sum_{n=1}^N \left(\frac{1}{2}\ln\frac{\tau}{2\pi} - \frac{\tau}{2}(x_n - \mu)^2\right)\right] + \operatorname{E}_{\tau}\left[\frac{1}{2}\ln\frac{\lambda_0\tau}{2\pi} - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2\right] + C_2 \\
&= \frac{N+1}{2}\operatorname{E}_{\tau}[\ln\tau] - \frac{N+1}{2}\ln 2\pi + \frac{1}{2}\ln\lambda_0 - \frac{\operatorname{E}_{\tau}[\tau]}{2}\sum_{n=1}^N (x_n - \mu)^2 - \frac{\lambda_0\operatorname{E}_{\tau}[\tau]}{2}(\mu - \mu_0)^2 + C_2 \\
&= -\frac{\operatorname{E}_{\tau}[\tau]}{2}\left\{\lambda_0(\mu - \mu_0)^2 + \sum_{n=1}^N (x_n - \mu)^2\right\} + C_3
\end{align}
</math>
 
In the above derivation, <math>C</math>, <math>C_2</math> and <math>C_3</math> denote quantities that are constant with respect to <math>\mu</math>. In particular, the term <math>\operatorname{E}_{\tau}[\ln p(\tau)]</math> does not depend on <math>\mu</math>, so in line 3 it can be absorbed into the [[constant term]] at the end. The same is done in line 7, where every remaining term not involving <math>\mu</math> is absorbed into <math>C_3</math>.
 
The last line is simply a quadratic polynomial in <math>\mu</math>. Since this is the logarithm of <math>q_\mu^*(\mu)</math>, we can see that <math>q_\mu^*(\mu)</math> itself is a [[Gaussian distribution]].
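Completing the square over <math>\mu</math> makes this explicit. The manipulation is standard for this conjugate model and is spelled out here for concreteness (the symbols <math>\mu_N</math>, <math>\lambda_N</math> and <math>\bar{x}</math> are introduced for this step):

:<math>\ln q_\mu^*(\mu) = -\frac{(\lambda_0 + N)\operatorname{E}_{\tau}[\tau]}{2}\left(\mu - \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N}\right)^2 + \text{const.},</math>

so that

:<math>q_\mu^*(\mu) = \mathcal{N}\left(\mu \mid \mu_N, \lambda_N^{-1}\right), \qquad \mu_N = \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N}, \qquad \lambda_N = (\lambda_0 + N)\operatorname{E}_{\tau}[\tau],</math>

where <math>\bar{x} = \frac{1}{N}\sum_{n=1}^N x_n</math> denotes the sample mean.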