'''Variational Bayesian methods''' are a family of techniques for approximating intractable [[integral]]s arising in [[Bayesian inference]] and [[machine learning]]. They are typically used in complex [[statistical model]]s consisting of observed variables (usually termed "data") as well as unknown [[parameter]]s and [[latent variable]]s, with various sorts of relationships among the three types of [[random variable]]s, as might be described by a [[graphical model]]. As is typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:
#To provide an analytical approximation to the [[posterior probability]] of the unobserved variables, in order to do [[statistical inference]] over these variables.
#To derive a [[lower bound]] for the [[marginal likelihood]] (sometimes called the ''evidence'') of the observed data, i.e. the marginal probability of the data given the model, with marginalization performed over the unobserved variables. This is typically used for performing [[model selection]], the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.
For the former purpose (that of approximating a posterior probability), variational Bayes is an alternative to [[Monte Carlo sampling]] methods, particularly [[Markov chain Monte Carlo]] methods such as [[Gibbs sampling]], for taking a fully Bayesian approach to [[statistical inference]] over complex [[probability distribution|distributions]] that are difficult to evaluate directly or [[sample (statistics)|sample]] from. Whereas Monte Carlo techniques provide a numerical approximation to the exact posterior using a set of samples, variational Bayes provides a locally optimal, exact analytical solution to an approximation of the posterior.
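As a minimal illustration of this difference, the following sketch assumes a toy model with two binary latent variables and an arbitrarily chosen joint table (an assumed example, not drawn from the references in this article). It computes the exact posterior directly and compares it with a factorized mean-field approximation obtained by coordinate ascent:
<syntaxhighlight lang="python">
import numpy as np

# Toy model: two binary latent variables Z1, Z2 and an observed X = x.
# joint[i, j] = P(Z1 = i, Z2 = j, X = x) for the observed x; the numbers are
# arbitrary and chosen only for illustration.
joint = np.array([[0.30, 0.05],
                  [0.10, 0.25]])
log_joint = np.log(joint)

# Exact quantities, available here only because the model is tiny.
evidence = joint.sum()                  # P(X = x), the marginal likelihood ("evidence")
exact_posterior = joint / evidence      # P(Z1, Z2 | X = x)

def normalise(log_p):
    """Exponentiate and normalise a vector of log-probabilities."""
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

# Mean-field variational Bayes: approximate the posterior by Q(Z1, Z2) = q1(Z1) q2(Z2),
# improving one factor at a time (coordinate ascent on the lower bound L(Q)).
q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])
for _ in range(100):
    q1 = normalise(log_joint @ q2)      # log q1(z1) = E_{q2}[log P(z1, Z2, x)] + const
    q2 = normalise(q1 @ log_joint)      # log q2(z2) = E_{q1}[log P(Z1, z2, x)] + const

Q = np.outer(q1, q2)                    # factorised approximation to the posterior
kl = np.sum(Q * np.log(Q / exact_posterior))
print("exact posterior:\n", exact_posterior)
print("mean-field approximation:\n", Q)
print("KL(Q || exact posterior):", kl)
</syntaxhighlight>
Each update sets one factor proportional to the exponential of the expected log joint under the other factor; since no such step can decrease the lower bound, the iteration converges to a locally optimal factorized approximation of the posterior.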
The log evidence can be decomposed as
:<math>\log P(\mathbf{X}) = D_\mathrm{KL}(Q\parallel P(\mathbf{Z}\mid\mathbf{X})) + \mathcal{L}(Q).</math>
As the ''log evidence'' <math>\log P(\mathbf{X})</math> is fixed with respect to <math>Q</math>, maximizing the final term <math>\mathcal{L}(Q)</math> minimizes the KL-divergence of <math>Q</math> from the true posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>. Since the KL-divergence is non-negative, <math>\mathcal{L}(Q)</math> is also a lower bound on the log evidence.
The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]] because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as the [[evidence lower bound]] (ELBO), to emphasize that it is a lower bound on the log evidence of the data.
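Both ways of writing <math>\mathcal{L}(Q)</math> can be checked numerically on a small discrete example; the sketch below assumes an arbitrary joint table and an arbitrary <math>Q</math>, chosen only for illustration:
<syntaxhighlight lang="python">
import numpy as np

# Check of the two ways of writing the lower bound L(Q):
#   L(Q) = E_Q[log P(Z, X)] + H(Q)        (negative energy plus entropy)
#        = log P(X) - KL(Q || P(Z | X))   (log evidence minus KL-divergence)
# The joint table and the distribution Q are arbitrary illustrative choices.
joint = np.array([0.20, 0.05, 0.15, 0.10])   # P(Z = z, X = x) for the observed x
Q = np.array([0.40, 0.10, 0.30, 0.20])       # any distribution over Z

log_evidence = np.log(joint.sum())           # log P(X = x)
posterior = joint / joint.sum()              # P(Z | X = x)

energy_term = np.sum(Q * np.log(joint))      # E_Q[log P(Z, X)]
entropy = -np.sum(Q * np.log(Q))             # H(Q)
kl = np.sum(Q * np.log(Q / posterior))       # KL(Q || P(Z | X))

elbo_from_energy = energy_term + entropy
elbo_from_evidence = log_evidence - kl
print(elbo_from_energy, elbo_from_evidence)  # equal up to floating-point error
print(elbo_from_energy <= log_evidence)      # the lower-bound property: L(Q) <= log P(X)
</syntaxhighlight>
The bound becomes an equality exactly when <math>Q</math> equals the true posterior, since the KL term then vanishes.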
=== Proofs ===
By the generalized Pythagorean theorem of [[Bregman divergence]], of which the KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arxiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]].<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381 |bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free}}</ref>]]
:<math>D_\mathrm{KL}(Q\parallel P) \ge D_\mathrm{KL}(Q\parallel Q^{*}) + D_\mathrm{KL}(Q^{*}\parallel P) \qquad \text{for all } Q \in \mathcal{C},</math>
where <math>\mathcal{C}</math> is a convex set of distributions and <math>Q^{*} \triangleq \arg\min_{Q \in \mathcal{C}} D_\mathrm{KL}(Q\parallel P)</math> is the member of <math>\mathcal{C}</math> closest to <math>P</math> in KL-divergence.
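The inequality can also be checked numerically. The sketch below assumes an arbitrary reference distribution and an arbitrary convex constraint (both chosen purely for illustration), computes the projection <math>Q^{*}</math> with a general-purpose constrained optimizer, and verifies the bound for random members of <math>\mathcal{C}</math>:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def kl(q, p):
    """KL-divergence D_KL(q || p) for discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

# Reference distribution P and a convex set C of distributions on four outcomes:
# C = { Q : Q[0] >= 0.5 }.  Both choices are arbitrary and only for illustration.
P = np.array([0.45, 0.35, 0.15, 0.05])

# Q* = argmin_{Q in C} KL(Q || P), found with a general-purpose constrained optimizer.
constraints = [{"type": "eq",   "fun": lambda q: q.sum() - 1.0},
               {"type": "ineq", "fun": lambda q: q[0] - 0.5}]
result = minimize(lambda q: kl(q, P), x0=np.full(4, 0.25),
                  bounds=[(1e-9, 1.0)] * 4, constraints=constraints)
Q_star = result.x

# Check D_KL(Q || P) >= D_KL(Q || Q*) + D_KL(Q* || P) for random members Q of C.
rng = np.random.default_rng(0)
for _ in range(5):
    q0 = rng.uniform(0.5, 0.95)                       # first component at least 0.5
    Q = np.concatenate(([q0], (1.0 - q0) * rng.dirichlet(np.ones(3))))
    lhs = kl(Q, P)
    rhs = kl(Q, Q_star) + kl(Q_star, P)
    print(lhs >= rhs - 1e-6, round(lhs, 4), round(rhs, 4))
</syntaxhighlight>
For a convex set such as this one, only the inequality is guaranteed; when <math>\mathcal{C}</math> is defined by affine (linear) constraints, it holds with equality.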