'''Variational Bayesian methods''' are a family of techniques for approximating intractable [[integral]]s arising in [[Bayesian inference]] and [[machine learning]]. They are typically used in complex [[statistical model]]s consisting of observed variables (usually termed "data") as well as unknown [[parameter]]s and [[latent variable]]s, with various sorts of relationships among the three types of [[random variable]]s, as might be described by a [[graphical model]]. As is typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:
#To provide an analytical approximation to the [[posterior probability]] of the unobserved variables, in order to do [[statistical inference]] over these variables.
#To derive a [[lower bound]] for the [[marginal likelihood]] (sometimes called the "''evidence''") of the observed data (i.e. the [[marginal probability]] of the data given the model, with marginalization performed over unobserved variables; this integral is written out just below). This is typically used for performing [[model selection]], the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data. (See also the [[Bayes factor]] article.)
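Writing <math>\mathbf{Z}</math> for the full set of unobserved variables (parameters and latent variables together, as in the derivation below) and <math>\mathbf{X}</math> for the data, the evidence in the second item is the integral

<math>P(\mathbf{X}) = \int P(\mathbf{X}, \mathbf{Z}) \, d\mathbf{Z} = \int P(\mathbf{X}\mid \mathbf{Z})\, P(\mathbf{Z}) \, d\mathbf{Z},</math>

which is typically intractable to compute exactly; variational Bayes instead maximizes a tractable lower bound on its logarithm.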
 
In the former purpose (that of approximating a posterior probability), variational Bayes is an alternative to [[Monte Carlo sampling]] methods — particularly [[Markov chain Monte Carlo]] methods such as [[Gibbs sampling]] — for taking a fully Bayesian approach to [[statistical inference]] over complex [[probability distribution|distributions]] that are difficult to evaluate directly or [[sample (statistics)|sample]]. In particular, whereas Monte Carlo techniques provide a numerical approximation to the exact posterior using a set of samples, variational Bayes provides a locally optimal, exact analytical solution to an approximation of the posterior.
<math>\log P(\mathbf{X}) = D_\mathrm{KL}(Q \parallel P) + \mathcal{L}(Q)</math>
 
As the ''log-[[model evidence|evidence]]'' <math>\log P(\mathbf{X})</math> is fixed with respect to <math>Q</math>, maximizing the final term <math>\mathcal{L}(Q)</math> minimizes the KL divergence of <math>Q</math> from <math>P</math>. By appropriate choice of <math>Q</math>, <math>\mathcal{L}(Q)</math> becomes tractable to compute and to maximize. Hence we have both an analytical approximation <math>Q</math> for the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>, and a lower bound <math>\mathcal{L}(Q)</math> for the log-evidence <math>\log P(\mathbf{X})</math> (since the KL-divergence is non-negative).
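For completeness, the identity above can be checked in one line from the product rule <math>P(\mathbf{Z},\mathbf{X}) = P(\mathbf{Z}\mid\mathbf{X})\,P(\mathbf{X})</math> and the definition of the KL divergence (a standard restatement, using the same notation as above):

<math>D_\mathrm{KL}(Q \parallel P) = \operatorname{E}_{Q}\!\left[\log \frac{Q(\mathbf{Z})}{P(\mathbf{Z}\mid\mathbf{X})}\right] = \operatorname{E}_{Q}\!\left[\log \frac{Q(\mathbf{Z})}{P(\mathbf{Z},\mathbf{X})}\right] + \log P(\mathbf{X}) = -\mathcal{L}(Q) + \log P(\mathbf{X}),</math>

with <math>\mathcal{L}(Q) = \operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})] - \operatorname{E}_{Q}[\log Q(\mathbf{Z})]</math> as defined above. Since <math>D_\mathrm{KL}(Q \parallel P) \ge 0</math>, it follows that <math>\mathcal{L}(Q) \le \log P(\mathbf{X})</math>.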
 
The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]], because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as the '''evidence lower bound''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
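As a concrete numerical illustration of the bound (not taken from the article): for a conjugate Gaussian model the log-evidence is available in closed form, so the ELBO of any Gaussian <math>Q</math> can be compared against it directly. The sketch below, in Python with NumPy/SciPy, assumes a single Gaussian mean parameter with known noise variance and arbitrary illustrative values for the prior and the synthetic data.

<syntaxhighlight lang="python">
# Minimal sketch: ELBO vs. exact log-evidence for a conjugate Gaussian model.
# Model (illustrative values): mu ~ N(mu0, tau0^2), x_i | mu ~ N(mu, sigma^2).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu0, tau0, sigma = 0.0, 2.0, 1.0          # prior mean/std and likelihood std (assumed values)
x = rng.normal(1.5, sigma, size=20)       # synthetic data
n = len(x)

# Exact log-evidence: marginally X ~ N(mu0 * 1, sigma^2 I + tau0^2 J).
cov = sigma**2 * np.eye(n) + tau0**2 * np.ones((n, n))
log_evidence = multivariate_normal.logpdf(x, mean=np.full(n, mu0), cov=cov)

def elbo(m, s):
    """ELBO of q(mu) = N(m, s^2): E_q[log p(x, mu)] plus the entropy of q."""
    e_loglik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                      - ((x - m)**2 + s**2) / (2 * sigma**2))
    e_logprior = (-0.5 * np.log(2 * np.pi * tau0**2)
                  - ((m - mu0)**2 + s**2) / (2 * tau0**2))
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)
    return e_loglik + e_logprior + entropy

# The exact posterior of mu is also Gaussian, so it lies inside the variational family.
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + x.sum() / sigma**2)

print("log evidence            :", log_evidence)
print("ELBO at exact posterior :", elbo(post_mean, np.sqrt(post_var)))  # bound is tight here
print("ELBO at another q       :", elbo(0.0, 1.0))                      # strictly smaller
</syntaxhighlight>

Running this should print a log-evidence and an ELBO that agree up to floating-point error when <math>Q</math> is the exact posterior, and a strictly smaller ELBO for any other choice of mean and standard deviation, illustrating that <math>\mathcal{L}(Q) \le \log P(\mathbf{X})</math> with equality exactly when the KL divergence vanishes.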
 
=== Proofs ===
 
By the generalized Pythagorean theorem for [[Bregman divergence]]s, of which the KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arxiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]].<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381 |bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free}}</ref>]]