'''Variational Bayesian methods''' are a family of techniques for approximating intractable [[integral]]s arising in [[Bayesian inference]] and [[machine learning]]. They are typically used in complex [[statistical model]]s consisting of observed variables (usually termed "data") as well as unknown [[parameter]]s and [[latent variable]]s, with various sorts of relationships among the three types of [[random variable]]s, as might be described by a [[graphical model]]. As is typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:
#To provide an analytical approximation to the [[posterior probability]] of the unobserved variables, in order to do [[statistical inference]] over these variables.
#To derive a [[lower bound]] for the [[marginal likelihood]] (sometimes called the "''evidence''") of the observed data (i.e. the [[marginal probability]] of the data given the model, with marginalization performed over unobserved variables; this integral is written out just below). This is typically used for performing [[model selection]], the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data. (See also the [[Bayes factor]] article.)
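Writing <math>\mathbf{Z}</math> for the full set of unobserved variables (parameters and latent variables together, as in the derivation below) and <math>\mathbf{X}</math> for the data, the evidence in the second item is the integral

<math>P(\mathbf{X}) = \int P(\mathbf{X}, \mathbf{Z}) \, d\mathbf{Z} = \int P(\mathbf{X}\mid \mathbf{Z})\, P(\mathbf{Z}) \, d\mathbf{Z},</math>

which is typically intractable to compute exactly; variational Bayes instead maximizes a tractable lower bound on its logarithm.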
 
In the former purpose (that of approximating a posterior probability), variational Bayes is an alternative to [[Monte Carlo sampling]] methods — particularly [[Markov chain Monte Carlo]] methods such as [[Gibbs sampling]] — for taking a fully Bayesian approach to [[statistical inference]] over complex [[probability distribution|distributions]] that are difficult to evaluate directly or [[sample (statistics)|sample]]. In particular, whereas Monte Carlo techniques provide a numerical approximation to the exact posterior using a set of samples, variational Bayes provides a locally optimal, exact analytical solution to an approximation of the posterior.
<math>\log P(\mathbf{X}) = D_\mathrm{KL}(Q \parallel P) + \mathcal{L}(Q)</math>
 
As the ''log-[[model evidence|evidence]]'' <math>\log P(\mathbf{X})</math> is fixed with respect to <math>Q</math>, maximizing the final term <math>\mathcal{L}(Q)</math> minimizes the KL divergence of <math>Q</math> from <math>P</math>. By appropriate choice of <math>Q</math>, <math>\mathcal{L}(Q)</math> becomes tractable to compute and to maximize. Hence we have both an analytical approximation <math>Q</math> for the posterior <math>P(\mathbf{Z}\mid \mathbf{X})</math>, and a lower bound <math>\mathcal{L}(Q)</math> for the log-evidence <math>\log P(\mathbf{X})</math> (since the KL-divergence is non-negative).
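For completeness, the identity above can be checked in one line from the product rule <math>P(\mathbf{Z},\mathbf{X}) = P(\mathbf{Z}\mid\mathbf{X})\,P(\mathbf{X})</math> and the definition of the KL divergence (a standard restatement, using the same notation as above):

<math>D_\mathrm{KL}(Q \parallel P) = \operatorname{E}_{Q}\!\left[\log \frac{Q(\mathbf{Z})}{P(\mathbf{Z}\mid\mathbf{X})}\right] = \operatorname{E}_{Q}\!\left[\log \frac{Q(\mathbf{Z})}{P(\mathbf{Z},\mathbf{X})}\right] + \log P(\mathbf{X}) = -\mathcal{L}(Q) + \log P(\mathbf{X}),</math>

with <math>\mathcal{L}(Q) = \operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})] - \operatorname{E}_{Q}[\log Q(\mathbf{Z})]</math> as defined above. Since <math>D_\mathrm{KL}(Q \parallel P) \ge 0</math>, it follows that <math>\mathcal{L}(Q) \le \log P(\mathbf{X})</math>.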
 
The lower bound <math>\mathcal{L}(Q)</math> is known as the (negative) '''variational free energy''' in analogy with [[thermodynamic free energy]], because it can also be expressed as a negative energy <math>\operatorname{E}_{Q}[\log P(\mathbf{Z},\mathbf{X})]</math> plus the [[Entropy (information theory)|entropy]] of <math>Q</math>. The term <math>\mathcal{L}(Q)</math> is also known as the '''evidence lower bound''', abbreviated as [[Evidence lower bound|'''ELBO''']], to emphasize that it is a lower bound on the log-evidence of the data.
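As a concrete numerical illustration of the bound (not taken from the article): for a conjugate Gaussian model the log-evidence is available in closed form, so the ELBO of any Gaussian <math>Q</math> can be compared against it directly. The sketch below, in Python with NumPy/SciPy, assumes a single Gaussian mean parameter with known noise variance and arbitrary illustrative values for the prior and the synthetic data.

<syntaxhighlight lang="python">
# Minimal sketch: ELBO vs. exact log-evidence for a conjugate Gaussian model.
# Model (illustrative values): mu ~ N(mu0, tau0^2), x_i | mu ~ N(mu, sigma^2).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu0, tau0, sigma = 0.0, 2.0, 1.0          # prior mean/std and likelihood std (assumed values)
x = rng.normal(1.5, sigma, size=20)       # synthetic data
n = len(x)

# Exact log-evidence: marginally X ~ N(mu0 * 1, sigma^2 I + tau0^2 J).
cov = sigma**2 * np.eye(n) + tau0**2 * np.ones((n, n))
log_evidence = multivariate_normal.logpdf(x, mean=np.full(n, mu0), cov=cov)

def elbo(m, s):
    """ELBO of q(mu) = N(m, s^2): E_q[log p(x, mu)] plus the entropy of q."""
    e_loglik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                      - ((x - m)**2 + s**2) / (2 * sigma**2))
    e_logprior = (-0.5 * np.log(2 * np.pi * tau0**2)
                  - ((m - mu0)**2 + s**2) / (2 * tau0**2))
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)
    return e_loglik + e_logprior + entropy

# The exact posterior of mu is also Gaussian, so it lies inside the variational family.
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + x.sum() / sigma**2)

print("log evidence            :", log_evidence)
print("ELBO at exact posterior :", elbo(post_mean, np.sqrt(post_var)))  # bound is tight here
print("ELBO at another q       :", elbo(0.0, 1.0))                      # strictly smaller
</syntaxhighlight>

Running this should print a log-evidence and an ELBO that agree up to floating-point error when <math>Q</math> is the exact posterior, and a strictly smaller ELBO for any other choice of mean and standard deviation, illustrating that <math>\mathcal{L}(Q) \le \log P(\mathbf{X})</math> with equality exactly when the KL divergence vanishes.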
 
=== Proofs ===
 
By the generalized Pythagorean theorem for [[Bregman divergence]]s, of which the KL-divergence is a special case, it can be shown that:<ref name=Tran2018>{{cite arxiv|title=Copula Variational Bayes inference via information geometry|first1=Viet Hung|last1=Tran|year=2018|eprint=1803.10998|class=cs.IT}}</ref><ref name="Martin2014"/>
[[File:Bregman_divergence_Pythagorean.png|right|300px|thumb|Generalized Pythagorean theorem for [[Bregman divergence]].<ref name="Martin2014">{{cite journal |last1=Adamčík |first1=Martin |title=The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning |journal=Entropy |date=2014 |volume=16 |issue=12 |pages=6338–6381 |bibcode=2014Entrp..16.6338A |doi=10.3390/e16126338 |doi-access=free}}</ref>]]