#To derive a [[lower bound]] for the [[marginal likelihood]] (sometimes called the ''evidence'') of the observed data (i.e. the [[marginal probability]] of the data given the model, with marginalization performed over unobserved variables), as made explicit in the decomposition below. This bound is typically used for [[model selection]], the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question generated the data. (See also the [[Bayes factor]] article.)
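
The bound in question comes from a standard decomposition of the log evidence (the notation here follows the usual derivation of the evidence lower bound and is illustrative): for observed data <math>\mathbf{X}</math>, unobserved quantities <math>\mathbf{Z}</math>, and any distribution <math>q(\mathbf{Z})</math>,

:<math>\ln p(\mathbf{X}) = \underbrace{\int q(\mathbf{Z}) \ln \frac{p(\mathbf{X}, \mathbf{Z})}{q(\mathbf{Z})} \, d\mathbf{Z}}_{\mathcal{L}(q)} \;+\; \operatorname{D}_{\mathrm{KL}}\bigl(q(\mathbf{Z}) \,\|\, p(\mathbf{Z} \mid \mathbf{X})\bigr).</math>

Since the [[Kullback–Leibler divergence]] is non-negative, <math>\mathcal{L}(q) \le \ln p(\mathbf{X})</math>; thus <math>\mathcal{L}(q)</math> (the ''evidence lower bound'') is the lower bound referred to above, and it is tight exactly when <math>q</math> equals the true posterior.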
 
For the former purpose (approximating a posterior probability), variational Bayes is an alternative to [[Monte Carlo sampling]] methods, particularly [[Markov chain Monte Carlo]] methods such as [[Gibbs sampling]], for taking a fully Bayesian approach to [[statistical inference]] over complex [[probability distribution|distributions]] that are difficult to evaluate directly or [[sample (statistics)|sample]] from. In particular, whereas Monte Carlo techniques provide a numerical approximation to the exact posterior using a set of samples, variational Bayes provides a locally optimal, exact analytical solution to an approximation of the posterior.
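
Concretely, this "approximation of the posterior" can be framed as an optimization problem (a standard formulation, stated here in illustrative notation): the approximating distribution is restricted to a tractable family <math>\mathcal{Q}</math>, and the member of that family closest to the true posterior in Kullback–Leibler divergence is sought,

:<math>q^{*}(\mathbf{Z}) = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \operatorname{D}_{\mathrm{KL}}\bigl(q(\mathbf{Z}) \,\|\, p(\mathbf{Z} \mid \mathbf{X})\bigr),</math>

which, by the decomposition above, is equivalent to maximizing the lower bound <math>\mathcal{L}(q)</math>.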
 
Variational Bayes can be seen as an extension of the [[Expectation–maximization algorithm|expectation–maximization]] (EM) algorithm from [[maximum a posteriori estimation]] (MAP estimation) of the single most probable value of each parameter to fully Bayesian estimation, which computes (an approximation to) the entire [[posterior distribution]] of the parameters and latent variables. As in EM, it finds a set of optimal parameter values, and it has the same alternating structure as EM, based on a set of interlocked (mutually dependent) equations that cannot be solved analytically.
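
Under the common mean-field assumption <math>q(\mathbf{Z}) = \prod_j q_j(\mathbf{Z}_j)</math>, these interlocked equations take a standard fixed-point form (again in illustrative notation): the optimal factor for each block of variables satisfies

:<math>\ln q_j^{*}(\mathbf{Z}_j) = \operatorname{E}_{q_{i \neq j}}\bigl[\ln p(\mathbf{X}, \mathbf{Z})\bigr] + \text{const},</math>

where the expectation is taken with respect to all factors other than <math>q_j</math>. Because each update depends on the current state of the other factors, the equations are cycled through repeatedly until convergence, much as the E and M steps alternate in EM.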
 
For many applications, variational Bayes produces solutions of comparable accuracy to Gibbs sampling at greater speed. However, deriving the set of equations used to update the parameters iteratively often requires a large amount of work compared with deriving the comparable Gibbs sampling equations. This is the case even for many models that are conceptually quite simple, as is demonstrated below in the case of a basic non-hierarchical model with only two parameters and no latent variables.
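
To give a flavor of the alternating updates involved, the following is a minimal sketch (not taken from this article's derivation) of coordinate-ascent updates for a univariate Gaussian with unknown mean and precision under a Normal–Gamma prior, i.e. the kind of two-parameter, no-latent-variable model referred to above; the function name and default hyperparameters are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def cavi_normal(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iter=50):
    """Coordinate-ascent variational inference for x_i ~ N(mu, 1/tau),
    with priors mu | tau ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0),
    under the mean-field factorization q(mu, tau) = q(mu) q(tau)."""
    x = np.asarray(x, dtype=float)
    N, xbar, sum_x2 = len(x), x.mean(), np.sum(x**2)

    # These two quantities have closed forms that do not change across iterations.
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)   # mean of q(mu)
    a_N = a0 + (N + 1) / 2.0                      # shape of q(tau)

    b_N = b0  # crude initialization for the rate of q(tau)
    for _ in range(n_iter):
        E_tau = a_N / b_N                  # E[tau] under the current q(tau)
        lam_N = (lam0 + N) * E_tau         # precision of q(mu)
        E_mu = mu_N                        # E[mu] under q(mu)
        E_mu2 = mu_N**2 + 1.0 / lam_N      # E[mu^2] under q(mu)
        # Update the rate of q(tau) using expectations under q(mu):
        # E[sum_i (x_i - mu)^2] + lam0 * E[(mu - mu0)^2]
        b_N = b0 + 0.5 * (sum_x2 - 2 * N * xbar * E_mu + N * E_mu2
                          + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))

    # q(mu) = N(mu_N, 1/lam_N), q(tau) = Gamma(a_N, b_N)
    return mu_N, lam_N, a_N, b_N
</syntaxhighlight>

Note that even in this simple case each update (<code>lam_N</code>, <code>b_N</code>) depends on an expectation under the other factor, so the equations are mutually dependent and must be iterated rather than solved in a single pass.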