:<math>Q(\mathbf{Z}) = \prod_{i=1}^M q_i(\mathbf{Z}_i\mid \mathbf{X})</math>
It can be shown using the [[calculus of variations]] (hence the name "variational Bayes") that the "best" distribution <math>q_j^{*}</math> for each of the factors <math>q_j</math> (in terms of the distribution minimizing the KL divergence, as described above) satisfies:
:<math>q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \frac{e^{\operatorname{E}_{q_{-j}^{*}}[\ln p(\mathbf{Z}, \mathbf{X})]}}{\int e^{\operatorname{E}_{q_{-j}^{*}}[\ln p(\mathbf{Z}, \mathbf{X})]}\, d\mathbf{Z}_j}</math>
where <math>\operatorname{E}_{q_{-j}^{*}}[\ln p(\mathbf{Z}, \mathbf{X})]</math> is the expectation of the logarithm of the [[joint probability]] of the data and latent variables, taken with respect to the factors <math>q_i^{*}</math> over all variables not in the partition <math>\mathbf{Z}_j</math>.
In practice, we usually work in terms of logarithms, i.e.:
:<math>\ln q_j^{*}(\mathbf{Z}_j\mid \mathbf{X}) = \operatorname{E}_{q_{-j}^{*}}[\ln p(\mathbf{Z}, \mathbf{X})] + \text{constant}</math>
The constant in the above expression is related to the [[normalizing constant]] (the denominator in the expression above for <math>q_j^{*}</math>) and is usually reinstated by inspection, as the rest of the expression can usually be recognized as being a known type of distribution (e.g. [[Gaussian distribution|Gaussian]], [[gamma distribution|gamma]], etc.).
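For instance, suppose simplification yields <math>\ln q_j^{*}(z) = -az^2 + bz + \text{constant}</math> for some quantities <math>a > 0</math> and <math>b</math> (here <math>a</math> and <math>b</math> stand for whatever combination of hyperparameters and expectations the simplification produces in a particular model; they are illustrative placeholders). Completing the square identifies <math>q_j^{*}</math> as a Gaussian with mean <math>b/(2a)</math> and variance <math>1/(2a)</math>, and the constant is then fixed as the logarithm of the corresponding Gaussian normalizing factor.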
Using the properties of expectations, the expression <math>\operatorname{E}_{q_{-j}^{*}}[\ln p(\mathbf{Z}, \mathbf{X})]</math> can usually be simplified into a function of the fixed [[hyperparameter]]s of the [[prior distribution]]s over the latent variables and of expectations (and sometimes higher [[moment (mathematics)|moment]]s such as the [[variance]]) of latent variables not in the current partition (i.e. latent variables not included in <math>\mathbf{Z}_j</math>). This creates [[circular dependency|circular dependencies]] between the parameters of the distributions over variables in one partition and the expectations of variables in the other partitions. This naturally suggests an [[iterative method|iterative]] algorithm, much like EM (the [[expectation–maximization algorithm]]): the expectations (and possibly higher moments) of the latent variables are initialized in some fashion (perhaps randomly), then the parameters of each distribution are computed in turn using the current values of the expectations, and the expectations of each newly computed distribution are updated from its parameters. An algorithm of this sort is guaranteed to converge.
In other words, for each of the partitions of variables, by simplifying the expression for the distribution over the partition's variables and examining the distribution's functional dependency on the variables in question, the family of the distribution can usually be determined (which in turn determines the value of the constant). The formula for the distribution's parameters will be expressed in terms of the prior distributions' hyperparameters (which are known constants), but also in terms of expectations of functions of variables in other partitions. Usually these expectations can be simplified into functions of expectations of the variables themselves (i.e. the [[mean]]s); sometimes expectations of squared variables (which can be related to the [[variance]] of the variables), or expectations of higher powers (i.e. higher [[moment (mathematics)|moment]]s) also appear. In most cases, the other variables' distributions will be from known families, and the formulas for the relevant expectations can be looked up. However, those formulas depend on those distributions' parameters, which depend in turn on the expectations of other variables. The result is that the formulas for the parameters of each variable's distributions can be expressed as a series of equations with mutual, [[nonlinear]] dependencies among the variables. Usually, it is not possible to solve this system of equations directly. However, as described above, the dependencies suggest a simple iterative algorithm, which in most cases is guaranteed to converge. An example will make this process clearer.
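The following is a minimal sketch of this coordinate-ascent iteration for the textbook case of [[normal distribution|normally distributed]] data with unknown mean and unknown precision, with a conjugate Gaussian prior on the mean and a [[gamma distribution|gamma]] prior on the precision. The update formulas are the standard ones for this conjugate model; the data, hyperparameter values, and variable names are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)          # illustrative synthetic data
N, x_sum, x_sq_sum = len(x), x.sum(), (x ** 2).sum()

# Hyperparameters of the priors p(mu | tau) = N(mu0, 1/(lambda0*tau))
# and p(tau) = Gamma(a0, b0); the values here are arbitrary vague choices.
mu0, lambda0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3

E_tau = 1.0                                           # initial guess for E[tau]
for _ in range(100):
    # Update q(mu) = N(mu_N, 1/lambda_N), holding the current E[tau] fixed.
    mu_N = (lambda0 * mu0 + x_sum) / (lambda0 + N)
    lambda_N = (lambda0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N ** 2 + 1.0 / lambda_N    # E[mu], E[mu^2] under q(mu)

    # Update q(tau) = Gamma(a_N, b_N), holding the current moments of q(mu) fixed.
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (x_sq_sum - 2.0 * x_sum * E_mu + N * E_mu2
                      + lambda0 * (E_mu2 - 2.0 * mu0 * E_mu + mu0 ** 2))
    E_tau = a_N / b_N                                 # mean of a Gamma(a_N, b_N) variable

print(f"E[mu]  = {mu_N:.3f}")                         # true mean: 2.0
print(f"E[tau] = {E_tau:.3f}")                        # true precision: 1/1.5^2 = 0.444
</syntaxhighlight>

Note that in this particular model <math>\mu_N</math> does not depend on <math>\operatorname{E}[\tau]</math> and so converges immediately; in general, every update depends on the current moments of the other factors, and the loop is repeated until the variational parameters stop changing.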