Cross-entropy method

#Update the parameters of the random mechanism based on the data to produce a "better" sample in the next iteration. This step involves minimizing the ''Cross Entropy'' or [[Kullback-Leibler divergence]].
===Estimation via importance sampling===
Consider the general problem of estimating the quantity <math>\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \textrm{d}\mathbf{x}</math>, where <math>H</math> is some ''performance function'' and <math>f(\mathbf{x};\mathbf{u})</math> is a member of some parametric family of distributions. Using [[importance sampling]], this quantity can be estimated as <math>\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}</math>, where <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> is a random sample from <math>g\,</math>. For positive <math>H</math>, the theoretically ''optimal'' importance sampling [[probability density function|density]] (pdf) is given by
<math> g^*(\mathbf{x}) = H(\mathbf{x}) f(\mathbf{x};\mathbf{u})/\ell</math>. This, however, depends on the unknown <math>\ell</math>. The CE method aims to approximate the optimal pdf by adaptively selecting members of the parametric family that are closest (in the [[Kullback-Leibler divergence|Kullback-Leibler]] sense) to the optimal pdf <math>g^*</math>.
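As an illustration, below is a minimal sketch of this importance-sampling estimator for a one-dimensional Gaussian family; the performance function <math>H</math>, the nominal parameters, and the sampling density <math>g</math> are assumptions chosen only for the example.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Assumed example: H(x) indicates the rare event {x >= 4} under the nominal density f(x; u) = N(0, 1).
def H(x):
    return (x >= 4.0).astype(float)

# Importance-sampling density g: a N(4, 1) density shifted toward the event.
N = 100_000
X = rng.normal(loc=4.0, scale=1.0, size=N)

# hat_ell = (1/N) * sum_i H(X_i) * f(X_i; u) / g(X_i)
weights = norm_pdf(X, 0.0, 1.0) / norm_pdf(X, 4.0, 1.0)
ell_hat = np.mean(H(X) * weights)
print(ell_hat)  # close to the true tail probability 1 - Phi(4), about 3.2e-5
</syntaxhighlight>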
 
===Generic CE algorithm===
1. Choose initial parameter vector <math>\mathbf{v}^{(0)}</math>; set t = 1.
2. Generate a random sample <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> from <math>f(\cdot;\mathbf{v}^{(t-1)})</math>
* When <math>H(\mathbf{X}) = \mathrm{I}_{\{\mathbf{x}\in A\}}</math> and <math>f(\mathbf{X}_i;\mathbf{u}) = f(\mathbf{X}_i;\mathbf{v}^{(t-1)})</math>, then <math>\mathbf{v}^{(t)}</math> corresponds to the [[Maximum likelihood|maximum likelihood estimator]] based on those <math>\mathbf{X}_k \in A</math>.
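For concreteness, the following is a minimal sketch of this estimation scheme for a one-dimensional Gaussian family and the rare event <math>A = \{x \ge \gamma\}</math>; the threshold, elite fraction, and iteration count are assumptions made only for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Assumed setting: estimate ell = P(X >= gamma) where X ~ f(.; u) = N(0, 1).
gamma, u_mu, u_sigma = 4.0, 0.0, 1.0
N, rho, T = 10_000, 0.1, 5            # sample size, elite fraction, number of CE iterations

mu, sigma = u_mu, u_sigma             # v^(0): start from the nominal parameters
for t in range(1, T + 1):
    X = rng.normal(mu, sigma, size=N)                 # step 2: sample from f(.; v^(t-1))
    level = min(np.quantile(X, 1 - rho), gamma)       # current level, capped at the target gamma
    elite = X[X >= level]                             # best-performing samples
    W = norm_pdf(elite, u_mu, u_sigma) / norm_pdf(elite, mu, sigma)  # likelihood ratios
    mu = np.sum(W * elite) / np.sum(W)                # weighted MLE update of v^(t)
    sigma = np.sqrt(np.sum(W * (elite - mu) ** 2) / np.sum(W))

# Final importance-sampling estimate of ell using the fitted density as g.
X = rng.normal(mu, sigma, size=N)
W = norm_pdf(X, u_mu, u_sigma) / norm_pdf(X, mu, sigma)
ell_hat = np.mean((X >= gamma) * W)
print(ell_hat)  # close to the true tail probability 1 - Phi(4), about 3.2e-5
</syntaxhighlight>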
 
===Continuous optimization&mdash;example===
The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function <math>S(x)</math> of a single continuous variable <math>x</math>. Following the CE approach, this maximization is associated with the problem of estimating the probability that <math>S(X)</math> exceeds some high level <math>\gamma</math>, where <math>X</math> is drawn from a parametric family of sampling distributions, here a Gaussian with mean <math>\mu</math> and variance <math>\sigma^2</math>. This yields the following randomized algorithm for this problem.
 
====Pseudo-code====
1. mu:=-6; sigma2:=100; t:=0; maxits:=100; // Initialize parameters
2. N:=100; Ne:=10; // Sample size and number of elite samples
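Below is a runnable sketch of this procedure in Python. The objective <math>S</math> is a hypothetical bimodal function chosen only for illustration, the initialization mirrors the pseudo-code above, and the variance-based stopping tolerance <code>eps</code> is an assumption.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)

def S(x):
    # Hypothetical objective for illustration: bimodal, with its global maximum near x = 2.
    return np.exp(-(x - 2) ** 2) + 0.8 * np.exp(-(x + 2) ** 2)

# Initialization mirroring the pseudo-code above.
mu, sigma2 = -6.0, 100.0
N, Ne = 100, 10             # sample size and number of elite samples
maxits, eps = 100, 1e-8     # iteration cap and (assumed) variance tolerance

t = 0
while t < maxits and sigma2 > eps:
    X = rng.normal(mu, np.sqrt(sigma2), size=N)   # draw N samples from the current distribution
    elite = X[np.argsort(S(X))[-Ne:]]             # keep the Ne best-performing samples
    mu, sigma2 = elite.mean(), elite.var()        # refit the sampling distribution to the elite set
    t += 1

print(mu)  # approximate maximizer of S
</syntaxhighlight>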