Cross-entropy method

#Update the parameters of the random mechanism based on the data to produce a "better" sample in the next iteration. This step involves minimizing the ''Cross Entropy'' or [[Kullback-Leibler divergence]].
===Estimation via importance sampling===
Consider the general problem of estimating the quantity <math>\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \textrm{d}\mathbf{x}</math>, where <math>H</math> is some ''performance function'' and <math>f(\mathbf{x};\mathbf{u})</math> is a member of some parametric family of distributions. Using [[importance sampling]], this quantity can be estimated as <math>\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}</math>, where <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> is a random sample from <math>g\,</math>. For positive <math>H</math>, the theoretically ''optimal'' importance sampling [[probability density function|density]] (pdf) is given by
<math> g^*(\mathbf{x}) = H(\mathbf{x}) f(\mathbf{x};\mathbf{u})/\ell</math>. This, however, depends on the unknown <math>\ell</math>. The CE method aims to approximate the optimal pdf by adaptively selecting members of the parametric family that are closest (in the [[Kullback-Leibler divergence|Kullback-Leibler]] sense) to the optimal pdf <math>g^*</math>.
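As an illustration, below is a minimal sketch of this importance-sampling estimator for a one-dimensional Gaussian family; the performance function <math>H</math>, the nominal parameters, and the sampling density <math>g</math> are assumptions chosen only for the example.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Assumed example: H(x) indicates the rare event {x >= 4} under the nominal density f(x; u) = N(0, 1).
def H(x):
    return (x >= 4.0).astype(float)

# Importance-sampling density g: a N(4, 1) density shifted toward the event.
N = 100_000
X = rng.normal(loc=4.0, scale=1.0, size=N)

# hat_ell = (1/N) * sum_i H(X_i) * f(X_i; u) / g(X_i)
weights = norm_pdf(X, 0.0, 1.0) / norm_pdf(X, 4.0, 1.0)
ell_hat = np.mean(H(X) * weights)
print(ell_hat)  # close to the true tail probability 1 - Phi(4), about 3.2e-5
</syntaxhighlight>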
 
===Generic CE algorithm===
1. Choose initial parameter vector <math>\mathbf{v}^{(0)}</math>; set t = 1.
2. Generate a random sample <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> from <math>f(\cdot;\mathbf{v}^{(t-1)})</math>
* When <math>H(\mathbf{X}) = \mathrm{I}_{\{\mathbf{x}\in A\}}</math> and <math>f(\mathbf{X}_i;\mathbf{u}) = f(\mathbf{X}_i;\mathbf{v}^{(t-1)})</math>, then <math>\mathbf{v}^{(t)}</math> corresponds to the [[Maximum likelihood|maximum likelihood estimator]] based on those <math>\mathbf{X}_k \in A</math>.
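For concreteness, the following is a minimal sketch of this estimation scheme for a one-dimensional Gaussian family and the rare event <math>A = \{x \ge \gamma\}</math>; the threshold, elite fraction, and iteration count are assumptions made only for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Assumed setting: estimate ell = P(X >= gamma) where X ~ f(.; u) = N(0, 1).
gamma, u_mu, u_sigma = 4.0, 0.0, 1.0
N, rho, T = 10_000, 0.1, 5            # sample size, elite fraction, number of CE iterations

mu, sigma = u_mu, u_sigma             # v^(0): start from the nominal parameters
for t in range(1, T + 1):
    X = rng.normal(mu, sigma, size=N)                 # step 2: sample from f(.; v^(t-1))
    level = min(np.quantile(X, 1 - rho), gamma)       # current level, capped at the target gamma
    elite = X[X >= level]                             # best-performing samples
    W = norm_pdf(elite, u_mu, u_sigma) / norm_pdf(elite, mu, sigma)  # likelihood ratios
    mu = np.sum(W * elite) / np.sum(W)                # weighted MLE update of v^(t)
    sigma = np.sqrt(np.sum(W * (elite - mu) ** 2) / np.sum(W))

# Final importance-sampling estimate of ell using the fitted density as g.
X = rng.normal(mu, sigma, size=N)
W = norm_pdf(X, u_mu, u_sigma) / norm_pdf(X, mu, sigma)
ell_hat = np.mean((X >= gamma) * W)
print(ell_hat)  # close to the true tail probability 1 - Phi(4), about 3.2e-5
</syntaxhighlight>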
 
===Continuous optimization&mdash;example===
The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function <math>S(x)</math> of a single continuous variable <math>x</math>. Following the CE approach, this maximization is associated with the problem of estimating the probability that <math>S(X)</math> exceeds some high level <math>\gamma</math>, where <math>X</math> is drawn from a parametric family of sampling distributions, here a Gaussian with mean <math>\mu</math> and variance <math>\sigma^2</math>. This yields the following randomized algorithm for this problem.
 
====Pseudo-code====
1. mu:=-6; sigma2:=100; t:=0; maxits:=100; // Initialize parameters
2. N:=100; Ne:=10; // Sample size and number of elite samples
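Below is a runnable sketch of this procedure in Python. The objective <math>S</math> is a hypothetical bimodal function chosen only for illustration, the initialization mirrors the pseudo-code above, and the variance-based stopping tolerance <code>eps</code> is an assumption.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)

def S(x):
    # Hypothetical objective for illustration: bimodal, with its global maximum near x = 2.
    return np.exp(-(x - 2) ** 2) + 0.8 * np.exp(-(x + 2) ** 2)

# Initialization mirroring the pseudo-code above.
mu, sigma2 = -6.0, 100.0
N, Ne = 100, 10             # sample size and number of elite samples
maxits, eps = 100, 1e-8     # iteration cap and (assumed) variance tolerance

t = 0
while t < maxits and sigma2 > eps:
    X = rng.normal(mu, np.sqrt(sigma2), size=N)   # draw N samples from the current distribution
    elite = X[np.argsort(S(X))[-Ne:]]             # keep the Ne best-performing samples
    mu, sigma2 = elite.mean(), elite.var()        # refit the sampling distribution to the elite set
    t += 1

print(mu)  # approximate maximizer of S
</syntaxhighlight>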