Cross-entropy method

{{Short description|Monte Carlo method for importance sampling and optimization}}
The '''cross-entropy''' ('''CE''') '''method''' is a [[Monte Carlo method|Monte Carlo]] method for [[importance sampling]] and [[Optimization (mathematics)|optimization]]. It is applicable to both [[Combinatorial optimization|combinatorial]] and [[Continuous optimization|continuous]] problems, with either a static or noisy objective.
 
The method approximates the optimal importance sampling estimator by repeating two phases:<ref>Rubinstein, R.Y. and Kroese, D.P. (2004). ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York. {{ISBN|978-0-387-21240-1}}</ref>
#Draw a sample from a [[probability distribution]].
#Minimize the ''[[cross-entropy]]'' between this distribution and a target distribution to produce a better sample in the next iteration. This is equivalent to minimizing the [[Kullback–Leibler divergence]] between the two distributions.

[[Reuven Rubinstein]] developed the method in the context of ''rare-event simulation'', where very small probabilities must be estimated accurately, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to static and noisy [[combinatorial optimization]] problems such as the [[traveling salesman problem]], the [[quadratic assignment problem]], [[Sequence alignment|DNA sequence alignment]], the [[Maxcut|max-cut]] problem and the buffer allocation problem, as well as to continuous [[global optimization]] problems with many local [[extremum|extrema]].

==Estimation via importance sampling==
Consider the general problem of estimating the quantity

<math>\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \textrm{d}\mathbf{x}</math>,

where <math>H</math> is some ''performance function'' and <math>f(\mathbf{x};\mathbf{u})</math> is a member of some [[parametric family]] of distributions. Using [[importance sampling]] this quantity can be estimated as

<math>\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}</math>,

where <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> is a random sample from <math>g</math>. For positive <math>H</math>, the theoretically ''optimal'' importance sampling [[probability density function|density]] (PDF) is given by

<math> g^*(\mathbf{x}) = H(\mathbf{x}) f(\mathbf{x};\mathbf{u})/\ell</math>.

This, however, depends on the unknown <math>\ell</math>. The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the [[Kullback–Leibler divergence|Kullback–Leibler]] sense) to the optimal PDF <math>g^*</math>.
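For intuition, here is a minimal Python sketch (using NumPy) that contrasts a crude Monte Carlo estimate of <math>\ell</math> with an importance sampling estimate for an illustrative rare-event performance function; the threshold, proposal density and sample size are arbitrary choices made for the example, not part of the method.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: H(x) is the indicator that x exceeds a threshold,
# and f(.; u) is the standard normal density (u = 0).
u, threshold = 0.0, 3.0
H = lambda x: (x >= threshold).astype(float)

def f(x, mean):
    """Density of a normal distribution with the given mean and unit variance."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

N = 100_000

# Crude Monte Carlo: sample directly from f(.; u); few samples hit the event.
x_crude = rng.normal(u, 1.0, N)
ell_crude = H(x_crude).mean()

# Importance sampling: sample from g = N(v, 1), centred near the event region,
# and reweight each sample by the likelihood ratio f(x; u) / g(x).
v = 3.0
x_is = rng.normal(v, 1.0, N)
ell_is = np.mean(H(x_is) * f(x_is, u) / f(x_is, v))

print(ell_crude, ell_is)  # both approximate P(X >= 3), about 1.35e-3
</syntaxhighlight>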
 
==Generic CE algorithm==
# Choose initial parameter vector <math>\mathbf{v}^{(0)}</math>; set t = 1.
# Generate a random sample <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> from <math>f(\cdot;\mathbf{v}^{(t-1)})</math>.
# Solve for <math>\mathbf{v}^{(t)}</math>, where<br><math>\mathbf{v}^{(t)} = \mathop{\textrm{argmax}}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i)\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})} \log f(\mathbf{X}_i;\mathbf{v})</math>
# If convergence is reached then '''stop'''; otherwise, increase t by 1 and reiterate from step 2.
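A minimal sketch of these four steps in Python follows, for a hypothetical rare-event estimation problem with a one-dimensional Gaussian sampling family; in this case the maximization in step 3 has a closed-form solution, namely the weighted sample mean and weighted sample variance. The event threshold, sample size and initial parameters are illustrative assumptions, and a fixed number of iterations stands in for the convergence test of step 4.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem: estimate ell = E_u[H(X)] with X ~ N(u, 1) and
# H the indicator of the rare event {X >= gamma}.
u, gamma = 0.0, 4.0
H = lambda x: (x >= gamma).astype(float)

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

N, iters = 10_000, 20
mean, var = 0.0, 100.0    # step 1: initial parameters (wide, so samples reach the event)

for t in range(iters):
    X = rng.normal(mean, np.sqrt(var), N)   # step 2: sample from f(.; v^(t-1))
    w = H(X) * normal_pdf(X, u, 1.0) / normal_pdf(X, mean, var)
    if w.sum() == 0:
        continue                            # no sample hit the event; try again
    # Step 3: for a Gaussian family the argmax is the weighted mean and variance.
    mean = np.sum(w * X) / np.sum(w)
    var = np.sum(w * (X - mean) ** 2) / np.sum(w)

# Use the fitted density as the importance sampling proposal for ell.
X = rng.normal(mean, np.sqrt(var), N)
ell_hat = np.mean(H(X) * normal_pdf(X, u, 1.0) / normal_pdf(X, mean, var))
print(ell_hat)  # roughly P(X >= 4), about 3.2e-5
</syntaxhighlight>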
 
== Continuous optimization&mdash;example==
The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function <math>S(x)</math>, for example,
<math>S(x) = \textrm{e}^{-(x-2)^2} + 0.8\,\textrm{e}^{-(x+2)^2}</math>.
To apply CE, one considers first the ''associated stochastic problem'' of estimating <math>\mathbb{P}_{\mathbf{v}}(S(X)\geq\gamma)</math> for a given level <math>\gamma</math>, with <math>X</math> drawn from a parametric family of sampling densities, here the one-dimensional Gaussian with mean <math>\mu</math> and variance <math>\sigma^2</math>. For this family, the CE parameter update reduces to refitting the mean and variance to the ''elite'' samples, those with the largest values of <math>S</math>.
This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an [[estimation of distribution algorithm]].
 
===Pseudocode===
 ''// Initialize parameters''
 &mu; := −6
 &sigma;2 := 100
 t := 0
 maxits := 100
 N := 100
 Ne := 10
 ''// While maxits not exceeded and not converged''
 '''while''' t < maxits '''and''' &sigma;2 > &epsilon; '''do'''
     ''// Obtain N samples from current sampling distribution''
     X := SampleGaussian(&mu;, &sigma;2, N)
     ''// Evaluate objective function at sampled points''
     S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2)
     ''// Sort X by objective function values in descending order''
     X := sort(X, S)
     ''// Update parameters of sampling distribution via elite samples''
     &mu; := mean(X(1:Ne))
     &sigma;2 := var(X(1:Ne))
     t := t + 1
 ''// Return mean of final sampling distribution as solution''
 '''return''' &mu;
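For comparison, a direct transcription of this pseudocode into Python with NumPy is given below; the parameter values mirror those above, and the convergence tolerance <code>eps</code> is an arbitrary choice since the pseudocode leaves &epsilon; unspecified.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)

def S(x):
    """Objective function of the example above."""
    return np.exp(-(x - 2) ** 2) + 0.8 * np.exp(-(x + 2) ** 2)

# Initialize parameters (same values as in the pseudocode)
mu, sigma2 = -6.0, 100.0
t, maxits = 0, 100
N, Ne = 100, 10       # sample size and number of elite samples
eps = 1e-8            # convergence tolerance (not specified in the pseudocode)

while t < maxits and sigma2 > eps:
    X = rng.normal(mu, np.sqrt(sigma2), N)   # draw N samples from N(mu, sigma2)
    elite = X[np.argsort(S(X))][-Ne:]        # keep the Ne samples with the largest S
    mu, sigma2 = elite.mean(), elite.var()   # refit the sampling distribution
    t += 1

print(mu)  # typically close to 2, the global maximizer of S
</syntaxhighlight>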
 
==Related methods==
* [[Simulated annealing]]
* [[Genetic algorithms]]
* [[Harmony search]]
* [[Estimation of distribution algorithm]]
* [[Tabu search]]
* [[Natural Evolution Strategy]]
* [[Ant colony optimization algorithms]]
 
==See also==
* [[Cross entropy]]
* [[Kullback–Leibler divergence]]
* [[Randomized algorithm]]
* [[Importance sampling]]
 
== Journal papers ==
* De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. ''Annals of Operations Research'', '''134''' (1), 19–67. [http://www.maths.uq.edu.au/~kroese/ps/aortut.pdf]
* Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events, ''European Journal of Operational Research'', '''99''', 89–112.
* Rubinstein, R.Y. and Kroese, D.P. (2004). ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York.
 
==Software implementations==
* [https://ceopt.org '''CEopt''' Matlab package]
* [https://cran.r-project.org/web/packages/CEoptim/index.html '''CEoptim''' R package]
* [https://www.nuget.org/packages/Novacta.Analytics '''Novacta.Analytics''' .NET library]
 
==References==
{{reflist}}
 
[[Category:Heuristics]]