{{Short description|Monte Carlo method for importance sampling and optimization}}
The '''cross-entropy''' ('''CE''') '''method''' is a [[Monte Carlo method|Monte Carlo]] method for [[importance sampling]] and [[Optimization (mathematics)|optimization]]. It is applicable to both [[Combinatorial optimization|combinatorial]] and [[Continuous optimization|continuous]] problems, with either a static or noisy objective.
 
The method approximates the optimal importance sampling estimator by repeating two phases:<ref>Rubinstein, R.Y. and Kroese, D.P. (2004), ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York {{ISBN|978-0-387-21240-1}}.</ref>
 
#Draw a sample from a [[probability distribution]].
#Minimize the ''[[cross-entropy]]'' between this distribution and a target distribution to produce a "better" sample in the next iteration.
 
[[Reuven Rubinstein]] developed the method in the context of ''rare-event simulation'', where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the [[traveling salesman problem|traveling salesman]], [[quadratic assignment problem|quadratic assignment]], [[Sequence alignment|DNA sequence alignment]], [[Maxcut|max-cut]] and buffer allocation problems.
 
==Estimation via importance sampling==
Consider the general problem of estimating the quantity
 
<math>\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \textrm{d}\mathbf{x}</math>,
 
where <math>H</math> is some ''performance function'' and <math>f(\mathbf{x};\mathbf{u})</math> is a member of some [[parametric family]] of distributions. Using [[importance sampling]] this quantity can be estimated as
 
<math>\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}</math>,
 
where <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> is a random sample from <math>g\,</math>. For positive <math>H</math>, the theoretically ''optimal'' importance sampling [[probability density function|density]] (PDF) is given by
 
<math> g^*(\mathbf{x}) = H(\mathbf{x}) f(\mathbf{x};\mathbf{u})/\ell</math>.
 
This, however, depends on the unknown <math>\ell</math>. The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the [[Kullback–Leibler divergence|Kullback–Leibler]] sense) to the optimal PDF <math>g^*</math>.
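
For illustration, the importance sampling estimator <math>\hat{\ell}</math> above can be computed in a few lines of code. The following Python sketch uses an assumed rare-event performance function <math>H(x) = \mathrm{I}_{\{x > 3\}}</math>, a standard normal nominal density <math>f(\cdot;\mathbf{u})</math>, and a normal importance density <math>g</math> centred near the event; these particular choices are made only for demonstration.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000

def H(x):
    # Performance function: indicator of the (assumed) rare event {x > 3}
    return (x > 3.0).astype(float)

# Nominal PDF f(x; u): standard normal.  Importance PDF g: normal centred at 3.
X = rng.normal(loc=3.0, scale=1.0, size=N)                              # sample from g
w = norm.pdf(X, loc=0.0, scale=1.0) / norm.pdf(X, loc=3.0, scale=1.0)   # f(X; u) / g(X)
ell_hat = np.mean(H(X) * w)        # importance-sampling estimate of P(X > 3) under f
print(ell_hat)                     # close to 1 - Phi(3), about 0.00135
</syntaxhighlight>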
 
==Generic CE algorithm==
# Choose initial parameter vector <math>\mathbf{v}^{(0)}</math>; set t = 1.
# Generate a random sample <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> from <math>f(\cdot;\mathbf{v}^{(t-1)})</math>
# Solve for <math>\mathbf{v}^{(t)}</math>, where<br><math>\mathbf{v}^{(t)} = \mathop{\textrm{argmax}}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})} \log f(\mathbf{X}_i;\mathbf{v})</math>
# If convergence is reached then '''stop'''; otherwise, increase t by 1 and reiterate from step 2.
 
In several cases, the solution to step 3 can be found ''analytically''. Situations in which this occurs are
* When <math>f\,</math> belongs to the [[Exponential family|natural exponential family]]
* When <math>f\,</math> is [[discrete space|discrete]] with finite [[Support (mathematics)|support]]
* When <math>H(\mathbf{X}) = \mathrm{I}_{\{\mathbf{x}\in A\}}</math> and <math>f(\mathbf{X}_i;\mathbf{u}) = f(\mathbf{X}_i;\mathbf{v}^{(t-1)})</math>, then <math>\mathbf{v}^{(t)}</math> corresponds to the [[Maximum likelihood|maximum likelihood estimator]] based on those <math>\mathbf{X}_k \in A</math>.
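
When <math>f\,</math> is, for instance, a one-dimensional Gaussian density parameterized by its mean and variance, the maximization in step 3 has the closed form of a weighted sample mean and variance. A minimal Python sketch of one iteration (steps 2–3), assuming for illustration the performance function <math>H(x) = \mathrm{I}_{\{x \geq 3\}}</math> and nominal parameter <math>\mathbf{u} = (0, 1)</math>, could read:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N = 10_000
mu_u, sigma_u = 0.0, 1.0        # nominal parameters u (assumed for the example)
mu_v, sigma_v = 2.0, 1.0        # current parameters v^(t-1)

# Step 2: draw a sample from f(.; v^(t-1))
X = rng.normal(mu_v, sigma_v, size=N)

# Weights H(X_i) f(X_i; u) / f(X_i; v^(t-1)),
# with H the indicator of the (illustrative) event {x >= 3}
H = (X >= 3.0).astype(float)
W = H * norm.pdf(X, mu_u, sigma_u) / norm.pdf(X, mu_v, sigma_v)

# Step 3: for the Gaussian family the argmax is the weighted sample mean and variance
mu_v = np.sum(W * X) / np.sum(W)
sigma_v = np.sqrt(np.sum(W * (X - mu_v) ** 2) / np.sum(W))
</syntaxhighlight>

Iterating these updates drives <math>f(\cdot;\mathbf{v}^{(t)})</math> towards the optimal importance sampling density <math>g^*</math>.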
 
== Continuous optimization&mdash;example==
The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function <math>S(x)</math>, for example,
<math>S(x) = \textrm{e}^{-(x-2)^2} + 0.8\,\textrm{e}^{-(x+2)^2}</math>.
To apply CE, one considers first the ''associated stochastic problem'' of estimating
<math>\mathbb{P}_{\boldsymbol{\theta}}(S(X)\geq\gamma)</math> for a given ''level'' <math>\gamma\,</math>, and parametric family <math>\left\{f(\cdot;\boldsymbol{\theta})\right\}</math>, for example the 1-dimensional [[Gaussian distribution]], parameterized by its mean <math>\mu_t\,</math> and variance <math>\sigma_t^2\,</math> (so <math>\boldsymbol{\theta} = (\mu,\sigma^2)</math> here). Hence, for a given <math>\gamma\,</math>, the goal is to find <math>\boldsymbol{\theta}\,</math> so that <math>D_{\mathrm{KL}}(\textrm{I}_{\{S(x)\geq\gamma\}}\|f_{\boldsymbol{\theta}})</math> is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that parameters that minimize the stochastic counterpart for this choice of target distribution and
parametric family are the sample mean and sample variance corresponding to the ''elite samples'', which are those samples that have objective function value <math>\geq\gamma</math>.
The worst of the elite samples is then used as the level parameter for the next iteration.
This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an [[estimation of distribution algorithm]].
 
===Pseudocode===
 ''// Initialize parameters''
 &mu; := −6
 &sigma;2 := 100
 t := 0
 maxits := 100
 N := 100
 Ne := 10
 ''// While maxits not exceeded and not converged''
 '''while''' t < maxits '''and''' &sigma;2 > &epsilon; '''do'''
     ''// Obtain N samples from current sampling distribution''
     X := SampleGaussian(&mu;, &sigma;2, N)
     ''// Evaluate objective function at sampled points''
     S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2)
     ''// Sort X by objective function values in descending order''
     X := sort(X, S)
     ''// Update parameters of sampling distribution via elite samples''
     &mu; := mean(X(1:Ne))
     &sigma;2 := var(X(1:Ne))
     t := t + 1
 ''// Return mean of final sampling distribution as solution''
 '''return''' &mu;
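
A direct transcription of this pseudocode into Python (using NumPy; the convergence tolerance <math>\epsilon = 10^{-8}</math> and the random seed are arbitrary choices) could read as follows:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

mu, sigma2 = -6.0, 100.0      # initial mean and variance of the sampling distribution
t, maxits = 0, 100
N, Ne = 100, 10               # sample size and number of elite samples
epsilon = 1e-8                # convergence tolerance on the variance (arbitrary choice)

while t < maxits and sigma2 > epsilon:
    X = rng.normal(mu, np.sqrt(sigma2), size=N)               # N samples from current distribution
    S = np.exp(-(X - 2) ** 2) + 0.8 * np.exp(-(X + 2) ** 2)   # objective values
    elite = X[np.argsort(S)[::-1][:Ne]]                        # Ne samples with the highest S(x)
    mu, sigma2 = elite.mean(), elite.var()                     # refit the Gaussian to the elite set
    t += 1

print(mu)   # close to 2, the global maximizer of S(x)
</syntaxhighlight>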
 
==Related methods==
* [[Simulated annealing]]
* [[Genetic algorithms]]
* [[Harmony search]]
* [[Estimation of distribution algorithm]]
* [[Tabu search]]
* [[Natural Evolution Strategy]]
* [[Ant colony optimization algorithms]]
 
==See also==
* [[Cross entropy]]
* [[Kullback–Leibler divergence]]
* [[Randomized algorithm]]
* [[Importance sampling]]
 
== Journal papers ==
* De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. ''Annals of Operations Research'', '''134''' (1), 19–67. [http://www.maths.uq.edu.au/~kroese/ps/aortut.pdf]
* Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events, ''European Journal of Operational Research'', '''99''', 89–112.
 
==Software implementations==
* [https://ceopt.org '''CEopt''' Matlab package]
* [https://cran.r-project.org/web/packages/CEoptim/index.html '''CEoptim''' R package]
* [https://www.nuget.org/packages/Novacta.Analytics '''Novacta.Analytics''' .NET library]

==External links==
* [http://www.cemethod.org/ Homepage for the CE method]
 
==References==
{{reflist}}
 
[[Category:Heuristics]]
[[Category:Optimization algorithms and methods]]
[[Category:Monte Carlo methods]]
[[Category:Machine learning]]