Cross-entropy method

{{Short description|Monte Carlo method for importance sampling and optimization}}
The '''cross-entropy''' ('''CE''') '''method''' is a [[Monte Carlo method|Monte Carlo]] method for [[importance sampling]] and [[Optimization (mathematics)|optimization]]. It is applicable to both [[Combinatorial optimization|combinatorial]] and [[Continuous optimization|continuous]] problems, with either a static or noisy objective.
 
The method approximates the optimal importance sampling estimator by repeating two phases:<ref>Rubinstein, R.Y. and Kroese, D.P. (2004). ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York. {{ISBN|978-0-387-21240-1}}</ref>
#Draw a sample from a [[probability distribution]].
#Minimize the ''[[cross-entropy]]'' between this distribution and a target distribution to produce a better sample in the next iteration. This is equivalent to minimizing the [[Kullback–Leibler divergence]] between the two distributions.

[[Reuven Rubinstein]] developed the method in the context of ''rare-event simulation'', where very small probabilities must be estimated accurately, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to static and noisy [[combinatorial optimization]] problems such as the [[traveling salesman problem]], the [[quadratic assignment problem]], [[Sequence alignment|DNA sequence alignment]], the [[Maxcut|max-cut]] problem and the buffer allocation problem, as well as to continuous [[global optimization]] problems with many local [[extremum|extrema]].

==Estimation via importance sampling==
Consider the general problem of estimating the quantity

<math>\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \textrm{d}\mathbf{x}</math>,

where <math>H</math> is some ''performance function'' and <math>f(\mathbf{x};\mathbf{u})</math> is a member of some [[parametric family]] of distributions. Using [[importance sampling]] this quantity can be estimated as

<math>\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}</math>,

where <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> is a random sample from <math>g</math>. For positive <math>H</math>, the theoretically ''optimal'' importance sampling [[probability density function|density]] (PDF) is given by

<math> g^*(\mathbf{x}) = H(\mathbf{x}) f(\mathbf{x};\mathbf{u})/\ell</math>.

This, however, depends on the unknown <math>\ell</math>. The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the [[Kullback–Leibler divergence|Kullback–Leibler]] sense) to the optimal PDF <math>g^*</math>.
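For intuition, here is a minimal Python sketch (using NumPy) that contrasts a crude Monte Carlo estimate of <math>\ell</math> with an importance sampling estimate for an illustrative rare-event performance function; the threshold, proposal density and sample size are arbitrary choices made for the example, not part of the method.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: H(x) is the indicator that x exceeds a threshold,
# and f(.; u) is the standard normal density (u = 0).
u, threshold = 0.0, 3.0
H = lambda x: (x >= threshold).astype(float)

def f(x, mean):
    """Density of a normal distribution with the given mean and unit variance."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

N = 100_000

# Crude Monte Carlo: sample directly from f(.; u); few samples hit the event.
x_crude = rng.normal(u, 1.0, N)
ell_crude = H(x_crude).mean()

# Importance sampling: sample from g = N(v, 1), centred near the event region,
# and reweight each sample by the likelihood ratio f(x; u) / g(x).
v = 3.0
x_is = rng.normal(v, 1.0, N)
ell_is = np.mean(H(x_is) * f(x_is, u) / f(x_is, v))

print(ell_crude, ell_is)  # both approximate P(X >= 3), about 1.35e-3
</syntaxhighlight>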
 
==Generic CE algorithm==
# Choose initial parameter vector <math>\mathbf{v}^{(0)}</math>; set t = 1.
# Generate a random sample <math>\mathbf{X}_1,\dots,\mathbf{X}_N</math> from <math>f(\cdot;\mathbf{v}^{(t-1)})</math>.
# Solve for <math>\mathbf{v}^{(t)}</math>, where<br><math>\mathbf{v}^{(t)} = \mathop{\textrm{argmax}}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i)\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})} \log f(\mathbf{X}_i;\mathbf{v})</math>
# If convergence is reached then '''stop'''; otherwise, increase t by 1 and reiterate from step 2.
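A minimal sketch of these four steps in Python follows, for a hypothetical rare-event estimation problem with a one-dimensional Gaussian sampling family; in this case the maximization in step 3 has a closed-form solution, namely the weighted sample mean and weighted sample variance. The event threshold, sample size and initial parameters are illustrative assumptions, and a fixed number of iterations stands in for the convergence test of step 4.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem: estimate ell = E_u[H(X)] with X ~ N(u, 1) and
# H the indicator of the rare event {X >= gamma}.
u, gamma = 0.0, 4.0
H = lambda x: (x >= gamma).astype(float)

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

N, iters = 10_000, 20
mean, var = 0.0, 100.0    # step 1: initial parameters (wide, so samples reach the event)

for t in range(iters):
    X = rng.normal(mean, np.sqrt(var), N)   # step 2: sample from f(.; v^(t-1))
    w = H(X) * normal_pdf(X, u, 1.0) / normal_pdf(X, mean, var)
    if w.sum() == 0:
        continue                            # no sample hit the event; try again
    # Step 3: for a Gaussian family the argmax is the weighted mean and variance.
    mean = np.sum(w * X) / np.sum(w)
    var = np.sum(w * (X - mean) ** 2) / np.sum(w)

# Use the fitted density as the importance sampling proposal for ell.
X = rng.normal(mean, np.sqrt(var), N)
ell_hat = np.mean(H(X) * normal_pdf(X, u, 1.0) / normal_pdf(X, mean, var))
print(ell_hat)  # roughly P(X >= 4), about 3.2e-5
</syntaxhighlight>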
 
== Continuous optimization&mdash;example==
The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function <math>S(x)</math>, for example,
<math>S(x) = \textrm{e}^{-(x-2)^2} + 0.8\,\textrm{e}^{-(x+2)^2}</math>.
To apply CE, one considers first the ''associated stochastic problem'' of estimating <math>\mathbb{P}_{\mathbf{v}}(S(X)\geq\gamma)</math> for a given level <math>\gamma</math>, with <math>X</math> drawn from a parametric family of sampling densities, here the one-dimensional Gaussian with mean <math>\mu</math> and variance <math>\sigma^2</math>. For this family, the CE parameter update reduces to refitting the mean and variance to the ''elite'' samples, those with the largest values of <math>S</math>.
This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an [[estimation of distribution algorithm]].
 
===Pseudocode===
 ''// Initialize parameters''
 &mu; := −6
 &sigma;2 := 100
 t := 0
 maxits := 100
 N := 100
 Ne := 10
 ''// While maxits not exceeded and not converged''
 '''while''' t < maxits '''and''' &sigma;2 > &epsilon; '''do'''
     ''// Obtain N samples from current sampling distribution''
     X := SampleGaussian(&mu;, &sigma;2, N)
     ''// Evaluate objective function at sampled points''
     S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2)
     ''// Sort X by objective function values in descending order''
     X := sort(X, S)
     ''// Update parameters of sampling distribution via elite samples''
     &mu; := mean(X(1:Ne))
     &sigma;2 := var(X(1:Ne))
     t := t + 1
 ''// Return mean of final sampling distribution as solution''
 '''return''' &mu;
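For comparison, a direct transcription of this pseudocode into Python with NumPy is given below; the parameter values mirror those above, and the convergence tolerance <code>eps</code> is an arbitrary choice since the pseudocode leaves &epsilon; unspecified.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)

def S(x):
    """Objective function of the example above."""
    return np.exp(-(x - 2) ** 2) + 0.8 * np.exp(-(x + 2) ** 2)

# Initialize parameters (same values as in the pseudocode)
mu, sigma2 = -6.0, 100.0
t, maxits = 0, 100
N, Ne = 100, 10       # sample size and number of elite samples
eps = 1e-8            # convergence tolerance (not specified in the pseudocode)

while t < maxits and sigma2 > eps:
    X = rng.normal(mu, np.sqrt(sigma2), N)   # draw N samples from N(mu, sigma2)
    elite = X[np.argsort(S(X))][-Ne:]        # keep the Ne samples with the largest S
    mu, sigma2 = elite.mean(), elite.var()   # refit the sampling distribution
    t += 1

print(mu)  # typically close to 2, the global maximizer of S
</syntaxhighlight>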
 
==Related methods==
* [[Simulated annealing]]
* [[Genetic algorithms]]
* [[Harmony search]]
* [[Estimation of distribution algorithm]]
* [[Tabu search]]
* [[Natural Evolution Strategy]]
* [[Ant colony optimization algorithms]]
 
==See also==
* [[Cross entropy]]
* [[Kullback–Leibler divergence]]
* [[Randomized algorithm]]
* [[Importance sampling]]
 
== Journal papers ==
* De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. ''Annals of Operations Research'', '''134''' (1), 19–67. [http://www.maths.uq.edu.au/~kroese/ps/aortut.pdf]
* Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events, ''European Journal of Operational Research'', '''99''', 89–112.
* Rubinstein, R.Y. and Kroese, D.P. (2004). ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York.
 
==Software implementations==
* [https://ceopt.org '''CEopt''' Matlab package]
* [https://cran.r-project.org/web/packages/CEoptim/index.html '''CEoptim''' R package]
* [https://www.nuget.org/packages/Novacta.Analytics '''Novacta.Analytics''' .NET library]
 
==References==
{{reflist}}
 
[[Category:Heuristics]]