{{Short description|Iterative simulation method}}
'''Consensus-based optimization (CBO)'''<ref name=":0">{{Cite journal |last1=Pinnau |first1=René |last2=Totzeck |first2=Claudia |last3=Tse |first3=Oliver |last4=Martin |first4=Stephan |date=January 2017 |title=A consensus-based model for global optimization and its mean-field limit |url=https://www.worldscientific.com/doi/abs/10.1142/S0218202517400061 |journal=Mathematical Models and Methods in Applied Sciences |language=en |volume=27 |issue=1 |pages=183–204 |doi=10.1142/S0218202517400061 |arxiv=1604.05648 |s2cid=119296432 |issn=0218-2025}}</ref> is a multi-agent [[derivative-free optimization]] method, designed to obtain solutions for global optimization problems of the form <math display="block">\min_{x\in \cal{X}} f(x),</math>
[[File:CBORastrigin.gif|thumb|Behavior of CBO on the [[Rastrigin function]]. '''Blue:''' Particles, '''Pink:''' drift vectors and consensus point.]]
where <math>f:\mathcal{X}\to\R</math> denotes the objective function acting on the state space <math>\cal{X}</math>.

== Dynamics ==
Consider an ensemble of points <math>x_t = (x_t^1,\dots, x_t^N) \in {\cal{X}}^N</math>, depending on the time <math>t\in[0,\infty)</math>. Then the update for the <math>i</math>th particle is formulated as a stochastic differential equation,
<math display="block">dx^i_t = -\lambda\, \underbrace{(x^i_t-c_\alpha(x_t))\,dt}_{\text{consensus drift}} + \sigma \underbrace{D(x^i_t-c_{\alpha}(x_t))\,dB^i_t}_{\text{scaled diffusion}},</math>
with the following components:
* '''The consensus point''' <math>c_{\alpha}(x_t)</math>: The key idea of CBO is that in each step the particles “agree” on a common consensus point, computed as an average of their positions, weighted by their current objective function values, <math display="block">c_\alpha(x_t) = \frac{1}{\sum_{i=1}^N \omega_\alpha(x^i_t)} \sum_{i=1}^N x^i_t\ \omega_\alpha(x^i_t), \quad\text{ with }\quad \omega_\alpha(\,\cdot\,) = \mathrm{exp}(-\alpha f(\,\cdot\,)).
</math>This point is then used in the '''drift''' term <math>x^i_t-c_\alpha(x_t)</math>, which moves each particle in the direction of the consensus point.
* '''Scaled noise:''' For each <math>t\geq 0</math> and <math>i=1,\dots,N</math>, we denote by <math>B^i_t</math> independent standard Brownian motions. The function <math>D:{\cal{X}}\to\R^s</math> is applied to the drift of the <math>i</math>th particle and determines the noise model. The most common choices are:
** ''Isotropic noise'', <math>D(\cdot) = \|\cdot \|</math>: In this case <math>s=1</math> and every component of the noise vector is scaled equally. This was used in the original version of the algorithm.<ref name=":0" />
** ''Anisotropic noise'', <math>D(\cdot) = |\cdot|</math>, where the absolute value is applied component-wise: In this case <math>s</math> equals the dimension of the state space and each component of the noise vector is scaled individually, depending on the corresponding component of the drift.
* '''Hyperparameters:''' The parameter <math>\sigma \geq 0</math> scales the influence of the noise term. The parameter <math>\alpha \geq 0</math> determines the separation effect of the particles:<ref name=":0" />
** In the limit <math>\alpha\to 0</math>, every particle is assigned the same weight and the consensus point is the arithmetic mean of the positions.
** In the limit <math>\alpha\to\infty</math> the consensus point corresponds to the particle with the best objective value, completely ignoring the position of other points in the ensemble.
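The weighted average above can be written compactly in NumPy. The following sketch (the function name is chosen here for illustration and is not taken from CBXpy or CBX.jl) also applies the usual log-sum-exp shift so that large values of <math>\alpha</math> do not cause overflow:

```python
import numpy as np

def consensus_point(x, f, alpha):
    """Consensus point c_alpha(x): average of the particle positions
    x (shape (N, d)), weighted by exp(-alpha * f(x^i)).
    Computed with a log-sum-exp shift for numerical stability."""
    energies = np.apply_along_axis(f, 1, x)  # f(x^i) for each particle
    logw = -alpha * energies
    logw -= logw.max()                       # shift so the largest weight is 1
    w = np.exp(logw)
    return (w[:, None] * x).sum(axis=0) / w.sum()
```

For <math>\alpha = 0</math> this returns the plain ensemble mean; for large <math>\alpha</math> it approaches the position of the particle with the smallest objective value, matching the two limits described above.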
== Implementation ==
In practice, the SDE in {{EquationNote|2=(1)}} is discretized via the [[Euler–Maruyama method]] such that the following explicit update formula for the ensemble <math>x = (x^1,\dots,x^N)</math> is obtained,<math display="block">x^i \gets x^i-\lambda\, (x^i-c_\alpha(x))\,dt + \sigma D(x^i-c_{\alpha}(x))\, B^i.</math>If one can employ an efficient implementation of the [[LogSumExp]] function, this can be beneficial for the numerical stability of the consensus point computation. We refer to existing implementations in [[Python (programming language)|Python]] [https://pdips.github.io/CBXpy/] and [[Julia (programming language)|Julia]] [https://github.com/PdIPS/CBX.jl].
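One Euler–Maruyama step of this update can be sketched as follows, here with isotropic noise <math>D(\cdot)=\|\cdot\|</math>. This is a minimal illustration, not code from the CBXpy or CBX.jl packages, and the default parameter values are only plausible examples:

```python
import numpy as np

def cbo_step(x, f, lam=1.0, sigma=0.8, alpha=30.0, dt=0.01, rng=None):
    """One Euler-Maruyama step of CBO with isotropic noise D(v) = ||v||.
    x has shape (N, d); f maps a d-vector to a scalar."""
    rng = np.random.default_rng() if rng is None else rng
    logw = -alpha * np.apply_along_axis(f, 1, x)
    w = np.exp(logw - logw.max())                 # stable exp(-alpha f)
    c = (w[:, None] * x).sum(axis=0) / w.sum()    # consensus point
    drift = x - c
    # Brownian increments over a step of length dt have std sqrt(dt).
    noise = np.sqrt(dt) * rng.standard_normal(x.shape)
    scale = np.linalg.norm(drift, axis=1, keepdims=True)  # ||x^i - c||
    return x - lam * drift * dt + sigma * scale * noise
```

Iterating this map drives the ensemble toward consensus near a minimizer; the noise term vanishes as the particles collapse onto the consensus point.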
== Variants ==

=== Sampling ===
Consensus-based optimization can be transformed into a sampling method<ref>{{Citation |last1=Carrillo |first1=J. A. |title=Consensus Based Sampling |date=2021-11-04 |arxiv=2106.02519 |last2=Hoffmann |first2=F. |last3=Stuart |first3=A. M. |last4=Vaes |first4=U.}}</ref> by modifying the noise term and choosing appropriate hyperparameters. Namely, one considers the following SDE
<math display="block">dx^i_t = -(x^i_t-c_\alpha(x_t))\,dt + \sqrt{2 \tilde{\lambda}^{-1}\, C_\alpha(x_t)}\,dB^i_t,</math>
where the weighted covariance matrix is defined as
<math display="block">C_\alpha(x_t) := \frac{1}{\sum_{i=1}^N \omega_\alpha(x_t^i)}\sum_{i=1}^N (x_t^i - c_\alpha(x_t)) \otimes (x_t^i - c_\alpha(x_t))\, \omega_\alpha(x_t^i).</math>
If the parameters are chosen such that <math>\tilde{\lambda}^{-1} = (1 + \alpha)</math>, the above scheme creates approximate samples of the probability distribution with density proportional to <math>\exp(-\alpha f)</math>.
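A discretized step of this sampling variant can be sketched as below. This is an illustrative Euler–Maruyama implementation under the stated choice <math>\tilde{\lambda}^{-1} = 1+\alpha</math>; the function name, step size, and the small diagonal regularization of the covariance are assumptions made here for the example:

```python
import numpy as np

def cbs_step(x, f, alpha=1.0, dt=0.05, rng=None):
    """One Euler-Maruyama step of the sampling variant:
    dx = -(x - c_alpha) dt + sqrt(2 (1 + alpha) C_alpha) dB."""
    rng = np.random.default_rng() if rng is None else rng
    logw = -alpha * np.apply_along_axis(f, 1, x)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    c = w @ x                                    # consensus point c_alpha
    diff = x - c
    C = (w[:, None] * diff).T @ diff             # weighted covariance C_alpha
    # Matrix square root via Cholesky; tiny jitter guards against a
    # numerically singular covariance (an implementation assumption).
    L = np.linalg.cholesky(C + 1e-12 * np.eye(x.shape[1]))
    noise = rng.standard_normal(x.shape) @ L.T
    return x - diff * dt + np.sqrt(2.0 * (1.0 + alpha) * dt) * noise
```

For example, with <math>\alpha = 1</math> and <math>f(x) = \|x\|^2/2</math>, the target density is proportional to <math>\exp(-\|x\|^2/2)</math>, so the ensemble should equilibrate to roughly unit variance.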
=== Polarization ===
If the function <math> f</math> is multi-modal, i.e., has more than one global minimum, the standard CBO algorithm can only find one of these points. However, one can “polarize” the dynamics by computing an individual consensus point for each particle, where the positions of the other particles are additionally weighted by a kernel function <math>k:{\cal{X}}\times{\cal{X}}\to[0,\infty)</math>,
<math display="block">c_\alpha(x_t)^i = \frac{1}{\sum_{j=1}^N \omega_\alpha(x^j_t)\, k(x^i_t, x^j_t)} \sum_{j=1}^N x^j_t\ \omega_\alpha(x^j_t)\, k(x^i_t, x^j_t).</math>
In this case, the drift <math>x^i_t - c_\alpha(x_t)^i</math> is a vector field over the state space <math>\cal{X}</math>. Intuitively, particles are now not only attracted to other particles based on their objective value, but also based on their spatial locality. For a constant kernel function, the polarized version corresponds to standard CBO and is therefore a generalization. We briefly give some examples of common configurations:
* '''Gaussian kernel''': a common choice is the Gaussian kernel <math>k(x,\tilde x) = \exp\left(-\tfrac{\|x-\tilde x\|^2}{2\kappa^2}\right)</math>, where the parameter <math>\kappa</math> determines the communication radius of particles. This choice corresponds to a local convex regularization of the objective function <math>f</math>.
* '''Mean-shift algorithm''': choosing a constant objective function, together with no noise (i.e. <math>\sigma = 0</math>) and an Euler–Maruyama discretization with step size <math>dt=1</math>, corresponds to the [[mean shift]] algorithm.
* '''Bounded confidence model''': When choosing a constant objective function, no noise model, but the special kernel function <math>k(x,\tilde x) = 1_{\|x-\tilde x\| \leq \kappa}</math>, the resulting dynamics are related to the bounded confidence model of opinion dynamics.
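The per-particle consensus computation with a Gaussian kernel can be sketched as follows. This is a minimal illustration written for this description (the function name and parameter values are not taken from the CBXpy or CBX.jl packages):

```python
import numpy as np

def polarized_consensus(x, f, alpha=10.0, kappa=1.0):
    """Polarized consensus points: particle i averages the ensemble with
    weights exp(-alpha * f(x^j)) * k(x^i, x^j), where k is a Gaussian
    kernel with communication radius kappa. Returns one point per particle."""
    logw = -alpha * np.apply_along_axis(f, 1, x)            # objective weights
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
    logk = -sq / (2.0 * kappa ** 2)                         # Gaussian kernel, log scale
    logW = logw[None, :] + logk                             # combined log-weights
    W = np.exp(logW - logW.max(axis=1, keepdims=True))      # row-wise stable exp
    W /= W.sum(axis=1, keepdims=True)
    return W @ x                                            # shape (N, d)
```

With a small <math>\kappa</math>, particles near different minima of a multi-modal objective are drawn toward different consensus points, which is what allows the polarized dynamics to find several minima at once.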
== See also ==
* [[Particle swarm optimization]]
* [[Simulated annealing]]
* [[Ant colony optimization algorithms]]
== References ==
{{reflist}}
[[Category:Optimization algorithms and methods]]
[[Category:Metaheuristics]]