Content deleted Content added
Dummy edit to reset G13 clock after undeletion (rfud-helper) |
Added CBS |
||
Line 5:
'''Consensus-based optimization (CBO)'''<ref name=":0">{{Cite journal |last1=Pinnau |first1=René |last2=Totzeck |first2=Claudia |last3=Tse |first3=Oliver |last4=Martin |first4=Stephan |date=January 2017 |title=A consensus-based model for global optimization and its mean-field limit |url=https://www.worldscientific.com/doi/abs/10.1142/S0218202517400061 |journal=Mathematical Models and Methods in Applied Sciences |language=en |volume=27 |issue=1 |pages=183–204 |doi=10.1142/S0218202517400061 |arxiv=1604.05648 |s2cid=119296432 |issn=0218-2025}}</ref> is a multi-agent [[derivative-free optimization]] method, designed to obtain solutions for global optimization problems of the form <math display="block">\min_{x\in \cal{X}} f(x),</math>
[[File:CBORastrigin.gif|thumb|Behavior of CBO on the [[Rastrigin function]]. '''Blue:''' Particles, '''Pink:''' drift vectors and consensus point.]]
where <math>f:\mathcal{X}\to\R</math> denotes the objective function acting on the state space <math>\cal{X}</math>, which
== The algorithm ==
Line 19:
** in the limit <math>\alpha\to 0</math> every particle is assigned the same weight and the consensus point is a regular mean.
** In the limit <math>\alpha\to\infty</math> the consensus point corresponds to the particle with the best objective value, completely ignoring the position of other points in the ensemble.
== Notes on Implementation ==
Line 30 ⟶ 28:
=== Sampling ===
Consensus-based optimization can be transformed into a sampling method<ref>{{Citation |last=Carrillo |first=J. A. |title=Consensus Based Sampling |date=2021-11-04 |url=https://arxiv.org/abs/2106.02519 |access-date=2024-08-17 |doi=10.48550/arXiv.2106.02519 |last2=Hoffmann |first2=F. |last3=Stuart |first3=A. M. |last4=Vaes |first4=U.}}</ref> by modifying the noise term and choosing appropriate hyperparameters. Namely, one considers the following SDE{{NumBlk|::|<math>dx^i_t = -(x^i_t-c_\alpha(x_t))\,dt + \sqrt{2 \tilde{\lambda}^{-1}\, C_\alpha(x_t)}\,dB^i_t,</math>|{{EquationRef|2}}}}where the weighted covariance matrix is defined as
<math>C_\alpha(x_t) := \frac{1}{\sum_{i=1}^N \omega_\alpha(x_t^i)}\sum_{i=1}^N (x_t^i - c(x_t)) \otimes (x_t^i - c(x_t)) \omega(x_t^i) </math>.
=== Polarization ===▼
If the function <math> f</math> is multi-modal, i.e., has more than one global minimum, the standard CBO algorithm can only find one of these points. However, one can “polarize”<ref>{{cite arXiv |last1=Bungert |first1=Leon |title=Polarized consensus-based dynamics for optimization and sampling |date=2023-10-09 |eprint=2211.05238 |last2=Roith |first2=Tim |last3=Wacker |first3=Philipp|class=math.OC }}</ref> the consensus computation by introducing a kernel <math>k: \cal{X}\times\cal{X}\to[0,\infty)</math> that includes local information into the weighting. In this case, every particle has its own version of the consensus point, which is computed as▼
If the parameters are chosen such that <math>\tilde{\lambda}^{-1} = (1 + \alpha)</math> the above scheme creates approximate samples of a probability distribution with a density, that is proportional to <math>\exp(-\alpha f)</math>.
▲=== Polarization ===
▲If the function <math> f</math> is multi-modal, i.e., has more than one global minimum, the standard CBO algorithm can only find one of these points. However, one can “polarize”<ref>{{cite arXiv |last1=Bungert |first1=Leon |title=Polarized consensus-based dynamics for optimization and sampling |date=2023-10-09 |eprint=2211.05238 |last2=Roith |first2=Tim |last3=Wacker |first3=Philipp|class=math.OC }}</ref> the consensus computation by introducing a kernel <math>k: \cal{X}\times\cal{X}\to[0,\infty)</math> that includes local information into the weighting. In this case, every particle has its own version of the consensus point, which is computed as<math display="block">c_\alpha^j(x) = \frac{1}{\sum_{i=1}^N \omega_\alpha^j(x^i)} \sum_{i=1}^N x^i\ \omega_\alpha^j(x^i), \quad\text{ with }\quad \omega_\alpha^j(\,\cdot\,) = \mathrm{exp}(-\alpha f(\,\cdot\,))\, k(\cdot,x^j).
</math>In this case, the drift is a vector field over the state space <math>\cal{X}
</math>. Intuitively, particles are now not only attracted to other particles based on their objective value, but also based on their spatial locality. For a constant kernel function, the polarized version corresponds to standard CBO and is therefore a generalization. We briefly give some examples of common configurations:
|