* '''Scaled noise:''' For each <math>t\geq 0</math> and <math>i=1,\dots,N</math>, we denote by <math>B^i_t</math> independent standard Brownian motions. The function <math>D:{\cal{X}}\to\R^s</math> incorporates the drift of the <math>i</math>th particle and determines the noise model. The most common choices are:
** ''Isotropic noise'', <math>D(\cdot) = \|\cdot \|</math>: In this case <math>s=1</math> and every component of the noise vector is scaled equally. This was used in the original version of the algorithm<ref name=":0" />.
** ''Anisotropic noise<ref>{{Citation |last=Carrillo |first=José A. |title=A consensus-based global optimization method for high dimensional machine learning problems |date=2020-03-04 |url=http://arxiv.org/abs/1909.09249 |access-date=2024-02-05 |doi=10.48550/arXiv.1909.09249 |last2=Jin |first2=Shi |last3=Li |first3=Lei |last4=Zhu |first4=Yuhua}}</ref>'', <math>D(\cdot) = |\cdot|</math>: In the special case, where <math>{\cal{X}}\subset \R^d</math>, this means that <math>s=d</math> and <math>D</math> applies the absolute value function component-wise. Here, every component of the noise vector is scaled, dependent on the corresponding entry of the drift vector.
* '''Hyperparameters:''' The parameter <math>\sigma \geq 0</math> scales the influence of the noise term. The parameter <math>\alpha \geq 0</math> determines the separation effect of the particles<ref name=":0" />:
** in the limit <math>\alpha\to 0</math>, every particle is assigned the same weight and the consensus point is a regular mean;
** in the limit <math>\alpha\to\infty</math>, the consensus point converges to the particle with the lowest objective value, by the Laplace principle.
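The two noise models differ only in the scaling function <math>D</math>. A minimal NumPy sketch (the function names are ours, not taken from any package):

```python
import numpy as np

def isotropic_noise_scale(drift):
    # D(v) = ||v||: a single scalar scales every component of the
    # noise vector equally (s = 1).
    return np.linalg.norm(drift, axis=-1, keepdims=True)

def anisotropic_noise_scale(drift):
    # D(v) = |v| component-wise: each noise component is scaled by the
    # magnitude of the corresponding entry of the drift vector (s = d).
    return np.abs(drift)

# e.g. for drift (3, -4): isotropic scale is [5], anisotropic is [3, 4]
drift = np.array([3.0, -4.0])
iso = isotropic_noise_scale(drift)
aniso = anisotropic_noise_scale(drift)
```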
== Notes on Implementation ==
In practice, the SDE in {{EquationNote|2=(1)}} is discretized via the [[Euler–Maruyama method]], which yields the following explicit update formula for the ensemble <math>x = (x^1,\dots,x^N)</math>:<math display="block">x^i \gets x^i-\lambda\, (x^i-c_\alpha(x))\,dt + \sigma D(x^i-c_{\alpha}(x))\, B^i,</math>where <math>B^i</math> now denotes the Brownian increment over the time step, i.e., a Gaussian random vector with zero mean and variance <math>dt</math> in each component. For the consensus-point computation, an efficient implementation of the [[LogSumExp]] function is beneficial for numerical stability, since the weights <math>\exp(-\alpha f(x^i))</math> can underflow for large <math>\alpha</math>. We refer to existing implementations in [[Python (programming language)|Python]] [https://pdips.github.io/CBXpy/] and [[Julia (programming language)|Julia]] [https://github.com/PdIPS/CBX.jl].
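The update above can be sketched in plain NumPy; the function names, parameter values, and the choice of isotropic noise are illustrative and not taken from the CBX packages:

```python
import numpy as np

def consensus_point(x, f_vals, alpha):
    # Weighted mean with weights exp(-alpha * f(x^i)); subtracting the
    # maximum is the LogSumExp trick, preventing underflow at large alpha.
    log_w = -alpha * f_vals
    log_w -= log_w.max()
    w = np.exp(log_w)
    return (w / w.sum()) @ x

def cbo_step(x, f, alpha, lam, sigma, dt, rng):
    # One Euler-Maruyama update with isotropic noise D(.) = ||.||;
    # B^i is a Gaussian increment with variance dt per component.
    c = consensus_point(x, np.apply_along_axis(f, 1, x), alpha)
    drift = x - c
    scale = np.linalg.norm(drift, axis=1, keepdims=True)
    noise = scale * rng.normal(size=x.shape) * np.sqrt(dt)
    return x - lam * drift * dt + sigma * noise

# usage: minimize f(v) = ||v||^2 with N = 50 particles in d = 2
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2)) * 3.0
f = lambda v: float(v @ v)
for _ in range(500):
    x = cbo_step(x, f, alpha=30.0, lam=1.0, sigma=0.7, dt=0.01, rng=rng)
```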
== Variants ==
=== CBO with momentum ===
=== Sampling ===
=== Polarization ===
If the function <math> f</math> is multi-modal, i.e., has more than one global minimum, the standard CBO algorithm can only find one of these points. However, one can “polarize”<ref>{{Citation |last=Bungert |first=Leon |title=Polarized consensus-based dynamics for optimization and sampling |date=2023-10-09 |url=http://arxiv.org/abs/2211.05238 |access-date=2024-02-05 |doi=10.48550/arXiv.2211.05238 |last2=Roith |first2=Tim |last3=Wacker |first3=Philipp}}</ref> the consensus computation by introducing a kernel <math>k: \cal{X}\times\cal{X}\to[0,\infty)</math> that includes local information into the weighting. In this case, every particle has its own version of the consensus point, which is computed as
<math display="block">c_\alpha^j(x) = \frac{1}{\sum_{i=1}^N \omega_\alpha^j(x^i)} \sum_{i=1}^N x^i\ \omega_\alpha^j(x^i), \quad\text{ with }\quad \omega_\alpha^j(\,\cdot\,) = \exp(-\alpha f(\,\cdot\,))\, k(\cdot,x^j).</math>In this case, the drift is a vector field over the state space <math>{\cal{X}}</math>. Intuitively, particles are now not only attracted to other particles based on their objective value, but also based on their spatial locality. For a constant kernel function, the polarized version reduces to standard CBO and is therefore a generalization. We briefly give some examples of common configurations:
* '''Gaussian kernel''' <math>k(\cdot,\cdot) = \exp\left(- \frac{1}{2\kappa^{2} \alpha} \|\cdot-\cdot\|^2_2 \right)</math>: the parameter <math>\kappa</math> determines the communication radius of the particles. This choice corresponds to a local convex regularization of the objective function <math>f</math>.
* [[Mean-shift algorithm|'''Mean-shift algorithm''']]<ref>{{Cite journal |last=Fukunaga |first=K. |last2=Hostetler |first2=L. |date=1975-01 |title=The estimation of the gradient of a density function, with applications in pattern recognition |url=http://ieeexplore.ieee.org/document/1055330/ |journal=IEEE Transactions on Information Theory |language=en |volume=21 |issue=1 |pages=32–40 |doi=10.1109/TIT.1975.1055330 |issn=0018-9448}}</ref>: Employing polarized CBO for a constant objective function <math>f</math>, together with no noise (i.e., <math>\sigma = 0</math>) and an Euler–Maruyama discretization with step size <math>dt=1</math>, corresponds to the mean-shift algorithm.
* '''Bounded confidence model''': Choosing a constant objective function and no noise, together with the special kernel function <math>k(x,\tilde x) = 1_{\|x-\tilde x\| \leq \kappa}</math>, transforms the SDE in {{EquationNote|2=(1)}} into an ODE known as the bounded confidence model<ref>{{Cite journal |last=Deffuant |first=Guillaume |last2=Neau |first2=David |last3=Amblard |first3=Frederic |last4=Weisbuch |first4=Gérard |date=2000-01 |title=Mixing beliefs among interacting agents |url=https://www.worldscientific.com/doi/abs/10.1142/S0219525900000078 |journal=Advances in Complex Systems |language=en |volume=03 |issue=01n04 |pages=87–98 |doi=10.1142/S0219525900000078 |issn=0219-5259}}</ref>, which arises in opinion dynamics.
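The per-particle consensus points with a Gaussian kernel can be computed for all particles at once. The following NumPy sketch is illustrative (the function name and vectorization are ours); column <math>j</math> of the weight matrix holds the weights <math>\omega_\alpha^j(x^i)</math>:

```python
import numpy as np

def polarized_consensus(x, f_vals, alpha, kappa):
    # c^j = sum_i x^i w[i, j] / sum_i w[i, j], with weights
    # w[i, j] = exp(-alpha f(x^i)) * k(x^i, x^j) and Gaussian kernel
    # k(x, y) = exp(-||x - y||^2 / (2 kappa^2 alpha)).
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # (N, N)
    log_k = -sq_dists / (2 * kappa**2 * alpha)
    log_w = -alpha * f_vals[:, None] + log_k       # weight of particle i for j
    log_w -= log_w.max(axis=0, keepdims=True)      # LogSumExp shift per column
    w = np.exp(log_w)
    w /= w.sum(axis=0, keepdims=True)
    return w.T @ x  # row j is the consensus point c^j seen by particle j
```

For a very large <math>\kappa</math> the kernel is nearly constant and every <math>c^j</math> reduces to the standard consensus point; for a very small <math>\kappa</math> each particle essentially only sees itself.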
== See also ==
* [[Particle Swarm Optimization]]
* [[Simulated annealing]]
* [[Ant colony optimization algorithms]]
== References ==