Estimation of distribution algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 07:56, 4 October 2021 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,587 edits →Bivariate factorizations: lc ← Previous edit		Latest revision as of 09:55, 21 August 2025 edit undo WhereIsAprilYet? (talk \| contribs) 116 edits Link suggestions feature: 3 links added. Tags: Visual edit Newcomer task Suggested: add links
(12 intermediate revisions by 7 users not shown)
Line 1: {{Short description\|Family of stochastic optimization methods}} [[Image:Eda mono-variant gauss iterations.svg\|thumb\|350px\|Estimation of distribution algorithm. For each iteration ''i'', a random draw is performed for a population ''P'' in a distribution ''PDu''. The distribution parameters ''PDe'' are then estimated using the selected points ''PS''. The illustrated example optimizes a continuous objective function ''f(X)'' with a unique optimum ''O''. The sampling (following a normal distribution ''N'') concentrates around the optimum as one goes along unwinding algorithm.]] '''''Estimation of distribution algorithms''''' ('''EDAs'''), sometimes called '''''probabilistic model-building genetic algorithms''''' (PMBGAs),<ref>{{Citation\|last=Pelikan\|first=Martin\|date=2005-02-21\|pages=13–30\|publisher=Springer Berlin Heidelberg\|language=en\|doi=10.1007/978-3-540-32373-0_2\|isbn=9783540237747\|title=Hierarchical Bayesian Optimization Algorithm\|volume=170\|series=Studies in Fuzziness and Soft Computing\|chapter=Probabilistic Model-Building Genetic Algorithms}}</ref> are [[stochastic optimization]] methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.<ref>{{cite book\|author1=Pedro Larrañaga\|author2=Jose A. Lozano\|title=Estimation of Distribution Algorithms a New Tool for Evolutionary Computation\|date=2002\|publisher=Springer US\|___location=Boston, MA\|isbn=978-1-4615-1539-5}}</ref><ref>{{cite book\|author1=Jose A. Lozano\|author2=Larrañaga, P.\|author3=Inza, I.\|author4=Bengoetxea, E.\|title=Towards a new evolutionary computation advances in the estimation of distribution algorithms\|date=2006\|publisher=Springer\|___location=Berlin\|isbn=978-3-540-32494-2}}</ref><ref>{{cite book\|last1=Pelikan\|first1=Martin\|last2=Sastry\|first2=Kumara\|last3=Cantú-Paz\|first3=Erick\|title=Scalable optimization via probabilistic modeling : from algorithms to applications ; with 26 tables\|date=2006\|publisher=Springer\|___location=Berlin\|isbn=978-3540349532}}</ref> EDAs belong to the class of [[evolutionary algorithms]]. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an ''implicit'' distribution defined by one or more variation operators, whereas EDAs use an ''explicit'' probability distribution encoded by a [[Bayesian network]], a [[multivariate normal distribution]], or another model class. Similarly as other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations from vectors to [[LISP]] style S expressions, and the quality of candidate solutions is often evaluated using one or more objective functions. Line 32 ⟶ 33: ===Univariate marginal distribution algorithm (UMDA)=== The UMDA<ref>{{cite journal\|last1=Mühlenbein\|first1=Heinz\|title=The Equation for Response to Selection and Its Use for Prediction\|journal=Evol. Computation\|date=1 September 1997\|volume=5\|issue=3\|pages=303–346\|doi=10.1162/evco.1997.5.3.303\|pmid=10021762\|s2cid=2593514 \|url=http://dl.acm.org/citation.cfm?id=1326756\|issn=1063-6560\|url-access=subscription}}</ref> is a simple EDA that uses an operator <math>\alpha_{UMDA}</math> to estimate marginal probabilities from a selected population <math>S(P(t))</math>. By assuming <math>S(P(t))</math> contain <math>\lambda</math> elements, <math>\alpha_{UMDA}</math> produces probabilities: <math> Line 80 ⟶ 81: ===Mutual information maximizing input clustering (MIMIC)=== The MIMIC<ref>{{cite journal\|last1=Bonet\|first1=Jeremy S. De\|last2=Isbell\|first2=Charles L.\|last3=Viola\|first3=Paul\|title=MIMIC: Finding Optima by Estimating Probability Densities\|journal=Advances in Neural Information Processing Systems\|date=1 January 1996\|pages=424\|citeseerx=10.1.1.47.6497}}</ref> factorizes the [[joint probability distribution]] in a chain-like model representing successive dependencies between variables. It finds a permutation of the decision variables, <math>r : i \mapsto j</math>, such that <math>x_{r(1)}x_{r(2)},\dots,x_{r(N)}</math> minimizes the [[~~Kullback-Leibler~~Kullback–Leibler divergence]] in relation to the true probability distribution, i.e. <math>\pi_{r(i+1)} = \{X_{r(i)}\}</math>. MIMIC models a distribution <math> Line 93 ⟶ 94: ===Bivariate marginal distribution algorithm (BMDA)=== The BMDA<ref>{{cite book\|last1=Pelikan\|first1=Martin\|last2=Muehlenbein\|first2=Heinz\|title=Advances in Soft Computing \|chapter=The Bivariate Marginal Distribution Algorithm~~\|journal=Advances~~ ~~in Soft Computing~~\|date=1 January 1999\|pages=521–535\|doi=10.1007/978-1-4471-0819-1_39\|isbn=978-1-85233-062-0\|citeseerx=10.1.1.55.1151}}</ref> factorizes the joint probability distribution in bivariate distributions. First, a randomly chosen variable is added as a node in a graph, the most dependent variable to one of those in the graph is chosen among those not yet in the graph, this procedure is repeated until no remaining variable depends on any variable in the graph (verified according to a threshold value). The resulting model is a forest with multiple trees rooted at nodes <math>\Upsilon_t</math>. Considering <math>I_t</math> the non-root variables, BMDA estimates a factorized distribution in which the root variables can be sampled independently, whereas all the others must be conditioned to the parent variable <math>\pi_i</math>. Line 117 ⟶ 118: ===Extended compact genetic algorithm (eCGA)=== The ECGA<ref>{{cite ~~book~~thesis\|last1=Harik\|first1=Georges Raif\|title=Learning Gene Linkage to Efficiently Solve Problems of Bounded Difficulty Using Genetic Algorithms\|publisher=University of Michigan\|url=http://dl.acm.org/citation.cfm?id=269517\|year=1997\|type=phd }}</ref> was one of the first EDA to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled. Its approach factorizes the joint probability distribution in the product of multivariate marginal distributions. Assume <math>T_\text{eCGA}=\{\tau_1,\dots,\tau_\Psi\}</math> is a set of subsets, in which every <math>\tau\in T_\text{eCGA}</math> is a linkage set, containing <math>\|\tau\|\leq K</math> variables. The factorized joint probability distribution is represented as follows <math> Line 129 ⟶ 130: </math> The CPC, on the other hand, quantifies the data compression in terms of entropy of the [[marginal distribution]] over all partitions, where <math>\lambda</math> is the selected population size, <math>\|\tau\|</math> is the number of decision variables in the linkage set <math>\tau</math> and <math>H(\tau)</math> is the [[joint entropy]] of the variables in <math>\tau</math> <math> Line 148 ⟶ 149: </math> The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge which better improves some scoring metric (e.g. [[Bayesian information criterion]] (BIC) or Bayesian-Dirichlet metric with likelihood equivalence (BDe)).<ref>{{cite journal\|last1=Larrañaga\|first1=Pedro\|last2=Karshenas\|first2=Hossein\|last3=Bielza\|first3=Concha\|last4=Santana\|first4=Roberto\|title=A review on probabilistic graphical models in evolutionary computation\|journal=Journal of Heuristics\|date=21 August 2012\|volume=18\|issue=5\|pages=795–819\|doi=10.1007/s10732-012-9208-4\|s2cid=9734434 \|url=http://oa.upm.es/15826/}}</ref> The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering for each variable, each node being preceded by its parents; (2) each variable is sampled conditionally to its parents. Given such scenario, every BOA step can be defined as <math> Line 155 ⟶ 156: ===Linkage-tree Genetic Algorithm (LTGA)=== The LTGA<ref>{{cite book\|last1=Thierens\|first1=Dirk\|title~~=The Linkage Tree Genetic Algorithm\|journal~~=Parallel Problem Solving from Nature, PPSN XI \|chapter=The Linkage Tree Genetic Algorithm\|date=11 September 2010\|pages=264–273\|doi=10.1007/978-3-642-15844-5_27\|isbn=978-3-642-15843-8}}</ref> differs from most EDA in the sense it does not explicitly model a probability distribution but only a linkage model, called linkage-tree. A linkage <math>T</math> is a set of linkage sets with no probability distribution associated, therefore, there is no way to sample new solutions directly from <math>T</math>. The linkage model is a linkage-tree produced stored as a [[Family of sets]] (FOS). <math> Line 174 ⟶ 175: <math>x_i[\tau]</math>:= <math>x_j[\tau]</math> '''if''' <math>f(x_i) \leq f_{x_i}</math> '''then''' <math>x_i[\tau]:= x_j[\tau]</math> '''return''' <math>P(t)</math> {{algorithm-end}} Line 186 ⟶ 187: ==Other== * Probability collectives (PC)<ref>{{cite journal\|last1=WOLPERT\|first1=DAVID H.\|title=Advances in Distributed Optimization Using Probability Collectives\|last2=STRAUSS\|first2=CHARLIE E. M.\|last3=RAJNARAYAN\|first3=DEV\|journal=Advances in Complex Systems\|date=December 2006\|volume=09\|issue=4\|pages=383–436\|doi=10.1142/S0219525906000884\|citeseerx=10.1.1.154.6395}}</ref><ref>{{cite journal\|last1=Pelikan\|first1=Martin\|last2=Goldberg\|first2=David E.\|last3=Lobo\|first3=Fernando G.\|title=A Survey of Optimization by Building and Using Probabilistic Models\|journal=Computational Optimization and Applications\|date=2002\|volume=21\|issue=1\|pages=5–20\|doi=10.1023/A:1013500812258}}</ref> * Hill climbing with learning (HCwL)<ref>{{Cite journal\|~~last~~last1=Rudlof\|~~first~~first1=Stephan\|last2=Köppen\|first2=Mario\|date=1997\|title=Stochastic Hill Climbing with Learning by Vectors of Normal Distributions\|pages=60–70 \|citeseerx=10.1.1.19.3536\|language=en}}</ref> * Estimation of multivariate normal algorithm (EMNA){{Citation needed\|date=June 2018}} * Estimation of Bayesian networks algorithm (EBNA){{Citation needed\|date=June 2018}} * Stochastic hill climbing with learning by vectors of normal distributions (SHCLVND)<ref>{{Cite journal\|~~last~~last1=Rudlof\|~~first~~first1=Stephan\|last2=Köppen\|first2=Mario\|date=1997\|title=Stochastic Hill Climbing with Learning by Vectors of Normal Distributions\|pages=60––70\|citeseerx=10.1.1.19.3536}}</ref> * Real-coded PBIL{{Citation needed\|date=June 2018}} * Selfish Gene Algorithm (SG)<ref>{{Cite book\|~~last~~last1=Corno\|~~first~~first1=Fulvio\|last2=Reorda\|first2=Matteo Sonza\|last3=Squillero\|first3=Giovanni\|date=1998-02-27\|title=The selfish gene algorithm: a new evolutionary optimization strategy\|publisher=ACM\|pages=349–355\|doi=10.1145/330560.330838\|isbn=978-0897919692\|s2cid=9125252 }}</ref> * Compact Differential Evolution (cDE)<ref>{{Cite journal\|~~last~~last1=Mininno\|~~first~~first1=Ernesto\|last2=Neri\|first2=Ferrante\|last3=Cupertino\|first3=Francesco\|last4=Naso\|first4=David\|date=2011\|title=Compact Differential Evolution\|journal=IEEE Transactions on Evolutionary Computation\|language=en-US\|volume=15\|issue=1\|pages=32–54\|doi=10.1109/tevc.2010.2058120\|s2cid=20582233 \|issn=1089-778X}}</ref> and its variants<ref>{{Cite journal\|~~last~~last1=Iacca\|~~first~~first1=Giovanni\|last2=Caraffini\|first2=Fabio\|last3=Neri\|first3=Ferrante\|date=2012\|title=Compact Differential Evolution Light: High Performance Despite Limited Memory Requirement and Modest Computational Overhead\|journal=Journal of Computer Science and Technology\|language=en\|volume=27\|issue=5\|pages=1056–1076\|doi=10.1007/s11390-012-1284-2\|s2cid=3184035 \|issn=1000-9000\|hdl=2086/11740\|hdl-access=free}}</ref><ref>{{Citation\|~~last~~last1=Iacca\|~~first~~first1=Giovanni\|title=Opposition-Based Learning in Compact Differential Evolution\|date=2011\|last2=Neri\|first2=Ferrante\|last3=Mininno\|first3=Ernesto\|work=Applications of Evolutionary Computation\|pages=264–273\|publisher=Springer Berlin Heidelberg\|language=en\|doi=10.1007/978-3-642-20525-5_27\|isbn=9783642205248\|hdl=11572/196440\|hdl-access=free}}</ref><ref>{{Cite book\|~~last~~last1=Mallipeddi\|~~first~~first1=Rammohan\|last2=Iacca\|first2=Giovanni\|last3=Suganthan\|first3=Ponnuthurai Nagaratnam\|last4=Neri\|first4=Ferrante\|last5=Mininno\|first5=Ernesto\|~~date~~title=2011 IEEE Congress of Evolutionary Computation (CEC) \|~~title~~chapter=Ensemble strategies in Compact Differential Evolution \|~~journal~~date=2011\|pages=1972–1977 ~~IEEE Congress of Evolutionary Computation (CEC)~~\|language=en-US\|publisher=IEEE\|doi=10.1109/cec.2011.5949857\|isbn=9781424478347\|s2cid=11781300 }}</ref><ref>{{Cite journal\|~~last~~last1=Neri\|~~first~~first1=Ferrante\|last2=Iacca\|first2=Giovanni\|last3=Mininno\|first3=Ernesto\|date=2011\|title=Disturbed Exploitation compact Differential Evolution for limited memory optimization problems\|journal=Information Sciences\|volume=181\|issue=12\|pages=2469–2487\|doi=10.1016/j.ins.2011.02.004\|issn=0020-0255}}</ref><ref>{{Cite book\|~~last~~last1=Iacca\|~~first~~first1=Giovanni\|last2=Mallipeddi\|first2=Rammohan\|last3=Mininno\|first3=Ernesto\|last4=Neri\|first4=Ferrante\|last5=Suganthan\|first5=Pannuthurai Nagaratnam\|~~date~~title=2011~~\|title=Global~~ ~~supervision~~IEEE ~~for~~Symposium ~~compact~~on Differential Evolution (SDE) \|~~journal~~chapter=~~2011~~Global ~~IEEE~~supervision ~~Symposium~~for oncompact Differential Evolution ~~(SDE)~~\|date=2011\|pages=1–8 \|language=en-US\|publisher=IEEE\|doi=10.1109/sde.2011.5952051\|isbn=9781612840710\|s2cid=8874851 }}</ref><ref>{{Cite book\|~~last~~last1=Iacca\|~~first~~first1=Giovanni\|last2=Mallipeddi\|first2=Rammohan\|last3=Mininno\|first3=Ernesto\|last4=Neri\|first4=Ferrante\|last5=Suganthan\|first5=Pannuthurai Nagaratnam\|~~date~~title=2011 IEEE Workshop on Memetic Computing (MC) \|~~title~~chapter=Super-fit and population size reduction in compact Differential Evolution \|~~journal~~date=2011\|pages=1–8 ~~IEEE Workshop on Memetic Computing (MC)~~\|language=en-US\|publisher=IEEE\|doi=10.1109/mc.2011.5953633\|isbn=9781612840659\|s2cid=5692951 }}</ref> * Compact Particle Swarm Optimization (cPSO)<ref>{{Cite journal\|~~last~~last1=Neri\|~~first~~first1=Ferrante\|last2=Mininno\|first2=Ernesto\|last3=Iacca\|first3=Giovanni\|date=2013\|title=Compact Particle Swarm Optimization\|journal=Information Sciences\|volume=239\|pages=96–121\|doi=10.1016/j.ins.2013.03.026\|issn=0020-0255}}</ref> * Compact Bacterial Foraging Optimization (cBFO)<ref>{{Citation\|~~last~~last1=Iacca\|~~first~~first1=Giovanni\|title=Compact Bacterial Foraging Optimization\|date=2012\|last2=Neri\|first2=Ferrante\|last3=Mininno\|first3=Ernesto\|work=Swarm and Evolutionary Computation\|pages=84–92\|publisher=Springer Berlin Heidelberg\|language=en\|doi=10.1007/978-3-642-29353-5_10\|hdl=11572/196442 \|isbn=9783642293528\|hdl-access=free}}</ref> * Probabilistic incremental program evolution (PIPE)<ref>{{Cite journal\|~~last~~last1=Salustowicz\|~~first~~first1=null\|last2=Schmidhuber\|first2=null\|date=1997\|title=Probabilistic incremental program evolution\|journal=Evolutionary Computation\|volume=5\|issue=2\|pages=123–141\|issn=1530-9304\|pmid=10021756\|doi=10.1162/evco.1997.5.2.123\|s2cid=10759266 \|url=http://depositonce.tu-berlin.de/handle/11303/1046}}</ref> * Estimation of Gaussian networks algorithm (EGNA){{Citation needed\|date=June 2018}} * Estimation multivariate normal algorithm with thresheld convergence<ref>{{Cite book\|~~last~~last1=Tamayo-Vera\|~~first~~first1=Dania\|last2=Bolufe-Rohler\|first2=Antonio\|last3=Chen\|first3=Stephen\|~~date~~title=2016 IEEE Congress on Evolutionary Computation (CEC) \|~~title~~chapter=Estimation multivariate normal algorithm with thresheld convergence \|~~journal~~date=2016\|pages=3425–3432 ~~IEEE Congress on Evolutionary Computation (CEC)~~\|language=en-US\|publisher=IEEE\|doi=10.1109/cec.2016.7744223\|isbn=9781509006236\|s2cid=33114730 }}</ref> * Dependency Structure Matrix Genetic Algorithm (DSMGA)<ref>{{Citation\|~~last~~last1=Yu\|~~first~~first1=Tian-Li\|title=Genetic Algorithm Design Inspired by Organizational Theory: Pilot Study of a Dependency Structure Matrix Driven Genetic Algorithm\|date=2003\|work=Genetic and Evolutionary Computation — GECCO 2003\|pages=1620–1621\|publisher=Springer Berlin Heidelberg\|language=en\|doi=10.1007/3-540-45110-2_54\|isbn=9783540406037\|last2=Goldberg\|first2=David E.\|last3=Yassine\|first3=Ali\|last4=Chen\|first4=Ying-Ping}}</ref><ref>{{Cite book\|~~last~~last1=Hsu\|~~first~~first1=Shih-Huan\|last2=Yu\|first2=Tian-Li\|date=2015-07-11\|title=Optimization by Pairwise Linkage Detection, Incremental Linkage Set, and Restricted / Back Mixing: DSMGA-II\|publisher=ACM\|pages=519–526\|doi=10.1145/2739480.2754737\|isbn=9781450334723\|arxiv=1807.11669\|s2cid=17031156 }}</ref> ==Related== * [[CMA-ES]] * [[Cross-entropy method]] * [[Ant colony optimization algorithms]] ==References==