{{Short description|Family of stochastic optimization methods}}
[[Image:Eda mono-variant gauss iterations.svg|thumb|350px|Estimation of distribution algorithm. For each iteration ''i'', a random draw is performed for a population ''P'' from a distribution ''PDu''. The distribution parameters ''PDe'' are then estimated using the selected points ''PS''. The illustrated example optimizes a continuous objective function ''f(X)'' with a unique optimum ''O''. The sampling (following a normal distribution ''N'') concentrates around the optimum as the algorithm proceeds.]]
 
'''''Estimation of distribution algorithms''''' ('''EDAs'''), sometimes called '''''probabilistic model-building genetic algorithms''''' (PMBGAs),<ref>{{Citation|last=Pelikan|first=Martin|chapter=Probabilistic Model-Building Genetic Algorithms|date=2005-02-21|title=Hierarchical Bayesian Optimization Algorithm|volume=170|series=Studies in Fuzziness and Soft Computing|pages=13–30|publisher=Springer Berlin Heidelberg|language=en|doi=10.1007/978-3-540-32373-0_2|isbn=9783540237747}}</ref> are [[stochastic optimization]] methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.<ref>{{cite book|author1=Pedro Larrañaga|author2=Jose A. Lozano|title=Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation|date=2002|publisher=Springer US|___location=Boston, MA|isbn=978-1-4615-1539-5}}</ref><ref>{{cite book|author1=Jose A. Lozano|author2=Larrañaga, P.|author3=Inza, I.|author4=Bengoetxea, E.|title=Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms|date=2006|publisher=Springer|___location=Berlin|isbn=978-3-540-32494-2}}</ref><ref>{{cite book|last1=Pelikan|first1=Martin|last2=Sastry|first2=Kumara|last3=Cantú-Paz|first3=Erick|title=Scalable Optimization via Probabilistic Modeling: From Algorithms to Applications|date=2006|publisher=Springer|___location=Berlin|isbn=978-3540349532}}</ref>
 
EDAs belong to the class of [[evolutionary algorithms]]. The main difference between EDAs and most conventional evolutionary algorithms is that conventional evolutionary algorithms generate new candidate solutions using an ''implicit'' distribution defined by one or more variation operators, whereas EDAs use an ''explicit'' probability distribution encoded by a [[Bayesian network]], a [[multivariate normal distribution]], or another model class. Like other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a variety of representations, from vectors to [[LISP]]-style S-expressions, and the quality of candidate solutions is often evaluated using one or more objective functions.
 
The general procedure of an EDA is outlined in the following:
 
<source lang=python>
t := 0
initialize model M(0) to represent uniform distribution over admissible solutions
while (termination criteria not met) do
    P := generate N > 0 candidate solutions by sampling M(t)
    F := evaluate all candidate solutions in P
    M(t + 1) := adjust_model(P, F, M(t))
    t := t + 1
</source>
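
The following self-contained Python sketch illustrates this loop for binary strings. The univariate Bernoulli model, the OneMax objective <code>one_max</code>, and all parameter values are illustrative choices for demonstration rather than part of the general scheme.

<source lang=python>
import random

N_VARS, POP_SIZE, SELECT = 20, 100, 30      # illustrative parameter choices

def one_max(x):                             # toy objective: number of ones in x
    return sum(x)

# M(0): independent Bernoulli(0.5) marginals, i.e. uniform over bit strings
model = [0.5] * N_VARS

for t in range(50):                         # termination criterion: fixed budget
    # P := generate POP_SIZE candidate solutions by sampling M(t)
    population = [[1 if random.random() < p else 0 for p in model]
                  for _ in range(POP_SIZE)]
    # F := evaluate all candidates, then keep the SELECT best of them
    selected = sorted(population, key=one_max, reverse=True)[:SELECT]
    # M(t+1) := adjust the model using the selected solutions (re-estimate marginals)
    model = [sum(x[i] for x in selected) / SELECT for i in range(N_VARS)]

print(max(one_max(x) for x in population), "ones out of", N_VARS)
</source>

The concrete EDAs described below differ mainly in how the model is represented and in how the model-adjustment and sampling steps are carried out.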
 
Using explicit probabilistic models in optimization allowed EDAs to feasibly solve optimization problems that were notoriously difficult for most conventional evolutionary algorithms and traditional optimization techniques, such as problems with high levels of [[epistasis]]{{Citation needed|date=September 2017}}. A further advantage of EDAs is that they provide the optimization practitioner with a series of probabilistic models that reveal a great deal of information about the problem being solved. This information can in turn be used to design problem-specific neighborhood operators for local search, to bias future runs of EDAs on a similar problem, or to create an efficient computational model of the problem.
 
===Univariate marginal distribution algorithm (UMDA)===
The UMDA<ref>{{cite journal|last1=Mühlenbein|first1=Heinz|title=The Equation for Response to Selection and Its Use for Prediction|journal=Evol. Computation|date=1 September 1997|volume=5|issue=3|pages=303–346|doi=10.1162/evco.1997.5.3.303|pmid=10021762|s2cid=2593514|url=http://dl.acm.org/citation.cfm?id=1326756|issn=1063-6560|url-access=subscription}}</ref> is a simple EDA that uses an operator <math>\alpha_{UMDA}</math> to estimate marginal probabilities from a selected population <math>S(P(t))</math>. Assuming <math>S(P(t))</math> contains <math>\lambda</math> elements, <math>\alpha_{UMDA}</math> produces the probabilities:
 
<math>
p_{t+1}(X_i) = \frac{1}{\lambda} \sum_{x \in S(P(t))} x_i, \quad \forall i \in 1, 2, \dots, N
</math>

===Population-based incremental learning (PBIL)===
The PBIL represents the population implicitly by its model, from which it samples new solutions and then updates the model. At each generation, <math>\mu</math> individuals are sampled and <math>\lambda \leq \mu</math> of them are selected. Such individuals are then used to update the model as follows

<math>
p_{t+1}(X_i) = (1 - \gamma)\, p_t(X_i) + \frac{\gamma}{\lambda} \sum_{x \in S(P(t))} x_i, \quad \forall i \in 1, 2, \dots, N
</math>
 
where <math>\gamma\in(0,1]</math> is a parameter defining the [[learning rate]]; a small value means that the previous model <math>p_t(X_i)</math> is only slightly modified by the newly sampled solutions. PBIL can be described as
 
<math>
D(t+1) = \alpha_{\text{PBIL}} \circ S \circ \beta_{\mu}(D(t))
</math>
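
For binary variables, the two update rules above can be sketched in Python as follows. This is an illustrative fragment only, and the representation of the selected population as a list of 0/1 lists is an assumption of the sketch.

<source lang=python>
def umda_estimate(selected):
    """alpha_UMDA: the marginal of X_i is the frequency of ones at position i in S(P(t))."""
    lam = len(selected)
    return [sum(x[i] for x in selected) / lam for i in range(len(selected[0]))]

def pbil_update(model, selected, gamma=0.1):
    """PBIL: move each marginal by a step of size gamma towards the sample frequency."""
    freq = umda_estimate(selected)
    return [(1 - gamma) * p + gamma * q for p, q in zip(model, freq)]
</source>

With <code>gamma=1</code> the PBIL update reduces to the UMDA estimate, which discards the previous model entirely.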
 
===Compact genetic algorithm (cGA)===
The CGA,<ref>{{cite journal|last1=Harik|first1=G.R.|last2=Lobo|first2=F.G.|last3=Goldberg|first3=D.E.|title=The compact genetic algorithm|journal=IEEE Transactions on Evolutionary Computation|date=1999|volume=3|issue=4|pages=287–297|doi=10.1109/4235.797971}}</ref> also relies on the implicit populations defined by univariate distributions. At each generation <math>t</math>, two individuals <math>x,y</math> are sampled, <math>P(t)=\beta_2(D(t))</math>. The population <math>P(t)</math> is then sorted in decreasing order of fitness, <math>S_{\text{Sort}(f)}(P(t))</math>, with <math>u</math> being the best and <math>v</math> being the worst solution. The CGA estimates univariate probabilities as follows
 
<math>
p_{t+1}(X_i) = p_t(X_i) + \gamma\, (u_i - v_i), \quad \forall i \in 1, 2, \dots, N
</math>
 
where <math>\gamma\in(0,1]</math> is a constant defining the [[learning rate]], usually set to <math>\gamma=1/N</math>. The CGA can be defined as
 
<math>
D(t+1) = \alpha_{\text{CGA}} \circ S_{\text{Sort}(f)} \circ \beta_2(D(t))
</math>
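
A single cGA generation for binary strings can be sketched as follows; the clamping of probabilities to the interval [0, 1] and the OneMax objective used in the example run are assumptions of this illustration.

<source lang=python>
import random

def cga_step(model, f, gamma):
    """One cGA generation: sample two solutions and shift the marginals towards the winner."""
    x = [1 if random.random() < p else 0 for p in model]
    y = [1 if random.random() < p else 0 for p in model]
    u, v = (x, y) if f(x) >= f(y) else (y, x)      # u = best, v = worst
    # p_{t+1}(X_i) = p_t(X_i) + gamma * (u_i - v_i), kept inside [0, 1]
    return [min(1.0, max(0.0, p + gamma * (ui - vi)))
            for p, ui, vi in zip(model, u, v)]

model = [0.5] * 20
for _ in range(2000):
    model = cga_step(model, sum, gamma=1 / 20)     # gamma = 1/N, OneMax objective
</source>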
 
Bivariate and multivariate distributions are usually represented as probabilistic [[graphical model]]s (graphs), in which edges denote statistical dependencies (or conditional probabilities) and vertices denote variables. To learn the structure of a PGM from data, linkage-learning is employed.
 
===Mutual information maximizing input clustering (MIMIC)===
The MIMIC<ref>{{cite journal|last1=Bonet|first1=Jeremy S. De|last2=Isbell|first2=Charles L.|last3=Viola|first3=Paul|title=MIMIC: Finding Optima by Estimating Probability Densities|journal=Advances in Neural Information Processing Systems|date=1 January 1996|pages=424|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.6497|publisher=The MIT Press}}</ref> factorizes the [[joint probability distribution]] in a chain-like model representing successive dependencies between variables. It finds a permutation of the decision variables, <math>r : i \mapsto j</math>, such that <math>x_{r(1)}, x_{r(2)}, \dots, x_{r(N)}</math> minimizes the [[Kullback–Leibler divergence]] in relation to the true probability distribution, i.e. <math>\pi_{r(i+1)} = \{X_{r(i)}\}</math>. MIMIC models a distribution
 
<math>
p_{t+1}(X_1, X_2, \dots, X_N) = p_{t+1}(X_{r(1)}) \prod_{i=1}^{N-1} p_{t+1}\left(X_{r(i+1)} \mid X_{r(i)}\right)
</math>
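
The chain construction and the subsequent ancestral sampling can be sketched as follows. This simplified illustration greedily minimizes empirical (conditional) entropies estimated from a selected population of 0/1 lists; it approximates the idea of MIMIC rather than reproducing every detail of the original procedure.

<source lang=python>
import math, random

def entropy(bits):
    p = sum(bits) / len(bits)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy(child, parent):
    """H(child | parent) estimated from paired 0/1 sample columns."""
    h = 0.0
    for v in (0, 1):
        sub = [c for c, q in zip(child, parent) if q == v]
        if sub:
            h += len(sub) / len(child) * entropy(sub)
    return h

def mimic_chain(selected):
    """Greedy permutation r: start with the lowest-entropy variable, then repeatedly
    append the variable with the smallest entropy conditioned on the previous one."""
    n = len(selected[0])
    cols = [[x[i] for x in selected] for i in range(n)]
    order = [min(range(n), key=lambda i: entropy(cols[i]))]
    while len(order) < n:
        prev = order[-1]
        rest = [i for i in range(n) if i not in order]
        order.append(min(rest, key=lambda i: cond_entropy(cols[i], cols[prev])))
    return order, cols

def mimic_sample(order, cols):
    """Ancestral sampling along the chain p(X_r(1)) prod p(X_r(i+1) | X_r(i))."""
    x = [0] * len(order)
    first = order[0]
    x[first] = 1 if random.random() < sum(cols[first]) / len(cols[first]) else 0
    for prev, cur in zip(order, order[1:]):
        match = [c for c, q in zip(cols[cur], cols[prev]) if q == x[prev]]
        p = sum(match) / len(match) if match else 0.5   # fallback for unseen contexts
        x[cur] = 1 if random.random() < p else 0
    return x
</source>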
 
===Bivariate marginal distribution algorithm (BMDA)===
The BMDA<ref>{{cite book|last1=Pelikan|first1=Martin|last2=Muehlenbein|first2=Heinz|title=Advances in Soft Computing|chapter=The Bivariate Marginal Distribution Algorithm|date=1 January 1999|pages=521–535|doi=10.1007/978-1-4471-0819-1_39|isbn=978-1-85233-062-0|citeseerx=10.1.1.55.1151}}</ref> factorizes the joint probability distribution in bivariate distributions. First, a randomly chosen variable is added as a node in a graph; then the variable most dependent on one of those already in the graph is chosen from among those not yet in the graph; this procedure is repeated until no remaining variable depends on any variable in the graph (verified according to a threshold value).
 
The resulting model is a forest with multiple trees rooted at nodes <math>\Upsilon_t</math>. Considering <math>I_t</math> the non-root variables, BMDA estimates a factorized distribution in which the root variables can be sampled independently, whereas all the others must be conditioned on their parent variable <math>\pi_i</math>.
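
The forest construction can be sketched as follows. Empirical mutual information is used here as the pairwise dependence measure and the threshold value is arbitrary, whereas the original BMDA relies on a statistical dependence test, so this is an illustration of the growth procedure rather than a faithful reimplementation.

<source lang=python>
import math, random

def mutual_information(a, b):
    """Empirical mutual information between two 0/1 sample columns."""
    n = len(a)
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            pab = sum(1 for x, y in zip(a, b) if x == va and y == vb) / n
            pa = sum(1 for x in a if x == va) / n
            pb = sum(1 for y in b if y == vb) / n
            if pab > 0:
                mi += pab * math.log2(pab / (pa * pb))
    return mi

def bmda_forest(selected, threshold=0.02):
    """Grow a forest of trees; each new variable is attached to the tree node it
    depends on most strongly, as long as the dependence exceeds the threshold."""
    n = len(selected[0])
    cols = [[x[i] for x in selected] for i in range(n)]
    remaining, edges, roots = set(range(n)), [], []
    while remaining:
        root = random.choice(sorted(remaining))      # start a new tree
        roots.append(root)
        tree = {root}
        remaining.discard(root)
        while remaining:
            score, child, parent = max((mutual_information(cols[i], cols[j]), i, j)
                                       for i in remaining for j in tree)
            if score < threshold:                    # nothing depends strongly enough
                break
            edges.append((parent, child))
            tree.add(child)
            remaining.discard(child)
    return roots, edges
</source>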
 
===Extended compact genetic algorithm (eCGA)===
The ECGA<ref>{{cite thesis|last1=Harik|first1=Georges Raif|title=Learning Gene Linkage to Efficiently Solve Problems of Bounded Difficulty Using Genetic Algorithms|publisher=University of Michigan|url=http://dl.acm.org/citation.cfm?id=269517|year=1997|type=phd}}</ref> was one of the first EDAs to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled. Its approach factorizes the joint probability distribution in the product of multivariate marginal distributions. Assume <math>T_\text{eCGA}=\{\tau_1,\dots,\tau_\Psi\}</math> is a set of subsets, in which every <math>\tau\in T_\text{eCGA}</math> is a linkage set containing <math>|\tau|\leq K</math> variables. The factorized joint probability distribution is represented as follows
 
<math>
p_{t+1}(X_1, X_2, \dots, X_N) = \prod_{\tau \in T_\text{eCGA}} p_{t+1}(\tau)
</math>

The eCGA learns both the set of linkage sets <math>T_\text{eCGA}</math> and the corresponding marginal distributions by greedily minimizing a combined complexity criterion, given by the sum of a model complexity (MC) term, which penalizes large linkage sets, and a compressed population complexity (CPC) term.
 
The CPC, on the other hand, quantifies the data compression in terms of entropy of the [[marginal distribution]] over all partitions, where <math>\lambda</math> is the selected population size, <math>|\tau|</math> is the number of decision variables in the linkage set <math>\tau</math> and <math>H(\tau)</math> is the [[joint entropy]] of the variables in <math>\tau</math>
 
<math>
CPC = \lambda \sum_{\tau \in T_\text{eCGA}} H(\tau)
</math>
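
Taking the entropy-based form of the CPC shown above, the criterion can be computed from a selected population of 0/1 lists as in the following illustrative sketch; the example data and partitions are hypothetical.

<source lang=python>
import math
from collections import Counter

def joint_entropy(selected, tau):
    """H(tau): joint entropy of the variables whose indices are in the linkage set tau."""
    n = len(selected)
    counts = Counter(tuple(x[i] for i in tau) for x in selected)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def cpc(selected, partition):
    """Compressed population complexity of a partition T_eCGA = {tau_1, ..., tau_Psi}."""
    lam = len(selected)
    return lam * sum(joint_entropy(selected, tau) for tau in partition)

# Compare the CPC of two candidate partitions of four binary variables.
data = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1], [1, 1, 0, 0]]
print(cpc(data, [(0,), (1,), (2,), (3,)]))   # fully independent model
print(cpc(data, [(0, 1), (2,), (3,)]))       # variables 0 and 1 grouped
</source>

In this example, grouping the two perfectly correlated variables into one linkage set lowers the CPC, which is the kind of signal a greedy model search over partitions can exploit.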
 
===Bayesian optimization algorithm (BOA)===
The BOA<ref>{{cite journal|last1=Pelikan|first1=Martin|last2=Goldberg|first2=David E.|last3=Cantu-Paz|first3=Erick|title=BOA: The Bayesian Optimization Algorithm|date=1 January 1999|pages=525–532|publisher=Morgan Kaufmann|citeseerx=10.1.1.46.8131}}</ref><ref>{{cite book|last1=Pelikan|first1=Martin|title=Hierarchical Bayesian optimization algorithm: toward a new generation of evolutionary algorithms|date=2005|publisher=Springer|___location=Berlin|isbn=978-3-540-23774-7|edition=1st}}</ref><ref>{{cite journal|last1=Wolpert|first1=David H.|last2=Rajnarayan|first2=Dev|title=Using Machine Learning to Improve Stochastic Optimization|journal=Proceedings of the 17th AAAI Conference on Late-Breaking Developments in the Field of Artificial Intelligence|date=1 January 2013|pages=146–148|url=http://dl.acm.org/citation.cfm?id=2908286.2908335|series=Aaaiws'13-17}}</ref> uses Bayesian networks to model and sample promising solutions. Bayesian networks are directed acyclic graphs, with nodes representing variables and edges representing conditional probabilities between pairs of variables. The value of a variable <math>x_i</math> can be conditioned on a maximum of <math>K</math> other variables, defined in <math>\pi_i</math>. BOA builds a PGM encoding a factorized joint distribution, in which the parameters of the network, i.e. the conditional probabilities, are estimated from the selected population using the maximum likelihood estimator.
 
<math>
p_{t+1}(X_1, X_2, \dots, X_N) = \prod_{i=1}^{N} p_{t+1}(X_i \mid \pi_i)
</math>
 
The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge that best improves some scoring metric (e.g. [[Bayesian information criterion]] (BIC) or Bayesian–Dirichlet metric with likelihood equivalence (BDe)).<ref>{{cite journal|last1=Larrañaga|first1=Pedro|last2=Karshenas|first2=Hossein|last3=Bielza|first3=Concha|last4=Santana|first4=Roberto|title=A review on probabilistic graphical models in evolutionary computation|journal=Journal of Heuristics|date=21 August 2012|volume=18|issue=5|pages=795–819|doi=10.1007/s10732-012-9208-4|s2cid=9734434|url=http://oa.upm.es/15826/}}</ref> The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering of the variables, each node being preceded by its parents; (2) each variable is sampled conditionally on its parents. In this scenario, every BOA step can be defined as
 
<math>
D(t+1) = \alpha_{\text{BOA}} \circ S \circ \beta_{\mu}(D(t))
</math>
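
The sampling step can be sketched as follows for binary variables. The hand-specified network and conditional probability tables are purely illustrative, and the structure-learning step (greedy edge addition guided by BIC or BDe) is omitted.

<source lang=python>
import random

def ancestral_order(parents):
    """Topological order of the network: every node appears after all of its parents."""
    order, placed = [], set()
    while len(order) < len(parents):
        for v, ps in parents.items():
            if v not in placed and all(p in placed for p in ps):
                order.append(v)
                placed.add(v)
    return order

def sample_solution(parents, cpt):
    """Sample each binary variable conditionally on the values of its parents.
    cpt[v] maps a tuple of parent values to P(X_v = 1 | parents)."""
    order = ancestral_order(parents)
    x = {}
    for v in order:
        key = tuple(x[p] for p in parents[v])
        x[v] = 1 if random.random() < cpt[v][key] else 0
    return [x[v] for v in sorted(parents)]

# Tiny hand-specified network: X0 -> X1, X2 independent (illustrative values only).
parents = {0: [], 1: [0], 2: []}
cpt = {0: {(): 0.7}, 1: {(0,): 0.2, (1,): 0.9}, 2: {(): 0.5}}
print(sample_solution(parents, cpt))
</source>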
 
===Linkage-tree Genetic Algorithm (LTGA)===
The LTGA<ref>{{cite book|last1=Thierens|first1=Dirk|title=Parallel Problem Solving from Nature, PPSN XI|chapter=The Linkage Tree Genetic Algorithm|date=11 September 2010|pages=264–273|doi=10.1007/978-3-642-15844-5_27|isbn=978-3-642-15843-8}}</ref> differs from most EDAs in the sense that it does not explicitly model a probability distribution but only a linkage model, called a linkage-tree. A linkage <math>T</math> is a set of linkage sets with no probability distribution associated; therefore, there is no way to sample new solutions directly from <math>T</math>. The linkage model is a linkage-tree stored as a [[family of sets]] (FOS).
 
<math>x_i[\tau]</math>:= <math>x_j[\tau]</math>
'''if''' <math>f(x_i) \leq f_{x_i}</math> '''then'''
<math>x_i[\tau]:= x_j[\tau]</math>
'''return''' <math>P(t)</math>
{{algorithm-end}}
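
The optimal-mixing idea described above can be sketched as follows. The acceptance rule assumes maximization, and the hand-specified FOS, the OneMax objective, and the donor selection are illustrative simplifications of the full LTGA.

<source lang=python>
import random

def optimal_mixing(population, fos, f):
    """Improve each solution by borrowing gene blocks (linkage sets) from donors.
    A borrowed block is kept only if the fitness does not decrease; otherwise it is reverted."""
    for x in population:
        for tau in fos:
            donor = random.choice(population)
            backup = [x[i] for i in tau]
            donor_block = [donor[i] for i in tau]
            fx = f(x)
            for i, b in zip(tau, donor_block):
                x[i] = b
            if f(x) < fx:                 # revert if the borrowed block made x worse
                for i, b in zip(tau, backup):
                    x[i] = b
    return population

# Illustrative run on OneMax with a hand-specified family of subsets (FOS).
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
fos = [(0, 1), (2, 3), (4, 5), (6, 7), tuple(range(8))]
for _ in range(10):
    pop = optimal_mixing(pop, fos, sum)
print(max(sum(x) for x in pop), "ones out of 8")
</source>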
 
==Other==
* Probability collectives (PC)<ref>{{cite journal|last1=Wolpert|first1=David H.|last2=Strauss|first2=Charlie E. M.|last3=Rajnarayan|first3=Dev|title=Advances in Distributed Optimization Using Probability Collectives|journal=Advances in Complex Systems|date=December 2006|volume=09|issue=04|pages=383–436|doi=10.1142/S0219525906000884|citeseerx=10.1.1.154.6395}}</ref><ref>{{cite journal|last1=Pelikan|first1=Martin|last2=Goldberg|first2=David E.|last3=Lobo|first3=Fernando G.|title=A Survey of Optimization by Building and Using Probabilistic Models|journal=Computational Optimization and Applications|date=2002|volume=21|issue=1|pages=5–20|doi=10.1023/A:1013500812258}}</ref>
* Hill climbing with learning (HCwL)<ref>{{Cite journal|last1=Rudlof|first1=Stephan|last2=Köppen|first2=Mario|date=1997|title=Stochastic Hill Climbing with Learning by Vectors of Normal Distributions|pages=60–70|citeseerx=10.1.1.19.3536|language=en}}</ref>
* Estimation of multivariate normal algorithm (EMNA){{Citation needed|date=June 2018}}
* Estimation of Bayesian networks algorithm (EBNA){{Citation needed|date=June 2018}}
* Stochastic hill climbing with learning by vectors of normal distributions (SHCLVND)<ref>{{Cite journal|last1=Rudlof|first1=Stephan|last2=Köppen|first2=Mario|date=1997|title=Stochastic Hill Climbing with Learning by Vectors of Normal Distributions|pages=60–70|citeseerx=10.1.1.19.3536}}</ref>
* Real-coded PBIL{{Citation needed|date=June 2018}}
* Selfish Gene Algorithm (SG)<ref>{{Cite book|last1=Corno|first1=Fulvio|last2=Reorda|first2=Matteo Sonza|last3=Squillero|first3=Giovanni|date=1998-02-27|title=The selfish gene algorithm: a new evolutionary optimization strategy|url=http://dl.acm.org/citation.cfm?id=330560.330838|publisher=ACM|pages=349–355|doi=10.1145/330560.330838|isbn=978-0897919692|s2cid=9125252}}</ref>
* Compact Differential Evolution (cDE)<ref>{{Cite journal|last1=Mininno|first1=Ernesto|last2=Neri|first2=Ferrante|last3=Cupertino|first3=Francesco|last4=Naso|first4=David|date=2011|title=Compact Differential Evolution|url=http://ieeexplore.ieee.org/document/5675671/|journal=IEEE Transactions on Evolutionary Computation|language=en-US|volume=15|issue=1|pages=32–54|doi=10.1109/tevc.2010.2058120|s2cid=20582233|issn=1089-778X}}</ref> and its variants<ref>{{Cite journal|last1=Iacca|first1=Giovanni|last2=Caraffini|first2=Fabio|last3=Neri|first3=Ferrante|date=2012|title=Compact Differential Evolution Light: High Performance Despite Limited Memory Requirement and Modest Computational Overhead|url=https://doi.org/10.1007/s11390-012-1284-2|journal=Journal of Computer Science and Technology|language=en|volume=27|issue=5|pages=1056–1076|doi=10.1007/s11390-012-1284-2|s2cid=3184035|issn=1000-9000|hdl=2086/11740|hdl-access=free}}</ref><ref>{{Citation|last1=Iacca|first1=Giovanni|last2=Neri|first2=Ferrante|last3=Mininno|first3=Ernesto|title=Opposition-Based Learning in Compact Differential Evolution|date=2011|url=https://doi.org/10.1007/978-3-642-20525-5_27|work=Applications of Evolutionary Computation|pages=264–273|publisher=Springer Berlin Heidelberg|language=en|doi=10.1007/978-3-642-20525-5_27|isbn=9783642205248|hdl=11572/196440|hdl-access=free}}</ref><ref>{{Cite book|last1=Mallipeddi|first1=Rammohan|last2=Iacca|first2=Giovanni|last3=Suganthan|first3=Ponnuthurai Nagaratnam|last4=Neri|first4=Ferrante|last5=Mininno|first5=Ernesto|title=2011 IEEE Congress of Evolutionary Computation (CEC)|chapter=Ensemble strategies in Compact Differential Evolution|date=2011|pages=1972–1977|language=en-US|publisher=IEEE|doi=10.1109/cec.2011.5949857|isbn=9781424478347|s2cid=11781300}}</ref><ref>{{Cite journal|last1=Neri|first1=Ferrante|last2=Iacca|first2=Giovanni|last3=Mininno|first3=Ernesto|date=2011|title=Disturbed Exploitation compact Differential Evolution for limited memory optimization problems|url=https://doi.org/10.1016/j.ins.2011.02.004|journal=Information Sciences|volume=181|issue=12|pages=2469–2487|doi=10.1016/j.ins.2011.02.004|issn=0020-0255}}</ref><ref>{{Cite book|last1=Iacca|first1=Giovanni|last2=Mallipeddi|first2=Rammohan|last3=Mininno|first3=Ernesto|last4=Neri|first4=Ferrante|last5=Suganthan|first5=Pannuthurai Nagaratnam|title=2011 IEEE Symposium on Differential Evolution (SDE)|chapter=Global supervision for compact Differential Evolution|date=2011|pages=1–8|language=en-US|publisher=IEEE|doi=10.1109/sde.2011.5952051|isbn=9781612840710|s2cid=8874851}}</ref><ref>{{Cite book|last1=Iacca|first1=Giovanni|last2=Mallipeddi|first2=Rammohan|last3=Mininno|first3=Ernesto|last4=Neri|first4=Ferrante|last5=Suganthan|first5=Pannuthurai Nagaratnam|title=2011 IEEE Workshop on Memetic Computing (MC)|chapter=Super-fit and population size reduction in compact Differential Evolution|date=2011|pages=1–8|language=en-US|publisher=IEEE|doi=10.1109/mc.2011.5953633|isbn=9781612840659|s2cid=5692951}}</ref>
* Compact Particle Swarm Optimization (cPSO)<ref>{{Cite journal|last1=Neri|first1=Ferrante|last2=Mininno|first2=Ernesto|last3=Iacca|first3=Giovanni|date=2013|title=Compact Particle Swarm Optimization|url=https://doi.org/10.1016/j.ins.2013.03.026|journal=Information Sciences|volume=239|pages=96–121|doi=10.1016/j.ins.2013.03.026|issn=0020-0255}}</ref>
* Compact Bacterial Foraging Optimization (cBFO)<ref>{{Citation|last1=Iacca|first1=Giovanni|last2=Neri|first2=Ferrante|last3=Mininno|first3=Ernesto|title=Compact Bacterial Foraging Optimization|date=2012|url=https://doi.org/10.1007/978-3-642-29353-5_10|work=Swarm and Evolutionary Computation|pages=84–92|publisher=Springer Berlin Heidelberg|language=en|doi=10.1007/978-3-642-29353-5_10|hdl=11572/196442|isbn=9783642293528|hdl-access=free}}</ref>
* Probabilistic incremental program evolution (PIPE)<ref>{{Cite journal|last1=Salustowicz|first1=Rafal|last2=Schmidhuber|first2=Jürgen|date=1997|title=Probabilistic incremental program evolution|journal=Evolutionary Computation|volume=5|issue=2|pages=123–141|issn=1530-9304|pmid=10021756|doi=10.1162/evco.1997.5.2.123|s2cid=10759266|url=http://depositonce.tu-berlin.de/handle/11303/1046}}</ref>
* Estimation of Gaussian networks algorithm (EGNA){{Citation needed|date=June 2018}}
* Estimation multivariate normal algorithm with thresheld convergence<ref>{{Cite book|last1=Tamayo-Vera|first1=Dania|last2=Bolufe-Rohler|first2=Antonio|last3=Chen|first3=Stephen|title=2016 IEEE Congress on Evolutionary Computation (CEC)|chapter=Estimation multivariate normal algorithm with thresheld convergence|date=2016|pages=3425–3432|language=en-US|publisher=IEEE|doi=10.1109/cec.2016.7744223|isbn=9781509006236|s2cid=33114730}}</ref>
* Dependency Structure Matrix Genetic Algorithm (DSMGA)<ref>{{Citation|last1=Yu|first1=Tian-Li|title=Genetic Algorithm Design Inspired by Organizational Theory: Pilot Study of a Dependency Structure Matrix Driven Genetic Algorithm|date=2003|work=Genetic and Evolutionary Computation — GECCO 2003|pages=1620–1621|publisher=Springer Berlin Heidelberg|language=en|doi=10.1007/3-540-45110-2_54|isbn=9783540406037|last2=Goldberg|first2=David E.|last3=Yassine|first3=Ali|last4=Chen|first4=Ying-Ping}}</ref><ref>{{Cite book|last1=Hsu|first1=Shih-Huan|last2=Yu|first2=Tian-Li|date=2015-07-11|title=Optimization by Pairwise Linkage Detection, Incremental Linkage Set, and Restricted / Back Mixing: DSMGA-II|publisher=ACM|pages=519–526|doi=10.1145/2739480.2754737|isbn=9781450334723|arxiv=1807.11669|s2cid=17031156 }}</ref>
 
==Related==
 
* [[CMA-ES]]
* [[Cross-entropy method]]
* [[Ant colony optimization algorithms]]
 
==References==