Content deleted Content added
Citation bot (talk | contribs) m Alter: title, isbn, template type, issue. Add: arxiv, url, year, isbn, citeseerx, chapter, series, volume. Removed parameters. | You can use this bot yourself. Report bugs here. | User-activated. |
Link suggestions feature: 3 links added. |
||
(22 intermediate revisions by 15 users not shown) | |||
Line 1:
{{Short description|Family of stochastic optimization methods}}
[[Image:Eda mono-variant gauss iterations.svg|thumb|350px|Estimation of distribution algorithm. For each iteration ''i'', a random draw is performed for a population ''P'' in a distribution ''PDu''. The distribution parameters ''PDe'' are then estimated using the selected points ''PS''. The illustrated example optimizes a continuous objective function ''f(X)'' with a unique optimum ''O''. The sampling (following a normal distribution ''N'') concentrates around the optimum as one goes along unwinding algorithm.]]
'''''Estimation of distribution algorithms''''' ('''EDAs'''), sometimes called '''''probabilistic model-building genetic algorithms''''' (PMBGAs),<ref>{{Citation|last=Pelikan|first=Martin|date=2005-02-21|pages=13–30|publisher=Springer Berlin Heidelberg|language=en|doi=10.1007/978-3-540-32373-0_2|isbn=9783540237747|title=Hierarchical Bayesian Optimization Algorithm|volume=170|series=Studies in Fuzziness and Soft Computing|chapter=Probabilistic Model-Building Genetic Algorithms}}</ref> are [[stochastic optimization]] methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding
EDAs belong to the class of [[evolutionary algorithms]]. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an ''implicit'' distribution defined by one or more variation operators, whereas EDAs use an ''explicit'' probability distribution encoded by a [[Bayesian network]], a [[multivariate normal distribution]], or another model class. Similarly as other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations from vectors to [[LISP]] style S expressions, and the quality of candidate solutions is often evaluated using one or more objective functions.
The general procedure of an EDA is outlined in the following:
''t'' := 0
initialize model M(0) to represent uniform distribution over admissible solutions
'''while''' (termination criteria not met) '''do'''
''P'' := generate N>0 candidate solutions by sampling M(''t'')
''F'' := evaluate all candidate solutions in ''P''
M(t + 1) := adjust_model(''P'', ''F'', M(''t''))
''t'' := ''t'' + 1
Using explicit probabilistic models in optimization allowed EDAs to feasibly solve optimization problems that were notoriously difficult for most conventional evolutionary algorithms and traditional optimization techniques, such as problems with high levels of [[epistasis]]{{Citation needed|date=September 2017}}. Nonetheless, the advantage of EDAs is also that these algorithms provide an optimization practitioner with a series of probabilistic models that reveal a lot of information about the problem being solved. This information can in turn be used to design problem-specific neighborhood operators for local search, to bias future runs of EDAs on a similar problem, or to create an efficient computational model of the problem.
Line 33:
===Univariate marginal distribution algorithm (UMDA)===
The UMDA<ref>{{cite journal|last1=Mühlenbein|first1=Heinz|title=The Equation for Response to Selection and Its Use for Prediction|journal=Evol. Computation|date=1 September 1997|volume=5|issue=3|pages=303–346|doi=10.1162/evco.1997.5.3.303|pmid=10021762|s2cid=2593514 |url=http://dl.acm.org/citation.cfm?id=1326756|issn=1063-6560|url-access=subscription}}</ref> is a simple EDA that uses an operator <math>\alpha_{UMDA}</math> to estimate marginal probabilities from a selected population <math>S(P(t))</math>. By assuming <math>S(P(t))</math> contain <math>\lambda</math> elements, <math>\alpha_{UMDA}</math> produces probabilities:
<math>
Line 52:
</math>
where <math>\gamma\in(0,1]</math> is a parameter defining the [[learning rate]], a small value determines that the previous model <math>p_t(X_i)</math> should be only slightly modified by the new solutions sampled. PBIL can be described as
<math>
Line 59:
===Compact genetic algorithm (cGA)===
The CGA,<ref>{{cite journal|last1=Harik|first1=G.R.|last2=Lobo|first2=F.G.|last3=Goldberg|first3=D.E.|title=The compact genetic algorithm|journal=IEEE Transactions on Evolutionary Computation|date=1999|volume=3|issue=4|pages=287–297|doi=10.1109/4235.797971}}</ref> also relies on the implicit populations defined by univariate distributions. At each generation <math>t</math>, two individuals <math>x,y</math> are sampled, <math>P(t)=\beta_2(D(t))</math>. The population <math>P(t)</math> is then
<math>
Line 65:
</math>
where, <math>\gamma\in(0,1]</math> is a constant defining the [[learning rate]], usually set to <math>\gamma=1/N</math>. The CGA can be defined as
<math>
Line 78:
</math>
Bivariate and multivariate distributions are usually represented as
===Mutual information maximizing input clustering (MIMIC)===
The MIMIC<ref>{{cite journal|last1=Bonet|first1=Jeremy S. De|last2=Isbell|first2=Charles L.|last3=Viola|first3=Paul|title=MIMIC: Finding Optima by Estimating Probability Densities|journal=Advances in Neural Information Processing Systems|date=1 January 1996|pages=424|citeseerx=10.1.1.47.6497}}</ref> factorizes the [[joint probability distribution]] in a chain-like model representing successive dependencies between variables. It finds a permutation of the decision variables, <math>r : i \mapsto j</math>, such that <math>x_{r(1)}x_{r(2)},\dots,x_{r(N)}</math> minimizes the [[
<math>
Line 94:
===Bivariate marginal distribution algorithm (BMDA)===
The BMDA<ref>{{cite book|last1=Pelikan|first1=Martin|last2=Muehlenbein|first2=Heinz|title=Advances in Soft Computing |chapter=The Bivariate Marginal Distribution Algorithm
The resulting model is a forest with multiple trees rooted at nodes <math>\Upsilon_t</math>. Considering <math>I_t</math> the non-root variables, BMDA estimates a factorized distribution in which the root variables can be sampled independently, whereas all the others must be conditioned to the parent variable <math>\pi_i</math>.
Line 118:
===Extended compact genetic algorithm (eCGA)===
The ECGA<ref>{{cite
<math>
Line 130:
</math>
The CPC, on the other hand, quantifies the data compression in terms of entropy of the [[marginal distribution]] over all partitions, where <math>\lambda</math> is the selected population size, <math>|\tau|</math> is the number of decision variables in the linkage set <math>\tau</math> and <math>H(\tau)</math> is the [[joint entropy]] of the variables in <math>\tau</math>
<math>
Line 149:
</math>
The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge which better improves some scoring metric (e.g. [[Bayesian information criterion]] (BIC) or Bayesian-Dirichlet metric with likelihood equivalence (BDe)).<ref>{{cite journal|last1=Larrañaga|first1=Pedro|last2=Karshenas|first2=Hossein|last3=Bielza|first3=Concha|last4=Santana|first4=Roberto|title=A review on probabilistic graphical models in evolutionary computation|journal=Journal of Heuristics|date=21 August 2012|volume=18|issue=5|pages=795–819|doi=10.1007/s10732-012-9208-4|s2cid=9734434 |url=http://oa.upm.es/15826/}}</ref> The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering for each variable, each node being preceded by its parents; (2) each variable is sampled conditionally to its parents. Given such scenario, every BOA step can be defined as
<math>
Line 156:
===Linkage-tree Genetic Algorithm (LTGA)===
The LTGA<ref>{{cite book|last1=Thierens|first1=Dirk|title
<math>
Line 175:
<math>x_i[\tau]</math>:= <math>x_j[\tau]</math>
'''if''' <math>f(x_i) \leq f_{x_i}</math> '''then'''
<math>x_i[\tau]:= x_j[\tau]</math>
'''return''' <math>P(t)</math>
{{algorithm-end}}
Line 186:
==Other==
* Probability collectives (PC)<ref>{{cite journal|last1=WOLPERT|first1=DAVID H.|title=Advances in Distributed Optimization Using Probability Collectives|last2=STRAUSS|first2=CHARLIE E. M.|last3=RAJNARAYAN|first3=DEV
* Hill climbing with learning (HCwL)<ref>{{Cite journal|
* Estimation of multivariate normal algorithm (EMNA){{Citation needed|date=June 2018}}
* Estimation of Bayesian networks algorithm (EBNA){{Citation needed|date=June 2018}}
* Stochastic hill climbing with learning by vectors of normal distributions (SHCLVND)<ref>{{Cite journal|
* Real-coded PBIL{{Citation needed|date=June 2018}}
* Selfish Gene Algorithm (SG)<ref>{{Cite book|
* Compact Differential Evolution (cDE)<ref>{{Cite journal|
* Compact Particle Swarm Optimization (cPSO)<ref>{{Cite journal|
* Compact Bacterial Foraging Optimization (cBFO)<ref>{{Citation|
* Probabilistic incremental program evolution (PIPE)<ref>{{Cite journal|
* Estimation of Gaussian networks algorithm (EGNA){{Citation needed|date=June 2018}}
* Estimation multivariate normal algorithm with thresheld convergence<ref>{{Cite book|
* Dependency Structure Matrix Genetic Algorithm (DSMGA)<ref>{{Citation|
==Related==
* [[CMA-ES]]
* [[Cross-entropy method]]
* [[Ant colony optimization algorithms]]
==References==
|