==Overview==
[[Image:Eda mono-variant gauss iterations.svg|thumb|350px|Estimation of distribution algorithm. For each iteration ''i'', a population ''P'' is randomly drawn from a distribution ''PDu''. The distribution parameters ''PDe'' are then estimated using the selected points ''PS''. The illustrated example optimizes a continuous objective function ''f(X)'' with a unique optimum ''O''. The sampling (following a normal distribution ''N'') concentrates around the optimum as the algorithm progresses.]]
 
==Univariate Factorizations==
The simplest EDAs assume that decision variables are independent, so the joint probability distribution is factorized as a product of <math>N</math> univariate probability distributions,

<math>
D_\text{Univariate} := p(X_1, X_2, \dots, X_N) = \prod_{i=1}^{N} p(X_i).
</math>
 
Such factorizations are used in many different EDAs; next, we describe some of them.
 
===[[Univariate Marginal Distribution Algorithm]] (UMDA)===
The UMDA is a simple EDA that uses an operator <math>\alpha_\text{UMDA}</math> to estimate marginal probabilities from a selected population <math>S(P(t))</math>. Every UMDA step can be described as follows

<math>
D(t+1) = \alpha_\text{UMDA} \circ S \circ \beta_{\lambda}(D(t)).
</math>
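
To make the operator composition concrete, here is a minimal Python sketch of UMDA on binary strings. The population sizes, the one-max fitness function, and all identifiers are illustrative assumptions, not part of any reference implementation.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
N, n_sample, n_select = 20, 100, 50     # problem size and population sizes (illustrative)

def f(x):                               # illustrative one-max fitness
    return x.sum()

p = np.full(N, 0.5)                     # D(0): independent Bernoulli(0.5) marginals
for t in range(50):
    P = (rng.random((n_sample, N)) < p).astype(int)   # sample a population from D(t)
    best = np.argsort([f(x) for x in P])[-n_select:]  # truncation selection S
    p = P[best].mean(axis=0)            # alpha_UMDA: marginal frequencies of the selected
print(p.round(2))                       # marginals drift toward the all-ones optimum
</syntaxhighlight>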
 
===[[Population-based incremental learning]] (PBIL)===
The PBIL<ref>{{cite journal|last1=Baluja|first1=Shummet|title=Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning|date=1 January 1994|url=http://dl.acm.org/citation.cfm?id=865123|publisher=Carnegie Mellon University}}</ref> represents the population implicitly by its model, from which it samples new solutions and updates the model. At each generation, <math>\mu</math> individuals are sampled and <math>\lambda\leq \mu</math> are selected. Such individuals are then used to update the model as follows
 
<math>
p_{t+1}(X_i) = (1-\gamma)\, p_t(X_i) + \frac{\gamma}{\lambda} \sum_{x\in S(P(t))} x_i,~ \forall i \in 1,2,\dots,N,
</math>

where <math>\gamma \in (0,1]</math> is the learning rate. Every PBIL step can be described as

<math>
D(t+1) = \alpha_\text{PBIL} \circ S \circ \beta_\mu(D(t)).
</math>
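
A sketch of the incremental update, under the same illustrative one-max setup as above; <code>gamma</code> plays the role of the learning rate <math>\gamma</math>, and all sizes are assumptions for exposition.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
N, n_sample, n_select, gamma = 20, 100, 10, 0.1   # illustrative sizes and learning rate

def f(x):                               # illustrative one-max fitness
    return x.sum()

p = np.full(N, 0.5)                     # the model is the only representation of the population
for t in range(200):
    P = (rng.random((n_sample, N)) < p).astype(int)
    best = P[np.argsort([f(x) for x in P])[-n_select:]]
    # incremental update: blend the old model with the selected individuals' frequencies
    p = (1 - gamma) * p + gamma * best.mean(axis=0)
print(p.round(2))
</syntaxhighlight>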
 
===[[Compact Genetic Algorithm]] (cGA)===
The CGA<ref>{{cite journal|last1=Harik|first1=G.R.|last2=Lobo|first2=F.G.|last3=Goldberg|first3=D.E.|title=The compact genetic algorithm|journal=IEEE Transactions on Evolutionary Computation|date=1999|volume=3|issue=4|pages=287–297|doi=10.1109/4235.797971}}</ref> also relies on the implicit populations defined by univariate distributions. At each generation <math>t</math>, two individuals <math>x,y</math> are sampled, <math>P(t)=\beta_2(D(t))</math>. The population <math>P(t)</math> is then sorted in decreasing order of fitness, <math>S_{\text{Sort}(f)}(P(t))</math>, with <math>u</math> being the best and <math>v</math> being the worst solution. The CGA estimates univariate probabilities as follows
 
<math>
p_{t+1}(X_i) = p_t(X_i) + \gamma (u_i - v_i),~ \forall i \in 1,2,\dots,N,
</math>

where <math>\gamma \in (0,1]</math> is a constant defining the learning rate, usually set to <math>\gamma = \frac{1}{N}</math>. The CGA can be defined as

<math>
D(t+1) = \alpha_\text{CGA} \circ S_{\text{Sort}(f)} \circ \beta_2(D(t)).
</math>
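
A sketch of the cGA loop under the same illustrative assumptions; note that the model is updated from just two samples per generation.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
N = 20
gamma = 1.0 / N                         # learning rate, usually set to 1/N

def f(x):                               # illustrative one-max fitness
    return x.sum()

p = np.full(N, 0.5)
for t in range(5000):
    x, y = (rng.random((2, N)) < p).astype(int)   # beta_2: sample two individuals
    u, v = (x, y) if f(x) >= f(y) else (y, x)     # S_Sort(f): u is the better of the two
    p = np.clip(p + gamma * (u - v), 0.0, 1.0)    # move each marginal toward the winner
print(p.round(2))
</syntaxhighlight>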
 
==Bivariate Factorizations==
These factorizations model dependencies between pairs of variables, in which the distribution of each variable <math>X_i</math> may be conditioned on at most one other variable, denoted <math>\pi_i</math>,

<math>
D_\text{Bivariate} := p(X_1, X_2, \dots, X_N) = \prod_{i=1}^{N} p(X_i | \pi_i).
</math>
 
Bivariate and multivariate distributions are usually represented as probabilistic [[graphical model]]s (PGMs), in which edges denote statistical dependencies (or conditional probabilities) and vertices denote variables. To learn the structure of a PGM from data, linkage-learning is employed.
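
As an illustration of the statistics involved in linkage-learning, the following Python sketch estimates empirical mutual information between pairs of binary variables; MIMIC, for instance, relies on mutual information, while BMDA applies a chi-square dependency test. The data and identifiers below are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def mutual_information(S, i, j):
    """Empirical mutual information (in bits) between binary columns i and j of S."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((S[:, i] == a) & (S[:, j] == b))
            p_a = np.mean(S[:, i] == a)
            p_b = np.mean(S[:, j] == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

rng = np.random.default_rng(3)
S = rng.integers(0, 2, size=(200, 5))    # stand-in for a selected population
S[:, 1] = S[:, 0]                        # make variables 0 and 1 fully dependent
print(mutual_information(S, 0, 1))       # close to 1 bit
print(mutual_information(S, 0, 2))       # close to 0
</syntaxhighlight>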
 
===[[Mutual Information Maximizing Input Clustering]] (MIMIC)===
The MIMIC factorizes the joint probability distribution in a chain-like model representing successive dependencies between variables. It finds a permutation <math>r</math> of the decision variables such that the chain model minimizes the Kullback–Leibler divergence in relation to the true probability distribution,

<math>
p_{t+1}(X) = p_t(X_{r(1)}) \prod_{i=2}^{N} p_t(X_{r(i)} | X_{r(i-1)}).
</math>

New solutions are sampled from the leftmost to the rightmost variable: the first is sampled independently and the others according to their conditional distributions. Since the sampled distribution is estimated from the selected population, every MIMIC step can be described as

<math>
P(t+1) = \beta_\mu \circ \alpha_\text{MIMIC} \circ S(P(t)).
</math>
 
===[[Bivariate Marginal Distribution Algorithm]] (BMDA)===
The BMDA<ref>{{cite journal|last1=Pelikan|first1=Martin|last2=Muehlenbein|first2=Heinz|title=The Bivariate Marginal Distribution Algorithm|journal=Advances in Soft Computing|date=1 January 1999|pages=521–535|doi=10.1007/978-1-4471-0819-1_39|url=http://link.springer.com/chapter/10.1007%2F978-1-4471-0819-1_39|publisher=Springer London|language=en}}</ref> factorizes the joint probability distribution into bivariate distributions. First, a randomly chosen variable is added as a node in a graph. Then, the variable most dependent on one of those already in the graph is chosen among those not yet in it; this procedure is repeated until no remaining variable depends on any variable in the graph (verified according to a threshold value).
 
The resulting model is a forest with multiple trees rooted at nodes <math>\Upsilon_t</math>. Denoting by <math>I_t</math> the non-root variables, BMDA estimates a factorized distribution in which the root variables can be sampled independently, whereas all the others must be conditioned on the parent variable <math>\pi_i</math>.
 
<math>
p_{t+1}(X) = \left(\prod_{X_r \in \Upsilon_t} p_t(X_r)\right) \cdot \left(\prod_{X_i \in I_t} p_t(X_i | \pi_i)\right).
</math>

Every BMDA step is defined as follows

<math>
P(t+1) = \beta_\mu \circ \alpha_\text{BMDA} \circ S(P(t)).
</math>
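
A sketch of sampling from such a forest factorization follows; the forest structure and probability tables below are illustrative assumptions, not the output of a BMDA learning step.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
# Toy forest over 4 binary variables: 0 and 2 are roots; 1 depends on 0, 3 on 2.
parent = {0: None, 1: 0, 2: None, 3: 2}
p_root = {0: 0.7, 2: 0.4}               # p(X_r = 1) for the root variables
p_cond = {1: {0: 0.2, 1: 0.9},          # p(X_i = 1 | value of the parent)
          3: {0: 0.5, 1: 0.1}}

def sample():
    x = {}
    for i in sorted(parent, key=lambda i: parent[i] is not None):  # roots first
        prob = p_root[i] if parent[i] is None else p_cond[i][x[parent[i]]]
        x[i] = int(rng.random() < prob)
    return [x[i] for i in range(len(parent))]

print([sample() for _ in range(3)])
</syntaxhighlight>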
 
==Multivariate Factorizations==
These factorizations model dependencies among larger groups of variables, either as multivariate marginal distributions or as PGMs with high-order dependencies.
 
The learning of PGMs encoding multivariate distributions is a computationally expensive task; therefore, it is usual for EDAs to estimate multivariate statistics from bivariate statistics. Such a relaxation allows PGMs to be built in polynomial time in <math>N</math>; however, it also limits the generality of such EDAs.<ref>{{cite journal|last1=Martins|first1=Jean P.|last2=Delbem|first2=Alexandre C.B.|title=Pairwise independence and its impact on Estimation of Distribution Algorithms|journal=Swarm and Evolutionary Computation|date=April 2016|volume=27|pages=80–96|doi=10.1016/j.swevo.2015.10.001}}</ref><ref>{{cite journal|last1=Martins|first1=Jean P.|last2=Fonseca|first2=Carlos M.|last3=Delbem|first3=Alexandre C.B.|title=On the performance of linkage-tree genetic algorithms for the multidimensional knapsack problem|journal=Neurocomputing|date=December 2014|volume=146|pages=17–29|doi=10.1016/j.neucom.2014.04.069}}</ref><ref>{{cite journal|last1=Martins|first1=Jean P.|last2=Delbem|first2=Alexandre Claudio Botazzo|title=The Influence of Linkage-learning in the Linkage-tree GA when Solving Multidimensional Knapsack Problems|journal=Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation|date=1 January 2013|pages=821–828|doi=10.1145/2463372.2463476|url=http://dl.acm.org/citation.cfm?id=2463476|publisher=ACM}}</ref><ref>{{cite journal|last1=Martins|first1=Jean P.|last2=Delbem|first2=Alexandre C.B.|title=Multimodality and the Linkage-learning Difficulty of Additively Separable Functions|journal=Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation|date=1 January 2014|pages=365–372|doi=10.1145/2576768.2598281|url=http://dl.acm.org/citation.cfm?id=2598281|publisher=ACM}}</ref><ref>{{cite journal|last1=Martins|first1=J. P.|last2=Neto|first2=C. B.|last3=Crocomo|first3=M. K.|last4=Vittori|first4=K.|last5=Delbem|first5=A. C. B.|title=A comparison of Linkage-learning-based Genetic Algorithms in Multidimensional Knapsack Problems|journal=2013 IEEE Congress on Evolutionary Computation|date=1 June 2013|pages=502–509|doi=10.1109/CEC.2013.6557610}}</ref>
 
===[[Extended Compact Genetic Algorithm]] (eCGA)===
The ECGA was one of the first EDAs to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled. It factorizes the joint probability distribution as a product of multivariate marginal distributions over linkage sets <math>\tau \in T_\text{eCGA}</math>,

<math>
p_{t+1}(X) = \prod_{\tau \in T_\text{eCGA}} p_t(\tau).
</math>
 
The ECGA popularized the term [[linkage-learning]] as denoting procedures that identify linkage sets. Its linkage-learning procedure relies on two measures: (1) the Model Complexity (MC) and (2) the Compressed Population Complexity (CPC). The MC quantifies the model representation size in terms of the number of bits required to store all the marginal probabilities,
 
<math>
MC = \log_2(\lambda+1) \sum_{\tau \in T_\text{eCGA}} (2^{|\tau|} - 1),
</math>

whereas the CPC quantifies the data compression in terms of the entropy of the marginal distributions over all partitions, in which <math>\lambda</math> is the selected population size and <math>H(\tau)</math> is the joint entropy of the variables in linkage set <math>\tau</math>,

<math>
CPC = \lambda \sum_{\tau\in T_\text{eCGA}} H(\tau).
</math>
 
The linkage-learning in ECGA works as follows: (1) insert each variable into its own cluster, (2) compute CCC = MC + CPC of the current linkage sets, (3) check the improvement in CCC obtained by joining pairs of clusters, (4) join the pair of clusters yielding the highest CCC improvement. This procedure is repeated until no CCC improvements are possible and produces a linkage model <math>T_\text{eCGA}</math> (a sketch of this greedy clustering appears after the equation below). The ECGA works with concrete populations; therefore, using the factorized distribution modeled by ECGA, it can be described as
<math>
P(t+1) = \beta_\mu \circ \alpha_\text{eCGA} \circ S(P(t))
</math>
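
The greedy procedure referenced above can be sketched as follows. The entropy estimator and the toy data are illustrative assumptions; a real ECGA would learn from the selected population <math>S(P(t))</math>.

<syntaxhighlight lang="python">
import numpy as np
from itertools import combinations

def entropy(S, tau):
    """Joint entropy H(tau) of the columns in linkage set tau."""
    _, counts = np.unique(S[:, sorted(tau)], axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def ccc(S, model):
    """Combined complexity criterion CCC = MC + CPC."""
    lam = len(S)
    mc = np.log2(lam + 1) * sum(2 ** len(tau) - 1 for tau in model)
    cpc = lam * sum(entropy(S, tau) for tau in model)
    return mc + cpc

def learn_linkage(S):
    model = [frozenset([i]) for i in range(S.shape[1])]   # (1) one cluster per variable
    while len(model) > 1:
        # (3) evaluate the CCC of every possible pairwise merge
        candidates = [[t for t in model if t not in (a, b)] + [a | b]
                      for a, b in combinations(model, 2)]
        best = min(candidates, key=lambda m: ccc(S, m))
        if ccc(S, best) >= ccc(S, model):                 # stop: no CCC improvement
            break
        model = best                                      # (4) join the best pair
    return model

rng = np.random.default_rng(5)
S = rng.integers(0, 2, size=(200, 6))
S[:, 1] = S[:, 0]                       # plant one linked pair for the sketch
print(learn_linkage(S))                 # the clusters {0, 1} should be merged
</syntaxhighlight>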
 
===[[Bayesian Optimization Algorithm]] (BOA)===
The BOA<ref>{{cite journal|last1=Pelikan|first1=Martin|last2=Goldberg|first2=David E.|last3=Cantu-Paz|first3=Erick|title=BOA: The Bayesian Optimization Algorithm|date=1 January 1999|pages=525–532|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.8131|publisher=Morgan Kaufmann}}</ref> uses Bayesian networks to model and sample promising solutions. Bayesian networks are directed acyclic graphs, with nodes representing variables and edges representing conditional probabilities between pairs of variables. The value of a variable <math>x_i</math> can be conditioned on a maximum of <math>K</math> other variables, defined in <math>\pi_i</math>. BOA builds a PGM encoding a factorized joint distribution, in which the parameters of the network, i.e. the conditional probabilities, are estimated from the selected population using the maximum likelihood estimator.
 
<math>
p_{t+1}(X) = \prod_{i=1}^{N} p_t(X_i | \pi_i).
</math>
 
The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge that most improves a given scoring metric (e.g. Bayesian information criterion (BIC) or Bayesian-Dirichlet metric with likelihood equivalence (BDe)).<ref>{{cite journal|last1=Larrañaga|first1=Pedro|last2=Karshenas|first2=Hossein|last3=Bielza|first3=Concha|last4=Santana|first4=Roberto|title=A review on probabilistic graphical models in evolutionary computation|journal=Journal of Heuristics|date=21 August 2012|volume=18|issue=5|pages=795–819|doi=10.1007/s10732-012-9208-4}}</ref> The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering of the variables, each node being preceded by its parents; (2) each variable is sampled conditionally on its parents (a sketch of this sampling procedure follows the equation below). In such a scenario, every BOA step can be defined as
 
<math>
P(t+1) = \beta_\mu \circ \alpha_\text{BOA} \circ S(P(t))
</math>
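
A sketch of the two sampling steps, ancestral ordering followed by conditional sampling; the toy network and its conditional probability tables are illustrative assumptions rather than a learned model.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
# Toy Bayesian network over 3 binary variables: 0 -> 1 and 0 -> 2.
parents = {0: [], 1: [0], 2: [0]}
# p(X_i = 1 | parent assignment), keyed by the tuple of parent values
cpt = {0: {(): 0.6},
       1: {(0,): 0.1, (1,): 0.8},
       2: {(0,): 0.7, (1,): 0.3}}

def ancestral_order(parents):
    """Topological order: every node appears after all of its parents."""
    order, placed = [], set()
    while len(order) < len(parents):
        for i in parents:
            if i not in placed and all(p in placed for p in parents[i]):
                order.append(i)
                placed.add(i)
    return order

def sample():
    x = {}
    for i in ancestral_order(parents):
        key = tuple(x[p] for p in parents[i])
        x[i] = int(rng.random() < cpt[i][key])
    return [x[i] for i in range(len(parents))]

print([sample() for _ in range(3)])
</syntaxhighlight>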
 
===[[Linkage-tree Genetic Algorithm]] (LTGA)===
The LTGA<ref>{{cite journal|last1=Thierens|first1=Dirk|title=The Linkage Tree Genetic Algorithm|journal=Parallel Problem Solving from Nature, PPSN XI|date=11 September 2010|pages=264–273|doi=10.1007/978-3-642-15844-5_27|url=http://link.springer.com/chapter/10.1007%2F978-3-642-15844-5_27#page-1|publisher=Springer Berlin Heidelberg|language=en}}</ref> differs from most EDAs in the sense that it does not explicitly model a probability distribution, but only a linkage model called a linkage-tree. A linkage <math>T</math> is a set of linkage sets with no associated probability distribution; therefore, there is no way to sample new solutions directly from <math>T</math>. The linkage model is a linkage-tree produced and stored as a [[family of sets]] (FOS).
 
At each generation, the LTGA learns a new linkage-tree from the current population by hierarchical clustering (sketched after the equation below) and uses it to guide a recombination operator <math>R_{\text{LTGA}}</math>, which recombines pairs of solutions over the variable groups defined by the linkage sets. Every LTGA step can be described as

<math>
P(t+1) = R_{\text{LTGA}}(P(t)) \circ \alpha_\text{LTGA} (P(t))
</math>
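
A sketch of building a linkage-tree FOS by bottom-up hierarchical clustering. The information-based distance shown here is one common choice, and the data and identifiers are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np
from itertools import combinations

def entropy(S, tau):
    """Joint entropy of the columns in cluster tau."""
    _, counts = np.unique(S[:, sorted(tau)], axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def distance(S, a, b):
    # Information distance: near 0 when the clusters are strongly dependent,
    # near 1 when they are independent.
    h_ab = entropy(S, a | b)
    return 2 - (entropy(S, a) + entropy(S, b)) / h_ab if h_ab else 0.0

def linkage_tree(S):
    clusters = [frozenset([i]) for i in range(S.shape[1])]
    fos = list(clusters)                       # family of sets: all clusters ever formed
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: distance(S, *ab))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        fos.append(a | b)
    return fos

rng = np.random.default_rng(7)
S = rng.integers(0, 2, size=(200, 4))
S[:, 1] = 1 - S[:, 0]                          # strongly linked pair for the sketch
print(linkage_tree(S))                         # {0, 1} should be merged first
</syntaxhighlight>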
 
Other well-known EDAs include<ref>{{cite journal|last1=Pelikan|first1=Martin|last2=Goldberg|first2=David E.|last3=Lobo|first3=Fernando G.|title=A Survey of Optimization by Building and Using Probabilistic Models|journal=Computational Optimization and Applications|date=2002|volume=21|issue=1|pages=5–20|doi=10.1023/A:1013500812258}}</ref>
* [[Probability Collectives]] (PC)<ref>{{cite journal|last1=WOLPERT|first1=DAVID H.|last2=STRAUSS|first2=CHARLIE E. M.|last3=RAJNARAYAN|first3=DEV|title=ADVANCES IN DISTRIBUTED OPTIMIZATION USING PROBABILITY COLLECTIVES|journal=Advances in Complex Systems|date=December 2006|volume=09|issue=04|pages=383–436|doi=10.1142/S0219525906000884}}</ref>
* [[Hill Climbing with Learning]] (HCwL)
* [[Estimation of Multivariate Normal Algorithm]] (EMNA)
* [[Hierarchical Bayesian Optimization Algorithm]] (hBOA)<ref>{{cite book|last1=Pelikan|first1=Martin|title=Hierarchical Bayesian optimization algorithm : toward a new generation of evolutionary algorithms|date=2005|publisher=Springer|___location=Berlin [u.a.]|isbn=978-3-540-23774-7|edition=1st ed.}}</ref><ref>{{cite journal|last1=Wolpert|first1=David H.|last2=Rajnarayan|first2=Dev|title=Using Machine Learning to Improve Stochastic Optimization|journal=Proceedings of the 17th AAAI Conference on Late-Breaking Developments in the Field of Artificial Intelligence|date=1 January 2013|pages=146–148|url=http://dl.acm.org/citation.cfm?id=2908286.2908335|publisher=AAAI Press}}</ref>
* [[Estimation of Bayesian Networks Algorithm]] (EBNA)
* [[Stochastic hill climbing with learning by vectors of normal distributions]] (SHCLVND)
* [[Probabilistic Incremental Program Evolution]] (PIPE)
* [[Estimation of Gaussian Networks Algorithm]] (EGNA)
 
==References==
{{Reflist}}
 
{{DEFAULTSORT:Estimation Of Distribution Algorithm}}