Abstract
Keywords
Estimation of Distribution Algorithm, 2-opt local heuristic, Laplace correction, n-Queen problem, Partially Matched Crossover, Subset Sum problem
1. Introduction
In EDAs, the problem-specific interactions among the variables of individuals are taken into consideration. In evolutionary computation the interactions are kept implicitly in mind, whereas in EDAs the interrelations are expressed explicitly through the joint probability distribution associated with the individuals selected at each generation. The probability distribution is calculated from a database of individuals selected in the previous generation. The selection methods used in genetic algorithms may be used here. Sampling this probability distribution then generates offspring. Neither crossover nor mutation is applied in EDAs. However, estimating the joint probability distribution associated with the database containing the selected individuals is not an easy task. The following is a pseudocode for the EDA approach:
Step 1: Generate the initial population $D_0$ of $M$ individuals randomly
Step 2: Repeat Steps 3-5 for $l = 1, 2, \ldots$ until the stopping criterion is met
Step 3: Select $N \le M$ individuals from $D_{l-1}$ according to the selection method, giving $D_{l-1}^{se}$
Step 4: Estimate $p_l(x) = p(x \mid D_{l-1}^{se})$, the probability distribution of an individual being among the selected individuals
Step 5: Sample $M$ new individuals (the new population $D_l$) from $p_l(x)$
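As a concrete illustration of these steps, the following is a minimal sketch in Python with the probabilistic model abstracted behind estimate and sample functions; the truncation selection and all parameter values here are illustrative assumptions, not prescriptions from the literature.

    from typing import Callable, List

    Individual = List[int]

    def eda_loop(fitness: Callable[[Individual], float],
                 estimate: Callable[[List[Individual]], object],
                 sample: Callable[[object], Individual],
                 init: Callable[[], Individual],
                 M: int = 100, N: int = 50, generations: int = 100) -> Individual:
        """Generic EDA loop following Steps 1-5 above; the model type is
        left abstract so that any factorization can be plugged in."""
        population = [init() for _ in range(M)]              # Step 1
        for _ in range(generations):                         # Step 2
            # Step 3: truncation selection of the N fittest individuals
            selected = sorted(population, key=fitness, reverse=True)[:N]
            model = estimate(selected)                       # Step 4
            population = [sample(model) for _ in range(M)]   # Step 5
        return max(population, key=fitness)

The univariate factorization described next gives the simplest concrete choice of estimate and sample; it is instantiated as UMDA in Section 4.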
The easiest way to estimate the probability distribution is to treat all the variables in a problem as univariate. The joint probability distribution then becomes the product of the marginal probabilities of the $n$ variables, i.e.

$$p_l(x) = \prod_{i=1}^{n} p_l(x_i),$$
with

$$p_l(x_i) = \frac{\sum_{j=1}^{N} \delta_j(X_i = x_i \mid D_{l-1}^{se})}{N},$$

where $\delta_j(X_i = x_i \mid D_{l-1}^{se}) = 1$ if $X_i = x_i$ in the $j$-th case of $D_{l-1}^{se}$, and $0$ otherwise.
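For example, if $N = 4$ selected individuals contain the values $1, 1, 0, 1$ for a binary variable $X_i$, then $p_l(X_i = 1) = 3/4$ and $p_l(X_i = 0) = 1/4$.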
In PBIL (Population-Based Incremental Learning) (Baluja and Caruana, 1995), a single vector of probabilities is maintained and updated incrementally towards the best sampled individuals:

$$p_{l+1}(x_i) = (1 - \alpha)\, p_l(x_i) + \alpha\, \frac{1}{N} \sum_{k=1}^{N} x_i^{l,\, k:M},$$

where $\alpha \in (0, 1]$ and $x_i^{l,\, k:M}$ represents the value of $x_i$ in the $k$-th best of the $M$ individuals generated at generation $l$. The update rule shifts the vector towards the best of the generated individuals.
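A sketch of this update rule in Python; the learning rate value is an illustrative assumption.

    def pbil_update(p, best_individuals, alpha=0.1):
        """One PBIL step: p_{l+1}(x_i) = (1-alpha)*p_l(x_i)
        + alpha * (mean of x_i over the N best individuals)."""
        N = len(best_individuals)
        return [(1 - alpha) * pi
                + alpha * sum(ind[i] for ind in best_individuals) / N
                for i, pi in enumerate(p)]

    # Example: shift a 3-bit probability vector towards two selected individuals
    print(pbil_update([0.5, 0.5, 0.5], [[1, 0, 1], [1, 1, 0]]))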
In cGAs (compact Genetic Algorithms), the vector of probabilities is initialized with a probability of 0.5 for each variable. Two individuals are then generated randomly from this vector of probabilities and ranked by evaluating them. The probability vector $p_l(x)$ is then updated towards the better one. This process of adaptation continues until the vector of probabilities converges.
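A minimal cGA sketch in Python; the virtual population size (which fixes the update step) and the OneMax fitness are illustrative assumptions.

    import random

    def cga(fitness, n, virtual_pop=100, max_steps=100000):
        p = [0.5] * n                       # probability vector, one entry per bit
        step = 1.0 / virtual_pop
        for _ in range(max_steps):
            # generate two individuals from the vector of probabilities
            a = [int(random.random() < pi) for pi in p]
            b = [int(random.random() < pi) for pi in p]
            # rank them by evaluating
            winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
            # shift the vector towards the winner where the two disagree
            for i in range(n):
                if winner[i] != loser[i]:
                    p[i] += step if winner[i] == 1 else -step
                    p[i] = min(1.0, max(0.0, p[i]))
            if all(pi in (0.0, 1.0) for pi in p):   # vector has converged
                break
        return [int(pi >= 0.5) for pi in p]

    print(cga(sum, 20))   # OneMax as a stand-in fitness function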
All the above-mentioned algorithms provide good results for problems with no significant interaction among the variables (Mühlenbein, 1998; Harik et al., 1998; Pelikan and Mühlenbein, 1999), but for higher-order interactions they cannot provide good results.
The Factorized Distribution Algorithm (FDA) (Mühlenbein et al., 1998), the Extended Compact Genetic Algorithm (ECGA) (Harik, 1999), the Bayesian Optimization Algorithm (BOA) (Pelikan et al., 2000) and the Estimation of Bayesian Networks Algorithm (EBNA) (Larrañaga et al., 2000) can capture multiple dependencies among variables.
FDA uses a fixed factorization of the distribution of a uniformly scaled, additively decomposed function to generate new individuals. It efficiently optimizes a class of binary functions that are too difficult for traditional GAs. FDA incrementally computes Boltzmann distributions by using Boltzmann selection, and it converges in polynomial time if the search distribution can be formed so that the number of parameters used is polynomially bounded in $n$ (Mühlenbein and Mahnig, 2002). The drawback of FDA is its requirement of a fixed factorization and an additively decomposed function.
BOA uses techniques for modeling data with Bayesian networks to estimate the joint probability distribution of the selected individuals. A new population is then generated based on this estimate. It uses the BDe (Bayesian Dirichlet equivalence) metric to measure the goodness of each network structure. This Bayesian metric has the property that two structures reflecting the same conditional dependencies or independencies receive the same score. In BOA, prior information about the problem can be incorporated to enhance the estimation and speed up convergence (Pelikan et al., 2000). In order to reduce the cardinality of the search space, BOA restricts the number of parents a node may have. Problems in which a node needs more than two parents remain complicated to handle.
ECGA factorizes the joint probability distribution into a product of marginal distributions over disjoint groups of variables,

$$p_l(x) = \prod_{c \in C_l} p_l(x_c),$$

where $C_l$ denotes the set of groups in the $l$-th generation and $p_l(x_c)$ represents the marginal distribution of the variables $X_c$, that is, the variables belonging to the $c$-th group in the $l$-th generation. ECGA uses model complexity and population complexity to measure the quality of the marginal distributions. It can cover any number of interactions among variables, but it does not consider conditional probabilities, which is insufficient for highly overlapping cases.
In the case of expected values, when the selection is elitist, EBNAs converge to a population that contains the global optimum, whereas in the maximum-likelihood case this is not guaranteed (González et al., 2001).
4. Univariate Marginal Distribution Algorithm (UMDA)
The pseudocode for UMDA is:

Step-1: Generate the initial population $D_0$ of $M$ individuals randomly
Step-2: Repeat Steps 3-5 for $l = 1, 2, \ldots$ until the stopping criterion is met
Step-3: $D_{l-1}^{se} \leftarrow$ Select $N \le M$ individuals from $D_{l-1}$ according to the selection method
Step-4: Estimate the joint probability distribution as a product of univariate marginal frequencies,

$$p_l(x) = \prod_{i=1}^{n} p_l(x_i), \qquad p_l(x_i) = \frac{\sum_{j=1}^{N} \delta_j(X_i = x_i \mid D_{l-1}^{se})}{N},$$

where $\delta_j(X_i = x_i \mid D_{l-1}^{se}) = 1$ if $X_i = x_i$ in the $j$-th case of $D_{l-1}^{se}$, and $0$ otherwise
Step-5: Sample $M$ new individuals (the new population $D_l$) from $p_l(x)$
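Putting the steps together, a minimal UMDA sketch in Python, using OneMax as a stand-in fitness function; the population sizes, generation limit and stopping test are illustrative assumptions, not the parameters used in the experiments below.

    import random

    def umda(fitness, n, M=200, N=100, generations=100):
        # Step-1: random initial population of M binary strings of length n
        pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]
        best = max(pop, key=fitness)
        for _ in range(generations):                 # Step-2
            # Step-3: truncation selection of the N best individuals
            sel = sorted(pop, key=fitness, reverse=True)[:N]
            # Step-4: univariate marginal frequencies p_l(X_i = 1)
            p = [sum(ind[i] for ind in sel) / N for i in range(n)]
            # Step-5: sample M new individuals from the product distribution
            pop = [[int(random.random() < p[i]) for i in range(n)]
                   for _ in range(M)]
            best = max(pop + [best], key=fitness)    # keep the best so far
            if fitness(best) == n:                   # OneMax optimum reached
                break
        return best

    print(umda(fitness=sum, n=30))  # OneMax: the optimum is the all-ones string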
González et al. (2001) have shown that an instance with $p_l(x) \ge \delta > 0$ visits the set of populations containing the global optimum infinitely often with probability 1, and that if the selection is elitist, UMDA converges to a population that contains the global optimum. But the joint probability distribution of UMDA can be zero for some $x$; for example, when the individuals selected at the previous steps are such that $\delta_j(X_i = x_i \mid D_{l-1}^{se}) = 0$ for all $j = 1, 2, \ldots, N$, then $p_l(x) = 0$. So UMDA sometimes may not visit a global optimum (González et al., 2001).
To overcome this problem, the way the probabilities are calculated should be changed. One possible solution is to apply the Laplace correction (Cestnik, 1990). Now

$$p(X_i = x_i \mid D_{l-1}^{se}) = \frac{\sum_{j=1}^{N} \delta_j(X_i = x_i \mid D_{l-1}^{se}) + 1}{N + r_i},$$

where $r_i$ is the number of different values that the variable $X_i$ may take.
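A sketch of the corrected estimate in Python; the dictionary return format is only an illustrative convenience.

    def laplace_marginal(selected, i, values):
        """Laplace-corrected marginal for variable X_i:
        (count of X_i = v among the N selected individuals + 1) / (N + r_i),
        so that no value ever receives probability zero."""
        N, r = len(selected), len(values)
        return {v: (sum(ind[i] == v for ind in selected) + 1) / (N + r)
                for v in values}

    # Example: 4 selected individuals, X_0 taking values from {0, 1}
    sel = [[1], [1], [0], [1]]
    print(laplace_marginal(sel, 0, [0, 1]))   # {0: 2/6, 1: 4/6}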
The Subset Sum problem asks which subset of a list of integers has a given sum. It is an integer relation problem in which the relation coefficients are either 0 or 1. If there are $n$ integers, the solution space contains $2^n$ elements, the number of subsets of $n$ items. For small $n$, exact solutions can be found by the backtracking method, but for larger $n$ the state-space tree grows exponentially. The problem is NP-complete.
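Since the relation coefficients are 0 or 1, an individual can be encoded as a binary vector. A minimal fitness sketch in Python, where the penalty form (negative absolute deviation from the target) is an illustrative assumption rather than the exact function used in the experiments:

    def subset_sum_fitness(x, weights, target):
        """x[i] = 1 iff weights[i] is included in the subset.
        The fitness peaks at 0 when the subset sums exactly to the target;
        otherwise it is the negative absolute deviation (illustrative)."""
        s = sum(w for w, bit in zip(weights, x) if bit)
        return -abs(target - s)

    # Example: the subset {3, 5} of [3, 7, 5] sums to the target 8
    print(subset_sum_fitness([1, 0, 1], [3, 7, 5], 8))   # 0, i.e. a solution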
With the Laplace correction it is guaranteed that UMDA will find the optimum value within a small number of generations.
Let the solution be $X = \{X_1, X_2, \ldots, X_n\}$, where $X_i$ represents the column in row $i$ where queen $i$ is placed. As all queens must be in different columns, all the $X_i$ must be different. So the solution will be a permutation of the numbers 1 through $n$, and the solution space is drastically reduced to $n!$.
We can easily fulfill the first two constraints (no two queens in the same row or column) by allowing only distinct values for the $X_i$. But how do we test whether two queens at positions $(i, j)$ and $(k, l)$ [$i, k$ = rows; $j, l$ = columns] are on the same diagonal? Two observations help:

Every element on a diagonal that runs from upper left to lower right has the same row $-$ column value (the main diagonal consists of the elements of the form $(i, i)$).
Every element on a diagonal that runs from upper right to lower left has the same row $+$ column value.

That is, if $\mathrm{abs}(j - l) = \mathrm{abs}(i - k)$, then the two queens are on the same diagonal.
So the pseudocode for testing the constraint for the queens at positions $X_i$ and $X_j$ is:

    if abs(Xi - Xj) = abs(i - j) then
        return NOT_FEASIBLE;
The initial population of size $M$ is generated randomly with the constraint that all values in an individual are distinct numbers from the set $\{1, 2, \ldots, n\}$. By doing this I have implicitly satisfied the constraints that no two queens are in the same row or column. The fitness calculation then only has to check whether two queens are on the same diagonal or not.
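A sketch of this fitness function in Python; counting the queens that are not attacked along any diagonal (so that the optimum is $n$) is an assumption consistent with the stopping criterion given below, not necessarily the exact scoring used in the experiments.

    def fitness(X):
        """X[i] is the column of the queen in row i; X is a permutation,
        so rows and columns are already conflict-free.  Returns the number
        of queens not attacked along any diagonal (optimum = n)."""
        n = len(X)
        return sum(
            all(abs(X[i] - X[j]) != abs(i - j) for j in range(n) if j != i)
            for i in range(n)
        )

    print(fitness([1, 3, 0, 2]))   # a valid 4-queen placement: fitness 4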
In each generation the best half of the individuals of the previous generation are selected for the calculation of the joint probability distribution, using the marginal frequency of each $X_i$. The Laplace correction has been used during the calculation of the marginal distribution of each variable. By applying the Laplace correction I have ensured that the probability of every value is greater than zero, and hence that the joint probability distribution never vanishes.
Then $M$ new individuals are generated. During the generation of each individual I have applied a probabilistic modification to enforce the first constraint of n-Queen: when a value has been selected for some position, its probability for the following positions is set to zero and the probabilities of the non-selected values are increased proportionally. Consider the following example:
Variable / Position    1     2     3    |    1     2     3
X1                    0.7   0.1   0.5  |   0.7   0     0
X2                    0.1   0.6   0.1  |   0.1   0.65  0.35
X3                    0.2   0.3   0.4  |   0.2   0.35  0.65

(Left block: before selection. Right block: after X1 is selected for position 1.)
If X1 is selected for the first position, then the probability of X1 for the 2nd and 3rd positions should be zero, and the probabilities of X2 and X3 increase proportionally. The temporary table should then look like the right-hand block above. By doing this we can ensure that a distinct value will be generated for each component of an individual if Roulette Wheel selection is used.
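A sketch of this constrained sampling in Python. Here P is assumed to be a table with P[v][pos] holding the (Laplace-corrected) probability of value v at position pos, with values 0-based; zeroing used values and drawing by roulette wheel over the remaining mass realizes the proportional redistribution described above. The fallback for an all-zero column is a defensive assumption.

    import random

    def sample_individual(P):
        """Sample one individual position by position; once a value is used,
        its probability is treated as zero for the remaining positions and
        the rest of the mass is renormalized implicitly by the roulette wheel."""
        n = len(P)
        used, individual = set(), []
        for pos in range(n):
            weights = [0.0 if v in used else P[v][pos] for v in range(n)]
            total = sum(weights)
            if total == 0.0:                 # defensive fallback: uniform
                weights = [0.0 if v in used else 1.0 for v in range(n)]
                total = sum(weights)         # over the values not used yet
            r = random.random() * total      # roulette wheel selection
            cum = 0.0
            for v in range(n):
                cum += weights[v]
                if r < cum:
                    individual.append(v)
                    used.add(v)
                    break
        return individual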
For the replacement strategy I have used elitism. The algorithm stops when the fitness of the best individual is $n$, that is, when all queens are in non-attacking positions, or when a certain number of generations have passed.
Fitness improvement heuristics can be applied to the problem and perform much better than blind search with UMDA. For combinatorial optimization the 2-opt algorithm (Croes, 1958) and Partially Matched Crossover (PMX) (Goldberg and Lingle, 1987) are widely used. I have used the 2-opt algorithm. The resulting algorithm is:
Step 1: Create the initial population of size M randomly, with the constraint that all elements in an individual are distinct
Step 2: Apply the 2-opt algorithm to the initial population
Step 3: Select the best half of the population
Step 4: Estimate the joint probability distribution of an individual being among the selected individuals, using the marginal frequency of each variable with the Laplace correction
Step 5: Generate M new individuals by sampling this distribution with the probabilistic modification
Step 6: Apply the 2-opt algorithm to the new population
Step 7: Repeat Steps 3-6 until the fitness of the best individual is n or the generation limit is reached

Figure 1. Algorithm for the solution of the n-Queen problem by UMDA with the 2-opt local heuristic
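The 2-opt local heuristic of Steps 2 and 6 can be sketched as follows; accepting any fitness-improving swap of two columns until no improving move remains is an assumption about the exact move rule.

    def two_opt(X, fitness):
        """Local improvement: repeatedly try swapping two columns of the
        permutation and keep the swap whenever it improves the fitness."""
        n, best = len(X), fitness(X)
        improved = True
        while improved and best < n:
            improved = False
            for i in range(n - 1):
                for j in range(i + 1, n):
                    X[i], X[j] = X[j], X[i]       # tentative swap
                    f = fitness(X)
                    if f > best:
                        best, improved = f, True  # keep the improving swap
                    else:
                        X[i], X[j] = X[j], X[i]   # undo
        return X

Because swaps preserve the permutation property, the row and column constraints remain satisfied throughout the search.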
5.3.2 n-Queen problem by Genetic Algorithm
Using the same representation, initial population and fitness function, I apply a Genetic Algorithm with Partially Matched Crossover and Swap Mutation. The replacement strategy is elitism. The genetic algorithm for the problem is:

Step 1: Create the initial population of size M randomly, with the constraint that all elements in an individual are distinct, and evaluate each individual
Step 2: Repeat Steps 3-8 until (the fitness of the best individual = n)
Step 3: Randomly select two parents
Step 4: Apply Partially Matched Crossover with crossover probability pc
Step 5: Apply Swap Mutation to the offspring with mutation probability pm
Step 6: Evaluate the offspring
Step 7: Insert the offspring into the population
Step 8: Keep the best M individuals (elitism)
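A sketch of Partially Matched Crossover and swap mutation in Python; the cut points are chosen uniformly at random and the mutation rate is an illustrative value.

    import random

    def pmx(p1, p2):
        """Partially Matched Crossover: the child inherits a random segment
        from p1; the other positions come from p2, with conflicts resolved
        through the segment mapping so the child remains a permutation."""
        n = len(p1)
        a, b = sorted(random.sample(range(n), 2))
        child = [None] * n
        child[a:b + 1] = p1[a:b + 1]             # copy the matched segment
        for pos in list(range(a)) + list(range(b + 1, n)):
            gene = p2[pos]
            while gene in child[a:b + 1]:        # follow the p1 -> p2 mapping
                gene = p2[p1.index(gene)]        # until an unused value appears
            child[pos] = gene
        return child

    def swap_mutation(X, pm=0.1):
        """With probability pm, swap two random positions (keeps permutation)."""
        X = list(X)
        if random.random() < pm:
            i, j = random.sample(range(len(X)), 2)
            X[i], X[j] = X[j], X[i]
        return X

    print(pmx([0, 1, 2, 3, 4], [4, 3, 2, 1, 0]))   # always a valid permutation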
Here I have generated positive integers randomly. The expected sum has then been generated by randomly selecting some of those integers, so that there is always a solution. For the trivial case I have chosen the expected sum equal to the sum of all generated integers; in this case the solution is $\{1, 1, 1, \ldots, 1\}$. The parameters for the problems are:
[Figure: average number of generations vs. problem size (0-100), comparing UMDA and GA]
Figure 3. Average no. of generations required for the trivial solution (expected sum = sum of all integers) of the Subset Sum problem
[Figure: average time (sec) vs. problem size (0-100), comparing UMDA and GA]
Figure 4. Average time (sec) required for the trivial solution (expected sum = sum of all integers) of the Subset Sum problem
[Figure: average number of generations vs. problem size (0-100), comparing UMDA and GA]
Figure 5. Average no. of generations required for the random sum (expected sum < sum of all integers) of the Subset Sum problem
[Figure: average time (sec) vs. problem size (0-100), comparing UMDA and GA]
Figure 6. Average time (sec) required for the random sum (expected sum < sum of all integers) of the Subset Sum problem
The OneMax function is a trivial one: $\mathrm{OneMax}(x) = \sum_{i=1}^{n} x_i$, whose only optimum is the string of all ones. For this function I chose the following parameters:
[Figure: average time (sec) vs. problem size (0-150), comparing UMDA and GA]
Figure 7. Average time (sec) required until solution for the OneMax function
[Figure: average no. of generations vs. problem size (0-150), comparing UMDA and GA]
For the n-Queen problem the solution space is $n!$, and only some of these permutations are feasible solutions. For $n = 2, 3$ no solutions exist. The following table gives an idea of the number of solutions for different $n$:
For small $n$ one possible solution can be found quickly by the backtracking method. But for larger $n$, backtracking may not be the right choice, as it may consume a lot of memory for the recursion, nor would it be possible to try all permutations in the solution space. So an EDA can be a good choice for obtaining a solution within a reasonable time limit. I apply the simplest one, UMDA, with the 2-opt algorithm as a local heuristic, using the following parameters:
[Figure: results vs. no. of queens (0-40); legend includes GA with PMX]
7. Discussion
From the experimental results we find that, for medium-sized problems, UMDA provides better results in terms of the number of generations when the expected sum is the sum of a proper subset of the set of integers. But the calculation of the probability distribution takes time. For the trivial solution (when the final solution includes all the integers) UMDA outperforms the GA in both respects. This is due to the fact that the variables in the problem are only weakly interdependent.
The OneMax function is a trivial one and its variables are independent of one another, so UMDA provides better results than the GA.
From the experimental results we see that UMDA with 2-opt as a local heuristic produces solutions more slowly than the other methods, while the GA with Partially Matched Crossover produces results very fast for higher numbers of variables. This is due to the fact that the variables, which represent the positions of the queens on the board, are highly correlated. As I have said, UMDA cannot capture the interdependencies among the variables of a problem. For highly correlated problems we may obtain the dependencies among the variables with Bayesian networks or other methods, which is my future work. But the number of generations required until a solution is found is relatively low compared to the GA with PMX. This is due to the fact that the algorithm spends much time in the 2-opt procedure, which searches each individual to determine whether a legal improving move is possible or not.
8. Conclusion
10. References
[2] Baluja, S. and Davies, S. (1997). Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space. Technical Report CMU-CS-97-107, Carnegie Mellon University, Pittsburgh, Pennsylvania.
[3] Baluja, S. and Caruana, R. (1995). Removing the genetics from the standard genetic algorithm. In A. Prieditis and S. Russell, editors, Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, 38-46.
[5] Cooper, G.F. and Herskovits, E.A. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9: 309-347.
[8] González, C., Lozano, J.A. and Larrañaga, P. (2001). Mathematical modeling of discrete estimation of distribution algorithms. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston.
[9] Harik, G. (1999). Linkage learning via probabilistic modeling in the ECGA. IlliGAL Report No. 99010, Illinois Genetic Algorithms Laboratory, University of Illinois, Urbana, Illinois.
[10] Harik, G.R., Lobo, F.G. and Goldberg, D.E. (1998). The compact genetic algorithm. In Proceedings of the IEEE Conference on Evolutionary Computation, 523-528.
[14] Goldberg, D.E. and Lingle, R. (1987). Alleles, loci, and the traveling salesman problem. In John J. Grefenstette, editor, Proceedings of the International Conference on Genetic Algorithms and their Applications, Morgan Kaufmann Publishers, Inc.
[16] Larrañaga, P., Etxeberria, R., Lozano, J.A. and Peña, J.M. (2000). Combinatorial optimization by learning and simulation of Bayesian networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Stanford, 343-352.
[17] Mühlenbein, H. (1998). The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3): 303-346.
[18] Mühlenbein, H. and Mahnig, T. (1999). The Factorized Distribution Algorithm for additively decomposed functions. In Proceedings of the 1999 Congress on Evolutionary Computation, IEEE Press, 752-759.