
Statistics and Computing (1991) 1, 75-91

Genetic algorithms for numerical optimization

ZBIGNIEW MICHALEWICZ¹ and CEZARY Z. JANIKOW²

¹Department of Computer Science, University of North Carolina, Charlotte, NC 28223, USA
²Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA

Received February 1990 and accepted June 1990

Genetic algorithms (GAs) are stochastic adaptive algorithms whose search method is based
on simulation of natural genetic inheritance and Darwinian striving for survival. They can be
used to find approximate solutions to numerical optimization problems in cases where
finding the exact optimum is prohibitively expensive, or where no algorithm is known.
However, such applications can encounter problems that sometimes delay, if not prevent,
finding the optimal solutions with desired precision. In this paper we describe applications of
GAs to numerical optimization, present three novel ways to handle such problems, and give
some experimental results.
Keywords: Genetic algorithm, random algorithm, optimization technique, constraint handling, local tuning, convergence

1. Introduction

In the 1950s, von Neumann created a theory of self-reproducing automata (von Neumann, 1966), which laid the foundations for the field of genetic algorithms. In the late 1950s Holland (1959) continued this idea. In his more recent research, Holland (1975) discussed the ability of a simple bit-string representation to encode complicated structures and the transformations to improve them. The main result of this work was a demonstration that, with the proper control structure, rapid improvements to bit strings could occur under certain transformations. Even in large and complicated search spaces, given certain conditions on the problem domain, GAs would tend to converge on solutions that were globally optimal or nearly so.

Genetic algorithms (Holland, 1975; DeJong, 1985; Davis, 1987; Goldberg, 1989) implement these ideas; they are a class of probabilistic algorithms that start with a population of randomly generated feasible solutions. These solutions 'evolve' towards better ones by applying genetic operators modeled on the genetic processes occurring in nature. (For a comparison of GAs with other optimization methods the reader is referred to Ackley, 1987.)

GAs have been quite successfully applied to optimization problems like wire routing, scheduling, adaptive control, game playing, cognitive modeling, the transportation problem, the traveling salesman problem, optimal control problems, etc. (see Booker, 1982; DeJong, 1985; Grefenstette, 1985; 1987a; Goldberg, 1989; Schaffer, 1989; Vignaux and Michalewicz, 1989; 1990; Michalewicz and Janikow, 1991; Michalewicz et al., 1990).

However, as stated by DeJong (1985):

    because of this historical focus and emphasis on function optimization applications, it is easy to fall into the trap of perceiving GAs themselves as optimization algorithms and then being surprised and/or disappointed when they fail to find an 'obvious' optimum in a particular search space. My suggestion for avoiding this perceptual trap is to think of GAs as a (highly idealized) simulation of a natural process and as such they embody the goals and purposes (if any) of that natural process. I am not sure if anyone is up to the task of defining the goals and purpose of evolutionary systems; however, I think it's fair to say that such systems are not generally perceived as function optimizers.

The purpose of this paper is twofold. First, we provide a survey of genetic algorithms, discussing what they are, why they work, and how they work. Second, we present various modifications of the classical GAs; these modifications result in a system which can be perceived as a function optimizer.

The paper is organized as follows. In Section 2 we explain the main idea behind GAs, and in Section 3 we describe the major problems that GA implementations encounter. Then, in Sections 4-6 we discuss the major problems and proposed solutions in detail. The discussion is supported by a series of experiments. Section 7 gives conclusions and directions for future work.

0960-3174/91 $03.00+.12 © 1991 Chapman & Hall

2. Genetic algorithms

    Procedure genetic algorithm
    begin
        t ← 0
        initialize P(t)
        evaluate P(t)
        while (not termination-condition) do
        begin
            t ← t + 1
            select P(t) from P(t − 1)
            recombine P(t)
            evaluate P(t)
        end
    end

Fig. 1. A simple genetic algorithm.

In this section we introduce genetic algorithms, present their theoretical foundations, and describe their applicability.

2.1 What they are

GAs represent a class of adaptive algorithms whose search methods are based on simulation of natural genetics. They belong to the class of probabilistic algorithms; yet, they are very different from random algorithms as they combine elements of directed and stochastic search. Also, for hard optimization problems, they are superior to hill-climbing methods, since at any time GAs provide for both exploitation of the best solutions and exploration of the search space. Because of this, GAs are also more robust than existing directed search methods. Another important property of such genetic-based search methods is their domain-independence.

In general, a GA performs a multi-directional search by maintaining a population of potential solutions and encourages information formation and exchange between these directions. This population undergoes a simulated evolution: at each generation the relatively 'good' solutions reproduce, while the relatively 'bad' solutions die. To distinguish between different solutions we need some evaluation function which plays the role of an environment.

The structure of a simple GA is shown in Fig. 1. During iteration t, the GA maintains a population of potential solutions (called chromosomes following the natural terminology), P(t) = {x_1^t, ..., x_n^t}. Each solution x_i^t is evaluated to give some measure of its 'fitness'. Then, a new population (iteration t + 1) is formed by selecting the more fitted individuals. Some members of this new population undergo reproduction by means of crossover and mutation, to form new solutions.

Crossover combines the features of two parent chromosomes to form two similar offspring by swapping corresponding segments of the parents. For example, if the parents are represented by five-dimensional vectors (a1, b1, c1, d1, e1) and (a2, b2, c2, d2, e2) (with each element called a gene), then crossing the chromosomes after the second gene would produce the offspring (a1, b1, c2, d2, e2) and (a2, b2, c1, d1, e1). The intuition behind the applicability of the crossover operator is information exchange between different potential solutions.

Mutation arbitrarily alters one or more genes of a selected chromosome, by a random change with a probability equal to the mutation rate. The intuition behind the mutation operator is the introduction of some extra variability into the population.

A genetic algorithm for a particular problem must have the following five components: a genetic representation for potential solutions to the problem; a way to create an initial population of potential solutions; an evaluation function that plays the role of the environment, rating solutions in terms of their 'fitness'; genetic operators that alter the composition of children during reproduction; and values for various parameters that the genetic algorithm uses (population size, probabilities of applying genetic operators, etc.)

2.2 Why they work

The theoretical foundations of GAs rely on a binary string representation of solutions, and on the notion of a schema (see, e.g. Holland, 1975), a template allowing exploration of similarities among chromosomes. A schema is built by introducing a new don't care symbol (*) into the alphabet of genes; such a schema represents all strings (a hyperplane, or subset of the search space) which match it on all positions other than *. In a population of n chromosomes of length m, between 2^m and n · 2^m different schemata may be represented; at least n^3 of them are processed usefully; Holland has called this property an implicit parallelism, as it is obtained without any extra memory or processing requirements.

Two other important notions, associated with the schema, are necessary to derive the theoretical basis. The schema order, o(H), is the number of non-don't care positions. It defines the speciality of a schema. The schema defining length, l(H), is the distance between the first and
the last non-don't care symbols of a chromosome. It defines the compactness of information contained in a schema.

Assuming that the selective probability is proportional to fitness, and independent probabilities, p_c and p_m, for crossover and mutation, respectively, we can derive the following growth equation (see e.g. Goldberg, 1989):

    m(H, t + 1) ≥ m(H, t) · [f(H, t)/f̄(t)] · [1 − p_c · l(H)/(m − 1) − p_m · o(H)]    (1)

where m(H, t) is the number of schema H at time t, f(H, t) is the average fitness of schema H at time t, and f̄(t) is the average fitness of the population. This equation is also based on the assumption that the fitness function f returns only positive values; when applying GAs to optimization problems where the optimization function may return negative values, some additional mapping between optimization and fitness functions is required (see Goldberg, 1989).

The growth equation (1) shows that selection increases the sampling rate of the above-average schemata, and that this change is exponential. The sampling itself does not introduce any new schemata (not represented in the initial t = 0 sampling). This is exactly why the crossover operator is introduced: to enable structured yet random information exchange. Additionally, the mutation operator introduces greater variability into the population. The combined (disruptive) effect of these operators on a schema is not significant if the schema is short and low-order. This result can be stated as:

Theorem 1 (Schema Theorem) Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations of a genetic algorithm.

An immediate result of this theorem is that GAs explore the search space by short schemata which, subsequently, are used for information exchange during crossover:

Hypothesis 1 (Building Block Hypothesis) A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.

As stated by Goldberg (1989):

    Just as a child creates magnificent fortresses through the arrangement of simple blocks of wood, so does a genetic algorithm seek near-optimal performance through the juxtaposition of short, low-order, high performance schemata, or building blocks.

Although some research has been done to prove this hypothesis (e.g. Bethke, 1980), for most non-trivial applications we rely mostly on empirical results. During the last 15 years many GA applications were developed which supported the building block hypothesis in many different problem domains. Nevertheless, this hypothesis suggests that the problem of coding for a GA is critical for its performance, and that such a coding should satisfy the idea of short building blocks.

Since the binary alphabet offers the maximum number of schemata per bit of information of any coding (see Goldberg, 1989), the bit string representation of solutions has dominated genetic algorithm research. Additionally, such coding provides simplicity of analysis and elegance of available operators. But the 'implicit parallelism' result does not depend on the use of bit strings (see Antonisse and Keller, 1987); hence it may be worthwhile experimenting with richer data structures and other 'genetic' operators. This may be important in particular in the presence of non-trivial constraints on potential solutions to the problem.

2.3 How they work

Suppose we wish to maximize a function of k variables, f(x_1, ..., x_k): R^k → R. Suppose further, that each variable x_i can take values from a domain D_i = [a_i, b_i] ⊆ R. We wish to optimize the function f with some precision: suppose six decimal places for the variables' values is desired.

It is clear that to achieve such precision each domain D_i should be cut into (b_i − a_i) · 10^6 equal size ranges. Let us denote by m_i the smallest integer such that (b_i − a_i) · 10^6 ≤ 2^{m_i} − 1. Then, a representation having each variable coded as a binary string of length m_i clearly satisfies the precision requirement. Additionally, the following formula easily interprets each such string:

    x_i = a_i + decimal(1001...001_2) · (b_i − a_i)/(2^{m_i} − 1)

where decimal(string_2) represents the decimal value of that binary string.

Now, each chromosome (as a potential solution) is represented by a binary string of length m = Σ_{i=1}^{k} m_i; the first m_1 bits map into a value from the range [a_1, b_1], the next group of m_2 bits map into a value from the range [a_2, b_2], and the last group of m_k bits map into a value from the range [a_k, b_k].

To initialize the population, if we do not have any intuition about the distribution of potential optima, we can simply set some number, pop_size, of chromosomes randomly in a bitwise fashion. Otherwise, we can provide some initial (potential) solutions.

The rest of the algorithm is straightforward: in each generation we evaluate each chromosome (using the function f on the decoded sequences of variables), select a new population according to the probability distribution based
on fitness values, and recombine the chromosomes in the new population by mutation and crossover operators. After a number of generations, when no further improvement is observed, the best chromosome represents a (possibly the global) optimal solution. In practice, we stop the algorithm after a fixed number of iterations, depending on speed and resource criteria.

2.4 An example

Let us assume we wish to find the maximum of the following function:

    f(x_1, x_2, x_3) = 3.5(x_1 − 2.1x_2)^3 − √(x_1 x_2) + log_2(x_3 + 1) · sin^2(x_3 + π)

where −3.0 ≤ x_1 ≤ 12.1, 4.1 ≤ x_2 ≤ 5.8, and 0.0 ≤ x_3 ≤ 50.0. The required precision is four decimal places for each variable.

The domain of x_1 has length 15.1; the precision requirement implies that the range [−3.0, 12.1] should be divided into at least 15.1 × 10 000 equal size ranges. This means that 18 bits are required for this part of the chromosome:

    2^17 < 151 000 ≤ 2^18

The domain of x_2 has length 1.7; the precision requirement implies that the range [4.1, 5.8] should be divided into at least 1.7 × 10 000 equal size ranges. This means that 15 bits are required for this part of the chromosome:

    2^14 < 17 000 ≤ 2^15

The domain of x_3 has length 50.0; the precision requirement implies that the range [0.0, 50.0] should be divided into at least 50.0 × 10 000 equal size ranges. This means that 19 bits are required for this part of the chromosome:

    2^18 < 500 000 ≤ 2^19

The total length of a chromosome (solution vector) is then 18 + 15 + 19 = 52 bits; the first 18 bits code x_1, bits 19-33 code x_2, and bits 34-52 code x_3.

Let us consider an example chromosome:

    0100010010110100001111100101000101010100001000100101

The first 18 bits,

    010001001011010000

represent

    x_1 = −3.0 + decimal(010001001011010000_2) · (12.1 − (−3.0))/(2^18 − 1)
        = −3.0 + 70352 · 15.1/262143
        = −3.0 + 4.052426 = 1.052426

The next 15 bits,

    111110010100010

represent

    x_2 = 4.1 + decimal(111110010100010_2) · (5.8 − 4.1)/(2^15 − 1)
        = 4.1 + 31906 · 1.7/32767
        = 4.1 + 1.655330 = 5.755330

The last 19 bits,

    1010100001000100101

represent

    x_3 = 0.0 + decimal(1010100001000100101_2) · (50.0 − 0.0)/(2^19 − 1)
        = 0.0 + 344613 · 50.0/524287 = 32.864857

So the chromosome

    0100010010110100001111100101000101010100001000100101

corresponds to

    (x_1, x_2, x_3) = (1.052426, 5.755330, 32.864857)

The fitness value for this chromosome is

    f(1.052426, 5.755330, 32.864857) = −4704.82

To optimize the function f using a GA, we create a population of such chromosomes (the typical population size for such numerical optimization varies from 50 to 100). All 52 bits for each chromosome are initialized randomly. Each chromosome is evaluated and a new population is formed by selecting the more fitted individuals according to their fitness (we normally assume all evaluations are non-negative; there are ways to enforce that). Some chromosomes from the new population would undergo reproduction by means of crossover and mutation, to form new chromosomes (new solutions). For example, two chromosomes:

    01000100101101000011111|00101000101010100001000100101
    00001101111100000101010|110000010101000001000011111100

may be selected as parents for the crossover; the crossover point is (randomly) selected after the 23rd bit (as marked above); the resulting offspring are

    01000100101101000011111|110000010101000001000011111100
    00001101111100000101010|00101000101010100001000100101

3. Problems with genetic algorithms

However, applications such as these can encounter problems that sometimes delay, if not prevent, finding the
optimal solutions with a desired precision. Such problems originate from many sources: for example, the coding, which moves the operational search space away from the problem space; insufficient number of iterations; and insufficient population size. These problems then manifest themselves in premature convergence of the entire population to a non-global optimum; inability to perform fine local tuning; or inability to operate in the presence of non-trivial constraints. In this section we describe these three manifestations and in Sections 4-6 we present quite novel ways to handle them.

3.1 Premature convergence

Premature convergence is a common problem of GAs and other optimization algorithms. If convergence occurs too rapidly, then the valuable information developed in part of the population is often lost. Implementations of genetic algorithms are prone to converge prematurely before the optimal solution has been found. As stated by Booker (1987):

    While the performance of most implementations is comparable to or better than the performance of many other search techniques, it [the GA] still fails to live up to the high expectations engendered by the theory. The problem is that, while the theory points to sampling rates and search behavior in the limit, any implementation uses a finite population or set of sample points. Estimates based on finite samples inevitably have a sampling error and lead to search trajectories much different from those theoretically predicted. This problem is manifested in practice as a premature loss of diversity in the population with the search converging to a sub-optimal solution.

To improve the performance of GAs, DeJong (1975) investigated five modifications of the basic algorithm. These modifications were called the elitist model, the expected value model, the elitist expected value model, the crowding factor model, and the generalized crossover model. A few years later, Brindle (1981) examined five further modifications: deterministic sampling, remainder stochastic sampling without replacement, stochastic sampling without replacement, remainder stochastic sampling with replacement, and stochastic tournament. A detailed discussion of all these modifications is presented by Goldberg (1989). Quite another direction in relaxing this problem borrows ideas from simulated annealing (see e.g. Sirag and Weisser, 1987).

In general, most approaches which attempted to improve the convergence of GAs presented some modifications to the selection routine.

3.2 Fine local tuning

Genetic algorithms display inherent difficulties in performing local search for the numerical applications. As observed by Grefenstette (1987a):

    Like natural genetic systems, GAs progress by virtue of changing the distribution of high performance substructures in the overall population; individual structures are not the focus of attention. Once the high performance regions of the search space are identified by a GA, it may be useful to invoke a local search routine to optimize the members of the final population.

Local search requires the utilization of schemata of higher order and longer defining length than those suggested by Theorem 1. Holland (1975) suggested that the GA should be used as a preprocessor to perform the initial search, before turning the search process over to a system that can employ domain knowledge to guide the local search. Additionally, there are problems where the domains of parameters are unlimited, the number of parameters is quite large, and high precision is required. These requirements imply that the length of the (binary) solution vector is quite significant (for 100 variables with domains in the range [−500, 500], where six digits of precision after the decimal point is required, the length of the binary solution vector is 3000). For such problems the performance of GAs is quite poor.

3.3 Constraints

The central problem in applications of GAs is that of constraints; until recently there was no promising methodology for handling them.

Traditionally, to solve a constrained optimization problem using the GA approach we use some penalty functions. However, such approaches suffer from the disadvantage of being tailored to the problem and are not sufficiently general to handle a variety of problems. In this approach we generate potential solutions without considering the constraints, but incorporate in the evaluation function penalties for suppressing illegal candidates. However, though the evaluation function is usually well defined, there is no accepted methodology for combining it with the penalty (Richardson et al., 1989). Davis (1987) discusses a problem in deciding how large a penalty to impose:

    If one incorporates a high penalty into the evaluation routine and the domain is one in which production of an individual violating the constraint is likely, one runs the risk of creating a genetic algorithm that spends most of its time evaluating illegal individuals. Further, it can happen that when a legal individual is found, it drives the others out and the population converges on it without finding better individuals, since the likely paths to other legal individuals require the production of illegal individuals as intermediate structures, and the penalties for violating the constraints make it unlikely that such intermediate structure will reproduce. If one imposes moderate penalties, the system may evolve individuals that violate the constraint
but are rated better than those that do not because the rest of the evaluation function can be satisfied better by accepting the moderate constraint penalty than by avoiding it.

4. Premature convergence

As stated in Section 3, one of the problems encountered by GA applications is premature convergence of the entire population to a non-global optimum. The premature convergence is closely related to the characteristics of the function itself, and to the magnitude and kind of errors introduced by the sampling mechanism.

The problem of premature convergence is primarily related to the existence of local optima, and depends on both function characteristics and sampling of the solution space. For example, assume that a chromosome s_v^t ∈ P(t) is close to some local optimum, and f(s_v^t) is much greater than the average evaluation f̄(t). Also, assume that there is no chromosome close to the global maximum sought; this might be the case for multimodal functions. In such a case, there is a fast convergence toward that local optimum, with little chance of the more global exploration needed to search for other optima. While such a behavior is permissible at the later evolutionary stages, and even desired at the final ones, it is quite disturbing very early. We propose an approach (see Janikow and Michalewicz, 1991; Janikow and Michalewicz, 1990) which diminishes this problem by decreasing the speed of convergence during the early stages of population existence.

The other problem, also reflecting on the convergence, is related to shifts in the average population fitness. Consider two functions: f_1(s), and f_2(s) = f_1(s) + const, which have the same relative optima. One would expect that both can be optimized with a similar degree of difficulty. However, if const >> f_1(s), then either the function f_2(s) will suffer from much slower convergence than the function f_1(s), or the function f_1(s) will converge possibly to a local optimum.

Some of the previous approaches to this problem used rank instead of actual values f(s_i) to guide the selective process. Such an approach suffers from some drawbacks. First, it puts the responsibility on the user to decide on the best selection mechanism. Second, it totally ignores information it holds about the absolute evaluations of different chromosomes. Third, it treats all cases uniformly, regardless of the magnitude of the problem.

To deal with such cases we introduce a measure of the problematic characteristics of the function being optimized, which we later incorporate into the sampling mechanism.

4.1 The scaling mechanism

We are mostly interested in some average behavior of the function being optimized. However, we know such behavior only through finite sampling from the population. Therefore, we define a measure using the statistical random sample variance and mean (we call such a measure a span):

    s_t = sqrt( (1/(n − 1)) · Σ_{i=1}^{n} (f(x_i) − f̄)^2 ) / f̄,  where f̄ = (1/n) · Σ_{i=1}^{n} f(x_i)

which can be rewritten as:

    s_t = sqrt( (n/(n − 1)) · ( n · Σ_{i=1}^{n} f^2(x_i) / (Σ_{i=1}^{n} f(x_i))^2 − 1 ) )

This measure is normalized so that it is quite function-independent.

Genetic algorithm applications normally require that all chromosomes evaluate to non-negative numbers; such a requirement relates to the sampling mechanism and can be easily enforced. In such a case max(s_t) = √n, since Σ_{i=1}^{n} f^2(x_i) ≤ (Σ_{i=1}^{n} f(x_i))^2. However, such a maximal value is obtained only when all but one of the evaluations are zero, a very unlikely case. Experiments show that most spans are less than one. Moreover, we have determined experimentally, by running different functions with differently scaled s and observing the average optimization behavior, that s* = 0.1 gives the best possible trade-off between space exploration and speed of convergence; therefore, we subsequently treat all cases with respect to the relationship between s_t and s*. This particular choice of s* is rather rough and more extensive experiments are planned to approximate it in a more systematic way. Moreover, it would be interesting to see whether it could be approximated theoretically; however, the mechanism we are about to introduce is little affected by some variations of s*.

To preserve efficiency we do not want to recalculate s_t at each iteration. This is actually not necessary as this measure finds the function's characteristics determinable from a random sample. We have determined experimentally that very often the initial population provides a very good approximation. However, if desired, such sampling may be performed for some time before the algorithm actually starts iterating (we call such a span s_0).

For the construction of our scaling mechanism, which we call a non-uniform power scaling, we use two dynamic parameters: the span s and the population age t (this parameter is taken to be the iteration number of the algorithm). What we seek is a mechanism which produces more random search for small ages and increasingly selects only the best chromosomes at late stages of the population life. Moreover, the net effect of such a mechanism should depend on the span; a greater span should cause the whole mechanism to put less emphasis on chromosomes' fitness, somehow randomizing the sampling. The scaling itself uses
the power-law scaling:

    f_i' = f_i^k

where k ≈ 0 forces a random search, while k > 1 forces sampling to be allocated to only the most fitted chromosomes. We construct k so that it changes from small to large values over the population age (with largest changes very early and very late), and the magnitude of k should be larger and the speed of change smaller for problems with lower span. The following equation satisfies these goals:

    k = (s*/s_0)^{p_1} · tan( (t/(T + 1))^{p_2 · (s_0/s*)^e} · π/2 )

where p_1 is a system parameter determining the influence of the span s on the magnitude of k, and p_2 and e are system parameters determining the speed of change of k; e determines how much the span affects such changes.

4.2 The test case

To experiment with our scaling mechanism we decided to use a single family of functions:

    f(x) = a + Σ_{i=1}^{#elems} sin(x_i) · sin^{2q}(i · x_i^2 / π),    x_k ∈ [0, π], k = 1, ..., #elems

which is a sine-modulating (#elems)-dimensional function of components with non-linearly increasing frequency. This function, given appropriate parameters a and q, could simulate functions of totally different characteristics (as mentioned in Section 3.1):

A: a = 5.0, q = 1. This function, even though non-trivial, exhibits rather nice characteristics: its average span s_0 ≈ 0.1 matches the most desired s*.

B: a = 0.1, q = 250. This function is very difficult to analyse numerically due to its high non-smoothness. Such highly undesired characteristics are captured in its s_0 ≈ 1.0. Because of such characteristics, any numerical methods, including GA, will tend toward false optima.

C: a = 1000, q = 1. This function is a constant transformation of the A function. Such a shift causes the average span to be decreased dramatically to s_0 ≈ 0.001.

Note that f(x) > 0 for a > 0. In order to visualize better such characteristics, cross-sections of these three functions are given in Fig. 3. For these particular experiments we used #elems = 10. Initial s_0 for these three functions are given in Fig. 2 along with the appropriate behavior of the k exponent.

4.3 Experiments and results

To judge the quality of our non-uniform power scaling we tried to maximize the three functions both with the mechanism on and off. We decided on two comparative measures of the GA's performance. First, accuracy is defined as the value of the best found chromosome, relative to the value of the global optimum:

    (f_best − a) / (f_global − a)

We subtract the a parameter so that we can easily compare the results of different functions.

Second, imprecision is defined as the standard deviation of the optimal vector found. This measure evaluates the closeness of individual components selected in the found optimal chromosome as follows:

    sqrt( Σ_{i=1}^{#elems} index_i^2 / (#elems − 1) )

where index_i is the index of the ith component of the found optimal chromosome (each dimension has its peaks ordered from 0 to the number of dimensions minus 1 according to the modulated values of its peaks). For example, imprecision = 0 means the generated chromosome correctly selected the best modulated values along all dimensions; however, it does not have to generate the best function value due to local imperfectness associated with finite iteration time.
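The test family above is fully specified, so it can be written down directly; the following is a minimal sketch in which the function and parameter names are our own choices, not the paper's:

```python
import math

def f(x, a, q):
    # Test family of Section 4.2:
    #   f(x) = a + sum_i sin(x_i) * sin(i * x_i^2 / pi) ** (2q),
    # with each x_i in [0, pi] and i running from 1 to #elems.
    assert all(0.0 <= xi <= math.pi for xi in x)
    return a + sum(
        math.sin(xi) * math.sin(i * xi * xi / math.pi) ** (2 * q)
        for i, xi in enumerate(x, start=1)
    )

# Functions A, B and C differ only in the (a, q) parameters:
def fA(x): return f(x, a=5.0, q=1)      # s0 ~ 0.1
def fB(x): return f(x, a=0.1, q=250)    # s0 ~ 1.0, highly non-smooth
def fC(x): return f(x, a=1000.0, q=1)   # constant shift of A, s0 ~ 0.001
```

Since sin(x_i) is non-negative on [0, π] and the modulating factor is raised to an even power, every summand is non-negative; this is why f(x) > 0 whenever a > 0, as noted above.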

Fig. 2. k values for functions A (s_0 = 0.1), B (s_0 = 1.0), and C (s_0 = 0.001), plotted over the population age t up to T; co = 0.1, p_1 = 0.05, p_2 = 0.1.
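The span measure and the power-law scaling step can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the schedule turning the population age t into the exponent k is omitted (it is given by the equation above), so k is passed in directly, and the function names are ours.

```python
import math

def span(fitnesses):
    # Span s_t of Section 4.1: sample standard deviation of the
    # (non-negative) evaluations divided by their sample mean.
    n = len(fitnesses)
    mean = sum(fitnesses) / n
    var = sum((fi - mean) ** 2 for fi in fitnesses) / (n - 1)
    return math.sqrt(var) / mean

def power_scaled_probs(fitnesses, k):
    # Power-law scaling f' = f**k: k near 0 makes selection nearly uniform
    # (random search), while k > 1 concentrates selection on the fittest.
    scaled = [fi ** k for fi in fitnesses]
    total = sum(scaled)
    return [s / total for s in scaled]

fits = [1.0, 2.0, 4.0, 8.0]
early = power_scaled_probs(fits, k=0.01)   # almost uniform selection
late = power_scaled_probs(fits, k=4.0)     # the best chromosome dominates

# The maximal span sqrt(n) is reached when all but one evaluation are zero:
assert span([0.0, 0.0, 0.0, 1.0]) == math.sqrt(4)
```

The final assertion illustrates the max(s_t) = √n remark of Section 4.1 for a population of four chromosomes.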

Fig. 3. Cross-sections of functions A, B, and C along the x_1 = ... = x_10 plane.
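The runs described next code each variable with the binary decoding of Section 2.3; a minimal sketch for one 30-bit variable on [0, π] (the function name is ours):

```python
import math

BITS = 30  # bits per variable, as in the runs described here

def decode(bits, lo=0.0, hi=math.pi):
    # Decoding of Section 2.3: x = lo + decimal(bits) * (hi - lo) / (2^m - 1).
    return lo + int(bits, 2) * (hi - lo) / (2 ** len(bits) - 1)

# The resolution of the 30-bit coding of [0, pi] is pi / (2^30 - 1),
# roughly 3e-9, matching the precision quoted for these experiments.
resolution = math.pi / (2 ** BITS - 1)
lowest = decode("0" * BITS)   # left endpoint, 0.0
highest = decode("1" * BITS)  # right endpoint, pi
```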

We used a traditional binary representation with 30 bits per variable; therefore, a chromosome was a vector of 300 binary bits for #elems = 10. Moreover, the 30 bits gave us a precision of Δx = π/2^30 ≈ 3 × 10^-9. This error propagated to the function error

Δf(x) = Δx Σ_{i=1}^{#elems} |∂f/∂x_i|

which, very pessimistically, can be approximated to

Δf(x) = Δx Σ_{i=1}^{#elems} (1 + 4qi) = Δx(10 + 220q)

This, in turn, translates to about 7 × 10^-7 for functions A and C, and about 1.6 × 10^-4 for function B.

During the runs we used one-bit mutations (0.001 rate on bits), single-point crossover (0.2 rate on chromosomes), and inversion (0.02 rate on chromosomes). The results are summarized in Table 1; they represent an average of 25 independent runs, each with population size of 40 and iterated 5000 times.

As expected, function B turned out to be the most difficult for the GA; its high span S0 ≈ 1.0 caused many faulty local optima to be included in the solutions. This is also the case where our scaling mechanism gave the best improvements, both in terms of faulty convergence and of the absolute magnitude of solution vectors. The most average function A also benefited from the mechanism in terms of both measures; this function has a usual span, but is still highly multimodal, difficult for any method. The smallest influence was observed on function C, the one with a very small span; such characteristics prohibited faulty peak selections in a natural way. The increased selective pressure in older populations, generated by higher k-values, slightly improved accuracy, while preserving high precision.

We have yet to conduct more systematic testing in order to optimize the various parameters used in our scaling

Table 1. Output summary for the three test functions

                                            Accuracy                 Imprecision
            The absolute                GA without   GA with    GA without   GA with
Function    optimum       f_global - a  scaling      scaling    scaling      scaling

A           14.70413      9.70413       98.94        99.62      0.365        0.298
B           9.75332       9.65332       92.16        94.35      1.528        1.194
C           1009.70413    9.70413       99.42        99.51      0.258        0.257
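The quoted error magnitudes follow directly from the pessimistic bound Δf(x) = Δx(10 + 220q) with Δx = π/2^30. A quick numeric check (the function and variable names are ours):

```python
import math

def quantisation_error_bound(q, n_elems=10, bits=30):
    # Delta_x = pi / 2**bits;  sum_{i=1..n} (1 + 4*q*i) = n + 2*q*n*(n+1),
    # which is 10 + 220*q for n = 10.
    dx = math.pi / 2 ** bits
    return dx * (n_elems + 2 * q * n_elems * (n_elems + 1))

print(quantisation_error_bound(1))    # about 7e-7 (functions A and C, q = 1)
print(quantisation_error_bound(250))  # about 1.6e-4 (function B, q = 250)
```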

mechanism. Nevertheless, these results indicate its usefulness as an automatic problem analyser in cases where the user is not expected to perform such an analysis.

As to the increased computational complexity of calculating the k parameter, this was rather an insignificant change for two reasons. First, the span s0 capturing the characteristics of the optimized function was calculated only once, at iteration t = 0. Moreover, to increase the quality of this measure approximated by the final sampling, it was performed without the population size restriction; the number of such samples was set to 200. Second, the formula can be partially rewritten to account for the constant and incremental parts of it.

5. Fine local tuning

To improve the fine local tuning capabilities of a GA, which is a must for high-precision problems, we designed a special mutation operator whose performance is quite different from the traditional one. Recall that a traditional mutation changes one bit of a chromosome at a time; therefore, such a change uses only local knowledge: only the bit undergoing mutation is known. Such a bit, if located in the left portion of a sequence coding a variable, is very significant to the absolute magnitude of the mutation effect on the variable. On the other hand, bits far to the right of such a sequence have a much smaller influence when mutated. We decided to use such positional global knowledge in the following way: as the population ages, bits located further to the right of each sequence coding one variable have an increased probability of being mutated, while those on the left have a smaller probability. In other words, such a mutation causes global search of the search space at the beginning of the iterative process, but an increasingly local exploitation later on. We call this a non-uniform mutation.

Here we actually selected to use a floating point rather than a binary representation, for two reasons. First, the floating point representation is much more natural for implementing the non-uniform mutation because of the equivalency of problem and representation distances. Second, because of the fast propagation of errors in the iterative definitions of states (see the test case in Section 5.2), it would be necessary to use rather more than 30 bits per control state. This, in the presence of 45 coded variables, would create chromosomes over 2000 bits in length, leaving little hope of a reasonable performance. A floating point representation requires appropriately different operators; the interested reader should refer to Janikow and Michalewicz (1990).

5.1 The non-uniform operator

The non-uniform mutation operator was introduced in two of our earlier papers (see Michalewicz and Janikow, 1990 and Janikow and Michalewicz, 1990). As mentioned, this is the operator responsible for the fine tuning capabilities in a numerical search space. It is defined as follows: if s_v^t = (v1, ..., vm) is a chromosome and the element vk is selected for mutation (the domain of vk is [lk, uk]), the result is a vector s_v^(t+1) = (v1, ..., v'k, ..., vm), with

v'k = vk + Δ(t, uk - vk)   if a random digit is 0
v'k = vk - Δ(t, vk - lk)   if a random digit is 1

The function Δ(t, y) returns a value in the range [0, y] such that the probability of Δ(t, y) being close to 0 increases as t increases. This property causes this operator to search the space uniformly initially (when t is small), and very locally at later stages. We have used the following function:

Δ(t, y) = y(1 - r^((1 - t/T)^b))

where r is a random number from [0, 1], T is the maximal generation number, and b is a system parameter determining the degree of non-uniformity.* Figure 4 displays the value of Δ for two selected times; this picture clearly indicates the behavior of the operator.

5.2 The test case

To indicate the usefulness of the non-uniform mutation operator we selected a dynamic control problem without constraints (for further results, especially for a successful comparison with a standard numerical optimization package, see Janikow and Michalewicz, 1990). The problem is a one-dimensional linear-quadratic model:

min  q x_N^2 + Σ_{k=0}^{N-1} (s x_k^2 + r u_k^2)

subject to

x_{k+1} = a x_k + b u_k,   k = 0, 1, ..., N - 1

where x_0 is a given initial state, a, b, q, s, r are given constants, x_k ∈ R is a state, and u ∈ R^N is the sought control vector. The optimal value can be analytically expressed as

J* = K_0 x_0^2

where K_k is the solution of the Riccati equation

K_k = s + r a^2 K_{k+1} / (r + b^2 K_{k+1})

and K_N = q.

*For the binary representation we were using v'k = mutate(vk, V(t, mk)), where mutate(vk, pos) means mutate the value of variable k on bit pos (0 is the least significant), mk is the binary length of variable k, and

V(t, mk) = ⌊Δ(t, mk)⌋   if a random digit is 0
V(t, mk) = ⌈Δ(t, mk)⌉   if a random digit is 1

with the b parameter of Δ adjusted appropriately if similar behavior is desired.
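The operator is easy to state in code. A minimal Python sketch (the function and parameter names are ours):

```python
import random

def delta(t, y, T, b=2.0):
    """Delta(t, y) = y * (1 - r**((1 - t/T)**b)), with r uniform on [0, 1].
    Returns a value in [0, y] that tends toward 0 as t approaches T."""
    r = random.random()
    return y * (1.0 - r ** ((1.0 - t / T) ** b))

def non_uniform_mutate(v, lo, hi, t, T, b=2.0):
    # Move toward the upper or lower domain bound with equal probability;
    # the result always stays inside [lo, hi].
    if random.getrandbits(1) == 0:
        return v + delta(t, hi - v, T, b)
    return v - delta(t, v - lo, T, b)
```

Early in the run (small t) the step is close to uniform over the remaining distance to the bound; at t = T the step collapses to zero, which is what gives the operator its fine local tuning behavior.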
[Figure omitted: Δ(t, y) plotted for t/T = 0.50 and t/T = 0.90, both with b = 2.]
Fig. 4. Δ(t, y) for two selected times.

5.3 Experiments and results

The exact representation is as follows: for a problem of m variables, each chromosome, representing a permissible solution, is represented as a vector of m floating point numbers s_v^t = (v1, ..., vm) (when the generation number t and the chromosome number i are not important, we simply write s). The precision of such a representation is fixed for a given machine, and based on the precision of the floating point (or double, if needed) type.

We experimented with ten different cases, as defined in Table 2. For each, we repeated three separate runs of 40 000 generations and reported the best (with respect to the final value) in Table 3. The same table also gives some intermediate results after a selected number of generations; for example, the values in the 10 000 column indicate the partial results after 10 000 generations, while running 40 000. It is important to note that such values are somewhat worse than those obtained while running only 10 000 generations, due to the nature of the non-uniform opera-

Table 2. The ten test cases for the dynamic control problem

Case   N    x0    s     r     q     a     b

1      45   100   1     1     1     1     1
2      45   100   10    1     1     1     1
3      45   100   1000  1     1     1     1
4      45   100   1     10    1     1     1
5      45   100   1     1000  1     1     1
6      45   100   1     1     0     1     1
7      45   100   1     1     1000  1     1
8      45   100   1     1     1     0.01  1
9      45   100   1     1     1     1     0.01
10     45   100   1     1     1     1     100

Table 3. Results of the modified genetic algorithm on the dynamic control problem

       Generations
Case   1          100       1000      10 000    20 000    30 000    40 000    Factor

1      17807.4    3.27985   1.74689   1.61866   1.61825   1.61804   1.61803   10^4
2      13670.4    5.33177   1.45968   1.11349   1.09205   1.09165   1.09163   10^5
3      17023.8    2.87485   1.07974   1.00968   1.00126   1.00104   1.00103   10^7
4      15077.3    8.64310   3.75530   3.71846   3.70812   3.70165   3.70160   10^4
5      5956.43    12.2559   2.89769   2.87727   2.87646   2.87570   2.87569   10^5
6      16657.7    5.07047   2.05314   1.61869   1.61830   1.61806   1.61806   10^4
7      288.0666   19.2684   7.02566   1.63464   1.62412   1.61888   1.61882   10^4
8      116.982    67.1758   1.92764   1.00009   1.00005   1.00005   1.00005   10^4
9      7.18263    4.42849   4.37093   4.31504   4.31024   4.31004   4.31004   10^5
10     9870352    138132    16096.0   1.38244   1.00041   1.00010   1.00010   10^4

Table 4. Comparison of solutions for the linear-quadratic dynamic control problem

                            GA with non-uniform mutation      GA without non-uniform mutation
Case   Exact solution       value             D (%)           value             D (%)

1      16 180.3399          16 180.3939       0.000           16 234.3233       0.334
2      109 160.7978         109 163.0278      0.000           113 807.2444      4.257
3      10 009 990.0200      10 010 391.3989   0.004           10 128 951.4515   1.188
4      37 015.6212          37 016.0806       0.001           37 035.5652       0.054
5      287 569.3725         287 569.7389      0.000           298 214.4587      3.702
6      16 180.3399          16 180.6166       0.002           16 238.2135       0.358
7      16 180.3399          16 188.2394       0.048           17 278.8502       6.786
8      10 000.5000          10 000.5000       0.000           10 000.5000       0.000
9      431 004.0987         431 004.4092      0.000           431 610.9771      0.141
10     10 000.9999          10 001.0045       0.000           10 439.2695       4.380
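The exact solutions in Table 4 can be reproduced directly from the Riccati recursion of Section 5.2. A sketch (the function name is ours):

```python
def lq_optimal_value(N, x0, s, r, q, a, b):
    """J* = K_0 * x0**2, where K_N = q and
    K_k = s + r * a**2 * K_{k+1} / (r + b**2 * K_{k+1})."""
    K = q
    for _ in range(N):  # iterate the recursion backwards from K_N to K_0
        K = s + r * a * a * K / (r + b * b * K)
    return K * x0 * x0

# Case 1 (s = r = q = a = b = 1): K_0 converges to the golden ratio,
# giving J* close to 16 180.3399, the exact value listed in Table 4.
print(lq_optimal_value(45, 100, 1, 1, 1, 1, 1))
```

Case 8 (a = 0.01) likewise yields approximately 10 000.5, matching the table.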

Table 5. Actual improvements on case 1

                                 Generations
GA                               1            10        100      1000     10 000    40 000    CPU time

with non-uniform mutation        17 807 400   293 564   32 798   17 468   16 186    16 180    719.4 sec*
without non-uniform mutation     17 243 100   415 623   59 872   17 931   17 445    16 234    719.1 sec*

*Times are similar since the GA without the special mutation performed other operations at a higher rate in order to achieve the same rate of breeding. For a comparison, a GA with a binary representation of 30 bits per variable used 12 622 sec CPU for the same run (all runs reported from a DECstation 3100).

[Figure omitted: two panels plotting the number of improvements against t, with and without the non-uniform mutation; T = 40 000, t's increments of 400.]
Fig. 5. Number of improvements on case 1.

tor. For all cases the population size was fixed at 100. The domain of all variables was set to [-200, 200].

It is interesting to compare these results with the exact solutions as well as those obtained from another GA: exactly the same one, but without the non-uniform mutation on. Table 4 summarizes the results; columns labeled D indicate the relative percentage errors. The GA using the non-uniform mutation clearly outperforms the other one with respect to the accuracy of the found optimal solution; while the enhanced GA was rarely in error by more than a few thousandths of 1%, the other one was hardly ever better than 1% out. Moreover, it also converged much faster to that solution (see Table 5 for case 1). In other words it has an additional advantage in time-constrained situations.

Figure 5 illustrates the non-uniform mutation's effect on the evolutionary process. The new mutation causes quite an increase in the number of improvements observed in the population at the end of the population's life. Moreover, the smaller number of such improvements prior to that time, together with an actually faster convergence, clearly indicates a better overall search.

In other publications (see Michalewicz et al, 1990 and Janikow and Michalewicz, 1990) we present more experiments with other dynamic control problems: these prove the superiority of genetic approaches over commercial optimization packages as well.

6. Constraints handling

The proposed methodology for handling linear constraints by genetic algorithms (see Michalewicz and Janikow, 1991) combines some previous ideas, but in a totally new way and context. The main idea lies in a careful elimination of the equation constraints, and in designing dynamic operators preserving the inequality constraints. Both types of constraint can be processed very efficiently if they contain only linear equations; such constrained problems include many of the interesting optimization cases.

Linear constraints are of two types: equalities and inequalities (the latter include all variables' domains). We first eliminate all the equalities, reducing the number of variables and modifying the inequalities; reducing the set of variables both decreases the length of the representation and reduces the search space. Left with only linear inequalities, we deal with a convex search space which, in the presence of dynamic and closed operators, can be searched without explicitly considering the constraints. The problem then becomes one of designing such closed operators. We achieve this by defining them as being context-dependent, that is, dynamically adjusting to the current context.

The set of equalities can be eliminated (on a one by one basis) as follows: first transform an equality so that one of its variables is expressed in terms of the others, and then substitute all occurrences of this variable by such an expression, in all remaining equalities and all inequalities. A detailed description of this approach, along with theoretical foundations, can be found in Michalewicz and Janikow (1991). Here, we provide only an example.

Suppose that we wish to minimize a function of six variables:

f(x1, x2, x3, x4, x5, x6)

subject to the following constraints:

x1 + x2 + x3 = 5
x4 + x5 + x6 = 10
x1 + x4 = 3
x2 + x5 = 4
x1 >= 0, x2 >= 0, x3 >= 0, x4 >= 0, x5 >= 0, x6 >= 0

We can take advantage of the presence of four independent equations and express four variables as functions of the remaining two:

x3 = 5 - x1 - x2
x4 = 3 - x1
x5 = 4 - x2
x6 = 3 + x1 + x2

Thus, we have reduced the original problem to the optimization problem of two variables x1 and x2:

g(x1, x2) = f(x1, x2, (5 - x1 - x2), (3 - x1), (4 - x2), (3 + x1 + x2))

subject to the following constraints (inequalities only):

x1 >= 0, x2 >= 0
5 - x1 - x2 >= 0
3 - x1 >= 0
4 - x2 >= 0
3 + x1 + x2 >= 0

These inequalities can be further reduced to:

0 <= x1 <= 3
0 <= x2 <= 4
x1 + x2 <= 5

Now, given a chromosome (a point within the constrained solution space), any operator must produce a new feasible solution (this is what we mean by the closedness of operators). This can be achieved by working within the current context; e.g. if x = (1.8, 2.3) is to be mutated on x1, then the new value of this variable must be taken from the range [0, 5 - x2] = [0, 2.7]. This signifies another fact: it is again much easier to define such operators on a floating point representation, as it is not enough to deal locally with one bit at a time.

6.1 The dynamic operators

The genetic operators are dynamic, i.e. the value of a vector component depends on the remaining values of the vector.

The value of the ith component of a feasible solution s = (v1, ..., vm) is always in some (dynamic) range [l, u]; the bounds l and u depend on the other values v1, ..., v_{i-1}, v_{i+1}, ..., vm of the vector, and on the set of inequalities. We say that the ith component (ith gene) of the vector s is movable if l < u.

Before we describe the operators, we present two important characteristics of convex spaces (due to the linearity

of the constraints, the solution space is always a convex space S), which play an essential role in the definition of these operators:

(1) For any two points s1 and s2 in the solution space S, the linear combination a*s1 + (1 - a)*s2, where a ∈ [0, 1], is a point in S.
(2) For every point s0 ∈ S and any line p such that s0 ∈ p, p intersects the boundaries of S at precisely two points, say l_p^s0 and u_p^s0.

Since we are only interested in lines parallel to each axis, to simplify the notation we denote by l(i) and u(i) the ith components of the vectors l_p^s and u_p^s respectively, where the line p is parallel to axis i. We assume further that l(i) <= u(i).

Because of intuitional similarities, we cluster the operators into the standard two classes: mutation and crossover. The proposed crossover and mutation operators use the two properties to ensure that the offspring of a point in the convex solution space S belongs to S. For a detailed discussion on these topics, the reader is referred to Michalewicz and Janikow (1990).

6.1.1 Mutation group

Mutations are quite different from the traditional mutation operator with respect to both the actual mutation (a gene, being a floating point number, is mutated in a dynamic range) and the selection of an applicable gene. A traditional mutation is performed on static domains for all genes. In such a case the order of possible mutations on a chromosome does not influence the outcome. This is no longer true with the dynamic domains.

To solve the problem we proceed as follows: we randomly select p_um × pop_size chromosomes for uniform mutation, p_bm × pop_size chromosomes for boundary mutation, and p_nm × pop_size chromosomes for non-uniform mutation (all with possible repetitions), where p_um, p_bm, and p_nm are the probabilities of the three mutations defined below. Then, we perform these mutations in a random fashion on the selected chromosomes:

(1) Uniform mutation. For this mutation we select a random gene k (from the set of movable genes of the given chromosome s determined by its current context). If s_v^t = (v1, ..., vm) is a chromosome and the kth component is the selected gene, the result is a vector s_v^(t+1) = (v1, ..., v'k, ..., vm), where v'k is a random value (uniform probability distribution) from the range [l(k), u(k)]. The dynamic values l(k) and u(k) are easily calculated from the set of constraints (inequalities);

(2) Boundary mutation. This is a variation of the uniform mutation with v'k being either l(k) or u(k), each with equal probability;

(3) (Dynamic) non-uniform mutation. This is a version of the operator defined in Section 5.1; the only change relates to the dynamic domains:

v'k = vk + Δ(t, u(k) - vk)   if a random digit is 0
v'k = vk - Δ(t, vk - l(k))   if a random digit is 1

6.1.2 Crossover group

Chromosomes are randomly selected in pairs for application of the crossover operators according to appropriate probabilities. The operators defined here are versions of the ones outlined in Janikow and Michalewicz (1990) for floating point crossover, but adjusted to our dynamic domains:

(1) Simple crossover. This is defined as follows: if s_v^t = (v1, ..., vm) and s_w^t = (w1, ..., wm) are crossed after the kth position, the resulting offspring are s_v^(t+1) = (v1, ..., vk, w_{k+1}, ..., wm) and s_w^(t+1) = (w1, ..., wk, v_{k+1}, ..., vm). Note that the only permissible split points are between individual floating point numbers (using a float representation it is impossible to split anywhere else).

However, such an operator may produce offspring outside the convex solution space S. To avoid this problem, we use the property of convex spaces which says that there exists a ∈ [0, 1] such that

s_v^(t+1) = (v1, ..., vk, w_{k+1}*a + v_{k+1}*(1 - a), ..., wm*a + vm*(1 - a)) ∈ S

and

s_w^(t+1) = (w1, ..., wk, v_{k+1}*a + w_{k+1}*(1 - a), ..., vm*a + wm*(1 - a)) ∈ S

The only question still to be answered is how to find the largest a to obtain the greatest possible information exchange: due to the real interval, we cannot perform an extensive search. We implemented a binary search (to some depth only, for efficiency). Then, a takes the largest appropriate value found, or 0 if no value satisfied the constraints. The necessity for such actions is small in general and decreases rapidly over the life of the population. Note that the value of a is determined separately for each single arithmetical crossover and each gene.

(2) Single arithmetical crossover. This is defined as follows: if s_v^t = (v1, ..., vm) and s_w^t = (w1, ..., wm) are to be crossed, the resulting offspring are s_v^(t+1) = (v1, ..., v'k, ..., vm) and s_w^(t+1) = (w1, ..., w'k, ..., wm), where k ∈ [1, m], v'k = a*wk + (1 - a)*vk, and w'k = a*vk + (1 - a)*wk. Here, a is a dynamic parameter calculated in the given context (vectors s_v, s_w) so that the operator is closed (points s_v^(t+1) and s_w^(t+1) are in the convex constrained space S). Actually, a is a random

choice from the following range:

a ∈ [0, 0]   if vk = wk

a ∈ [ max( (u(k)^sv - vk)/(wk - vk), (l(k)^sw - wk)/(vk - wk) ),
      min( (l(k)^sv - vk)/(wk - vk), (u(k)^sw - wk)/(vk - wk) ) ]   if vk > wk

a ∈ [ max( (l(k)^sv - vk)/(wk - vk), (u(k)^sw - wk)/(vk - wk) ),
      min( (u(k)^sv - vk)/(wk - vk), (l(k)^sw - wk)/(vk - wk) ) ]   if vk < wk

where l(k)^sv and u(k)^sv denote the dynamic bounds of gene k computed for chromosome sv (and similarly for sw).

To increase the applicability of this operator (to ensure that a will be non-zero; a zero value actually nullifies the result of the operator) it is wise to select the applicable gene as a random choice from the intersection of the movable genes of both chromosomes. Note again that the value of a is determined separately for each single arithmetical crossover and each gene.

(3) Whole arithmetical crossover. This is defined as a linear combination of two vectors: if s_v^t and s_w^t are to be crossed, the resulting offspring are s_v^(t+1) = a*s_w^t + (1 - a)*s_v^t and s_w^(t+1) = a*s_v^t + (1 - a)*s_w^t. This operator uses a simpler system parameter a ∈ [0, 1] as it always guarantees closedness (according to characteristic (1) of this section).

6.2 The test case

We selected the following problem of 49 variables: minimize

f(x) = Σ_{i=1}^{49} g(xi)

where

g(xi) = 0      if 0 <= xi <= 2
g(xi) = ci     if 2 < xi <= 4
g(xi) = 2ci    if 4 < xi <= 6
g(xi) = 3ci    if 6 < xi <= 8
g(xi) = 4ci    if 8 < xi <= 10
g(xi) = 5ci    if 10 < xi

with parameters ci as given in Table 6 and subject to the following equality constraints:

x1 + x2 + x3 + x4 + x5 + x6 + x7 = 27
x8 + x9 + x10 + x11 + x12 + x13 + x14 = 28
x15 + x16 + x17 + x18 + x19 + x20 + x21 = 25
x22 + x23 + x24 + x25 + x26 + x27 + x28 = 20
x29 + x30 + x31 + x32 + x33 + x34 + x35 = 20
x36 + x37 + x38 + x39 + x40 + x41 + x42 = 20
x43 + x44 + x45 + x46 + x47 + x48 + x49 = 20
x1 + x8 + x15 + x22 + x29 + x36 + x43 = 20
x2 + x9 + x16 + x23 + x30 + x37 + x44 = 20
x3 + x10 + x17 + x24 + x31 + x38 + x45 = 20
x4 + x11 + x18 + x25 + x32 + x39 + x46 = 23
x5 + x12 + x19 + x26 + x33 + x40 + x47 = 26
x6 + x13 + x20 + x27 + x34 + x41 + x48 = 25
x7 + x14 + x21 + x28 + x35 + x42 + x49 = 26

Given that, for all i, xi >= 0, we also have additional implicit domain inequalities, given in Table 7.

The above is actually an example of a transportation problem with a reasonable stepwise cost function; for more on that the reader is referred to Michalewicz, Vignaux and Hobbs (1991) and Vignaux and Michalewicz (1990).
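The stepwise cost translates directly into code. In this sketch of ours, the handling of interval endpoints is an assumption (it does not affect non-boundary values):

```python
def g(x, c):
    """Stepwise transportation cost: 0 on [0, 2], then c, 2c, 3c, 4c on
    successive width-2 intervals, and 5c beyond 10.  Which endpoint belongs
    to which step is our assumption."""
    if x <= 2:
        return 0
    if x <= 4:
        return c
    if x <= 6:
        return 2 * c
    if x <= 8:
        return 3 * c
    if x <= 10:
        return 4 * c
    return 5 * c

def f(x, costs):
    # Objective of the 49-variable problem: sum of per-variable step costs.
    return sum(g(xi, ci) for xi, ci in zip(x, costs))
```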

Table 6. Parameters ci

c1 = 0       c2 = 21      c3 = 50      c4 = 62      c5 = 93      c6 = 77      c7 = 1000
c8 = 21      c9 = 0       c10 = 17     c11 = 54     c12 = 67     c13 = 1000   c14 = 48
c15 = 50     c16 = 17     c17 = 0      c18 = 60     c19 = 98     c20 = 67     c21 = 25
c22 = 62     c23 = 54     c24 = 60     c25 = 0      c26 = 27     c27 = 1000   c28 = 38
c29 = 93     c30 = 67     c31 = 98     c32 = 27     c33 = 0      c34 = 47     c35 = 42
c36 = 77     c37 = 1000   c38 = 67     c39 = 1000   c40 = 47     c41 = 0      c42 = 35
c43 = 1000   c44 = 48     c45 = 25     c46 = 38     c47 = 42     c48 = 35     c49 = 0

Table 7. Upper bounds for variables x i

xj <- 20 x2 -< 20 x3 < 20 x4 -< 23 xs < 26 x 6 -< 25 x 7 -< 26


x 8 -< 20 x9 <- 20 Xlo < 20 xl~ -< 23 Xa2 < 26 x13 -< 25 x14 < 26
x~5 -< 20 x~6 < 20 x17 < 20 x18 -< 23 x~9 < 25 X2o < 25 x21 -< 25
X22 <~ 20 X23 <-- 20 XZ4 < 20 X25 <-- 20 X26 < 20 XZ7 < 20 X28 < 20
X29 ~ 20 X30 < 20 X31 < 20 X32 --< 20 X33 <-~20 X34 <-- 20 X3S --< 20
X36 ~-~20 X37 --< 20 X38 --<20 X39 --< 20 X40 <-- 20 X41 <~ 20 X42 --< 20
X43 ~---20 X44 <-- 20 X45 < 20 X46 <-- 20 X47 --<20 X48 <-- 20 X49 < 20
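The machinery of Section 6, equality elimination and dynamic (movable) gene ranges, can be illustrated on the six-variable example of that section. This is our own sketch, and the function names are ours; it works because all the constraints are linear:

```python
def expand(x1, x2):
    """Recover all six variables of the Section 6 example from the two
    free ones, using the four eliminated equalities."""
    return (x1, x2, 5 - x1 - x2, 3 - x1, 4 - x2, 3 + x1 + x2)

def feasible_original(x):
    # Check the four equalities and all six nonnegativity constraints.
    x1, x2, x3, x4, x5, x6 = x
    eq = (abs(x1 + x2 + x3 - 5) < 1e-9 and abs(x4 + x5 + x6 - 10) < 1e-9
          and abs(x1 + x4 - 3) < 1e-9 and abs(x2 + x5 - 4) < 1e-9)
    return eq and all(v >= 0 for v in x)

def feasible_reduced(x1, x2):
    # The reduced inequality system from Section 6.
    return 0 <= x1 <= 3 and 0 <= x2 <= 4 and x1 + x2 <= 5

def dynamic_range_x1(x2):
    # Movable range of x1 in the reduced problem:
    # 0 <= x1 <= 3 and x1 + x2 <= 5 give [0, min(3, 5 - x2)].
    return 0.0, min(3.0, 5.0 - x2)
```

For the chromosome x = (1.8, 2.3) mutated on x1, `dynamic_range_x1(2.3)` returns the range [0, 2.7] quoted in Section 6.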

There are 13 independent and one dependent equations here; therefore, we eliminate thirteen variables: x1, ..., x8, x15, x22, x29, x36, x43. All remaining variables are renamed y1, ..., y36, preserving order, i.e. y1 = x9, y2 = x10, ..., y6 = x14, y7 = x16, ..., y36 = x49. Each of these variables has to satisfy four two-sided inequalities, which result from the initial domains and our transformations. Now, each chromosome is a float vector (y1, ..., y36).

6.3 Experiments and results

For comparative experiments we planned on using a standard GA approach to constraints by penalties (see e.g. Richardson et al, 1989) and a version (the student version) of GAMS, a package for the construction and solution of large and complex mathematical programming models (Brooke et al, 1988); we used the MINOS version of the optimizer. However, we did not succeed in the former: the major issue in using the penalty functions approach is assigning weights to the constraints; these weights play the role of penalties if a potential solution does not satisfy them. In experiments with the above problem the evaluation function Eval was composed of the optimization function f and the penalty P:

Eval(x) = f(x) + P

For our experiments we used a suggestion (see Richardson et al, 1989) to start with relaxed penalties and to tighten them as the run progresses. We used

P = k (t/T)^p f_avg Σ_{i=1}^{14} d_i

where f_avg is the average fitness of the population at the given generation t, k and p are parameters, T is the maximum number of generations, and d_i returns the 'degree of violation' of the ith constraint. For example, for a constraint

Σ_{i∈W} xi = val,   W ⊆ {1, ..., 49},

and a chromosome (v1, ..., v49), the penalty (degree of constraint violation) d was

d = |Σ_{i∈W} vi - val|.

We experimented with various values of p (close to 1), k (close to 1/14, where 14 is the total number of equality constraints; the static domain constraints were naturally satisfied by a proper representation), and T = 8000. However, this method did not lead to feasible solutions: in over 1200 runs (with different random number generator seeds and various values for parameters k and p) the best chromosomes (after 8000 generations) violated significantly at least three constraints. For example, very often the algorithm converged to a solution where the numbers on one diagonal were equal to 20 and all others were zeros:

x1 = x9 = x17 = x25 = x33 = x41 = x49 = 20
xi = 0 for other values of i

As to GAMS, which only works with continuous functions, we reimplemented the problem using arctangent functions to approximate each of the five steps. A parameter PA was used to control the 'tightness' of the fit:

g(xi) = ci [ arctan(PA(xi - 2))/π + 1/2
           + arctan(PA(xi - 4))/π + 1/2
           + arctan(PA(xi - 6))/π + 1/2
           + arctan(PA(xi - 8))/π + 1/2
           + arctan(PA(xi - 10))/π + 1/2 ]

This method allowed the system to find a feasible solution, but one much worse than those of our GA (Tables 8 and 9); the GA found an optimal value of 24.15 while the other system found one of 96.00.

The results indicate the usefulness of our method in the presence of many constraints and its superiority over some standard systems on non-trivial problems. Our modified genetic algorithm was run for 8000 iterations, with a population size of 40. The parameters used in all runs are displayed in Table 10. A single run of 8000 iterations took 2:28 CPU sec on a Cray Y-MP. GAMS was run on an Olivetti 386 with a math coprocessor and a single run took 1:20 sec.

7. Conclusions

In this paper we discussed the use of genetic algorithms for numerical optimization problems. In particular, we concentrated on various modifications of classical genetic

Table 8. Solution found by our GA: f(x) = 24.15

x1 = 20.00    x2 = 0.00     x3 = 0.00     x4 = 1.93     x5 = 1.63     x6 = 1.47     x7 = 1.97
x8 = 0.00     x9 = 20.00    x10 = 2.88    x11 = 1.76    x12 = 1.47    x13 = 1.89    x14 = 0.00
x15 = 0.00    x16 = 0.00    x17 = 17.12   x18 = 1.90    x19 = 1.99    x20 = 1.10    x21 = 2.89
x22 = 0.00    x23 = 0.00    x24 = 0.00    x25 = 16.26   x26 = 0.85    x27 = 1.38    x28 = 1.51
x29 = 0.00    x30 = 0.00    x31 = 0.00    x32 = 0.00    x33 = 19.65   x34 = 0.00    x35 = 0.35
x36 = 0.00    x37 = 0.00    x38 = 0.00    x39 = 0.43    x40 = 0.41    x41 = 19.16   x42 = 0.00
x43 = 0.00    x44 = 0.00    x45 = 0.00    x46 = 0.72    x47 = 0.00    x48 = 0.00    x49 = 19.28
X43 0.00 X44 = 0.00 X45 = 0.00 X46 = 0.72 X47 = 0.00 X48 = 0.00 X49 = 19.28

Table 9. Solution found by GAMS: f(x) = 96.00

x1 = 20.00    x2 = 1.29     x3 = 0.95     x4 = 1.58     x5 = 1.52     x6 = 1.58     x7 = 0.08
x8 = 0.00     x9 = 18.71    x10 = 0.39    x11 = 1.59    x12 = 1.58    x13 = 0.12    x14 = 5.61
x15 = 0.00    x16 = 0.00    x17 = 18.66   x18 = 1.56    x19 = 1.47    x20 = 1.59    x21 = 1.72
x22 = 0.00    x23 = 0.00    x24 = 0.00    x25 = 18.27   x26 = 1.25    x27 = 0.00    x28 = 0.48
x29 = 0.00    x30 = 0.00    x31 = 0.00    x32 = 0.00    x33 = 19.47   x34 = 0.53    x35 = 0.00
x36 = 0.00    x37 = 0.00    x38 = 0.00    x39 = 0.00    x40 = 0.00    x41 = 20.00   x42 = 0.00
x43 = 0.00    x44 = 0.00    x45 = 0.00    x46 = 0.00    x47 = 0.71    x48 = 1.18    x49 = 18.11

algorithms to overcome three major problems: handling of constraints, premature convergence, and local fine tuning. The preliminary results of several experiments are more than encouraging and suggest that the methods are very useful. They may lead to the solution of some difficult operations research problems. We are currently in the process of implementing a single system to bring all the above ideas together. When completed, the system will be compared with many software optimization packages on different functions with non-trivial constraints. The system should be able to deal with very complex problems (thousands of variables, hundreds of constraints) since we need to represent only a relatively small population of potential solution vectors and apply new genetic operators to them. Also, further extensions are planned to handle discrete variables (integer, boolean, nominal) and to handle some classes of non-linear constraints.

Table 10. Parameters used for the 7 x 7 transportation problem

Parameter        Value

pop_size         40
prob_mut_um      0.08
prob_mut_bm      0.03
prob_mut_nm      0.07
prob_cross_sc    0.10
prob_cross_sa    0.10
prob_cross_wa    0.10
a                0.25
b                2.0

Key: population size (pop_size), probability of uniform mutation (prob_mut_um), probability of boundary mutation (prob_mut_bm), probability of non-uniform mutation (prob_mut_nm), probability of simple crossover (prob_cross_sc), probability of single arithmetical crossover (prob_cross_sa), probability of whole arithmetical crossover (prob_cross_wa), coefficient a for the whole arithmetical crossover, coefficient b for Δ of the non-uniform mutation.

References

Ackley, D. H. (1987) An empirical study of bit vector function optimization, in Genetic Algorithms and Simulated Annealing, L. Davis (ed.), Pitman, London, pp. 170-204.
Antonisse, J. (1989) A new interpretation of schema notation that overturns the binary encoding constraint, in Proceedings of the Third International Conference on Genetic Algorithms, J. Schaffer (ed.), Morgan Kaufmann, San Mateo, California, pp. 86-91.
Antonisse, H. J. and Keller, K. S. (1987) Genetic operators for high level knowledge representation, in Proceedings of the Second International Conference on Genetic Algorithms, MIT, Cambridge, Mass., J. J. Grefenstette (ed.), Lawrence Erlbaum, Hillsdale, New Jersey.
Bellman, R. (1957) Dynamic Programming, Princeton University Press, Princeton, NJ.
Bethke, A. D. (1980) Genetic algorithms as function optimizers. PhD dissertation, University of Michigan.
Bertsekas, D. P. (1987) Dynamic Programming: Deterministic and Stochastic Models, Prentice Hall, Englewood Cliffs, NJ.
Booker, L. B. (1982) Intelligent behavior as an adaptation to the task environment. PhD dissertation, University of Michigan.
Booker, L. B. (1987) Improving search in genetic algorithms, in Genetic Algorithms and Simulated Annealing, L. Davis (ed.), Pitman, London, pp. 61-73.
Bosworth, J., Foo, N. and Zeigler, B. P. (1972) Comparison of Genetic Algorithms with Conjugate Gradient Methods (CR-2093), National Aeronautics and Space Administration, Washington, DC.
Brindle, A. (1981) Genetic algorithms for function optimization. PhD dissertation, University of Alberta, Edmonton.
Brooke, A., Kendrick, D. and Meeraus, A. (1988) GAMS: A User's Guide, The Scientific Press, Redwood City, California.
Davis, L. (ed.) (1987) Genetic Algorithms and Simulated Annealing, Pitman, London.
DeJong, K. A. (1975) An analysis of the behaviour of a class of genetic adaptive systems. PhD dissertation, University of Michigan.
DeJong, K. A. (1985) Genetic algorithms: a 10 year perspective, in Proceedings of the First International Conference on Genetic
Genetic algorithms f o r numerical optimization 91

Algorithms, Pittsburgh, 24-26 July, J. J. Grefenstette (ed.), Michalewicz, Z. and Janikow, C. (1991) GENOCOP: a genetic
Lawrence Erlbaum, Hillsdale, New Jersey. algorithm for numerical optimization problems with con-
Dhar, V. and Ranganathan, N. (1990) Integer programming vs. straints, to appear in Communications of ACM.
expert systems: an experimental comparison. Communications Michalewicz, Z., Jankowski, A., Vignaux, G. A. (1990) The
of ACM, 33, 323-336. constraints problem in genetic algorithms, in Methodologies
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization for Intelligent Systems: Selected Papers, M. L. Emrich, M. S.
and Machine Learning, Addison-Wesley, Reading, Massachu- Phifer, B. Huber, M. Zemankova and Z. W. Ras (eds),
setts. Proceedings of the Fifth International Symposium on
Grefenstette, J. J. (1985) Proceedings of the First International Methodologies of Intelligent Systems, Knoxville, 25-27 Oc-
Conference on Genetic Algorithms, Pittsburgh, 24-26 July, tober, pp. 142-157.
Lawrence Erlbaum, Hillsdale, New Jersey. Michalewicz, Z., Kazemi, M., Krawczyk, J. and Janikow, C.
Grefenstette, J. J. (1986) Optimization of control parameters for (1990) On dynamic control problem, Proceedings of the 29th
genetic algorithms, IEEE Transactions on Systems, Man, and IEEE Conference on Decision and Control, Honolulu, 5-7
Cybernetics, 16(1), 122-128. December.
Grefenstette, J. J. (1987a) Incorporating problem specific knowl- Neumann, J. von (1966) Theory of Self-Reproducing Automata,
edge into genetic algorithms, in Genetic Algorithms and Burks (ed.), University of Illinois Press.
Simulated Annealing, L. Davis (ed.), Pitman, London, pp. Richardson, J. T., Palmer, M. R., Liepins, G. and Hilliard, M.
42-60. (1989) Some guidelines for genetic algorithms with penalty
Grefenstette, J. J. (1987b) Proceedings of the Second International functions, in Proceedings of the Third International Confer-
Conference on Genetic Algorithms, MIT, Cambridge, Mass., ence on Genetic Algorithms, 4-7 June, J. Schaffer (ed.),
28-31 July, Lawrence Erlbaum, Hillsdale, New Jersey. Morgan Kaufmann, London.
Groves, L., Michalewicz, Z., Elia, P. and Janikow, C. (1990) Sasieni, M., Yaspan, A. and Friedman, L. (1959) Operations
Genetic algorithms for drawing directed graphs, in Proceed- Research Methods and Problems, John Wiley and Sons.
ings of the Fifth International Symposium on Methodologies of Schaffer, J. (1989) Proceedings of the Third International Confer-
Intelligent Systems, Knoxville, 25-27 October, pp. 268-276. ence on Genetic Algorithms, 4-7 June, Morgan Kaufmann,
Holland, J. (1959) A universal computer capable of executing an Redwood City, California.
arbitrary number of sub-programs simultaneously, in Pro- Schaffer, J., Caruana, R., Eshelman, L. and Das, R. (1989), A
ceedings of the 1959 EJCC, pp. 108-113. study of control parameters affecting online performance of
Holland, J. (1975) Adaptation in Natural and Artificial Systems, genetic algorithms for function optimization, in Proceedings
University of Michigan Press, Ann Arbor. of the Third International Conference on Genetic Algorithms,
Janikow, C. and Michalewiez, Z. (1990) Specialized genetic 4-7 June, J. Schaffer (ed.), Morgan Kaufmann, pp. 51-60.
algorithms for numerical optimization problems, in Proceed- Sirag, D. J. and Weisser, P. T. (1987) Toward a unified thermo-
ings of the International Conference on Toolsfor AI, Washing- dynamic genetic operator, in Proceedings of the Second
ton, 6-9 November, pp. 798-804. International Conference on Genetic Algorithms, 28-31 July,
Janikow, C. and Michalewicz, Z. (199l) Convergence problem Lawrence Erlbaum, Hillsdale, New Jersey.
in genetic algorithms, submitted for publication. Taha, H. A. (1982) Operations Research: An Introduction, Collier
Kirkpatrick, S., Gellatt, C. and Vecchi, M. (1983) Optimization Macmillan, London.
by simulated annealing, Science, 220(4598), 671. Ulam, S. (1949) On the Monte Carlo Method, Proceedings of the
Laarhoven, P. J. M. van and Aarts, E. H. L. (1987) Simulated 2rid Symposium on Large Scale Digital Calculating Machin-
Annealing: Theory and Applications, D. Reidel, Dordrecht. ery, Harvard University Press, pp. 207-212.
Michalewicz, Z., Vignaux, G. A. and Groves, L. (1989) Genetic Vignaux, G. A. and Michalewicz, Z. (1989) Genetic algorithms
algorithms for optimization problems, in Proceedings of the for the transporation problem, Proceedings of the 4th Inter-
llth NZ Computer Conference, Wellington, New Zealand, national Symposium on Methodologies for Intelligent Sys-
16-18 August, pp. 211-223. tems, Charlotte, NC, 12-14 October, pp. 252-259.
Michalewicz, Z., Vignaux, G. A. and Hobbs, M. (1991) A genetic Vignaux, G. A. and Michalewicz, Z. (1990) A genetic algorithm
algorithm for the nonlinear transportation problem, ORSA for the linear transportation problem. IEEE Transactions on
Journal on Computing, 3(4). Systems, Man, and Cybernetics, 21(2).
