
Genetic Algorithms

Muhannad Harrim

Introduction

After scientists became disillusioned with classical and neo-classical attempts at modeling intelligence, they looked in other directions. Two prominent fields arose: connectionism (neural networking, parallel processing) and evolutionary computing. It is the latter that this essay deals with: genetic algorithms and genetic programming.

What is GA

A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).

What is GA

Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible.

What is GA

The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are selected from the current population (based on their fitness), and modified (recombined and possibly mutated) to form a new population.

What is GA

The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.

Key terms

Individual - Any possible solution
Population - Group of all individuals
Search Space - All possible solutions to the problem
Chromosome - Blueprint for an individual
Trait - Possible aspect (feature) of an individual
Allele - Possible settings of a trait (black, blond, etc.)
Locus - The position of a gene on the chromosome
Genome - Collection of all chromosomes for an individual

Chromosome, Genes and Genomes

Genotype and Phenotype

Genotype: Particular set of genes in a genome
Phenotype: Physical characteristic of the genotype (smart, beautiful, healthy, etc.)

Genotype and Phenotype

GA Requirements

A typical genetic algorithm requires two things to be defined: a genetic representation of the solution domain, and a fitness function to evaluate the solution domain. A standard representation of the solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable-length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming.

Representation
Chromosomes could be:

Bit strings (0101 ... 1100)
Real numbers (43.2 -33.1 ... 0.0 89.2)
Permutations of elements (E11 E3 E7 ... E1 E15)
Lists of rules (R1 R2 R3 ... R22 R23)
Program elements (genetic programming)
... any data structure ...
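For concreteness, here is a minimal sketch (in Python, which the slides do not prescribe) of how a few of these encodings might look as plain data structures; all of the values below are made up for illustration.

# Hypothetical chromosome encodings written as plain Python data.
bit_string  = [0, 1, 0, 1, 1, 1, 0, 0]       # bit string
real_vector = [43.2, -33.1, 0.0, 89.2]       # vector of real numbers
permutation = [4, 0, 7, 2, 6, 5, 1, 3]       # permutation of elements
rule_list   = ["R1", "R2", "R22", "R23"]     # list of rule identifiers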

GA Requirements

The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem we want to maximize the total value of objects that we can put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
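A minimal sketch of the knapsack fitness function just described, assuming hypothetical values, weights and capacity inputs supplied by the caller:

def knapsack_fitness(bits, values, weights, capacity):
    # Fitness = total value of the selected objects, or 0 if the knapsack overflows.
    total_value = sum(v for b, v in zip(bits, values) if b)
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    return total_value if total_weight <= capacity else 0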

A Fitness Function

Basics of GA

The most common type of genetic algorithm works like this: a population is created with a group of individuals created randomly. The individuals in the population are then evaluated. The evaluation function is provided by the programmer and gives the individuals a score based on how well they perform at the given task. Two individuals are then selected based on their fitness: the higher the fitness, the higher the chance of being selected. These individuals then "reproduce" to create one or more offspring, after which the offspring are mutated randomly. This continues until a suitable solution has been found or a certain number of generations have passed, depending on the needs of the programmer.

General Algorithm for GA

Initialization
Initially many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
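A minimal sketch of random initialization for bit-string chromosomes (Python is my choice here, not something the slides specify):

import random

def init_population(pop_size, chromosome_length):
    # Randomly generate pop_size bit-string individuals covering the search space.
    return [[random.randint(0, 1) for _ in range(chromosome_length)]
            for _ in range(pop_size)]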

General Algorithm for GA

Selection
During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as this process may be very time-consuming. Most functions are stochastic and designed so that a small proportion of less fit solutions are selected. This helps keep the diversity of the population large, preventing premature convergence on poor solutions. Popular and well-studied selection methods include roulette wheel selection and tournament selection.
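Tournament selection, mentioned above but not detailed on the slides, can be sketched in a few lines; the tournament size k = 3 is an arbitrary illustrative choice:

import random

def tournament_select(population, fitness, k=3):
    # Pick k individuals at random and return the fittest of them.
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]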

General Algorithm for GA

In roulette wheel selection, individuals are given a probability of being selected that is directly proportional to their fitness. Two individuals are then chosen randomly based on these probabilities and produce offspring.

General Algorithm for GA
Roulette Wheel's Selection Pseudo Code:

for all members of population
    sum += fitness of this individual
end for

for all members of population
    probability = sum of probabilities + (fitness / sum)
    sum of probabilities += probability
end for

loop until new population is full
    do this twice
        number = random between 0 and 1
        for all members of population
            if number > probability but less than next probability
                then you have been selected
        end for
    end
    create offspring
end loop
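A runnable Python version of this roulette wheel, sketched under the assumption that all fitness values are non-negative and not all zero:

import random

def roulette_select_pair(population, fitness):
    # Select two parents with probability proportional to fitness (roulette wheel).
    total = sum(fitness)
    cumulative, running = [], 0.0
    for f in fitness:                       # build cumulative selection probabilities
        running += f / total
        cumulative.append(running)

    def spin():
        number = random.random()            # random number between 0 and 1
        for individual, threshold in zip(population, cumulative):
            if number <= threshold:
                return individual
        return population[-1]               # guard against floating-point round-off

    return spin(), spin()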

General Algorithm for GA

Reproduction
The next step is to generate a second generation population of solutions from those selected, through genetic operators: crossover (also called recombination) and/or mutation. For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents". New parents are selected for each child, and the process continues until a new population of solutions of appropriate size is generated.

General Algorithm for GA

These processes ultimately result in a next generation population of chromosomes that is different from the initial generation. Generally the average fitness of the population will have increased by this procedure, since only the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions, for reasons already mentioned above.

Crossover

The most common type is single-point crossover. In single-point crossover, you choose a locus at which you swap the remaining alleles from one parent to the other. This is complex and is best understood visually. As you can see, the children take one section of the chromosome from each parent. The point at which the chromosome is broken depends on the randomly selected crossover point. This particular method is called single-point crossover because only one crossover point exists. Sometimes only child 1 or child 2 is created, but oftentimes both offspring are created and put into the new population. Crossover does not always occur, however. Sometimes, based on a set probability, no crossover occurs and the parents are copied directly to the new population. The probability of crossover occurring is usually 60% to 70%.
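A minimal sketch of single-point crossover on bit strings, with the 70% crossover rate from the text as a default parameter:

import random

def single_point_crossover(parent1, parent2, crossover_rate=0.7):
    # With probability (1 - crossover_rate) the parents are copied directly.
    if random.random() > crossover_rate:
        return parent1[:], parent2[:]
    point = random.randint(1, len(parent1) - 1)   # randomly selected crossover point
    child1 = parent1[:point] + parent2[point:]    # swap the remaining alleles
    child2 = parent2[:point] + parent1[point:]
    return child1, child2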

Crossover

Mutation

After selection and crossover, you now have a new population full of individuals. Some are directly copied, and others are produced by crossover. In order to ensure that the individuals are not all exactly the same, you allow for a small chance of mutation. You loop through all the alleles of all the individuals, and if an allele is selected for mutation, you can either change it by a small amount or replace it with a new value. The probability of mutation is usually between 1 and 2 tenths of a percent. Mutation is fairly simple. You just change the selected alleles based on what you feel is necessary and move on. Mutation is, however, vital to ensuring genetic diversity within the population.
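A sketch of bit-flip mutation; the 0.2% rate (2 tenths of a percent) mirrors the range given in the text:

import random

def mutate(chromosome, mutation_rate=0.002):
    # Flip each bit independently with a small probability.
    return [1 - bit if random.random() < mutation_rate else bit
            for bit in chromosome]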

Mutation

General Algorithm for GA

Termination
This generational process is repeated until a termination condition has been reached. Common terminating conditions are:

A solution is found that satisfies minimum criteria
Fixed number of generations reached
Allocated budget (computation time/money) reached
The highest-ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results
Manual inspection
Any combination of the above

GA Pseudo-code
Choose initial population
Evaluate the fitness of each individual in the population
Repeat
    Select best-ranking individuals to reproduce
    Breed new generation through crossover and mutation (genetic operations) and give birth to offspring
    Evaluate the individual fitnesses of the offspring
    Replace worst-ranked part of population with offspring
Until <terminating condition>
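Tying the earlier sketches together, here is a minimal end-to-end Python version of this pseudo-code. It reuses the hypothetical helpers defined above (init_population, roulette_select_pair, single_point_crossover, mutate) plus a problem-specific fitness_fn, and, as a simplification, replaces the whole population each generation instead of only the worst-ranked part:

def genetic_algorithm(fitness_fn, chromosome_length, pop_size=100, generations=200):
    # Minimal generational GA loosely following the pseudo-code above.
    population = init_population(pop_size, chromosome_length)
    for _ in range(generations):
        fitness = [fitness_fn(ind) for ind in population]
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = roulette_select_pair(population, fitness)
            c1, c2 = single_point_crossover(p1, p2)
            offspring.extend([mutate(c1), mutate(c2)])
        population = offspring[:pop_size]
    return max(population, key=fitness_fn)    # best individual found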

Symbolic AI VS. Genetic Algorithms


Most symbolic AI systems are very static. Most of them can usually only solve one given specific problem, since their architecture was designed for whatever that specific problem was in the first place. Thus, if the given problem were somehow to be changed, these systems could have a hard time adapting to it, since the algorithm that would originally arrive at the solution may be either incorrect or less efficient. Genetic algorithms (or GAs) were created to combat these problems; they are basically algorithms based on natural biological evolution.

Symbolic AI VS. Genetic Algorithms


The architecture of systems that implement genetic algorithms (or GAs) is more able to adapt to a wide range of problems. A GA functions by generating a large set of possible solutions to a given problem. It then evaluates each of those solutions, and decides on a "fitness level" (you may recall the phrase "survival of the fittest") for each solution set. These solutions then breed new solutions. The parent solutions that were more "fit" are more likely to reproduce, while those that were less "fit" are more unlikely to do so. In essence, solutions are evolved over time. This way you evolve your search space scope to a point where you can find the solution. Genetic algorithms can be incredibly efficient if programmed correctly.

Genetic Programming

In programming languages such as LISP, the mathematical notation is not written in standard notation, but in prefix notation. Some examples of this:

+ 2 1          =  2 + 1
* + 2 1 2      =  2 * (2 + 1)
* + - 2 1 4 9  =  9 * ((2 - 1) + 4)

Notice the difference between the left-hand side and the right-hand side! Apart from the order being different, no parentheses! The prefix method makes it a lot easier for programmers and compilers alike, because operator precedence is not an issue. You can build expression trees out of these strings that can then be easily evaluated; for example, here are the trees for the above three expressions.
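Expression evaluation really is simple with prefix strings; a minimal recursive evaluator (my sketch, not code from the slides) is:

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def eval_prefix(tokens):
    # Evaluate a prefix expression given as a list of tokens, consuming from the front.
    token = tokens.pop(0)
    if token in OPS:
        left = eval_prefix(tokens)
        right = eval_prefix(tokens)
        return OPS[token](left, right)
    return float(token)

# Example: eval_prefix("* + - 2 1 4 9".split()) returns 45.0, i.e. 9 * ((2 - 1) + 4)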

Genetic Programming

Genetic Programming

You can see how expression evaluation is thus a lot easier. What does this have to do with GAs? If, for example, you have numerical data and "answers", but no expression to conjoin the data with the answers, a genetic algorithm can be used to "evolve" an expression tree to create a very close fit to the data. By "splicing" and "grafting" the trees and evaluating the resulting expression with the data and testing it against the answers, the fitness function can return how close the expression is.
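Such a fitness function might be sketched like this, assuming a hypothetical evaluate_tree callable that applies an evolved expression tree to an input x:

def tree_fitness(evaluate_tree, data):
    # Higher fitness = smaller total error between the tree's output and the known answers.
    error = 0.0
    for x, answer in data:
        error += abs(evaluate_tree(x) - answer)
    return -error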

Genetic Programming

The limitations of genetic programming lie in the huge search space the GAs have to search through: an infinite number of equations. Therefore, normally before running a GA to search for an equation, the user tells the program which operators and numerical ranges to search under. Uses of genetic programming can lie in stock market prediction, advanced mathematics and military applications.

Evolving Neural Networks

Evolving the architecture of a neural network is slightly more complicated, and there have been several ways of doing it. For small nets, a simple matrix represents which neuron connects to which, and then this matrix is, in turn, converted into the necessary "genes", and various combinations of these are evolved.
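A small sketch of that matrix-to-genes conversion (the representation details are my assumption; the slides only describe the idea):

def matrix_to_genes(connectivity):
    # Flatten an n x n connectivity matrix (1 = connected, 0 = not) into a flat gene list.
    return [bit for row in connectivity for bit in row]

def genes_to_matrix(genes, n_neurons):
    # Rebuild the n x n connectivity matrix from the flat genes.
    return [genes[i * n_neurons:(i + 1) * n_neurons] for i in range(n_neurons)]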

Evolving Neural Networks

Many would think that a learning function could be evolved via genetic programming. Unfortunately, genetic programming combined with neural networks could be incredibly slow, thus impractical. As with many problems, you have to constrain what you are attempting to create. For example, in 1990, David Chalmers attempted to evolve a function as good as the delta rule. He did this by creating a general equation based upon the delta rule with 8 unknowns, which the genetic algorithm then evolved.

Other Areas

Genetic algorithms can be applied to virtually any problem that has a large search space. Al Biles uses genetic algorithms to filter out "good" and "bad" riffs for jazz improvisation. The military uses GAs to evolve equations to differentiate between different radar returns. Stock companies use GA-powered programs to predict the stock market.

Example

f(x) = x², 0 ≤ x ≤ 31.
Encode solution: just use 5 bits (1 or 0).
Generate initial population.
A: 0 1 1 0 1
B: 1 1 0 0 0
C: 0 1 0 0 0
D: 1 0 0 1 1

Evaluate each solution against the objective.


Sol.   String   Fitness   % of Total
A      01101    169       14.4
B      11000    576       49.2
C      01000    64        5.5
D      10011    361       30.9
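A sketch of the decoding and fitness evaluation used in this example (for instance, the string 01101 decodes to 13 and scores 13² = 169):

def decode(bits):
    # Interpret a 5-bit chromosome as an integer x in [0, 31].
    return int("".join(str(b) for b in bits), 2)

def objective(bits):
    # Fitness for this example: f(x) = x squared.
    x = decode(bits)
    return x * x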

Example Cont'd

Create the next generation of solutions.

The probability of "being a parent" depends on the fitness.

Reproduction

Ways for parents to create the next generation:

Use a string again unmodified.
Cut and paste portions of one string to another.
Randomly flip a bit.

Crossover

Mutation

COMBINATION of all of the above.

Checkerboard example

We are given an n by n checkerboard in which every field can have a different color from a set of four colors. The goal is to achieve a checkerboard colored in such a way that there are no neighbors with the same color (diagonal neighbors do not count).
[Figure: two 10 × 10 checkerboards, rows and columns numbered 1 to 10.]

Checkerboard example Cont'd


Chromosomes represent the way the checkerboard is colored. Chromosomes are not represented by bitstrings but by bitmatrices. The bits in the bitmatrix can have one of the four values 0, 1, 2 or 3, depending on the color. Crossing-over involves matrix manipulation instead of point-wise operating. Crossing-over can combine the parental matrices in a horizontal, vertical, triangular or square way. Mutation remains bitwise: changing a bit into one of the other numbers.
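As an illustration only (not the slides' exact operators), a horizontal combination of two parent matrices and a color-replacing mutation could be sketched as:

import random

def horizontal_crossover(mother, father):
    # Combine two n x n color matrices: top rows from one parent, bottom rows from the other.
    n = len(mother)
    cut = random.randint(1, n - 1)
    return [row[:] for row in mother[:cut]] + [row[:] for row in father[cut:]]

def mutate_board(board, rate=0.01):
    # With a small probability, replace a field with a different one of the 4 colors.
    for row in board:
        for j in range(len(row)):
            if random.random() < rate:
                row[j] = random.choice([c for c in range(4) if c != row[j]])
    return board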

Checkerboard example Cont'd



This problem can be seen as a graph with n × n nodes and an edge between every pair of horizontally or vertically adjacent fields, which gives 2 · n · (n - 1) edges in total. The fitness f(x) counts the edges whose two fields have different colors, so the maximum fitness is:

f(x) = 2 · n · (n - 1)

(180 for the 10 × 10 board above, which is the ceiling visible in the fitness curves below.)
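A sketch of that fitness count for a board stored as a matrix of color values:

def board_fitness(board):
    # Count horizontally/vertically adjacent pairs with different colors (max 2*n*(n-1)).
    n = len(board)
    good = 0
    for i in range(n):
        for j in range(n):
            if j + 1 < n and board[i][j] != board[i][j + 1]:
                good += 1          # horizontal neighbor differs
            if i + 1 < n and board[i][j] != board[i + 1][j]:
                good += 1          # vertical neighbor differs
    return good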

Checkerboard example Cont'd



Fitness curves for different cross-over rules:


[Figure: four plots of Fitness (130 to 180) versus Generations for Lower-Triangular Crossing Over, Square Crossing Over, Horizontal Cutting Crossing Over and Vertical Cutting Crossing Over.]

Questions

??

THANK YOU
