
Genetic Algorithm for Variable Selection

Jennifer Pittman ISDS Duke University

Genetic Algorithms Step by Step



Example: Protein Signature Selection in Mass Spectrometry

[Figure: mass spectrum, relative intensity vs. molecular weight. Source: http://www.uni-mainz.de/~frosc/fbg_po3.html]

Genetic Algorithm (Holland)

* heuristic method based on "survival of the fittest"
* useful when the search space is very large or too complex for analytic treatment

* in each iteration (generation), possible solutions, or individuals, are represented as strings of numbers

[Figure: individuals shown as binary strings; the example signature 3021 3058 3240 is encoded as 00010101 00111010 11110000. Source: http://www.spectroscopynow.com]

* all individuals in the population are evaluated by a fitness function
* individuals are allowed to reproduce (selection), cross over, and mutate

Flowchart of GA

Source: http://ib-poland.virtualave.net/ee/genetic1/3geneticalgorithms.htm

(a simplified example)

Initialization

* proteins corresponding to 256 mass spectrometry values from 3000-3255 m/z
* assume the optimal signature contains 3 peptides, represented by their m/z values in binary encoding
* population size ~ M = L/2, where L is the signature length

[Figure: an example individual shown as a binary string]
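The binary encoding described above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes each of the 256 m/z values gets an 8-bit code counted from the lowest value (3000), so a 3-peptide signature becomes a 24-bit string. The names `encode`, `decode`, `BITS_PER_PEPTIDE`, and `MZ_OFFSET` are hypothetical.

```python
BITS_PER_PEPTIDE = 8   # 256 candidate m/z values -> 8 bits each
MZ_OFFSET = 3000       # assumption: the lowest m/z value maps to 00000000


def encode(mz_values):
    """Concatenate the 8-bit offset-binary code of each m/z value."""
    return "".join(format(mz - MZ_OFFSET, f"0{BITS_PER_PEPTIDE}b") for mz in mz_values)


def decode(genotype):
    """Split a bit string into 8-bit chunks and map each back to an m/z value."""
    chunks = [genotype[i:i + BITS_PER_PEPTIDE]
              for i in range(0, len(genotype), BITS_PER_PEPTIDE)]
    return [int(chunk, 2) + MZ_OFFSET for chunk in chunks]


signature = [3021, 3058, 3240]
genotype = encode(signature)
print(genotype)          # 000101010011101011110000
print(decode(genotype))  # [3021, 3058, 3240]
```

Decoding is the genotype-to-phenotype step that a fitness function would need before it can score a signature.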

Initial Population

M = 12, L = 24

[Figure: initial population of 12 individuals, each a binary string of length 24]
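One way to build such an initial population is to draw every bit at random. The sketch below assumes uniform random bits and uses the M = L/2 sizing from the slides; the function names and the fixed seed are illustrative only.

```python
import random


def random_individual(length, rng):
    """Return a random bit string of the given length."""
    return "".join(rng.choice("01") for _ in range(length))


def initial_population(signature_length=24, rng=None):
    """Create M = L/2 random individuals of length L."""
    rng = rng or random.Random()
    size = signature_length // 2
    return [random_individual(signature_length, rng) for _ in range(size)]


population = initial_population(24, random.Random(0))  # seed chosen for repeatability
print(len(population), len(population[0]))  # 12 24
```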

Searching

* search space defined by all possible encodings of solutions
* selection, crossover, and mutation perform a "pseudo-random" walk through the search space
* operations are non-deterministic yet directed

[Figure: Phenotype Distribution. Source: http://www.ifs.tuwien.ac.at/~aschatt/info/ga/genetic.html]
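To give a sense of scale for the simplified example (L = 24 bits, 3 peptides chosen from 256 m/z values), the size of the search space can be computed directly. The second figure assumes a signature is an unordered set of three distinct values, which the slides do not state.

```python
from math import comb

L = 24
print(2 ** L)        # 16777216 possible 24-bit encodings
print(comb(256, 3))  # 2763520 unordered sets of 3 distinct m/z values
```

The encoding space is larger than the number of distinct signatures because a bit string also distinguishes peptide order and allows repeated values.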

Evaluation and Selection

* evaluate fitness of each solution in the current population (e.g., ability to classify/discriminate) [involves genotype-phenotype decoding]
* selection of individuals for survival based on a probabilistic function of fitness
* on average, mean fitness of individuals increases
* may include an elitist step to ensure survival of the fittest individual

[Figure: Roulette Wheel Selection. Source: http://www.softchitech.com/ec_intro_html]
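Roulette wheel selection with an optional elitist step might look like the sketch below. It assumes non-negative fitness values with a positive sum and samples with replacement; the labels "A" to "D" and the fitness numbers are placeholders, and `roulette_select` is a hypothetical name.

```python
import random


def roulette_select(population, fitnesses, n, rng, elitist=True):
    """Draw n survivors with probability proportional to fitness."""
    survivors = []
    if elitist:
        # Elitist step: the fittest individual always survives.
        best = max(range(len(population)), key=lambda i: fitnesses[i])
        survivors.append(population[best])
    # random.choices samples with replacement, weighted by fitness.
    survivors += rng.choices(population, weights=fitnesses, k=n - len(survivors))
    return survivors


population = ["A", "B", "C", "D"]
fitnesses = [0.67, 0.23, 0.45, 0.94]
survivors = roulette_select(population, fitnesses, 4, random.Random(1))
print(survivors)
```

Because sampling is with replacement, a fit individual can appear several times among the survivors while a weak one may vanish.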

Crossover

* combine two individuals to create new individuals for possible inclusion in the next generation
* main operator for local search (looking close to existing solutions)
* perform each crossover with probability pc {0.5, …, 0.8}
* crossover points selected at random
* individuals not crossed are carried over in the population

[Figure: initial strings and the offspring produced by single-point, two-point, and uniform crossover]
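The three crossover variants named above can be sketched on bit strings as follows. This is an illustrative version with hypothetical function names; it assumes both parents have the same length (at least 3 for two-point) and that uniform crossover swaps each position with probability 0.5.

```python
import random


def single_point(a, b, rng):
    """Swap the tails of a and b after one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def two_point(a, b, rng):
    """Swap the segment between two distinct random cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def uniform(a, b, rng):
    """Swap each position independently with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return "".join(child1), "".join(child2)


def maybe_cross(a, b, pc, rng, operator=single_point):
    """Apply the operator with probability pc; otherwise carry parents over."""
    if rng.random() < pc:
        return operator(a, b, rng)
    return a, b


rng = random.Random(42)
for op in (single_point, two_point, uniform):
    print(op.__name__, *op("000000000000", "111111111111", rng))
```

Using all-zero and all-one parents makes the cut points visible in the printed offspring.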

Mutation

* each component of every individual is modified with probability pm
* main operator for global search (looking at new areas of the search space)

* pm usually small {0.001, …, 0.01}

  rule of thumb: pm = 1/(no. of bits in chromosome)

* individuals not mutated are carried over in the population

Source: http://www.softchitech.com/ec_intro_html
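For a binary encoding, mutation is usually a per-bit flip. The sketch below assumes that form and applies the rule of thumb pm = 1/(number of bits); `mutate` is a hypothetical name.

```python
import random


def mutate(genotype, pm, rng):
    """Flip each bit independently with probability pm."""
    flipped = []
    for bit in genotype:
        if rng.random() < pm:
            bit = "1" if bit == "0" else "0"
        flipped.append(bit)
    return "".join(flipped)


genotype = "000101010011101011110000"
pm = 1 / len(genotype)  # rule of thumb: about one flip per individual on average
print(mutate(genotype, pm, random.Random(7)))
```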

starting generation

phenotype            fitness
3021 3058 3240       0.67
3017 3059 3165       0.23
3036 3185 3120       0.45
3197 3088 3106       0.94

selection, then one-point crossover (p = 0.6), then mutation (p = 0.05)

next generation

phenotype            fitness
3021 3049 3122       0.80
3166 3184 3240       0.77
3197 3120 3106       0.42
3213 3088 3042       0.98

[Figure: genotype bit strings of each individual after selection, crossover, and mutation]
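Putting the steps together, one generation can be sketched as elitist selection, one-point crossover with p = 0.6, and bit-flip mutation with p = 0.05. The fitness function here is a toy stand-in (fraction of bits matching a hypothetical target string), not the classification-based fitness of the protein example, and all names are illustrative.

```python
import random

TARGET = "000101010011101011110000"  # hypothetical "optimal" signature


def fitness(genotype):
    """Toy stand-in for classification accuracy: fraction of matching bits."""
    return sum(a == b for a, b in zip(genotype, TARGET)) / len(TARGET)


def flip(bit):
    return "1" if bit == "0" else "0"


def next_generation(pop, rng, pc=0.6, pm=0.05):
    scores = [fitness(g) for g in pop]
    elite = max(pop, key=fitness)  # elitist step: kept unchanged
    # Roulette wheel selection; the tiny constant keeps all weights positive.
    parents = rng.choices(pop, weights=[s + 1e-9 for s in scores], k=len(pop) - 1)
    children = []
    for i in range(0, len(parents) - 1, 2):
        a, b = parents[i], parents[i + 1]
        if rng.random() < pc:  # one-point crossover
            cut = rng.randrange(1, len(a))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        children += [a, b]
    if len(parents) % 2:  # odd parent out is carried over uncrossed
        children.append(parents[-1])
    children = ["".join(flip(b) if rng.random() < pm else b for b in child)
                for child in children]
    return [elite] + children


rng = random.Random(0)
pop = ["".join(rng.choice("01") for _ in range(24)) for _ in range(12)]
history = []
for _ in range(30):
    history.append(max(fitness(g) for g in pop))
    pop = next_generation(pop, rng)
print(history[0], history[-1])
```

Because the elite individual bypasses crossover and mutation, the best fitness recorded in `history` never decreases from one generation to the next.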

GA Evolution

[Figure: accuracy in percent vs. generations. Source: http://www.sdsc.edu/skidl/projects/bio-SKIDL/]

genetic algorithm learning

[Figure: fitness criteria vs. generations; fitness value (scaled) vs. iteration. Source: http://www.demon.co.uk/apl385/apl96/skom.htm]

References

* Holland, J. (1992), Adaptation in natural and artificial systems, 2nd Ed. Cambridge: MIT Press.
* Davis, L. (Ed.) (1991), Handbook of genetic algorithms. New York: Van Nostrand Reinhold.
* Goldberg, D. (1989), Genetic algorithms in search, optimization and machine learning. Addison-Wesley.
* Fogel, D. (1995), Evolutionary computation: Towards a new philosophy of machine intelligence. Piscataway: IEEE Press.
* Bäck, T., Hammel, U., and Schwefel, H. (1997), "Evolutionary computation: Comments on the history and the current state", IEEE Trans. On Evol. Comp. 1, (1).

Online Resources

* http://www.spectroscopynow.com
* http://www.cs.bris.ac.uk/~colin/evollect1/evollect0/index.htm
* IlliGAL (http://www-illigal.ge.uiuc.edu/index.php3)
* GAlib (http://lancet.mit.edu/ga/)

[Figure: plot with vertical axis "Percent impro…" and horizontal axis "iteration"]

Schema and GAs

* a schema is a template representing a set of bit strings, with * as a wildcard: 1*** → {…}

* every schema s has an estimated average fitness f(s):

  $E_{t+1} \approx \frac{f(s)}{f(\mathrm{pop})} \, E_t$

  where E_t is the number of instances of s in generation t and f(pop) is the average fitness of the population
* schema s receives exponentially increasing or decreasing numbers depending upon the ratio f(s)/f(pop)
* above-average schemas tend to spread through the population while below-average schemas disappear (simultaneously for all schemas: "implicit parallelism")
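The growth estimate above can be checked on a small population. The sketch below assumes * is the wildcard symbol and uses a toy fitness (number of 1 bits); the population, the schema "1***", and the function names are made up for illustration.

```python
def matches(schema, bits):
    """True if bits fits the schema; '*' matches either bit."""
    return len(schema) == len(bits) and all(s in ("*", b) for s, b in zip(schema, bits))


def expected_next_count(schema, population, fitness):
    """Estimate E_{t+1} = [f(s) / f(pop)] * E_t for one schema."""
    members = [g for g in population if matches(schema, g)]
    if not members:
        return 0.0
    f_s = sum(fitness(g) for g in members) / len(members)            # schema average
    f_pop = sum(fitness(g) for g in population) / len(population)    # population average
    return len(members) * f_s / f_pop


def count_ones(genotype):
    # Toy fitness for illustration only.
    return genotype.count("1")


population = ["1000", "1011", "0110", "0001"]
estimate = expected_next_count("1***", population, count_ones)
print(round(estimate, 3))  # 2.286
```

Here the schema has 2 instances with average fitness 2 against a population average of 1.75, so the ratio is above 1 and the schema is expected to gain instances.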

[Figure: MALDI-TOF. Source: www.protagen.de/pics/main/maldi2.html]
