
Genetic and Evolutionary Computation for Signal Processing and Image Analysis

Guest Editors: Riccardo Poli and Stefano Cagnoni


This is a special issue published in volume 2003 of EURASIP Journal on Applied Signal Processing. All articles are open access

articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction

in any medium, provided the original work is properly cited.

Editor-in-Chief

Marc Moonen, Belgium

K. J. Ray Liu, College Park, USA

Associate Editors

Kiyoharu Aizawa, Japan

Gonzalo Arce, USA

Jaakko Astola, Finland

Kenneth Barner, USA

Mauro Barni, Italy

Sankar Basu, USA

Jacob Benesty, Canada

Helmut Bölcskei, Switzerland

Chong-Yung Chi, Taiwan

M. Reha Civanlar, Turkey

Tony Constantinides, UK

Luciano Costa, Brazil

Zhi Ding, USA

Petar M. Djurić, USA

Jean-Luc Dugelay, France

Tariq Durrani, UK

Touradj Ebrahimi, Switzerland

Sadaoki Furui, Japan

Moncef Gabbouj, Finland

Fulvio Gini, Italy

Peter Handel, Sweden

Ulrich Heute, Germany

John Homer, Australia

Jiri Jan, Czech

Søren Holdt Jensen, Denmark

Mark Kahrs, USA

Ton Kalker, The Netherlands

Mos Kaveh, USA

Bastiaan Kleijn, Sweden

Ut-Va Koc, USA

Aggelos Katsaggelos, USA

C.-C. Jay Kuo, USA

Chin-Hui Lee, USA

Kyoung Mu Lee, Korea

Sang Uk Lee, Korea

Y. Geoffrey Li, USA

Ferran Marqués, Spain

Bernie Mulgrew, UK

King N. Ngan, Singapore

Antonio Ortega, USA

Mukund Padmanabhan, USA

Ioannis Pitas, Greece

Phillip Regalia, France

Hideaki Sakai, Japan

Wan-Chi Siu, Hong Kong

Dirk Slock, France

Piet Sommen, The Netherlands

John Sorensen, Denmark

Michael G. Strintzis, Greece

Tomohiko Taniguchi, Japan

Sergios Theodoridis, Greece

Xiaodong Wang, USA

Douglas Williams, USA

An-Yen (Andy) Wu, Taiwan

Xiang-Gen Xia, USA

Kung Yao, USA

Contents

Foreword, David E. Goldberg

Volume 2003 (2003), Issue 8, Pages 731-732

Editorial, Riccardo Poli and Stefano Cagnoni

Volume 2003 (2003), Issue 8, Pages 733-739

Blind Search for Optimal Wiener Equalizers Using an Artificial Immune Network Model,

Romis Ribeiro de Faissol Attux, Murilo Bellezoni Loiola, Ricardo Suyama, Leandro Nunes de Castro,

Fernando José Von Zuben, and João Marcos Travassos Romano

Volume 2003 (2003), Issue 8, Pages 740-747

Evolutionary Computation for Sensor Planning: The Task Distribution Plan, Enrique Dunn

and Gustavo Olague

Volume 2003 (2003), Issue 8, Pages 748-756

An Evolutionary Approach for Joint Blind Multichannel Estimation and Order Detection,

Chen Fangjiong, Sam Kwong, and Wei Gang

Volume 2003 (2003), Issue 8, Pages 757-765

Application of Evolution Strategies to the Design of Tracking Filters with a Large Number of

Specifications, Jesús García Herrero, Juan A. Besada Portas, Antonio Berlanga de Jesús,

José M. Molina López, Gonzalo de Miguel Vela, and José R. Casar Corredera

Volume 2003 (2003), Issue 8, Pages 766-779

Tuning Range Image Segmentation by Genetic Algorithm, Gianluca Pignalberi, Rita Cucchiara,

Luigi Cinque, and Stefano Levialdi

Volume 2003 (2003), Issue 8, Pages 780-790

Parameter Estimation of a Plucked String Synthesis Model Using a Genetic Algorithm with Perceptual

Fitness Calculation, Janne Riionheimo and Vesa Välimäki

Volume 2003 (2003), Issue 8, Pages 791-805

Optimization and Assessment of Wavelet Packet Decompositions with Evolutionary Computation,

Thomas Schell and Andreas Uhl

Volume 2003 (2003), Issue 8, Pages 806-813

On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition

Systems in Adverse Conditions, Sid-Ahmed Selouani and Douglas O'Shaughnessy

Volume 2003 (2003), Issue 8, Pages 814-823

Evolutionary Techniques for Image Processing a Large Dataset of Early Drosophila Gene Expression,

Alexander Spirov and David M. Holloway

Volume 2003 (2003), Issue 8, Pages 824-833

A Comparison of Evolutionary Algorithms for Tracking Time-Varying Recursive Systems,

Michael S. White and Stuart J. Flockton

Volume 2003 (2003), Issue 8, Pages 834-840

Programming, Mengjie Zhang, Victor B. Ciesielski, and Peter Andreae

Volume 2003 (2003), Issue 8, Pages 841-859

© 2003 Hindawi Publishing Corporation

Foreword

David E. Goldberg

Department of General Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Email: deg@uiuc.edu

special issue on genetic algorithms (GAs) and evolutionary

computation (EC) in image and signal processing edited by

Riccardo Poli and Stefano Cagnoni for two reasons. First,

the special issue is another piece of the mounting evidence

that GAs and EC are finding an important niche in the solution of difficult real-world problems. Second, in reviewing

the contents of the special issue, I find it almost archetypal

in its reflection of the GA/EC applications world of 2003. In

the remainder of this discussion, I briefly review a number of

reasons why genetic and evolutionary techniques are becoming more and more important in real problems and discuss

some of the ways this issue can be used to both demonstrate effective

GA/EC application and foreshadow more signal and image

processing by evolutionary and genetic means.

There are a number of reasons why GAs and EC are becoming more prevalent in real applications. The first reason

is what I call the buzz. Let us face it, GAs are cool. The very

idea of doing a Darwinian survival of the fittest and genetics

on a computer is neat. But cool and neat, while they may attract our attention, do not merit our sustained involvement.

Another reason for which GAs have become more popular is the motivation from artificial systems. Although decades,

even centuries, of optimization and operations research leave

us with an impressive toolkit, the contingency basis of the

methodology leaves us somewhat cold. By this I mean that

the selection of an optimization or OR technique is contingent on the type of problem you face. If you have a linear

problem with linear constraints, you choose linear programming. If you have a stage decomposable problem, you choose

dynamic programming. If you have a nonlinear problem

with sufficiently pleasant constraints, you choose nonlinear

programming, and so on. But the very nature of this list of

methods that work in particular problems is part of the problem. One of the promises of biologically inspired techniques

is a framework that does not vary and a larger class of problems that can be tackled within that framework.

This vision of greater robustness is now being realized,

but it is tied to whether the solutions obtained using these

techniques are both tractable and practical. Results about

had a kind of Dr. Jekyll and Mr. Hyde nature. Simple genetic and evolutionary algorithms work well (subquadratically) on straightforward problems, but they require exponential times on more complex ones. This is not the place to

review these results in detail, and the interested reader can

look elsewhere (D. E. Goldberg, The Design of Innovation:

Lessons from and for Competent Genetic Algorithms, Kluwer,

Boston, 2002), but it suffices to say that work on adaptive and

self-adaptive crossover and mutation operators is overcoming the tractability hurdle on real problems, resulting in what

appears to be broadly scalable (subquadratic) or competent

solvers.

Yet, theoretical tractability is of little solace to a practitioner who faces the daunting prospect of performing a million costly function evaluations on a 1000-variable problem.

As a result, increasing theory, implementation, and application are showing the way toward principled efficiency enhancement using parallelization, time utilization, hybridization, and evaluation relaxation, and these methods are moving us from the realm of the competent (the tractable) to the

realm of the practical.

These fundamental reasons (the buzz, the need, the

tractability, and the practicality of modern genetic and evolutionary algorithms) are driving an ever-increasing interest

in these methods, and this volume reflects that range of interest in terms of the application areas, operators, codings,

and accoutrements on display.

In terms of application, the use of GAs and EC in

this volume spans such disparate applications as filter tuning, sensor planning, system identification, object detection,

bioinformatic image processing, 3D model interpretation,

and speech recognition. The range of different applications

here is a reflection of the breadth of application elsewhere,

and the utility of the GA/EC toolkit across this landscape is

empirical evidence of the robustness of these methods.

Looking under the hood, we see a wide range of codings and operators in evidence, from floating-point vectors to permutations to program codes, from fixed to adaptive operators, and from crossover to mutation with various


Additionally, many of the papers here demonstrate an understanding of the importance of efficiency enhancement

in real-world problems, and a number of them combine

the best of genetic and evolutionary computation with local search to form useful and efficient hybrids that solve the

problem. Too often, methods specialists are enamored with

the method they helped invent or perfect, but in the real

world, efficient solutions are obtained with an effective combination of global and local techniques.

In all, this special issue is a useful compendium for those

interested in signal and image processing and the proper application of genetic and evolutionary methods to the unsolved problems of these domains. To the field of genetic and

evolutionary computation, this special issue adds to the growing evidence of the importance of what that field does in areas of

human endeavor that matter. To audience members in both

camps, I recommend without reservation that you study this

special issue, and absorb and apply its many lessons.

David E. Goldberg

Distinguished Professor of Entrepreneurial

Engineering in the Department of General

Engineering at the University of Illinois at

Urbana-Champaign (UIUC). He is also Director of the Illinois Genetic Algorithms

Laboratory and is an affiliate of the Technology Entrepreneur Center and the National Center for Supercomputing Applications. He is a 1985 recipient of a US National Science Foundation Presidential Young Investigator Award,

and in 1995, he was named an Associate of the Center for Advanced

Study at UIUC. He was a Founding Chairman of the International

Society for Genetic and Evolutionary Computation, and his book,

Genetic Algorithms in Search, Optimization and Machine Learning

(Addison-Wesley, 1989), is the fourth most widely cited reference

in computer science according to CiteSeer. He has just completed

a new monograph, The Design of Innovation (Kluwer, 2002), that

shows how to design scalable genetic algorithms and how such algorithms are similar to certain processes of human.


Editorial

Riccardo Poli

Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK

Email: rpoli@essex.ac.uk

Stefano Cagnoni

Department of Computer Engineering, University of Parma, 43100 Parma, Italy

Email: cagnoni@ce.unipr.it

1. INTRODUCTION

powerful mechanism of nature mankind has ever discovered. Its power is evident in the impressive level of adaptation reached by all species of animals and plants in nature. It

is intriguing because despite its simplicity and randomness,

it produces incredible complexity in a way that appears to

be very directed, almost purposeful. Like for other powerful

natural phenomena, it is no surprise that several decades ago

a few brilliant researchers in engineering and computer science started wondering whether they could steal the secrets

behind Darwinian evolution and use them to solve problems of practical interest in a variety of application domains.

Those people were pioneers of a new field which, after more

than 30 years from its inception, is now big, well established,

and goes under the name of genetic and evolutionary computation (GEC).

An almost endless number of results and applications of

evolutionary algorithms have been reported in the literature,

showing that the ideas of these pioneers were indeed right.

Nowadays, evolutionary techniques can routinely solve problems in domains such as automatic design, optimisation, pattern recognition, control, and many others. Until recently,

however, only very occasionally could one claim that GEC

techniques approached the performance of human experts in

these same domains, particularly in the case of large scale applications and complex engineering problems. This is why,

initially, successful applications of GEC techniques to the

fields of computer vision, image analysis, and signal processing were few and far between. Towards the late 1990s, however, the research interest in these areas seemed to be rapidly

growing, and the time seemed right for the creation of an

infrastructure that could foster the interaction between researchers in this area. This is what led, in early 1998, the two

editors of this special issue, together with people from vari-

the European Network of Excellence in Evolutionary Computation, entirely devoted to the applications of evolutionary algorithms to image analysis and signal processing. The

working group organises a regular meeting, the European

Workshop on Evolutionary Computation in Image Analysis

and Signal Processing (EvoIASP), which reached its fifth edition this year. This event gives European and non-European

researchers, as well as people from industry, an opportunity

to present their latest research, discuss current developments

and applications, besides fostering closer interaction between

members of the GEC, image analysis, and signal processing

scientific communities. However, the event is, and intends to

remain, a workshop. Therefore, the work presented there can

never have the depth allowed by more substantial and mature

archival publications such as this journal.

This special issue of EURASIP JASP on GEC for signal

processing and image analysis, being the first of its kind,

has offered computer scientists and engineers from around

the world a unique opportunity to submit their best mature

research for inclusion in this unified, high-quality venue.

The timing of this special issue could not have been better;

well over thirty papers were submitted by contributors from

around the world. The papers were reviewed by a pool of over

thirty international expert reviewers. Only about one third

passed our strict criteria for acceptance and are now in this

volume.

The rest of this editorial is organised as follows. In

Section 2, we will provide a gentle introduction to the basics of evolutionary computation. In Section 3, we describe

each of the papers present in this special issue, briefly summarising, for each one, the problem considered and the evolutionary techniques adopted to tackle it. In Section 4, we

provide our final remarks and acknowledgments, while in the

appendix, we give a brief commented bibliography with suggested further reading.


2. EVOLUTIONARY COMPUTATION: THE BASICS

that the pioneers of GEC stole to make them the propelling

fuel of evolutionary computation processes?

Inheritance

Individuals have a genetic representation (in nature, the

chromosomes and the DNA) such that it is possible for the

offspring of an individual to inherit some of the features of

its parent.

Variation

The offspring are not exact copies of the parents, but instead, reproduction involves mechanisms that create innovation as new generations are born.

Natural selection

Individuals best adapted to the environment have longer life

and higher chances of mating and spreading their genetic

makeup.

Clearly, there is a lot more to natural evolution than these

main forces. However, like for many other nature-inspired

techniques, not all the details are necessary to obtain working

models of a natural system. The three ingredients listed above

are in fact sufficient to obtain artificial systems showing the

main characteristic of natural evolution, the ability to search

for highly fit individuals.

For all these ingredients (representation, variation, and

selection), one can focus on different realisations. For example, in nature, variation is produced both through mutations

of the genome and through the effect of sexually recombining the genetic material coming from the parents when obtaining the offspring chromosomes (crossover). This is why

many different classes of evolutionary algorithms have been

proposed over the years. So, depending on the structures undergoing evolution, on the reproduction strategies and the

variation (or genetic) operators adopted, and so on, evolutionary algorithms can be grouped into several evolutionary

paradigms: genetic algorithms (GAs) [1], genetic programming (GP) [2], evolution strategies (ESs) [3, 4], and so forth.

All the inventors of these different evolutionary algorithms (EAs) have to make choices as to which bits of nature

have a corresponding component in their algorithms. These

choices are summarised in the nature-to-computer mapping

shown in Table 1. That is, the notion of individual in nature

corresponds to a tentative solution to a problem of interest

in an EA. The fitness (ability to reproduce and have fertile

offspring that reach the age of reproduction) of natural individuals corresponds to the objective function used to evaluate the quality of the tentative solutions in the computer.

The genetic variation processes of mutation and recombination are seen as mechanisms (search operators) to generate

new tentative solutions to the problem. Finally, natural selection is interpreted as a mechanism to promote the diffusion

and mixing of the genetic material of individuals representing good quality solutions and therefore having the potential

to create even fitter individuals (better solutions).

Table 1: Nature-to-computer mapping.

Nature              Computer
Individual          Solution to a problem
Population          Set of solutions
Fitness             Quality of a solution
Chromosome          Representation for a solution (e.g., set of parameters)
Gene                (e.g., parameter or degree of freedom)
Crossover           Search operator
Mutation            Search operator
Natural selection   Promoting the reuse of good (sub-)solutions

general form.

(1) Initialise population and evaluate the fitness of each population member.

(2) Repeat.

(a) Select subpopulation for reproduction on the basis of fitness (selection).

(b) Copy some of the selected individuals without change (cloning or reproduction).

(c) Recombine the genes of selected parents (recombination or crossover).

(d) Mutate the mated population stochastically (mutation).

(e) Evaluate the fitness of the new population.

(f) Select the survivors based on fitness.

Not all these steps are present in all EAs. For example, in

modern GAs [5] and in GP, phase (a) is part of phases (b)

and (c), while phase (f) is absent. This algorithm is said to be

generational because there is no overlap between generations

(i.e., the offspring population always replaces the parent population). In generational EAs, cloning is used to simulate the

survival of parents for more than one generation.
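The steps listed above can be sketched as a short program. The following is a minimal illustration rather than a reference implementation: the function name `evolve`, the bit-string representation, the binary tournament used for step (a), and all parameter defaults are assumptions made for this example. Step (f) is omitted, so the sketch is generational in the sense described above.

```python
import random


def evolve(fitness, n_bits=20, pop_size=30, generations=40,
           p_crossover=0.7, p_mutation=0.02, seed=0):
    """Generational EA on bit strings; every default here is illustrative."""
    rng = random.Random(seed)

    # (1) Initialise the population at random and evaluate it.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    best_fit = max(fits)
    best = pop[fits.index(best_fit)]

    def select():
        # (a) Selection, here a binary tournament: the fitter of two
        # randomly drawn individuals is chosen.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[a] if fits[a] >= fits[b] else pop[b]

    # (2) Repeat for a fixed number of generations.
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            if rng.random() < p_crossover:
                # (c) Recombination: one-point crossover of two parents.
                p1, p2 = select(), select()
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                # (b) Cloning: copy a selected individual unchanged.
                child = list(select())
            # (d) Mutation: invert each bit with a small probability.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            offspring.append(child)
        # (e) Evaluate the new population. There is no step (f): the
        # offspring replace the parents, so the scheme is generational.
        pop = offspring
        fits = [fitness(ind) for ind in pop]
        if max(fits) > best_fit:
            best_fit = max(fits)
            best = pop[fits.index(best_fit)]
    return best, best_fit
```

For instance, `evolve(sum)` searches for the bit string with the largest number of ones (a toy objective chosen only for illustration) and returns the best individual found together with its fitness.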

In the following, we will analyse the various components of an EA in more detail, mainly concentrating on the

GA, although most of what we will say also applies to other

paradigms.

2.1. Representations

Typically, an adult individual (a solution for a problem) takes

the form of a vector of numbers. These are often interpreted

as parameters (for a plant, for a design, etc.), but in combinatorial optimisation problems, these numbers can actually represent configurations, choices, schedules, paths, and

so on. Anything that can be represented on a digital computer can also be represented in a GA using a binary representation. This is why, at least in principle, GAs have a really broad applicability. However, other nonbinary representations are available, which may be more suitable, for example, for problems with real-valued parameters.


constitutes a good initial set of choices/parameters for adult

individuals (tentative solutions to a problem), the chromosomes to be manipulated by the GA are normally initialised

in an entirely random manner. That is, the initial population

is a set of random binary strings or of random real-valued

vectors.
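As a small illustration of random initialisation, the sketch below builds both kinds of initial population. The function names and the per-gene `bounds` argument are hypothetical and introduced only for this example.

```python
import random


def random_binary_population(pop_size, n_bits, rng):
    """Return pop_size chromosomes, each a list of n_bits random bits."""
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]


def random_real_population(pop_size, bounds, rng):
    """Return pop_size real-valued vectors.

    bounds is a list of (low, high) pairs, one per gene; each gene is
    drawn uniformly at random within its own interval.
    """
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]


rng = random.Random(42)
binary_pop = random_binary_population(pop_size=6, n_bits=10, rng=rng)
real_pop = random_real_population(pop_size=6, bounds=[(-1.0, 1.0), (0.0, 5.0)], rng=rng)
```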

[Figure 1(a), one-point crossover: parents 101010|1010 and 111000|1110 are cut at a common crossover point and produce offspring 101010|1110 and 111000|1010.]

Selection is the operation by which individuals (i.e., their

chromosomes) are selected for mating or cloning. To emulate

natural selection, individuals with a higher fitness should be

selected with higher probability. There are many models of

selection, some of which, despite fitting well the biologically

inspired computational model and producing effective results, are not biologically plausible. We briefly describe them

below.

Fitness proportionate selection, besides being the most direct translation into the computational model of the probabilistic principles of evolution, is probably the most widely

used selection scheme. This works as follows. Let $N$ be the population size, $f_i$ the fitness of individual $i$, and $\bar{f} = (1/N)\sum_j f_j$ the average population fitness. Then, in fitness

proportionate selection, individual $i$ is selected for reproduction with a probability

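$$p_i = \frac{f_i}{\sum_j f_j} = \frac{f_i}{\bar{f} N}. \tag{1}$$

[Figure 1(b), two-point crossover: parents 10|1010|1010 and 11|1000|1110 produce offspring 11|1010|1110 and 10|1000|1010.]

[Figure 1(c), uniform crossover: parents 1010101010 and 1110001110 produce offspring 1110101110.]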

shrink, so N individuals have to be selected for reproduction.

Therefore, the expected number of selected copies of each individual is

$$N_i = p_i N = \frac{f_i}{\bar{f}}. \tag{2}$$

[Figure 1: (a) one-point crossover, (b) two-point crossover, (c) uniform crossover.]

to be selected more than once for mating or cloning, while

individuals below the average tend not to be used.
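Equations (1) and (2) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, fitness values are assumed to be strictly positive, and the sampling relies on `random.choices` from the Python standard library.

```python
import random


def selection_probabilities(fitnesses):
    """Equation (1): p_i = f_i / sum_j f_j."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def expected_copies(fitnesses):
    """Equation (2): N_i = p_i * N = f_i / mean fitness."""
    mean = sum(fitnesses) / len(fitnesses)
    return [f / mean for f in fitnesses]


def roulette_select(population, fitnesses, rng):
    """Draw N individuals, each with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=len(population))


fitnesses = [1.0, 2.0, 3.0, 2.0]
probs = selection_probabilities(fitnesses)   # [0.125, 0.25, 0.375, 0.25]
copies = expected_copies(fitnesses)          # [0.5, 1.0, 1.5, 1.0]
mating_pool = roulette_select(["a", "b", "c", "d"], fitnesses, random.Random(1))
```

In this toy population, the individual with fitness 3.0 is expected to be selected 1.5 times per generation, while the individual with fitness 1.0 is expected to be selected only 0.5 times.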

Tournament selection, instead, works as follows. To select

an individual, first a group of $T$ ($T \geq 2$) random individuals

is created. Then the individual with the highest fitness in the

group is selected, the others are discarded (tournament).
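A tournament can be sketched in a few lines. Here the group is drawn with replacement, which is an assumption of this example (drawing without replacement is equally possible), and the function name is hypothetical.

```python
import random


def tournament_select(population, fitnesses, t, rng):
    """Pick t random contenders and return the fittest of them."""
    contenders = [rng.randrange(len(population)) for _ in range(t)]
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


rng = random.Random(7)
population = ["a", "b", "c", "d"]
fitnesses = [1.0, 2.0, 3.0, 2.0]
chosen = tournament_select(population, fitnesses, t=2, rng=rng)
```

Larger values of `t` increase the selection pressure, because weak individuals win only when no stronger individual happens to be in their group.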

Another alternative is rank selection where individuals are

first sorted (ranked) on the basis of their fitness so that

if an individual $i$ has fitness $f_i > f_j$, then its rank is $i < j$.

Then each individual is assigned a probability of being selected $p_i$ taken from a given distribution (typically a monotonic

rapidly decreasing function), with the constraint that

$\sum_i p_i = 1$.
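The sketch below illustrates rank selection with one possible choice of distribution: a geometric decay in which each rank is `decay` times as likely as the rank before it. The decay law, its default value, and the function names are assumptions of this example; any monotonically decreasing distribution normalised to sum to one would fit the description above.

```python
import random


def rank_probabilities(n, decay=0.5):
    """Selection probability for ranks 0..n-1 (rank 0 is the fittest).

    The weight of rank r is decay**r, normalised so that the
    probabilities sum to 1.
    """
    weights = [decay ** r for r in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def rank_select(population, fitnesses, rng, decay=0.5):
    """Sort individuals by fitness, then sample one according to its rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    probs = rank_probabilities(len(population), decay)
    pick = rng.choices(order, weights=probs, k=1)[0]
    return population[pick]


probs = rank_probabilities(4)
chosen = rank_select(["a", "b", "c", "d"], [1.0, 2.0, 3.0, 2.5], random.Random(3))
```

Because only the ordering matters, the selection probabilities are unaffected by the absolute scale of the fitness values.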

2.3. Operators

EAs work well only if their genetic operators allow an efficient

and effective search of the space of tentative solutions.

One desirable property of recombination operators is

to guarantee that two parents sharing a useful common

characteristic always transmit such a characteristic to their

that different characteristics distinguishing two parents may

be all inherited by their offspring. For binary GAs, there are

many crossover operators with these properties.

One-point crossover, for example, aligns the two parent chromosomes (bit strings), then cuts them at a randomly chosen common point and exchanges the right-hand

side (or left-hand side) subchromosomes (see Figure 1a).

In two-point crossover, chromosomes are cut at two randomly chosen crossover points and their ends are swapped

(see Figure 1b). A more modern operator, uniform crossover,

builds the offspring, one bit at a time, by randomly selecting one of the corresponding bits from the parents (see

Figure 1c).

Normally, crossover is applied to the individuals of a population with a constant probability $p_c$ (often $p_c \in [0.5, 0.8]$).

Cloning is then applied with a probability $1 - p_c$ to keep the

number of individuals in each generation constant.
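The three crossover operators described above can be sketched as follows, with chromosomes stored as strings of `0` and `1`. The function names are hypothetical, and the cut points are passed explicitly so that the examples are reproducible; in a GA they would be drawn at random.

```python
import random


def one_point(p1, p2, cut):
    """Swap the right-hand sides of two aligned parents at position cut."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def two_point(p1, p2, a, b):
    """Cut both parents at positions a < b and swap their ends."""
    return p2[:a] + p1[a:b] + p2[b:], p1[:a] + p2[a:b] + p1[b:]


def uniform(p1, p2, rng):
    """Build one offspring by picking each bit from either parent at random."""
    return "".join(rng.choice(pair) for pair in zip(p1, p2))


p1, p2 = "1010101010", "1110001110"
one_point_children = one_point(p1, p2, cut=6)
two_point_children = two_point(p1, p2, a=2, b=6)
uniform_child = uniform(p1, p2, random.Random(0))

# A random cut point for a chromosome of length n can be drawn with
# rng.randrange(1, n), so that both sides of the cut are non-empty.
```

Note that all three operators leave unchanged any bit on which the two parents agree, which is the first of the desirable properties of recombination mentioned above.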

Mutation is the second main genetic operator used in

GAs. A variety of mutation operators exist. Mutation typically consists of making (usually small) alterations to the

[Figure 2: Bitwise mutation in binary GAs: the bit at the mutation site of 101 0 101010 is inverted, giving 101 1 101010.]

often applied to the individuals produced by crossover and

cloning before they are added to the new population. In binary chromosomes, mutation often consists of inverting random bits of the genotypes (see Figure 2). The main goal with

which mutation is applied is preservation of diversity, which

helps GAs to explore as much of the search space as possible. However, due to its random nature, mutation may have

disruptive effects on evolution if it occurs too often. Therefore, in GAs, mutation is usually applied to genes with a very

low probability.
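Bit inversion with a low per-gene probability can be sketched as below. The function name and the rate of 0.01 in the usage line are illustrative assumptions.

```python
import random


def bit_flip(chromosome, p_mutation, rng):
    """Invert each bit independently with probability p_mutation."""
    return [1 - g if rng.random() < p_mutation else g for g in chromosome]


parent = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
mutant = bit_flip(parent, p_mutation=0.01, rng=random.Random(5))
```

With a rate this low, most offspring pass through unchanged, which keeps the disruptive effect of mutation small while still reintroducing lost bit values from time to time.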

In real-valued GAs, chromosomes have the form $x = (x_1, x_2, \ldots)$, where each gene $x_i$ is represented by a floating-point number. In these GAs, crossover is often seen as an interpolation process in a multidimensional Euclidean space.

So, the components of the offspring $o$ are calculated from the

corresponding components of the parents $p'$ and $p''$ as follows:

(3)


(see Figure 3a). Alternatively, crossover can be seen as the exploration of a multidimensional hyperparallelepiped defined by

the parents (see Figure 3b), that is, the components oi are

chosen uniformly at random within the intervals

(4)


Mutation is often seen as the addition of a small random variation (e.g., Gaussian noise) to a point in a multidimensional

space (see Figure 3c).
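The real-valued operators can be sketched as follows. This is an illustration under stated assumptions: the interpolation is written as a convex combination with a weight `alpha` (the midpoint for `alpha = 0.5`), the box crossover takes each interval to run between the corresponding components of the two parents, and all function names are hypothetical.

```python
import random


def interpolation_crossover(p1, p2, alpha=0.5):
    """Offspring on the segment joining the parents (midpoint by default)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]


def box_crossover(p1, p2, rng):
    """Offspring drawn uniformly inside the box spanned by the parents."""
    return [rng.uniform(min(a, b), max(a, b)) for a, b in zip(p1, p2)]


def gaussian_mutation(x, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each gene."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


rng = random.Random(11)
parent_a, parent_b = [0.0, 2.0, 5.0], [2.0, 4.0, 1.0]
midpoint = interpolation_crossover(parent_a, parent_b)   # [1.0, 3.0, 3.0]
boxed = box_crossover(parent_a, parent_b, rng)
mutated = gaussian_mutation(midpoint, sigma=0.1, rng=rng)
```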

Figure 3: (a), (b) crossover operators and (c) mutation for real-valued GAs.

As mentioned before, the principles on which GAs are based

are also shared by many other EAs. However, the use of different representations and operators has led to the development of a number of paradigms, each having its own peculiarities. With no pretence of being exhaustive, in the following, we will briefly mention those paradigms, other than

GAs, that are used in the papers included in this special

issue.

Genetic programming [2, 6] is a variant of GA in which

the individuals being evolved are syntax trees, typically representing computer programs. The trees are created using user-defined primitive sets, which typically include input variables, constants, and a variety of functions or instructions.

crossover and mutation that guarantee the syntactic validity of the offspring. The fitness of the individual trees in the

population is evaluated by running the corresponding programs (typically multiple times, for different values of their

input variables).
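To make the idea of running a syntax tree concrete, the sketch below represents a program as nested tuples and evaluates it on several input values. The tuple encoding, the three-function primitive set, the variable name `x`, and the error-based fitness are all assumptions made for illustration.

```python
import operator

# Hypothetical primitive set: three binary arithmetic functions.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, env):
    """Run a syntax tree: tuples are function calls, strings are input
    variables looked up in env, and anything else is a constant."""
    if isinstance(tree, tuple):
        name, *args = tree
        return FUNCTIONS[name](*(evaluate(arg, env) for arg in args))
    if isinstance(tree, str):
        return env[tree]
    return tree


def fitness(tree, cases):
    """Negated total absolute error over (input, target) pairs; higher is better."""
    return -sum(abs(evaluate(tree, {"x": x}) - target) for x, target in cases)


# The tree below encodes the program x * x + 1.
program = ("add", ("mul", "x", "x"), 1)
cases = [(x, x * x + 1) for x in range(-2, 3)]
score = fitness(program, cases)
```

A tree that reproduces every target exactly obtains the best possible score of zero under this fitness measure.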

Evolution strategies [3, 4] are real-valued EAs where

mutation is the key variation operator (unlike GAs where

crossover plays that role). Mutation typically consists of

adding zero-mean Gaussian deviates to the individuals being optimised, with the mutation standard deviation being

varied dynamically so as to maximise the performance of the

algorithm.
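One simple way to vary the mutation standard deviation dynamically is the one-fifth success rule applied to a (1+1) evolution strategy, sketched below for a minimisation problem. This particular adaptation scheme is chosen here only as an example; the window of 20 mutations and the factor 1.5 are arbitrary illustrative values, and the function name is hypothetical.

```python
import random


def one_plus_one_es(objective, x0, sigma=1.0, iterations=200, seed=0):
    """Minimise objective with a (1+1)-ES and one-fifth-rule step control."""
    rng = random.Random(seed)
    x, fx = list(x0), objective(x0)
    successes = 0
    for t in range(1, iterations + 1):
        # Mutation: add zero-mean Gaussian deviates to every component.
        y = [xi + rng.gauss(0.0, sigma) for xi in x]
        fy = objective(y)
        if fy < fx:
            # The offspring replaces the parent only if it is better.
            x, fx = y, fy
            successes += 1
        if t % 20 == 0:
            # One-fifth rule: widen the search if more than 1/5 of the
            # last 20 mutations succeeded, narrow it if fewer did.
            if successes > 4:
                sigma *= 1.5
            elif successes < 4:
                sigma /= 1.5
            successes = 0
    return x, fx, sigma


def sphere(v):
    """Toy objective: sum of squares, minimised at the origin."""
    return sum(vi * vi for vi in v)


start = [3.0, -2.0, 1.0]
solution, value, final_sigma = one_plus_one_es(sphere, start)
```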


Artificial immune systems (see [7, Part III, Chapters 10-13] or [8] for an extensive introduction) are distributed computational systems inspired by biological immune systems,

which can recognise patterns and can remember previously

seen patterns in an efficient and effective way. These systems

are very close relatives of EAs (sometimes involving an evolutionary process in their inner mechanics) although they use

a different biological metaphor.

3.

In their paper entitled Blind search for optimal Wiener equalizers using an artificial immune network model, Attux et al.

exploit recent advances in the field of artificial immune systems to obtain optimum equalisers for noisy communication

channels, using a technology that does not require the availability of clean samples of the input signal. This approach is

very successful in a variety of test equalisation problems. The

approach is also compared with a more traditional EA, a GA

with niching, showing superior performance.

The paper by Dunn and Olague, entitled Evolutionary

computation for sensor planning, shows how well-designed

evolutionary computation techniques can solve the problem

of optimally specifying sensing tasks for a workcell provided

with multiple manipulators and cameras. The problem is

NP-hard, effectively being a composition of a set partitioning problem and multiple traveling salesperson problems.

Nonetheless, thanks to clever representations and the use of

evolutionary search, this system is able to solve the problem,

providing solutions of quality very close to that of the solutions obtained via exhaustive search, but in a tiny fraction of

the time.

The paper entitled An evolutionary approach for joint blind

multichannel estimation and order detection by Fangjiong et

al. presents a method for the detection of the order and

the estimation of the parameters of a single-input multiple-output channel. The method is based on a hybrid GA with

specially designed operators. The method shows performances comparable with existing closed-form approaches

which, however, are much more restricted in that they either

assume that the channel order is known or treat the problems

of order estimation and parameter estimation separately.

In Application of evolution strategies to the design of tracking filters with a large number of specifications, Herrero et

al. attack the problem of tracking civil aircraft from radar

information within the extremely tight performance constraints imposed by a civil aviation standard. They use interactive multiple mode filters optimised by using an ES and

a multiobjective optimisation approach, obtaining a high-performance aircraft tracker.

Making EAs more at hand and easy to apply for general

practitioners by self-tuning their parameters is one of the

main aims with which Pignalberi et al. developed GASE, a

GA-based tool for range image segmentation. The system,

along with some practical results, is described in the paper Tuning range image segmentation by genetic algorithm. A

multiobjective fitness function is adopted to take into consideration problems that are typically encountered in range


image segmentation.

The paper Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness

calculation describes the use of GAs to estimate the control parameters for a widely used plucked string synthesis

model. Using GAs, Riionheimo and Välimäki have been able

to automate parameter extraction, which had been formerly

achieved only through semiautomatic approaches, obtaining

comparable results, both in quantitative and in qualitative

terms. An interesting feature of the approach is the inclusion of knowledge about perceptual properties of the human

hearing system into the fitness function.

Schell and Uhl compare results obtained with a GA-based approach to the near-best-basis (NBB) algorithm, a well-known suboptimal algorithm for wavelet packet decomposition. In their paper Optimization and assessment of wavelet packet decompositions with evolutionary computation, they highlight the problem of finding good cost functions in terms of correlation with actual image quality. They show that GAs provide lower-cost solutions that nevertheless yield lower-quality images than NBB.

In the paper entitled On the use of evolutionary algorithms

to improve the robustness of continuous speech recognition

systems in adverse conditions, Selouani and O'Shaughnessy

show how a GA can tune a system based on state-of-the-art

speech recognition technology so as to maximise its recognition accuracy in the presence of severe noise. This hybrid

of evolution and conventional signal processing algorithms

amply outperforms nonadaptive systems. The EA used is a

GA with real-coded representation, rank selection, a heuristic type of crossover, and a nonuniform mutation operator.

The paper Evolutionary techniques for image processing a large dataset of early Drosophila gene expression by Spirov and Holloway describes an evolutionary approach to processing confocal microscopy images of patterns of activity for genes governing early Drosophila development. The problem is approached using plain GAs, a simplex approach, and a hybrid of the two.

The use of GAs to track time-varying systems based on

recursive models is tackled in A comparison of evolutionary

algorithms for tracking time-varying recursive systems. The paper first compares a plain GA with a GA variant, called random immigrant strategy, showing that the latter performs

better in tracking time-varying systems even if it has problems with fast-varying systems. Finally, a hybrid combination

of GAs and local search that is able to tackle even such hard

tasks is proposed.

Zhang et al., in their paper A domain-independent window approach to multiclass object detection using genetic programming, propose an interesting approach in which GP is

used to both detect and localise features of interest. The approach is compared with a neural network classifier, used

as reference, showing that GP evolved programs can provide significantly lower false-alarm rates. Within the proposed approach, the choice of the primitive set is also discussed, comparing results obtained with two different sets:

one comprises only the four basic arithmetical operators, and


reported in the paper provide interesting clues to practitioners that would like to use GP to tackle image processing tasks.

4. CONCLUSIONS

will enjoy reading the papers in this special issue as we did

ourselves. We hope that the broadness of domains to which

EAs can be applied, demonstrated by the contents of this issue, will convince other researchers in image analysis and

signal processing to get acquainted with the exciting world

of evolutionary computation and to apply its powerful techniques to solve important new and old problems in these

areas.


APPENDIX

POINTERS TO FURTHER READING IN GEC

1. David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts, 1989. A classic book on genetic algorithms and classifier systems.

2. David E. Goldberg. The Design of Innovation: Lessons from and for Competent Genetic Algorithms. Kluwer Academic Publishers, Boston, 2002. An excellent, long-awaited follow-up of Goldberg's first book.

3. Melanie Mitchell, An introduction to genetic algorithms, A Bradford Book, MIT Press, Cambridge, MA, 1996. A good introduction to genetic algorithms.

4. John H. Holland, Adaptation in Natural and Artificial Systems, second edition, A Bradford Book, MIT Press, Cambridge, MA, 1992. Second edition of a classic from the inventor of genetic algorithms.

5. Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, vol. 1, no. 1, pp. 1-23, 1993. A good introduction to parameter optimisation using EAs.

6. T. Bäck, D. B. Fogel and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Institute of Physics Publishing, 2000. A modern introduction to evolutionary algorithms. Good both for novices and more expert readers.

7. John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992. The bible of genetic programming by the founder of the field. Followed by GP II (1994), GP III (1999), and GP IV (forthcoming).

8. Wolfgang Banzhaf, Peter Nordin, Robert E. Keller and Frank D. Francone, Genetic Programming: An Introduction; On the Automatic Evolution of Computer Programs and its Applications, Morgan Kaufmann, 1998. An excellent textbook on GP.

9. W. B. Langdon and Riccardo Poli, Foundations of Genetic Programming, Springer, February 2002. The only book entirely devoted to the theory of GP and its relations with the GA theory.

10. Proceedings of the International Conference on Genetic Algorithms (ICGA). ICGA is the oldest conference on EAs.

11. Proceedings of the Genetic Programming Conference. This was the first conference entirely devoted to GP.

12. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). Born in 1999 from the recombination of ICGA and the GP conference mentioned above, GECCO is the largest conference in the field.

13. Proceedings of the Foundations of Genetic Algorithms (FOGA) Workshop. FOGA is a biennial, small but very prestigious and highly selective workshop. It is mainly devoted to the theoretical foundations of EAs.

14. Proceedings of the Congress on Evolutionary Computation (CEC). CEC is a large conference under the patronage of IEEE.

15. Proceedings of Parallel Problem Solving from Nature (PPSN). This is a large biennial European conference, probably the oldest of its kind in Europe.

16. Proceedings of the European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP). This is a small workshop, reaching its fifth edition in 2003. It is the only event worldwide uniquely devoted to the research topics covered by this special issue.

17. Proceedings of the European Conference on Genetic Programming. EuroGP was the first European event entirely devoted to GP. Run as a workshop in 1998 and 1999, it became a conference in 2000. It has now reached its sixth edition with EuroGP 2003 held at the University of Essex. Currently, this is the largest event worldwide solely devoted to GP.

ACKNOWLEDGMENTS

The guest editors would like to thank Professor David E.

Goldberg for his insightful foreword, the former and present

editors-in-chief of EURASIP JASP, Professor K. J. Ray Liu

and Professor Marc Moonen, for their support in putting together this special issue, and all the reviewers who have generously devoted their time to help ensure the highest possible

quality for the papers in this volume. All the authors of the

manuscripts who have contributed to this special issue are

also warmly thanked.

Riccardo Poli

Stefano Cagnoni

REFERENCES

[1] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.

[2] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, Mass, USA, 1992.

[3] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart, Germany, 1973.

[4] H.-P. Schwefel, Numerical Optimization of Computer Models,

Wiley, Chichester, UK, 1981.

[5] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press,

Cambridge, Mass, USA, 1996.

[6] W. B. Langdon and R. Poli, Foundations of Genetic Programming, Springer-Verlag, New York, NY, USA, 2002.

[7] D. Corne, M. Dorigo, and F. Glover, Eds., New Ideas in Optimization, McGraw-Hill, London, UK, 1999.

[8] D. Dasgupta, Ed., Artificial Immune Systems and Their Applications, Springer-Verlag, New York, NY, USA, 1999.

Riccardo Poli received a Ph.D. degree in

bioengineering (1993) from the University

of Florence, Italy, where he worked on image analysis, genetic algorithms, and neural networks until 1994. From 1994 to 2001,

he was a lecturer and then a reader in

the School of Computer Science of the

University of Birmingham, UK. In 2001,

he became a Professor at the Department

of Computer Science of the University of

Essex, where he founded the Natural and Evolutionary Computation Group. Professor Poli has published around 130 papers on

evolutionary algorithms, genetic programming, neural networks,

and image analysis and signal processing, including the book Foundations of Genetic Programming (Springer, 2002). He has been

Cochair of EuroGP, the European Conference on GP, in 1998,

1999, 2000, and 2003. He was Chair of the GP theme at the Genetic and Evolutionary Computation Conference (GECCO) 2002

and Cochair of the Foundations of Genetic Algorithms Workshop

(FOGA) 2002. He will be General Chair of GECCO 2004. Professor Poli is an Associate Editor of Evolutionary Computation (MIT

Press) and Genetic Programming and Evolvable Machines (Kluwer),

a reviewer for 12 journals, and has been a programme committee

member of 40 international events.

Stefano Cagnoni has been an Assistant Professor in the Department of Computer Engineering of the University of Parma since

1997. He received the Ph.D. degree in bioengineering in 1993. In 1994, he was a

Visiting Scientist at the Whitaker College

Biomedical Imaging and Computation Laboratory at the Massachusetts Institute of

Technology. His main research interests are

in computer vision, evolutionary computation, and robotics. As a member of EvoNet, the European Network of Excellence in Evolutionary Computation, he has chaired

the EvoIASP working group on evolutionary computation in image analysis and signal processing, and the corresponding workshop since its first edition in 1999. He is a reviewer for several journals and a programme committee member of several international

events.

© 2003 Hindawi Publishing Corporation

an Artificial Immune Network Model

Romis Ribeiro de Faissol Attux

DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil

Email: romisri@decom.fee.unicamp.br

DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil

Email: mloiola@decom.fee.unicamp.br

Ricardo Suyama

DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil

Email: rsuyama@decom.fee.unicamp.br

DCA, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil

Email: lnunes@dca.fee.unicamp.br

DCA, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil

Email: vonzuben@dca.fee.unicamp.br

Joao

DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil

Email: romano@decom.fee.unicamp.br

Received 28 June 2002 and in revised form 1 December 2002

This work proposes a framework to determine the optimal Wiener equalizer by using an artificial immune network model together

with the constant modulus (CM) cost function. This study was primarily motivated by recent theoretical results concerning the

CM criterion and its relation to the Wiener approach. The proposed immune-based technique was tested under different channel

models and filter orders, and benchmarked against a procedure using a genetic algorithm with niching. The results demonstrated

that the proposed strategy has a clear superiority when compared with the more traditional technique. The proposed algorithm

presents interesting features from the perspective of multimodal search, being capable of determining the optimal Wiener equalizer

in most runs for all tested channels.

Keywords and phrases: blind equalization, constant modulus algorithm, evolutionary computation, artificial immune systems,

immune network model.

1. INTRODUCTION

studied blind equalization technique. The last 20 years have

seen the proposal of many relevant works scrutinizing the basis of the CM criterion and its relation to other criteria.

These works pointed out two aspects that deserve to be

highlighted [3, 4]:

(1) the CM cost function is multimodal;

and some Wiener optima.

In particular, the literature indicates a one-to-one relationship between the best Wiener solutions and the minima

of the CM criterion.

From these considerations, it is possible to make a strong

claim: if one can determine the CM global minima, then the

best possible Wiener receiver can also be evaluated.

This suggestion opens an exciting perspective: the possibility of obtaining the best equalizer (in the mean square error

sense) without a desired signal, that is, by using a blind or unsupervised search strategy. To achieve this goal, it is necessary

to propose a method capable of locating, over a set of local minima, the best CM minimum in most of the runs performed by the algorithm. Evolutionary algorithms (EAs) are

particularly suitable to determine the optimal Wiener equalizer because they present a high capability of performing

an exploratory search when a priori knowledge is not available.

This paper proposes to apply the optimization version of an artificial immune network model, named opt-aiNet [5], to the problem of determining the optimal Wiener solution. By combining the CM criterion with the opt-aiNet algorithm, this paper introduces a novel framework (CM + opt-aiNet) to obtain the optimal receiver.

Different channel models and filter orders were used to evaluate the potential for finding the global Wiener minimum. In some cases, the proposed strategy was compared with an approach based on genetic algorithms with niching [6], which had already proved to be a valuable tool for this problem and therefore serves as a benchmark for the proposed technique. In all cases, the obtained results validated the framework, demonstrating that it is possible to find the optimal equalizer for a given channel by using a powerful blind search technique.

The paper is organized as follows. Section 2 presents

some theoretical considerations on the equivalence between

the CM minima and Wiener solutions, a cornerstone of this

work. Section 3 introduces the immunologically inspired algorithm, named opt-aiNet, and places it in the context of

other search techniques, with particular emphasis on EAs.

Section 4 presents the simulation results and discusses the

performance of the algorithm by comparing it with a genetic

algorithm with niching. The final remarks and future trends

are presented in Section 5.

2.

The main goal of communications engineering is to provide adequate message interchange, through a certain channel, between a transmitter and a receiver. Nevertheless, the channels introduce distortion in the transmitted message, which usually leads to severe degradation. A device named equalizer filters the received signal in order to recover the desired information. Figure 1 depicts the schematic channel and equalizer representation in a communication system, together with their respective input and output signals.

From Figure 1, it can be inferred that the main goal of the equalizer is to obtain an output signal as similar as possible to the transmitted signal, except for a gain K and a delay d, that is,

$$ y(n) = K s(n - d), \tag{1} $$

which is the zero-forcing (ZF) condition.

In most applications, the equalizer is implemented using

a finite impulse response (FIR) filter, which is a mathemati-

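Figure 1: Channel and equalizer in a communication system, with transmitted signal s(n), received signal x(n), and equalizer output y(n).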

relationship is given by

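$$ y(n) = \mathbf{w}^{T} \mathbf{x}(n), \tag{2} $$

where $\mathbf{w}$ is the vector of the $L$ filter coefficients and $\mathbf{x}(n) = [x(n)\; x(n-1)\; \cdots\; x(n-L+1)]^{T}$ is the input vector.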
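The input-output relationship in (2) can be sketched numerically as follows. This is only an illustration: the sample values, the two-tap filter, and the choice of zero for samples before the start of the record are assumptions, not values from the paper.

```python
import numpy as np

def regressor(x, n, L):
    """Build x(n) = [x(n), x(n-1), ..., x(n-L+1)]^T, using zeros before the first sample."""
    return np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])

def equalizer_output(w, x):
    """Compute y(n) = w^T x(n) for every time index n."""
    L = len(w)
    return np.array([np.dot(w, regressor(x, n, L)) for n in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical received samples
w = np.array([0.5, -0.25])           # hypothetical 2-coefficient equalizer
y = equalizer_output(w, x)
print(y)
```

The same output can be obtained with `np.convolve(x, w)[:len(x)]`, which is the form used when the whole record is filtered at once.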

Consequently, the central problem is to adjust the vector w in order to obtain a good equalization condition, that is, a condition as close as possible to the ZF condition (1). If a priori knowledge of the channel impulse response is available, the task becomes purely mathematical. When this is not the case, it is necessary to determine a suitable optimization criterion.

When information about the transmitted signal is, at

least for some time, at hand, it is possible to make use of the

Wiener criterion, based on the following mean square error

(MSE) cost function:

$$ J_{W} = E\left\{ \left| s(n - d) - y(n) \right|^{2} \right\}. \tag{3} $$

delay is known a priori, JW has a single minimum, named

the Wiener solution. As a rule, each Wiener solution possesses

a distinct MSE. This accounts for an important assertion: if

the equalization delay is a free parameter of (3), then JW has

several minima (multiple local optima). Among these many

optima, there is, usually, a single optimal Wiener solution, associated with an optimal delay.
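The dependence of the Wiener solution on the delay can be made concrete with a small sketch. It assumes a known FIR channel, real-valued i.i.d. symbols of unit variance, and a small white-noise variance; the channel taps and the function name `wiener_solutions` are hypothetical and serve only as an illustration.

```python
import numpy as np

def wiener_solutions(h, L, noise_var=1e-3):
    """Return (delay, w, mse) for every delay d, assuming unit-variance i.i.d. real symbols."""
    h = np.asarray(h, dtype=float)
    M = len(h)
    # Channel convolution matrix: x(n) = H s(n), with s(n) = [s(n), ..., s(n-L-M+2)]^T
    H = np.zeros((L, L + M - 1))
    for i in range(L):
        H[i, i:i + M] = h
    R = H @ H.T + noise_var * np.eye(L)      # autocorrelation matrix of x(n)
    results = []
    for d in range(L + M - 1):
        p = H[:, d]                          # cross-correlation E{x(n) s(n-d)}
        w = np.linalg.solve(R, p)            # Wiener solution for this delay
        mse = 1.0 - p @ w                    # minimum of J_W for this delay
        results.append((d, w, mse))
    return results

h = [1.0, 0.6]                               # hypothetical channel, not from the paper
for d, w, mse in wiener_solutions(h, L=3):
    print(f"d = {d}: MSE = {mse:.4f}")
```

Each delay yields its own solution and its own residual MSE, and the delay with the smallest MSE identifies the optimal Wiener solution under these assumptions.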

As can be deduced from the comparison between (1) and

(3), the Wiener criterion is strongly related to the ZF condition. Hence, the determination of the optimal Wiener solution is very important and has a great practical appeal. However, there are two main difficulties: the use of samples of the

transmitted signal and the choice of d.

The drawback associated with the dependence on a pilot

signal was the main motivation behind the proposal of blind

techniques, that is, criteria which do not make use of samples

of s(n). Among these, the CM criterion has received special

attention in the last twenty years. Its cost function is given by

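$$ J_{CM} = E\left\{ \left( R_{2} - |y(n)|^{2} \right)^{2} \right\}, \tag{4} $$

where

$$ R_{2} = \frac{E\left\{ |s(n)|^{4} \right\}}{E\left\{ |s(n)|^{2} \right\}}. \tag{5} $$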

except in some trivial cases. Recent works [3, 4] have pointed

in the direction of an intimate relationship between these


the core of the CM part of the framework proposed here.
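As a worked instance of the constant $R_2$ in (5), consider two real-valued alphabets with equiprobable symbols. The alphabets are chosen for illustration only and are not stated to be those used in the simulations.

```latex
% Binary alphabet, s(n) \in \{-1, +1\}
R_2 = \frac{E\{|s(n)|^4\}}{E\{|s(n)|^2\}} = \frac{1}{1} = 1

% Four-level alphabet, s(n) \in \{-3, -1, +1, +3\}
E\{|s(n)|^2\} = \frac{1 + 9}{2} = 5, \qquad
E\{|s(n)|^4\} = \frac{1 + 81}{2} = 41, \qquad
R_2 = \frac{41}{5} = 8.2
```

In the binary case, any output satisfying $|y(n)| = 1$ for all $n$, such as $y(n) = \pm s(n - d)$, makes the CM cost in (4) vanish, which shows why the criterion needs no samples of the transmitted signal.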

2.1. Relationship between CM minima and Wiener optima

The rationale of this work is to find an optimal method for

the design of blind equalizers. Since the notion of optimality

can be related to the concept of supervised adaptive filtering,

it is important to discuss the relationship between Wiener

and CM minima. This discussion relies on the following assumptions.

Assume that the best Wiener solutions are close to the

best CM minima so that each minimum of the former class

can be achieved from a minimum of the latter class through

a simple steepest descent algorithm (described further below). Therefore, finding the CM global optimum is equivalent to determining the optimal Wiener solution. We will always assume that there is at least one good Wiener solution, that is, one that provides perfect recovery in the absence of noise. This assumption is unreasonable only in a few particular cases (e.g., when there is a channel zero at 1).

The main result of this claim is that it becomes feasible

to determine the best possible equalizer without supervision,

that is, by using a blind search strategy.

The key to accomplish such a demanding task on the CM

cost function is to use strategies capable of performing not

only global search but also multimodal search, such as EAs

with niching and the immunologically inspired technique to

be discussed in the next section.

Therefore, it is important to choose a method capable of

providing a good balance between exploration and exploitation of the search space. This balance allows for the algorithm

to exploit specific portions of the search space without compromising its global search potentialities. These features were

found in EAs with niching and a technique inspired by some

theories of how the human immune system works.

The last step of the framework is to refine the CM solution through the decision-directed (DD) algorithm in order to compensate for the inherent difference between this and the Wiener solution. The iterative expression of the DD algorithm is

(6)
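A commonly used form of decision-directed refinement is an LMS-style recursion in which the decision on the equalizer output replaces the unknown transmitted symbol. The sketch below assumes that form, binary symbols in {-1, +1}, and a hypothetical step size `mu`; it is an illustration and is not claimed to reproduce the exact expression used by the authors.

```python
import numpy as np

def dd_refine(w, x, mu=0.01, n_passes=1):
    """LMS-style decision-directed refinement for binary (+1/-1) symbols (assumed form)."""
    w = np.array(w, dtype=float)               # work on a copy of the initial solution
    L = len(w)
    for _ in range(n_passes):
        for n in range(L - 1, len(x)):
            xn = x[n - L + 1:n + 1][::-1]      # [x(n), x(n-1), ..., x(n-L+1)]
            y = w @ xn                         # equalizer output
            decision = 1.0 if y >= 0 else -1.0 # nearest symbol of the alphabet
            w += mu * (decision - y) * xn      # move the output toward the decision
    return w

# Hypothetical usage: mild channel, starting from an equalizer whose decisions are reliable
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=5000)
x = np.convolve(s, [1.0, 0.3])[:len(s)]
w_start = np.array([1.0, 0.0])
w_refined = dd_refine(w_start, x)
```

The refinement is only meaningful when the starting point already yields mostly correct decisions, which is the role played by the CM solution in the framework.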

In a previous work, the same task has been performed

using a genetic algorithm with niching, and good results were

reported [6], which will serve as a basis for comparison in

Section 4.

3.

AND AN IMMUNE NETWORK MODEL

Together with many other bodily systems, such as the nervous and the endocrine systems, the immune system plays a major role in maintaining life. Its primary functions are to defend the body against foreign invaders (e.g., viruses, bacteria, and fungi) and to eliminate malfunctioning self cells and debris.

system gave rise to immunology, a science approximately 200 years old. More recently, however, computer

scientists and engineers have found several interesting theories concerning the immune system and its functioning that

could be very helpful in the development of artificial systems

and computational tools capable of solving complex problems. The new field of research that emerged from this interdisciplinary research on immunology, computer science,

engineering, and others, is named artificial immune systems

[7].

3.1.

network theories

system works, two were explored in the development of the

algorithm used in this paper: (1) the clonal selection theory

[8] and (2) the immune network theory [9].

According to the clonal selection theory, when a disease-causing agent, named pathogen, enters the organism, a number of immune cells capable of recognizing this pathogen are

stimulated and start replicating themselves. The number of

copies each cell generates is directly proportional to the quality of the recognition of the pathogen, that is, the better a cell

recognizes a pathogen, the more copies of itself will be generated. During this self-replicating process, a mutation event

with high rates also occurs such that the progenies of a single

cell are slight variations of the parent cell. This mutational

process of the immune cells has the remarkable feature of

being inversely proportional to the quality of the pathogenic

recognition; the higher the quality of the recognition, the

smaller the mutation rate, and vice versa.

The clonal selection theory, briefly described above, is

broadly used to explain how the immune system defends the

body against pathogens. With a revolutionary view of the immune system, Jerne [9] proposed a novel theory to explain,

among many other things, how the immune system reacts

against itself. Jerne suggested that the immune cells are naturally capable of recognizing each other, and the immune system thus presents a dynamic behavior even in the absence

of pathogens. When an immune cell recognizes another immune cell, it is stimulated and the recognized cell is suppressed. In the original network theory, the results of stimulation and suppression were not clearly defined. Therefore,

different immune network models present distinct ways of

accounting for network stimulation and suppression.

The discussion to be presented in Section 3.2 is restricted

to the specific artificial immune network model used in this

work, which combines clonal selection with the immune network theory.

3.2.

multimodal search

In [10], de Castro and Von Zuben proposed an artificial immune network model, named aiNet, inspired by the clonal

selection and network theories of the immune system. This

algorithm is demonstrated to be suitable to perform data

compression and clustering with the aid of some statistical

and graph theoretical strategies.

The aiNet adaptation procedure was further improved

in [5], and transformed into an algorithm to perform

multimodal search, named opt-aiNet. Several features of opt-aiNet can be highlighted. (1) It is a population-based search

technique, in which each individual of the population is a

real-valued vector represented according to the problem domain. (2) The size of the population, that is, the number of

individuals in the population, is dynamically adjusted. (3) It

is capable of locating multiple optima by making a balance

between exploitation (through a local search technique based

on clonal selection and expansion) and exploration (through

a dynamic diversity maintenance mechanism).

In a simplified form, the opt-aiNet algorithm can be

summarized with the procedure below.

(1) Initialization. Randomly initialize a population with a small number of individuals.

(2) While stopping criterion is not met, do the following.

    (2.1) Fitness evaluation. Determine the fitness (goodness or quality) of each individual of the population and normalize the vector of fitness.

    (2.2) Replication. Generate a number of copies (offspring) of each individual.

    (2.3) Mutation. Mutate each of these copies inversely proportionally to the fitness of its parent cell, but keep the parent cell. The mutation follows

    $$ c' = c + \alpha N(0, 1), \qquad \alpha = \frac{1}{\beta} \exp(-f^{*}), \tag{7} $$

    where $c'$ is the mutated copy of cell $c$, $N(0, 1)$ is a Gaussian random variable of zero mean and standard deviation $\sigma = 1$, $\beta$ is a parameter that controls the decay of an inverse exponential function, and $f^{*}$ is the fitness of an individual normalized in the interval [0, 1]. A mutation is only accepted if the mutated individual $c'$ is within its range of domain.

    (2.4) Fitness evaluation. Determine the fitness of all new (mutated) individuals of the population.

    (2.5) Selection. For each clone group formed by the parent individual and its mutated offspring, select the individual with highest fitness and calculate the average fitness of the selected population.

    (2.6) Local convergence. If the average fitness of the population is not significantly different from the one at the previous iteration, then continue, else return to step (2.1).

    (2.7) Network interactions. Determine the affinity (degree of similarity measured via the Euclidean distance) of all individuals of the population. Suppress (eliminate) all but the highest fitness of those individuals whose affinities are less than a suppression threshold $\sigma_s$ and determine the number of network individuals, named memory cells, after suppression.

    (2.8) Diversity introduction. Introduce a percentage d% of randomly generated individuals and return to step (2).

(3) EndWhile.
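The steps above can be sketched in Python as follows. This is a simplified illustration and not the authors' Matlab code: the min-max normalization of the fitness, the convergence tolerance `tol`, the fraction `d_frac` of newcomers, the default parameter values, and the names `opt_ainet` and `suppress` are all assumptions made for the example.

```python
import numpy as np

def suppress(cells, fit, sigma_s):
    """Step (2.7): among cells closer than sigma_s, keep only the fittest."""
    kept = []
    for i in np.argsort(-fit):                      # visit the best cells first
        if all(np.linalg.norm(cells[i] - cells[j]) >= sigma_s for j in kept):
            kept.append(i)
    return cells[kept], fit[kept]

def opt_ainet(fitness, dim, lo, hi, n_init=5, n_clones=10, beta=50.0,
              sigma_s=0.35, d_frac=0.4, max_iter=200, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    cells = rng.uniform(lo, hi, size=(n_init, dim))            # step (1)
    fit = np.array([fitness(c) for c in cells])                # step (2.1)
    memory, prev_avg = None, None
    for _ in range(max_iter):                                  # step (2), iteration cap
        span = fit.max() - fit.min()
        f_star = (fit - fit.min()) / span if span > 0 else np.ones_like(fit)
        new_cells, new_fit = [], []
        for c, f_raw, fs in zip(cells, fit, f_star):
            alpha = np.exp(-fs) / beta                         # equation (7)
            clones = c + alpha * rng.standard_normal((n_clones, dim))  # (2.2)-(2.3)
            clones = clones[np.all((clones >= lo) & (clones <= hi), axis=1)]
            group = [(f_raw, c)] + [(fitness(k), k) for k in clones]   # (2.4)
            best_f, best_c = max(group, key=lambda t: t[0])    # (2.5), parent kept
            new_cells.append(best_c)
            new_fit.append(best_f)
        cells, fit = np.array(new_cells), np.array(new_fit)
        avg = fit.mean()
        stable = prev_avg is not None and abs(avg - prev_avg) < tol    # (2.6)
        prev_avg = avg
        if stable:
            cells, fit = suppress(cells, fit, sigma_s)         # (2.7)
            memory = (cells.copy(), fit.copy())                # current memory cells
            n_new = max(1, round(d_frac * len(cells)))         # (2.8)
            newcomers = rng.uniform(lo, hi, size=(n_new, dim))
            cells = np.vstack([cells, newcomers])
            fit = np.concatenate([fit, [fitness(c) for c in newcomers]])
            prev_avg = None
    return memory if memory is not None else suppress(cells, fit, sigma_s)
```

A call such as `opt_ainet(lambda c: float(np.cos(3 * c[0]) * np.cos(3 * c[1])), dim=2, lo=-1.0, hi=1.0)` returns the memory cells and their fitness values. Because each clone group keeps its parent unless a mutated copy is better, the best fitness found never decreases, and because suppression only acts on cells closer than `sigma_s`, distinct optima can coexist in the returned set.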

The original stopping criterion proposed for the algorithm is based on the number of memory cells. After the

network interactions (step (2.7)), a certain number of individuals remain. If this number does not vary from one iteration to the other, then the network is said to have a stable

population size. In such condition, the remaining individuals

are all memory cells corresponding to local optima solutions.

However, in accordance with the classical modus operandi in

adaptive equalization, a maximum number of iterations was

adopted as the stopping criterion.

For a more computational description of the immune algorithm presented, the reader is invited to visit the

website http://www.cs.ukc.ac.uk/people/staff/jt6/aisbook/aisimplementations.htm, from where the original Matlab code

for the opt-aiNet and many other immune algorithms can

be downloaded.

3.3.

simply explained. In steps (2.1) to (2.5), a local search is

being performed based on the clonal selection theory. At

each iteration, a population of individuals is locally optimized through reproduction, affinity-proportional mutation,

and selection (exploitation of the search space). The fact

that no parent individual has a selective advantage over the

others contributes to the multimodal search of the algorithm.

Steps (2.6) to (2.8) check for the convergence of the local

search procedure, eliminate redundant individuals, and introduce diversity in the population. When the initial population reaches a stable state (determined by the stabilization of

its average fitness), the cells interact with each other in a network form, that is, the Euclidean distance between each pair

of individuals is determined, and some of the similar cells

are eliminated to avoid redundancy. In addition, a number

of randomly generated individuals are added to the current

population, allowing for a broader exploration of the search

space, and the process of local optimization restarts in step

(2.1).

To illustrate the behavior of the opt-aiNet algorithm, consider the simple two-dimensional function

$$ f(x, y) = x \sin(4\pi x) - y \sin(4\pi y + \pi) + 1 \tag{8} $$

to be maximized.
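The multimodal character of this function can be checked with a short grid evaluation. The sketch reads (8) as f(x, y) = x sin(4πx) − y sin(4πy + π) + 1 and assumes the domain [-1, 1] × [-1, 1] and a 401-point grid per axis, neither of which is stated in the text.

```python
import numpy as np

def f(x, y):
    """Test function (8): f(x, y) = x sin(4 pi x) - y sin(4 pi y + pi) + 1."""
    return x * np.sin(4 * np.pi * x) - y * np.sin(4 * np.pi * y + np.pi) + 1

# Coarse grid over an assumed domain [-1, 1] x [-1, 1]
grid = np.linspace(-1.0, 1.0, 401)
X, Y = np.meshgrid(grid, grid)
Z = f(X, Y)

# Count interior grid points that are strictly higher than their 8 neighbours
inner = Z[1:-1, 1:-1]
is_peak = np.ones_like(inner, dtype=bool)
for di in (-1, 0, 1):
    for dj in (-1, 0, 1):
        if di or dj:
            is_peak &= inner > Z[1 + di:Z.shape[0] - 1 + di, 1 + dj:Z.shape[1] - 1 + dj]

print("interior peaks on the grid:", int(is_peak.sum()))
print("highest grid value:", float(Z.max()))
```

A search method that converges to a single point can report at most one of these peaks, which is why the function is a convenient test for multimodal search.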

Figure 2a depicts f (x, y) and an initial population of 13

individuals after the local search part of the algorithm was

completed for the first time (steps (2.1) to (2.6)). Note that

all the remaining 13 individuals are positioned in peaks of the

function. Figure 2b depicts the function to be optimized after

the convergence of the algorithm. In this case, nearly all peaks

Figure 2: The opt-aiNet algorithm applied to the function described in (8): (a) after the first local search, (b) after convergence. Axes: x, y, f(x, y).

optima and all local optima of very low values in comparison

with the highest peaks.

3.4. Opt-aiNet and other search techniques

The algorithm described in this paper is most often characterized as an immune algorithm since it is inspired by the immune system. Nevertheless, the similarities between some immune algorithms and EAs are striking and deserve remarks.

EAs can be defined as search and optimization strategies

with their origins and inspiration in the biological processes

of evolution [11]. For an algorithm to be characterized as

evolutionary, it has to present a population of individuals

that are subjected to reproduction, genetic variation, and selection. Therefore, most EAs comprise the following

main steps: (1) reproduction with inheritance, (2) selection,

and (3) genetic variation [12].

If one looks into the clonal selection theory of the immune system, briefly reviewed in Section 3.1 and used as a

of an EA (reproduction, selection, and variation) are embodied in the clonal selection procedure. Steps (2.1) to (2.3) of the

opt-aiNet algorithm correspond to the clonal selection principle of the immune system. These can be likened to a genetic algorithm [13] with no crossover and elitist selection,

or to the evolution strategies originally proposed by Schwefel

[14].

However, it is important to remark that a number of differences exist among them, in addition to their sources of

inspiration. For instance, in opt-aiNet, no coding of the individuals is performed, as in the case of genetic algorithms,

the mutation rate of each individual is inversely proportional to fitness (an original approach inspired by some immune mechanisms), and a deterministic and elitist selection

scheme is adopted.

Another remarkable difference between opt-aiNet and any EA is the presence of direct interactions (connections) between the network individuals (cells). In opt-aiNet, as individuals are connected with each other in a network-like structure, a dynamic control of the population size can be performed. We will not go further into specific differences between these algorithms, but the interested reader is invited to refer to [5, 15] for additional discussions.

Since all the evolutionary steps are embodied in the adaptive procedure of opt-aiNet, it is possible to consider EAs to be particular cases of opt-aiNet. From the opposite viewpoint, it is possible to claim that the opt-aiNet algorithm is nothing but a new type of evolutionary approach inspired by the immune system, for it contains the main steps of reproduction, variation, and selection, which an algorithm needs to be characterized as evolutionary. Regardless of which algorithm can be viewed as a particular case of the other, it is important to note that both are adaptive systems suitable for exploratory search. There is a main difference in performance, however, since opt-aiNet is intrinsically suitable for performing multimodal search, while EAs require modifications to tackle such problems.

Empirical comparisons could also be performed between the opt-aiNet algorithm and other search procedures, such as simulated annealing [16] and particle swarm optimization techniques [17]. However, as the nature of the optimal Wiener equalizer problem requires an algorithm capable of efficiently locating multiple solutions to the problem, these algorithms are not expected to be competitive with EAs with niching and the opt-aiNet algorithm. Empirical investigation must still be undertaken in order to validate this claim.

4. SIMULATION RESULTS

In order to evaluate the performance of the opt-aiNet algorithm when applied to search for the optimal Wiener equalizers, three different channels (C1, C2, and C3) were

Table 1: Default parameter values for the opt-aiNet algorithm.

Parameter                        Value
Initial population               5
Suppression threshold (σs)       0.35
Number of offspring per cell     10
β (equation (7))                 50
Maximum number of iterations     1000
Number of runs                   100

Table 2: Results of C1 and 8-coefficient equalizer.

Solution   Residual MSE   Freq. (GA)   Freq. (opt-aiNet)
Wopt       0.1293         48%          82%
W2         0.1397         22%          17%
W3         0.1445         12%          1%
W4         0.1533         10%          -
W5         0.1890         4%           -
W6         0.1951         4%           -

Table 3: Results of C2 and 7-coefficient equalizer.

Solution   Residual MSE   Freq. (GA)   Freq. (opt-aiNet)
Wopt       0.0312         48%          100%
W2         0.0458         40%          -
W3         0.0917         8%           -
W4         0.0918         2%           -
W5         0.1022         2%           -

Table 4: Results of C3 and 12-coefficient equalizer.

Solution   Residual MSE   Freq. (opt-aiNet)
Wopt       0.0071         66%
W2         0.0075         32%
W3         0.0104         2%

Table 5: Results of C3 and 12-coefficient equalizer with β = 100.

Solution   Residual MSE   Freq. (opt-aiNet)
Wopt       0.0071         84%
W2         0.0075         16%

$H_{C2} = 1 + 1.2z^{-1} - 0.3z^{-2} + 0.8z^{-3}$, (9)

C1 and C2 are nonminimum phase channels and C3 has maximum phase. The equalizer, as mentioned in Section 2, is always an FIR filter with L coefficients. We estimate the CM cost function through time averaging and use the mapping

$J_{FIT} = \frac{1}{1 + J_{CM}}$ (10)

to transform minima into maxima.
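The time-averaged cost and the mapping (10) can be illustrated with a short Python sketch. It assumes the usual constant modulus cost, the average of (|y|^2 - R2)^2 over the equalizer output samples, and a dispersion constant R2 = 1 (appropriate for unit-modulus symbols such as ±1); the function names and the default value are illustrative assumptions, not the authors' code.

```python
def cm_cost(outputs, r2=1.0):
    """Time-averaged CM cost: mean of (|y|^2 - R2)^2 over the output samples.

    r2 is the dispersion constant; 1.0 is an assumed value for unit-modulus symbols.
    """
    return sum((abs(y) ** 2 - r2) ** 2 for y in outputs) / len(outputs)


def fitness(outputs, r2=1.0):
    """Mapping (10): J_FIT = 1 / (1 + J_CM), so CM minima become fitness maxima."""
    return 1.0 / (1.0 + cm_cost(outputs, r2))
```

Since the cost is nonnegative, the fitness lies in (0, 1] and reaches 1 only when every output sample has the desired modulus.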

We used the immune network model, as discussed in Section 3.2, to obtain the CM global minimum for these channels. The best individual was refined by the aforementioned DD algorithm (6) and compared with the Wiener solutions. This procedure allows a direct verification of the potential of the proposed method. The results are presented in terms of convergence rates to different minima, which favors a straightforward performance analysis.

The default values for the parameters used to run the opt-aiNet algorithm are presented in Table 1.

The first test was performed with channel C1 and an 8-coefficient equalizer. The results are summarized in Table 2, together with the equivalent outcome produced by the GA benchmark [6]. In all tables, Wopt, W2, W3, and so forth stand for the various Wiener minima (ranked according to their MSE).

The results demonstrate that the immune network was

able to find the global optimum in most cases, thus surpassing the GA by a great margin. It is also relevant to observe

that when global convergence did not occur, the rule was to

The second test was carried out with channel C2 and a 7-coefficient equalizer. The results are presented in Table 3, together with the GA performance.

In this case, the results are even more impressive; the immune network was capable of determining the best minimum in all runs. Again, the proposal led to results far superior to those achieved by the GA.

Finally, channel C3 and a 12-coefficient equalizer were considered. We chose this equalizer length to increase the size of the search space, thus increasing the problem difficulty. There is no available benchmark in this case. Table 4 presents the results for the opt-aiNet algorithm.

The global convergence rate is lower than that of the previous test cases. However, simulation performances such as the one illustrated in Section 3, and previous experience with machine-learning techniques, encouraged us to try to improve the performance of the algorithm by varying some of its adaptation parameters. Based upon the discussion presented in [5, 10] concerning the importance of each parameter, β was changed to β = 100. This choice leads to a more precise local search, that is, to the capability of dealing with the MSE similarity between Wopt and W2. Table 5 depicts the results.

By simply fine-tuning the local search of opt-aiNet, a marked improvement in its performance could be observed. The method once more proved itself capable of achieving optimal performance in the vast majority of trials.

The results presented so far are good indicators of the potential of opt-aiNet to locate the global optimum. However, it is known that this algorithm is capable of determining most local optima of a given problem, as

illustrated in Figure 2b. To study how the multimodal search

of opt-aiNet works on problems C1 to C3, assume, without


W3

were strongly favorable to the opt-aiNet algorithm, which can also be understood as an evolutionary search technique inspired by the immune system.

These investigations support the establishment of CMbased evolutionary search as a strong paradigm for optimal

blind equalization.

A natural extension of this work is the testing of the opt-aiNet algorithm with its automatic stopping criterion so that the number of user-defined parameters of the algorithm [10, 15] could be reduced. Further studies also involve the use of opt-aiNet in the context of nonlinear equalization, prediction, and identification.

W3

ACKNOWLEDGMENTS

W4

support. Ricardo Suyama, Leandro Nunes de Castro (Profix.

540396/01-0), and Fernando Von Zuben (300910/96-7)

would like to thank CNPq for the financial support.

[0.1740

0.4516

[0.1805

0.4559

[0.1113

0.2195

[0.1041

0.2160

[0.0094

0.2261

[0.0045

0.2389

[0.0025

0.0773

[0.0019

0.0747

[0.0795

0.1515

[0.0824

0.1458

[0.1197

0.1494

[0.1355

0.1485

[0.1768

0.1024

0.1297

0.0852

0.1137

0.1007

0.1303

0.0967

0.1021

0.0949

0.0377

0.0297

0.0414

0.2687

0.0394

0.0205

0.0475

0.2449

0.2060

0.1512

0.4798

0.1023

0.2133

0.1477

0.4794

0.1108

0.0025

0.2134

0.2414

0.4720

0.0036

0.2117

0.2416

0.4692

0.0877

0.1181

0.1279

0.1087

0.0959

0.1175

0.1282

0.1022

0.1265

0.3463

0.1260

0.1174

0.1127

0.3420

0.1198

0.1248

0.3269

0.1402

0.1249

0.0497

Close to

0.1882

0.0860]

0.1862

0.0781]

0.2622

0.5315]

0.2460

0.5218]

0.0991

0.0547]

0.1002

0.0527]

0.1518

0.0815]

0.1416

0.0835]

0.3541

0.0761]

0.3742

0.0508]

0.1777

0.0538]

0.1769

0.0601]

0.0967

0.0252]

Wopt

Wopt

W2

W2

W4

W6

W6

W5

W5

W7

column of Table 6 presents some of the individuals of a typical run of opt-aiNet when applied to C1. In the vicinity of each individual, we find an associated Wiener solution, presented in column 2 of Table 6 (occasional sign discrepancies are inevitable in blind equalization). A close inspection reveals seven different Wiener optima, including the global minimum. This property of diversity maintenance confirms the capability of multimodal exploration inherent to the immune network approach.

5.

This work started by claiming that there is a strong relationship between the CM global optima and some of the Wiener solutions, so that such solutions can be attained by refining the CM minima using a simple DD technique. On the other hand, the CM global optimum can be easily reached by means of a blind search procedure, such as an EA. Therefore, the combination of the CM criterion with an efficient global search procedure gives rise to a framework to design optimal Wiener filters. This is the core of our proposal.

Our approach uses an immune-based algorithm, named opt-aiNet, to optimize the parameters of the equalizer, and benchmarks its performance against that obtained by using a genetic algorithm with niching. Different channels and

REFERENCES

[1] D. N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Communications, vol. 28, no. 11, pp. 1867–1875, 1980.

[2] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.

[3] C. R. Johnson, P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas, "Blind equalization using the constant modulus criterion: a review," Proceedings of the IEEE, vol. 86, no. 10, pp. 1927–1950, 1998.

[4] H. Zeng, L. Tong, and C. R. Johnson, "An analysis of constant modulus receivers," IEEE Trans. Signal Processing, vol. 47, no. 11, pp. 2990–2999, 1999.

[5] L. N. de Castro and J. Timmis, "An artificial immune network for multimodal function optimization," in Proc. IEEE Congress of Evolutionary Computation (CEC '02), vol. 1, pp. 699–704, Honolulu, Hawaii, USA, May 2002.

[6] A. M. Costa, R. R. F. Attux, and J. M. T. Romano, "A new method for blind channel identification with genetic algorithms," in Proc. IEEE International Telecommunications Symposium, Natal, Brazil, September 2002.

[7] L. N. de Castro and J. Timmis, Artificial Immune Systems: a New Computational Intelligence Approach, Springer-Verlag, London, UK, 2002.

[8] F. M. Burnet, The Clonal Selection Theory of Acquired Immunity, Cambridge University Press, Cambridge, UK, 1959.

[9] N. K. Jerne, "Towards a network theory of the immune system," Ann. Immunol. (Inst. Pasteur), vol. 125C, pp. 373–389, 1974.

[10] L. N. de Castro and F. J. Von Zuben, Data Mining: A Heuristic Approach, Chapter XII "aiNet: an artificial immune network for data analysis," pp. 231–259, Idea Group Publishing, Hershey, Pa, USA, 2001.

[11] T. Bäck, D. B. Fogel, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Institute of Physics Publishing (IOP), Bristol, UK, 2000.

[12] W. Atmar, "Notes on the simulation of evolution," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 130–148, 1994.

[13] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, Mass, USA, 2nd edition, 1992.

[14] H.-P. Schwefel, Kybernetische evolution als strategie der experimentellen forschung in der strömungstechnik, Diploma Thesis, Technical University of Berlin, Berlin, Germany, March 1965.

[15] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle," IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.

[16] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.

[17] J. Kennedy, R. Eberhart, and Y. Shi, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, Calif, USA, 2001.

Romis Ribeiro de Faissol Attux was born in

Goiania, Brazil, in 1978. He received the B.S.

and M.S. degrees, both in electrical engineering, from the State University of Campinas (Unicamp), Campinas, Brazil, in 1999

and 2001, respectively. Currently, he is a

doctorate student at the same institution.

His main research interests are blind equalization and identification, adaptive nonlinear filtering, evolutionary computation, and

dynamical systems.

Murilo Bellezoni Loiola was born in Sao

Paulo, Brazil, in 1979. In 2002, he received his B.S. degree in electrical engineering from the State University of Campinas

(Unicamp), Campinas, Brazil, where he is

currently an M.S. student. His main research interests include turbo equalization,

smart antennas, artificial neural networks,

and evolutionary algorithms.

Ricardo Suyama was born in Sao Paulo,

Brazil, in 1978. He received the B.S. degree

in electrical engineering from the State University of Campinas (Unicamp), Campinas,

Brazil, where he is currently an M.S. student. His research interests include adaptive equalization, adaptive nonlinear filtering, smart antennas, and evolutionary algorithms.

Leandro Nunes de Castro received the B.S.

degree in electrical engineering from the

Federal University of Goias, Goiania, Brazil,

in 1996, and M.S. degree in control engineering and Ph.D. degree in computer engineering from the State University of Campinas, Campinas, Sao Paulo, Brazil, in 1998

and 2001, respectively. He was a Research

Associate with the Computing Laboratory

at the University of Kent, Canterbury, UK

from 2001 to 2002, and is currently a Visiting Lecturer at the School

of Computer and Electrical Engineering at Unicamp. His research

interests include artificial immune systems, artificial neural networks, evolutionary algorithms, and artificial life. Dr. de Castro is

a member of the IEEE, and the SBA (Brazilian Society of Automation). He has been a referee for a number of conferences and journals related to computational intelligence, such as the Soft Computing Journal, IEEE Transactions on Evolutionary Computation,

and IEEE Transactions on Neural Networks.


Fernando Jose Von Zuben received his B.S.

degree in electrical engineering in 1991. In

1993, he received his M.S. degree and his

Ph.D. degree in 1996, both in automation,

from the Faculty of Electrical and Computer Engineering at the State University of

Campinas, SP, Brazil. Since 1997, he has been an

Assistant Professor in the Department of

Computer Engineering and Industrial Automation, at the State University of Campinas, SP, Brazil. The main topics of his research are artificial neural networks, artificial immune systems, evolutionary algorithms,

nonlinear control systems, and multivariate data analysis. F. Von

Zuben is a member of IEEE, INNS, and AAAI.

Joao Marcos Travassos Romano was born

in Rio de Janeiro in 1960. He received the

B.S. and M.S. degrees in electrical engineering from the State University of Campinas

(Unicamp) in Brazil in 1981 and 1984, respectively. In 1987, he received the Ph.D. degree from University of Paris XI. In 1988, he

joined the Communications Department of

the Faculty of Electrical and Computer Engineering, Unicamp, where he is now a Professor. He served as an Invited Professor in the University Rene

Descartes in Paris during the winter of 1999, and in the Communications and Electronic Laboratory in CNAM, Paris, during the winter of 2002. He is responsible for the Signal Processing for Communications Laboratory. His research interests concern adaptive and intelligent signal processing and its applications in telecommunications problems like channel equalization and smart antennas. Since 1988, he has been a recipient of the Research Fellowship of CNPq, Brazil. Professor Romano is a member of the IEEE Electronics and Signal Processing Technical Committee and an IEEE senior member. Since April 2000, he has been the President of the Brazilian Communications Society (SBrT), a Sister Society of ComSoc, IEEE.

© 2003 Hindawi Publishing Corporation

The Task Distribution Plan

Enrique Dunn

Departamento de Electrónica y Telecomunicaciones, División de Física Aplicada, Centro de Investigación Científica y de Educación Superior de Ensenada, 22860 Ensenada, BC, Mexico

Email: edunn@cicese.mx

Gustavo Olague

Departamento de Ciencias de la Computación, División de Física Aplicada, Centro de Investigación Científica y de Educación Superior de Ensenada, 22860 Ensenada, BC, Mexico

Email: olague@cicese.mx

Received 29 June 2002 and in revised form 29 November 2002

Autonomous sensor planning is a problem of interest to scientists in the fields of computer vision, robotics, and photogrammetry. In automated visual tasks, a sensing planner must make complex and critical decisions involving sensor placement and the sensing task specification. This paper addresses the problem of specifying sensing tasks for a multiple manipulator workcell given an optimal sensor placement configuration. The problem is conceptually divided into two different phases: activity assignment and tour planning. To solve such problems, an optimization methodology based on evolutionary computation is developed. Operational limitations originating from the workcell configuration are considered using specialized heuristics as well as a floating-point representation based on the random keys approach. Experiments and performance results are presented.

Keywords and phrases: sensor planning, evolutionary computing, combinatorial optimization, random keys.

1. INTRODUCTION

the development of sensing strategies for computer vision

tasks [1]. The goal of such planning is to determine, as autonomously as possible, a group of sensing actions that lead

to the fulfillment of the vision task objectives. This is important because there are environments (i.e., dynamic environments with physical and temporal constraints) and tasks (i.e.,

scene exploration, highly accurate reconstruction) where the

specification of an adequate sensing strategy is not a trivial

endeavor. Moreover, an effective planner must make considerations that require complex spatial and temporal reasoning based on a set of mathematical models dependent on the vision task goals [2]. Indeed, difficult numerical and combinatorial problems arise, presenting a rich variety of research

opportunities. Our approach is to state such problems in optimization terms and apply evolutionary computation (EC)

methodologies in their solution [3].

The problem of visual inspection of a complex three-dimensional object requires the acquisition of multiple object images from different viewpoints [4]. Accordingly, to formulate a sensing strategy, an effective planner must consider how the spatial distribution of viewpoints affects a specific

sensor is, how the sensing actions will be executed. These

are the kind of general considerations that call for the use of

a flexible computing paradigm like EC. This work presents

the ongoing development of the EPOCA [5] sensor planning system, giving special attention to the task distribution

problem that emerges from a multiple manipulator workcell

[6].

The literature provides multiple examples of work dealing with automated sensing planning systems which consider

a manipulator using a camera-in-hand configuration. The

HEAVEN system developed by Sakane et al. [7] is an example in which the camera and light illumination placement

are studied. The MVP system developed by Abrams et al.

[8] considered the viewpoint planning of one manipulator

monitoring the movements of a second robot. The work developed by Triggs and Laugier [9] considers workspace constraints of a robot carrying a camera with the goal of automated inspection. More recently, Whaite and Ferrie [10]

developed an uncertainty based approach for autonomous

exploration using a manipulator robot. The next best view

problem for automated surface acquisition working with

a range scanner has been addressed by Pito [11]. Marchand and Chaumette [12] studied optimal camera motion in

active vision systems for 3D reconstruction and exploration.

Ye and Tsotsos [13] developed a sensor planner system for 3D

object search applied in mobile robotics. However, none of

these systems have studied the problem of assigning and sequencing the best order of movements that a multiple robot

system needs to perform.

This paper is organized as follows. First, the problem

statement is given in Section 2. Then, our approach to the

task distribution problem using EC is presented in Section 3.

In this section, we address the aspects of search space reduction, solution representation, and search heuristics. Experimental results are presented next in order to demonstrate the

validity and usefulness of the solution. Finally, conclusions

and guidelines for future research are provided to end the

paper.

2. PROBLEM STATEMENT

with the use of manipulator robots, see Figure 1. However, the incorporation of such devices makes additional demands on a sensing planner. In this example, each camera is mounted on the robot hand with the goal of measuring the box on the table. Also, additional floating cameras represent a set of desired viewpoints. The sensing plan

must consider not only the constraints and objectives of the

particular visual task but also the operational restrictions

imposed by the workcell. Additionally, in the case where

multiple manipulators are equipped with digital cameras, a

problem of robot coordination needs to be resolved. More precisely, sensing actions need to be distributed among the various sensing stations, and an efficient task specification for the entire workcell should be determined. The EPOCA network design module can determine an optimal sensing configuration for multiple cameras converging on a three-dimensional object [14]. We use this configuration as input for our task distribution problem in the proposed multiple robot workcell. It is assumed that the robots move in straight lines between different viewpoints and that each robot must start and finish each tour from a predetermined configuration. In this way, the problem of specifying an efficient task distribution for the manipulator robots consists of the following.

(1) Assigning to each of the robots a set of viewpoints from

which to obtain an image, see Figure 2. In other words,

determining how many and which viewpoints are to be

assigned to each robot.

(2) Deciding on an optimal tour for each of the robots, see

Figure 3. This involves specifying the correct order of each viewpoint in a robot's tour.

In this way, we have two of the most difficult combinatorial problems in computer science, which are the set partition and traveling salesman problems, see Figures 2 and 3

for the graphical interpretation of these problems. Actually,

our task distribution problem consists of a multiple traveling

salesman problem instance. The goal is to specify the optimal

the robots, forming different excluding sets.

tour to follow each of the robots.

every viewpoint specified by the EPOCA network configuration module is visited. In order to describe our task distribution problem, the following definitions are given.


Definition 1 (Photogrammetric network). A photogrammetric network is represented as an ordered set V of n three-dimensional viewpoints. Each individual viewpoint is expressed as $V_j$, where j ranges from j = 1 to n.

Definition 2 (Robot workcell). A multirobot active vision

system is represented by an ordered set R consisting of r

robots in the workcell. Each individual robot is expressed by

Ri , where i ranges from i = 1 to r.

Definition 3 (Operational environment). Each robot has an

operational restricted physical space denoted by Oi , where i

ranges from i = 1 to r.

Accordingly, the problem statement can be expressed as

follows.

Definition 4 (Task distribution problem). Find a set of r ordered subsets $X_i \subseteq V$, where $V = \{\bigcup_{i=1}^{r} X_i \mid V_j \in X_i,\ V_j \in O_i\}$, such that the total length traveled by the robots is minimized.

From the above definitions, the activity assignment problem relates each of the n elements of V with one of the

r possible elements of R. Considering that each robot Ri

has assigned ni viewpoints, a problem of sequencing the

viewpoints emerges, which we call tour planning. Our goal is

to find the best combination of activity assignment and tour

planning in order to optimize the overall operational cost

of the task distribution. This total operational cost is produced by adding the individual tour costs $Q_i$, defined by the Euclidean distance that each robot needs to travel in straight lines among the different viewpoints. Hence, the criterion is represented as $Q_T = \sum_{i=1}^{r} Q_i$. Such a problem statement yields a combinatorial problem which is computationally NP-hard and requires the use of special heuristics in order to avoid an exhaustive search.
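A minimal Python sketch of this criterion is given below. It assumes that each robot's predetermined start and finish configuration can be summarized by a single 3-D point (called `home` here) and that viewpoints are 3-D coordinate tuples; both the names and this simplification are assumptions made for illustration.

```python
import math


def tour_cost(home, tour):
    """Length Q_i of a closed tour: home -> viewpoints in the given order -> home."""
    path = [home, *tour, home]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def total_cost(homes, tours):
    """Total operational cost Q_T: sum of the individual tour costs Q_i."""
    return sum(tour_cost(home, tour) for home, tour in zip(homes, tours))
```

A robot with an empty tour contributes zero cost, since its path reduces to home -> home.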

3.

problem with a large search space. An optimization method

based on genetic algorithms is proposed. To obtain a quality

solution, three key aspects need to be addressed: search space

reduction, solution representation, and search heuristics. The

following sections present our approach to these key aspects

in order to develop a global optimization method to solve the

task distribution problem.

3.1. Search space reduction

Combinatorial problems generally have to satisfy a given set

of competing restrictions. In our task distribution problem,

some of these restrictions are straightforward; that is, each

viewpoint should be assigned to only one robot, each viewpoint should be visited only once inside a robot tour. On

the other hand, implicit restrictions, like the accessibility of a

robot to a particular viewpoint, need to be determined. Consideration of such restrictions can help reduce the size of the

search space. This is relevant because in practice a manip-

Figure 4: Operational restrictions. The workcell configuration imposes accessibility restrictions. Hence, when a robot's reach is limited, it is possible to reduce the search space for the activity assignment phase.

Table 1: Structure ACCESSIBILITY containing the number and the list of robots capable of reaching a particular viewpoint.

Viewpoint ID   Number of robots   List of robots
V1             r1                 RobID1, ..., RobIDr1
...            ...                ...
Vn             rn                 RobID1, ..., RobIDrn

which such restrictions are computed is presented next.

Assuming a static and obstacle-free environment, it is reasonable to compute a robot's accessibility for a given position and orientation by solving the robot inverse kinematic problem. In this work, we consider the PUMA 560 manipulator, which has six degrees of freedom. A three-dimensional computer graphics simulation environment was developed in order to visualize such accessibility restrictions. Multiple manipulators were considered in our computer simulation. The inverse kinematic problem was solved for every robot at each viewpoint. The cases where a robot could access a viewpoint were stored in an auxiliary data structure called ACCESSIBILITY. This structure contains an entry for every viewpoint $V_j$ in order to keep a record of how many and which robots are capable of reaching that particular viewpoint, see Table 1. Such values remain constant throughout the course of task execution; therefore, they only need to be computed once. The above method evaluates the restrictions imposed by the physical arrangement of the workcell, as well as the robot revolute joint limitations. Such operational restrictions are incorporated implicitly as an intrinsic element of our optimization method.
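The construction of the ACCESSIBILITY structure can be sketched in Python as follows. The reachability test is passed in as a function; the `within_radius` predicate shown here is a hypothetical stand-in (a simple reach sphere around the robot base) and not the inverse kinematic solution used in this work.

```python
import math


def build_accessibility(viewpoints, robots, can_reach):
    """Map each viewpoint index j (1-based) to the list of robot IDs that reach V_j.

    The number of robots r_j is simply the length of each list.
    """
    return {
        j: [rid for rid, robot in robots.items() if can_reach(robot, vp)]
        for j, vp in enumerate(viewpoints, start=1)
    }


def within_radius(robot, viewpoint):
    """Hypothetical reachability test: viewpoint inside a sphere around the base."""
    base, reach = robot
    return math.dist(base, viewpoint) <= reach
```

Because the workcell is assumed static, the resulting dictionary is computed once and reused for every candidate solution.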

3.2. Solution representation

In this representation, each viewpoint V j is assigned a random value S j in the range (0, 1), allowing for the implementation of very straightforward genetic operators. These

Figure 5: ... random floating-point value Si in the range (0, 1). These values are stored in a string S.

Table 2: Structure TASKS containing the list of viewpoints comprising each robot tour Ti.

Since there are n different viewpoints, S will consist of n elements, see Figure 5. Random keys use a heuristic we call the

smallest-value-first heuristic. In our case, the viewpoint with

the smallest corresponding value in S would be the first viewpoint in a given permutation P. The viewpoint with the second smallest value in S would be the second viewpoint in P,

and so forth. In this way, the order of a viewpoint V j inside

a given permutation P depends on the magnitude of its corresponding value S j with respect to all the other values in S.

To illustrate, given five viewpoints, a possible representation

string can be

S = [0.89, 0.76, 0.54, 0.23, 0.62].

(1)

The smallest value in S is found at the fourth position, denoted by S4 . Therefore, V4 is the first viewpoint in the resulting permutation P. The second smallest value is found in the

third position S3 , making V3 the second viewpoint in P, and

so on. The resulting permutation of the five viewpoints is

P = V4, V3, V5, V2, V1. (2)

task distribution problem. The smallest-value-first heuristic avoids the generation of unfeasible solutions common to

permutation-based representations. Random keys representation also allows our optimization method to apply genetic

operators without the need for additional heuristics.
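The smallest-value-first decoding can be sketched in a few lines of Python. The function names are hypothetical, and the one-point crossover is only one example of a plain genetic operator that leaves the representation valid.

```python
def decode_random_keys(keys):
    """Smallest-value-first: return the 1-based viewpoint indices sorted by key."""
    return [j for j, _ in sorted(enumerate(keys, start=1), key=lambda pair: pair[1])]


def one_point_crossover(parent_a, parent_b, cut):
    """Plain one-point crossover; any child is again a valid key string."""
    return parent_a[:cut] + parent_b[cut:]
```

For the string in (1), `decode_random_keys([0.89, 0.76, 0.54, 0.23, 0.62])` returns `[4, 3, 5, 2, 1]`, which is the permutation in (2). A child produced by the crossover always decodes to a complete permutation, so no repair step is needed.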

The convention of encoding a possible solution into a

string representation has been specified. The question of how

to describe the corresponding solution to such a representation is now considered. Recalling the problem statement,

initially, there is a set of n viewpoints V j , and each must be

assigned to one of the r possible robots. Using random keys

representation, a possible solution is codified into a string S

of n values. As stated in Section 2, we want to optimize the

total operational cost QT . However, the solution representation S needs to be decoded into an explicit description of the

task distribution problem. Such a description would represent each of the r robot tours. To accomplish this, an auxiliary data structure called TASKS is proposed to represent

the global task distribution among robots, see Table 2. This

structure has an entry Ti for each robot Ri , which describes

that robot tour; that is, Ti lists the sequence of viewpoints

assigned to that particular robot. Each of these Ti tours is

evaluated to obtain an individual tour cost Qi , from which

the total operational cost QT is obtained. The question before us now is how to convert a string representation into a corresponding task distribution description. The following subsection presents the heuristics used by our method to obtain such task distribution description.

Robot ID   No. of views   List of viewpoints
R1         v1             T1 = [ViewID1, ..., ViewIDv1]
...        ...            ...
Rr         vr             Tr = [ViewID1, ..., ViewIDvr]

3.3. Search heuristics

A solution representation S needs to be evaluated. Such evaluation is applied to the task distribution description contained in TASKS. Hence, a mapping $M : S \to \mathrm{TASKS}$ is necessary. The mapping M assigns and sequences the viewpoints among the different robots and stores the results in the structure TASKS. The mapping M makes use of the solution representation data structures S and TASKS, as well as the precomputed operational restrictions stored in ACCESSIBILITY. The two distinct phases of activity assignment and tour planning are presented separately.

3.3.1. Activity assignment

The activity assignment problem allocates each of the viewpoints V j to one of the possible robots. The goal is to provide

an initial unsequenced set of individual robot tours Ti using

the following steps.

Step 1. Obtain the number $r_j$ of robots capable of reaching viewpoint $V_j$ by consulting the ACCESSIBILITY structure, see Table 1.

Step 2. Divide the interval (0, 1) into $r_j$ equal segments in order to determine the size of a comparison segment, $Seg = 1/r_j$.

Step 3. Determine the segment k in which the random value $S_j$ resides, that is, $k = \mathrm{Int}(S_j / Seg) + 1$.

Step 4. Assign the viewpoint $V_j$ to the kth robot in the corresponding entry of the ACCESSIBILITY structure. In this way, the assigned robot index i is given by $RobID_k$, which is found in the entry that corresponds to $V_j$ inside the ACCESSIBILITY table.

Step 5. Append $V_j$ to the list of viewpoints $T_i$ assigned to the ith robot. The tour description $T_i$ is stored in the TASKS structure.
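Steps 1 through 5 can be sketched in Python as follows. The sketch assumes that ACCESSIBILITY is a dictionary from the 1-based viewpoint index to a non-empty list of robot IDs and that TASKS is a dictionary from robot ID to a list of viewpoint indices; these data layouts and the function name are illustrative assumptions.

```python
def assign_viewpoints(keys, accessibility):
    """Map each key S_j to one of the robots able to reach V_j (Steps 1-5)."""
    tasks = {}
    for j, s in enumerate(keys, start=1):
        robots = accessibility[j]               # Step 1: robots that reach V_j
        r_j = len(robots)                       # assumed to be at least 1
        # Steps 2-3: k = Int(S_j / Seg) + 1 with Seg = 1 / r_j, written as
        # Int(S_j * r_j) + 1 and clamped to r_j to guard against rounding.
        k = min(int(s * r_j), r_j - 1) + 1
        robot = robots[k - 1]                   # Step 4: k-th robot listed for V_j
        tasks.setdefault(robot, []).append(j)   # Step 5: append V_j to T_i
    return tasks
```

With S1 = 0.41 and V1 reachable by R1, R3, and R4, the segment size is 1/3, so k = 2 and V1 is assigned to R3. Every key string yields a feasible assignment, because a viewpoint can only be given to a robot listed for it.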

A graphical description of these heuristic steps is shown

in Figure 6. The series of actions performed in the activity

assignment phase are based on the compliance with operational restrictions, and in doing so, assure that any codified

string S brings a valid solution to the assignment problem.

Based on such strategy, each possible codification string S has

only one possible interpretation. After executing this series of

steps, each viewpoint is assigned to a robot. The viewpoints

assigned to a single robot Ri are grouped into a set Ti . Each

robot and these tours are stored in the structure TASKS. Until this point, the order of each viewpoint inside a given tour has not been specified. This is the problem we approach next.

Figure 6: ... 1 through 4, corresponding to the assignment phase.

Figure 7: ... values contained in S is adjusted before applying the smallest-value-first heuristic to the values stored in TASKS.

3.3.2. Tour planning

The tour planning problem consists of correctly sequencing each of the r robot tours Ti stored in the structure TASKS. These tours are initially obtained from the activity assignment phase presented above, in which every viewpoint Vj is assigned to one of the r possible robots Ri. The goal of the tour planning phase is to minimize the total operational cost QT. This situation is equivalent to solving r different traveling salesman problems. The smallest-value-first heuristic can be applied to sequencing problems such as the one presented here. Unfortunately, the rules by which the preceding assignments were made in Steps 1 through 4 produce undesirable tendencies in the representation values Sj that correspond to each tour specification Ti. This is due to the deterministic heuristic applied for robot assignment. As a consequence, the values corresponding to the viewpoints contained in Ti will be, on average, higher than those corresponding to the viewpoints in Ti−1 and will create a bias inside each Ti when directly applying the smallest-value-first heuristic. Therefore, the values inside S need to be adjusted to eliminate such unwanted properties. This is accomplished by the following heuristic steps.

Step 6. Recall in which of the k possible segments of the range (0, 1) the value Sj used in the assignment phase lies.

Step 7. Calculate the value S'j in the range (0, 1) that reflects the relative position of Sj inside the kth segment. For example, consider the value 0.70, which lies inside the range (0.60, 0.80). This value lies exactly in the middle, hence its corresponding value in the range (0, 1) is 0.5. A graphic description of this heuristic is presented in Figure 7.
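The rescaling of Steps 6 and 7 amounts to keeping the fractional position of the key inside its segment. The short Python sketch below illustrates this; the function name is hypothetical, and the choice of five segments in the first example is an assumption made only so that (0.60, 0.80) is one of the segments.

```python
def relative_position(key, r):
    """Steps 6-7: position of a key inside its segment, rescaled to (0, 1).

    key is a random key assumed to lie in (0, 1); r is the number of equal
    segments (the number of robots able to reach the viewpoint).
    """
    scaled = key * r                 # key measured in units of Seg = 1/r
    k = min(int(scaled), r - 1)      # Step 6: zero-based index of the segment
    return scaled - k                # Step 7: fractional position inside it


# 0.70 sits in the middle of (0.60, 0.80) when (0, 1) is cut into five parts.
print(round(relative_position(0.70, 5), 2))
# 0.41 with three segments lies in (1/3, 2/3), about a quarter of the way in.
print(round(relative_position(0.41, 3), 2))
```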

Step 9. Apply the smallest-value-first heuristic to each of the unordered robot tours Ti using the values stored in S', see Figure 8.

Figure 8: Tour planning. The smallest-value-first heuristic is applied to each robot tour considering the previously adjusted values in S'. In the example shown, T1 = [V1, V3, V8] is rearranged as T1 = [V3, V1, V8].

This series of steps ensures an unbiased tour sequencing, enabling the search algorithm to seek out a global optimum more effectively within a very large and complex search space.

4.

distribution problem was incorporated into an extension of

the functionality of the EPOCA system developed by Olague


problem for complex objects. The problem of task distribution emerges as a result of the photogrammetric network

design performed by EPOCA. The system can be classified

as an EC-based system that addresses the complex goal of

automating the planning of sensing strategies for accurate

three-dimensional reconstruction.

Two different experiments are presented next: the first is a simple scenario intended to illustrate our method's functionality; the second experiment is somewhat more complex and its goal is to show the effectiveness and flexibility of our system.

4.1. Experiment A

This experiment consists of eight viewpoints to be distributed among four manipulators. The viewpoints are stacked into four pairs, each pair arranged beneath one of the robots' initial positions, see Figure 9. The optimal task distribution for this example can be obtained using a greedy heuristic. Hence, such an experiment might seem trivial, but it will exemplify our method's functionality.

Operational restrictions are computed first, with the goal of determining which robots can access a particular viewpoint. As mentioned in Section 3, to compute such restrictions, the inverse kinematic problem is solved for every robot at each viewpoint. The results of such validations are stored in the structure ACCESSIBILITY. The physical arrangement of the robots for Experiment A is such that every camera can be reached by three different robots, see Table 3.

The genetic algorithm works with a population of codified strings, selecting the best individuals for reproduction. The reproduction process combines the characteristics of two selected parent solutions and provides two new offspring solutions which, in turn, will be part of the next generation of solutions. This process is repeated iteratively until a certain number of generations has been executed. At the end of this iterative process, we obtain a set of possible solutions. One of those individuals, which represented the optimal solution, was given by the following random keys representation:

S = [0.72, 0.71, 0.32, 0.14, 0.81, 0.80, 0.27, 0.07]. (3)

The assignment phase determines in which of the k segments each element Sj resides. For the first viewpoint V1, there are three possible robots to be assigned, see Table 3; hence, the comparison segment is Seg = 1/3 = 0.33. In this way, following Steps 1 through 5, the corresponding representation value S1 = 0.72 is determined to be in the third segment, which is delimited by (0.66, 1.00). Therefore, the robot to be assigned is the third robot in V1's entry in the structure ACCESSIBILITY, in this case RobID = 3. The robot assigned to each viewpoint Vj is given by

Robot = [R3, R3, R1, R1, R4, R4, R2, R2]. (4)

Figure 9: Eight viewpoints are to be distributed among four manipulators. Viewpoints are depicted as individual cameras, and the solid lines connecting the cameras illustrate each robot tour corresponding to an optimal task distribution.

Table 3: ACCESSIBILITY structure for the configuration depicted in Figure 9.

Viewpoint ID    Number of robots    List of robots
V1              r1 = 3              R1, R2, R3
V2              r2 = 3              R1, R2, R3
V3              r3 = 3              R1, R3, R4
V4              r4 = 3              R1, R3, R4
V5              r5 = 3              R1, R2, R4
V6              r6 = 3              R1, R2, R4
V7              r7 = 3              R2, R3, R4
V8              r8 = 3              R2, R3, R4

The values contained in S are now adjusted in accordance with Steps 6 through 9 so that the smallest-value-first heuristic can be applied to the viewpoints assigned to each robot. For the first viewpoint, its corresponding value S1 is adjusted as follows. Recall that S1 = 0.72 resides in the third segment, which is delimited by (0.66, 1.00). The relative position of 0.72 inside this segment, expressed in the range (0, 1), is (0.72 − 0.66)/0.33 = 0.18. Applying these steps to every value in S yields

S' = [0.18, 0.15, 0.96, 0.42, 0.45, 0.42, 0.81, 0.21]. (5)

Applying the smallest-value-first heuristic to these adjusted values rearranges TASKS as shown in Table 4.

Twenty trials were executed and this global minimum

distribution was reached in every single execution in an average of 15.1 generations.
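The complete decoding of Experiment A can be retraced with the short Python sketch below, which combines the assignment steps, the rescaling of the keys, and the smallest-value-first ordering. The function and variable names are assumptions of this sketch. It uses exact thirds instead of the rounded segment bounds 0.33 and 0.66, so some adjusted keys differ slightly from the rounded values printed in (5), but the resulting robots and tour orders agree with (4) and Table 4.

```python
def decode(keys, accessibility):
    """Decode a random-keys string into ordered robot tours.

    Returns {robot ID: [viewpoint numbers in visiting order]}.
    """
    tours = {}
    for j, (key, robots) in enumerate(zip(keys, accessibility)):
        scaled = key * len(robots)
        k = min(int(scaled), len(robots) - 1)   # segment index (assignment phase)
        adjusted = scaled - k                   # relative position inside the segment
        tours.setdefault(robots[k], []).append((adjusted, j + 1))
    # Smallest-value-first: visit viewpoints in increasing order of adjusted key.
    return {robot: [v for _, v in sorted(items)] for robot, items in tours.items()}


# ACCESSIBILITY entries of Table 3 and the string S of Eq. (3).
ACCESSIBILITY_A = [[1, 2, 3], [1, 2, 3], [1, 3, 4], [1, 3, 4],
                   [1, 2, 4], [1, 2, 4], [2, 3, 4], [2, 3, 4]]
S_A = [0.72, 0.71, 0.32, 0.14, 0.81, 0.80, 0.27, 0.07]

for robot, tour in sorted(decode(S_A, ACCESSIBILITY_A).items()):
    print(f"R{robot}: {['V%d' % v for v in tour]}")
```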

Table 4: TASKS structure after the tour planning phase.

Robot ID    Number of viewpoints    List of viewpoints
R1          2                       T1 = [V4, V3]
R2          2                       T2 = [V8, V7]
R3          2                       T3 = [V2, V1]
R4          2                       T4 = [V6, V5]

4.2. Experiment B

In this experiment, the measurements are carried out by four manipulators. The goal is to distribute the photogrammetric network consisting of 13 cameras in an optimal manner, see Figure 10. Working with this fixed configuration, we executed several tests. First, to test our method's functionality, we executed the task distribution planner. Several possible solutions were obtained over the course of multiple executions; two of these solutions are depicted in Figures 11 and 12. Notice that the best solution found, represented in Figure 11, does not incorporate all of the available robots. Figure 12 shows a more typical solution which is also found by our system.

Figure 10: Configuration of 13 viewpoints and four manipulators. Viewpoints are depicted as individual cameras.

Figure 11: Best solution found by the genetic algorithm for the configuration shown in Figure 10.

Figure 12: A more typical solution for the configuration shown in Figure 10.

In order to test the method's adaptability, two of the four manipulator robots were disabled. This additional restriction is reflected only in changes to the values stored in Table 5. The system is expected to distribute tasks among the two remaining robots. Results from such tests are shown in Figures 13 and 14. In these cases the activity assignment problem becomes visually simpler to resolve, but the difficulty of the tour planning problem becomes more evident since each tour will consist of more viewpoints.

Table 5: ACCESSIBILITY structure for the configuration depicted in Figure 10.

Viewpoint ID    Number of robots    List of robots
V1              r1 = 2              R2, R4
V2              r2 = 2              R2, R3
V3              r3 = 2              R1, R4
V4              r4 = 2              R1, R4
V5              r5 = 2              R1, R4
V6              r6 = 2              R2, R3
V7              r7 = 2              R2, R4
V8              r8 = 2              R2, R3
V9              r9 = 2              R1, R3
V10             r10 = 2             R1, R3
V11             r11 = 3             R1, R2, R3
V12             r12 = 3             R1, R2, R4
V13             r13 = 3             R1, R2, R4

Since our approach is based on EC techniques, the determination of the task distribution plan is the product of the evolution process over a population of possible solutions. Therefore, the fitness values of each of these individuals, and of the population in general, reflect the effect of such evolution. In this way, the population fitness values evolve over the course of several generations until an optimal solution is found, see Figure 15. The stepwise decrements in the best fitness line point out the combinatorial aspect of our search, while the average fitness confirms the positive effect of the evolution process.

While great detail has been given to the special heuristics used in our approach, the behavior of the curves presented in Figure 15 also depends on the genetic algorithm operational parameters. A single point crossover operator, subject to a probability Pc = 0.95, was utilized. Furthermore, a mutation operator consisting of an additive value obeying a normal distribution N(0, 0.2) for each of the elements in the representation string was applied according to a probability Pm = 0.001.

Figure 13: Solution found by the system for the case where a pair of robots was disabled from the configuration shown in Figure 10.

Figure 14: An environment similar to Figure 13 showing the system's flexibility to changes in the workcell configuration.

Figure 15: Worst, average, and best fitness of the population plotted against the generation number.

An assessment of the proposed methodology is obtained from the comparison of its solutions against those offered by alternative methodologies. The proposed methodology is compared to an exhaustive search and a greedy heuristic. The results for the fixed configuration shown in Figure 10 are presented in Figure 16. As the figure illustrates, our algorithm consistently outperforms the greedy heuristic in terms of the quality of the proposed solutions. Compared with the exhaustive search, the advantage of the genetic algorithm approach lies in the computational cost: the EC algorithm requires about 3 seconds against 14 hours for an exhaustive search. On the other hand, our approach reaches a global optimum 28% of the time over the course of 50 executions, coming on average within 2.9% of the global optimum. As these results reflect, there is an obvious compromise between solution quality and computational efficiency.

Figure 16: Genetic algorithm performance over multiple executions (horizontal axis: execution number; reference levels: exhaustive search and greedy search). The obtained solutions are always better than a greedy search, reaching the global optimum 14 out of 50 times.

5.

The development of an effective sensor planner for automated vision tasks implies the consideration of operational restrictions as well as the vision task's objectives. This work presents a solution for the task distribution problem inherent to multiple robot workcells. The problem is conceptualized as two separate combinatorial problems: activity assignment and tour planning. A genetic algorithm-based strategy that concurrently solves these problems was presented along with experimental results. The approach employs auxiliary data structures in order to incorporate accessibility limitations and to specify a task distribution plan. The evolutionary nature of the optimization method allows for multiple approximate solutions of the optimization problem to be found over the course of several executions. Performance considerations support the use of the proposed methodology compared to a greedy heuristic or an exhaustive search.

Future work can consider the robot motion planning problem that arises when there are obstacles in the environment or when the manipulators can collide with each other. Also, the representation scheme can be modified to use two values instead of adjusting the original representation string by heuristic means. Furthermore, the genetic operators can be modified in search of improving the evolutionary algorithm performance. Also, a rigorous analysis of the properties of the heuristics used is needed. At present, we are working toward a real implementation of our algorithms for intelligent sensor planning.

ACKNOWLEDGMENTS

This research was funded by Contract 35267-A from CONACyT and under the LAFMI Project. The first author was supported by scholarship 142987 from CONACyT. Figures 1, 2, 3, 4, 9, 10, 11, 12, 13, and 14 were generated with software written at the Geometry Center. The authors thank the anonymous reviewers for their suggestions, which greatly helped improve this paper.

REFERENCES

[1] K. A. Tarabanis, P. K. Allen, and R. Y. Tsai, "A survey of sensor planning in computer vision," IEEE Transactions on Robotics and Automation, vol. 11, no. 1, pp. 86–104, 1995.

[2] J. Miura and K. Ikeuchi, "Task-oriented generation of visual sensing strategies in assembly tasks," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 2, pp. 126–138, 1998.

[3] G. Olague and R. Mohr, "Optimal camera placement for accurate reconstruction," Pattern Recognition, vol. 35, no. 4, pp. 927–944, 2002.

[4] T. S. Newman and A. K. Jain, "A survey of automated visual inspection," Computer Vision and Image Understanding, vol. 61, no. 2, pp. 231–262, 1995.

[5] G. Olague, Planification du placement de caméras pour des mesures 3D de précision, Ph.D. thesis, Institut National Polytechnique de Grenoble, France, October 1998.

[6] G. Olague and E. Dunn, "Multiple robot task distribution: Towards an autonomous photogrammetric system," in Proc. IEEE Systems, Man and Cybernetics Conference, vol. 5, pp. 3235–3240, Tucson, Ariz, USA, October 2001.

[7] S. Sakane, R. Niepold, T. Sato, and Y. Shirai, "Illumination setup planning for a hand-eye system based on an environmental model," Advanced Robotics, vol. 6, no. 4, pp. 461–482, 1992.

[8] S. Abrams, P. K. Allen, and K. A. Tarabanis, "Dynamic sensor planning," in Proc. IEEE International Conf. on Robotics and Automation, Atlanta, Ga, USA, May 1993.

[9] B. Triggs and C. Laugier, "Automatic task planning for robot vision," in Proc. Int. Symp. Robotics Research, Munich, October 1995.

[10] P. Whaite and F. P. Ferrie, "Autonomous exploration: Driven by uncertainty," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 193–205, 1997.

[11] R. Pito, "A solution to the next best view problem for automated surface acquisition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 1016–1030, 1999.

[12] E. Marchand and F. Chaumette, "Active vision for complete scene reconstruction and exploration," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 1, pp. 65–72, 1999.

[13] Y. Ye and J. K. Tsotsos, "Sensor planning for 3D object search," Computer Vision and Image Understanding, vol. 73, no. 2, pp. 145–168, 1999.

[14] G. Olague, "Automated photogrammetric network design using genetic algorithms," Photogrammetric Engineering & Remote Sensing, vol. 68, no. 5, pp. 423–431, 2002, Paper awarded the 2003 First Honorable Mention for the Talbert Abrams Award, by ASPRS.

[15] J. C. Bean, "Genetic algorithms and random keys for sequencing and optimization," ORSA Journal on Computing, vol. 6, no. 2, pp. 154–160, 1994.

Enrique Dunn received a computer engineering degree from Universidad Autónoma de Baja California, in 1999. He obtained the M.S. degree in computer science from CICESE, Mexico, in 2001. Currently, Dunn is working towards the Ph.D. degree at the Electronics and Telecommunications Department, Applied Physics Division, CICESE, Mexico. His research interests include robotics, combinatorial optimization, evolutionary computation, close range photogrammetry, and 3D simulation. He is a student member of the ASPRS.

Gustavo Olague holds a Bachelor's degree (Honors) in Electronics Engineering and a Master's degree in computer science from Instituto Tecnológico de Chihuahua, Mexico, in 1992 and 1995, respectively. He received the Diplôme de Doctorat en Imagerie, Vision et Robotique (Ph.D.) from Institut National Polytechnique de Grenoble, France, in 1998. From 1999 to 2001, he was an Associate Professor of computer science and in 2002, he was promoted to Professor of the Applied Physics Division at CICESE, Mexico. Dr. Olague is a member of the ASPRS, ISGEC, IEEE, IEEE Computer Society, IEEE Robotics and Automation, IEEE SMC, and RSPSoc. Dr. Olague has served on numerous Technical Committees and has been invited to lecture at universities in France, Spain, and Colombia. He has served as Chair and Cochair at numerous international conferences, such as the ASPRS 2001 and 2003 Close-Range Photogrammetry sessions and the IEEE SMC 2001 Robotics session. He also had visiting appointments at the Technische Universität Clausthal, Germany, and the LAAS, France. His research interests include robotics, computer vision, and, in particular, the coupling of evolutionary computation in those two research domains (autonomous systems and visual perception). Dr. Olague is the recipient of the 2003 First Honorable Mention for the Talbert Abrams Award.

© 2003 Hindawi Publishing Corporation

Estimation and Order Detection

Chen Fangjiong

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

Department of Electronic Engineering, South China University of Technology, Wushan, Guangzhou 510641, China

Email: eefjchen@scut.edu.cn

Sam Kwong

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

Email: cssamk@cityu.edu.hk

Wei Gang

Department of Electronic Engineering, South China University of Technology, Wushan, Guangzhou 510641, China

Email: ecgwei@scut.edu.cn

Received 30 May 2001 and in revised form 28 January 2003

A joint blind order-detection and parameter-estimation algorithm for a single-input multiple-output (SIMO) channel is presented. Based on the subspace decomposition of the channel output, an objective function including channel order and channel parameters is proposed. The problem is resolved by using a specifically designed genetic algorithm (GA). In the proposed GA, we encode both the channel order and parameters into a single chromosome, so they can be estimated simultaneously. Novel GA operators and convergence criteria are used to guarantee correct convergence and a high convergence speed. Simulation results show that the proposed GA achieves satisfactory convergence speed and performance.

Keywords and phrases: genetic algorithms, SIMO, blind signal identification.

1.

INTRODUCTION

Many applications in signal processing encounter the problem of blind multichannel identification. Traditional methods of such identification usually apply higher-order statistics techniques. The major problems of these methods are slow convergence and many local optima [1]. Since the original work of Tong et al. [1, 2], many lower-order statistics-based methods have been proposed for blind multichannel identification (see [3] and references therein). A common assumption in these methods is that the channel order is known in advance. However, such information is, in fact, not available. Thus, we are obliged to estimate the channel order beforehand. Though many order-detection algorithms can be applied (e.g., see [4]) to solve this particular problem, the approaches that separate order detection and parameter estimation may not be efficient, especially when the channel-impulse response has small head and tail taps [5].

To tackle this drawback, a class of channel-estimation algorithms performing joint order detection and parameter estimation has been proposed [5, 6]. In [5], a cost function including the channel order is used; however, the algorithm may not be efficient because the channel order is estimated by evaluating all the possible candidates from 1 to a predefined ceiling. The method proposed in [6] is also not a real joint approach since the order is separately estimated by detecting the rank of an overmodelled data matrix. In fact, this is very similar to the methods that apply a rank-detection procedure to an overmodelled data covariance matrix in [4]. Order estimation via rank detection may not be efficient because it is sensitive to noise [4], and the eigenvalue decomposition is also computationally costly.

In this paper, we propose a real joint order-detection and channel-estimation method based on a genetic algorithm (GA). GAs have been widely used in channel-parameter estimation [7, 8, 9]. However, their application to joint order detection and parameter estimation has not been well explored. Based on the subspace decomposition of the output-autocorrelation matrix, we first develop a new objective function for estimating channel order and parameters. Then, a novel GA-based technique is presented to resolve this problem. The key proposition of the proposed GA is that the channel order and the channel parameters are encoded into a single chromosome. Consequently, the channel order and parameters can be simultaneously estimated. Simulation results show that the new GA outperforms existing GAs in convergence speed. We also compare the performance of the proposed GA with the closed-form subspace method, which assumes that the channel order is known [10]. Simulation results show that the proposed GA achieves a similar performance.

2.

PROBLEM FORMULATION

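We consider a multichannel FIR system with M subchannels. The transmitted discrete signal s(n) is modulated, filtered, and transmitted over these Gaussian subchannels. The received signals are filtered and down-converted to baseband. The resulting baseband signal at the mth sensor can be expressed as follows [1]:

\[ x_m(n) = \sum_{k=0}^{L} h_m(k)\, s(n-k) + b_m(n), \quad m = 1, \ldots, M, \tag{1} \]

where $b_m(n)$ denotes the additive Gaussian noise and is assumed to be uncorrelated with the input signal $s(n)$, $h_m(n)$ is the equivalent discrete channel-impulse response associated with the mth sensor, and $L$ is the largest order of these subchannels (note that the subchannels may have different orders). Equation (1) can be represented in vector-matrix formulation as follows:

\[ \mathbf{x}_m(n) = \mathbf{H}_m \mathbf{s}(n) + \mathbf{b}_m(n), \quad m = 1, \ldots, M, \tag{2} \]

where

\[ \mathbf{x}_m(n) = [\,x_m(n) \;\; x_m(n-1) \;\cdots\; x_m(n-N)\,]^T, \tag{3} \]

\[ \mathbf{s}(n) = [\,s(n) \;\; s(n-1) \;\cdots\; s(n-N-L)\,]^T, \tag{4} \]

\[ \mathbf{b}_m(n) = [\,b_m(n) \;\; b_m(n-1) \;\cdots\; b_m(n-N)\,]^T, \tag{5} \]

\[ \mathbf{H}_m = \begin{bmatrix} h_{m,0} & \cdots & h_{m,L} & & 0 \\ & \ddots & & \ddots & \\ 0 & & h_{m,0} & \cdots & h_{m,L} \end{bmatrix}, \tag{6} \]

and $\mathbf{H}_m$ is the $(N+1) \times (N+L+1)$ filtering matrix associated with $h_m(n)$, with $h_{m,k} = h_m(k)$.

We define an $M(N+1) \times 1$ overall observation vector as $\mathbf{x}(n) = [\,\mathbf{x}_1^T(n) \cdots \mathbf{x}_M^T(n)\,]^T$; then the multichannel system can be represented in matrix formulation as

\[ \mathbf{x}(n) = \mathbf{H}\mathbf{s}(n) + \mathbf{b}(n), \tag{7} \]

where $\mathbf{H} = [\,\mathbf{H}_1^T \cdots \mathbf{H}_M^T\,]^T$ is the $M(N+1) \times (N+L+1)$ overall system transfer matrix and $\mathbf{b}(n) = [\,\mathbf{b}_1^T(n) \cdots \mathbf{b}_M^T(n)\,]^T$ is the $M(N+1) \times 1$ additive noise vector.

If we define the output-autocorrelation matrix as $\mathbf{R}_{xx} = E[\mathbf{x}(n)\mathbf{x}(n)^T]$, then we have

\[ \mathbf{R}_{xx} = \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^T + \mathbf{R}_{bb}, \tag{8} \]

where $\mathbf{R}_{ss} = E[\mathbf{s}(n)\mathbf{s}(n)^T]$ is the $(N+L+1) \times (N+L+1)$ autocorrelation matrix of $\mathbf{s}(n)$ and $\mathbf{R}_{bb} = E[\mathbf{b}(n)\mathbf{b}(n)^T]$ is the $M(N+1) \times M(N+1)$ autocorrelation matrix of $\mathbf{b}(n)$. In the following, we will present an objective function based on the subspace decomposition of $\mathbf{R}_{xx}$. To exploit the subspace properties, the following assumptions must be made [10]: the parameter matrix $\mathbf{H}$ has full column rank, which implies $M(N+1) \ge (N+L+1)$ and that the subchannels do not share common zeros; and the autocorrelation matrix $\mathbf{R}_{ss}$ has full rank.

The basic idea of subspace decomposition is to decompose $\mathbf{R}_{xx}$ into a signal subspace and a noise subspace. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{M(N+1)}$ be the eigenvalues of $\mathbf{R}_{xx}$. Since $\mathbf{H}$ has full column rank $N+L+1$ and $\mathbf{R}_{ss}$ has full rank, the signal component of $\mathbf{R}_{xx}$, that is, $\mathbf{H}\mathbf{R}_{ss}\mathbf{H}^H$, has rank $N+L+1$. Therefore,

\[ \lambda_i > \sigma_n^2 \;\; \text{for } i = 1, \ldots, N+L+1, \qquad \lambda_i = \sigma_n^2 \;\; \text{for } i = N+L+2, \ldots, M(N+1), \tag{9} \]

where $\sigma_n^2$ denotes the noise variance.

If we perform the subspace decomposition of $\mathbf{R}_{xx}$, we get

\[ \mathbf{R}_{xx} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H = [\,\mathbf{U}_s \;\; \mathbf{U}_n\,] \begin{bmatrix} \boldsymbol{\Lambda}_s & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Lambda}_n \end{bmatrix} [\,\mathbf{U}_s \;\; \mathbf{U}_n\,]^H, \tag{10} \]

where $\boldsymbol{\Lambda}_s = \mathrm{diag}\{\lambda_1, \ldots, \lambda_{N+L+1}\}$ contains the $N+L+1$ largest eigenvalues of $\mathbf{R}_{xx}$ in descending order and the columns of $\mathbf{U}_s$ are the corresponding orthogonal eigenvectors of $\lambda_1, \ldots, \lambda_{N+L+1}$, while $\boldsymbol{\Lambda}_n = \mathrm{diag}\{\lambda_{N+L+2}, \ldots, \lambda_{M(N+1)}\}$ contains the other eigenvalues and the columns of $\mathbf{U}_n$ are the orthogonal eigenvectors corresponding to the eigenvalue $\sigma_n^2$. The spans of $\mathbf{U}_s$ and $\mathbf{U}_n$ denote the signal subspace and the noise subspace, respectively. The key observation is that the columns of $\mathbf{H}$ also span the signal subspace of $\mathbf{R}_{xx}$. The channel parameters can then be uniquely identified by the orthogonality between the signal subspace and the noise subspace [10], that is,

\[ \mathbf{H}^H \mathbf{U}_n = \mathbf{0}. \tag{11} \]

Let $\mathbf{h}$ denote the vector that contains all the channel parameters. From (11), we propose an objective function as follows:

\[ J(\mathbf{h}) = \left\| \mathbf{H}^H \mathbf{U}_n \right\|. \tag{12} \]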
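The orthogonality condition (11) and the objective (12) can be checked by hand on a very small real-valued case. The following Python sketch is only an illustration under stated assumptions: the function names, the toy channels, and the hand-computed noise vector are all hypothetical, the norm is taken as the Frobenius norm, and the noise vector is left unnormalised for readability.

```python
import math


def filtering_matrix(h, N):
    """(N+1) x (N+L+1) Toeplitz matrix H_m of Eq. (6) for h = [h_0, ..., h_L]."""
    L = len(h) - 1
    return [[h[c - r] if 0 <= c - r <= L else 0.0 for c in range(N + L + 1)]
            for r in range(N + 1)]


def objective(subchannels, noise_vectors, N):
    """J(h) = ||H^H U_n|| for real data; noise_vectors holds the columns of U_n."""
    H = [row for h in subchannels for row in filtering_matrix(h, N)]  # stack H_1..H_M
    total = 0.0
    for u in noise_vectors:
        for c in range(len(H[0])):
            dot = sum(H[r][c] * u[r] for r in range(len(H)))
            total += dot * dot
    return math.sqrt(total)


# Toy case: M = 2 subchannels of order L = 1 and window N = 1, so H is 4 x 3
# and the noise subspace has dimension 1.
h_true = [[1.0, 2.0], [1.0, 3.0]]
U_n = [[-1.0, -3.0, 1.0, 2.0]]        # satisfies H^T u = 0 for h_true

print(objective(h_true, U_n, N=1))                      # 0.0: condition (11) holds
print(objective([[1.0, 2.0], [1.0, 2.5]], U_n, N=1))    # positive for a wrong estimate
```

In practice the columns of U_n would come from an eigendecomposition of an estimated autocorrelation matrix, so the minimum of J is small rather than exactly zero.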

The objective function (12) assumes the channel order to be known. However, in practice this is not true. Therefore, the channel order must be estimated beforehand. In this paper, we estimate the channel order based on (12). Since the subchannels may have different orders, order estimation refers to the largest one. Note that the channel identifiability does not depend on whether the subchannels have the same order but on whether they have common zeros [10]. We show that order estimation affects the number of global optima in (12). It is shown in [10] that J(h) has only one nonzero optimum when the channel order is correctly estimated. We study the cases where the channel order is either under- or overestimated based on (12).

If the channel order is overestimated, then J(h) will have more than one nonzero optimum. For instance, let the estimated order be L + 1; we define

\[ \mathbf{h}_m^1 = [\,0 \;\; \mathbf{h}_m^T\,]^T = [\,0 \;\; h_{m,0} \cdots h_{m,L}\,]^T, \qquad \mathbf{h}_m^2 = [\,\mathbf{h}_m^T \;\; 0\,]^T = [\,h_{m,0} \cdots h_{m,L} \;\; 0\,]^T. \tag{13} \]

The transfer matrices $\mathbf{H}_1$ and $\mathbf{H}_2$ constructed from $\mathbf{h}_m^1$ and $\mathbf{h}_m^2$ will satisfy the following condition:

\[ \mathbf{U}_n^T \mathbf{H}_1 = \mathbf{U}_n^T \mathbf{H}_2 = \mathbf{0}. \tag{14} \]

Hence, J(h) has at least two nonzero optima:

\[ \mathbf{h}^1 = [\,(\mathbf{h}_1^1)^T \cdots (\mathbf{h}_M^1)^T\,]^T, \qquad \mathbf{h}^2 = [\,(\mathbf{h}_1^2)^T \cdots (\mathbf{h}_M^2)^T\,]^T. \tag{15} \]

If the channel order is underestimated, then J(h) has no nonzero optimum. If this is not true, from the above derivation, J(h) with correctly estimated order will have more than one nonzero solution. This contradicts the conclusion in [10].

Therefore, we can conclude that the optima of J(h) satisfy the following conditions:
(i) more than one nonzero optimum if the order is overestimated,
(ii) only one nonzero optimum if the order is correctly estimated,
(iii) no nonzero optimum if the order is underestimated.

Now let l denote the estimated order. Assuming that the channel order is unknown, we propose to include l in the objective function of (12) and propose a new objective function $J(l, \mathbf{h}) = \|\mathbf{H}^H \mathbf{U}_n\|$. In order to let l converge on the correct order, the following conditions must be met:
(1) the trivial solution, that is, h = 0, must be avoided,
(2) l must be more likely to converge to a small order.

Note that h has a free constant scale. If $\hat{\mathbf{h}}$ is a solution of (11), then $\alpha\hat{\mathbf{h}}$, where $\alpha$ is an arbitrary constant, is also a solution of (11). A common technique to avoid a trivial solution is to normalize h to $\|\mathbf{h}\| = 1$ [5, 6, 10]. In this paper, we extend this constraint by proposing $\|\mathbf{h}\| \ge 1$, and concentrate on a special case. That is, we fix the first parameter of h to h(1) = 1. Such a constraint avoids the computation of the normalization during iteration. Note that l affects the objective value through the number of elements of h used to compute it. A smaller l implies that fewer elements are used. Consequently, it may result in a smaller objective value. Therefore, such a constraint is also helpful in making l converge to a smaller value.

To ensure condition (2), we suggest imposing a penalty on J(l, h) when a larger estimate of channel order is achieved. In practice, the objective value J(l, h) converges to a small value rather than exactly zero. Therefore, we apply multiplication instead of addition. The following objective function is proposed:

\[ J(l, \mathbf{h}) = l^K \left\| \mathbf{U}_n^H \mathbf{H} \right\|, \quad K \ge 0. \tag{16} \]
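The effect of the multiplicative penalty in (16) can be illustrated with a toy example. The Python sketch below is an assumption-laden illustration, not the authors' implementation: the function names, the two toy subchannels, and the slightly perturbed noise vector (standing in for an estimated noise subspace) are all hypothetical, and the data are real-valued.

```python
import math


def residual(subchannels, noise_vectors, N):
    """||U_n^H H|| (Frobenius norm, real data); H stacks the Toeplitz blocks of Eq. (6).

    All subchannel vectors are assumed to have the same length l + 1.
    """
    L = len(subchannels[0]) - 1
    rows = [[h[c - r] if 0 <= c - r <= L else 0.0 for c in range(N + L + 1)]
            for h in subchannels for r in range(N + 1)]
    return math.sqrt(sum(sum(row[c] * u[i] for i, row in enumerate(rows)) ** 2
                         for u in noise_vectors for c in range(N + L + 1)))


def penalised_objective(subchannels, noise_vectors, N, K):
    """J(l, h) = l^K * ||U_n^H H||, with l read from the length of the subchannels."""
    l = len(subchannels[0]) - 1
    return (l ** K) * residual(subchannels, noise_vectors, N)


# True channels of order 1 and a slightly perturbed noise vector, so that the
# residual is small but not exactly zero.
U_n = [[-1.0, -3.0, 1.0, 2.01]]
h_correct = [[1.0, 2.0], [1.0, 3.0]]          # candidate order l = 1
h_padded = [[1.0, 2.0, 0.0], [1.0, 3.0, 0.0]]  # overestimated order l = 2, as h^2 in (13)

print(residual(h_correct, U_n, N=1), residual(h_padded, U_n, N=1))
print(penalised_objective(h_correct, U_n, N=1, K=1),
      penalised_objective(h_padded, U_n, N=1, K=1))
```

Both candidates leave the same residual, so the unpenalised objective cannot separate them; with K = 1 the zero-padded candidate of order 2 costs twice as much, which favours the smaller order.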

3.

GENETIC ALGORITHM

A GA is a search technique modeled on the mechanisms of biological evolution. The algorithm begins with a collection of parameter estimates (called chromosomes), and each is evaluated for its fitness for solving a given optimization task. In each generation, the fittest chromosomes are allowed to mate, mutate, and give birth to offspring. These children form the basis of the new generation. Since the children generation always contains the elite of the parent generation, a newborn generation tends to be closer to a solution to the optimization problem. After a number of generations, workable solutions can be achieved if some convergence criteria are satisfied. In fact, a GA is a very flexible tool and is usually adapted to the given optimization problem. The features of the proposed GA are described below.

Encoding

Each chromosome has two parts. One represents the channel order and is encoded in binary; the other represents the channel parameters and is encoded as real values. Let $(\mathbf{c}, \mathbf{h})_j^i$ (j = 1, ..., Q) denote the jth chromosome of the ith generation, where Q is the population size. The chromosome structure is as follows:

\[ [\,c_1 \;\; c_2 \cdots c_S \;\mid\; h_1 \;\; h_2 \cdots h_T\,], \tag{17} \]

where the order part (the order chromosome) is denoted as c and the parameter part (the parameter chromosome) as h. Note that the length of the order chromosomes decides the length of the parameter chromosomes, and one should ensure that the length of the parameter chromosomes is greater than the possible channel order.
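A two-part chromosome of this kind could be decoded as in the Python sketch below. Several details are assumptions made for illustration only: the binary value is offset by one so that S bits cover the orders 1 to 2^S, each of the M subchannels is given l + 1 consecutive parameter genes, and the function names are hypothetical.

```python
def decode_order(order_bits):
    """Read the binary order genes c_1..c_S as an integer order in 1..2**S."""
    value = 0
    for bit in order_bits:
        value = 2 * value + bit
    return value + 1            # assumed offset: all-zero bits mean order 1


def active_parameters(order_bits, params, M):
    """Return (l, subchannels): the parameter genes used for the candidate order l.

    Assumed layout: M subchannels, each taking l + 1 consecutive genes of params.
    Genes beyond the first M * (l + 1) are carried along but not used.
    """
    l = decode_order(order_bits)
    needed = M * (l + 1)
    if needed > len(params):
        raise ValueError("parameter chromosome too short for this order")
    return l, [params[m * (l + 1):(m + 1) * (l + 1)] for m in range(M)]


print(decode_order([0, 0, 0, 0, 0]), decode_order([1, 1, 1, 1, 1]))   # prints 1 32
print(active_parameters([0, 0, 1], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.9], M=2))
```

The length check makes explicit the requirement that the parameter part be long enough for the largest order the binary part can express.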

Initialization

Normally, the initial values of the chromosomes are randomly assigned. In the proposed GA, in order to prevent the algorithm from converging to a trivial solution, as we have shown in Section 2, the first parameter of h (i.e., the first gene of the parameter chromosomes) is fixed to h1 = 1, while the other genes are randomly initialized.

Fitness function

In the proposed GA, tournament selection is adopted, in which the objective values are obtained by computing the objective function (16). Tournament selection only compares objective values, so there is no need to map the objective value to a fitness value. Since the order chromosomes have a very simple coding (in binary) and a smaller gene pool, order chromosomes are expected to converge much faster than the parameter chromosomes. Thus, we propose to detect the convergence of order chromosomes and parameter chromosomes separately. However, it should be noted that the objective values of (16) cannot directly indicate the fitness of the order chromosomes. A fitness function for order chromosomes is required and is defined as follows. The fitness of an estimated order l is measured as the number of chromosomes whose order is equal to l. The order fitness of $(\mathbf{c}, \mathbf{h})_j^i$ is denoted as

\[ f(\mathbf{c}_j^i) = \mathrm{cum}_j^i(l). \tag{18} \]

The above fitness function is not used in tournament selection but only in the convergence criteria of order chromosomes.
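The order fitness described above is a simple count over the population. A minimal Python sketch, assuming the decoded order of every chromosome is already available as a list (the function name is hypothetical):

```python
from collections import Counter


def order_fitness(orders):
    """For each chromosome, count how many chromosomes share its estimated order."""
    counts = Counter(orders)
    return [counts[l] for l in orders]


# Five chromosomes with decoded orders 3, 3, 4, 3, 7.
print(order_fitness([3, 3, 4, 3, 7]))   # prints [3, 3, 1, 3, 1]
```

A population in which one count approaches the population size indicates that the order part has converged.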

Parent selection

A good parent selection mechanism gives better parents a better chance to reproduce. In the proposed GA, we employ an elitist method [8] and tournament selection [11]. First, a fraction ρ of the present population, that is, the ρQ best chromosomes, is directly selected. Then, the other (1 − ρ)Q child chromosomes are generated via tournament selection within the whole parent population. That is, two chromosomes are randomly selected from the parent population in each cycle, and the one with the smaller objective value is selected.
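Elitism followed by binary tournaments can be sketched as follows. This is a generic illustration rather than the authors' code: the function name, the parameter name elite_fraction (the fraction of the population copied directly), and the use of two distinct contestants per tournament are assumptions.

```python
import random


def select_parents(population, objective, elite_fraction, rng):
    """Keep the best chromosomes, then fill the rest by binary tournaments.

    population: list of chromosomes; objective: function returning the value
    to minimise; elite_fraction: share of the population copied directly;
    rng: a random.Random instance.
    """
    Q = len(population)
    ranked = sorted(population, key=objective)
    selected = ranked[:int(elite_fraction * Q)]      # elitist part
    while len(selected) < Q:
        a, b = rng.sample(population, 2)             # two random contestants
        selected.append(a if objective(a) <= objective(b) else b)
    return selected


# Toy population in which each chromosome is simply its own objective value.
population = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
chosen = select_parents(population, lambda c: c, elite_fraction=0.2, rng=random.Random(0))
print(chosen[:2], len(chosen))   # the two elites come first: [0, 1] 10
```

Passing the random generator explicitly keeps the selection reproducible, which is convenient when comparing runs.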

Crossover

Crossover combines the features of two parent chromosomes to form two child chromosomes. Generally, the parent chromosomes are mated randomly [12]. In the proposed GA, each chromosome contains two parts with different coding techniques. The order chromosome decides how many elements in the parameter chromosome are used to calculate the objective value. Therefore, these two parts cannot be decoupled, and the conventional methods that perform crossover on each part separately may not be efficient. Normally, the order chromosomes will be short. For instance, an order chromosome with a length of 5 implies a search space from 1 to 32, which covers most practical cases of FIR channels. Therefore, the order chromosomes are expected to converge much faster than the parameter chromosomes. We propose not to perform crossover on the order chromosomes but to use mutation only. For the parameter chromosomes, crossover between chromosomes with different orders is more explorative (i.e., it searches more of the data space). However, it may also damage the building blocks in the parent chromosomes. On the other hand, crossover between chromosomes with the same order is more exploitative (i.e., it speeds up convergence). However, it may cause premature convergence. Since faster convergence is preferable in blind channel identification, we propose to mate chromosomes of the same order. For each estimated order, if the number of corresponding chromosomes is odd, a randomly selected chromosome is added to the mating pool.
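The same-order mating rule can be sketched as follows. The function name, the `order_of` accessor, and the choice to draw the extra chromosome from the same order group are assumptions made for illustration; the text only states that a randomly selected chromosome is added when a group has an odd size.

```python
import random
from collections import defaultdict

def build_mating_pairs(chromosomes, order_of, rng=random):
    """Pair chromosomes so that both partners share the same estimated order."""
    groups = defaultdict(list)
    for chrom in chromosomes:
        groups[order_of(chrom)].append(chrom)
    pairs = []
    for members in groups.values():
        pool = list(members)
        if len(pool) % 2 == 1:
            # Odd group: add one randomly selected member (assumed to come
            # from the same order group) so that everyone has a partner.
            pool.append(rng.choice(members))
        rng.shuffle(pool)
        pairs.extend((pool[k], pool[k + 1]) for k in range(0, len(pool), 2))
    return pairs
```

Note that a group with a single chromosome ends up paired with a copy of itself, which makes crossover a no-op for that order until mutation produces more chromosomes of the same order.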

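Assume that the chromosomes are mated and a pair of them is given as

$$(c, h)_j^i = \bigl[c_1 c_2 \cdots c_S,\ h_1 h_2 \cdots h_T\bigr]_j^i, \qquad (c, h)_k^i = \bigl[c_1 c_2 \cdots c_S,\ h_1 h_2 \cdots h_T\bigr]_k^i. \tag{19}$$

Let $a_1, a_2 \in [1, T]$ be two random integers ($a_1 < a_2$), and let $\alpha_{a_1+1}, \ldots, \alpha_{a_2}$ be $a_2 - a_1$ random real numbers in $(0, 1)$; then the parameter parts of the child chromosomes are defined as

$$h_j^{i+1} = \bigl[h_{1,j}^i \cdots h_{a_1,j}^i,\ \alpha_{a_1+1} h_{a_1+1,j}^i \cdots\bigr], \qquad h_k^{i+1} = \bigl[h_{1,k}^i \cdots h_{a_1,k}^i,\ \alpha_{a_1+1} h_{a_1+1,k}^i \cdots\bigr], \tag{20}$$

where a two-point crossover is adopted.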

Mutation

A mutation feature is introduced to prevent premature convergence. Originally, mutation was designed only for binary-represented chromosomes. For real-valued chromosomes, the following random mutation is now widely adopted [12]:

$$g' = g + \psi(\mu, \sigma), \tag{21}$$

where $\psi$ may be Gaussian or uniform, and $\mu$ and $\sigma$ are the related mean and variance. In this paper, we use normal mutation for the order genes. That is, we randomly alter the genes from 0 to 1 or from 1 to 0 with probability Pm. Normally, Pm is a small number. However, in the proposed GA, the value of the order chromosome decides which parameter genes are used for calculating the objective function. A smaller order means fewer parameter genes and, consequently, a smaller objective value. Therefore, in the start-up period of the iteration, the order chromosomes are more likely to converge on a small value where the order is equal to 1. A large mutation rate is adopted to prevent such premature convergence.

For the parameter part, a uniform PDF is employed. Let $a_3, a_4 \in [1, T]$ be two random integers ($a_3 < a_4$), and let $\beta_{a_3+1}, \ldots, \beta_{a_4}$ be $a_4 - a_3$ random real numbers in $(-1, 1)$; then the parameter chromosomes of the child generation are defined as

$$h_j^{i+1} = \bigl[h_1, \ldots, h_{a_3},\ h_{a_3+1} + \beta_{a_3+1}/P,\ \ldots,\ h_{a_4} + \beta_{a_4}/P,\ h_{a_4+1}, \ldots, h_T\bigr], \tag{22}$$

where $P$ is a scale factor that is adjusted during the iteration to speed up the convergence.
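The two mutation operators can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, `scale_p` plays the role of the divisor P in (22), and chromosomes are represented as plain Python lists.

```python
import random

def mutate_order_bits(bits, p_m, rng=random):
    """Flip each binary order gene (0 <-> 1) independently with probability p_m."""
    return [1 - b if rng.random() < p_m else b for b in bits]

def mutate_parameters(h, scale_p, rng=random):
    """Perturb genes a3+1 .. a4 (1-based) with uniform noise in (-1, 1) / scale_p."""
    T = len(h)
    a3, a4 = sorted(rng.sample(range(1, T + 1), 2))  # two distinct integers, a3 < a4
    child = list(h)
    for idx in range(a3, a4):        # 0-based a3 .. a4-1 equals 1-based a3+1 .. a4
        child[idx] += rng.uniform(-1.0, 1.0) / scale_p
    return child
```

A larger `scale_p` shrinks the perturbation, so increasing it over the generations moves the search from exploration toward fine tuning.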


Population size

48

16

Penalty scale

1/12

pm

0.5

10.2m/100

Convergence criterion

We propose different convergence criteria for order chromosomes and parameter chromosomes. The order chromosomes are considered to be converged if the gene pool is dominated by a certain order, that is,

cumij

D

cumij (l)

cumij

D

J(c, h)s1

30

0.1

0.1

( + 1)/( 1)

( 1)/( + 1)

(23)

other orders

of chromosomes with order lD , and is a predefined ratio.

When the order chromosomes are converged, the mutation

rate of order chromosomes is set to zero (pm = 0). The parameter chromosomes are considered to be converged if the

change in the smallest objective value within X generations

is small, that is,

$$\bigl|J(c, h)^i - J(c, h)^{i-X}\bigr| < e\,J(c, h)^i, \tag{24}$$

where e is also a predefined ratio. Theoretically, the objective function in (16) has multiple minima that may have

overestimated orders. In order to cause the order chromosomes to converge on the correct channel order, we impose a

penalty on the chromosomes with greater order. Due to the

random nature of a GA, though in most cases the order

chromosomes can converge on the real channel order (see

the simulation result in Table 2), there is no guarantee that

the chromosomes will absolutely converge on the real channel order. Therefore, we propose to examine the converged

result to ensure correct convergence. If we let (c, h)s1 be the

current converged result, the examination can be carried out

as follows (see the outer loop in Figure 2): reduce the order of (c, h)s1 by 1, fix the order, and run the proposed GA

again (note that this time the order chromosomes are fixed,

i.e., pm = 0). After a few generations, a new result denoted

as (c, h)s2 can be achieved. If the objective values of (c, h)s1

and (c, h)s2 , that is, J(c, h)s1 and J(c, h)s2 , are close enough,

then we can decide that J(c, h)s1 has overestimated order and

J(c, h)s2

drop from J(c, h)s1 to J(c, h)s2 is significantly large, the following inequality arises:

J(c, h)s1 J(c, h)s2 > J(c, h)s1 + J(c, h)s2 .

(25)

distinguishably large enough for us to say that (c, h)s1 has

converged on the real channel order. From the inequality in

(25), one can draw two lines with slope of ( + 1)/( 1) and

( 1)/( + 1) (see Figure 1). The shaped region in Figure 1

shows the data space given by (25). The criterion set in (25)

is, in fact, an enumeration search. However, the order estimation in the proposed GA does not solely rely on this enumeration search. In the proposed GA, we have employed certain

strategies to give the order chromosome a better chance of

converging to the real channel order. The simulation result

also shows that in most cases the order chromosomes can

converge on (or close to) the real channel order (see Table 2).

The enumeration search is, thus, used to compensate for the

drawback of the GA.
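The parameter convergence test in (24), used in the inner loop, can be sketched as follows. It is an illustration only: the function and argument names are hypothetical, and the use of an absolute difference is an assumption of this sketch.

```python
def parameters_converged(best_history, window, ratio):
    """Check the relative change of the best objective over `window` generations.

    best_history[g] is the smallest objective value found in generation g,
    `window` corresponds to X, and `ratio` corresponds to e in (24).
    """
    if len(best_history) <= window:
        return False                      # not enough generations recorded yet
    current = best_history[-1]
    earlier = best_history[-1 - window]
    return abs(current - earlier) < ratio * current
```

Because the threshold is relative to the current objective value, the test remains meaningful as the objective falls by several orders of magnitude during a run.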

[Figure 2: Flow diagram of the proposed approach. Start; configure the proposed GA according to Table 1; initialize the chromosomes; apply selection, crossover, and mutation; evaluate the chromosomes by the objective function and the order fitness function; test the order and parameter convergence criteria and set Pm = 0 once the order chromosomes have converged (inner loop); store the converged result; reduce the order of the chromosomes by 1, set Pm = 0, and reinitialize the parameter chromosomes until the outer criterion is satisfied (outer loop); terminate.]

The overall flow diagram of the proposed approach is illustrated in Figure 2. It can be seen that the proposed GA has

an inner and an outer loop. The criteria in (23) and (24) in

the inner loop guarantee that a global optimum is achieved.

We have shown that this solution may have an overestimated

order. The criterion in (25) in the outer loop is used to reexamine the solution reached and guarantee the correct estimate.

It is important to note that although the order part and

the parameter part have a distinct representation, fitness

function, and convergence criterion, we encode the two parts

into a single chromosome rather than keeping two separate

chromosomes. This is because the order part decides how

many genes of the parameter chromosome should be used to

calculate the objective value and, therefore, these two parts

cannot be decoupled.

4. EXPERIMENTAL RESULT

of the proposed GA. We use the same multichannel FIR system as that in [9], where two sensors are adopted and the

channel-impulse responses are

(26)

population size is used in order to explore greater data space.

The searching space of channel order is from 1 to 8 (S = 3).

In blind channel estimation, an FIR multichannel model is normally obtained by oversampling the output of a real channel. A multichannel model with two subchannels of order 8 represents a real channel of order 16, which covers most normal channels. Note that order chromosomes of length 3 can also map the search space from 9 to 16. So, in case no satisfactory solution is reached, one may remap the order search space (9–16) and rerun the algorithm. A large mutation rate (pm = 0.5) is adopted to prevent premature convergence. To speed up the convergence of parameter chromosomes, we adjust P every 100 generations (see Table 1), where ⌊a⌋ denotes the floor value of a.

Table 2: Converged order in the first inner loop run (60 independent trials).

Converged order     5       6       7       8       Total
Number of trials    26      21      11      2       60
Proportion          43.4%   35%     18.3%   3.3%    100%

[Figure 3: Evolution curve showing the average order of the population and the average objective value of the population versus generations.]

A 25-dB Gaussian white noise is added to the output and

2,000 output samples are used to estimate the autocorrelation matrix Rxx . Figure 3 shows a typical evolution curve. In

each generation, the average objective value and estimated

order of the whole population are plotted. From Figure 3,

one can see that the order chromosomes converge much

faster than the parameter chromosomes. They converge on

the true channel order in the first inner loop run (order = 5

in Figure 3). We store this converged result, reduce the order

by 1, set pm = 0, and then begin another GA execution. After

the convergence (order = 4 in Figure 3), we evaluate these

two converged results (order = 5 and order = 4 in Figure 3)

by using the outer loop criterion in (25). Since there is an exponential drop between the two results, the condition in (25)

is satisfied. Thus, our algorithm stops and concludes that order 5 is the final estimate.

The channel order is estimated by detecting the drop between two converged objective values, which may be similar to the traditional method where the eigenvalues of an

overmodeled covariance matrix are calculated and the channel order is determined when there is a significant drop between two adjoining eigenvalues [4]. However, our algorithm

is more efficient since the calculation of eigenvalue decomposition can be avoided and it can be seen that the drop is much

more significant (an exponential drop).

Figure 4 shows an evolution curve where the channel order is overestimated in the first inner loop run (order = 6

in Figure 4). In Figure 4, the objective values of the first two

converged results are quite close, which does not satisfy the

criterion set in (25). Further examination is thus required.

As above, we can get the third converged result (order = 4 in

Figure 4). By evaluating it with (25), we can draw the same

conclusion as from Figure 3.

When compared with existing work, the convergence

speed of the proposed GA is satisfactory since it can be seen

that a quite reliable solution can be reached in about 1,000

generations, whereas the algorithm in [9] converges after

2,000 generations (note that in [9] the channel order is assumed to be known). In [8], an identification problem with

similar complexity is simulated. The algorithm converges after hundreds of generations, but it is nonblind and, there-

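[Figure 4: Evolution curve of the objective value J(c, h) versus generations, with converged results at order = 6, order = 5, and order = 4; the channel order is overestimated in the first inner loop run.]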

note that the convergence speed is affected by the complexity of the target problem. A more complicated multichannel system will result in slower convergence. We simulate a multichannel system with four subchannels and find that the algorithm converges after 1,000 generations. The effect of problem complexity seems to be a common problem of GAs and needs further study.

Since the proposed GA needs to estimate the second-order statistics of the channel output (the autocorrelation

matrix), it cannot be used directly in a rapidly varying channel. However, if some subspace tracking algorithm is employed (e.g., [13]), the noise subspace, that is, Un in (16) can

be updated when a new sample vector (x(n) in (7)) is received. The objective function can be adapted according to


100

generation cycles.

5. CONCLUSIONS

A GA has been proposed for blind channel estimation. Computer simulations show that its performance is comparable with existing closed-form approaches. Moreover, the proposed GA can provide a joint order and channel estimation, whereas most of the existing approaches must assume that the channel order is known or treat the problems of order estimation and parameter estimation separately.

[Figure 5: RMSE versus SNR (dB) for the SS-SVD and SS-GA approaches.]

be applied to a rapidly varying channel. However, this requires further investigation and is beyond the scope of this

paper.

It is obvious that the computation is costly if the converged order in the first inner loop run is much greater than the real channel order. In the proposed GA, though there is no guarantee that the order chromosomes converge exactly on the real channel order in the first inner loop run, we have proposed several strategies to make them converge more closely. To illustrate the point, 60 independent trials were run and the converged order in the first inner loop run was recorded. Table 2 shows the results. The first row denotes the converged orders. The second row gives the number of times the order chromosomes converge on a certain order. The third row shows the proportions. Table 2 illustrates that in most trials the order chromosomes converge to, or close to, the real channel order (orders 5 and 6 account for about 80% of the trials).
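The proportions quoted from Table 2 can be rechecked with a few lines of arithmetic. The counts are those printed in the table; the variable names are chosen for this sketch only.

```python
# Converged order -> number of trials, as printed in Table 2.
counts = {5: 26, 6: 21, 7: 11, 8: 2}

total = sum(counts.values())
proportions = {order: 100.0 * n / total for order, n in counts.items()}
share_5_or_6 = proportions[5] + proportions[6]

for order, pct in proportions.items():
    print(f"order {order}: {pct:.1f}%")
print(f"orders 5 and 6 together: {share_5_or_6:.1f}%")
```

The combined share of orders 5 and 6 comes out slightly below 80%, which is consistent with the "about 80%" stated in the text. Computed this way, the entry for order 5 rounds to 43.3%, so the 43.4% printed in the table was presumably adjusted so that the column sums to 100%.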

To evaluate the performance of the proposed GA, we compare it with a singular value decomposition (SVD)-based closed-form approach that assumes that the channel order is known [10]. The root mean square error (RMSE) is employed to measure the estimation performance, which is defined as

$$\mathrm{RMSE} = \frac{1}{\|h\|} \sqrt{\frac{1}{N_t} \sum_{i=1}^{N_t} \bigl\|\hat{h}_i - h\bigr\|^2}, \tag{27}$$

where $N_t$, the number of trials, is set at 50, and $\hat{h}_i$ denotes the estimated channel parameters in the $i$th trial. The comparison results are given in Figure 5.
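A minimal sketch of the normalized RMSE in (27) is given here. It assumes the usual form, in which the squared error norms are averaged over the trials before taking the square root; the function name and the list-based representation of the channel vectors are choices made for this illustration.

```python
import math

def normalized_rmse(estimates, h_true):
    """RMSE of the channel estimates over all trials, normalized by ||h_true||.

    estimates: list of estimated parameter vectors, one per trial.
    h_true:    true channel parameter vector (real or complex entries).
    """
    norm_true = math.sqrt(sum(abs(x) ** 2 for x in h_true))
    mean_sq = sum(
        sum(abs(e - t) ** 2 for e, t in zip(est, h_true)) for est in estimates
    ) / len(estimates)
    return math.sqrt(mean_sq) / norm_true
```

Dividing by the norm of the true channel makes the figure independent of the channel gain, so results at different SNR values can be compared on one axis.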

It can be seen that the proposed GA achieves similar performance with lower signal-to-noise ratio (SNR). At high SNR,

the performance of GA is worse, because the converged result

is not close enough to the real optimum. However, the per-

ACKNOWLEDGMENTS

The authors would like to express their appreciation to the Editor-in-Charge of this manuscript, Prof. Riccardo Poli, for his effort in improving the quality and readability of this paper. This work was done while Dr. Chen was visiting the City University of Hong Kong, and his work is supported by City University Research Grant 7001416 and the Doctoral Program fund of China under Grant 20010561007.

REFERENCES

[1] L. Tong, G. Xu, and T. Kailath, "Blind identification and equalization based on second-order statistics: a time domain approach," IEEE Transactions on Information Theory, vol. 40, no. 2, pp. 240–349, 1994.

[2] L. Tong, G. Xu, B. Hassibi, and T. Kailath, "Blind channel identification based on second-order statistics: a frequency-domain approach," IEEE Transactions on Information Theory, vol. 41, no. 1, pp. 329–334, 1995.

[3] L. Tong and S. Perreau, "Multichannel blind identification: from subspace to maximum likelihood methods," Proceedings of the IEEE, vol. 86, no. 10, pp. 1951–1968, 1998.

[4] A. P. Liavas, P. A. Regalia, and J.-P. Delmas, "Blind channel approximation: effective channel order determination," IEEE Trans. Signal Processing, vol. 47, no. 12, pp. 3336–3344, 1999.

[5] L. Tong and Q. Zhao, "Joint order detection and blind channel estimation by least squares smoothing," IEEE Trans. Signal Processing, vol. 47, no. 9, pp. 2345–2355, 1999.

[6] J. Ayadi and D. T. M. Slock, "Blind channel estimation and joint order detection by MMSE ZF equalization," in Proc. IEEE 50th Vehicular Technology Conference (VTC '99), vol. 1, pp. 461–465, Amsterdam, The Netherlands, September 1999.

[7] L. Yong, H. Chongzhao, and D. Yingnong, "Nonlinear system identification with genetic algorithms," in Proc. 3rd Chinese World Congress on Intelligent Control and Intelligent Automation (WCICA '00), vol. 1, pp. 597–601, Hefei, China, June–July 2000.

[8] L. Yao and W. A. Sethares, "Nonlinear parameter estimation via the genetic algorithm," IEEE Trans. Signal Processing, vol. 42, no. 4, pp. 927–935, 1994.

[9] S. Chen, Y. Wu, and S. McLaughlin, "Genetic algorithm optimization for blind channel identification with higher order cumulant fitting," IEEE Transactions on Evolutionary Computation, vol. 1, no. 4, pp. 259–265, 1997.

[10] E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue, "Subspace methods for blind identification of multichannel FIR filters," IEEE Trans. Signal Processing, vol. 43, no. 2, pp. 516–525, 1995.

[11] K. Krishnakumar, "Microgenetic algorithms for stationary and nonstationary function optimization," in Proc. Intelligent Control and Adaptive Systems, vol. 1196 of SPIE Proceedings, pp. 289–296, Philadelphia, Pa, USA, November 1990.

[12] K. F. Man, K. S. Tang, and S. Kwong, Genetic Algorithms: Concepts and Design, Springer-Verlag, London, UK, 1999.

[13] S. Attallah and K. Abed-Meraim, "Fast algorithms for subspace tracking," IEEE Signal Processing Letters, vol. 8, no. 7, pp. 203–206, 2001.

Guangdong province, China. He received

the B.S. degree from Zhejiang University

in 1997 and the Ph.D. degree from South

China University of Technology in 2002, all

in electronic and communication engineering. He worked as a Research Assistant in

City University of Hong Kong from January 2001 to September 2001 and from January 2002 to May 2002. He is currently with

the School of Electronic and Communication Engineering, South

China University of Technology. His research interests include

blind signal processing and wireless communication.

Sam Kwong received his B.S. and M.S. degrees in electrical engineering from the State

University of New York at Buffalo, USA, and

University of Waterloo, Canada, in 1983 and

1985, respectively. In 1996, he received his

Ph.D. degree from the University of Hagen, Germany. From 1985 to 1987, he was a

Diagnostic Engineer with the Control Data

Canada where he designed the diagnostic

software to detect the manufacture faults of

the VLSI chips in the Cyber 430 machine. He later joined the Bell

Northern Research Canada as a Member of Scientific Staff, where he worked on both the DMS-100 Voice Network and the DPN-100 Data Network Project. In 1990, he joined the City University

of Hong Kong as a Lecturer in the Department of Electronic Engineering. He is currently an Associate Professor in the Department

of Computer Science at the same university. His research interests

are in genetic algorithms, speech processing and recognition, data

compression, and networking.

Wei Gang was born in January 1963. He received the B.S., M.S., and Ph.D. degrees in

1984, 1987, and 1990, respectively, from Tsinghua University and South China University of Technology. He was a Visiting Scholar

to the University of Southern California

from June 1997 to June 1998. He is currently

a Professor at the School of Electronic and

Communication Engineering, South China

University of Technology. He is a Committee Member of the National Natural Science Foundation of China.

His research interests are signal processing and personal communications.

© 2003 Hindawi Publishing Corporation

of Tracking Filters with a Large Number of Specifications

Jesús García Herrero
Departamento de Informática, Escuela Politécnica Superior (EPS), Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Email: jgherrer@inf.uc3m.es

Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain
Email: besada@grpss.ssr.upm.es

Departamento de Informática, EPS, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Email: aberlan@ia.uc3m.es

Departamento de Informática, EPS, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Email: molina@ia.uc3m.es

Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain
Email: gonzalo@grpss.ssr.upm.es

Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain
Email: jramon@grpss.ssr.upm.es

Received 28 June 2002 and in revised form 14 February 2003

This paper describes the application of evolution strategies to the design of interacting multiple model (IMM) tracking filters in order to fulfill a large table of performance specifications. These specifications define the desired filter performance in a thorough set of selected test scenarios, for different figures of merit and input conditions, imposing hundreds of performance goals. The design problem is stated as a numeric search in the filter parameter space to attain all specifications or, at least, to minimize, as a compromise, the excess over some specifications as much as possible, applying global optimization techniques from the evolutionary computation field. Besides, a new methodology is proposed to integrate specifications into a fitness function able to effectively guide the search to suitable solutions. The method has been applied to the design of an IMM tracker for a real-world civil air traffic control application: the accomplishment of specifications defined for the future European ARTAS system.

Keywords and phrases: evolution strategies, radar tracking filters, multicriteria optimization.

1. INTRODUCTION

A tracking filter has the double goal of reducing measurement noise and consistently predicting future values of the signal. This kind of problem has efficient solutions in the case of stationary signals, but solutions for nonstationary problems are not yet so well established. This is the case in the field we are dealing with in this paper: tracking aircraft trajectories from radar measurements in air traffic control (ATC) applications.

The design of tracking filters for the ATC problem demands complex algorithms, like the modern interacting multiple model (IMM) filter [1]. These algorithms depend on a high number of parameters (seven in the IMM design presented here), which must be adjusted in order to achieve, as far as possible, the desired tracking filter performance. IMM has shown satisfactory performance for tracking maneuvering targets in relation to previous approaches. However, the relation between its input parameters and final performance is far from clear due to strongly nonlinear interactions among all parameters. Therefore, no direct design methodology has been proposed to date to generate the best solution for a specific application, apart from manual parameterization and evaluation with simulation.

Besides, real-world applications of tracking filters for ATC usually address performance specifications defined over an exhaustive set of realistic operational scenarios and covering a number of conflicting figures of merit. These two characteristics, a large table of specifications and the application of complex algorithms, make the design of a modern tracking filter a very complex problem.

In this paper, the authors present a new methodology to design and adjust tracking filters for ATC applications, based on the use of evolution strategies (ES) to solve an optimization problem over a customized cost function (fitness function). The method has been demonstrated by the design of a real-world engineering application: a modern ATC system promoted by EUROCONTROL for Europe, the ARTAS system. Due to the high dimensionality of the parameter space and the large number of defined constraints (the operational scenarios and performance figures sum up to 264 specifications for ARTAS), an automatic procedure to search for and tune the final solution is mandatory. Classical techniques, such as those based on gradient descent, were discarded due to the high number of local minima presented by the fitness function. ES have been selected for this problem due to their high robustness and immunity to local extremes and discontinuities in the fitness function.

However, the selection of a fitness function taking account of all specifications is not so direct, since all of them should be simultaneously considered to guide the search. The performance of ES has been analyzed in previous works for sets of test functions, but its application to a real engineering problem with hundreds of specifications, where the properties of the fitness landscape are not well known, is a harder task. A procedure has been proposed to build this function, exploiting specific knowledge about the domain. Objectives with similar behavior in the search are grouped first to select the worst cases for each group, and all of them are then combined in the final cost function. Results show that this procedure is able to find acceptable solutions, lowering the excess over some specifications as much as possible.
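The idea of grouping objectives and keeping the worst case of each group can be sketched as follows. This is only an illustration of the general scheme: the relative-excess measure, the clipping at zero, and the plain sum across groups are assumptions of this sketch, and all names are hypothetical.

```python
def grouped_fitness(measured, specified, groups):
    """Combine many specifications into one cost value (to be minimized).

    measured, specified: dicts mapping a specification name to its value.
    groups: list of lists of specification names with similar behavior.
    """
    total = 0.0
    for names in groups:
        # Relative excess over the goal; a negative value means the goal is met.
        excesses = [(measured[n] - specified[n]) / specified[n] for n in names]
        worst = max(excesses)             # worst case within the group
        total += max(worst, 0.0)          # assumption: only unmet goals add cost
    return total
```

Taking the worst case per group keeps the number of terms small and prevents many easily met goals from masking a single violated one.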

The paper starts by presenting the design performance constraints for ATC problems in Section 2 (particularized for an industrial application, the ARTAS system) and a description of the IMM algorithm in Section 3. In Section 4, we explain the proposed optimization method based on ES. Finally, Sections 5 and 6 discuss the optimization results and the characteristics of solutions minimizing the fitness function, and summarize the main conclusions.

2.

OF ARTAS SYSTEM

surveillance system developed by EUROCONTROL, relying

on the implementation of interoperable units coordinated

together. Each ARTAS unit will be in charge of processing all

surveillance data reports (i.e., primary and secondary radar

reports, ADS reports, etc.) to form a good estimate of the

current air traffic situation in its responsibility volume.

Each of the ARTAS units should fulfill a set of well-defined interoperability requirements to ensure a very high quality of the assessed air situation that will be delivered to the rest of the units. ARTAS defines, in great detail, the required performance for all components, and especially for the tracker systems which process radar data. To do this, it considers that the worst track performance is expected when a tracker receives only monoradar data, while other cases, in which extra data are fused, lead to relatively better performance. Therefore, the main emphasis is given to this monoradar case, leaving the definition of performance for other cases as a matter of specifying improvement factors. The most important aspect considered for tracker quality definition is the specification of track output quality in a set of well-defined representative input conditions. These conditions are classified with respect to radar and aircraft characteristics because of the very different behavior of any tracker for varying input conditions. Radar parameters represent the accuracy and quality of available data, while target conditions are the distance and orientation of the flight with respect to the radar, the motion state of the aircraft (uniform velocity, turning, accelerating), and the specific values of speed and acceleration.

Since it would not be possible to specify the performance for all possible input situations, which would require an enormous number of figures, an area is defined in which the performance is described by a limited number of parameters and some simple relations. Besides, since ARTAS will provide radar data processing basically for the control of civil aircraft, the specifications consider the most representative situations and the upper and lower limits of speed and acceleration in these conditions. ARTAS differentiates scenarios for two basic types of controlled areas in ATC: the terminal maneuvering area (TMA), covered by sensors with a shorter refresh period (4 seconds) and moderate range (up to 80 nautical miles, NM), and the enroute area, covered by sensors with a longer period (12 seconds) and larger coverage (up to 230 NM). We have considered the enroute area in this study since the performance figures specified for it are harder to achieve; the design process for other situations is entirely similar.


Out of all possible combinations, ARTAS has made a selection containing the most important and realistic worst cases. It comprises a number of simple input scenarios on which the nominal track quality requirements are defined. The methodology specified for this evaluation is based on Monte Carlo simulation with the input parameters (radar and trajectory parameters) particularized for each scenario. The trajectories in different scenarios vary in the following features:

(i) orientation with respect to the radar (radial or tangential starting courses, starting at a short, medium, or maximum range);

(ii) sequence of different modes of flight (uniform, turns, and longitudinal accelerations);

(iii) values of accelerations (upper and lower limits);

(iv) values of speeds (upper and lower limits).

There are eight specified simple scenarios with uniform

motion, and twelve complex scenarios including initialization with uniform motion, transition to transversal maneuver, and a second transition to come back to uniform motion.

When the target is far enough from the radar, a pure radial

approach to the radar leads to the worst case for transversal and heading errors during maneuver transitions, since azimuth error (much higher than radial error) is projected over

these components. With a similar reasoning, a pure tangential approach is the worst case for longitudinal and groundspeed errors during maneuvers. So, the scenarios basically

contain these two types of situations, varying in distance, velocities, and acceleration magnitudes. The authors have considered a couple of scenarios with longitudinal maneuvers although ARTAS does not specify performance for that type of

situations. The reason for this is that these operations appear in civil operations (especially in the TMAs) and the

filter is conceived to operate in real conditions. Otherwise,

the resulting tracking filter could be overfitted to transversal maneuvers, but developing undesirable systematic errors

with longitudinal maneuvers. The specifications for longitudinal scenarios were obtained extrapolating the ARTAS relations for the new input conditions. The resulting 22 scenarios, to be taken into account in the design of tracking filter are shown in Figure 1 (a circle represents radar position

and a square the initial position of target trajectory). Since

the specifications depend tightly on the input conditions,

there is no a priori worst case scenario whose attainment

would guarantee all cases, but all of them have to be considered simultaneously in the design process. It must be taken

into account that the design of tracker will be done considering that all requirements will be met without intermediate adaptation of the tracker parameters once the tracker

has been tuned for the typical radar characteristics and controlled volume (in this case, enroute area). The design will

provide a single set of parameters that would allow the filter to accomplish all the specifications in all the scenarios

considered.

For each of these scenarios, the performance of the tracker should approach the listed performance goal values under the defined conditions. The accuracy requirements are expressed as a function of several input parameters depending on each specific tested scenario: groundspeed, range, orientation of the trajectory with respect to the radar (radial and tangential projection of velocity heading), magnitude of the transversal acceleration, and magnitude of the groundspeed change. There are four quality parameters on which the requirements are defined: two for position (errors measured along and across the trajectory direction, respectively, longitudinal and transversal errors) and two for velocity (errors expressed in the groundspeed and heading components). All of them are expressed as root mean square errors (RMSE), estimated by means of Monte Carlo simulation. Similarly, accuracy requirements are also defined for vertical coordinates, but this work addresses only the 2D (horizontal) filtering, although similar ideas could be used for the design of a vertical tracker.
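The position figures described above can be sketched as follows: the 2D position error is projected onto the true direction of travel to obtain the longitudinal and transversal components, and the RMSE is then taken over the Monte Carlo runs. The function names and the convention that the heading is measured from the x axis are assumptions of this sketch.

```python
import math

def along_cross_errors(est_xy, true_xy, heading_rad):
    """Split a 2D position error into longitudinal and transversal components.

    heading_rad is the true direction of travel, measured from the x axis
    (a convention assumed for this sketch).
    """
    dx = est_xy[0] - true_xy[0]
    dy = est_xy[1] - true_xy[1]
    ux, uy = math.cos(heading_rad), math.sin(heading_rad)
    longitudinal = dx * ux + dy * uy      # error along the trajectory
    transversal = -dx * uy + dy * ux      # error across the trajectory
    return longitudinal, transversal

def rmse(samples):
    """Root mean square of one error component over the Monte Carlo runs."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

In a simulation, `along_cross_errors` would be evaluated at the same time instant in every run, and `rmse` applied separately to the longitudinal and transversal samples.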

There are three basic parameters characterizing the desired shape of the RMS functions: peak value (RMSpv), convergence value (RMScv), and time period of RMS convergence to a certain level close to the final convergence value (RMSpv + c RMSpv). These values are specified for different situations: initialization, transition from uniform motion to turn, and transition back from turn to uniform motion. Therefore, for each type of situation, the specifications are particularized according to the target evolution, defining a bounding mask for each magnitude and scenario. An example is shown in Figure 2, with the transversal error obtained through simulation and the ARTAS bounding mask for scenario 10. Instead of measuring performance along the whole trajectory in each scenario, only some points of interest in the aircraft trajectory are assessed to guarantee that the measured performance stays within the bounding mask: the convergence RMSE in rectilinear motion before and after the maneuver segments (CV1 and CV2), and the maximum RMSE during the maneuver (PV).
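A minimal sketch of how the three evaluation points might be read off a simulated RMSE curve is given below. The indexing convention (one RMSE sample per scan, with the maneuver occupying the scans from `maneuver_start` up to but not including `maneuver_end`) and the function names are hypothetical.

```python
def evaluation_points(rmse_curve, maneuver_start, maneuver_end):
    """Return (CV1, PV, CV2) from a list of RMSE samples, one per scan.

    CV1: converged value just before the maneuver starts.
    PV:  peak value during the maneuver.
    CV2: converged value at the end of the trajectory.
    """
    cv1 = rmse_curve[maneuver_start - 1]
    pv = max(rmse_curve[maneuver_start:maneuver_end])
    cv2 = rmse_curve[-1]
    return cv1, pv, cv2

def within_mask(points, specs):
    """True if every evaluation point is at or below its specification."""
    return all(p <= s for p, s in zip(points, specs))
```

For example, a curve that converges, peaks during the turn, and converges again yields one triple per magnitude, which is then compared with the corresponding three entries of the bounding mask.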

The design of a tracking filter aims at attaining a satisfactory trade-off among all specifications. The quality of the design will be evaluated by means of simulation over 22 test scenarios, producing several types of trade-offs to be considered. First, the different transitions between modes of flight (uniform motion and maneuvers) impose a trade-off between steady-state smoothing and peak error during maneuvers, which always leads to conflicting requirements (the higher the smoothing factor, the higher the filter error during transitions, and vice versa). This is captured by the three representative values for each scenario and magnitude: CV1, CV2, and PV. Second, each of the magnitudes evaluated (transversal, longitudinal, heading, and groundspeed RMS errors) could individually shift the design towards different solutions, so all magnitudes must be considered at the same time to arrive at a compromise. Finally, different design scenarios impose harder conditions on different magnitudes (radial trajectories for transversal and heading errors, etc.), so all scenarios should be taken into account. In Table 1, we indicate the arrangement of specifications as they will be considered in the design.

Specifications s(·) are particularized for the three evaluation points (PV, CV1, and CV2), for the four magnitudes ({longitudinal, transversal, groundspeed, heading}), and for the 22 scenarios, so the total number of specifications is 3 × 4 × 22 = 264.

[Figure 1: The 22 design scenarios, annotated with target speed (150 or 300 m/s), range (15 to 230 NM), and maneuver acceleration (a = 1.2, 2.5, or 6 m/s²).]

Table 1: Arrangement of specifications.

Scenario | PV longitudinal | CV1 longitudinal | CV2 longitudinal | ... | PV heading | CV1 heading | CV2 heading
1        | s(PV_11)        | s(CV1_11)        | s(CV2_11)        | ... | s(PV_41)   | s(CV1_41)   | s(CV2_41)
...      | ...             | ...              | ...              | ... | ...        | ...         | ...
j        | s(PV_1j)        | s(CV1_1j)        | s(CV2_1j)        | ... | s(PV_4j)   | s(CV1_4j)   | s(CV2_4j)
...      | ...             | ...              | ...              | ... | ...        | ...         | ...

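[Figure 2: Transversal error (m) versus time (s), showing the RMS specification (bounding mask) and the evaluation points CV1, PV, and CV2 (convergence and peak values); magnitude in meters.]

[Figure 3: Kalman filter block diagram. The plots z[k] and the previous estimate x[k − 1], P[k − 1] (through a unit delay z⁻¹) drive the prediction and update stages, which output x[k], P[k].]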

3.

quality of output, the tracker in the core will have to apply advanced filtering techniques (IMM filtering, joint probabilistic data association, etc.). In this section, we briefly describe the basic principles of IMM trackers, the structure proposed for this application, and the basic aspects of the design process.

3.1. General considerations

The IMM tracking methodology maintains a set of different dynamic models, each matched to a specific type of motion pattern, and represents the target trajectory as a series of states, with the sequence of transitions modelled as a Markov chain. In our case, the states considered are uniform motion, transversal maneuvers (both to the right and to the left), and longitudinal maneuvers. To estimate the target state (location, velocity, etc.), there is a bank of Kalman filters corresponding to the different motion models in the set, complemented with an estimate of the probability that the target is in each of the possible states.

So, the elementary module in the tracking structure is a Kalman filter [3], which sequentially processes the measurements z[k], combining them with predictions computed according to the target dynamic model, to update the estimate of the target state and its associated covariance matrix, x[k] and P[k], respectively (see Figure 3).
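The predict/update cycle can be sketched for the simplest case. The example below assumes a one-dimensional constant-velocity model with position-only measurements and a white-acceleration plant noise of variance `q`; these modelling choices and all names are illustrative assumptions, not the filters used in the paper.

```python
def kalman_step(x, P, z, T, q, r):
    """One predict/update cycle for a 1D constant-velocity model.

    x = [position, velocity]; P = 2x2 covariance as nested lists;
    z = measured position; T = scan period;
    q = plant (acceleration) noise variance; r = measurement noise variance.
    """
    # Prediction: x_pred = F x and P_pred = F P F^T + Q, with F = [[1, T], [0, 1]].
    xp = [x[0] + T * x[1], x[1]]
    p11 = P[0][0] + T * (P[0][1] + P[1][0]) + T * T * P[1][1] + q * T**4 / 4
    p12 = P[0][1] + T * P[1][1] + q * T**3 / 2
    p21 = P[1][0] + T * P[1][1] + q * T**3 / 2
    p22 = P[1][1] + q * T * T
    # Update with measurement matrix H = [1, 0].
    s = p11 + r                      # residual variance
    k1, k2 = p11 / s, p21 / s        # Kalman gain
    res = z - xp[0]                  # residual between plot and prediction
    x_new = [xp[0] + k1 * res, xp[1] + k2 * res]
    P_new = [[(1 - k1) * p11, (1 - k1) * p12],
             [p21 - k2 * p11, p22 - k2 * p12]]
    return x_new, P_new
```

With a noiseless measurement (`r = 0`), the updated position equals the plot and its variance drops to zero, which is a quick sanity check of the gain computation.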

The IMM maintains tracks conditioned on each jth motion state, with different Kalman filters, $x_j[k]$, $P_j[k]$, and an estimate of the probability that the target is in each of them, $\mu_j[k]$. One of the basic elements in this methodology is the interaction process, which keeps all of them engaged to the most probable one. The structure considered in this work is shown in Figure 4, with four Kalman filters corresponding to the four motion states considered. It takes as input the plots z[k] and provides the estimate of the target position and kinematic state, together with the estimated covariance matrix of errors, x[k], P[k].

The IMM algorithm performs the following four steps to process the measurements received from the available sensors and estimate the target state: intermode interaction/mixing, prediction, updating, and combination for output.

(i) The tracking cycle for each received plot z[k] starts with the interaction phase, mixing the state estimates coming from each of the four models to obtain the new inputs $x_{0j}[k]$ and $P_{0j}[k]$. So, the input to each Kalman filter is not directly its last update but a weighted combination of all modes, taking into account the mode probabilities. This step ensures that the most probable mode dominates the rest.

(ii) Then, the prediction and updating phases are performed with the Kalman filter equations, according to the target motion model contained in each mode.

(iii) The estimated mode probabilities $\mu_j[k]$ are updated, based on two types of variables: the a priori transition probabilities of the Markov chain, $p_{ij}$, and the mode likelihoods, computed from the residuals between each plot and the mode predictions.

(iv) Finally, mode probabilities are employed as weights to combine the partial tracks for the final output. Besides, each individual output and probability is stored internally to process future plots.
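Steps (i), (iii), and (iv) can be sketched with the standard IMM mixing and mode-probability formulas. The sketch below handles state vectors only (the covariance mixing and the Kalman filters of step (ii) are omitted), assumes every predicted mode probability is nonzero, and uses hypothetical function names.

```python
def mixing_weights(mu, p):
    """mu[i]: mode probabilities at k-1; p[i][j]: transition probabilities.

    Returns (c, w): c[j] is the predicted probability of mode j and
    w[i][j] is the weight of mode i in the mixed input of filter j.
    """
    n = len(mu)
    c = [sum(p[i][j] * mu[i] for i in range(n)) for j in range(n)]
    w = [[p[i][j] * mu[i] / c[j] for j in range(n)] for i in range(n)]
    return c, w

def mix_states(x, w):
    """Step (i): mixed input state for each filter j from all mode states x[i]."""
    n, dim = len(x), len(x[0])
    return [[sum(w[i][j] * x[i][d] for i in range(n)) for d in range(dim)]
            for j in range(n)]

def update_mode_probabilities(c, likelihood):
    """Step (iii): combine predicted probabilities with mode likelihoods."""
    total = sum(l * cj for l, cj in zip(likelihood, c))
    return [l * cj / total for l, cj in zip(likelihood, c)]

def combine_output(x, mu):
    """Step (iv): probability-weighted combination of the partial tracks."""
    dim = len(x[0])
    return [sum(m * xi[d] for m, xi in zip(mu, x)) for d in range(dim)]
```

When all modes have the same likelihood, the updated probabilities reduce to the Markov-chain prediction `c`, which makes the role of the transition matrix easy to see.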

3.2.

tracking system which determine its performance are the following: the number and type of models used in the set, and the transition parameters. The first aspect depends on each tracking problem, and we have selected, as seen in Section 3.1, a particular structure composed of four tracking modes reflecting the most representative situations in civil air traffic: constant velocity, turns to the right or to the left, and longitudinal accelerations. They correspond to target states 1, 2, 3, and 4 in Figure 4. All modes interact within the IMM structure to achieve the most appropriate response for each situation. Mode 1 is a simple constant-velocity model with zero plant noise variance. The modes for tracking transversal maneuvers (turns), modes 2 and 3, are filters with circular extrapolation dynamics [4, 5], one for each possible direction. They provide a highly adaptive response to transversal transitions; one of the parameters to fix in these filters is the typical acceleration of the target when performing turns. Finally, mode 4 is a linear-extrapolation motion model with a plant noise component projected along the longitudinal direction. Since the target deviations along the transversal direction are covered by the circular modes, this last model will quickly detect and adapt to variations in longitudinal velocity during accelerations and decelerations.

[Figure 4: IMM structure. The previous mode estimates x_j[k − 1], P_j[k − 1] and mode probabilities μ_j[k − 1] (j = 1, ..., 4) enter the interaction/combination block, which produces the mixed inputs x_0j[k − 1], P_0j[k − 1] for the four Kalman filters. The filter outputs x_j[k], P_j[k], together with the plots z[k], feed the mode probability computation (μ_j[k]) and the mode combination for output, x[k], P[k].]

Table 2: Parameters to adjust in the IMM design.

Parameter | Description
pUT       | Transition probability between uniform motion and transversal acceleration
pUL       | Transition probability between uniform motion and longitudinal acceleration
pTU       | Transition probability between transversal acceleration and uniform motion
pLU       | Transition probability between longitudinal acceleration and uniform motion
a_t       | Typical acceleration of the target when performing turns (modes 2, 3)
σ_t²      | Plant noise variance for parametric circular models (modes 2, 3)
σ_l²      | Plant noise variance for longitudinal models (mode 4)

Each mode in the structure has its own parameters, which must be adjusted in the design process. Besides, the transition probabilities between all possible pairs of modes, modelled as a Markov chain, are directly related to the rate of change from any mode to the rest. They have a very deep impact on the tracker behaviour during transitions, and the design must also decide the most appropriate values for these parameters. Since there are four modes, the transition probability matrix, with each term $p_{ij}$ defined as the probability of the target arriving at state j at time k, given that the state at time k − 1 was i, is

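\[
T[k] =
\begin{pmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34} \\
p_{41} & p_{42} & p_{43} & p_{44}
\end{pmatrix}
=
\begin{pmatrix}
1 - 2p_{UT} - p_{UL} & p_{UT} & p_{UT} & p_{UL} \\
p_{TU} & 1 - p_{TU} & 0 & 0 \\
p_{TU} & 0 & 1 - p_{TU} & 0 \\
p_{LU} & 0 & 0 & 1 - p_{LU}
\end{pmatrix}.
\tag{1}
\]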

The number of parameters has been reduced by considering only transitions between uniform motion and the rest of the modes. The parameters pUT and pUL are the probabilities of starting transversal and longitudinal maneuvers, given an aircraft in uniform motion, while the parameters pTU and pLU are the probabilities of transition to uniform motion, given that the aircraft is performing, respectively, transversal and longitudinal maneuvers.

It is important to notice that all parameters, those in each particular model plus the transition probabilities of the Markov chain, are completely coupled through the IMM algorithm, since the partial outputs from each mode are combined and fed back to all modes. So, there is a strongly nonlinear interaction between them, which makes the adjustment process certainly difficult. The whole set of parameters in the tracking structure is summarized in Table 2.
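As an illustration, the four transition parameters can be expanded into a row-stochastic 4 × 4 matrix. The sketch below assumes that each turn direction receives probability pUT, so that the probability of staying in uniform motion is 1 − 2·pUT − pUL; this split, like the function name, is an assumption of the example.

```python
def transition_matrix(p_ut, p_ul, p_tu, p_lu):
    """Build the Markov transition matrix from the four design parameters.

    Mode order: uniform motion, turn (one direction), turn (other direction),
    longitudinal maneuver. Raises ValueError for an invalid parameter set.
    """
    p_uu = 1.0 - 2.0 * p_ut - p_ul   # assumed: same p_ut for both turn modes
    if min(p_ut, p_ul, p_tu, p_lu) < 0 or max(p_tu, p_lu) > 1 or p_uu < 0:
        raise ValueError("parameters do not define valid probabilities")
    return [
        [p_uu, p_ut, p_ut, p_ul],
        [p_tu, 1.0 - p_tu, 0.0, 0.0],
        [p_tu, 0.0, 1.0 - p_tu, 0.0],
        [p_lu, 0.0, 0.0, 1.0 - p_lu],
    ]
```

Every row sums to one by construction, and the zeros encode the simplification that maneuvering modes can only return to uniform motion.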

4.

The design of the particular IMM tracking structure addressed in this work, stated as adjusting the seven numeric input parameters to fit the filter performance within ARTAS specifications, can be generally considered as a numerical optimization problem. We are searching for the combination of real input parameters that minimizes a real function assessing the quality of solutions as a cost, $f : V \subset \mathbb{R}^7 \to \mathbb{R}$. The final design solution $\mathbf{x}_d \in V$ should be a global minimum, that is, $f(\mathbf{x}_d) \le f(\mathbf{x})$ for any $\mathbf{x} \in V \subset \mathbb{R}^7$. The subspace V stands for the region of feasible solutions, defined as those vectors representing a valid IMM filter: parameters for probabilities must fall in the interval [0, 1] and parameters for variances must be positive. These are the only constraints to be satisfied by solutions during the search. Performance specifications are not considered as constraints here, but they are used as penalty terms in the objective cost function. The cost achieves its minimum value of zero only in the ideal case of a solution meeting all specifications; all other cases are graded with a positive global cost, which will be detailed later.

4.1. Evolution strategies

In numeric optimization problems, when f is a smooth, low-dimensional function, a number of classic optimization methods are available. The best case is that of low-dimensional analytical functions, where solutions can be determined analytically or found with simple sampling methods. If the partial derivatives of the function with respect to the input parameters are available, gradient-descent methods can be used to find the directions leading to a minimum. However, these gradient-descent methods quickly converge and stop at local minima, so additional steps must be added to find the global minimum. For instance, with a moderate number of local minima, we could run several gradient-descent solvers and keep the best solution. The problem is that the number of similar local minima increases exponentially with dimensionality, making these types of solvers infeasible. In our particular case, besides a high-dimensional input space causing multimodal dependence, we do not have an analytical function to optimize. The function is the result of a complex and exhaustive evaluation process involving the simulation and performance assessment of the tracking structure on the whole set of 22 scenarios defined. The evaluation of a single point in the input space requires several minutes of CPU time (Pentium III, 700 MHz). Besides, the evaluation of quality after all simulations is not direct: it must take into account the system performance in all scenarios and magnitudes in comparison with the whole table of specifications. As we will see later, multiple specifications (or objectives) increase the number of solutions with similar performance, and therefore the complexity of the search.

For complex domains, evolutionary algorithms have proven to be robust and efficient stochastic optimization methods, combining properties of volume-oriented and path-oriented search techniques. Evolution strategies (ES) [6] are the evolutionary algorithms specifically conceived for numerical optimization, and have been successfully applied to engineering optimization problems with real-valued vector representations [7]. They combine a search process that randomly scans the feasible region (exploration) with local optimization along certain paths (exploitation), achieving very acceptable levels of robustness and efficiency. Each solution to the problem is defined as an individual in a population, and each individual is encoded as a pair of real-valued vectors: the searched parameters and a standard deviation for each parameter, used in the search process. In this specific problem, one individual represents the set of dynamic parameters in the IMM structure, as indicated in Table 2, $(x_1, \ldots, x_7)$, and their corresponding standard deviations $(\sigma_1, \ldots, \sigma_7)$.

The optimization search basically consists in evolving a population of individuals in order to find better solutions. The computational procedure of ES can be summarized in the following steps, according to the so-called $(\mu + \lambda)$ strategy defined by Bäck and Schwefel [8], and particularized for our problem:

(1) generate an initial population of $\mu$ individuals uniformly distributed over the search space V;

(2) evaluate the objective value of each individual in the population, $f(\mathbf{x}_i)$, $i = 1, \ldots, \mu$;

(3) select the best parents in the population to generate a set of $\lambda$ new individuals by means of the genetic operators of recombination and mutation. In this case, recombination follows a canonical discrete recombination [6], and mutation is carried out as follows:

\[
\sigma_i' = \sigma_i \exp\bigl(N(0, \Delta\sigma)\bigr), \qquad
x_i' = x_i + N(0, \sigma_i'),
\tag{2}
\]

where $N(0, \sigma)$ stands for a normal distribution with zero mean and variance $\sigma^2$;

(4) calculate the objective value of the generated offspring, $f(\mathbf{x}_i')$, $i = 1, \ldots, \lambda$, and select the best $\mu$ individuals of this new set containing parents and children to form the next generation;

(5) stop if the halting criterion is satisfied. Otherwise, go to step (3).
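The five steps above can be sketched as a small (μ + λ) loop. This is an illustrative sketch, not the authors' implementation: the initial step sizes (one tenth of each parameter range), the random choice of two parents from the current population, the clipping of mutated parameters to the bounds, and the fixed number of generations as halting criterion are all assumptions made here.

```python
import math
import random

def evolve(f, bounds, mu=50, lam=30, delta_sigma=0.9, generations=100, seed=0):
    """(mu + lambda) evolution strategy with self-adaptive step sizes.

    f: cost to minimize; bounds: list of (low, high) per parameter.
    Returns (cost, x, sigma) of the best individual found.
    """
    rng = random.Random(seed)

    # Steps (1)-(2): uniform initial population, evaluated once.
    population = []
    for _ in range(mu):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        sigma = [(hi - lo) / 10.0 for lo, hi in bounds]  # assumed initial steps
        population.append((f(x), x, sigma))
    population.sort(key=lambda ind: ind[0])

    for _generation in range(generations):
        offspring = []
        for _child in range(lam):
            # Step (3): discrete recombination of two randomly drawn parents.
            pa, pb = rng.sample(population, 2)
            take_a = [rng.random() < 0.5 for _ in bounds]
            x = [a if t else b for a, b, t in zip(pa[1], pb[1], take_a)]
            sigma = [a if t else b for a, b, t in zip(pa[2], pb[2], take_a)]
            # Mutation: step sizes first, then parameters (clipped to bounds).
            sigma = [s * math.exp(rng.gauss(0.0, delta_sigma)) for s in sigma]
            x = [min(max(xi + rng.gauss(0.0, s), lo), hi)
                 for xi, s, (lo, hi) in zip(x, sigma, bounds)]
            offspring.append((f(x), x, sigma))
        # Step (4): keep the best mu among parents and children.
        population = sorted(population + offspring, key=lambda ind: ind[0])[:mu]
    return population[0]
```

Because parents compete with their children, the best cost never gets worse from one generation to the next, which is convenient when each evaluation is as expensive as a full set of simulations.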

We have implemented ES for this problem with a population size of $(\mu + \lambda) = (50 + 30)$ individuals and a mutation factor $\Delta\sigma = 0.9$. The fitness function depends directly on the differences between the RMS values of errors, evaluated through Monte Carlo simulation, and the ARTAS specifications for all scenarios and magnitudes, as will be detailed next. It is important to notice that simulations are carried out using common random numbers to evaluate all individuals in all generations, which improves the comparison of systems within the optimization loop. In other words, the noise samples used to simulate all scenarios in the RMS evaluation are the same for each individual, in order to exploit the advantages of a deterministic fitness function. Besides, the number of iterations was selected to guarantee that the confidence intervals of the estimated figures were short in relation to the estimated values.
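The idea of common random numbers can be illustrated with a toy evaluation in which the noise samples are drawn once and reused for every candidate. The fixed-gain smoother below is only a stand-in for the tracker, and the seed value and function names are arbitrary choices for this sketch.

```python
import math
import random

def noise_bank(n_runs, n_scans, sigma, seed=2003):
    """Pre-draw measurement noise once so every candidate sees the same samples."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_scans)] for _ in range(n_runs)]

def rms_error(gain, bank):
    """Toy stand-in for a tracker: fixed-gain smoother tracking a constant zero."""
    total, count = 0.0, 0
    for run in bank:
        estimate = 0.0
        for z in run:
            estimate += gain * (z - estimate)
            total += estimate * estimate
            count += 1
    return math.sqrt(total / count)
```

Since the bank is fixed, evaluating the same candidate twice returns exactly the same cost, so differences between candidates reflect their parameters rather than sampling noise.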

A basic aspect of successful optimization with any evolutionary algorithm is the control of diversity, and the appropriate level depends on the problem landscape. If a population converges to a particular point in the search space too fast in relation to the roughness of the landscape, it is very probable that it will end in a local minimum. On the contrary, too slow a convergence will require a large computational effort to find the solution. ES give the highest importance to the mutation operator, achieving the interesting property of being self-adaptive in the sizes of the steps carried out during mutation, as indicated in step (3) of the algorithm above. Before selecting an algorithm for optimization, it is interesting to consider the point of view of the no free lunch (NFL) theorem [9], which asserts that no optimization procedure is better than a random search if performance is measured by averaging over arbitrary fitness functions. The performance of ES has been widely analyzed on a set of well-known test functions [8, 10]. These are artificial analytical functions used as benchmarks to compare representative properties of optimization techniques, such as convergence velocity on unimodal landscapes, robustness under multimodality, nonlinearity, constraints, presence of flat plateaus at different heights, and so forth. However, the performance on these test functions cannot be directly extrapolated to real engineering applications. The application of ES to a new problem, such as our complex IMM design against multiple specifications, where the landscape properties are not known (it is not even known whether there is a global minimum), is a challenge open to research.

4.2. Multiobjective optimization

The selection of the proper fitness function for this application is the problem-dependent feature with the highest impact on the algorithm (higher than ES parameters such as population size or mutation factor). In fact, we should regard this design as a multiobjective optimization problem, where each individual objective is the minimization of the difference between the desired specification and the assessed performance in each specific figure of merit. When a problem involves the simultaneous optimization of multiple, usually conflicting objectives (or criteria), the goal is not as clear as in the case of single-objective optimization. The presence of different objectives generates a set of alternative solutions, defined as Pareto-optimal solutions [11]. With conflicting objectives, different solutions cannot be directly compared and ranked to determine the best one; instead, the concept of domination is used for comparisons. A solution $\mathbf{x}_1$ is dominated by a second one $\mathbf{x}_2$ if $\mathbf{x}_2$ is at least as good as $\mathbf{x}_1$ in all objectives and strictly better in at least one of the objectives considered. In any other case, they cannot be strictly compared. Taking into account this concept of domination, a Pareto-optimal set P is defined as the set of solutions such that there exists no solution in the search space dominating any member of P.
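The domination test and the resulting nondominated set can be written in a few lines for a finite list of objective vectors. The sketch assumes that all objectives are to be minimized; the function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Members of points that are not dominated by any other member."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the points (1, 4), (2, 2), (4, 1), (3, 3), and (2, 5), the front keeps the first three: (3, 3) is dominated by (2, 2), and (2, 5) by (1, 4).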

Some multiobjective optimization techniques have the double goal of guiding the search towards the global Pareto-optimal set and, at the same time, covering as many solutions as possible. There are several proposed evolutionary methods [12] that address this goal by maintaining population diversity to cover the whole Pareto front. This implies, first, the enlargement of the population size and, second, specific procedures to guide the search to the desired optimal set with a well-distributed sample of the front. Among these procedures, we can mention methods such as selection by aggregation, switching the objectives during the selection phase to decide which individuals will appear in the mating pool, and so forth. Zitzler et al. [12] analyze and compare some of the most outstanding multiobjective evolutionary algorithms over standard analytical test functions.

From the authors' point of view, the peculiarities of the problem dealt with, namely, the complexity and computational cost of the evaluation function together with the considerable number of specifications, preclude the application of techniques to derive the whole Pareto set. We have considered a weighted sum of partial goals to build a global fitness function:

\[
\underset{\mathbf{x}}{\text{Minimize}} \; \sum_{i=1}^{M} w_i f_i(\mathbf{x}),
\tag{3}
\]

where M is the number of partial objectives. Weighted sums converge to particular solutions of the Pareto front, corresponding to the tangential point in the direction defined by the vector of weights. The general idea is illustrated in Figure 5 for a simplified case with only two objective functions, f1 and f2. The shaded area is an example of a finite image set of the feasible region under the objective functions f1 and f2, with the set of nondominated solutions (Pareto front, P) represented with a bold line. No solution in the image set has simultaneously lower values of f1 and f2 than any point in P. A pair of weights defines a search direction in the space of objective functions, leading to the tangential point for each solution.

However, a large number of specifications makes the weighted sum cumbersome, and it becomes difficult for all objectives to be considered simultaneously to guide the search. In our case, the sum would have 264 components. A variation is proposed to reduce the number of objectives in the sum by exploiting knowledge about the problem. Basically, objectives with similar behavior are grouped, and a representative is selected per group: the one with the worst value, which guarantees that all objectives in the group are represented in the final function. Considering Table 1 with the whole set of specifications, we select the worst case for each column, leaving only 12 terms in the summation. It is important to notice that this maximum operation breaks the linearity of the function with respect to the objectives and makes the landscape depend on each specific input vector. A trajectory of solutions in the search process may jump between different goal functions if the scenarios with the worst case change. The justification comes from the fact that each magnitude has a dependence on the input parameters that is similar in all scenarios, so a single representative is enough to be considered in the optimization. Besides, the selection of the worst case ensures that if the method can satisfy that term, the specifications in all the scenarios will be simultaneously met.

[Figure 5: Pareto-optimal front in the (f1, f2) plane, with the minima of w1 f1 + w2 f2 for two different pairs of weights.]

[Figure 6: Evolution of the best individual: grey-level map of the excess over each specification (rows) along the generations (columns), with the fitness versus generations.]

[Figure 7: Performance values versus time, in four panels, with the ARTAS bounding masks.]

Taking this into account, the fitness function, which assesses the quality of a solution as the degree of attainment of the performance figures with respect to specifications, is presented next. The following details have also been considered.

(i) It assesses the excess over the specification for each performance figure, penalizing a solution as the error increases; once the error is below the specification, the cost is zero. This is so because there is no additional advantage if the RMSE decreases further after the required values are attained. This is implemented for each magnitude by means of the expression $R(p_i - s(p_i))$, where $p_i$ is the ith performance figure (RMSE), $s(p_i)$ the specification, and $R(\cdot)$ the ramp function:

\[
R(x) =
\begin{cases}
x, & x > 0, \\
0, & x \le 0.
\end{cases}
\tag{4}
\]

(ii) So that all magnitudes (longitudinal, transversal, heading, and groundspeed) have the same importance, the excess is normalized, defining a partial cost for the ith figure,

\[
c_i = R\!\left( \frac{p_i - s(p_i)}{s(p_i)} \right).
\tag{5}
\]

(iii) In order to add some flexibility in the trade-off between maneuver and uniform motion performances, weighting factors $\alpha_t$ are included. They allow us to vary the priority of these performance figures in the case where all of them cannot be attained at the same time, defining therefore a cost per jth scenario,

\[
c^{s}_{j} = \sum_{i=1}^{4} \left[
\alpha_{PV}\, R\!\left( \frac{PV_{ij} - s(PV_{ij})}{s(PV_{ij})} \right)
+ \alpha_{CV1}\, R\!\left( \frac{CV1_{ij} - s(CV1_{ij})}{s(CV1_{ij})} \right)
+ \alpha_{CV2}\, R\!\left( \frac{CV2_{ij} - s(CV2_{ij})}{s(CV2_{ij})} \right)
\right],
\tag{6}
\]

where the subindex i represents each magnitude of interest (longitudinal, transversal, groundspeed, and heading) and j the scenario index.

(iv) Finally, considering the set E of all the scenarios where the performance figures are evaluated (in our example, the 22 scenarios indicated in Figure 1), the worst-case scenario j is selected for each figure of merit and selected time, and the fitness function to be minimized is as follows:

\[
f = \sum_{i=1}^{4} \left[
\alpha_{PV} \max_{j \in E} R\!\left( \frac{PV_{ij} - s(PV_{ij})}{s(PV_{ij})} \right)
+ \alpha_{CV1} \max_{j \in E} R\!\left( \frac{CV1_{ij} - s(CV1_{ij})}{s(CV1_{ij})} \right)
+ \alpha_{CV2} \max_{j \in E} R\!\left( \frac{CV2_{ij} - s(CV2_{ij})}{s(CV2_{ij})} \right)
\right].
\tag{7}
\]

This function is the weighted sum of the relative excesses over specifications for all performance figures, each one assessed in the worst-case scenario.
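The worst-case fitness can be sketched as follows. The data layout (`perf[j][i][t]` for scenario j, magnitude i, and evaluation point t), the normalization of each excess by its specification, and the names used are assumptions of this example.

```python
POINTS = ("PV", "CV1", "CV2")

def ramp(x):
    """Ramp function: x for x > 0, otherwise 0."""
    return x if x > 0 else 0.0

def relative_excess(p, s):
    """Relative excess of performance figure p over specification s."""
    return ramp((p - s) / s)

def scenario_cost(perf_j, spec_j, weights):
    """Cost of one scenario: weighted excesses summed over magnitudes and points."""
    return sum(weights[t] * relative_excess(perf_j[i][t], spec_j[i][t])
               for i in range(len(perf_j)) for t in POINTS)

def fitness(perf, spec, weights):
    """Global cost: for each magnitude and point, the worst case over scenarios."""
    n_scen, n_mag = len(perf), len(perf[0])
    return sum(
        weights[t] * max(relative_excess(perf[j][i][t], spec[j][i][t])
                         for j in range(n_scen))
        for i in range(n_mag) for t in POINTS)
```

With four magnitudes and three evaluation points, `fitness` has 12 terms regardless of the number of scenarios, and it is zero exactly when every figure in every scenario meets its specification.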

5. RESULTS

process to adjust the filter parameters according to ARTAS

obtained particularizing expression (6) to the case of a weight of 1 for all magnitudes, that is, with the PV, CV1, and CV2 weights all equal to 1.

First, Figure 6 summarizes the evolution of the best individual in the population (the one with the lowest fitness value), indicating graphically the fulfilment of specifications along the generations. Each design objective is represented by a row in the diagram, while the best individual of each generation appears in each column. The grey level at position (i, j) in the image indicates the quality of the fit to the ith specification of the best individual of the jth generation. The grey level represents linearly the relative excess over the restriction (no excess is shown as white, 100% or higher excess as black), which is the partial cost function related to this constraint. Therefore, a completely white column means that the optimization process has found a set of parameters able to fulfil all design restrictions, while a completely white row means that all best individuals in this optimization exercise are able to fulfil the corresponding specification. The global fitness, computed from the whole set of partial costs as indicated in Section 4, is also plotted. This kind of figure serves not only to see the convergence of the optimization process graphically, but also to identify the most demanding performance criteria and to assess the suitability of a predefined tracking scheme (with some free design parameters) for a certain tracking problem. Applying exactly the same methodology, we could have performed the optimization exercise with an alternative IMM structure, or even with a different tracking technique with open design parameters, and compared its capabilities against specifications after the design process.

[Figure 8: Performance values versus time, in four panels (including the longitudinal error in meters), with the ARTAS bounding masks.]

As can be seen, the optimization process makes the overall figure lighter from the initial generations (left) to the end of the optimization (right), achieving a trade-off point that meets as many specifications as possible. The highest improvement is achieved in the first 80 generations, with very slight modifications from that point until the

to attain that specification together with the rest. So, scenarios 12 and 13, corresponding to specifications 133-156, present the worst performance after the optimization. The specific performance values and ARTAS bounding masks for these scenarios, corresponding to transversal maneuvers at 215 NM, v = 300 m/s, a = 2.5 m/s² (scenario 12) and at 65 NM, v = 150 m/s, a = 6.0 m/s² (scenario 13), are shown in Figures 7 and 8. The magnitudes with the worst performance are the transversal and heading errors (peak values) during transversal maneuvers. The peak value of the heading error is the globally worst figure in the set, more than 100% over specification. Besides, as can be seen, the convergence error values for some of the magnitudes in these scenarios are practically tangent to the specifications, indicating that the optimization process has effectively considered all of them to arrive at the final trade-off solution. So, this method selects the parameters that adapt the system behavior to the bounding mask. This is apparent not only for the presented worst-case scenarios but for all design scenarios as well.

Different runs of the global optimization process (using different random seeds to generate the individuals of the initial population) were carried out to analyze the consistency of the solutions obtained. The results of ten independent runs are shown in Figure 9, presenting only the best individual in the population after optimization (instead of the whole evolution process) and the final fitness values achieved.

[Figure 9: Best individual after optimization for each of the ten runs, with the final fitness value per run.]

[Figure 10: Fitness landscape over the grid of linear combinations of solutions 1, 2, and 5 (sol 1, sol 2, and sol 5 marked).]

As can be seen, different runs led to solutions that are quite consistent in terms of overall fitness and of which specifications present problems for the filter (always those in scenarios 12 and 13). However, the specific solution vectors found in each run had significant differences, indicating that the fitness function probably has a multimodal landscape, even after having selected a particular set of weighting factors among specifications (all equal to 1).

Since it is not possible to represent the fitness landscape in seven dimensions, the following analysis was carried out. The three solutions with the closest fitness values, resulting from runs 1, 2, and 5, were selected and combined to generate a grid of linear combinations (convex hull) as follows:

\[
\mathbf{x} = \mathbf{x}_1 + \alpha\,(\mathbf{x}_2 - \mathbf{x}_1) + \beta\,(\mathbf{x}_5 - \mathbf{x}_1).
\tag{8}
\]
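The grid of linear combinations of three solutions can be generated as in the sketch below. The coefficient names `alpha` and `beta`, the default interval from −0.5 to 1.5, and the step of 0.1 are assumptions made for this example; the three input vectors stand for the solutions of runs 1, 2, and 5.

```python
def combine(x1, x2, x5, alpha, beta):
    """Point x1 + alpha * (x2 - x1) + beta * (x5 - x1), component by component."""
    return [a + alpha * (b - a) + beta * (c - a) for a, b, c in zip(x1, x2, x5)]

def grid(x1, x2, x5, lo=-0.5, hi=1.5, step=0.1):
    """List of (alpha, beta, x) over a regular grid of coefficients."""
    n = round((hi - lo) / step) + 1
    coords = [round(lo + k * step, 10) for k in range(n)]
    return [(al, be, combine(x1, x2, x5, al, be)) for al in coords for be in coords]
```

The coefficient pairs (0, 0), (1, 0), and (0, 1) reproduce the three original solutions, and evaluating the fitness at every grid point gives a two-dimensional slice of the seven-dimensional landscape.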

The fitness landscape for a grid with , varying in the interval [1.5, 0.5], in steps 0.1 units, is indicated in Figure 10. It

can be seen that the fitness is practically flat over the particular region of search space represented by linear combinations

778

=0

=1

2.5

2.4

2.8

2.3

2.2

2.4

Fitness

Fitness

2.6

2.2

2.1

2

1.9

1.8

1.8

1.6

0.4 0.2

1.7

sol 1

sol 5

sol 2

sol 5

0

0.2

0.4

0.6

0.8

1.2

1.4

1.6

0.5

0.5

1.5

values that are presented in Figure 10; they are only particular cases of linear combinations). The particular solutions

1, 2, and 5 correspond to the points (0, 0), (1, 0), and (0, 1) in

plane. In Figure 11, the projection on plane is presented

with grey levels, where the feasible region in plane can

be clearly separated. All solutions within the convex combinations of solutions , in [0, 1] are feasible. Besides, the

2D graphs corresponding to the paths connecting all pairs

of solutions are presented in Figures 11 and 12. As can be seen, the solutions found by the algorithm are effectively local minima of the fitness function, in spite of the fact that the function is almost flat in this region of convex linear combinations. This shows that the algorithm is capable of finding appropriate solutions, and confirms that we have a multimodal function even after having combined the multiple restrictions in a scalar function. Different runs arrived at different local minima in a region where the relative difference between minima can be practically neglected, so all solutions can be taken as good design points for the adopted criterion. The algorithm was carried out with different criteria (for instance, the penalty of RMSE peak values being ten times higher than convergence RMSE), achieving results consistent with the preferences: all specifications with the highest priority were first accomplished, leading to higher errors in the other specifications.
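The grid analysis described above can be sketched as follows. This is only an illustration under stated assumptions: the parametrization (points (0, 0), (1, 0), and (0, 1) mapping to the three selected solutions) follows the text, while the function name `combination_grid`, the default coefficient interval, and the toy fitness used in the usage note are hypothetical and not taken from the paper.

```python
import numpy as np

def combination_grid(sol_a, sol_b, sol_c, fitness, lo=0.0, hi=1.0, step=0.1):
    """Evaluate a fitness function on the plane spanned by three solutions.

    The point with coefficients (a, b) is
        sol_a + a * (sol_b - sol_a) + b * (sol_c - sol_a),
    so (0, 0), (1, 0), and (0, 1) reproduce sol_a, sol_b, and sol_c.
    The interval [lo, hi] and the step are assumptions of this sketch.
    """
    sol_a, sol_b, sol_c = (np.asarray(v, dtype=float) for v in (sol_a, sol_b, sol_c))
    n = int(round((hi - lo) / step)) + 1
    coeffs = np.linspace(lo, hi, n)
    grid = np.empty((n, n))
    for r, a in enumerate(coeffs):
        for c, b in enumerate(coeffs):
            grid[r, c] = fitness(sol_a + a * (sol_b - sol_a) + b * (sol_c - sol_a))
    return coeffs, grid
```

With a toy quadratic fitness, `combination_grid([0, 0], [2, 0], [0, 2], lambda x: float(np.sum(x ** 2)))` returns an 11 by 11 grid whose corner entries are the fitness values of the three input solutions; a nearly constant grid would correspond to the flat region reported above.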

6. CONCLUSION

for the design of IMM-tracker techniques to accomplish a

considerably large set of predefined specifications.

An exhaustive set of test scenarios with performance

specifications for each and a specific IMM structure with

open parameters are the input to solver. The procedure may

be summarized as performing an optimization over the pa-

combination of partial excesses over specifications that takes

into account some knowledge about the problem in the form

described in Section 4. This fitness function summarizes the attainment of all accuracy statistics of interest at the different times of interest (steady state, start and end of maneuvers, etc.)

in all design scenarios. The evaluation involved the costly

Monte Carlo simulation, as specified by ARTAS, to calculate accuracy statistics, although the methodology is open for

the inclusion of other possible evaluation methods for IMM

tracking filters, such as the one described in [9].

This method has been successfully used in a monoradar application, leading to a significant improvement over previous nonsystematic approaches for the same problem. Moreover, the form of fitness function described serves as a method for relaxing constraints: those more important for us are given a higher weight in (6), and those not so important a lower weight.

REFERENCES

[1] H. A. Blom and Y. Bar-Shalom, "The interacting multiple model algorithm for systems with Markovian switching coefficients," IEEE Trans. Automatic Control, vol. 33, no. 8, pp. 780–783, 1988.

[2] EUROCONTROL, "Functional and performance specification of ARTAS. Version 2.6," http://www.eurocontrol.int/artas/public system support/online doc request/online doc request summary.htm.

[3] Y. Bar-Shalom and X. R. Li, Multitarget-Multisensor Tracking: Principles and Techniques edited by Y. Bar-Shalom, YBS Publishing, Danvers, Mass, USA, 1995.

[4] N. Nabaa and R. H. Bishop, "Validation and comparison of coordinated turn aircraft maneuver models," IEEE Trans. on Aerospace and Electronics Systems, vol. 36, no. 1, pp. 250–259, 2000.

[5] K. Kastella and M. Biscuso, "Tracking algorithms for air traffic control applications," Air Traffic Control Quarterly, vol. 3, no. 1, pp. 19–43, 1995.

[6] H. P. Schwefel, Numerical Optimisation of Computer Models, John Wiley & Sons, New York, NY, USA, 1981.

[7] I. Rechenberg, "Evolution strategy: Nature's way of optimization," in Optimization: Methods and Applications, Possibilities and Limitations, H. W. Bergmann, Ed., Lecture Notes in Engineering, pp. 106–126, Springer, Berlin, Germany, 1989.

[8] T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York, NY, USA, 1996.

[9] D. H. Wolpert and W. G. Macready, "No-free-lunch theorems for optimization," IEEE Trans. on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[10] K. Ohkura, Y. Matsumura, and K. Ueda, "Robust evolution strategies," Applied Intelligence, vol. 15, no. 3, pp. 153–169, 2001.

[11] K. Deb, "Evolutionary algorithms for multi-criterion optimization in engineering design," in Evolutionary Algorithms in Engineering and Computer Science, John Wiley & Sons, Chichester, UK, 1999, Chapter 8.

[12] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.

Jesús García Herrero received his Master degree in telecommunication engineering from Universidad Politecnica de Madrid (UPM) in 1996 and his Ph.D. degree from the same university in 2001. He has been working as a Lecturer at the Department of Computer Science, Universidad Carlos III de Madrid, since 2000. There, he is also integrated in the Systems, Complex and Adaptive Laboratory, involved in artificial intelligence applications. His main interests are radar data processing, navigation, and air traffic management, with special stress on data fusion for airport environments. He has also worked in the Signal Processing and Simulation Group of UPM since 1995, participating in several national and European research projects related to air traffic control.

Juan A. Besada Portas received his Master degree in telecommunication engineering from Universidad Politecnica de Madrid (UPM) in 1996 and his Ph.D. degree from the same university in 2001. He has worked in the Signal Processing and Simulation Group of the same university since 1995, participating in several national and European projects related to air traffic control. He is currently an Associate Professor at Universidad Politecnica de Madrid (UPM). His main interests are air traffic control, navigation, and data fusion.

Antonio Berlanga de Jesús received his B.S. degree in physics from Universidad Autonoma, Madrid, Spain in 1995, and his Ph.D. degree in computer engineering from Universidad Carlos III de Madrid in 2000. Since 2002, he has been there as an Assistant Professor of automata theory and programming language translation. His main research topics are evolutionary computation applications and network optimization using soft computing.


received his Master degree in telecommunication engineering from Universidad Politecnica de Madrid (UPM) in 1993 and his Ph.D. degree from the same university in 1997. He is an Associate Professor at Universidad Carlos III de Madrid. His current research focuses on the application of soft computing techniques (NN, evolutionary computation, fuzzy logic and multiagent systems) to radar data processing, navigation, and air traffic management. He joined the Computer Science Department of Universidad Carlos III de Madrid in 1993, being enrolled in the Systems, Complex, and Adaptive Laboratory. He has also worked in the Signal Processing and Simulation Group of UPM since 1992, participating in several national and European projects related to air traffic control. He is the author of up to 10 journal papers and 70 conference papers.

Gonzalo de Miguel Vela received his telecommunication engineering degree in 1989 and his Ph.D. degree in 1994 from Universidad Politecnica de Madrid. He is currently a Professor in the Department of Signals, Systems, and Radiocommunications of the same university and is a member of the Data Processing and Simulation Research Group at the Telecommunication School. His fields of interest and activity are radar signal processing and data processing for air traffic control applications.

Jose R. Casar Corredera received his graduate degree in telecommunications engineering in 1981 and his Ph.D. degree in 1983

from the Universidad Politecnica de Madrid

(UPM). He is a Full Professor in the Department of Signals, Systems, and Radiocommunications of UPM. At the present time,

he is Adjunct to the Rector for Strategic Programs and Head of the Signal and Data Processing Group at the same university. His research interests include radar technologies, signal and data processing, multisensory fusion, and image analysis both for civil and defence applications. During 1993, he was Vice Dean for Studies and

Research at the Telecommunications Engineering School of UPM.

During 1995, he was Deputy Vice President for Research at UPM

and from 1996 to February 2000 Vice President for Research at

UPM.

© 2003 Hindawi Publishing Corporation

Algorithm

Gianluca Pignalberi

Dipartimento di Informatica, Università di Roma La Sapienza, Via Salaria, 113 00198 Roma, Italy

Email: pignalbe@dsi.uniroma1.it

Rita Cucchiara

Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Via Vignolese, 905 41100 Modena, Italy

Email: rita.cucchiara@unimo.it

Luigi Cinque

Dipartimento di Informatica, Università di Roma La Sapienza, Via Salaria, 113 00198 Roma, Italy

Email: cinque@dsi.uniroma1.it

Stefano Levialdi

Dipartimento di Informatica, Università di Roma La Sapienza, Via Salaria, 113 00198 Roma, Italy

Email: levialdi@dsi.uniroma1.it

Received 1 July 2002 and in revised form 19 November 2002

Several range image segmentation algorithms have been proposed, each one to be tuned by a number of parameters in order to provide accurate results on a given class of images. Segmentation parameters are generally affected by the type of surfaces (e.g., planar versus curved) and the nature of the acquisition system (e.g., laser range finders or structured light scanners). It is impossible to answer the question, which is the best set of parameters given a range image within a class and a range segmentation algorithm? Systems proposing such a parameter optimization are often based either on careful selection or on solution space-partitioning methods. Their main drawback is that they have to limit their search to a subset of the solution space to provide an answer in acceptable time. In order to provide a different automated method to search a larger solution space, and possibly to answer more effectively the above question, we propose a tuning system based on genetic algorithms. A complete set of tests was performed over a range of different images and with different segmentation algorithms. Our system provided a particularly high degree of effectiveness in terms of segmentation quality and search time.

Keywords and phrases: range images, segmentation, genetic algorithms.

1. INTRODUCTION

Image segmentation problems can be approached with several solution methods. The range image segmentation subfield has been addressed in different ways. But, since an algorithm should work correctly for a large number of images in a class, such a program is normally characterized by a high number of tuning parameters in order to obtain a correct, or at least satisfactory, segmentation.

Usually the correct set of parameters is given by the developers of the segmentation algorithm, and it is expected to give satisfactory segmentations for the images in the class used to tune the parameters. But it is possible that, when the class of input images changes, the results are not satisfactory.

To avoid exhaustive test tuning, an expert system to tune parameters should be proposed. In this way, it should be pos-

work correctly with a chosen class of images.

Several expert systems have been proposed by other teams. One example is [1], which performs the tuning of a color image segmentation algorithm by a genetic algorithm (GA). The same technique can be applied to range segmentation algorithms. Up till now, only techniques that partition the parameter space and work by successive approximation have been used (such as in [2, 3, 4, 5]). Such techniques obtain results similar to those provided by the tuning of the algorithm teams.

In this paper, we propose a tuning system based on GAs. To prove the validity of this method, we will show results obtained using well-tuned segmentation algorithms of range images (in particular the ones proposed at the University of Bern and University of South Florida). Genetic solutions are evaluated according to a fitness function that accounts for different types of errors such as under/oversegmentation or miss-segmentation.

The paper is organized as follows. In Section 2, we summarize the related works. In Section 3, we describe in detail

our approach. In Section 4, we show the experimental results,

while in Section 5, we present our conclusions.

2. RELATED WORKS

Range images are colored according to the distance from the

sensor that scans the image. In fact, each pixel in a range image indicates the value of the distance from the sensor to the

foreground object point. Image segmentation is the refinement of an image into patches corresponding to the represented regions. So the range image segmentation algorithm

aims at partitioning and labeling range images into surface

patches that correspond to surfaces of 3D objects.

Surface segmentation is still a challenging problem. Currently, many different approaches have been proposed. The known algorithms devoted to range segmentation may be subdivided into at least three broad categories [6]:

(1) those based on a region-growing strategy,

(2) those based on a clustering method,

(3) those based on edge detection and completion followed by surface filling.

Many algorithms addressing range segmentation have been proposed. In [6], there is a complete analysis of four segmentation algorithms, from the University of South Florida (USF), the University of Bern (UB), the Washington State University (WSU), and the University of Edinburgh (UE). The authors show that a careful parameter tuning has to be performed according to the chosen segmentation algorithm and image set. Such algorithms are based on the above methods, and show different performances and results in terms of segmentation quality and segmentation time.

Jiang and Bunke [7] describe an evolution of the segmentation algorithm built at the University of Bern, and in [5] the same segmentation algorithm is used for other tests. Recently, a different segmentation algorithm was presented, based on the scan-line grouping technique [8], but using a region-growing strategy and showing good segmentation results and a quasi-real-time computation capability. Zhang et al. [9] presented two edge-based algorithms for segmenting noisy range images. With these algorithms, the authors investigated the use of intensity edge maps (IEMs) in noisy range image segmentation, and compared the results against the corresponding ones obtained without using IEMs. Such algorithms use watershed and scan-line grouping techniques. Chang and Park [10] proposed a segmentation of range images based on the fusion of range and intensity images, in which the estimation of parameters for surface patch representation is performed by a least-trimmed squares (LTS) method. Baccar et al. [11] describe a method to extract, via classification, edges from noisy range images. Several algorithms (particularly color segmentation algorithms) are described or summarized in [12].

Parameter tuning is still a main task, and a possible solution is proposed. A different method to tune the set of parameters is given by Min et al. in [2, 3, 4]. The main drawback seems to be that only a limited subset of the complete solution space is explored, which exposes the method to the possibility of missing the global optimum or a good enough local optimum. But such a method is fast and efficient enough to represent a fine-tuning step: given a set of rough local suboptima, the algorithm proposed in [2] could quickly explore a limited space around these suboptima to reach, if they exist, local optima.

In [6], for the first time, an objective performance comparison of range segmentation algorithms has been proposed. Further results on such comparison have been proposed in [3, 4, 13, 14]. Another comparison has been presented in [15], where another range segmentation algorithm

is proposed. This is based on a robust clustering method

(used also for other tasks). But the need for tuning algorithm

parameters is still present.

2.2.

to image segmentation

GA is a well-known and widespread technique for exploring a solution space in parallel by encoding the concept of evolution in the algorithmic search: from a population of individuals representing possible problem solutions, evolution is carried out by means of selection and reproduction of new solutions. Basic principles of GAs are now well known. Standard references are the books of Goldberg [16] and Michalewicz [17]; a survey is presented in [18], while a detailed explanation of a basic GA for solving NP-hard optimization problems, presented by Bhanu et al., can be found in [1].
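The selection and reproduction cycle summarized above can be sketched as a minimal generational GA over real-valued parameter vectors. This is a generic textbook-style illustration, not the algorithm of [1] nor the one used in this paper; the function name `genetic_search`, the tournament selection, the gene-wise inheritance, and the Gaussian mutation are all assumptions chosen for brevity.

```python
import random

def genetic_search(cost, bounds, pop_size=20, generations=40, p_mut=0.2, seed=0):
    """Minimize cost(x) over vectors x whose genes lie within bounds."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(population, key=cost)

    def tournament(pop):
        # Binary tournament: the cheaper of two random individuals is selected.
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Each gene is inherited from one of the two parents at random.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < p_mut:
                    # Gaussian mutation, clipped to the allowed range.
                    mutated = child[k] + rng.gauss(0.0, 0.1 * (hi - lo))
                    child[k] = min(hi, max(lo, mutated))
            children.append(child)
        population = children
        best = min(population + [best], key=cost)  # keep the best seen so far
    return best
```

Because the best individual seen so far is always retained, the returned cost can only improve as the number of generations grows for a fixed seed.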

Many GA-driven segmentation algorithms have been proposed in the literature; in particular, an interesting solution was presented by Yu et al. [19], an algorithm that can segment and reconstruct range images via a method called RESC (RESidual Consensus). Chun and Yang [20] presented an intensity image segmentation by a GA exploiting split-and-merge strategies; and Andrey and Tarroux [21] proposed an algorithm which can segment intensity images by including production rules in the chromosome, that is, a data string representing all the possible features present in a population member. Methods for segmenting textured images are described by Yoshimura and Oe [22] and Tseng and Lai [23]. The first one adopts a small region-representing chromosome, while the second one uses GAs to improve the iterated conditional modes (ICM) algorithm [24]. Cagnoni et al. [25] presented a GA based on a small set of manually traced contours of the structure of interest (anatomical structures in three-dimensional medical images). The method combines the good trade-off between simplicity and versatility offered by polynomial filters with the regularization properties that characterize elastic-contour models. Andrey [26]

proposed another interesting work, in which the image to be

regions with different characteristics are presented as a set of ecological niches. A GA is then used to evolve a population distributed all over this environment. The GA-driven evolution leads distinct species to spread over different niches. Consequently, the distribution of the various species at the end of the run unravels the location of the homogeneous regions on the original image. The method has been called selectionist relaxation because the segmentation emerges as a by-product of a relaxation process [27] mainly driven by selection.

As previously stated, the algorithm presented in [1] tunes a color-image segmentation algorithm, namely, phoenix [28], by a chromosome formed by the program parameters, and not formed by image characteristics as in [19, 20, 21]. A complete survey on GAs used in image processing is the one compiled by Alander [29].

3.

ENVIRONMENT

Using the same rationale as in [1], we adopted a GA for tuning the set of parameters of a range segmentation algorithm.

Different approaches to the tuning of parameters could be represented by evolutionary programming (EP) and evolution strategy (ES).

The first one places emphasis on the behavioral linkage between parents and their offspring (the solutions). Each solution is replicated into a new population and is mutated according to a distribution of mutation types. Each offspring solution is assessed by computing its fitness. Similarly, the second one tries random changes in the parameters defining the solution, following the example of natural mutations.

Like both ES and EP, GA is a useful method of optimization when other techniques, such as gradient descent or direct analytical discovery, are not possible. Combinatoric and

real-valued function optimization in which the optimization

surface or fitness landscape is rugged, possessing many locally

optimal solutions, are well suited for GA.

We chose GA because it is a well-tested method in image

segmentation and a good starting point to explore the evolutionary framework.

Because of the universal model, we have the possibility of changing the segmentation algorithm with few consequent changes in the GA code. These changes mainly involve

the chromosome composition and the generation definition.

The fitness evaluation has been modeled for the problem of

range segmentation and can be kept constant as the reproduction model. This is one of the features of our proposal

that we called GASE or genetic algorithm segmentation environment (introduced as GASP in [30]).

The main goal of GASE is to suggest a signature for a class

of images, that is, the best fitted set of parameters performing

the optimal segmentation. In this way, when our system finds

a good segmentation for an image or for a particular surface,

we can say that the same parameters will work correctly for

the same class of images or for the same class of surfaces (i.e.,

all the surfaces presenting a big curvature radius).

3.1.

In Figure 1, we show the architecture of our system. Following the block diagram, we see that an input image $I_i$ is first segmented by a program $s$ (range segmentation algorithm) with a parameter set $s_j$, producing a new image $M_{isj}$ having labeled surface patches. All such segmented images are stored in a database that we call phenotype repository. Briefly, we may write

$M_{isj} = \mathrm{segmentation}(s, s_j, I_i)$. (1)

means of the so-called fitness evaluation (in block genetic-based learning) computing a score $F_{isj}$ by comparing the segmented image $M_{isj}$ with the ground truth segmented image $G_i$. We assume that our fitness function evaluates a cost, therefore positively valued (or zero valued if the segmented image coincides exactly with the ground truth one). Thus

$F_{isj} \ge 0$. (2)

(2)

This process is carried out for all available images with different parameter sets. The sets that produce the best results (called w ) are stored in the so-called final genotype repository (if the fitness function is under a given threshold). Once the score is assigned, a tuple $P_{ij}$ containing the genotype, the score value, the phenotype identifier, and the generation, $(s_j, F_{isj}, ij, k)$, is written in a database called evaluation repository. The genetic computation selects two individuals to be coupled among the living ones (mating individuals selection); these genotypes are processed by the crossover block that outputs one or more offspring that could be mutated. The generated individuals will be the new genotypes $s_j$ in the next generation step.

At the end of a generation, a to-be-deleted individuals

selection is performed. The decision on which individuals

are to be erased from the evaluation repository is made by

fixing a killing probability pk depending on the fitness and

the age of the individuals (their k value). If an individual has

a score greater than pk , the solution it represents will be no

longer considered. In this way, we have a limited number of

evaluated points in the solution space.

3.2. GASE features

designed. Among others, we mention the fitness function,

the chromosome, described in Sections 3.3 and 3.4, and the

crossover.

The fitness function is a heuristic function that indicates to the GA whether or not an individual fits the environment. The chromosome is the data structure that contains the characters of the individuals. The crossover is the method that indicates how the parents' characteristics are inherited by children. For this work, we used modified versions of multiple point crossover [31] and uniform crossover [32], as described in [30].
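As an illustration of the two crossover families named above, the following sketch shows the standard textbook forms of multiple point and uniform crossover on list-valued chromosomes. It does not reproduce the modified versions described in [30], whose details are not given here; the function names and the `p_swap` parameter are assumptions of this sketch.

```python
import random

def multipoint_crossover(parent_a, parent_b, points):
    """Exchange every second segment delimited by the given cut points."""
    child_a, child_b = list(parent_a), list(parent_b)
    swap = False
    prev = 0
    for cut in sorted(points) + [len(parent_a)]:
        if swap:
            child_a[prev:cut] = parent_b[prev:cut]
            child_b[prev:cut] = parent_a[prev:cut]
        swap = not swap
        prev = cut
    return child_a, child_b

def uniform_crossover(parent_a, parent_b, rng, p_swap=0.5):
    """Exchange each gene independently with probability p_swap."""
    child_a, child_b = list(parent_a), list(parent_b)
    for k in range(len(child_a)):
        if rng.random() < p_swap:
            child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b
```

For two parents of six genes and cut points 2 and 4, `multipoint_crossover` exchanges only the middle segment, while `uniform_crossover` decides the exchange gene by gene using the supplied `random.Random` instance.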

[Figure 1: Architecture of GASE. Blocks shown: segmentation algorithm s, phenotype repository, training set and prototype repository, fitness evaluation, evaluation repository, genetic evolution (mating individuals selection, crossover, mutation, reproduction, to-be-deleted individuals selection, age counter k), and genetic-based learning (best parameters).]

The most critical step in the genetic evolution process is the definition of a reliable fitness function which ensures monotonicity with respect to the improvement provided by changing the segmentation parameters. The fitness function could be used for comparing both different algorithms and different parameter sets within the same algorithm. In [6] the problem of comparing range segmentation algorithms has been thoroughly analyzed; nevertheless, the authors' evaluations take into account a number of separate performance figures and no global merit value is provided. More precisely, the authors consider five figures that are functions of a precision percentage:

(1) correct segmentation,

(2) oversegmentation,

(3) undersegmentation,

(4) miss-segmentation,

(5) noise segmentation.

then guide our feedback loop within the optimization process, and therefore, we define a unique performance value

specifically accounting for all points. In [33] and in [34] a

function assigning a scalar to a segmentation is used. Particularly in [34], that function is the probability error between the ground truth and the machine-segmented image.

But such a way of assessing fitness is judged not suitable [6].

This means that a more robust way to have a scalar could

be to order a vector of properties. Of course the ordering of

vectors is not straightforward without using particular techniques; one of them could be to adopt a weighted sum of the

components.

We define the fitness function as a weighted sum of a

number of components:

$F = w_1 C + w_2 H_u + w_3 H_o + w_4 U, \qquad \sum_{i=1}^{4} w_i = 1$. (3)

single components.
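The weighted sum in (3) can be written down directly. The sketch below assumes equal default weights, which are purely illustrative since the weights actually used are not stated here; the name `weighted_fitness` is likewise hypothetical.

```python
def weighted_fitness(c, h_u, h_o, u, weights=(0.25, 0.25, 0.25, 0.25)):
    """Return F = w1*C + w2*Hu + w3*Ho + w4*U, with the weights summing to 1."""
    if len(weights) != 4 or min(weights) < 0:
        raise ValueError("four non-negative weights are required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w1, w2, w3, w4 = weights
    # The result is a cost: lower values indicate a better segmentation.
    return w1 * c + w2 * h_u + w3 * h_o + w4 * u
```

Since every component is a non-negative cost, a perfect segmentation gives `weighted_fitness(0, 0, 0, 0) == 0`, and changing the weights shifts the emphasis between pixel-level and surface-level errors.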

The fitness takes into account two levels of errors (and therefore is a cost to be minimized); the former is a measure at pixel level computed with a pixel-by-pixel comparison, the latter is a measure at surface level considering the number of computed surfaces. At the pixel level, $C$ is the cost associated with erroneously segmented pixels and $U$ accounts for unsegmented pixels. At the surface level, we add two factors (handicaps), one due to undersegmentation ($H_u$) and one due to oversegmentation ($H_o$).

Let $G$ be the ground truth image, having $N_G$ regions called $R_{G_i}$ composed by $P_{G_i}$ pixels, $i = 1, \dots, N_G$, and let $MS$ be the machine-segmented image, having $N_M$ regions called $R_{M_j}$ composed by $P_{M_j}$ pixels, $j = 1, \dots, N_M$. We define the overlap map $O$ so that

(4)

with the same coordinates in the two regions is the value $O_{ij}$.

It is straightforward that if there is no overlap between the two regions, $O_{ij} = 0$; while in case of complete overlap, $O_{ij} = P_{G_i} = P_{M_j}$.
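A possible way to compute the overlap map from two label images is sketched below. It assumes, as a convention of this sketch only, that regions are labeled with the integers 1 to N and that 0 marks unlabeled pixels; the function name `overlap_map` is hypothetical.

```python
import numpy as np

def overlap_map(ground_truth, machine, n_g, n_m):
    """Count, for each pair (i, j), the pixels shared by ground truth region
    i + 1 and machine region j + 1 (label 0 is treated as unlabeled)."""
    gt = np.asarray(ground_truth)
    ms = np.asarray(machine)
    if gt.shape != ms.shape:
        raise ValueError("images must have the same shape")
    overlap = np.zeros((n_g, n_m), dtype=int)
    labeled = (gt > 0) & (ms > 0)
    # Accumulate one count per labeled pixel at its (ground truth, machine) pair.
    np.add.at(overlap, (gt[labeled] - 1, ms[labeled] - 1), 1)
    return overlap
```

For a 2 by 2 ground truth `[[1, 1], [2, 2]]` and a machine segmentation `[[1, 1], [1, 0]]`, the result is `[[2, 0], [1, 0]]`: machine region 1 covers both ground truth regions, and one pixel remains unlabeled.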

Starting from $O_{ij}$, we search the index $x_j$ for all $R_{M_j}$: $x_j = \arg\max_{i=1,\dots,N_G} (O_{ij})$, to compute the cost $C$:

$C = \frac{1}{N_M} \sum_{j=1}^{N_M} \bigl( P_{G_{x_j}} - O_{x_j j} \bigr)$. (5)

real and the ideal segmentation at pixel level.

The term $U$ accounts for the unlabeled pixels, that is, those pixels that at the end of the process do not belong to any region (this holds only for the USF segmentation algorithm since the UB segmentation algorithm allocates all unlabeled pixels to the background region):

$U = \sum_{i=1}^{N_M} \Bigl( P_i - \sum_{j=1}^{N_G} O_{ij} \Bigr)$. (6)

entries $m_{ij}$ so that

$m_{ij} = \begin{cases} 1 & \text{if } i = \arg\max_{j=1,\dots,N_M} O_{ij}, \\ 0 & \text{otherwise.} \end{cases}$ (7)

The handicap $H_u$ accounts for the number of undersegmented regions (those which appear in the resulting image as a whole whilst separated in the ground truth image):

$H_u = k \cdot \#\Bigl\{ R_{M_j} : \sum_{i=1}^{N_G} m_{ij} > 1,\ j = 1, \dots, N_M \Bigr\}$. (8)

is set to 1, while more entries in a column can be set to 1

if undersegmentation occurs and a segmented region covers

more ground truth regions.
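The counting behind the undersegmentation handicap can be sketched as follows. The sketch reads the matrix of entries as marking, for each ground truth region (row), the machine region (column) with the largest overlap; this reading is an assumption based on the remark that several entries of one column can be set to 1. The names `best_match_matrix` and `undersegmentation_handicap`, and the default value of the constant `k`, are hypothetical.

```python
import numpy as np

def best_match_matrix(overlap):
    """Mark, for every ground truth region (row), the machine region (column)
    with the largest overlap; rows without any overlap stay empty."""
    overlap = np.asarray(overlap)
    m = np.zeros(overlap.shape, dtype=int)
    best = overlap.argmax(axis=1)
    matched = overlap.max(axis=1) > 0
    rows = np.arange(overlap.shape[0])
    m[rows[matched], best[matched]] = 1
    return m

def undersegmentation_handicap(overlap, k=1.0):
    """k times the number of machine regions that are the best match of more
    than one ground truth region."""
    m = best_match_matrix(overlap)
    return k * int(np.sum(m.sum(axis=0) > 1))
```

With the overlap `[[2, 0], [1, 0]]`, both ground truth regions point to the first machine region, so exactly one undersegmented region is counted.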

Finally, Ho is a handicap accounting for the number of

oversegmented regions (those which appear in ground truth

image as a whole whilst split in the resulting image):

$H_o = k \cdot \#\Bigl\{ R_{M_j} : \sum_{i=1}^{N_G} m_{ij} = 0,\ j = 1, \dots, N_M \Bigr\}$. (9)

The handicaps $H_o$ and $H_u$ are both multiplied by a constant $k$ just to enlarge the variability range.

Some results about the effectiveness of the adopted fitness function have been presented in [35].

3.4. Coding the chromosomes

One of the main tasks in GASE was to code the chromosome, that is, to code the parameter set for a given segmentation algorithm.

chromosome manipulation, we should use a binary coding, but since some genes (i.e., parameters) could assume real values, this coding is not sufficient. So we decided to adopt an extended logical binary coding in order to represent real values with a fixed-point code (with a defined number of decimals). Thus we define the symbol set as {0, 1, dot} to allow a representation (of fixed but arbitrary precision) of the decimal of the number. The choice of a fixed precision could seem wrong, but we can consider that, beyond a certain precision, segmentation algorithm performances are not affected. We could have used a floating-point representation of the chromosome, as suggested in [36], but in the case we studied, a fixed-point representation seems to be sufficient. The binary strings are formed by the juxtaposition of BCD-coded genes, memory consuming but giving accuracy to and from decimal conversion. The choice of extending the symbols set including dot was a help for visual inspection of the created population databases (listed in Figure 1).

Our chromosome contains all the parameters (their meanings are listed in Tables 1 and 2) of the chosen segmentation algorithm. In this way, the solution spaces considered are n-dimensional with n = 5 for USF and n = 10 for UB.
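The fixed-point coding over the symbol set {0, 1, dot} can be illustrated for a single gene as follows. This is a sketch under assumptions: each decimal digit is written as a 4-bit BCD group and the dot is written as `.`; the exact layout used in GASE, the way genes are delimited within a chromosome, and the names `encode_gene` and `decode_gene` are not taken from the paper.

```python
def encode_gene(value, decimals=2):
    """Encode a non-negative number as BCD digit groups plus a dot symbol."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    text = f"{value:.{decimals}f}"  # fixed number of decimals
    return "".join("." if ch == "." else format(int(ch), "04b") for ch in text)

def decode_gene(code):
    """Invert encode_gene: read 4-bit groups as decimal digits, '.' as the dot."""
    digits = []
    i = 0
    while i < len(code):
        if code[i] == ".":
            digits.append(".")
            i += 1
        else:
            nibble = int(code[i:i + 4], 2)
            if nibble > 9:
                raise ValueError("invalid BCD digit")
            digits.append(str(nibble))
            i += 4
    return float("".join(digits))
```

For instance, `encode_gene(3.25)` gives `0011.00100101`, and decoding that string returns 3.25 exactly, which reflects the lossless conversion to and from decimal mentioned above.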

4. EXPERIMENTAL RESULTS

Experiments carried out on GASE use as a benchmark the Michigan State University/Washington State University synthetic image database (that we will refer to as MSU/WSU database, http://sampl.eng.ohio-state.edu/sampl/data/3DDB/RID/index.htm) and a subset of the University of Bern real database (referred to as ABW). The tests performed are very time consuming since, for a single experiment, each segmentation process is iterated many times (i.e., for each individual of the solution population and for each generation).

Since we tested our GA with both a fixed and a random number of children in the crossover, according to [30], we have to use an alternative definition of generation. The term generation in GAs is often used as a synonym of the iteration step and is related to the process of creating a new solution. In our case, a generation step is given by the results obtained in a fixed time slice. In this manner, we can establish a time slice as a function of the reference workstation; for instance, with a standard PC (AMD Duron 700 MHz) running Linux OS, we could define the time slice as one minute of computation. In order to compare the efficacy and efficiency of results, we will define a convergence trend: the maximum time to get the optimal solution in a given number, Max G, of generations.
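The time-slice definition of a generation can be sketched as a loop that keeps producing and evaluating individuals until the slice expires. The function name `run_time_sliced`, the `step` callback, and the injectable `clock` are assumptions of this sketch, introduced so that the behavior can be checked without waiting for real time to pass.

```python
import time

def run_time_sliced(step, slice_seconds, max_generations, clock=time.monotonic):
    """Run step() repeatedly, ending a generation whenever its time slice expires.

    Returns the number of steps completed in each generation.
    """
    counts = []
    for _ in range(max_generations):
        deadline = clock() + slice_seconds
        done = 0
        while clock() < deadline:
            step()  # e.g., create and evaluate one new individual
            done += 1
        counts.append(done)
    return counts
```

With `slice_seconds=60` and `max_generations=30`, this would correspond to one-minute generations and about 30 minutes of computation; the number of individuals evaluated per generation then depends on the speed of the workstation.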

4.1.

The first experiment was the tuning of the UB segmentation algorithm [7]. This algorithm initially tries to detect

the edges (jump and crease [37]) of the segmenting image

by computing the scan lines. After finding the candidates

for area borders, it accomplishes an edge-filling process. This

Symbol   Name          Range       Meaning
N        WINSIZE       2–12
Tpoint   MAXPTDIST     0           Maximum point-to-point distance between pixel and 4-connected neighbor in region
Tperp    MAXPERPDIST   0           Maximum perpendicular distance between pixel and plane equation of grown region
Tangle   MAXANGLE      0.0–180.0   Maximum angle between normal of pixel and normal of grown region
Tarea    MINREGPIX     0           Maximum number of pixels to accept or reject a region

Variable name   Variable type   Range       Meaning
Th toleran      float           0.5–15.0
Th length       int             3
Th jump         float           1.0–20.0    Minimum distance for jump edges
Th crease       float           0.0–180.0
Th area         int             0           Minimum number of pixels for a valid surface
Th morph        float           1.0–3.0     Number of postprocessing morphological operators
Th PRMSE        float           0.1–10.0
Th Pavgerr      float           0.05–10.0   Plane region acceptance (average error)
Th CRMSE        float           0.1–10.0
Th Cavgerr      float           0.05–10.0   Curve region acceptance (average error)

The range is limited according to the observed lack of meaning of greater values when segmenting MSU/WSU images, so the shown limits are less than

possible.

segmentation algorithm is capable of segmenting curved surfaces and the available version [38] can segment images of the

GRF2-K2T database (named after the brand and model of

the structured light scanner used). We used a version, slightly

modified at the University of Modena, which can also segment synthetic images of the MSU/WSU database. A set of 35 images was chosen and a tuning task as in [6] was executed.

While tuning on the full set should provide very good results, it is our opinion that a training set should not be too large. We therefore chose a subset of 6 images as our training set. This set was input to GASE, and the resulting parameter sets were used to segment the test set (formed by the remaining 29 images) and to find the most suitable set.

We fixed the generation time slice at 1 minute and the maximum number of generations at 30, that is to say, about 30 minutes of computation for every image of the training set. It took a total of about 3 hours to obtain 6 possible solutions and to select the most suitable one for the test set. During this time, our algorithm performed about 10000 segmentations on the images. An exhaustive search would have to explore the whole enormous solution space (the space has 10 dimensions, and one parameter potentially ranges from 0 to ∞) and all the instances

of the test set. In our case, the exhaustive search was substi-

test an individual on all images and measure the fitness as a

function of the goodness over the whole training set.

As an acceptable approximation, to save computational time, we evaluated the fitness of every individual on a single image at a time. We assumed that, thanks to the genetic evolution, when an individual's genotype becomes common in the population, it will be tested on different images. At the end, the best-scoring individuals are tested on all images of the training set, and the one that outperforms the others on average is selected as the best.
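The two-stage evaluation described above can be sketched as follows. This is a minimal illustration and not the GASE implementation: the function names and the `score_fn(params, image)` callback are hypothetical, and we assume that a lower score is better, as the fitness values reported in this section suggest.

```python
import random


def evaluate_on_one_image(params, images, score_fn, rng):
    """Cheap evaluation: score an individual on one randomly drawn image."""
    image = rng.choice(images)
    return score_fn(params, image)


def select_best(finalists, images, score_fn):
    """Final stage: re-test each finalist on every training image and
    return the one with the lowest mean score (assumed: lower is better)."""
    def mean_score(params):
        return sum(score_fn(params, img) for img in images) / len(images)

    return min(finalists, key=mean_score)
```

The cheap evaluation is noisy for a single individual, but a genotype that spreads through the population is scored on many different images over the generations, so only the final ranking needs the full training set.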

In Table 3, we show the parameters used for this test. With original opt. val. we refer to the parameters tuned by the algorithm author, while with GASE opt. val. we refer to those tuned by GASE. In Table 4, we show the average scores obtained in this test. Although the improvement may seem small, it is not, given the presence of images with very different characteristics that were not considered in the training set. As a matter of fact, in most cases the fitness improves by one or more units (see Figures 2 and 3, where original and GASE opt. val. are compared). The best improvement was 11.26 points, while in only one case did the GASE optimization produce a worse result than the manual selection.


authors and by GASE.

Parameter       Original opt. val.   GASE opt. val.
Th SegmToler    7.5                  3.61
Th Jump         10.0                 4.55
Th Crease       30.0                 36.78
Th PRMSE        1.11                 0.51
Th PAvErr       1.07                 0.21
Th CRMSE        1.11                 0.57
Th CAvErr       1.09                 0.45
Th PostprFact   2.0                  1.79
Th SegmLen      3                    2
Th RegArea      100

by GASE opt. val.

Parameters set   Average fitness
Original         15.96
GASE             15.04

val. Fitness = 8.42.

val. Fitness = 7.65.

and by GASE.

Parameter     Original opt. val.   GASE opt. val.
WINSIZE       10                   9
MAXPTDIST     12.0                 13.2
MAXPERPDIST   4.0                  5.3
MAXANGLE      25.0                 11.45
MINREGPIX     500                  482

4.2.

val. Fitness = 4.61.

Fitness = 3.91.

The second experiment was performed on the USF segmentation algorithm [6]. Based on a region growing strategy, it computes the normal vector for each pixel within a parametric-sized window. After that first computation, it selects seed points on the basis of a reliability measure. From these seed points, it accomplishes the region growing, aggregating surfaces until at least one of four parametric criteria is met. This segmentation algorithm has been tuned using a set of parameters proposed by its authors. As we can see in [6], the given results are very impressive, so we knew how difficult it would be to improve them. Nevertheless, we performed the following experiment: given the original training


Table 6: Average results of the USF segmentation algorithm with original opt. val. and GASE opt. val. on 10 ABW images at 80% compare tolerance (we recall that the tool measures segmentation algorithm performance with respect to a certain precision tolerance, ranging from 51 to 95%).

Parameters set   GT regions   Correct detection   Angle diff. (std. dev.)   Oversegmentation   Undersegmentation   Missed   Noise
Original         20.1         13.1                1.24 (0.96)               0.1                0.0                 6.9      2.8
GASE             20.1         12.9                1.27 (0.99)               0.1                0.0                 7.1      3.7

as our training set and the other 9 as the test set. Then we compared the results on this subset to the corresponding former results on the same subset, using the comparison tool presented in [6]. The comparison tool considers five types of region classification: correct detection, oversegmentation, undersegmentation, missed, and noise. When all region classifications have been determined, a metric describing the accuracy of the recovered geometry is computed: any pair of regions $R_1$ and $R_2$ in the ground truth image, representing adjacent faces of the same object, have their angle $A_n$ recorded in the truth data. If $R_1$ and $R_2$ are classified as correct detections, the angle $A_m$ between the surface normals of their corresponding regions in the machine-segmented image is computed. Then $|A_n - A_m|$ is computed for every such pair. The number of angle comparisons, the average error, and the standard deviation are reported, giving an indirect estimation of the accuracy of the recovered geometry of the correctly segmented portion of the image.
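The angle-based accuracy metric described above can be illustrated with a short sketch. The function names are ours, and the use of the population standard deviation is an assumption, since the text does not say which estimator the comparison tool uses.

```python
import math


def angle_between(n1, n2):
    """Angle in degrees between two 3D surface normals."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def angle_error_stats(angle_pairs):
    """angle_pairs holds (A_n, A_m) in degrees for each pair of adjacent
    regions that were both classified as correct detections.

    Returns (number of comparisons, mean |A_n - A_m|, standard deviation).
    """
    errors = [abs(a_n - a_m) for a_n, a_m in angle_pairs]
    count = len(errors)
    if count == 0:
        return 0, 0.0, 0.0
    mean = sum(errors) / count
    variance = sum((e - mean) ** 2 for e in errors) / count  # population form
    return count, mean, math.sqrt(variance)
```

For instance, two region pairs with ground-truth angles of 90 degrees and measured angles of 89 and 93 degrees give errors of 1 and 3 degrees, that is, a mean error of 2 degrees with a standard deviation of 1 degree.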

The set as tuned by GASE is in Table 5, and we refer to it as GASE opt. val. The same table also includes the parameters as tuned in [6], which are referred to as original opt. val. The results are not better than those presented in [6], but in a limited amount of time (we fixed the search at 15 generations), we reached a good result considering that the solution space was larger than that considered in [6]. Moreover, [6] gives no information about the time spent selecting the solution space, although the average time needed to explore that whole space and select the original opt. val. can be easily determined.

In Table 6, we present the results determined by the two sets with a precision tolerance of 80% (see [6]). In Figure 4, we show the plots corresponding to the experiment. The comparison tool provides five error measures, in addition to a measure of correctness. All these measures are related to a tolerance percentage. The plots in Figures 4a, 4b, 4c, 4d, and 4e show the results on the training set of the original opt. val. (curve labeled HE) versus GASE opt. val. (labeled GA). The comparison is very interesting, especially considering that the heuristic selection was performed on a small solution space and tuned on all 10 images, while the GASE one, although optimized by GAs, was tuned on a single image only.

In particular, Figure 4a indicates that both parameter sets achieve the same number of correct instances over the training set, while Figures 4b and 4c demonstrate that, for problems of over- and undersegmentation, GASE and original opt. val. have opposite behavior, since GASE produces fewer undersegmentation errors but more oversegmentation. Finally, the last two plots show that there is no noticeable difference in noise segmentation and missed segmentation.

5.

both for the selection of the more appropriate algorithm (region growing, edge filling, clustering, etc.) and for the obtained accuracy. A variety of systems to perform this task have been presented in the literature (we recall [6, 15]), and all of them need accurate parameter tuning, according to the image characteristics.

A tool to compare results was proposed in [6], and it has been used to address the parameter tuning (as in [2, 3, 4]), using only one of the given measures. The tuning methods are based either on careful selection or on a solution space-partitioning search, which limits the dimensions of the solution space.

We proposed an automated search method, based on genetic algorithms, that allows us to search a large solution space while requiring a manageable amount of computation time (according to the chosen segmentation algorithm). To address the search, we used a fitness function that combines different measures given by the comparison tool (although using a different source code). We thus implemented a system called GASE to test different segmentation algorithms, namely, UB and USF.

We saw that for the UB, we obtained excellent results, improving segmentation quality and the speed of segmentation. For the USF, we obtained reasonable results, similar to those proposed by the authors, but without having any knowledge about the nature of the parameters. In fact, GAs start from random values of the parameter set and are able to reach a similar solution in relatively few generations. Finally, an algorithm to robustly award a scalar value to a segmentation was proposed, both embedded in GASE and as a stand-alone tool.

We believe that this work provides the basis to design a wizard (or expert system) helping human operators in segmenting images. Our final aim is to build an interactive system that, after an unsupervised training time, will help human operators in the task of obtaining good segmentations. The expert system will provide the framework for the operator to decide the parameters to segment a single surface or a subset of surfaces in a complex scene (as done in [39]).


[Figure 4, panels (a) to (e): plots for ABW structured-light images showing the average number of instances of each region classification (the recoverable axis labels include oversegmentation, noise, and missed instances) versus compare tool tolerance (%), from 50 to 100, with one curve labeled HE and one labeled GA.]

(e) Average missed regions on 10 ABW images.

Figure 4: Results, as measured by the comparison tool, obtained by the original opt. val. (labeled HE) and GASE opt. val. (labeled GA)

on 10 images of the ABW database.

REFERENCES

[1] B. Bhanu, S. Lee, and J. Ming, "Adaptive image segmentation using a genetic algorithm," IEEE Trans. Systems, Man, and Cybernetics, vol. 25, no. 12, pp. 1543-1567, 1995.

[2] J. Min, M. W. Powell, and K. W. Bowyer, "Progress in automated evaluation of curved surface range image segmentation," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), pp. 1644-1647, Barcelona, Spain, September 2000.

[3] J. Min, M. W. Powell, and K. W. Bowyer, "Automated performance evaluation of range image segmentation," in IEEE Workshop on Applications of Computer Vision, pp. 163-168, Palm Springs, Calif, USA, December 2000.

[4] J. Min, M. W. Powell, and K. W. Bowyer, "Objective, automated performance benchmarking of region segmentation algorithms," in Proc. International Conf. on Computer Vision, Barcelona, Spain, 2000.

[5] X. Jiang, "An adaptive contour closure algorithm and its experimental evaluation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1252-1265, 2000.

[6] A. Hoover, G. Jean-Baptiste, X. Jiang, et al., "An experimental comparison of range image segmentation algorithms," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 673-689, 1996.

[7] X. Jiang and H. Bunke, "Edge detection in range images based on scan line approximation," Computer Vision and Image Understanding, vol. 73, no. 2, pp. 183-199, 1999.

[8] X. Jiang, H. Bunke, and U. Meier, "High-level feature based range image segmentation," Image and Vision Computing, vol. 18, no. 10, pp. 817-822, 2000.

[9] Y. Zhang, Y. Sun, H. Sari-Sarraf, and M. A. Abidi, "Impact of intensity edge map on segmentation of noisy range images," in Three-Dimensional Image Capture and Applications III, vol. 3958 of SPIE Proceedings, pp. 260-269, San Jose, Calif, USA, January 1990.

[10] I. S. Chang and R.-H. Park, "Segmentation based on fusion of range and intensity images using robust trimmed methods," Pattern Recognition, vol. 34, no. 10, pp. 1951-1962, 2001.

[11] M. Baccar, L. A. Gee, and M. A. Abidi, "Reliable location and regression estimates with application to range image segmentation," Journal of Mathematical Imaging and Vision, vol. 11, no. 3, pp. 195-205, 1999.

[12] H. D. Cheng, X. Jiang, Y. Sun, and J. Wang, "Color image segmentation: advances and prospects," Pattern Recognition, vol. 34, no. 12, pp. 2259-2281, 2001.

[13] X. Jiang, K. W. Bowyer, Y. Morioka, et al., "Some further results of experimental comparison of range image segmentation algorithms," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), vol. 4, pp. 877-881, Barcelona, Spain, September 2000.

[14] M. W. Powell, K. W. Bowyer, X. Jiang, and H. Bunke, "Comparing curved-surface range image segmenters," in Proc. International Conf. on Computer Vision, pp. 286-291, Bombay, India, January 1998.

[15] H. Frigui and R. Krishnapuram, "A robust competitive clustering algorithm with applications in computer vision," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 450-465, 1999.

[16] D. E. Goldberg, Genetic Algorithms in Search Optimization and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.

[17] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, New York, NY, USA, 2nd edition, 1994.


[18] M. Srinivas and L. M. Patnaik, "Genetic algorithms: a survey," IEEE Computer, vol. 27, no. 6, pp. 17-26, 1994.

[19] X. Yu, T. D. Bui, and A. Krzyzak, "Robust estimation for range image segmentation and reconstruction," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 5, pp. 530-538, 1994.

[20] D. N. Chun and H. S. Yang, "Robust image segmentation using genetic algorithm with a fuzzy measure," Pattern Recognition, vol. 29, no. 7, pp. 1195-1211, 1996.

[21] P. Andrey and P. Tarroux, "Unsupervised image segmentation using a distributed genetic algorithm," Pattern Recognition, vol. 27, no. 5, pp. 659-673, 1994.

[22] M. Yoshimura and S. Oe, "Evolutionary segmentation of texture image using genetic algorithms towards automatic decision of optimum number of segmentation areas," Pattern Recognition, vol. 32, no. 12, pp. 2041-2054, 1999.

[23] D.-C. Tseng and C.-C. Lai, "A genetic algorithm for MRF-based segmentation of multi-spectral textured images," Pattern Recognition Letters, vol. 20, no. 14, pp. 1499-1510, 1999.

[24] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, vol. 48, no. 3, pp. 259-302, 1986.

[25] S. Cagnoni, A. B. Dobrzeniecki, R. Poli, and J. C. Yanch, "Genetic algorithm-based interactive segmentation of 3D medical images," Image and Vision Computing, vol. 17, no. 12, pp. 881-895, 1999.

[26] P. Andrey, "Selectionist relaxation: genetic algorithms applied to image segmentation," Image and Vision Computing, vol. 17, no. 3-4, pp. 175-187, 1999.

[27] L. S. Davis and A. Rosenfeld, "Cooperating processes for low-level vision: a survey," Artificial Intelligence, vol. 17, no. 1-3, pp. 245-263, 1981.

[28] K. I. Laws, "The Phoenix image segmentation system: description and evaluation," Tech. Rep. 289, SRI International, Menlo Park, Calif, USA, December 1982.

[29] J. T. Alander, "Indexed bibliography of genetic algorithms in optics and image processing," Tech. Rep. 94-1-OPTICS, Department of Information Technology and Production Economics, University of Vaasa, Vaasa, Finland, 2000.

[30] L. Cinque, R. Cucchiara, S. Levialdi, S. Martinz, and G. Pignalberi, "Optimal range segmentation parameters through genetic algorithms," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), pp. 1474-1477, Barcelona, Spain, September 2000.

[31] L. J. Eshelman, R. A. Caruana, and J. D. Schaffer, "Biases in the crossover landscape," in Proc. 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed., pp. 10-19, Fairfax, Va, USA, June 1989.

[32] G. Syswerda, "Uniform crossover in genetic algorithms," in Proc. 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed., pp. 2-9, Fairfax, Va, USA, June 1989.

[33] M. D. Levine and A. M. Nazif, "An experimental rule-based system for testing low level segmentation strategies," in Multicomputers and Image Processing Algorithms and Programs, K. Preston and L. Uhr, Eds., pp. 149-160, Academic Press, New York, NY, USA, 1982.

[34] Y. W. Lim and S. U. Lee, "On the color image segmentation algorithm based on the thresholding and the fuzzy C-means techniques," Pattern Recognition, vol. 23, no. 9, pp. 935-952, 1990.

[35] L. Cinque, R. Cucchiara, S. Levialdi, and G. Pignalberi, "A methodology to award a score to range image segmentation," in Proc. 6th International Conference on Pattern Recognition and Information Processing, pp. 171-175, Minsk, Belarus, May 2001.

[36] F. Herrera, M. Lozano, and J. L. Verdegay, "Tackling real-coded genetic algorithms: operators and tools for behavioural analysis," Artificial Intelligence Review, vol. 12, no. 4, pp. 265-319, 1998.

[37] R. Hoffman and A. K. Jain, "Segmentation and classification of range images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, no. 5, pp. 608-620, 1987.

[38] "Range image segmentation comparison project," 2002, http://marathon.csee.usf.edu/range/seg-comp/results.html.

[39] L. Cinque, R. Cucchiara, S. Levialdi, and G. Pignalberi, "A decision support system for range image segmentation," in Proc. 3rd International Conference on Digital Information Processing and Control in Extreme Situations, pp. 45-50, Minsk, Belarus, May 2002.

Gianluca Pignalberi received in 2000 his

degree in computer science, focusing especially on image processing and artificial

intelligence methods, from the University

of Rome La Sapienza. He is a Consultant, and his current interests include language recognition and data compression

techniques, combined with artificial intelligence methods.

Rita Cucchiara graduated magna cum

laude in 1989, with the Laurea in electronic

engineering from University of Bologna and

received the Ph.D. in computer engineering

from University of Bologna in 1993. She was

an Assistant Professor at the University of

Ferrara and is currently an Associate Professor in computer engineering at the Faculty of Engineering of Modena, University

of Modena and Reggio Emilia, Italy, since

1998. Her research activity includes computer vision and pattern

recognition, and in particular image segmentation, genetic algorithms for optimization, motion analysis, and color analysis. She

is currently involved in research projects of video surveillance, domotics, video transcoding for high performance video servers, and

support to medical diagnosis with image analysis. Rita Cucchiara is

a member of the IEEE, ACM, GIRPR (Italian IAPR), and AIxIA.

Luigi Cinque received his Ph.D. degree in

physics from the University of Napoli in

1983. From 1984 to 1990, he was with the

Laboratory of Artificial Intelligence (Alenia

SpA). Presently, he is a Professor at the Department of Computer Science of the University of Rome La Sapienza. His scientific interests cover image sequences analysis, shape and object recognition, image

database, and advanced man-machine interaction. Professor Cinque is presently an Associate Editor of Pattern Recognition Journal and Pattern Recognition Letters. He is a

senior member of IEEE, ACM, and IAPR. He has been in the program committee of many international conferences in the field of

imaging technology, and he is the author of over 100 scientific publications in international journals and conference proceedings.

Stefano Levialdi graduated as a telecommunications engineer from the University of Buenos Aires in 1959. He has been at the University of Rome "La Sapienza" since 1983, teaching two courses on human-computer interaction. His research interests are in visual languages, human-computer interaction, and usability. He acts as Director of the Pictorial Computing Laboratory, is an IEEE Life Fellow (1991), and has been the General Chair of over 35 international conferences; he will be the General Chairman of the IFIPS Interact '05 Conference to be held in Rome, Italy.

© 2003 Hindawi Publishing Corporation

Model Using a Genetic Algorithm with Perceptual

Fitness Calculation

Janne Riionheimo

Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000,

FIN-02015 HUT, Espoo, Finland

Email: janne.riionheimo@hut.fi

Vesa Välimäki

Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000,

FIN-02015 HUT, Espoo, Finland

Pori School of Technology and Economics, Tampere University of Technology, P.O. Box 300,

FIN-28101, Pori, Finland

Email: vesa.valimaki@hut.fi

Received 30 June 2002 and in revised form 2 December 2002

We describe a technique for estimating control parameters for a plucked string synthesis model using a genetic algorithm. The

model has been intensively used for sound synthesis of various string instruments but the fine tuning of the parameters has been

carried out with a semiautomatic method that requires some hand adjustment with human listening. An automated method for

extracting the parameters from recorded tones is described in this paper. The calculation of the fitness function utilizes knowledge

of the properties of human hearing.

Keywords and phrases: sound synthesis, physical modeling synthesis, plucked string synthesis, parameter estimation, genetic

algorithm.

1. INTRODUCTION

Model-based sound synthesis is a powerful tool for creating

natural sounding tones by simulating the sound production

mechanisms and physical behavior of real musical instruments. These mechanisms are often too complex to simulate

in every detail, so simplified models are used for synthesis.

The aim is to generate a perceptually indistinguishable model

for real instruments.

One workable method for physical modelling synthesis is based on digital waveguide theory proposed by Smith [1]. In the case of the plucked string instruments, the method can be extended to model also the plucking style and instrument body [2, 3]. A synthesis model of this kind can be applied to synthesize various plucked string instruments by changing the control parameters and using different body and plucking models [4, 5]. A characteristic feature in string instrument tones is the double decay and beating effect [6], which can be implemented by using two slightly mistuned string models in parallel to simulate the two polarizations of the transversal vibratory motion of a real string [7].

Parameter estimation is an important and difficult challenge in sound synthesis. Usually, the natural parameter settings are in great demand at the initial state of the synthesis. When using these parameters with a model, we are able to produce real-sounding instrument tones. Various methods for adjusting the parameters to produce the desired sounds have been proposed in the literature [4, 8, 9, 10, 11, 12]. An automated parameter calibration method for a plucked string synthesis model has been proposed in [4, 8], and then improved in [9]. It gives the estimates for the fundamental frequency, the decay parameters, and the excitation signal that is used in commuted synthesis.

Our interest in this paper is the parameter estimation of

the model proposed by Karjalainen et al. [7]. The parameters

of the model have earlier been calibrated automatically, but

the fine-tuning has required some hand adjustment. In this

work, we use recorded tones as a target sound with which the

synthesized tones are compared. All synthesized sounds are

then ranked according to their similarity with the recorded

tone. An accurate way to measure sound quality from the


viewpoint of auditory perception would be to carry out listening tests with trained participants and rank the candidate

solutions according to the data obtained from the tests [13].

This method is extremely time consuming and, therefore, we

are forced to use analytical methods to calculate the quality of

the solutions. Various techniques to simulate human hearing

and calculate perceptual quality exist. Perceptual linear predictive (PLP) technique is widely used with speech signals

[14], and frequency-warped digital signal processing is used

to implement perceptually relevant audio applications [15].

In this work, we use an error function that simulates

the human hearing and calculates the perceptual error between the tones. Frequency masking behavior, frequency dependence, and other limitations of human hearing are taken

into account. From the optimization point of view, the task

is to find the global minimum of the error function. The

variables of the function, that is, the parameters of the synthesis model, span the parameter space where each point

corresponds to a set of parameters and thus to a synthesized sound. When dealing with discrete parameter values,

the number of parameter sets is finite and given by the product of the number of possible values of each parameter. Using nine control parameters with 100 possible values each, a total of $10^{18}$ combinations exist in the space and, therefore, an exhaustive search is obviously impossible.
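The size of the discretized parameter space can be checked with a few lines of code. The evaluation rate used below (one million synthesized and scored tones per second) is purely hypothetical and only serves to show the order of magnitude.

```python
from math import prod


def search_space_size(values_per_parameter):
    """Number of distinct parameter sets: the product of the number of
    possible values of each parameter."""
    return prod(values_per_parameter)


def exhaustive_search_years(total, evaluations_per_second):
    """Years needed to evaluate every parameter set at a given rate."""
    seconds = total / evaluations_per_second
    return seconds / (3600 * 24 * 365)


total = search_space_size([100] * 9)          # 100**9 == 10**18
years = exhaustive_search_years(total, 1e6)   # hypothetical rate
```

Even at this optimistic rate, visiting all $10^{18}$ parameter sets would take on the order of tens of thousands of years.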

Evolutionary algorithms have shown a good performance

in optimizing problems relating to the parameter estimation

of synthesis models. Vuori and Välimäki [16] tried a simulated evolution algorithm for the flute model, and Horner et

al. [17] proposed an automated system for parameter estimation of FM synthesizer using a genetic algorithm (GA). GAs

have been used for automatically designing sound synthesis

algorithms in [18, 19]. In this study, a GA is used to optimize

the perceptual error function.

This paper is organized as follows. The plucked string synthesis model and the control parameters to be estimated are described in Section 2. The parameter estimation problem and methods for solving it are discussed in Section 3. Section 4 concentrates on the calculation of the perceptual error. In Section 5, we discretize the parameter space in a perceptually reasonable manner. Implementation of the GA and different schemes for selection, mutation, and crossover used in our work are surveyed in Section 6. Experiments and results are analyzed in Section 7, and conclusions are finally drawn in Section 8.

2.

plucked string synthesis in this study. The block diagram

of the model is presented in Figure 1. It is based on digital

waveguide synthesis theory [1] that is extended in accordance

with commuted waveguide synthesis approach [2, 3] to include also the body modes of the instrument in the string

synthesis model.

Different plucking styles and body responses are stored as wavetables in the memory and used to excite the two string models $S_h(z)$ and $S_v(z)$ that simulate the effect of the two polarizations of the transversal vibratory motion. A single string model $S(z)$ in Figure 2 consists of a lowpass filter $H(z)$ that controls the decay rate of the harmonics, a delay line $z^{-L_I}$, and a fractional delay filter $F(z)$. The delay time around the loop for a given fundamental frequency $f_0$ is

$$L_d = \frac{f_s}{f_0}, \qquad (1)$$

where $f_s$ is the sampling rate. This delay is implemented by the delay line $z^{-L_I}$ and the fractional delay filter $F(z)$. The delay line is used to control the integer part $L_I$ of the string length, while the coefficients of the filter $F(z)$ are adjusted to produce the fractional part $L_f$ [20]. The fractional delay filter $F(z)$ is implemented as a first-order allpass filter. Two string models are typically slightly mistuned to produce a natural sounding beating effect.

[Figure 1: block diagram of the model, with the excitation database, the horizontal and vertical polarization string models $S_h(z)$ and $S_v(z)$, the mixing coefficients $m_p$, $1 - m_p$, $m_o$, and $1 - m_o$, the coupling gain $g_c$, and the output.]

[Figure 2: single string model $S(z)$ with input $x(n)$, output $y(n)$, fractional delay filter $F(z)$, delay line $z^{-L_I}$, and loop filter $H(z)$.]
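The split of the loop delay in (1) into an integer and a fractional part can be sketched as follows. The function names are ours. The allpass coefficient uses the common low-frequency approximation $c = (1 - L_f)/(1 + L_f)$ for a first-order allpass filter $F(z) = (c + z^{-1})/(1 + c z^{-1})$, which is an assumption on our part, since the coefficient formula is not given in the text; the small delay contributed by the loop filter $H(z)$ is also ignored here.

```python
import math


def split_loop_delay(fs, f0):
    """Return (L_d, L_I, L_f): total loop delay in samples, its integer
    part (delay line length), and its fractional part."""
    loop_delay = fs / f0
    integer_part = math.floor(loop_delay)
    fractional_part = loop_delay - integer_part
    return loop_delay, integer_part, fractional_part


def allpass_coefficient(fractional_delay):
    """First-order allpass coefficient for a low-frequency delay of
    `fractional_delay` samples (assumed approximation, see lead-in)."""
    return (1.0 - fractional_delay) / (1.0 + fractional_delay)


# Example with an assumed sampling rate of 44100 Hz and f0 = 440 Hz.
L_d, L_I, L_f = split_loop_delay(44100.0, 440.0)
c = allpass_coefficient(L_f)
```

In this example, the delay line holds 100 samples and the allpass filter has to supply the remaining fraction of a sample, about 0.23.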

A one-pole filter with transfer function

$$H(z) = g\,\frac{1 + a}{1 + a z^{-1}} \qquad (2)$$

is used as the loop filter of each string model. The gain $g$ in (2) determines the overall decay rate of the sound, while the parameter $-1 < a < 0$ controls the frequency-dependent decay. The excitation signal is scaled by the mixing coefficients $m_p$ and $(1 - m_p)$ before sending it to the two string models. Coefficient $g_c$ enables coupling between the two polarizations. Mixing coefficient $m_o$ defines the proportion of the two polarizations in the output sound. All parameters $m_p$, $g_c$, and $m_o$ are chosen to have values between 0 and 1. The transfer function of the entire model is written as

$$M(z) = m_p m_o S_h(z) + (1 - m_p)(1 - m_o) S_v(z) + m_p (1 - m_o)\, g_c\, S_h(z) S_v(z), \qquad (3)$$


Parameter   Control
f0,h        Fundamental frequency of the horizontal string model
f0,v        Fundamental frequency of the vertical string model
gh          Loop gain of the horizontal string model
ah          Frequency-dependent gain of the horizontal string model
gv          Loop gain of the vertical string model
av          Frequency-dependent gain of the vertical string model
mp          Input mixing coefficient
mo          Output mixing coefficient
gc          Coupling gain of the two polarizations

where the string models $S_h(z)$ and $S_v(z)$ for the two polarizations can be written as an individual string model

$$S(z) = \frac{1}{1 - z^{-L_I} F(z) H(z)}. \qquad (4)$$
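To make the feedback structure of (4) concrete, the following sketch runs a single string model sample by sample. It is a simplified illustration, not the authors' implementation: the fractional delay filter is omitted (that is, $F(z) = 1$ is assumed), and the function name is ours. The loop filter follows (2), which corresponds to the difference equation $w(n) = g(1 + a)\,d(n) - a\,w(n - 1)$, where $d(n) = y(n - L_I)$ is the delayed output.

```python
def string_model(x, delay_length, g, a):
    """Single string model y(n) = x(n) + H{ y(n - L_I) } with F(z) = 1.

    x            -- excitation samples
    delay_length -- integer loop delay L_I in samples (must be >= 1)
    g, a         -- loop filter parameters of H(z) = g(1 + a)/(1 + a z^-1)
    """
    y = [0.0] * len(x)
    w_prev = 0.0  # previous output sample of the loop filter
    for n in range(len(x)):
        delayed = y[n - delay_length] if n >= delay_length else 0.0
        # One-pole loop filter applied to the delayed output.
        w = g * (1.0 + a) * delayed - a * w_prev
        w_prev = w
        y[n] = x[n] + w
    return y
```

Feeding a unit impulse into this loop with `a = 0` returns the impulse every `delay_length` samples, scaled by `g` on each round trip, which is the overall decay controlled by the loop gain. A negative `a` additionally lowpass filters the recirculating signal, so higher harmonics decay faster.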

sound synthesis of various plucked string instruments [5, 21, 22]. Different methods for estimating the parameters have been used, but because of the interaction between the parameters, systematic methods are at least troublesome and probably impossible. The nine parameters that are used to control the synthesis model are listed in Table 1.

3.

Determination of the proper parameter values for sound synthesis systems is an important problem and also depends on the purpose of the synthesis. When the goal is to imitate the sounds of real instruments, the aim of the estimation is unambiguous: we wish to find a parameter set which gives a sound output that is sufficiently similar to the natural one in terms of human perception. These parameters are also feasible for virtual instruments at the initial stage, after which the limits of real instruments can be exceeded by adjusting the parameters in more creative ways.

Parameters of a synthesis model correspond normally

to the physical characteristics of an instrument [7]. The

estimation procedure can then be seen as sound analysis

where the parameters are extracted from the sound or from

the measurements of physical behavior of an instrument

[23]. Usually, the model parameters have to be fine-tuned

by laborious trial and error experiments, in collaboration

with accomplished players [23]. Parameters for the synthesis model in Figure 1 have earlier been estimated this way

and recently in a semiautomatic fashion, where some parameter values can be obtained with an estimation algorithm while others must be guessed. Another approach is

to consider the parameter estimation problem as a nonlinear optimization process and take advantage of the general searching methods. All possible parameter sets can then

be ranked according to their similarity with the desired

sound.

3.1.

Calibrator

the model, is given here. The fundamental frequency f0 is

first estimated using the autocorrelation method. The frequency estimate in samples from (1) is used to adjust the delay line length LI and the coecients of the fractional delay

filter F(z). The amplitude, frequency, and phase trajectories

for partials are analyzed using the short-time Fourier transform (STFT), as in [4]. The estimates for loop filter parameters g and a are then analyzed from the envelopes of individual partials. The excitation signal for the model is extracted

from the recorded tone by a method described in [24]. The

amplitude, frequency, and phase trajectories are first used to

synthesize the deterministic part of the original signal and

the residual is obtained by a time-domain subtraction. This

produces a signal which lacks the energy to excite the harmonics when used with the synthesis model. This is avoided

by inverse filtering the deterministic signal and the residual

separately. The output signal of the model is finally fed to

the optimization routine which automatically fine-tunes the

model parameters by analyzing the time-domain envelope of

the signal.
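The first calibration step, estimating f0 with the autocorrelation method, can be illustrated with a minimal sketch. It is not the calibrator of the cited works: the function name `estimate_f0_autocorr` and the search band `f_min`/`f_max` are assumptions made here for illustration, and only the strongest autocorrelation peak inside that band is used.

```python
import numpy as np

def estimate_f0_autocorr(y, fs, f_min=60.0, f_max=1000.0):
    """Estimate f0 (Hz) and the period (samples) from the autocorrelation peak.

    f_min and f_max bound the search and are illustrative assumptions.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    # Full autocorrelation; keep the non-negative lags only.
    r = np.correlate(y, y, mode="full")[len(y) - 1:]
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(r) - 1)
    # The lag of the strongest peak in the band is the period in samples.
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag, lag
```

The returned lag is the period estimate in samples, which is the quantity the text uses to set the delay line length; a fractional refinement of the peak position would be needed for the fractional delay filter and is left out of this sketch.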

The difference in the length of the delay lines can be estimated based on the beating of a recorded tone. In [25],

the beating frequency is extracted from the first harmonic

of a recorded string instrument tone by fitting a sine wave

using the least squares method. Another procedure for extracting beating and two-stage decay from the string tones is

described by Bank in [26]. In practice, the automatic calibrator algorithm is first used to find decent values for the

control parameters of one string model. These values are also

used for another string model. The mistuning between the

two string models has then been found by ear [5] and the

differences in the decay parameters are set by trial and error.

Our method automatically extracts the nine control parameter values from recorded tones.

3.2. Optimization

Instead of extracting the parameters from audio measurements, our approach here is to find the parameter set that

produces a tone that is perceptually indistinguishable from

the target one. Each parameter set can be assigned a

quality value which denotes how good the candidate solution is. This performance metric is usually called a fitness

function, or inversely, an error function. A parameter set is

fed into the fitness function which calculates the error between the corresponding synthesized tone and the desired

sound. The smaller the error, the better the parameter set and

the higher the fitness value. These functions give a numerical grade to each solution, by means of which we are able to

classify all possible parameter sets.

4. FITNESS CALCULATION

time domain. Since spectra of all musical sounds vary with

time, it is appropriate to calculate the spectral similarity

in short time segments. A common method is to measure

the least squared error of the short-time spectra of the two

sounds [17, 18]. The STFT of signal y(n) is a sequence of

discrete Fourier transforms (DFT)

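\[ Y(m, k) = \sum_{n=0}^{N-1} w(n)\, y(n + mH)\, e^{-j\omega_k n}, \quad m = 0, 1, 2, \ldots, \tag{5} \]

with

\[ \omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, 2, \ldots, N - 1, \tag{6} \]

where N is the length of the DFT, w(n) is a window function, and H is the hop size or time advance (in samples) per frame.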

Integers m and k refer to the frame index and frequency bin,

respectively. When N is a power of two, for example, 1024,

each DFT can be computed efficiently with the FFT algorithm. If o(n) is the output sound of the synthesis model and

t(n) is the target sound, then the error (inverse of the fitness)

of the candidate solution is calculated as follows:

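\[ E = \frac{1}{F} = \frac{1}{L} \sum_{m=0}^{L-1} \sum_{k=0}^{N-1} \big( |O(m, k)| - |T(m, k)| \big)^2, \tag{7} \]

where O(m, k) and T(m, k) are the STFT sequences of o(n) and t(n) and L is the length of the sequences.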
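The error of (7) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' Matlab implementation: the Hann window, the default `n_fft` and `hop` values, and the function names `stft` and `spectral_error` are choices made here, and L is taken as the number of frames shared by the two sounds.

```python
import numpy as np

def stft(y, n_fft=1024, hop=256):
    """Return the DFTs of the windowed frames y(n + m*hop) as rows Y[m, k]."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)  # window choice is an assumption
    frames = np.stack([y[m * hop:m * hop + n_fft] * window
                       for m in range(n_frames)])
    return np.fft.fft(frames, axis=1)

def spectral_error(o, t, n_fft=1024, hop=256):
    """Mean over frames of the summed squared magnitude differences."""
    O = np.abs(stft(o, n_fft, hop))
    T = np.abs(stft(t, n_fft, hop))
    n_frames = min(len(O), len(T))
    diff = O[:n_frames] - T[:n_frames]
    return float(np.sum(diff ** 2) / n_frames)
```

Because only magnitudes are compared, a phase shift of the synthesized tone within a frame does not by itself raise the error; the fitness is the inverse of the returned value.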

4.1. Perceptual quality

The analytical error calculated from (7) is a crude simplification from the viewpoint of auditory perception. Therefore,

an auditory model is required. One possibility would be to

include the frequency masking properties of human hearing

by applying a narrow band masking curve [27] for each partial. This method has been used to speed up additive synthesis [28] and perceptual wavetable matching for synthesis

of musical instrument tones [29]. One disadvantage of the

method is that it requires peak tracking of partials, which

is a time-consuming procedure. We use here a technique

which determines the threshold of masking from the STFT

sequences. The frequency components below that threshold

are inaudible, therefore, they are unnecessary when calculating the perceptual similarity. This technique proposed in [30]

error calculation [18].

4.2.

(1) windowing the signal and calculating STFT,

(2) calculating the power spectrum for each DFT,

(3) mapping the frequency scale into the Bark domain and

calculating the energy per critical band,

(4) applying the spreading function to the critical band

energy spectrum,

(5) calculating the spread masking threshold,

(6) calculating the tonality-dependent masking threshold,

(7) normalizing the raw masking threshold and calculating the absolute threshold of masking.

The frequency power spectrum is translated into the Bark

scale by using the approximation [27]

\[ \nu = 13 \arctan\left(\frac{0.76 f}{\text{kHz}}\right) + 3.5 \arctan\left(\left(\frac{f}{7.5\ \text{kHz}}\right)^{2}\right), \tag{8} \]

where f is the frequency in Hertz and ν is the mapped frequency in Bark units. The energy in each critical band is calculated by summing the frequency components in the critical

band. The number of critical bands depends on the sampling

rate and is 25 for the sample rate of 44.1 kHz. The discrete

representation of fixed critical bands is a close approximation and, in reality, each band builds up around a narrow

band excitation. A power spectrum P(k) and energy per critical band Z(ν) for a 12-millisecond excerpt from a guitar

tone are shown in Figure 3a.
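The Bark mapping of (8) and the summation of energy per critical band can be sketched as follows. The sketch assumes a one-sided power spectrum P(k) covering 0 to fs/2 and fixed band edges at integer Bark values; both are simplifications chosen here, and the names `hz_to_bark` and `critical_band_energy` are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency in Hz, following (8)."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)

def critical_band_energy(power, fs):
    """Sum a one-sided power spectrum into critical bands.

    Returns (Z, counts): energy per band and number of bins per band.
    Band edges at integer Bark values are an assumption of this sketch.
    """
    power = np.asarray(power, dtype=float)
    freqs = np.linspace(0.0, fs / 2.0, len(power))
    band = np.floor(hz_to_bark(freqs)).astype(int)
    n_bands = band[-1] + 1
    Z = np.bincount(band, weights=power, minlength=n_bands)
    counts = np.bincount(band, minlength=n_bands)
    return Z, counts
```

With fs = 44100 Hz the highest bin maps to roughly 24.7 Bark, so the sketch yields 25 bands, in agreement with the number quoted in the text.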

The effect of masking of each narrow band excitation

spreads across all critical bands. This is described by a spreading function given in [31]

\[ 10 \log_{10} B(\nu) = 15.91 + 7.5(\nu + 0.474) - 17.5\sqrt{1 + (\nu + 0.474)^{2}}\ \text{dB}. \tag{9} \]

The spreading effect is applied by convolving the critical band energy function Z(ν) with the spreading function B(ν) [30].

The spread energy per critical band SP(ν) is shown in

Figure 3c.

The masking threshold depends on the characteristics of

the masker and masked tone. Two different thresholds are

detailed and used in [30]. For a tone masking noise, the

threshold is estimated as 14.5 + ν dB below SP. For noise

masking a tone, it is estimated as 5.5 dB below SP. A

spectral flatness measure is used to determine the noiselike

or tonelike characteristics of the masker. The spectral flatness

measure V is defined in [30] as the ratio of the geometric

to the arithmetic mean of the power spectrum. The tonality

factor α is defined as follows:

\[ \alpha = \min\left(\frac{V}{V_{\max}},\, 1\right), \tag{10} \]

Figure 3: Determining the threshold of masking for a 12-millisecond excerpt from a recorded guitar tone. Fundamental frequency of the tone is 331 Hz. Panels plot magnitude (dB) against frequency (Hz) or Bark. (a) Power spectrum P(k) (solid line) and energy per critical band Z(ν) (dashed line). (c) Power spectrum P(k) (solid line) and spread energy per critical band SP(ν) (dashed line). (d) Power spectrum P(k) (solid line) and W(k) (dashed line).

where Vmax = −60 dB. That is to say that if the masker signal is entirely tonelike, then α = 1, and if the signal is pure

noise, then α = 0. The tonality factor is used to geometrically weight the two thresholds mentioned above to form the

masking energy offset U(ν) for a critical band

\[ U(\nu) = \alpha(14.5 + \nu) + 5.5(1 - \alpha). \tag{11} \]

estimate the raw masking threshold

\[ R(\nu) = 10^{\log_{10}(S_P(\nu)) - U(\nu)/10}. \tag{12} \]

normalization procedure used in [30] takes this into account

and divides each component of R(ν) by the number of points

in the corresponding band

\[ Q(\nu) = \frac{R(\nu)}{N_p}, \tag{13} \]

where Np is the number of points in the particular critical band. The final threshold of masking for a frequency

spectrum W(k) is calculated by comparing the normalized

threshold to the absolute threshold of hearing and mapping from Bark to the frequency scale. The most sensitive

area in human hearing is around 4 kHz. If the normalized


a 4 kHz sinusoidal tone with one bit of dynamic range, it is

changed to the absolute threshold of hearing. This is a simplified method to set the absolute levels since in reality the

absolute threshold of hearing varies with the frequency.

An example of the final threshold of masking is shown

in Figure 3d. It is seen that many of the high partials and

the background noise at the high frequencies are below the

threshold and thus inaudible.
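The chain from critical band energy to the normalized threshold (spreading, tonality factor, masking offset, raw threshold, normalization) can be sketched as below. This is a simplified illustration, not the procedure of [30] in full: it takes the spreading term with a minus sign and Vmax = −60 dB, treats the band index as the Bark value ν, applies the spreading as a matrix product over band distances, and omits the final comparison with the absolute threshold of hearing and the mapping back to the frequency scale. All function names are hypothetical.

```python
import numpy as np

def spreading_db(dz):
    """Spreading function 10*log10 B(dz) for a band distance dz in Bark."""
    x = dz + 0.474
    return 15.91 + 7.5 * x - 17.5 * np.sqrt(1.0 + x ** 2)

def tonality(power, v_max_db=-60.0):
    """alpha = min(V / Vmax, 1), with V the spectral flatness in dB."""
    p = np.maximum(np.asarray(power, dtype=float), 1e-20)
    geometric = np.exp(np.mean(np.log(p)))
    arithmetic = np.mean(p)
    v_db = 10.0 * np.log10(geometric / arithmetic)
    return min(v_db / v_max_db, 1.0)

def masking_threshold(Z, counts, alpha):
    """Return spread energy S, offset U, raw threshold R, normalized Q."""
    Z = np.asarray(Z, dtype=float)
    nu = np.arange(len(Z))
    # Band i receives B(i - j) times the energy of band j.
    B = 10.0 ** (spreading_db(nu[:, None] - nu[None, :]) / 10.0)
    S = B @ Z
    U = alpha * (14.5 + nu) + 5.5 * (1.0 - alpha)
    R = 10.0 ** (np.log10(np.maximum(S, 1e-20)) - U / 10.0)
    Q = R / np.maximum(counts, 1)
    return S, U, R, Q
```

For a purely noiselike masker (α = 0) the offset is 5.5 dB in every band, so the raw threshold is simply the spread energy lowered by 5.5 dB; for a tonelike masker the offset grows with the band index.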


Perceptual error is calculated in [18] by weighting the error

from (7) with two matrices

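\[ G(m, k) = \begin{cases} 1 & \text{if } |T(m, k)| \text{ is above the threshold of masking}, \\ 0 & \text{otherwise}, \end{cases} \qquad H(m, k) = \begin{cases} 1 & \text{if } |O(m, k)| \text{ is above and } |T(m, k)| \text{ is below the threshold of masking}, \\ 0 & \text{otherwise}, \end{cases} \tag{14} \]

with the threshold of masking as defined previously. Matrices are defined such that the full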

error is calculated for spectral components which are audible

in a recorded tone t(n) (that is above the threshold of masking). The matrix G(m, k) is used to account for these components. For the components which are inaudible in a recorded

tone but audible in the sound output of the model o(n), the

error between the sound output and the threshold of masking is calculated. The matrix H(m, k) is used to weight these

components.

Perceptual error E p is a sum of these two cases. No error

is calculated for the components which are below the threshold of masking in both sounds. Finally, the perceptual error

function is evaluated as

\[ E_p = \frac{1}{F_p} = \frac{1}{L} \sum_{k=0}^{N-1} W_s(k) \sum_{m=0}^{L-1} \Big[ \big( |O(m, k)| - |T(m, k)| \big)^2 G(m, k) + \big( |O(m, k)| - |T(m, k)| \big)^2 H(m, k) \Big], \tag{15} \]

where Ws (k) is an inverted equal loudness curve at sound

pressure level of 60 dB shown in Figure 4 that is used to

weight the error and imitate the frequency-dependent sensitivity of human hearing.
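A compact sketch of the perceptual error follows. It mirrors (15) as printed, with the same squared magnitude difference in both terms, and it assumes that the threshold of masking is available as a magnitude array `M` of the same shape as the STFT magnitudes; that representation, the binary form of the two weighting matrices, and the function name `perceptual_error` are assumptions of this illustration.

```python
import numpy as np

def perceptual_error(O, T, M, Ws):
    """Perceptually weighted spectral error.

    O, T : STFT magnitudes of model output and target (frames x bins)
    M    : threshold of masking as magnitudes, same shape (assumed form)
    Ws   : frequency weighting per bin, shape (bins,)
    """
    O = np.asarray(O, dtype=float)
    T = np.asarray(T, dtype=float)
    M = np.asarray(M, dtype=float)
    G = T >= M                 # audible in the target tone
    H = (T < M) & (O >= M)     # inaudible in the target, audible in the output
    sq = (O - T) ** 2
    per_bin = np.sum(sq * G + sq * H, axis=0)
    return float(np.sum(np.asarray(Ws, dtype=float) * per_bin) / O.shape[0])
```

Components that lie below the threshold in both sounds fall in neither mask and therefore contribute nothing, which is the behavior described in the text.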

5.

Figure 4: Weighting function Ws(k), the inverse of the equal loudness curve at the SPL of 60 dB. Amplitude (dB) is plotted against frequency (Hz).

reduced by discretizing the individual parameters in a perceptually reasonable manner. The range of parameters can be

reduced to cover only all the possible musical tones and deviation steps can be kept just below the discrimination threshold.

5.1. Decay parameters

model in Figure 2 have been studied in [32]. Time constant τ

of the overall decay was used to describe the loop gain

parameter g while the frequency-dependent decay was controlled directly by parameter a. Values of τ and a were varied

and relatively large deviations in parameters were claimed to

be inaudible. Järveläinen and Tolonen [32] proposed that a

variation of the time constant between 75% and 140% of the

reference value can be allowed in most cases. An inaudible

variation for the parameter a was between 83% and 116% of

the reference value.

The discrimination thresholds were determined with two

different tone durations, 0.6 and 2.0 seconds. In our

study, the judgement of similarity between two tones is done

by comparing the entire signals and, therefore, the results

from [32] cannot be directly used for the parametrization

of a and g. The tolerances are slightly smaller because the

judgement is made based on not only the decay but also the

duration of a tone. Based on our informal listening test and

including a margin of certainty, we have defined the variation

to be 10% for the time constant τ and 7% for the parameter a. The parameters are bounded so that all the playable musical sounds from

tightly damped picks to very slowly decaying notes are possible to produce with the model. This results in 62 discrete

nonuniformly distributed values for g and 75 values for a, as

shown in Figures 5a and 5b. The corresponding amplitude

envelopes of tones with different values of g are shown in

Figure 5c. Loop filter magnitude responses for varying parameter a with g = 1 are shown in Figure 5d.
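One way to obtain a nonuniform grid with a fixed relative step is a geometric progression, sketched below. This is only an assumption about how such a grid could be built: the text states the step sizes (10% and 7%) and the resulting number of values, but not the construction or the bounds, so `geometric_grid`, `nearest_on_grid`, and the bounds used in the usage note are hypothetical.

```python
import math

def geometric_grid(lo, hi, step):
    """Values lo, lo*(1+step), lo*(1+step)**2, ... that do not exceed hi."""
    if not (0 < lo < hi) or step <= 0:
        raise ValueError("need 0 < lo < hi and step > 0")
    n = int(math.floor(math.log(hi / lo) / math.log(1.0 + step))) + 1
    return [lo * (1.0 + step) ** i for i in range(n)]

def nearest_on_grid(value, grid):
    """Round a continuous parameter value to the closest grid point."""
    return min(grid, key=lambda v: abs(v - value))
```

For example, `geometric_grid(0.01, 0.6, 0.07)` gives a grid whose neighboring values always differ by 7%, so each step stays at the chosen tolerance regardless of the absolute parameter value.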

5.2.

The fundamental frequency estimate f0 from the calibrator

is used as an initial value for both polarizations. When the

Figure 5: (a) Discrete values for the parameter g when f0 = 331 Hz and the variation for the time constant is 10%. (b) Discrete values for the parameter a when the variation is 7%. (c) Amplitude envelopes (dB versus time in seconds) of tones with different discrete values of g. (d) Loop filter magnitude responses (dB versus frequency in Hz) for different discrete values of a when g = 1.

fundamental frequencies of two polarizations differ, the frequency estimate settles in the middle of the frequencies, as

shown in Figure 6. Frequency discrimination thresholds as

a function of frequency have been proposed in [33]. Also

the audibility of beating and amplitude modulation has been

studied in [27]. These results do not give us directly the discrimination thresholds for the difference in the fundamental

frequencies of the two-polarization string model, because the

fluctuation strength in an output sound depends on the fundamental frequencies and the decay parameters g and a.

The sensitivity of parameters can be examined when a

synthesized tone with known parameter values is used as a

target tone with which another synthesized tone is compared.

Varying one parameter after another and freezing the others, we obtain the error as a function of the parameters. In

Figure 7, the target values of f0,v and f0,h are 331 and 330 Hz.

The solid line shows the error when f0,v is linearly swept from

327 to 334 Hz. The global minimum is obviously found when f0,v = 331 Hz. Another local minimum

is found when f0,v = 329 Hz, that is, when the beating is similar. The dashed line shows the error when both f0,v and f0,h

are varied but the difference in the fundamental frequencies

is kept constant. It can be seen that the difference is more

dominant than the absolute frequency value and therefore has to be

discretized with higher resolution. Instead of operating on the fundamental frequency parameters directly, we

optimize the difference df = |f0,v − f0,h| and the mean frequency f0 = |f0,v + f0,h|/2 individually. Combining previous

results from [27, 33] with our informal listening test, we have

discretized df with 100 discrete values and f0 with 20. The

range of variation is set as follows:

\[ r_p = \left(\frac{f_0}{10}\right)^{1/3}, \tag{16} \]

which is shown in Figure 8.
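The change of variables from the two polarization frequencies to their mean and difference, together with the range of variation of (16), can be written out as a short sketch. Because the difference is an absolute value, the inverse mapping has to pick which polarization is the higher one; assigning it to the vertical polarization is an arbitrary assumption here, as are the function names.

```python
def to_mean_and_difference(f0_v, f0_h):
    """Return (mean frequency f0, difference df) of the two polarizations."""
    return (f0_v + f0_h) / 2.0, abs(f0_v - f0_h)

def to_polarization_frequencies(f0_mean, df):
    """Inverse mapping, assuming the vertical polarization is the higher one."""
    return f0_mean + df / 2.0, f0_mean - df / 2.0

def variation_range(f0):
    """Range of variation r_p in Hz for a frequency estimate f0 in Hz."""
    return (f0 / 10.0) ** (1.0 / 3.0)
```

With the target values used in the text, 331 and 330 Hz map to a mean of 330.5 Hz and a difference of 1 Hz; the range of variation is 2 Hz at an estimate of 80 Hz and about 4.6 Hz at 1000 Hz.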

Figure 6: Normalized magnitude versus time (s) for two single-polarization guitar tones with fundamental frequencies of 80 and 84 Hz. The dash-dotted line corresponds to a dual-polarization guitar tone with fundamental frequencies of 80 and 84 Hz. The maxima are marked.

Figure 7: Error versus fundamental frequency f0 (Hz). The target values of f0,v and f0,h are 331 and 330 Hz. The solid line shows the error when f0,h = 330 Hz and f0,v is linearly swept from 327 to 334 Hz. The dashed line shows the error when both frequencies are varied simultaneously while the difference remains the same.

The tolerances for the mixing coefficients mp, mo, and gc have

not been studied and the parameters have been earlier adjusted by trial and error [5]. Therefore, no initial guesses are

made for these parameters. The sensitivities of the mixing coefficients are examined in an example case in Figure 9, where

mp = 0.5, mo = 0.5, and gc = 0.1. It can be seen that the

parameters mp and mo are most sensitive near the boundaries and the parameter gc is most sensitive near zero. Ranges

for mp and mo are discretized with 40 values according to

Figure 8: The range of variation (Hz) as a function of frequency estimate f0 from 80 to 1000 Hz.

range of which is limited to 0–0.5.

Discretizing the nine parameters this way results in 2.77 × 10^15 combinations in total for a single tone. For an acoustic guitar, about 120 tones with different dynamic levels and

playing styles have to be analyzed. It is obvious that an exhaustive search is out of the question.
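The size of the search space can be checked with a few lines of arithmetic. The grid sizes for f0, df, g, a, mp, and mo are those given in Section 5; the value of 40 for gc is an assumption made here because it reproduces the stated total. The cost of 1.7 seconds per candidate is the figure reported later for one synthesis plus one error calculation.

```python
from math import prod

# Number of discrete values per parameter (gc = 40 is an assumption).
grid_sizes = {
    "f0": 20, "df": 100,
    "gh": 62, "ah": 75,
    "gv": 62, "av": 75,
    "mp": 40, "mo": 40, "gc": 40,
}

combinations = prod(grid_sizes.values())
seconds = combinations * 1.7             # 1.7 s per parameter set
years = seconds / (365.25 * 24 * 3600)   # exhaustive search for one tone
```

Under these assumptions the product is about 2.77 × 10^15 parameter sets, and evaluating all of them for a single tone would take on the order of 10^8 years, which is why a guided search is needed.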

6. GENETIC ALGORITHM

the principle of survival of the fittest [34]. These algorithms operate on a population of potential solutions improving characteristics of the individuals from generation to generation. Each individual, called a chromosome, is made up of an array of genes that contain, in our case, the actual parameters to be estimated.

In the original algorithm design, the chromosomes were represented with binary numbers [35]. Michalewicz [36] showed that representing the chromosomes with floating-point numbers results in faster, more consistent, higher precision, and more intuitive solution of the algorithm. We use a GA with the floating-point representation, although the parameter space is discrete, as discussed in Section 5. We have also experimented with the binary-number representation, but the execution time of the iteration becomes slow. The nonuniformly graduated parameter space is transformed into uniform scales on which the GA operates. The floating-point numbers are rounded to the nearest discrete parameter value.

are discussed in [36], where the characteristics of the operators are also described. A few modifications to the original mutation operators in step 6 have been made to improve the operation of the algorithm with the discrete grid.

Figure 9: Error versus gain for the mixing coefficients mp and mo and the coupling coefficient gc. Target values are mp = mo = 0.5 and gc = 0.1.

The algorithm we use is implemented as follows.

the analysis methods discussed in Section 3. The range of the parameter f0 is chosen and the excitation signal is produced according to these results. Calculate the threshold of masking (Section 4) and the discrete scales for the parameters (Section 5).

(2) Initialization: create a population of Sp individuals (chromosomes). Each chromosome is represented as a vector array x with nine components (genes), which contains the actual parameters. The initial parameter values are randomly assigned.

(3) Fitness calculation: calculate the perceptual fitness of each individual in the current population according to (15).

(4) Selection of individuals: select individuals from the current population to produce the next generation based upon the individuals' fitness. We use the normalized geometric selection scheme [37], where the individuals are first ranked according to their fitness values. The probability of selecting the ith individual to the next generation is then calculated by

\[ P_i = q'(1 - q)^{r - 1}, \tag{17} \]

where

\[ q' = \frac{q}{1 - (1 - q)^{S_p}}, \tag{18} \]

q is the probability of selecting the best individual, and r is the rank of the individual, where 1 is the best and Sp is the worst. Decreasing the value of q slows the convergence.

(5) Crossover: randomly pick a specified number of parents from selected individuals. An offspring is produced by crossing the parents with a simple, arithmetical, and heuristic crossover scheme. Simple crossover creates two new individuals by splitting the parents in a random point and swapping the parts. Arithmetical crossover produces two linear combinations of the parents with a random weighting. Heuristic crossover produces a single offspring xo which is a linear extrapolation of the two parents xp,1 and xp,2 as follows:

\[ x_o = h\,(x_{p,2} - x_{p,1}) + x_{p,2}, \tag{19} \]

where h is a random number between 0 and 1 and xp,2 is not worse than xp,1. Nonfeasible solutions are possible and if no solution is found after w attempts, the operator gives no offspring. Heuristic crossover contributes to the precision of the final solution.


(6) Mutation: randomly pick a specified number of individuals for mutation. Uniform, nonuniform, multi-nonuniform, and boundary mutation schemes are

used. Mutation works with a single individual at a

time. Uniform mutation sets a randomly selected parameter (gene) to a uniform random number between

the boundaries. Nonuniform mutation operates uniformly at an early stage and more locally as the current

generation approaches the maximum generation. We

have defined the scheme to operate in such a way that

the change is always at least one discrete step. The degree of nonuniformity is controlled with the parameter b. Nonuniformity is important for fine-tuning.

Multi-nonuniform mutation changes all of the parameters in the current individual. Boundary mutation sets a parameter to one of its boundaries and is

useful if the optimal solution is supposed to lie near

the boundaries of the parameter space. The boundary mutation is used in special cases, such as staccato

tones.

(7) Replace the current population with the new one.

(8) Repeat steps 3, 4, 5, 6, and 7 until termination.
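Two of the operators in the steps above, normalized geometric selection and heuristic crossover, can be sketched as follows. The sketch works on plain lists of floats and leaves out the rounding to the discrete grid and the other crossover and mutation schemes; the function names, the per-gene bounds `lower` and `upper`, and the use of Python's `random` module are assumptions of this illustration, not the authors' implementation.

```python
import random

def selection_probabilities(pop_size, q):
    """P(r) = q' (1 - q)**(r - 1) for ranks r = 1 (best) .. pop_size."""
    q_norm = q / (1.0 - (1.0 - q) ** pop_size)
    return [q_norm * (1.0 - q) ** (r - 1) for r in range(1, pop_size + 1)]

def select(ranked_population, q, n, rng=random):
    """Draw n individuals from a population sorted from best to worst."""
    probs = selection_probabilities(len(ranked_population), q)
    return rng.choices(ranked_population, weights=probs, k=n)

def heuristic_crossover(better, worse, lower, upper, retries=3, rng=random):
    """Extrapolate from the worse parent past the better one.

    Returns None if no feasible offspring is found within `retries` attempts.
    """
    for _ in range(retries):
        h = rng.random()
        child = [h * (b - w) + b for b, w in zip(better, worse)]
        if all(lo <= c <= hi for c, lo, hi in zip(child, lower, upper)):
            return child
    return None
```

The normalization by q' makes the probabilities sum to one over the whole population. With the population size of 60 and q = 0.08 used in the experiments, the best individual is selected with a probability of about 0.08 per draw.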

Our algorithm is terminated when a specified number of

generations is produced. The number of generations defines

the maximum duration of the algorithm. In our case, the

time spent with the GA operations is negligible compared to

the synthesis and fitness calculation. Synthesis of a tone with

candidate parameter values takes approximately 0.5 seconds,

while the duration of the error calculation is 1.2 seconds. This

makes 1.7 seconds in total for a single parameter set.

7.

to estimate the parameters for the sound produced by the

synthesis model itself. First, the same excitation signal extracted from a recorded tone by the method described in

[24] was used for target and output sounds. A more realistic case is simulated when the excitation for resynthesis is extracted from the target sound. The system was implemented

with Matlab software and all runs were performed on an Intel Pentium III computer. We used the following parameters

for all experiments: population size S p = 60, number of generations = 400, probability of selecting the best individual

q = 0.08, degree of nonuniformity b = 3, retries w = 3,

number of crossovers = 18, and number of mutations = 18.

The pitch synchronous Fourier transform scheme, where

the window length Lw is synchronized with the period length

of the signal such that Lw = 4 fs / f0 , is utilized in this work.

The overlap of the Hanning windows used is 50%, implying

that hop size H = Lw /2. The sampling rate is fs = 44100 Hz

and the length of FFT is N = 2048.
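The pitch-synchronous frame sizes follow directly from the fundamental frequency estimate, as the short sketch below shows. Rounding the window length to the nearest integer number of samples is an assumption, since the text gives Lw = 4 fs / f0 without stating how fractional lengths are handled; the function name is hypothetical.

```python
def pitch_synchronous_frames(fs, f0):
    """Return (window length Lw, hop size H) in samples for 50% overlap."""
    window_length = int(round(4.0 * fs / f0))  # four periods of the tone
    hop = window_length // 2
    return window_length, hop
```

For the 331 Hz guitar tone used in the examples and fs = 44100 Hz, this gives a window of 533 samples and a hop of 266 samples, so each frame is zero-padded to the FFT length of 2048.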

The original and the estimated parameters for three experiments are shown in Table 2. In experiment 1 the original excitation is used for the resynthesis. The exact parameters are estimated for the difference df and for the decay parameters gh, gv, and av. The adjacent point in the discrete grid is estimated for the decay parameter ah. As can be seen in Figure 7, the sensitivity of the mean frequency is negligible compared to the difference df, which might be the cause of deviations in mean frequency. Differences in the mixing parameters mo, mp, and the coupling coefficient gc can be noticed. When running the algorithm multiple times, no explicit optima for mixing and coupling parameters were found. However, synthesized tones produced by corresponding parameter values are indistinguishable. That is to say that the parameters mp, mo, and gc are not orthogonal, which is clearly a problem with the model and also impairs the efficiency of our parameter estimation algorithm.

To overcome the nonorthogonality problem, we have run

the algorithm with constant values of m p = mo = 0.5 in experiment 2. If the target parameters are set according to discrete grid, the exact parameters with zero error are estimated.

The convergence of the parameters and the error in such a case

are shown in Figure 11. Apart from the fact that the parameter

values are estimated precisely, the convergence of the algorithm is very fast. Zero error is already found in generation

87.

A similar behavior is noticed in experiment 3 where an

extracted excitation is used for resynthesis. The difference

and the decay parameters gh and gv are again estimated precisely. Parameters mp, mo, and gc drift as in experiment 1. Interestingly, mp = 1, which means that the straight

path to vertical polarization is totally closed. The model is, in

a manner of speaking, rearranged in such a way that the individual string models are in series as opposed to the original

construction where the polarizations are arranged in parallel.

Unlike in experiments 1 and 2, the exact parameter values are not so relevant since different excitation signals are

used for the target and estimated tones. Rather than looking into the parameter values, it is better to analyze the tones

produced with the parameters. In Figure 12, the overall temporal envelopes and the envelopes of the first eight partials

for the target and for the estimated tone are presented. As

can be seen, the overall temporal envelopes are almost identical and the partial envelopes match well. Only the beating

amplitude differs slightly but it is inaudible. This indicates

that the parametrization of the model itself is not the best

possible since similar tones can be synthesized with various

parameter sets.

Our estimation method is designed to be used with real

recorded tones. Time and frequency analysis for such a case

is shown in Figure 13. As can be seen, the overall temporal envelopes and the partial envelopes for a recorded tone

are very similar to those that are analyzed from a tone that

uses estimated parameter values. Appraisal of the perceptual

quality of synthesized tones is left as a future project, but

our informal listening indicates that the quality is comparable with or better than our previous methods and it does

not require any hand tuning after the estimation procedure.

Sound clips demonstrating these experiments are available at

http://www.acoustics.hut.fi/publications/papers/jasp-ga.

Figure 11: Convergence of the seven parameters and the error for experiment 2 in Table 2. Mixing coefficients are frozen as mp = mo = 0.5 to overcome the nonorthogonality problem. One hundred and fifty generations are shown and the original excitation is used for the resynthesis. The panels plot, against generation, f0 and df, the values of gh and gv, the values of ah and av, the gain gc, each with its target value, and the error.

Table 2: Original and estimated parameters when a synthesized tone with known parameter values is used as a target tone. The original excitation is used for resynthesis in experiments 1 and 2 and the extracted excitation is used for the resynthesis in experiment 3. In experiment 2 the mixing coefficients are frozen as mp = mo = 0.5.

Parameter   Target parameter   Experiment 1   Experiment 2   Experiment 3
f0          330.5409           331.000850     330.5409       330.00085
df          0.8987             0.8987         0.8987         0.8987
gh          0.9873             0.9873         0.9873         0.9873
ah          0.2905             0.3108         0.2905         0.2071
gv          0.9907             0.9907         0.9907         0.9907
av          0.1936             0.1936         0.1936         0.1290
mp          0.5                0.2603         (0.5)          1.000
mo          0.5                0.6971         (0.5)          0.8715
gc          0.1013             0.2628         0.1013         0.2450
Error       -                  0.0464         0              0.4131

Figure 12: Time and frequency analysis for experiment 3 in Table 2. The synthesized target tone is produced with known parameter values and the synthesized tone uses estimated parameter values. Extracted excitation is used for the resynthesis. For each tone, the panels show the normalized amplitude versus time (s) and the amplitude (dB) of partials 1 to 8 versus time (s).

Figure 13: Time and frequency analysis for a recorded tone and for a synthesized tone that uses estimated parameter values. Extracted excitation is used for the resynthesis. Estimated parameter values are f0 = 331.1044, df = 1.1558, gh = 0.9762, ah = 0.4991, gv = 0.9925, av = 0.0751, mp = 0.1865, mo = 0.7397, and gc = 0.1250. For each tone, the panels show the normalized amplitude versus time (s) and the amplitude (dB) of partials 1 to 8 versus time (s).

A parameter estimation scheme based on a GA with a perceptual fitness function was designed and tested for a plucked

string synthesis algorithm. The synthesis algorithm is used

for natural-sounding synthesis of various string instruments.

For this purpose, automatic parameter estimation is needed.

Previously, the parameter values have been extracted from

recordings using more traditional signal processing techniques, such as short-time Fourier transform, linear regression, and linear digital filter design. Some of the parameters

could not be reliably estimated from the recorded

sound signal, and they had to be fine-tuned manually

by an expert user.

In this work, we presented a fully automatic parameter

extraction method for string synthesis. The fitness function

we use employs knowledge of properties of the human auditory system, such as frequency-dependent sensitivity and

frequency masking. In addition, a discrete parameter space

the nonuniformity of the sampling grid, and the number of

allowed values for each parameter were chosen based on former research results, experiments on parameter sensitivity,

and informal listening.

The system was tested with both synthetic and real tones.

The signals produced with the synthesis model itself are considered a particularly useful class of test signals because there

will always be a parameter set that exactly reproduces the analyzed signal (although discretization of the parameter space

may limit the accuracy in practice). Synthetic signals offered

an excellent tool to evaluate the parameter estimation procedure, which was found to be accurate with two choices of

excitation signal to the synthesis model. The quality of resynthesis of real recordings is more difficult to measure as there

are no known correct parameter values. As high-quality synthesis of several plucked string instrument sounds has been

possible in the past with the same synthesis algorithm, we


expected to hear good results using the GA-based method,

which was also the case.

Appraisal of synthetic tones that use parameter values

from the proposed GA-based method is left as a future

project. Listening tests similar to those used for evaluating

high-quality audio coding algorithms may be useful for this

task.

REFERENCES

[1] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.

[2] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference (ICMC 93), pp. 64–71, Tokyo, Japan, September 1993.

[3] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference (ICMC 93), pp. 56–63, Tokyo, Japan, September 1993.

[4] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.

[5] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare, "Methods for modeling realistic playing in acoustic guitar synthesis," Computer Music Journal, vol. 25, no. 3, pp. 38–49, 2001.

[6] G. Weinreich, "Coupled piano strings," Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.

[7] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models: from the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.

[8] T. Tolonen and V. Välimäki, "Automated parameter extraction for plucked string synthesis," in Proc. International Symposium on Musical Acoustics (ISMA 97), pp. 245–250, Edinburgh, Scotland, August 1997.

[9] C. Erkut, V. Välimäki, M. Karjalainen, and M. Laurson, "Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar," in the Audio Engineering Society 108th International Convention, Paris, France, February 2000, preprint 5114, http://lib.hut.fi/Diss/2002/isbn9512261901.

[10] A. Nackaerts, B. De Moor, and R. Lauwereins, "Parameter estimation for dual-polarization plucked string models," in Proc. International Computer Music Conference (ICMC 01), pp. 203–206, Havana, Cuba, September 2001.

[11] S.-F. Liang and A. W. Y. Su, "Recurrent neural-network-based physical model for the chin and other plucked-string instruments," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1045–1059, 2000.

[12] C. Drioli and D. Rocchesso, "Learning pseudo-physical models for sound synthesis and transformation," in Proc. IEEE International Conference on Systems, Man, and Cybernetics, pp. 1085–1090, San Diego, Calif, USA, October 1998.

[13] V.-V. Mattila and N. Zacharov, "Generalized listener selection (GLS) procedure," in the Audio Engineering Society 110th International Convention, Amsterdam, The Netherlands, 2001, preprint 5405.

[14] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.

[15] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. Laine, and J. Huopaniemi, "Frequency-warped signal processing for audio applications," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011–1031, 2000.

[16] J. Vuori and V. Välimäki, "Parameter estimation of non-linear physical models by simulated evolution - application to the flute model," in Proc. International Computer Music Conference (ICMC 93), pp. 402–404, Tokyo, Japan, September 1993.

[17] A. Horner, J. Beauchamp, and L. Haken, "Machine tongues XVI: Genetic algorithms and their application to FM matching synthesis," Computer Music Journal, vol. 17, no. 4, pp. 17–29, 1993.

[18] R. Garcia, "Automatic generation of sound synthesis techniques," M.S. thesis, Massachusetts Institute of Technology, Cambridge, Mass, USA, 2001.

[19] C. Johnson, "Exploring the sound-space of synthesis algorithms using interactive genetic algorithms," in Proc. AISB Workshop on Artificial Intelligence and Musical Creativity, pp. 20–27, Edinburgh, Scotland, April 1999.

[20] D. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.

[21] C. Erkut, M. Laurson, M. Kuuskankare, and V. Välimäki, "Model-based synthesis of the ud and the renaissance lute," in Proc. International Computer Music Conference (ICMC 01), pp. 119–122, Havana, Cuba, September 2001.

[22] C. Erkut and V. Välimäki, "Model-based sound synthesis of tanbur, a Turkish long-necked lute," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 769–772, Istanbul, Turkey, June 2000.

[23] C. Roads, The Computer Music Tutorial, MIT Press, Cambridge, Mass, USA, 1996.

[24] V. Välimäki and T. Tolonen, "Development and calibration of a guitar synthesizer," Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766–778, 1998.

[25] C. Erkut, M. Karjalainen, P. Huang, and V. Välimäki, "Acoustical analysis and model-based sound synthesis of the kantele," Journal of the Acoustical Society of America, vol. 112, no. 4, pp. 1681–1691, 2002.

[26] B. Bank, "Physics-based sound synthesis of the piano," Tech. Rep. 54, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, May 2000, http://www.acoustics.hut.fi/publications/2000.html.

[27] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Springer-Verlag, Berlin, Germany, 1990.

[28] M. Lagrange and S. Marchand, "Real-time additive synthesis of sound by taking advantage of psychoacoustics," in Proc. COST-G6 Conference on Digital Audio Effects (DAFx 01), pp. 5–9, Limerick, Ireland, December 2001.

[29] C. W. Wun and A. Horner, "Perceptual wavetable matching for synthesis of musical instrument tones," Journal of the Audio Engineering Society, vol. 49, no. 4, pp. 250–262, 2001.

[30] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314–323, 1988.

[31] M. R. Schroeder, B. S. Atal, and J. L. Hall, "Optimizing digital speech coders by exploiting masking properties of the human ear," Journal of the Acoustical Society of America, vol. 66, no. 6, pp. 1647–1652, 1979.

[32] H. Järveläinen and T. Tolonen, "Perceptual tolerances for decay parameters in plucked string synthesis," Journal of the Audio Engineering Society, vol. 49, no. 11, pp. 1049–1059, 2001.

[33] C. C. Wier, W. Jesteadt, and D. M. Green, "Frequency discrimination as a function of frequency and sensation level," Journal of the Acoustical Society of America, vol. 61, no. 1, pp. 178–184, 1977.

[34] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, Mass, USA, 1998.

[35] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.

[36] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, AI Series. Springer-Verlag, New York, NY, USA, 1992.

[37] J. Joines and C. Houck, "On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GAs," in IEEE International Symposium on Evolutionary Computation, pp. 579–584, Orlando, Fla, USA, June 1994.

Janne Riionheimo was born in Toronto,

Canada, in 1974. He studies acoustics and

digital signal processing at Helsinki University of Technology, Espoo, Finland, and music technology, as a secondary subject, at the

Centre for Music and Technology, Sibelius

Academy, Helsinki, Finland. He is currently

finishing his M.S. thesis, which deals with

parameter estimation of a physical synthesis

model. He has worked as a Research Assistant at the HUT Laboratory of Acoustics and Audio Signal Processing from 2001 until 2002. His research interests include physical

modeling of musical instruments and musical acoustics. He is also

working as a Recording Engineer.

Vesa Välimäki was born in Kuorevesi, Finland, in 1968. He received his Master of Science in Technology, Licentiate of Science in Technology, and Doctor of Science in Technology degrees, all in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. Dr. Välimäki worked at the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 until 2001. In 1996, he was a Postdoctoral Research Fellow in the University of Westminster, London, UK. He was appointed Docent in audio signal processing at HUT in 1999. During the academic year 2001–2002 he was Professor of Signal Processing at Pori School of Technology and Economics, Tampere University of Technology, Pori, Finland. In August 2002, he returned to HUT, where he is currently Professor of Audio Signal Processing. His research interests are in the application of digital signal processing to audio and music. He has published more than 120 papers in international journals and conferences. He holds two patents. Dr. Välimäki is a senior member of the IEEE Signal Processing Society and a member of the Audio Engineering Society and the International Computer Music Association.


© 2003 Hindawi Publishing Corporation

Decompositions with Evolutionary Computation

Thomas Schell

Department of Scientific Computing, University of Salzburg, Jakob Haringer Street 2, A-5020 Salzburg, Austria

Email: tschell@cosy.sbg.ac.at

Andreas Uhl

Department of Scientific Computing, University of Salzburg, Jakob Haringer Street 2, A-5020 Salzburg, Austria

Email: uhl@cosy.sbg.ac.at

Received 30 June 2002 and in revised form 27 November 2002

In image compression, the wavelet transformation is a state-of-the-art component. Recently, wavelet packet decomposition has attracted considerable interest. A popular approach for wavelet packet decomposition is the near-best-basis algorithm using nonadditive

cost functions. In contrast to additive cost functions, the wavelet packet decomposition of the near-best-basis algorithm is only

suboptimal. We apply methods from the field of evolutionary computation (EC) to test the quality of the near-best-basis results.

We observe a phenomenon: the results of the near-best-basis algorithm are inferior in terms of cost-function optimization but are

superior in terms of rate/distortion performance compared to EC methods.

Keywords and phrases: image compression, wavelet packets, best basis algorithm, genetic algorithms, random search.

1. INTRODUCTION

the JPEG standard [1]) have been superseded by wavelet-based schemes in recent years. Consequently,

the new JPEG2000 standard [2] is based on the wavelet

transformation. Apart from the pyramidal decomposition,

JPEG2000 part II also allows wavelet packet (WP) decomposition which is of particular interest to our studies.

WP-based image compression methods which have been

developed [3, 4, 5, 6] outperform the most advanced wavelet

coders (e.g., SPIHT [7]) significantly for textured images in

terms of rate/distortion performance (r/d).

In the context of image compression, a more advanced

but also more costly technique is to use a framework that

includes both rate and distortion, where the best-basis (BB)

subtree which minimizes the global distortion for a given

coding budget is searched [8, 9]. Other methods use fixed

bases of subbands for similar signals (e.g., fingerprints [10])

or search for good representations with general purpose optimization methods [11, 12].

Usually in wavelet-based image compression, only the

coarse-scale approximation subband is successively decomposed. With the WP decomposition, the detail subbands also

lend themselves to further decomposition. From a practical

subbands: approximation, horizontal detail, vertical detail,

and diagonal detail. Each of these four subbands can be recursively decomposed at will. Consequently, the decomposition can be represented by a quadtree.

Concerning WPs, a key issue is the choice of the decomposition quadtree. Obviously, not every subband must be decomposed further; therefore, a criterion which determines

whether a decomposition step should take place or not is

needed.

Coifman and Wickerhauser [13] introduced additive cost

functions and the BB algorithm which provides an optimal decomposition according to a specific cost metric.

Taswell [14] introduced nonadditive cost functions which are

thought to anticipate the properties of good decomposition quadtrees more accurately. With nonadditive cost functions, the BB algorithm mutates to a near-best-basis (NBB)

algorithm because the decomposition trees are only suboptimal. The divide-and-conquer principle of the BB relies on

the locality (additivity) of the underlying cost function. In

the case of nonadditive cost functions, this locality does not

exist.

In this work, we are interested in the assessment of

the WP decompositions provided by the NBB algorithm.

We focus on the quality of the NBB results in terms of

cost-function optimization as well as image quality (PSNR).

Both the cost-function value and the corresponding image quality of a WP decomposition are suboptimal due to the construction of the NBB algorithm.

We have interfaced the optimization process of WP decompositions by means of cost functions with the concepts

of evolutionary computation (EC). Hereby, we obtain an alternative method to optimize WP decompositions by means

of cost functions. Both approaches, NBB and EC, are subject

to our experiments. The results provide valuable new insights

concerning the intrinsic processes of the NBB algorithm. Our

EC approach perfectly suits the needs for the assessment of

the NBB algorithm but, from a practical point of view, the

EC approach is not competitive in terms of computational

complexity.

In Section 2, we review the definition of the cost functions which we analyze in our experiments. The NBB algorithm is described in Section 3. For the EC methods, we

need a flat representation of quadtrees (Section 4). In Sections 5 and 6, we review genetic algorithms and random

search specifically adapted to WP optimization. For our experiments, we apply a SPIHT-inspired software package for image compression by means of WP decomposition. Our central tool of analysis is the scatter plot of WP decompositions (Section 7). In Section 8, we compare the NBB algorithm and EC for optimizing WP decompositions.

2. COST FUNCTIONS

As a preliminary, we review the definitions of a cost function and the additivity. A cost function is a function $C : \mathbb{R}^M \times \mathbb{R}^N \to \mathbb{R}$. If $y \in \mathbb{R}^M \times \mathbb{R}^N$ is a matrix of wavelet coefficients and $C$ is a cost function, then $C(0) = 0$ and $C(y) = \sum_{i,j} C(y_{ij})$. A cost function $C^a$ is additive if and only if

$$C^a(z_1 \oplus z_2) = C^a(z_1) + C^a(z_2). \qquad (1)$$

The goal of any optimization algorithm is to identify a WP decomposition with a minimal cost-function value.

Alternatively to the NBB algorithm (Section 3), we apply methods from evolutionary computation (Sections 5 and 6) to optimize WP decompositions. The fitness of a particular WP decomposition is estimated with nonadditive cost functions. We employ the three nonadditive cost functions listed below.

(i) Coifman-Wickerhauser entropy. Coifman and Wickerhauser [15] defined the entropy for wavelet coefficients as follows:

$$C^n_1(y) = -\sum_{i,j:\, p_{ij} \neq 0} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{|y_{ij}|^2}{\|y\|^2}. \qquad (2)$$

(ii) Weak $l^p$ norm. [...] reorder and transform the coefficients $y_{ij}$. All coefficients $y_{ij}$ are rearranged in a decreasing absolute-value sorted vector $z$, [...] of vector $z$ is $MN$. The cost-function value is calculated as follows:

$$C^n_{4,p}(z) = \max_k k^{1/p} |z_k|. \qquad (3)$$

From the definition of the weak $l^p$ norm, we deduce that unfavorable slowly decreasing sequences or, in the worst case, uniform sequences of vectors $z$ cause high numerical values of the norm, whereas fast decreasing $z$s result in low ones.

(iii) Shannon entropy. Below, we will consider the matrix $y$ simply as a collection of real-valued coefficients $x_i$, $1 \le i \le MN$. The matrix $y$ is rearranged such that the first row is concatenated with the second row at the right side and then the new row is concatenated with the third row and so on. With a simple histogram binning method, we will estimate the probability mass function. The sample data interval is given by $a = \min_i x_i$ and $b = \max_i x_i$. Given the number of bins $J$, the bin width $w$ is $w = (b - a)/J$. The frequency $f_j$ for the $j$th bin is defined by $f_j = \#\{x_i \mid x_i \le a + jw\} - \sum_{k=1}^{j-1} f_k$. The probabilities $p_j$ are calculated from the frequencies $f_j$ simply by $p_j = f_j/MN$. From the obtained class probabilities, we can calculate the Shannon entropy [14]

$$C^n_{2,J}(y) = -\sum_{j=1}^{J} p_j \log_2 p_j. \qquad (4)$$
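The three cost functions can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation: the function names are hypothetical, the coefficient matrix is assumed to be a list of rows, and the weak $l^p$ norm uses the textbook form $\max_k k^{1/p} |z_k|$, which is an assumption because equation (3) is damaged in this copy.

```python
import math

def cw_entropy(y):
    """Coifman-Wickerhauser entropy, eq. (2): -sum p ln p with p = |y_ij|^2 / ||y||^2."""
    energy = [v * v for row in y for v in row]
    total = sum(energy)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in energy if e > 0)

def weak_lp_norm(y, p=1.0):
    """Assumed textbook weak l^p norm: max over k of k^(1/p) * |z_k|, z sorted by decreasing magnitude."""
    z = sorted((abs(v) for row in y for v in row), reverse=True)
    return max((k ** (1.0 / p)) * zk for k, zk in enumerate(z, start=1))

def shannon_entropy(y, bins):
    """Histogram-based Shannon entropy, eq. (4), with f_j taken from cumulative counts."""
    x = [v for row in y for v in row]          # row-wise concatenation of the matrix
    a, b = min(x), max(x)
    w = (b - a) / bins
    n = len(x)
    freqs, seen = [], 0
    for j in range(1, bins + 1):
        upper = b if j == bins else a + j * w  # use b itself for the last bin to avoid rounding loss
        cumulative = sum(1 for v in x if v <= upper)
        freqs.append(cumulative - seen)        # f_j = #{x_i <= a + j w} - sum of earlier f_k
        seen = cumulative
    return -sum((f / n) * math.log2(f / n) for f in freqs if f > 0)
```

A uniform matrix such as `[[1, 1], [1, 1]]` gives the largest weak $l^1$ value for its peak magnitude, while a rapidly decaying one such as `[[1, 0.1], [0.01, 0.001]]` gives a small value, which matches the behavior described for the weak $l^p$ norm.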

image quality. PSNR can be seen as a nonadditive cost function. With a slightly modified NBB, PSNR as a cost function

provides WP decomposition with an excellent r/d performance, but at the expense of high computational costs [12].

3. NBB ALGORITHM

With additive cost functions, a dynamic programming approach, that is, the BB algorithm [13], provides the optimal

WP decomposition with respect to the applied cost function.

Basically, the BB algorithm traverses the quadtree in a depth-first-search manner and starts at the level right above the leaves of the decomposition quadtree. The sum of the costs of the child nodes is compared to the cost of the parent node. If the sum is less than the cost of the parent node, the situation remains unchanged. But if the cost of the parent node is less than the summed cost of the children, then the child nodes are pruned off the tree. From the bottom upwards, the tree is reduced whenever the cost of a certain branch can be reduced.

An illustrating example is presented in [15]. It is an essential

property of the BB algorithm that the decomposition tree is

optimal in terms of the cost criteria, but not in terms of the

obtained r/d performance.

When switching from additive to nonadditive cost functions, the locality of the cost function evaluation is lost. The

BB algorithm can still be applied because the correlation

among the subbands is assumed to be minor but obviously

the result is only suboptimal. Hence, instead of BB, this new

variant is called NBB [14].
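The bottom-up pruning described above can be sketched as a short recursive routine. This is a minimal sketch for the additive case only, not the authors' code: `Node` and `best_basis` are hypothetical names, each node is assumed to carry a precomputed cost, and a tie between parent and children is resolved by pruning, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    cost: float                                            # cost of this subband's coefficients
    children: List["Node"] = field(default_factory=list)   # empty, or the four child subbands

def best_basis(node):
    """Prune the tree bottom-up and return the cost of the best subtree rooted at node."""
    if not node.children:
        return node.cost
    children_cost = sum(best_basis(child) for child in node.children)
    if children_cost < node.cost:
        return children_cost      # decomposition is cheaper: keep the children
    node.children = []            # parent is at least as cheap: prune the children
    return node.cost
```

With a nonadditive cost function, the cost of a set of children is no longer the sum of their individual costs, so the same traversal only yields a near-best basis.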


4. ENCODING OF WP QUADTREES

a flat representation of a WP-decomposition quadtree. In

other words, we want an encoding scheme for quadtrees in

the form of a (binary) string. Therefore, we have adopted

the idea of coding a heap in the heap-sort algorithm. We use

strings $b$ of finite length $L$ over a binary alphabet $\{0, 1\}$. If the bit at index $k$, $1 \le k \le L$, is set, then the according subband has to be decomposed. Otherwise, the decomposition stops in this branch of the tree:

$$b_k = \begin{cases} 1 & \text{decompose}, \\ 0 & \text{stop}. \end{cases} \qquad (5)$$

If the bit at index $k$ is set ($b_k = 1$), the indices of the resulting four subbands are derived by

$$k_m = 4k + m, \qquad 1 \le m \le 4. \qquad (6)$$

maximal level of the quadtree by $l_{\max} \in \mathbb{N}$. At this level, all nodes are leaves of the quadtree. The level $l$ of any node $k$ in the quadtree can be determined by

$$l = \begin{cases} 0, & k = 0 \text{ (root)}, \\ l : \sum_{r=0}^{l-1} 4^r \le k < \sum_{r=0}^{l} 4^r, & k > 0. \end{cases} \qquad (7)$$
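The heap-style encoding can be illustrated with a small sketch. The helper names are hypothetical, and the sketch assumes the root has index 0 as in equation (7), so a Python list can be indexed directly.

```python
def children(k):
    """Indices of the four subbands of node k, following k_m = 4k + m for m = 1..4."""
    return [4 * k + m for m in range(1, 5)]

def level(k):
    """Level of node k according to eq. (7); the root k = 0 lies on level 0."""
    l, first = 0, 0                  # first = index of the first node on level l
    while k >= first + 4 ** l:       # level l holds 4**l nodes
        first += 4 ** l
        l += 1
    return l

def decomposed_nodes(bits):
    """Nodes reachable from the root whose bit is set; set bits below a 0 bit are ignored."""
    result, stack = [], [0]
    while stack:
        k = stack.pop()
        if k < len(bits) and bits[k] == 1:
            result.append(k)
            stack.extend(children(k))
    return sorted(result)
```

Note that a set bit has no effect when one of its ancestors is 0, so different strings can decode to the same quadtree.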

5. GENETIC ALGORITHM

Genetic algorithms (GAs) are evolution-based search algorithms especially designed for parameter optimization problems with vast search spaces. GAs were first proposed in the

seventies by Holland [17]. Generally, parameter optimization

problems consist of an objective function to evaluate and estimate the quality of an admissible parameter set, that is, a solution of the problem (not necessarily the optimal one, just any admissible one). For the GA, the parameter set needs to be encoded into

a string over a finite alphabet (usually a binary alphabet). The

encoded parameter set is called a genotype. Usually, the objective function is slightly modified to meet the requirements

of the GA and hence will be called fitness function. The fitness function determines the quality (fitness) for each genotype (encoded solution). The combination of a genotype and

the corresponding fitness forms an individual. At the start of

an evolution process, an initial population, which consists of

a fixed number of individuals, is generated randomly. In a

selection process, individuals of high fitness are selected for

recombination. The selection scheme mimics nature's principle of the survival of the fittest. During recombination, two individuals at a time exchange genetic material, that is, parts of the genotype strings are exchanged at random. After a new intermediate population has been created, a mutation operator is applied. The mutation operator randomly

changes some of the alleles (values at certain positions/loci of

alleles which might have vanished from the population have

a chance to reenter. After applying mutation, the intermediate population has turned into a new one (next generation)

replacing the former.

For our experiments, we apply a GA which starts with an

initial population of 100 individuals. The initial population is

generated randomly. The chromosomes are decoded into WP

decompositions as described in Section 4. The fitness of the

individuals is determined with a cost function (Section 2).

Then, the standard cycle of selection, crossover, and mutation is repeated 100 times, that is, we evolve 100 generations

of the initial population. The maximum number of generations was selected empirically such that selection schemes

with a low selection pressure sufficiently converge. As selection methods, we use binary tournament selection (TS) with partial replacement [18] and linear ranking selection (LRK) with a parameter value of 0.9 [19]. We have experimented with two variants of crossover. Firstly, we applied standard two-point crossover, but this type of crossover does not take into account the tree structure of the chromosomes. Additionally, we have conducted experiments with a tree-crossover operator (Section 5.1) which is specifically adapted to operations on quadtrees. For both two-point crossover and tree crossover, the crossover rate is set to 0.6 and the mutation

rate is set to 0.01 for all experiments.
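The cycle of selection, crossover, and mutation with the stated parameters can be sketched as follows. This is a simplified illustration rather than the authors' software: the function names are hypothetical, the tournament draws two distinct individuals and does not model "partial replacement", and `cost` stands for any function that maps a bit string to a value to be minimized.

```python
import random

def tournament(pop, cost, rng):
    """Binary tournament: the cheaper of two randomly drawn individuals wins."""
    a, b = rng.sample(pop, 2)
    return a if cost(a) <= cost(b) else b

def two_point_crossover(a, b, rng):
    """Exchange the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(bits, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in bits]

def run_ga(cost, length, pop_size=100, generations=100,
           crossover_rate=0.6, mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament(pop, cost, rng), tournament(pop, cost, rng)
            if rng.random() < crossover_rate:
                a, b = two_point_crossover(a, b, rng)
            nxt += [mutate(a, mutation_rate, rng), mutate(b, mutation_rate, rng)]
        pop = nxt[:pop_size]                  # the new generation replaces the former one
        best = min(pop + [best], key=cost)    # remember the best-so-far individual
    return best
```

With a population of 100 and 100 generations, such a run evaluates on the order of $10^4$ individuals, so the per-evaluation cost dominates the runtime.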

As a by-product, we obtained the results presented in

Figure 1 for the image Barbara (Figure 5). Instead of a cost

function, we apply the image quality (PSNR) to determine

the fitness of an individual (i.e., WP decomposition). We

present the development of the PSNR during the course of a

GA. We show the GA results in the following parameter combinations: LRK and TS, each with either two-point crossover

or with tree crossover. After every 100th sample (population

size of the GA) of the random search (RS, Section 6), we

indicate the best-so-far WP decomposition. Each evaluation of a WP decomposition requires a full compression and decompression step, which causes a tremendous execution time. The result of an NBB optimization using the weak $l^1$ norm is displayed as a horizontal line because the runtime of the NBB algorithm is far below the time required to evolve one generation of the GA. The PSNR of the NBB algorithm is out of reach for RS and GA. The tree-crossover operator does not improve the performance of the standard GA. The execution of a GA or RS run lasts from 6 to 10 days on a 600 MHz AMD Duron processor. The

GA using TS with and without tree crossover was not able to

complete the 100 generations within this time limit. Further

examples of WP optimization by means of EC are discussed

in [20].

5.1. Tree crossover

crossover) have a considerably disruptive effect on the tree

structure of subbands which is encoded into a binary string.

With the encoding discussed above, a one- or two-point

crossover results in two new individuals with tree structures

which are almost unrelated to the tree structures of their

[Figure 1: PSNR versus generations (0 to 100) for the image Barbara. Legend: GA: TS (t = 2); GA: TS (t = 2), tree crossover; GA: LRK (0.9), tree crossover; NBB (weak l1 norm); RS.]

[Figure 2: (a) Individual A. (b) Individual B.]

Table 1: Chromosomes of the parent individuals A and B.

Locus   A   B
1       1   1
2       0   1
3       1   0
4       0   1
5       0   1
6       1   0
7       0   0
8       0   1
9       0   1
10      0   1
11      0   1
12      1   0
13      1   0

that is, the GA is expected to evolve better individuals from

good parents.

To demonstrate the effect of standard one-point

crossover, we present a simple example. The chromosomes

of the parent individuals A and B are listed in Table 1 and

the according binary trees are shown in Figure 2. As a cut

point for the crossover, we choose the gap between gene 6

and 7. The chromosome parts from locus 7 to the right end

of the chromosome are exchanged between individuals A and

B. This results in two new trees (i.e., individuals A′ and B′) which are displayed in Figure 3. Evidently, the new generation of trees differs considerably from its parents.
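The one-point crossover of this example can be reproduced with a short sketch. The chromosome values below are read from Table 1 under the assumption that its columns are locus, A, and B; the function name is hypothetical.

```python
def one_point_crossover(a, b, cut):
    """Exchange all genes to the right of the cut; `cut` is the number of genes kept."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

# Parent chromosomes for loci 1..13 (assumed reading of Table 1).
A = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1]
B = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]

# Cut between gene 6 and gene 7, as in the text.
A_new, B_new = one_point_crossover(A, B, cut=6)
```

Because the genes to the right of the cut encode the deeper levels of the tree, swapping them changes most of the subtree structure at once.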

The notion is to introduce a problem-inspired crossover

such that the overall tree structure is preserved while only local parts of the subband trees are altered [11]. Specifically,

one node in each individual (i.e., subband tree) is chosen at

random, then the according subtrees are exchanged between

the individuals. In our example, the candidate nodes for the

crossover are node 2 in individual A and node 10 in individual B. The tree crossover produces a new pair of descendants A and B which are displayed in Figure 4. Compared

to the standard crossover operator, tree crossover moderately

alters the structure of the parent individuals and generates

new ones.
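A subtree exchange on heap-encoded quadtrees might look like the sketch below. It is one possible realization and not the authors' operator: the helper names are hypothetical, nodes are indexed from a root at 0 with children $4k + m$, and subtree parts that would fall beyond the string length are silently dropped.

```python
def child(k, m):
    return 4 * k + m                      # m in 1..4, as in eq. (6)

def read_subtree(bits, k, path=()):
    """Map relative paths (tuples of m values) to bits for the subtree rooted at k."""
    if k >= len(bits):
        return {}
    sub = {path: bits[k]}
    if bits[k] == 1:
        for m in range(1, 5):
            sub.update(read_subtree(bits, child(k, m), path + (m,)))
    return sub

def write_subtree(bits, k, sub, path=()):
    """Overwrite the subtree at k with `sub`; everything else below k is cleared."""
    if k >= len(bits):
        return
    bits[k] = sub.get(path, 0)
    for m in range(1, 5):
        write_subtree(bits, child(k, m), sub, path + (m,))

def tree_crossover(a, b, node_a, node_b):
    """Swap the subtree at node_a in a with the subtree at node_b in b; inputs are not modified."""
    sub_a, sub_b = read_subtree(a, node_a), read_subtree(b, node_b)
    a2, b2 = list(a), list(b)
    write_subtree(a2, node_a, sub_b)
    write_subtree(b2, node_b, sub_a)
    return a2, b2
```

All bits outside the two chosen subtrees are left untouched, which is the property that makes this operator less disruptive than a cut through the flat string.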

6. RANDOM SEARCH

straightforward due to the quadtree structure. If we consider

[Figure 3: (a) Individual A′. (b) Individual B′.]

obtain random WP decomposition just by creating random

0/1 strings of a given length. An obvious drawback is that

this method acts in favor of small quadtrees. We assume that

[Figure 4: Descendants of individuals A and B after tree crossover.]

Figure 5: Barbara.

This is a useful assumption because we need at least one

wavelet decomposition. The probability of obtaining a node at level $l$ is $(1/2)^l$. Due to the rapidly decreasing probabilities,

the quadtrees will be rather sparse.
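The sparsity of trees decoded from random bit strings can be checked with a small sketch. The names are hypothetical, and the sketch assumes that the root (index 0) is always decomposed and that a node counts as present only if its own bit and the bits of all its ancestors are set.

```python
import random

def is_decomposed(bits, k):
    """True if node k and every ancestor up to the root have their bit set."""
    while bits[k] == 1:
        if k == 0:
            return True
        k = (k - 1) // 4          # parent index, the inverse of k_m = 4k + m
    return False

def estimate_probability(k, length, trials=10000, seed=0):
    """Monte Carlo estimate of the chance that node k is decomposed in a random string."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bits = [1] + [rng.randint(0, 1) for _ in range(length - 1)]
        hits += is_decomposed(bits, k)
    return hits / trials
```

For a node on level 2, such as node 5, the estimate should come out close to $(1/2)^2 = 0.25$, since two independent bits must both be set.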

Another admittedly theoretical approach would be to assign a uniform probability to all possible quadtrees. Then,

this set is sampled for WP decompositions. Some simple considerations will show that in this case small quadtrees are

excluded from evaluation. In the following, we will calculate the number $A(k)$ of trees with nodes on $k$ or fewer levels. If $k = 0$, then we have $A(0) := 1$ because there is only the root node on level $l = 0$. For $A(k)$, we obtain the recursion $A(k) = [1 + A(k-1)]^4$ because we can construct quadtrees of height equal to or less than $k$ by adding a new root node to trees of height at most $k - 1$. The number of quadtrees $B(k)$ of height $k$ is given by $B(0) := 1$ and $B(k) = A(k) - A(k-1)$, $k \ge 1$. From the latter argument, we see that the number $B(k)$ of quadtrees of height $k$ increases exponentially. Consequently, the number of trees of low height

is diminishing and hence, when uniformly sampling the set

of quadtrees, they are almost excluded from the evaluation.
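The recursion can be evaluated directly to see how quickly the counts grow. The function names below are hypothetical; the formulas are the ones given in the text.

```python
def count_upto(k):
    """A(k): number of quadtrees with nodes on at most k levels, A(0) = 1."""
    a = 1
    for _ in range(k):
        a = (1 + a) ** 4          # A(k) = [1 + A(k-1)]^4
    return a

def count_exact(k):
    """B(k): number of quadtrees of height exactly k, B(0) = 1."""
    return 1 if k == 0 else count_upto(k) - count_upto(k - 1)
```

The first values are A(1) = 16 and A(2) = 83521, and A(3) is already roughly $4.9 \times 10^{19}$, so under uniform sampling nearly every drawn tree has the maximal height.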

With image compression in mind, we are interested in

trees of low height because trees with a low number of nodes

and a simple structure require less resources when encoded

into a bitstream. Therefore, we have adopted the RS approach

of the first paragraph with a minor modification. We require

that the approximation subband is at least decomposed down

to level 4 because it usually contains a considerable amount

of the overall signal energy.
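The modified random search can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the names are hypothetical, the approximation subband is assumed to be the first child ($m = 1$) of each node, and `cost` is any function to be minimized over bit strings.

```python
import random

def approximation_path(depth):
    """Indices of the approximation subband on levels 0..depth-1 (assuming child m = 1)."""
    path, k = [], 0
    for _ in range(depth):
        path.append(k)
        k = 4 * k + 1
    return path

def random_wp_sample(length, min_depth, rng):
    """Random bit string with the approximation branch forced to be decomposed."""
    bits = [rng.randint(0, 1) for _ in range(length)]
    for k in approximation_path(min_depth):
        bits[k] = 1
    return bits

def random_search(cost, length, samples, min_depth=4, seed=0):
    """Return the cheapest of `samples` random WP decompositions."""
    rng = random.Random(seed)
    candidates = (random_wp_sample(length, min_depth, rng) for _ in range(samples))
    return min(candidates, key=cost)
```

A string of length 85 covers levels 0 to 3, so forcing the bits at indices 0, 1, 5, and 21 guarantees approximation subbands down to level 4.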

Similar to the GA, we can apply the RS using PSNR instead of cost functions to evaluate WP decompositions. Using a RS as discussed above with a decomposition depth of

at least 4 for the approximation subband, we generate 4000

almost unique samples of WP decompositions and evaluate

the corresponding PSNR. The WP decomposition with the

highest PSNR value is recorded. We have repeated the single

RS runs at least 90 times. The best three results in decreasing order and the least result of a single RS run for the image

Barbara are presented as follows: 24.648, 24.6418, 24.6368,

. . . , 24.4094.

If we compare the results of the RS to those obtained by

NBB with the weak $l^1$ norm as cost function (PSNR 25.47), we see that the RS is about 0.8 dB below the NBB algorithm. To

increase the probability of a high quality result of the RS, a

drastic increase of the sample size is required, which again

would result in a tremendous increase of the RS runtime.

7.

AND IMAGE QUALITY

a broad spectrum of visual features. In this work, we present

the results for the well-known image Barbara. The considerable amount of texture in the test picture demonstrates the

superior performance of the WP approach in principle.

The output of the NBB, GA, and RS is a WP decomposition. WPs are a generalization of the pyramidal decomposition. Therefore, we apply an algorithm similar to SPIHT

which exploits the hierarchical structure of the wavelet coefficients [21] (SMAWZ). SMAWZ uses the foundations of

SPIHT, most importantly the zero-tree paradigm, and adapts

them to WPs.

Cost functions are the central design element in the NBB

algorithm. The working hypothesis of (additive and nonadditive) cost functions is that a WP decomposition with an optimal cost-function value provides also a (sub-) optimal r/d

performance. The optimization of WP decompositions via

cost functions is an indirect strategy. Therefore, we compare

the results of the EC methods to that of the NBB algorithm

by generating scatter plots. In these plots, we simultaneously

[Figure 6: Scatter plot of PSNR versus Coifman-Wickerhauser entropy for random WPs.]

[Figure 7: Scatter plot of PSNR versus Coifman-Wickerhauser entropy for WP decompositions obtained by NBB, RS, and GA. Legend: NBB; RS; GA: TS (t = 2); GA: TS (t = 2), tree crossover; GA: LRK (0.9), tree crossover.]

the cost-function value and the image quality (PSNR).

Figure 6 displays the correlation between the nonadditive cost function Coifman-Wickerhauser entropy and the PSNR. For

the plot, we generated 1000 random WP decompositions and

calculated the value of the cost function and the PSNR after

a compression with 0.1 bpp. Note that WP decompositions

with the same decomposition level of the approximation subband are grouped into clouds.

8.

TO COST-FUNCTION OPTIMIZATION

use the GA to evolve WP decompositions by means of cost-function optimization. Therefore, we choose some nonadditive cost functions and compute WP decompositions with

the NBB algorithm, a GA, and a RS. For each cost function,

we obtain a collection of suboptimal WP decompositions.

We calculate the PSNR for each of the WP decompositions

and generate scatter plots (PSNR versus cost-function value).

The comparison of the NBB, GA, and RS results provides surprising insight into the intrinsic processes of the NBB algorithm.

We apply the GA and RS as discussed in Sections 5 and 6,

using the nonadditive cost functions Coifman-Wickerhauser entropy, weak $l^1$ norm, and Shannon entropy to optimize WP decompositions. The GA as well as the RS generate and evaluate $10^4$ WP decompositions. The image Barbara is decomposed according to the output of NBB, GA, and RS and compressed to 0.1 bpp. Afterwards, we determine the PSNR between the original and the decompressed image.

In Figure 7, we present the plot for the correlation between the Coifman-Wickerhauser entropy and PSNR for

NBB, GA, and RS. The WP decomposition obtained by the

NBB algorithm is displayed as a single dot. The other dots

run. With the Coifman-Wickerhauser entropy, we notice a

defect in the construction of the cost function. Even though

the GA and RS provide WP decompositions with a cost-function value less than that of the NBB, the WP decomposition of the NBB is superior in terms of image quality. As a matter of fact, the NBB provides suboptimal WP decompositions with respect to the Coifman-Wickerhauser entropy.

The correlation between the weak $l^1$ norm and PSNR is displayed in Figure 8. Similar to the scatter plot of the Coifman-Wickerhauser entropy, the WP decomposition of the NBB is

an isolated dot. But this time, the GA and the RS are not able

to provide a WP decomposition with a cost-function value

less than the cost-function value of the NBB-WP decomposition.

Even more interesting is the cost function Shannon entropy (Figure 9). As with the Coifman-Wickerhauser entropy, the GA and RS provide WP decompositions with a Shannon-entropy value lower than that of the NBB. In the upper right of the figure, there is a singular result of the GA using TS. This WP decomposition has an even higher cost-function value than the one of the NBB but is superior in

terms of PSNR.

In general, the GA employing LRK provides better results

than the GA using TS concerning the cost-function values.

Within the GA-LRK results, there seems to be a slight advantage for the tree crossover. In all three figures, the GA-LRK

with and without tree crossover is clearly ahead of the RS.

This is evidence for a more efficient optimization process of

the GA compared to RS.

In two cases (Figures 7 and 9), we observe the best cost-function values for the GA- and the RS-WP decomposition.

Nevertheless, the NBB-WP decomposition provides higher

image quality with an inferior cost-function value. The singular result for the GA of Figure 9 is yet another example


RS, and GA fail to consistently predict the image quality, that

is, a lower cost-function value does not imply a higher image

quality.

[Figure 8: Scatter plot of PSNR versus weak $l^1$ norm for WP decompositions obtained by NBB, RS, and GA. Legend: NBB; RS; GA: TS (t = 2); GA: TS (t = 2), tree crossover; GA: LRK (0.9), tree crossover.]

[Figure 9: Scatter plot of PSNR versus Shannon entropy for WP decompositions obtained by NBB, RS, and GA. Legend: NBB; RS; GA: TS (t = 2); GA: TS (t = 2), tree crossover; GA: LRK (0.9), tree crossover. The results of GA: TS (t = 2), tree crossover are not displayed due to zooming.]

9. SUMMARY

to the construction only, suboptimal cost-function values as

well as suboptimal image quality. We are interested in an assessment of the quality of the NBB results.

We have adapted a GA and a RS to the problem of

WP-decomposition optimization by means of additive and

nonadditive cost functions. For the GA, a problem-inspired

crossover operator was implemented to reduce the disruptive

eect on decomposition trees when recombining the chromosomes of WP decompositions. Obviously, the computational complexity of RS and GA are exorbitantly higher than

that of the NBB algorithm. But the RS and GA are in this case

helper applications for the assessment of the NBB algorithm.

We compute WP decompositions with the NBB algorithm, the RS, and GA. The central tool of analysis is the correlation between cost-function value and the corresponding

PSNR of WP decompositions which we visualize with scatter

plots. The scatter plots reveal the imperfect correlation between cost-function value and image quality for WP decompositions for all of the presented nonadditive cost functions.

This also holds true for many other additive and nonadditive

cost functions. We observed that the NBB-WP decomposition provided excellent image quality even though the corresponding cost-function value was sometimes considerably

inferior compared to the results of the RS and GA. Consequently, our results revealed defects in the prediction of image quality by means of cost functions.

With the RS and GA at hand, we applied minor modifications to these algorithms. Instead of employing cost functions for optimizing WP decompositions, we used the PSNR

as a fitness function which resulted in a further increase of

computational complexity because each evaluation of a WP

decomposition requires a full compression and decompression step. Hereby, we directly optimize the image quality.

This direct approach of optimizing WP decomposition with

GA and RS, employing PSNR as a fitness function, requires

further improvement to exceed the performance of the NBB.

REFERENCES

for this phenomenon. As a result, the correlation of the costfunction value and the PSNR, as indicated in all three scatter plots, is imperfect. (In the case of perfect correlation, we

would observe a line starting in the right and descending to

the left.)

The NBB algorithm generates WP decompositions according to split and combine decisions based on costfunction evaluations. In contrast, RS and GA generate a complete WP decomposition and the cost-function value is computed afterwards. The overall cost-function values of NBB,

Compression Standard, Van Nostrand Reinhold, New York, NY, USA, 1993.

[2] D. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic Publishers, Boston, Mass, USA, 2002.

[3] J. R. Goldschneider and E. A. Riskin, "Optimal bit allocation and best-basis selection for wavelet packets and TSVQ," IEEE Trans. Image Processing, vol. 8, no. 9, pp. 1305-1309, 1999.

[4] F. G. Meyer, A. Z. Averbuch, and J.-O. Strömberg, "Fast adaptive wavelet packet image compression," IEEE Trans. Image Processing, vol. 9, no. 5, pp. 792-800, 2000.

[5] R. Öktem, L. Öktem, and K. Egiazarian, "Wavelet based image compression by adaptive scanning of transform coefficients," Journal of Electronic Imaging, vol. 2, no. 11, pp. 257-261, 2002.

[6] Z. Xiong, K. Ramchandran, and M. T. Orchard, "Wavelet packet image coding using space-frequency quantization," IEEE Trans. Image Processing, vol. 7, no. 6, pp. 892-898, 1998.

[7] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, 1996.

[8] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. Image Processing, vol. 2, no. 2, pp. 160-175, 1993.

[9] N. M. Rajpoot, R. G. Wilson, F. G. Meyer, and R. R. Coifman, "A new basis selection paradigm for wavelet packet image coding," in Proc. International Conference on Image Processing (ICIP '01), pp. 816-819, Thessaloniki, Greece, October 2001.

[10] T. Hopper, "Compression of gray-scale fingerprint images," in Wavelet Applications, H. H. Szu, Ed., vol. 2242 of SPIE Proceedings, pp. 180-187, Orlando, Fla, USA, 1994.

[11] T. Schell and A. Uhl, "Customized evolutionary optimization of subband structures for wavelet packet image compression," in Advances in Fuzzy Systems and Evolutionary Computation, N. Mastorakis, Ed., pp. 293-298, World Scientific Engineering Society, Puerto de la Cruz, Spain, February 2001.

[12] T. Schell and A. Uhl, "New models for generating optimal wavelet-packet-tree-structures," in Proc. 3rd IEEE Benelux Signal Processing Symposium (SPS '02), pp. 225-228, IEEE Benelux Signal Processing Chapter, Leuven, Belgium, March 2002.

[13] R. R. Coifman and M. V. Wickerhauser, "Entropy based algorithms for best basis selection," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713-718, 1992.

[14] C. Taswell, "Satisficing search algorithms for selecting near-best bases in adaptive tree-structured wavelet transforms," IEEE Transactions on Signal Processing, vol. 44, no. 10, pp. 2423-2438, 1996.

[15] M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A. K. Peters, Wellesley, Mass, USA, 1994.

[16] C. Taswell, "Near-best basis selection algorithms with nonadditive information cost functions," in Proc. IEEE International Symposium on Time-Frequency and Time-Scale Analysis (TFTS '94), M. Amin, Ed., pp. 13-16, IEEE Press, Philadelphia, Pa, USA, October 1994.

[17] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Ann Arbor, Mich, USA, 1975.

[18] T. Schell and S. Wegenkittl, "Looking beyond selection probabilities: adaption of the χ² measure for the performance analysis of selection methods in GA," Evolutionary Computation, vol. 9, no. 2, pp. 243-256, 2001.

[19] J. E. Baker, "Adaptive selection methods for genetic algorithms," in Proc. 1st International Conference on Genetic Algorithms and Their Applications, J. J. Grefenstette, Ed., pp. 101-111, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, July 1985.

[20] T. Schell, Evolutionary optimization: selection schemes, sampling and applications in image processing and pseudo random number generation, Ph.D. thesis, University of Salzburg, Salzburg, Austria, 2001.

[21] R. Kutil, "A significance map based adaptive wavelet zerotree codec (SMAWZ)," in Media Processors 2002, S. Panchanathan, V. Bove, and S. I. Sudharsanan, Eds., vol. 4674 of SPIE Proceedings, pp. 61-71, San Jose, Calif, USA, January 2002.


computer science from Salzburg University,

Austria and from the Bowling Green State

University, USA and a Ph.D. from Salzburg

University. Currently, he is with the Department of Scientific Computing as a Research

and Teaching Assistant at Salzburg University. His research focuses on evolutionary

computing and signal processing, especially

image compression.

Andreas Uhl received the B.S. and M.S. degrees (both in mathematics) from Salzburg

University and he completed his Ph.D. on

applied mathematics at the same university.

He is currently an Associate Professor with

tenure in computer science affiliated with

the Department of Scientific Computing,

and with the Research Institute for Software

Technology, Salzburg University. He is also

a part-time lecturer at the Carinthia Tech

Institute. His research interests include multimedia signal processing (with emphasis on compression and security issues), parallel

and distributed processing, and number theoretical methods in

numerics.

© 2003 Hindawi Publishing Corporation

the Robustness of Continuous Speech Recognition

Systems in Adverse Conditions

Sid-Ahmed Selouani

Secteur Gestion de l'Information, Université de Moncton, Campus de Shippagan, 218 boulevard J.-D.-Gauthier,

Shippagan, Nouveau-Brunswick, Canada E8S 1P6

Email: selouani@umcs.ca

Douglas O'Shaughnessy

INRS-Énergie-Matériaux-Télécommunications, Université du Québec, 800 de la Gauchetière Ouest,

place Bonaventure, Montréal, Canada H5A 1K6

Email: dougo@inrs-telecom.uquebec.ca

Received 14 June 2002 and in revised form 6 December 2002

Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axes obtained from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to 4 dB. We also show the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.

Keywords and phrases: speech recognition, genetic algorithms, Karhunen-Loève transform, hidden Markov models, robustness.

1. INTRODUCTION

with the serious problem of acoustic condition changes.

Their performance often degrades due to unknown adverse conditions (e.g., due to room acoustics, ambient noise,

speaker variability, sensor characteristics, and other transmission channel artifacts). These speech variations create

mismatches between the training data and the test data. Numerous techniques have been developed to counter this in

three major areas [1].

The first area includes noise masking [1], spectral and cepstral subtraction [2], and the use of robust features [3]. Robust feature analysis consists of using noise-resistant parameters such as auditory-based features, mel-frequency cepstral coefficients (MFCCs) [4], or techniques such as the relative spectral (RASTA) methodology [5]. The second type of method refers to the establishment of compensation models for noisy environments without modification to the speech signal. The third field of research is concerned with distance and similarity measurements. The major methods of this field are based on finding a robust distortion measure that is less influenced by noise [6].

Despite these efforts to address robustness, adapting to changing environments remains the major obstacle to speech recognition in practical applications. Investigating innovative strategies has become essential to overcome the drawbacks of classical approaches. In this context, evolutionary algorithms (EAs) are robust tools: they are useful for finding good solutions to complex problems (artificial neural network topology or weights, for instance) and for avoiding local minima [7]. Applying artificial neural networks, Spalanzani [8] showed that recognition of digits and vowels can be improved by using genetically optimized initialization of weights and biases. In this paper, we propose an approach which can be viewed as a signal transformation via a mapping operator using a mel-frequency space decomposition based on the Karhunen-Loève transform (KLT) and a genetic algorithm (GA) with a real-coded encoding (GAs being one class of EAs). This transformation attempts to adapt hidden Markov model-based CSR systems to adverse conditions. The principle consists of finding, in the learning phase, the principal axes generated by the KLT and then optimizing them with the GA so that they provide projected noisy data that are as close as possible to clean data.

This paper is organized as follows. Section 2 describes the

basis of our proposed hybrid KLT-GA enhancement method.

Section 3 describes the model linking the KLT to the evolution mechanism, which leads to a robust representation

of noisy data. Then, Section 4 describes the database, the

platform used in our experiments and the evaluation of the

proposed KLT-GA-based recognizer in a noisy car environment and in a telephone channel environment. This section

includes the comparison of KLT-GA processed recognizers to

a baseline CSR system in order to evaluate performance. Finally, Section 5 concludes with a perspective of this work.

2.

ROBUST SYSTEM

CSR systems based on statistical models such as hidden Markov models (HMMs) automatically recognize speech sounds by comparing their acoustic features with those determined during training [9]. A Bayesian statistical framework underlies the HMM-based speech recognizer. The development of such a recognizer can be summarized as follows. Let $w$ be a sequence of phones (or words), which produces a sequence of observable acoustic data $o$, sent through a noisy transmission channel. In our study, telephone speech is corrupted by additive noise. The recognition process aims to provide the most likely phone sequence $\hat{w}$ given the acoustic data $o$. This estimation is performed by maximizing the a posteriori (MAP) probability $p(w \mid o)$:

$$\hat{w} = \arg\max_{w} p(w \mid o) = \arg\max_{w} p(o \mid w)\, p(w), \tag{1}$$

where $p(w)$ is the prior probability, determined by the language model, that the speaker utters $w$, and $p(o \mid w)$ is the conditional probability that the acoustic channel produces the sequence $o$. Let $\Lambda$ be the set of models used by the recognizer to decode acoustic parameters through the use of the MAP. Then (1) becomes

$$\hat{w} = \arg\max_{w} p(o \mid w, \Lambda)\, p(w). \tag{2}$$

The mismatch between the training and the testing environments leads to a worse estimate for the likelihood of $o$ given $\Lambda$ and thus degrades CSR performance. Reducing this mismatch should increase the correct recognition rate. The mismatch can be viewed by considering the signal space, the feature space, or the model space. We are concerned with the feature space, and consider a transformation $T$ that maps into a transformed feature space. Our approach is to find $\hat{T}$ and the phone sequence $\hat{w}$ that maximize the joint likelihood of $o$ and $w$ given $\Lambda$:

$$[\hat{T}, \hat{w}] = \arg\max_{T, w} p(o \mid w, T, \Lambda)\, p(w). \tag{3}$$

In this scheme, the conventional HMM-based technique is used to estimate $w$, while an EA-based technique enhances noisy data iteratively by keeping the noisy features as close as possible to the clean data. This EA-based transformation aims to reduce the mismatch between training and operating conditions by giving the HMM the ability to recall the training conditions.

As is shown in Figure 1, the idea is to manipulate the axes generating the feature representation space to achieve better robustness on noisy data. MFCCs serve as acoustic features. A Karhunen-Loève decomposition in the MFCC domain yields the principal axes that constitute the basis of the space where noisy data is represented. Then, a population of these axes is created (corresponding to individuals in the initialization of the evolution process). The evolution of the individuals is performed by EAs. The individuals are evaluated via a fitness function by quantifying, through generations, their distance to individuals in a noise-free environment. The fittest individual (best principal axes) is used to project the noisy data in its corresponding dimension. Genetically modified MFCCs and their derivatives are finally used as enhanced features for the recognition process.

2.2.

The cepstrum is the inverse Fourier transform of the logarithm of the short-term power spectrum of the signal. The use of a logarithmic function allows deconvolution of the vocal tract transfer function and the voice source. Consequently, the pulse sequence corresponding to the periodic voice source reappears in the cepstrum as a strong peak in the quefrency domain. The derived cepstral coefficients are commonly used to describe the short-term spectral envelope of a speech signal. The computation of MFCCs requires the selection of $M$ critical bandpass filters that roughly approximate the frequency response of the basilar membrane in the cochlea of the inner ear [4]. A discrete cosine transform, $C_n$, is applied to the output of the $M$ filters, $X_k$. These filters are triangular, cover the 156-6844 Hz frequency range, and are spaced on the mel-frequency scale. These filters are applied to the log of the magnitude spectrum of the signal, which is estimated on a short-time basis. Thus

$$C_n = \sum_{k=1}^{M} X_k \cos\left[\frac{\pi n}{M}\,(k - 0.5)\right], \quad n = 1, 2, \ldots, N, \tag{4}$$

where $N$ is the analysis order, and $X_k$, $k = 1, 2, \ldots, M = 20$, represents the log-energy output of the $k$th filter.
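As an illustration of the cosine transform in (4), the following minimal sketch computes N cepstral coefficients from M log filter-bank energies with NumPy. The function name `mfcc_from_log_energies` and the flat test input are assumptions made for this example only; the filter bank itself is not reproduced here.

```python
import numpy as np

def mfcc_from_log_energies(log_energies, num_ceps=12):
    """Apply the cosine transform of (4) to M log filter-bank energies.

    `log_energies` has shape (M,) for one frame or (frames, M) for many.
    """
    x = np.asarray(log_energies, dtype=float)
    M = x.shape[-1]
    k = np.arange(1, M + 1)            # filter index k = 1..M
    n = np.arange(1, num_ceps + 1)     # cepstral index n = 1..N
    # basis[n-1, k-1] = cos(pi * n * (k - 0.5) / M)
    basis = np.cos(np.pi * np.outer(n, k - 0.5) / M)
    return x @ basis.T

# Hypothetical input: identical log-energy in all M = 20 filters.
flat = np.ones(20)
ceps = mfcc_from_log_energies(flat)
```

A flat spectrum carries no envelope detail, so all coefficients with n >= 1 vanish up to rounding error, which is a convenient sanity check of the basis.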

2.3.

In order to reduce the effects of noise on automatic speech recognition (ASR), many methods propose to decompose the vector space of the noisy signal into a signal-plus-noise subspace and a noise subspace [10]. We remove the noise subspace and estimate the clean signal from the remaining signal space. Such a decomposition applies the KLT to the noisy zero-mean normalized data.

Figure 1: Block diagram of the proposed system: MFC analysis of clean and noisy speech, KLT decomposition, genetic operators, enhanced MFCC, and HMM-based recognition.

Let $\tilde{C}$ denote the noisy zero-mean normalized MFCC vector. Under the assumption that $\tilde{C}$ has a symmetric nonnegative autocorrelation matrix $R = E[\tilde{C}^{T}\tilde{C}]$ with a rank $r \le N$, $\tilde{C}$ can be represented as a linear combination of eigenvectors $\beta_1, \beta_2, \ldots, \beta_r$, which correspond to eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r \ge 0$, respectively. That is, $\tilde{C}$ can be calculated using the following orthogonal transformation:

$$\tilde{C} = \sum_{k=1}^{r} \alpha_k \beta_k, \tag{5}$$

where the coefficients $\alpha_k$, $k = 1, \ldots, r$, are given by the projection of $\tilde{C}$ in the space generated by the $r$-eigenvector basis. Given that the magnitudes of the low-order eigenvalues are higher than those of the high-order ones, the effect of the noise on the low-order eigenvalues is proportionately less than on the high-order ones. Thus, a linear estimation of the clean vector $C$ is performed by projecting the noisy vectors on the space generated by the principal components weighted by a function $W_k$, which applies strong attenuation over higher-order eigenvectors depending on the noise variance [10]. The enhanced MFCCs are then given by

$$\hat{C} = \sum_{k=1}^{r} W_k \alpha_k \beta_k. \tag{6}$$

... particularly in the case of signal subspace decomposition [10]. The optimal order $r$ fixing the beginning of the strong attenuation must be determined. In our new approach, GAs determine optimal principal components. No assumptions need to be made. Optimization is achieved when the vectors $\beta_1, \beta_2, \ldots, \beta_N$, which do not necessarily correspond to the eigenvectors, minimize the distance between $\hat{C}_{\mathrm{Gen}}$ and $C$. The genetically enhanced MFCCs, $\hat{C}_{\mathrm{Gen}}$, are

$$\hat{C}_{\mathrm{Gen}} = \sum_{k=1}^{N} \alpha_k \beta_k. \tag{7}$$

Determining an optimal $r$ is not needed since the GA considers the vectors $\beta_1, \beta_2, \ldots, \beta_N$ as the fittest individuals for the complete space dimension $N$. This process can be regarded as the mapping transform, $T$, of (3).
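The decomposition in (5) and the weighted reconstruction in (6) can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the random frames stand in for noisy MFCC vectors, the hard cutoff used for the weights is a hypothetical placeholder for the weighting function, and the names `klt_basis` and `project` are introduced here for the example.

```python
import numpy as np

def klt_basis(noisy):
    """Eigen-decomposition of the autocorrelation matrix of zero-mean frames.

    Returns eigenvalues in descending order, the matching eigenvectors
    (as columns), and the zero-mean frames.
    """
    centered = noisy - noisy.mean(axis=0)
    R = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], centered

def project(centered, axes, weights=None):
    """Compute sum_k W_k * alpha_k * beta_k for every frame (row)."""
    alphas = centered @ axes                   # alpha_k: projection on beta_k
    if weights is not None:
        alphas = alphas * weights
    return alphas @ axes.T

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 12))            # stand-in for noisy MFCC frames
eigvals, axes, centered = klt_basis(frames)
full = project(centered, axes)                 # (5): all axes, no weighting
weights = (np.arange(12) < 6).astype(float)    # hypothetical hard cutoff
enhanced = project(centered, axes, weights)    # (6) with a crude weighting
```

With orthonormal eigenvectors and no weighting, the projection reproduces the zero-mean frames exactly. The same `project` call could be given GA-optimized axes in place of the eigenvectors, which corresponds to (7).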

3.

the chromosome (or solution) representation, the selection function, the genetic operators making up the reproduction function, the creation of the initial population, the termination criteria, and the evaluation function [11, 12]. The GA maintains and manipulates a family or population of solutions (the $\beta_1, \beta_2, \ldots, \beta_N$ vectors in our case) and implements a "survival of the fittest" strategy in its search for better solutions.

3.1. Solution representation

A chromosome representation is needed to describe each individual in the population. It is important since the representation scheme determines how the problem is structured in the GA and also determines the adequate genetic operators to use [13]. For our application, the useful representation of an individual or chromosome for function optimization involves genes or variables from an alphabet of floating-point numbers with values within the variables' upper and lower bounds (+1 and -1, respectively). Michalewicz [14] has done extensive experimentation comparing real-valued and binary GAs, and reports that the real-valued representation offers higher precision with more consistent results across replications.

1. Generate a random number g
2. Compute fit[X] and fit[Y], the fitness of X and Y
3. If fit[X] > fit[Y]
   Then X' = X + g(X - Y) and Y' = X
   Estimate the feasibility of X':
   φ(X') = 1 if a_i <= x'_i <= b_i for all i, 0 otherwise
   (x'_i are the components of X', i = 1, . . . , N)
4. If φ(X') = 0
   Then generate new g; goto 2
5. If all individuals reproduced then Stop
   else goto 1

Algorithm 1: Heuristic crossover.
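A minimal Python sketch of the heuristic crossover step is given below. Several details are assumptions made for the example and are not taken from the text: g is drawn uniformly from [0, 1), the parents are swapped when Y is the fitter one, and after `max_tries` infeasible draws the children are simply copies of the parents.

```python
import random

def heuristic_crossover(x, y, fit, lower, upper, max_tries=10, rng=random):
    """Return two children built from parents x and y (lists of floats)."""
    # Extrapolate from the worse parent through the better one.
    better, worse = (x, y) if fit(x) > fit(y) else (y, x)
    for _ in range(max_tries):
        g = rng.random()
        child = [b + g * (b - w) for b, w in zip(better, worse)]
        # Feasibility test: every component must stay inside [a_i, b_i].
        if all(lo <= c <= hi for c, lo, hi in zip(child, lower, upper)):
            return child, list(better)
    return list(better), list(worse)   # assumed fallback, not from the text

random.seed(0)
toy_fit = lambda v: -sum(c * c for c in v)     # hypothetical fitness
a, b = [0.2, -0.1], [0.6, 0.5]
child, copy = heuristic_crossover(a, b, toy_fit, [-1.0, -1.0], [1.0, 1.0])
```

Because the first child lies on the line through both parents but beyond the fitter one, the operator pushes the search in the direction that has already improved fitness.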

Stochastic selection is used to keep search strategies simple while allowing adaptivity. The selection of individuals to produce successive generations plays an extremely important role in GAs. A common selection approach assigns a probability of selection, $P_j$, to each individual, $j$, based on its fitness value. Various methods exist to assign probabilities to individuals; we use the normalized geometric ranking [15]. This method defines $P_j$ for each individual by

$$P_j = q'(1 - q)^{s-1}, \tag{8}$$

where

$$q' = \frac{q}{1 - (1 - q)^{P}}, \tag{9}$$

$q$ is the probability of selecting the best individual, $s$ is the rank of the individual (1 being the best), and $P$ the population size.
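The ranking probabilities of (8) and (9) can be sketched as below. The function name `geometric_ranking_probs` and the toy fitness values are assumptions for the example; the default q = 0.08 is the value listed for the probability of selecting the best individual.

```python
import numpy as np

def geometric_ranking_probs(fitness, q=0.08):
    """Selection probability of each individual under (8) and (9)."""
    fitness = np.asarray(fitness, dtype=float)
    P = fitness.size
    order = np.argsort(-fitness)               # indices from best to worst
    ranks = np.empty(P, dtype=int)
    ranks[order] = np.arange(1, P + 1)         # rank 1 is the best
    q_norm = q / (1.0 - (1.0 - q) ** P)        # q' of (9)
    return q_norm * (1.0 - q) ** (ranks - 1)   # P_j of (8)

probs = geometric_ranking_probs([0.3, -1.2, 0.9, 0.1])

# Draw parents according to these probabilities.
rng = np.random.default_rng(0)
parents = rng.choice(probs.size, size=4, p=probs)
```

The normalization in (9) makes the probabilities sum to one for any finite population, so they can be passed directly to a sampling routine.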

3.3. Genetic operators

The basic search mechanism of the GA is provided by two types of operators: crossover and mutation. Crossover transforms two individuals into two new individuals, while mutation alters one individual to produce a single new solution. A float representation of the parents is denoted by $X$ and $Y$. At the end of the search, the fittest individual survives and is retained as an optimal KLT axis in its corresponding rank among the $\beta_1, \beta_2, \ldots, \beta_N$ vectors.

3.3.1 Crossover

Crossover operators combine information from two parents and transmit it to each offspring. In order to avoid extending the exploration domain of the best solution, we preferred to use a crossover that utilizes fitness information, that is, a heuristic crossover [15]. Let $a_i$ and $b_i$ be the lower and upper bounds, respectively, of each component $x_i$ of a member of the population ($X$ or $Y$). This operator produces a linear extrapolation of $X$ and $Y$. New individuals $X'$ and $Y'$ (children) are created according to Algorithm 1.

3.3.2 Mutation

Mutation operators tend to make small random changes in an attempt to explore all regions of the solution space [16]. The principle of the nonuniform mutation used in our application consists of randomly selecting one component, $x_k$, of an individual and setting it equal to a nonuniform random number, $x'_k$:

$$x'_k = \begin{cases} x_k + (b_k - x_k)\, f(\mathrm{Gen}) & \text{if } u_1 < 0.5, \\ x_k - (x_k - a_k)\, f(\mathrm{Gen}) & \text{if } u_1 \ge 0.5, \end{cases} \tag{10}$$

where

$$f(\mathrm{Gen}) = \left( u_2 \left( 1 - \frac{\mathrm{Gen}}{\mathrm{Gen}_{\max}} \right) \right)^{t}, \tag{11}$$

$u_1$ and $u_2$ are uniform random numbers in $(0, 1)$, $t$ is a shape parameter, $\mathrm{Gen}$ the current generation, and $\mathrm{Gen}_{\max}$ the maximum number of generations. The multi-nonuniform mutation generalizes the application of the nonuniform mutation operator to all the components of the parent $X$. The main advantage of this operator is that the alteration is distributed over all the components of an individual, which extends the search space and thus makes it possible to deal with any kind of noise.

3.4. Evaluation function

The GA searches the mel-frequency space for the axes that bring the noisy MFCCs, once projected onto them, closest to the clean MFCCs. Thus, evolution is driven by a fitness function defined in terms of a distance measure between the noisy MFCC projected on a given individual (axis) and the clean MFCC. The fittest individual is the axis which corresponds to the minimum of that distance. The distance function applied to cepstral (or other voice) representations refers to spectral distortion measures and represents the cost in a classification system of speech frames. For two vectors $C$ and $\hat{C}$ representing two frames [6], each with $N$ components, the geometric distance is defined as

$$d(C, \hat{C}) = \left( \sum_{k=1}^{N} \left| C_k - \hat{C}_k \right|^{l} \right)^{1/l}, \tag{12}$$

which has been a valuable measure for both clean and noisy speech [6, 17]. Figure 2 gives, for the first four best axes, the evolution of their fitness (distortion measure) through 300 generations. Note that $-d(C, \hat{C})$ is used as a distance measure because the evaluation function must be maximized.
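The mutation of (10) and (11) and the evaluation function based on (12) can be sketched together as follows. The shape parameter value t = 2 and the choice of drawing a separate pair of random numbers for every component are assumptions made for this example, since the text does not specify them; the function names are also introduced here.

```python
import numpy as np

def f_gen(u2, gen, gen_max, t):
    """Step-size factor of (11); it shrinks to zero as gen approaches gen_max."""
    return (u2 * (1.0 - gen / gen_max)) ** t

def multi_nonuniform_mutation(x, lower, upper, gen, gen_max, t=2.0, rng=None):
    """Apply the nonuniform mutation of (10) to every component of x."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    u1 = rng.random(x.shape)                   # assumed: one draw per component
    u2 = rng.random(x.shape)
    step = f_gen(u2, gen, gen_max, t)
    up = x + (upper - x) * step                # move toward the upper bound
    down = x - (x - lower) * step              # move toward the lower bound
    return np.where(u1 < 0.5, up, down)

def fitness(clean, enhanced, l=2):
    """Negative distance of (12), so that a larger value means a better axis."""
    diff = np.abs(np.asarray(clean, float) - np.asarray(enhanced, float))
    return -np.sum(diff ** l) ** (1.0 / l)

x0 = np.zeros(12)
early = multi_nonuniform_mutation(x0, -1.0, 1.0, gen=10, gen_max=300,
                                  rng=np.random.default_rng(1))
late = multi_nonuniform_mutation(x0, -1.0, 1.0, gen=300, gen_max=300,
                                 rng=np.random.default_rng(1))
score = fitness([0.0, 0.0], [3.0, 4.0])
```

Early in the run the mutated components can move far inside the bounds, while at the last generation the step factor is zero and the individual is left unchanged, which is the intended annealing behavior of the operator.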

Figure 2: Evolution of the performance of the best individual during 300 generations (horizontal axis: generation). Only the first four of the twelve axes are considered.

The ideal, zero-knowledge assumption starts with a population of completely random axes. Another typical heuristic, used in our system, initializes the population with a uniform distribution in a default set of known starting points described by the boundaries $(a_i, b_i)$ for each axis component. The GA-based search ends when the population becomes homogeneous in performance (when children do not surpass their parents), converges according to the Euclidean distortion measure, or is terminated by the user when the maximum number of generations is reached. Finally, the evolution process can be summarized in Algorithm 2.

4. EXPERIMENTS

The following experiments used the TIMIT database [18], which contains broadband recordings of a total of 6300 sentences, 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States, each reading phonetically rich sentences. To simulate a noisy environment, car noise was added artificially to the clean speech. To study the effect of such noise on the recognition accuracy of the CSR system that we evaluated, the reference templates for all tests were taken from clean speech. The training set is composed of 1140 sentences (114 speakers) of the dr1 and dr2 TIMIT subdirectories. The dr1 subset of the TIMIT database, composed of 110 sentences, was chosen to evaluate the recognition system.
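Adding noise to clean speech at a prescribed SNR can be sketched as below. This is a generic illustration, not the procedure used in the experiments: the scaling rule based on average power, the repetition of the noise to match the speech length, and the synthetic signals are all assumptions for the example.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    # Repeat or truncate the noise so that it matches the speech length.
    noise = np.resize(np.asarray(noise, dtype=float), speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

rng = np.random.default_rng(0)
speech = np.sin(np.linspace(0.0, 100.0, 8000))   # stand-in for a clean signal
noise = rng.normal(size=3000)                    # stand-in for car noise
noisy = add_noise_at_snr(speech, noise, 16.0)
measured = 10.0 * np.log10(np.mean(speech ** 2) /
                           np.mean((noisy - speech) ** 2))
```

Measuring the SNR of the mixture afterwards, as in the last line, is a simple way to confirm that the scaling matches the target value.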

In a second set of experiments, in order to study the impact of telephone channel degradation on the recognition accuracy of both baseline and enhanced CSR systems, the NTIMIT database was used [19]. It was created by transmitting speech from the TIMIT database over long-distance telephone lines. Previous work has demonstrated that telephone line use increases the rate of recognition errors; for example, Moreno and Stern [20] report a 68% error rate when using a version of SPHINX-II [21] as the CSR system, TIMIT as the training database, and NTIMIT as the test database.

Generate for each principal KLT component a population of axes
For Genmax generations Do
    For each set of components Do
        Project noisy data using KLT axes
        Evaluate global Euclidean distance to clean data
    End For
    Select and Reproduce
End For
Project noisy data onto space generated by the best individuals

Algorithm 2: The evolutionary search technique for best KLT axes.

In order to test the recognition of continuous speech data enhanced as described above, the HTK-based speech recognizer [22] was used. HTK is an HMM-based toolkit used for isolated or continuous whole-word-based recognition systems. The toolkit supports continuous-density HMMs with any number of states and mixture components. It also implements a general parameter-tying mechanism which allows the creation of complex model topologies. Twelve MFCCs were calculated using a 30-millisecond Hamming window advanced by 10 milliseconds for each frame. To do this, an FFT calculates a magnitude spectrum for each frame, which is then averaged into 20 triangular bins arranged at equal mel-frequency intervals. Finally, a cosine transform is applied to these data to calculate the 12 MFCCs, which form a 12-dimensional (static) vector. This static vector is then expanded after enhancement to produce a 36-dimensional (static + first and second derivatives: MFCC_D_A) vector upon which the HMMs that model the speech subword units were trained. With this frame length, the 1140 sentences of the dr1 and dr2 TIMIT subsets provided 342993 frames that were used for training. The baseline system used a triphone Gaussian mixture HMM system. Triphones were trained through a tree-based clustering method to deal with unseen contexts. A set of binary questions about phonetic contexts is built; the decision tree is constructed by selecting the best question from the rule set at each node [23].
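The framing step described above (a 30-millisecond Hamming window advanced by 10 milliseconds) can be sketched as follows. The 16 kHz sampling rate is the TIMIT rate; the function name `frame_signal`, the dropping of an incomplete final frame, and the constant test signal are assumptions for the example and do not reproduce the HTK implementation.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, win_ms=30, hop_ms=10):
    """Cut a signal into overlapping frames and apply a Hamming window."""
    signal = np.asarray(signal, dtype=float)
    win = int(sample_rate * win_ms / 1000)     # samples per frame
    hop = int(sample_rate * hop_ms / 1000)     # samples between frame starts
    if signal.size < win:
        return np.empty((0, win))
    n_frames = 1 + (signal.size - win) // hop  # incomplete last frame dropped
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(win)

frames = frame_signal(np.ones(16000))          # one second of placeholder audio
```

At 16 kHz each frame holds 480 samples and successive frames start 160 samples apart, so one second of audio yields 98 complete frames.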

4.3. Results and discussion

4.3.1 GA parameters

A population of 150 individuals is generated for each axis $\beta_k$ and evolves during 300 generations. The values for the GA parameters given in Table 1 were selected after extensive cross-validation experiments and were shown to perform well with all data. The maximum number of generations and the population size are well adapted to our problem since no improvement was observed when these parameters were increased. At each generation, the best individuals are retained to reproduce. At the end of the evolution process, the best individuals of the best population are considered as the ... [15]. For this purpose, data sets are composed of 114331 frames extracted from the TIMIT training subset and the corresponding noisy frames extracted from the noisy TIMIT and NTIMIT databases.

Table 1: Values of the GA parameters.

Parameter                               Parameter value
Number of generations                   300
Population size                         150
Probability of selecting the best, q    0.08
Heuristic crossover rate                0.25
Multi-nonuniform mutation rate          0.06
Number of runs                          50
Number of frames                        114331
Boundaries [a_i, b_i]                   [-1.0, +1.0]

4.3.2

at different values of SNR, from 16 dB to 4 dB. Figure 3 shows that using the KLT-GA-based optimization to enhance the MFCCs that were used for recognition with N-mixture Gaussian HMMs for N = 1, 2, 4, 8 with triphone models leads to a higher word recognition rate. The CSR system including the KLT-GA-processed MFCCs performs significantly better than the MFCC_D_A- and KLT-MFCC_D_A-based CSR systems, for low and high noise conditions. The system which contains enhanced MFCCs achieves 81.67% as the best word recognition rate (%CWrd) for 16-dB SNR and four Gaussian mixtures. In the same conditions, the baseline system dealing with noisy MFCCs and the system containing KLT-processed MFCCs achieve 73.89% and 77.25%, respectively. The increased accuracy is more significant in low SNR conditions, which attests to the robustness of the approach when acoustic conditions become severely degraded. For instance, in the 4-dB SNR case, the KLT-GA-MFCC-based CSR system has an accuracy higher than that of the KLT-MFCC- and MFCC-based CSR systems by 12% and 20%, respectively. The comparison between KLT- and KLT-GA-processed MFCCs shows that the evolutionary approach is more powerful whatever the level of noise degradation. Considering the KLT-based CSR, inclusion of the GA technique raised accuracy by about 11%. Figure 4 plots the variations of the first four MFCCs for a signal chosen from the test set. It is clear from the comparison illustrated in this figure that the MFCCs processed with the proposed KLT-GA-based approach are less variant than the noisy MFCCs and closer to the original ones.

Figure 3: Percent word recognition performance (%CWrd) of the KLT- and KLT-GA-based CSR systems compared to the baseline HTK method (noisy MFCC) using (a) 1-mixture, (b) 2-mixture, (c) 4-mixture, and (d) 8-mixture triphones for different values of SNR.

Extensive experimental studies characterized the impairments induced by telephone networks [24]. When speech is

recorded through telephone lines, a reduction in the analysis

bandwidth yields higher recognition error, particularly when

the system is trained with high-quality speech and tested using simulated telephone speech [20]. In our experiments, the

training set (dr1 and dr2 subdirectories of TIMIT) (1140 sentences and 342993 frames) was used to train a set of clean

[Figure 4 plots: first, second, third, and fourth MFCC versus frame number.]

Figure 4: Comparison between clean, noisy, and enhanced MFCCs, represented by solid, dotted, and dashed-dotted lines, respectively.

as a test set. This subdirectory is composed of 110 sentences and 34964 frames. Speakers and sentences used in the test were different from those used in the training phase. For the KLT- and KLT-GA-based CSR systems, we found that using the KLT-GA as a preprocessing approach to enhance the MFCCs that were used for recognition with N-mixture Gaussian HMMs for N = 1, 2, 4, and 8, using triphone models, led to an important improvement in the accuracy of the word recognition rate. Table 2 shows that this difference can reach 27% between the MFCC_D_A- and KLT-GA-MFCC_D_A-based CSR systems. Table 2 also shows that substitution and insertion errors are considerably reduced when the evolutionary approach is included, which makes the CSR system more effective.
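As a quick consistency check on the 27% figure, the sketch below assumes the usual HTK-style scoring in which the percentage of correct words is 100 minus the substitution and deletion rates. The assignment of the four Table 2 numbers per system to (%Sub, %Del, %Ins, %CWrd) is inferred from that arithmetic and is an assumption, not something the text states; the function and variable names are illustrative.

```python
def percent_correct(sub, dele):
    """%CWrd under HTK-style scoring: 100 minus substitution and deletion rates."""
    return 100.0 - sub - dele


# 1-mixture rows of Table 2, read as (%Sub, %Del, %Ins, %CWrd).
# The column order is an assumption inferred from the arithmetic above.
table_2a = {
    "MFCC_D_A": (82.71, 4.27, 33.44, 13.02),
    "KLT-MFCC_D_A": (77.05, 5.11, 30.04, 17.84),
    "KLT-GA-MFCC_D_A": (54.48, 5.42, 25.42, 40.10),
}

# Difference in %CWrd between the KLT-GA system and the baseline.
gain = table_2a["KLT-GA-MFCC_D_A"][3] - table_2a["MFCC_D_A"][3]
```

Under this reading, each row's reported %CWrd agrees with `percent_correct` to within rounding, and `gain` is about 27 percentage points.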

5. CONCLUSION

GAs, for an important real-world application by presenting

of a KLT-GA hybrid enhancement noise reduction approach

in the cepstral domain in order to get less-variant parameters. Experiments show that the use of the enhanced parameters using such a hybrid approach increases the recognition rate of the CSR process in highly interfering car noise environments for a wide range of SNRs varying from 16 dB to 4 dB and when speech is submitted to the telephone channel degradation. The approach can be applied whatever the distortion of the vectors, provided that a fitness function can be identified. The front-end of the proposed KLT-GA-based CSR system does not require any a priori knowledge about the nature of the corrupting noisy signal, which allows dealing with any kind of noise. Moreover, using this enhancement technique avoids the noise estimation process that requires a speech/nonspeech preclassification, which may not be accurate at low SNRs. It is also interesting to note that such a technique is less complex than many other enhancement techniques, which need to either model or compensate for the noise. However, this enhancement technique requires

Table 2: ... rate (%Ins), deletion rate (%Del), and substitution rate (%Sub) of the MFCC_D_A-, KLT-MFCC_D_A-, and KLT-GA-MFCC_D_A-based HTK CSR systems using (a) 1-mixture, (b) 2-mixture, (c) 4-mixture, and (d) 8-mixture triphone models.

(a) %CWrd using 1-mixture triphone models.

System             %Sub    %Del    %Ins    %CWrd
MFCC_D_A           82.71   4.27    33.44   13.02
KLT-MFCC_D_A       77.05   5.11    30.04   17.84
KLT-GA-MFCC_D_A    54.48   5.42    25.42   40.10

(b) %CWrd using 2-mixture triphone models.

System             %Sub    %Del    %Ins    %CWrd
MFCC_D_A           81.25   3.44    38.44   15.31
KLT-MFCC_D_A       78.11   3.81    48.89   18.08
KLT-GA-MFCC_D_A    52.40   4.27    52.40   43.33

(c) %CWrd using 4-mixture triphone models.

System             %Sub    %Del    %Ins    %CWrd
MFCC_D_A           78.85   3.75    38.23   17.40
KLT-MFCC_D_A       76.27   4.88    39.54   18.85
KLT-GA-MFCC_D_A    49.69   5.62    25.31   44.69

(d) %CWrd using 8-mixture triphone models.

System             %Sub    %Del    %Ins    %CWrd
MFCC_D_A           78.02   3.96    40.83   18.02
KLT-MFCC_D_A       77.36   5.37    34.62   17.32
KLT-GA-MFCC_D_A    48.41   6.56    26.46   45.00

a large amount of data in order to find the best individual. Many other directions remain open for further work. Present goals include analyzing the evolved genetic parameters and evaluating how performance scales with other types of noise (nonstationary, limited band, etc.).

REFERENCES

[1] Y. Gong, "Speech recognition in noisy environments: A survey," Speech Communication, vol. 16, no. 3, pp. 261-291, 1995.

[2] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, 1979.

[3] D. Mansour and B. H. Juang, "A family of distortion measures based upon projection operation for robust speech recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 11, pp. 1659-1671, 1989.

[4] S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.

[5] H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "RASTA-PLP speech analysis technique," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 121-124, San Francisco, Calif, USA, March 1992.

[6] J. Hernando and C. Nadeu, "A comparative study of parameters and distances for noisy speech recognition," in Proc. Eurospeech 91, pp. 91-94, Genova, Italy, September 1991.

[7] C. R. Reeves and S. J. Taylor, "Selection of training data for neural networks by a Genetic Algorithm," in Parallel Problem Solving from Nature, pp. 633-642, Springer-Verlag, Amsterdam, The Netherlands, September 1998.

[8] A. Spalanzani, S.-A. Selouani, and H. Kabre, "Evolutionary algorithms for optimizing speech data projection," in Genetic and Evolutionary Computation Conference, p. 1799, Orlando, Fla, USA, July 1999.

[9] D. O'Shaughnessy, Speech Communications: Human and Machine, IEEE Press, Piscataway, NJ, USA, 2nd edition, 2000.

[10] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech, and Audio Processing, vol. 3, no. 4, pp. 251-266, 1995.

[11] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.

[12] J. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, Mich, USA, 1975.

[13] L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235-282, 1989.

[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, AI series. Springer-Verlag, New York, NY, USA, 1992.

[15] C. R. Houck, J. A. Joines, and M. G. Kay, "A genetic algorithm for function optimization: a Matlab implementation," Tech. Rep. 95-09, North Carolina State University, Raleigh, NC, USA, 1995.

[16] L. Davis, Ed., The Genetic Algorithm Handbook, chapter 17, Van Nostrand Reinhold, New York, NY, USA, 1991.

[17] B. H. Juang, L. R. Rabiner, and J. G. Wilpon, "On the use of bandpass liftering in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 765-768, Tokyo, Japan, April 1986.

[18] W. M. Fisher, G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database: specifications and status," in Proc. DARPA Speech Recognition Workshop, pp. 93-99, Palo Alto, Calif, USA, February 1986.

[19] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz, "NTIMIT: A phonetically balanced, continuous speech telephone bandwidth speech database," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 109-112, Albuquerque, NM, USA, April 1990.

[20] P. J. Moreno and R. M. Stern, "Sources of degradation of speech recognition in the telephone network," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 109-112, Adelaide, Australia, April 1994.

[21] X. D. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, K. F. Lee, and R. Rosenfeld, "The SPHINX-II speech recognition system: An overview," Computer, Speech and Language, vol. 7, no. 2, pp. 137-148, 1993.

[22] Cambridge University Speech Group, The HTK Book (Version 2.1.1), Cambridge University Group, March 1997.

[23] L. R. Bahl, P. V. de Souza, P. S. Gopalakrishnan, D. Nahamoo, and M. A. Picheny, "Decision trees for phonological rules in continuous speech," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 185-188, Toronto, Canada, May 1991.

[24] W. D. Gaylor, Telephone Voice Transmission. Standards and Measurements, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.

Sid-Ahmed Selouani received his B.E. degree in 1987 and his M.S. degree in 1991, both in electronic engineering from the University of Science and Technology of Algeria (U.S.T.H.B). He joined the Communication Langagière et Interaction Personne-Système (CLIPS) Laboratory of Université Joseph Fourier of Grenoble, taking part in the Algerian-French double degree program, and then he got a Docteur d'État degree in the field of speech recognition in 2000 from the University of Science and Technology of Algeria. From 2000 to 2002, he held a postdoctoral fellowship in the Multimedia Group at the Institut National de Recherche Scientifique (INRS-Télécommunications) in Montreal. He had teaching experience from 1991 to 2000 at the University of Science and Technology of Algeria before starting to work as an Assistant Professor at the Université de Moncton, Campus de Shippagan. He is also an Invited Professor at INRS-Télécommunications. His main areas of research involve speech recognition robustness and speaker adaptation by evolutionary techniques, auditory front-ends for speech recognition, integration of acoustic-phonetic indicative features knowledge in speech recognition, hybrid connectionist/stochastic approaches in speech recognition, language identification, and speech enhancement.

Douglas O'Shaughnessy has been a Professor at INRS-Télécommunications (University of Quebec) in Montreal, Canada, since 1977. For this same period, he has been an Adjunct Professor in the Department of Electrical Engineering, McGill University. Dr. O'Shaughnessy has worked as a Teacher and Researcher in the speech communication field for 30 years. His interests include automatic speech synthesis, analysis, coding and recognition. His research team is currently working to improve various aspects of automatic voice dialogues in English and French. He received his education from the Massachusetts Institute of Technology, Cambridge, MA (B.S. and M.S. degrees in 1972; Ph.D. degree in 1976). He is a Fellow of the Acoustical Society of America (1992) and an IEEE Senior Member (1989). From 1995 to 1999, he served as an Associate Editor for the IEEE Transactions on Speech and Audio Processing, and has been an Associate Editor for the Journal of the Acoustical Society of America since 1998. Dr. O'Shaughnessy has been selected as the General Chair of the 2004 International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Montreal, Canada. He is the author of the textbook Speech Communications: Human and Machine (IEEE Press, 2000).


© 2003 Hindawi Publishing Corporation

Dataset of Early Drosophila Gene Expression

Alexander Spirov

Department of Applied Mathematics and Statistics and The Center for Developmental Genetics, Stony Brook University,

Stony Brook, NY 11794-3600, USA

The Sechenov Institute of Evolutionary Physiology and Biochemistry, Russian Academy of Sciences, 44 Thorez Avenue,

St. Petersburg 194223, Russia

Email: spirov@kruppel.ams.sunysb.edu

David M. Holloway

Mathematics Department, British Columbia Institute of Technology, Burnaby, British Columbia, Canada V5G 3H2

Chemistry Department, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z1

Email: david holloway@bcit.ca

Received 10 July 2002 and in revised form 1 December 2002

Understanding how genetic networks act in embryonic development requires a detailed and statistically significant dataset integrating diverse observational results. The fruit fly (Drosophila melanogaster) is used as a model organism for studying developmental genetics. In recent years, several laboratories have systematically gathered confocal microscopy images of patterns of

activity (expression) for genes governing early Drosophila development. Due to both the high variability between fruit fly embryos

and diverse sources of observational errors, some new nontrivial procedures for processing and integrating the raw observations

are required. Here we describe processing techniques based on genetic algorithms and discuss their efficacy in decreasing observational errors and illuminating the natural variability in gene expression patterns. The specific developmental problem studied is

anteroposterior specification of the body plan.

Keywords and phrases: image processing, elastic deformations, genetic algorithms, observational errors, variability, fluctuations.

1. INTRODUCTION

aimed at deciphering how the blueprints of the body plan encrypted in DNA become a living, spatially patterned organism. Key to this process is ensembles of control genes acting

in concert to govern particular events in embryonic development. During developmental events, genes encoded in the

DNA are converted into spatial expression patterns on the

scale of the embryo. The genes, and their products, are active

players in regulating this pattern formation. In the first few

hours of fruit fly (Drosophila melanogaster) development, a

network of some 15-20 genes establishes a striped pattern of

gene expression around the embryo [1, 2] (Figure 1). These

stripes are the first manifestation of the segments which characterize the anteroposterior (AP) (head-to-tail) organization

of the fly body plan. Similar segmentation events occur in

other animals, including humans. Drosophila research helps

to understand the genetics underlying such processes.

Though Drosophila may be a relatively easy organism

in which to do developmental genetics, there remain many

processing of large set of gene expression images in order

to achieve an integrated and statistically significant detailed

view of the segmentation process.

It is not possible to observe all segmentation genes at

once in the same embryo over the duration of patterning.

Single embryos can be imaged for a maximum of three

segmentation genes. Embryos are killed in the fixing process prior to imaging. Therefore, data sets integrated from

multiple embryos, stained for the variety of segmentation

genes, and over the patterning period, are necessary for

gaining a complete picture of segmentation dynamics. In

addition, collecting images from multiple flies (hundreds)

allows us to quantitate the level of natural variability in

segmentation and the experimental error in collecting this

data.

More and more laboratories (including those engaged in the Drosophila Genome Project) are presenting images of embryos from confocal scanning, for example, [3, 4] (see http://urchin.spbcas.ru/Mooshka/ and

http://www.fruitfly.org/). All workers in this area face image


reconstruction for Drosophila. These images show the first indications of body segmentation in the embryo. (a) An image of a developing fruit-fly egg under light microscope. The egg is shaped like

a prolate ellipsoid. Dark dots are nuclei located just under the egg

surface. There are about 3000 nuclei in this image. The nuclei are

scanned to visualize the amount of one of the segmentation gene

products (even-skipped or eve) at each nucleus. The darker the nucleus, the greater the local concentration of eve. (b) A reconstructed

3D picture showing the arrangement of nuclei and visualizing the

eve pattern in a yellow-red-black palette.

from the results of confocal microscopy.

In this paper, we review problems in the field of processing confocal images of Drosophila gene expression and

present our processing techniques based on genetic algorithms (GAs). We will discuss their efficacy in decreasing observational errors and visualizing natural variability in gene

expression patterns.

2. DATA SETS FROM RAW IMAGES

Sources of variability in our images can be roughly subdivided into natural embryo variability in size and shape, natural expression pattern variability, errors of image processing

procedures, experimental errors (fixation, dyeing), observational errors (confocal scanning), and the molecular noise of

expression machinery.

Figure 2: Embryos of the same time class and the same length have different expression patterns. Eve stripes differ in spacing and overall domain along the anteroposterior (AP, x-) axis, and show stripe curvature in the dorsoventral (DV, y-) direction.

the limits of natural size variability). However, integration of

data from different flies requires size standardization.

Size variability was resolved by image preprocessing with

the Khoros package [5]. After a cropping procedure, each image was rescaled to the same length and width. Relative units

of percent egg length are used.
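The rescaling to relative units can be illustrated with a minimal sketch. It assumes that the cropped embryo is bounded by the extreme nucleus coordinates on each axis, which is a simplification introduced here; the function name is hypothetical and this is not the Khoros procedure itself.

```python
def to_percent_units(xs, ys):
    """Rescale pixel coordinates so that each axis runs from 0 to 100 percent.

    Assumption: the cropping box is given by the minimum and maximum
    coordinate on each axis.
    """
    def rescale(values):
        lo, hi = min(values), max(values)
        return [100.0 * (v - lo) / (hi - lo) for v in values]

    return rescale(xs), rescale(ys)


x_pct, y_pct = to_percent_units([10, 60, 110], [5, 15, 25])
```

After this step, embryos of different absolute size share the same 0-100% coordinate frame, which is what allows data from different flies to be pooled.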

2.2.

the positioning and proportions of expression patterns for

the same gene at the same developmental stage (Figure 2).

To match two images such as Figures 2a and 2b (in order to make integrated datasets), we use 2D elastic deformations. We treat separately the dorsoventral (DV) curvature differences and the AP spacing differences [6]. First,

we perform a 2D elastic deformation to straighten segmentation stripes. This step minimizes the DV contribution to

the AP patterning, especially to AP variability. Next, on

a pairwise basis, we move (in 1D) the stripes into register along the AP axis, minimizing the variability in stripe

spacing and overall expression domain. These two steps

make for a tough optimization procedure, which is probably

best solved with modern heuristic approaches such as GAs

[6].

Early embryos of isogenic fruit flies can differ in length by 30%. Regardless of such differences in size, expression patterns for segmentation remain qualitatively the same. This is

a classic case of scaling in biological pattern formation; the

2.3. Scanning error

After the above processing, images still have variability in fluorescence intensity due to experimental conditions. With image processing, we can address experimental or observational

[Figure 3 plot: fluorescence versus position along the DV axis (%).]

Figure 3: An example of the systematic DV distortion of an expression surface, with the gene Krüppel.

errors which have a systematic character. Due to the ellipsoidal geometry of the egg, nuclei in the center of the image

(along the AP axis) are closer to the microscope objective and

look brighter than nuclei at the top and bottom of the image.

Intensity shows a DV dependence (Figure 3). The brightness

depends (roughly) quadratically on DV distance from the AP

midline. We flatten this DV bias by a procedure of expression

surface stretching.

Figure 4 summarizes the three steps of image processing

which follow the scaling: stripe straightening, stripe registration, and expression surface stretching. The details of the

processing techniques are in Section 3.

After image processing, we can generate an integrated

dataset and begin to address questions regarding the segmentation patterning dynamics. We are pursuing two problems initially. First, we are visualizing the maturation of the

expression patterns for all segmentation genes over the patterning period. Second, since we have removed many of the

sources of variability in the images, what remains should be

largely indicative of intrinsic, molecular scale fluctuations in

protein concentrations. We are comparing relative noise levels within the segmentation signaling hierarchy. These are

some of the first tests of theoretical predictions for noise

propagation in segmentation signaling [7, 8]. In general,

both of these approaches should provide tests of existing theories for segment patterning.

3. METHODS

Gene expression was measured using fluorescently-tagged

antibodies as described in [9]. For each embryo, a 1024 × 1024 pixel image with 8 bits of fluorescence data in each of 3

channels was obtained (Figure 5). To obtain the data in terms

of nuclear location, an image segmentation procedure was

applied [10].


Figure 4: Steps for processing large sets of images to obtain an integrated dataset of segmentation pattern dynamics (a pair of images

used in this example). Stripe straightening minimizes the DV contribution to the AP patterning. Stripe registration minimizes the

variability in AP stripe positioning. Expression surface stretching

minimizes systematic observational errors in the DV direction.

an ASCII table containing a series of data records, one for

each nucleus. (About 2500-3500 nuclei are described for

each image.) Each nucleus is characterized by a unique identification number, the x- and y-coordinates of its centroid,

and the average fluorescence levels of three gene products.

At present, over 1000 images have been scanned and processed. Our dataset contains data from embryos stained for

14 gene products. Each embryo was stained for eve (Figures

1 and 2) and two other genes.

Time classification

All embryos under study belong to cleavage cycle 14 [11].

This cycle is about an hour long and is characterized by a

rapid transition of the pair-rule gene expression patterns,

which culminates in the formation of 7 stripes. The embryos

were classified into eight time classes primarily by observation of the eve pattern. This classification was later verified

by observation of the other patterns and by membrane invagination data.


for three gene products.

Our three main deformations introduced above (stripe straightening, registration, and surface stretching) are based on polynomial series. Due to the character of segmentation pattern variability, our deformations are reminiscent of an earlier attempt by Thompson [12] to quantitatively describe the mechanism of shape change. Stripe straightening looks quite similar to his famous image of a puffer fish to Mola mola fish transformation. This visually simple graphical technique was explicitly described by Bookstein [13, 14]. We have found that Drosophila segmentation patterns can also be related by such simple transformation functions.

The stripe-straightening procedure is a transformation of the AP, x-coordinate by the following polynomial:

$$x' = A x y^2 + B x^2 y + C x y^3 + D x^2 y^2, \qquad (1)$$

where $x = w - w_0$, $y = h - h_0$, $w$ and $h$ are initial spatial coordinates, and $w_0$, $h_0$, $A$, $B$, $C$, and $D$ are parameters. The y-coordinate remains the same while the x-coordinate is transformed as a function of both coordinates $w$ and $h$ (for details, see [6, 15, 16]). The parameters $w_0$, $h_0$, $A$, $B$, $C$, and $D$ for each image are found by means of GAs.

Our pairwise image registration procedure is the next step in the sequential transformation of the x-coordinate. We use the following polynomial for $x'$:

$$x' = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5, \qquad (2)$$

where the parameters $c_0, \dots, c_5$ are found by means of GAs for each image (for details, see [6, 16]).

Complete registration is achieved by sequential application of the polynomial transformations (1) and (2) to pairs of images. Complete registration within each time class relative to a starting image (the time class exemplar) gives sets of images suitable for constructing integrated datasets. If we then compare results across time classes, we are able to visualize detailed pattern dynamics over cell cycle 14.

The starting images in each time class, the time class exemplars, were chosen in the following way: the distance between each (stripe-straightened) image and every other (stripe-straightened) image in a time class was calculated using the registration cost function (see Section 3.3). These costs were summed for each image and the image with the lowest total cost was used as the starting image. All other images in the time class were registered to this image. The starting image was unaffected by the registration transformation [6].

We perform (fluorescence intensity) surface stretching to decrease DV distortion using the following polynomial:

$$Z' = Z + C_1 Y + C_2 Y^2 + C_3 XY + C_4 Y^3 + C_5 XY^2 + C_6 X^2 Y, \qquad (3)$$

where $X = w - W_0$, $Y = h - H_0$, $w$ and $h$ are initial spatial coordinates, and $W_0$, $H_0$, and the polynomial coefficients $C_i$ are parameters found by means of GAs. Note that $W_0$ and $H_0$ generally differ from $w_0$ and $h_0$ in expression (1).

The computing time for finding parameters by optimization techniques is comparable for the three polynomial transformations (1), (2), and (3), though stripe straightening (1) is the most time intensive [6, 15, 16].
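The three polynomial transformations can be written down directly. The sketch below is only an illustration: it reads the centred coordinates as x = w - w0 and y = h - h0 (and likewise X = w - W0, Y = h - H0 for the stretching step), which assumes that the minus signs were lost in typesetting, and all function names are hypothetical.

```python
def straighten_x(w, h, w0, h0, A, B, C, D):
    """Transformation (1): new AP coordinate from the centred coordinates x, y."""
    x, y = w - w0, h - h0
    return A * x * y**2 + B * x**2 * y + C * x * y**3 + D * x**2 * y**2


def register_x(x, c):
    """Transformation (2): fifth-order polynomial in x, c holds c0..c5."""
    return sum(ck * x**k for k, ck in enumerate(c))


def stretch_z(z, w, h, W0, H0, coeffs):
    """Transformation (3): correct intensity z using six DV-dependent terms."""
    X, Y = w - W0, h - H0
    C1, C2, C3, C4, C5, C6 = coeffs
    return (z + C1 * Y + C2 * Y**2 + C3 * X * Y
            + C4 * Y**3 + C5 * X * Y**2 + C6 * X**2 * Y)
```

In each case the free parameters (centre offsets and polynomial coefficients) are what the optimizer searches over; the functions themselves are cheap to evaluate, so the cost of the fit is dominated by the number of cost-function evaluations.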

3.3. Optimization by GAs

GAs, simplex, and a hybrid of these [6, 16]. Fitting polynomial coefficients is fairly routine and can be solved with any

GA library. All we need is to define cost functions for our

three particular tasks.

We used a standard GA approach in a classic evolutionary strategy (ES). ES was developed by Rechenberg [17] and

Schwefel [18] for computer solution of optimization problems. ES algorithms consider the individual as the object

to be optimized. The character data of the individual are the

parameters to be optimized in an evolutionary-based process. These parameters are arranged as vectors of real numbers for which operations of crossover and mutation are

defined.

In the GA, the program operates on a population of floating-point chromosomes. At each step, the program evaluates every chromosome according to a cost function (below). Then, according to a truncation strategy, an average score is calculated. Copies of chromosomes with scores exceeding the average replace all chromosomes with scores less than average. After this, a predetermined proportion of the chromosome population undergoes mutation in which one of the coefficients gets a small increment. This whole cycle is repeated until a desired level of optimization is achieved.
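The cycle described above can be sketched in a few lines. This is a minimal illustration, not the EO library implementation: it assumes that "better than average" means a lower cost, and the population size, mutation share, step size, and initial range are arbitrary placeholder values.

```python
import random


def evolve(cost, n_params, pop_size=40, mutation_rate=0.2, step=0.05,
           generations=200, seed=0):
    """Truncation-style GA on real-valued chromosomes (lower cost is better)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        costs = [cost(ch) for ch in pop]
        mean_cost = sum(costs) / len(costs)
        # Truncation: keep chromosomes at least as good as the average ...
        good = [ch for ch, c in zip(pop, costs) if c <= mean_cost]
        # ... and let copies of them replace the worse-than-average ones.
        pop = [list(good[i % len(good)]) for i in range(pop_size)]
        # Mutation: a fixed share of chromosomes gets one coefficient nudged.
        for ch in rng.sample(pop, int(mutation_rate * pop_size)):
            ch[rng.randrange(n_params)] += rng.uniform(-step, step)
    return min(pop, key=cost)


best = evolve(lambda ch: sum(v * v for v in ch), n_params=2)
```

The toy cost here (a sum of squares) stands in for the image-based cost functions; in practice a fixed generation count would be replaced by a stopping rule on the cost.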

[Figure plot: axes labeled A-P axis and D-V axis.]

UNIX, and in Borland and DEC Pascal. Details of the EO-0.8.5 C++ library implementation have been published [6, 16].

4.

The following procedure evaluates chromosomes during the GA calculation for stripe straightening. Each image was subdivided into a series of longitudinal strips (Figure 6). Each strip is subdivided into bins, and a mean brightness (local fluorescence level) is calculated for each bin. Each row of means gives a profile of local brightness along each strip. The cost function is computed by pairwise comparison of all profiles and summing the squares of differences between the strips. The task of the stripe-straightening procedure is to minimize this cost function.
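A minimal sketch of this cost function follows. It assumes that nuclei are given as (x, y, z) triples with x and y already in percent units and z the fluorescence level; the numbers of strips and bins and the handling of empty bins are illustrative choices, not taken from the paper.

```python
def strip_profiles(nuclei, n_strips=5, n_bins=10):
    """Mean fluorescence per AP bin within each longitudinal strip.

    nuclei: iterable of (x, y, z) with x, y in 0-100 percent units.
    Empty bins are reported as None.
    """
    sums = [[0.0] * n_bins for _ in range(n_strips)]
    counts = [[0] * n_bins for _ in range(n_strips)]
    for x, y, z in nuclei:
        s = min(int(y / 100.0 * n_strips), n_strips - 1)
        b = min(int(x / 100.0 * n_bins), n_bins - 1)
        sums[s][b] += z
        counts[s][b] += 1
    return [[sums[s][b] / counts[s][b] if counts[s][b] else None
             for b in range(n_bins)] for s in range(n_strips)]


def straightening_cost(profiles):
    """Sum of squared differences over all pairs of strip profiles."""
    total = 0.0
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            for a, b in zip(profiles[i], profiles[j]):
                if a is not None and b is not None:
                    total += (a - b) ** 2
    return total


xs = range(5, 100, 10)
ys = (10, 30, 50, 70, 90)
straight = [(x, y, float(x)) for x in xs for y in ys]    # intensity depends on x only
curved = [(x, y, float(x + y)) for x in xs for y in ys]  # intensity varies with y too
```

When intensity depends only on the AP coordinate, all strip profiles coincide and the cost is zero; any DV dependence, such as stripe curvature, raises it.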

As discussed in the introduction, fluorescence intensity measurements demonstrate high variability and are subject to diverse observational and experimental errors. Our aim with

the image processing is to decrease some of the observational

and experimental errors and help distinguish these from the

natural variability which we would like to study (i.e., characterization of the stochastic nature of molecular processes in

this gene network). We will discuss the efficacy of the image

processing by comparison of initial and residual variability in

our data.

4.1.

at each nucleus ($Z$), ranked according to DV position (y-coordinate) while the x-coordinate was ignored. Argument $Z_j$ is a given nucleus fluorescence level and $Z_{j+1}$ and $Z_{j-1}$ are fluorescence levels for its two nearest (DV) neighbors. Our tests show that $F_1$ is better for our purposes.

as possible (by heuristic optimizations) between the data

within a time class. Figure 7a shows a superposition of about a hundred eve expression surfaces after stripe straightening

and registration. (The intensity data is discrete at nuclear resolution but we display some of our results as continuously

interpolated expression surfaces.)

Embryo-to-embryo variability of the expression pattern

for the first ten zygotic segmentation genes we are studying is

similar to that for eve. Because of the two-dimensionality of

the expression surface and the irregularity of nuclear distribution, quantitative comparison of this variability is a tough

biometric task.

One way to simplify the problem is to compare representative cross-sections through the expression surface along

the midline of an embryo in the AP direction (e.g., Figure 6,

center strip). For all nuclei with centroids located between

50% and 60% embryo width (DV position), expression levels were extracted and ranked by AP coordinate. This array of 250-350 nuclei gives an AP transect through the expression

surface [19].
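Extracting such a transect is a simple filter-and-sort. The sketch below assumes nuclei stored as (x, y, z) triples in percent units with z the fluorescence level; the function name and data layout are illustrative.

```python
def ap_transect(nuclei, dv_min=50.0, dv_max=60.0):
    """Nuclei whose centroids lie in the DV band, ranked by AP coordinate.

    Returns a list of (x, z) pairs sorted by x.
    """
    band = [(x, z) for x, y, z in nuclei if dv_min <= y <= dv_max]
    return sorted(band)


sample = [(30.0, 55.0, 100.0), (10.0, 52.0, 40.0), (20.0, 80.0, 7.0)]
transect = ap_transect(sample)
```

Reducing each embryo to a 1D profile in this way sidesteps the irregular 2D arrangement of nuclei, at the price of discarding most of the DV information.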

Using these transects, we can measure the effect on embryo-to-embryo variability of our processing steps. Figure 7b shows the variability after rescaling and stripe straightening (before complete registration) for about a hundred eve expression profiles from the 8th time class (Figure 7c). Intensity means at each AP position are shown with error bars (standard deviation). Minimizing stripe spacing variability, by registration, reduces the error bars significantly (Figures 7d and 7e). In addition to molecular-level fluctuations in gene expression, one of the remaining sources of error in Figures 7d and 7e may be experimental variability in intensity (from fixing and dyeing procedures, as well as variability in microscope scanning), estimated at 10-15% of the 0-255 intensity scale. Normalization of this variability may require both image processing and empirical solutions.

3.3.4 Implementation

4.2.

both in EO-0.8.5 C++ library [4] for DOS/Windows and

Due to systematic distortions in intensity data, however, the

To evaluate the similarity of a registering image to the reference image (time class exemplar), we use an approach similar to the previous one. We take longitudinal strips from the midlines of the registering and reference images (e.g., Figure 6, center strip). The strips are subdivided into bins and mean brightness calculated for each bin. Each row of means gives the local brightness profile along each embryo. The cost function is computed by comparing the profiles and summing the squares of differences between them. Registration proceeds until this cost is minimized.
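The registration cost, and its use in choosing the time class exemplar as the image with the lowest summed cost to all others, can be sketched as follows. Profiles are assumed to be equal-length lists of bin means; the function names are hypothetical.

```python
def registration_cost(profile_a, profile_b):
    """Sum of squared differences between two midline brightness profiles."""
    return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))


def pick_exemplar(profiles):
    """Index of the profile with the lowest total cost to all other profiles."""
    totals = [sum(registration_cost(p, q) for q in profiles) for p in profiles]
    return totals.index(min(totals))


profiles = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
exemplar = pick_exemplar(profiles)
```

Choosing the most central image as the reference keeps the deformations applied to the remaining images as small as possible on average.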

3.3.3 Cost function for surface stretching

To minimize distortion of the (fluorescence intensity) expression surface along the DV direction (y-coordinate), we

tested two cost functions based on discrete approximations

of first- and second-order derivatives in y:

$$F_1 = \sum_j \left(Z_j - Z_{j+1}\right)^2, \qquad F_2 = \sum_j \left(2Z_j - Z_{j+1} - Z_{j-1}\right)^2. \qquad (4)$$
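Both cost functions are easy to state in code. The sketch below assumes that the squared differences are summed over all nuclei once they are ranked by DV position; the summation is an assumption, since only the individual terms are legible in the formula, and the function names are illustrative.

```python
def f1(z):
    """First-difference cost: squared jumps between DV-neighbouring nuclei."""
    return sum((z[j] - z[j + 1]) ** 2 for j in range(len(z) - 1))


def f2(z):
    """Second-difference cost: squared discrete curvature along DV."""
    return sum((2 * z[j] - z[j + 1] - z[j - 1]) ** 2
               for j in range(1, len(z) - 1))
```

A flat intensity sequence gives zero for both costs, while a linear ramp gives zero only for the second-difference form, so the first-difference form also penalizes any residual DV slope.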

[Figure 7 plots: fluorescence versus AP position (% egg length); panel (a) also spans DV position (% EL); panels (a)-(e).]

Figure 7: Superposition of about a hundred images for eve gene expression from time class 8 (late cycle 14). (a) Superposition of all eve expression surfaces after the stripe straightening and registration. (b) Variability of expression profiles for gene eve after the stripe-straightening procedure. (c) Mean intensity at each AP position, with standard deviation error bars for the expression profiles from (b). (d) Residual variability for the same dataset after stripe straightening and registration. (e) Mean intensity with standard deviation error bars for the expression profiles from (d). These have decreased significantly with stripe registration. Data for the 1D profiles is extracted from 10% (DV) longitudinal strips (e.g., Figure 6, center strip). Cubic spline interpolation was used to display discrete data.

expression surface for such an embryo looks like a half ellipsoid (Figures 8a and 8b). The fluorescence level at the edges

of the image is about 20 arbitrary units, while in the center it

is about 60 units. (The expression surface follows the geometry of the embryo as illustrated in Figure 1b.) Even in eve null

mutants, background fluorescence shows this distortion.

[Figure 8 plots: four 3D panels (a)-(d) with axes (X, Y, Z).]

Figure 8: Surface stretching transformation. (a) and (b) Experimental expression surface and scatter plot, for a truly uniform distribution of the eve gene product. (c) and (d) Expression surface and scatter plot after surface stretching, minimizing the systematic errors in intensity data.

The stretching procedure transforms the expression surface along the DV, y-axis (Figures 8c and 8d). Minimizing

the systematic observational error in this direction gives us a

chance to directly observe nucleus-to-nucleus variability in a

single embryo (Figure 8c).

5.

We have found heuristic optimization procedures (transformations (1), (2), and (3)) to be a simple and effective way to

reduce observational errors in embryo images. This reduction of variability allows us to focus on the variability intrin-

cycle 14. Here, we give an overview of some of our results

with processed datasets.

5.1. Integrated dataset

As mentioned in the introduction, dataset integration from

multiple scanned embryos is necessary due to the impossibility of simultaneously staining embryos for all segmentation genes at once (the current limit is triple staining). Other

work [19, 20] has begun to address the processing necessary to standardize images for dataset integration. Myasnikova et al. [19] have used transects, as in Figures 7b and

7c, and have done stripe registration of the profiles (with

[Figure 9: expression surfaces for gt and hb; axes: A-P position, D-V position, fluorescence.]

class 8 (late cycle 14) for the gap genes hunchback (hb), giant (gt),

Krüppel, and knirps (kni) and the pair-rule gene eve. Each surface

is the gene expression for a time class exemplar (as discussed in

Section 3).

of stripe straightening and surface stretching, allowing for

the construction of 2D expression surfaces and integrated

datasets (Figure 9). These steps also minimize contributions

to AP variability from DV sources, clarifying the task of

studying molecular sources of intensity variability.

More such processed segmentation patterns are posted and updated on the website HOX Pro (http://www.iephb.nw.ru/hoxpro, [21]) and the web resource DroAtlas (http://www.iephb.nw.ru/spirov/atlas/atlas.html).

5.2. Dynamics of profile maturation

Any analysis of the formation of gene expression patterns

must address the striking dynamics over cycle 14. Especially

in early cycle 14, these patterns are quite transient, only settling down around mid-cycle 14 to the segmentation pattern.

Comparative analysis of pattern dynamics for the pair-rule

genes is particularly important. Essential questions on the

mechanisms underlying these striped patterns are still open

[22, 23].

The only way to trace the patterning in sufficient detail

to address these questions is to integrate large sets of embryo images over these developmental stages. (Time ranking within cycle 14 is not a simple task. Presently, it takes an

expert to rank images into time classes. We are developing

automated software for ranking, to be published elsewhere.)

AP profiles which have been registered can be integrated into

composite pictures like Figure 10, which plots AP distance

horizontally against time (at the 8 time class resolution) vertically, with intensity in the outward direction.

Figure 10 allows us to examine a number of features of

cycle 14 expression dynamics. Gap genes tend to establish

sharp spatial boundaries earlier than the pair-rule genes.

Pair-rule genes are initially expressed in broad domains,

which later partition into seven stripes. The regularity of the

Figure 10: AP profiles of expression for the gap genes gt, hb, kni, and pair-rule genes eve and hairy (h). Horizontal coordinate is spatial AP axis (from left to right); vertical coordinate is time axis (from up to down); expression axis is perpendicular to the plane of the diagrams. White numbers mark individual stripes of eve and hairy.

late cycle pattern is well covered in the literature, but the details of the early dynamics are not so well characterized.

All five genes show a movement towards the middle of

the embryo, with anterior expression domains moving posteriorly and posterior domains moving anteriorly. In more

detail, the small anterior domain of knirps (white arrowhead)

appears to move posteriorly at the same speed as eve stripe 1

(also marked by white arrowhead). It appears that we can see

interactions between hb and gt in the posterior: a posterior

gt peak forms first, but as posterior hb forms, the gt peak

moves anteriorly. This interaction appears to be reflected in

the movement of stripe 7 of eve and h (black arrowheads).

We hope that further study of the correlation between expression domains over cycle 14 and observation of the fine

gene-specific details of domain dynamics will serve to test

theories of pattern formation in Drosophila segmentation.


the genes, but the anterior edge of the eve stripe is relatively

well controlled. Figure 11b shows means and standard deviations at each AP position. We are using this type of data to

address how noise is propagated and filtered in the segmentation network (to appear elsewhere).
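The per-position means and standard deviations mentioned here can be sketched as a simple binning of single-nucleus measurements. This is a minimal illustration only: the function name `profile_by_ap_bin`, the 1% bin width, and the `(AP position, intensity)` tuple layout are assumptions, not the authors' software.

```python
from statistics import mean, pstdev


def profile_by_ap_bin(nuclei, bin_width=1.0):
    """Group (ap_percent, intensity) pairs into AP bins.

    Returns a dict mapping the left edge of each bin to a
    (mean intensity, population standard deviation) tuple.
    The bin width is a hypothetical choice for illustration.
    """
    bins = {}
    for ap_percent, intensity in nuclei:
        left_edge = (ap_percent // bin_width) * bin_width
        bins.setdefault(left_edge, []).append(intensity)
    return {
        left_edge: (mean(values), pstdev(values))
        for left_edge, values in sorted(bins.items())
    }
```

Each bin then supplies one point and one error bar of a profile like the one in Figure 11b.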

To conclude, we have applied image processing steps to minimize particular sources of experimental and observational error in the scanned images of segmentation gene expression. Cropping and scaling address embryo size variability. Stripe straightening eliminates variable DV contributions to the AP pattern. Registration minimizes differences in expression domains and spacing for pair-rule genes. Expression surface stretching minimizes systematic observational error along the y-axis. The combination of these procedures allows us to create composite 2D expression surfaces for the segmentation genes and to investigate pattern dynamics over cycle 14. These procedures also allow us to do single-embryo statistics, eliminating many sources of experimental variability in order to address molecular-level noise in the genetic machinery.


ACKNOWLEDGMENT

The work of AS is supported by USA National Institutes of

Health, Grant RO1-RR07801, INTAS Grant 97-30950, and

RFBR Grant 00-04-48515.

[Figure 11, panels (a) and (b): fluorescence versus AP position (% egg length).]

Figure 11: Eve and bcd fluorescence scatterplots and profiles (early

cycle 14, time class 1), sampled from a 50% DV longitudinal strip.

(a) Scatterplots after stripe straightening and surface stretching.

Each dot is the intensity for a single nucleus. (b) Curves of mean

intensity at each AP position, with standard deviation error bars.

Pictures like Figure 7c give us glimpses into the molecular-level fluctuations existing in this gene network. However, such data still display variability in scanning between embryos and over time with the experimental procedure. With stripe straightening and surface stretching, we have a chance to look at nucleus-to-nucleus variability in single embryos, eliminating many sources of experimental error. (The drawback is that we are limited to triple-stained embryos.)

Figure 11a shows the maternal protein bicoid (bcd) (exponential) and expression of eve (single peak, the future eve stripe 1) for a single embryo in early cycle 14. This image was made from a 50% DV longitudinal strip so that the observed variation at any AP position is that in the DV direction (e.g., along a stripe). Each dot is the intensity for a single nucleus. The variation in this plot is largely due to natural, molecular-level fluctuations in gene expression. At this developmental

REFERENCES

[1] M. Akam, "The molecular basis for metameric pattern in the Drosophila embryo," Development, vol. 101, no. 1, pp. 1–22, 1987.

[2] P. A. Lawrence, The Making of a Fly, Blackwell Scientific Publications, Oxford, UK, 1992.

[3] B. Houchmandzadeh, E. Wieschaus, and S. Leibler, "Establishment of developmental precision and proportions in the early Drosophila embryo," Nature, vol. 415, no. 6873, pp. 798–802, 2002.

[4] M. Keijzer, J. J. Merelo, G. Romero, and M. Schoenauer, "Evolving objects: a general purpose evolutionary computation library," in Proc. 5th Conference on Artificial Evolution (EA-2001), P. Collet, C. Fonlupt, J.-K. Hao, E. Lutton, and M. Schoenauer, Eds., number 2310 in Springer-Verlag Lecture Notes in Computer Science, pp. 231–244, Springer-Verlag, Le Creusot, France, 2001.

[5] J. Rasure and M. Young, "An open environment for image processing software development," in Proceedings of 1992 SPIE/IS&T Symposium on Electronic Imaging, vol. 1659 of SPIE Proceedings, pp. 300–310, San Jose, Calif, USA, February 1992.

[6] A. V. Spirov, A. B. Kazansky, D. L. Timakin, J. Reinitz, and D. Kosman, "Reconstruction of the dynamics of the Drosophila genes from sets of images sharing a common pattern," Journal of Real-Time Imaging, vol. 8, pp. 507–518, 2002.

[7] D. Holloway, J. Reinitz, A. V. Spirov, and C. E. Vanario-Alonso, "Sharp borders from fuzzy gradients," Trends in Genetics, vol. 18, no. 8, pp. 385–387, 2002.

[8] T. C. Lacalli and L. G. Harrison, "From gradients to segments: models for pattern formation in early Drosophila embryogenesis," Semin. Dev. Biol., vol. 2, pp. 107–117, 1991.

[9] D. Kosman, S. Small, and J. Reinitz, "Rapid preparation of a panel of polyclonal antibodies to Drosophila segmentation proteins," Development Genes and Evolution, vol. 5, no. 208, pp. 290–294, 1998.

[10] D. Kosman, J. Reinitz, and D. H. Sharp, "Automated assay of gene expression at cellular resolution," in Proc. Pacific Symposium on Biocomputing (PSB 98), R. Altman, K. Dunker, L. Hunter, and T. Klein, Eds., pp. 6–17, World Scientific Press, Singapore, 1998.

[11] V. A. Foe and B. M. Alberts, "Studies of nuclear and cytoplasmic behaviour during the five mitotic cycles that precede gastrulation in Drosophila embryogenesis," Journal of Cell Science, vol. 61, pp. 31–70, 1983.

[12] D. W. Thompson, On Growth and Form, Cambridge University Press, Cambridge, UK, 1917.

[13] F. L. Bookstein, "When one form is between two others: an application of biorthogonal analysis," American Zoologist, vol. 20, pp. 627–641, 1980.

[14] F. L. Bookstein, Morphometric Tools for Landmark Data: Geometry and Biology, Cambridge University Press, Cambridge, UK, 1991.

[15] A. V. Spirov, D. L. Timakin, J. Reinitz, and D. Kosman, "Experimental determination of Drosophila embryonic coordinates by genetic algorithms, the simplex method, and their hybrid," in Proc. 2nd European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP 00), S. Cagnoni and R. Poli, Eds., number 1803 in Springer-Verlag Lecture Notes in Computer Science, pp. 97–106, Springer-Verlag, Edinburgh, Scotland, UK, April 2000.

[16] A. V. Spirov, D. L. Timakin, J. Reinitz, and D. Kosman, "Using of evolutionary computations in image processing for quantitative atlas of Drosophila genes expression," in Proc. 3rd European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP 01), E. J. W. Boers, J. Gottlieb, P. L. Lanzi, et al., Eds., number 2037 in Springer-Verlag Lecture Notes in Computer Science, pp. 374–383, Springer-Verlag, Lake Como, Milan, Italy, April 2001.

[17] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart, Germany, 1973.

[18] H.-P. Schwefel, Numerical Optimization of Computer Models, John Wiley & Sons, Chichester, UK, 1981.

[19] E. M. Myasnikova, A. A. Samsonova, K. N. Kozlov, M. G. Samsonova, and J. Reinitz, "Registration of the expression patterns of Drosophila segmentation genes by two independent methods," Bioinformatics, vol. 17, no. 1, pp. 3–12, 2001.

[20] K. Kozlov, E. Myasnikova, A. Pisarev, M. Samsonova, and J. Reinitz, "A method for two-dimensional registration and construction of the two-dimensional atlas of gene expression patterns in situ," In Silico Biology, vol. 2, no. 2, pp. 125–141, 2002.

[21] A. V. Spirov, M. Borovsky, and O. A. Spirova, "HOX Pro DB: the functional genomics of hox ensembles," Nucleic Acids Research, vol. 30, no. 1, pp. 351–353, 2002.

[22] J. Reinitz, E. Mjolsness, and D. H. Sharp, "Model for cooperative control of positional information in Drosophila by bicoid and maternal hunchback," The Journal of Experimental Zoology, vol. 271, no. 1, pp. 47–56, 1995.

[23] J. Reinitz and D. H. Sharp, "Mechanism of eve stripe formation," Mechanisms of Development, vol. 49, no. 1-2, pp. 133–158, 1995.


Alexander Spirov is an Adjunct Associate

Professor in the Department of Applied

Mathematics and Statistics and the Center for Developmental Genetics at the State

University of New York at Stony Brook,

Stony Brook, New York. Dr. Spirov was born

in St. Petersburg, Russia. He received M.S.

degree in molecular biology in 1978 from

the St. Petersburg State University, St. Petersburg, Russia. He received his Ph.D. in

the area of biometrics in 1987 from the Irkutsk State University,

Irkutsk, Russia. His research interests are in computational biology and bioinformatics, web databases, data mining, artificial intelligence, evolutionary computations, animates, artificial life, and

evolutionary biology. He has published about 80 publications in

these areas.

David M. Holloway is an instructor of

mathematics at the British Columbia Institute of Technology and a Research Associate

in chemistry at the University of British

Columbia, Vancouver, Canada. His research

is focused on the formation of spatial pattern in developmental biology (embryology) in animals and plants. Topics include

the establishment and maintenance of differentiation states, coupling between chemical pattern and tissue growth for the generation of shape, and the

effects of molecular noise on spatial precision. This work is chiefly

computational (the solution of partial differential equation models

for developmental phenomena), but also includes data analysis for

body segmentation in the fruit fly. He received his Ph.D. in physical

chemistry from the University of British Columbia in 1995, and did

postdoctoral fellowships there and at the University of Copenhagen

and Simon Fraser University.

© 2003 Hindawi Publishing Corporation

Time-Varying Recursive Systems

Michael S. White

Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK

Email: mike@whitem.com

Stuart J. Flockton

Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK

Email: s.flockton@rhul.ac.uk

Received 28 June 2002 and in revised form 29 November 2002

A comparison is made of the behaviour of some evolutionary algorithms in time-varying adaptive recursive filter systems. Simulations show that an algorithm including random immigrants outperforms a more conventional algorithm using the breeder genetic algorithm as the mutation operator when the time variation is discontinuous, but neither algorithm performs well when the time variation is rapid but smooth. To meet this deficit, a new hybrid algorithm which uses a hill climber as an additional genetic operator, applied for several steps at each generation, is introduced. A comparison is made of the effect of applying the hill-climbing operator a few times to all members of the population or a larger number of times solely to the best individual; it is found that applying it to the whole population yields the better results, substantially improved compared with those obtained using earlier methods.

Keywords and phrases: recursive filters, evolutionary algorithms, tracking.

1. INTRODUCTION

Many problems in signal processing may be viewed as system identification. A block diagram of a typical system identification configuration is shown in Figure 1. The information available to the user is typically the input and the noise-corrupted output signals, x(n) and a(n), respectively, and

the aim is to identify the properties of the unknown system by, for example, putting an adaptive filter of a suitable

structure in parallel to the unknown system and altering the

parameters of this filter to minimise the error signal (n).

When the nature of the unknown system requires pole-zero

modelling, there is a difficulty in adjusting the parameters

of the adaptive filter, as the mean square error (MSE) is a

nonquadratic function of the recursive filter coefficients, so

the error surface of such a filter may have local minima as

well as the global minimum that is being sought. The ability

of evolutionary algorithms (EAs) to find global minima of

multimodal functions has led to their application in this area

[1, 2, 3, 4].

All these authors have considered only time-invariant

unknown systems. However in many real-life applications,

time variations are an ever-present feature. In noise or echo

cancellation, for example, the unknown system represents

Movements inside or outside of the recording environment

cause the characteristics of this filter to change with time.

The system to be identified in an HF transmission system

corresponds to the varying propagation path through the atmosphere. Hence there is an interest in investigating the applicability of evolutionary-based adaptive system identification algorithms to tracking time-varying recursive systems.

Previous work on the use of EAs in time-varying systems

has been published in [5, 6, 7, 8, 9] but none of these deal

with system identification of recursive systems. After explaining our choice of filter structure in Section 3, we go on in

Section 4 to compare the performance of the EA introduced

in [4] with that of the algorithm in [7]. We show that while

both can cope reasonably well with slow variations in the system parameters, the approach of [7] is more successful in the

case of discontinuous changes, but neither copes well where

the variation is smooth but fairly rapid (the distinction between slow and rapid variation is explained quantitatively

in Section 3.1). In Section 5, we propose a new hybrid algorithm which embeds what is in effect a hill-climbing operator within the EA and show that this new algorithm is much

more successful for the difficult problem of tracking rapid

variations.

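[Figure 1 block diagram: the input x(n) feeds the unknown system H(z) and the adaptive filter in parallel; noise w(n) is added to the unknown-system output y(n) to give a(n), which is compared with the adaptive filter output to form the error.]

[Figure 2 block diagram: lattice filter stages with z^{-1} delays between the input x(n) and the output y(n).]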

Figure 1: System identification.

2. ENVIRONMENTS

The standard genetic algorithm (GA), with its strong selection policy and low rate of mutation, quickly eliminates diversity from the population as it proceeds. In typical function

optimization applications, where the environment remains

static, we are not usually concerned with the population diversity at later stages of the search, so long as the best or mean

value of the population fitness is somewhere near to an acceptable value. However, when the function to be optimized

is nonstationary, the standard GA runs into considerable

problems once the population has substantially converged

on a particular region of the search space. At this point, the GA is effectively reliant on the small number of random mutations, occurring each generation, to somehow redirect its search to regions of higher fitness since standard crossover operators are ineffective when the population has become largely homogeneous. This view is borne out by Pettit and Swigger's study [10] in which a Holland-type GA was compared to cognitive (statistical predictive) and random point-mutation models in a stochastically fluctuating environment. In all cases, the GA performed poorly in tracking the changing environment even when the rate of fluctuation was slow.

An approach to providing EAs capable of functioning well in time-varying systems is the mutation-based strategy adopted by Cobb and Grefenstette [5, 6, 7]. In this approach, population diversity is sustained either by replacing a proportion of the standard GA's population with randomly generated individuals, the random immigrants strategy, or by increasing the mutation rate when the performance of the GA degrades (triggered hypermutation). Cobb's hypermutation operator is

adaptive, briefly increasing the mutation rate when it detects

that a degradation of performance (measured as a running

average of the best performing population members over five

generations) has occurred. However, it is easy to contrive categories of environmental change which would not trigger the

hypermutable state. On continuously changing functions,

the hypermutation GA has a greater variance in its tracking

performance than either the standard or random immigrants

GA. In oscillating environments, where the changes are more

drastic, the high mutation level of the hypermutation GA

destroys much of the information contained in the current

its prior state, the GA has to locate the previous optimum

from scratch.
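The triggering rule described above (raise the mutation rate when a five-generation running average of the best fitness degrades) can be sketched as follows. This is only an illustration under stated assumptions: the class name `HypermutationTrigger`, the two rate values, and the convention that higher fitness is better are hypothetical and are not taken from Cobb's work.

```python
from collections import deque


class HypermutationTrigger:
    """Switch between a base and a raised mutation rate (illustrative sketch)."""

    def __init__(self, base_rate=0.001, hyper_rate=0.5, window=5):
        self.base_rate = base_rate      # assumed normal mutation rate
        self.hyper_rate = hyper_rate    # assumed raised mutation rate
        self.history = deque(maxlen=window)  # best fitness of recent generations
        self.previous_average = None

    def update(self, best_fitness):
        """Record this generation's best fitness and return the rate to use next."""
        self.history.append(best_fitness)
        average = sum(self.history) / len(self.history)
        degraded = (
            self.previous_average is not None
            and average < self.previous_average
        )
        self.previous_average = average
        return self.hyper_rate if degraded else self.base_rate
```

A change in the environment that leaves the running average untouched never fires the trigger, which is the weakness noted in the text.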

3.

One of the main difficulties encountered in recursive adaptive systems is the fact that the system can become unstable if the coefficients are unconstrained. With many filter structures, it is not immediately obvious whether any particular set of coefficients will result in the presence of a pole outside the unit circle, and hence instability. On the other hand, it is important that the adaptive algorithm is able to cover the entire stable coefficient space, so it is desirable to adopt a structure which will make this possible at the same time as making stability monitoring easy. It is for this reason that the pole-zero lattice filter [11] was adopted for this work. A block diagram of the filter structure is given in Figure 2.

The input-output relation of the filter is given by

y(n) = \sum_{i=0}^{N} \nu_i(n) B_i(n),    (1)

where F_i(n) and B_i(n) are the forward and backward residuals denoted by

B_i(n) = B_{i-1}(n) + k_i(n) F_i(n),    i = 1, 2, \ldots, N,
\ldots,    i = N, \ldots, 1,
F_N(n) = x(N).    (2)

It can be shown that a necessary and sufficient condition for all of the roots of the pole polynomial to lie within the unit circle is |k_i| < 1, i = 1, ..., N, so the stability of candidate models can be guaranteed merely by restricting the range over which the feedback coefficients are allowed to vary. Since this must be done when implementing the GA anyway, the ability to maintain filter stability is essentially obtained without cost.
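To make the stability argument concrete, the following sketch filters a signal through a pole-zero lattice and rejects any coefficient set with a feedback coefficient of magnitude one or more. It assumes the standard Gray-Markel two-multiplier recursion, which may differ in indexing from the form used in this paper; the function name `lattice_filter` and the argument names `k` (feedback coefficients) and `v` (feedforward taps) are illustrative.

```python
def lattice_filter(x, k, v):
    """Pole-zero lattice filter, standard Gray-Markel form (assumed).

    x: input samples
    k: feedback (reflection) coefficients k_1..k_N, each with |k_i| < 1
    v: feedforward tap coefficients v_0..v_N
    """
    n_stages = len(k)
    if len(v) != n_stages + 1:
        raise ValueError("v must have one more entry than k")
    if any(abs(k_i) >= 1.0 for k_i in k):
        raise ValueError("unstable: every |k_i| must be below 1")

    backward_prev = [0.0] * (n_stages + 1)  # B_i(n-1) for i = 0..N
    output = []
    for sample in x:
        forward = sample                     # F_N(n) = x(n)
        backward = [0.0] * (n_stages + 1)
        for i in range(n_stages, 0, -1):
            # F_{i-1}(n) = F_i(n) - k_i * B_{i-1}(n-1)
            forward = forward - k[i - 1] * backward_prev[i - 1]
            # B_i(n) = B_{i-1}(n-1) + k_i * F_{i-1}(n)
            backward[i] = backward_prev[i - 1] + k[i - 1] * forward
        backward[0] = forward                # B_0(n) = F_0(n)
        output.append(sum(v_i * b_i for v_i, b_i in zip(v, backward)))
        backward_prev = backward
    return output
```

Because the check is a simple bound on each `k` entry, a GA that already clips its genes to a fixed range gets stability at no extra cost, as the text observes.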

3.1.

being tracked

employs the concept of the nonstationarity degree to embody

the notions of both the size and speed of time variations. The

d(n) = \sqrt{ \frac{E\{ t(n)^2 \}}{\varepsilon_{\min}(n)} }    (3)

in the unknown system and \varepsilon_{\min}(n) is the output noise power in the absence of time variations in the system.

Having devised a metric incorporating both the speed

and size of time variations, Macchi [12] goes on to describe

three distinct classes of nonstationarity. Slow variations are

those in which the nonstationarity degree is much less than

one, that is, the variation noise is masked by the measurement noise. For the LMS adaptive filter, slow changes to the

plant impulse response are seen to be easy to track since

the time variations need not be estimated very accurately.

This class of time variations is further subdivided into two groups in which the unknown filter coefficients undergo deterministic or random evolution patterns. Rapid variations (d(n) permanently greater than one), however, present a much greater problem to LMS and LS adaptive filters. In the case of time-varying line enhancement at low signal-to-noise ratio, where the frequency of the sinusoidal input signal is chirped, Macchi et al. state that ". . . slow adaptation/slow variation condition implies an upper limit for the chirp rate. This limit is the level above which the misadjustment is larger than the original additive noise. The noisy signal is thus a better estimate of the sinusoid than the adaptive system output. The slow adaptation condition is therefore required, in practice, to implement the adaptive system" [13, page 360].

In the case of LMS adaptive and inverse adaptive modelling, adaptive filters cannot track time variations which

are so rapid that d(n) is permanently greater than one. Indeed within a single iteration, the algorithm cannot acquire

starting from H(n) [12, page 298].

As a consequence, only a special subset of rapid time

variations is generally considered in the context of LMS filter adaptation. The jump class of nonstationarity produces

scarce large changes in the unknown filter impulse-response.

Hence the definition of jump variations is variations where occasionally

d(n) \gg 1,    (4)

but otherwise,

d(n) \ll 1.    (5)

enough for the algorithm to achieve the steady-state where

the error is approximately equal to the additive noise.
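The three classes of nonstationarity can be illustrated with a short calculation. The sketch assumes that the nonstationarity degree is the square root of the ratio of the variation-noise power to the minimum output noise power; the function names and the 0.1 cut-off used for "much less than one" are hypothetical choices for illustration.

```python
import math


def nonstationarity_degree(variation_noise, min_noise_power):
    """Estimate d from samples of the variation noise t(n) (assumed definition)."""
    mean_square = sum(t * t for t in variation_noise) / len(variation_noise)
    return math.sqrt(mean_square / min_noise_power)


def classify_variation(d, slow_threshold=0.1):
    """Label a nonstationarity degree; the slow threshold is an assumption."""
    if d > 1.0:
        return "rapid"
    if d < slow_threshold:
        return "slow"
    return "intermediate"
```

A jump environment would show up as a sequence of values labelled slow, interrupted occasionally by a value labelled rapid.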

4. ALGORITHMS

In this section, the performance of two genetic adaptive algorithms operating in a variety of nonstationary environments

adaptive algorithm described in [4]. The lattice coefficients are encoded as floating-point numbers and the mutation operator used is that from the breeder genetic algorithm (BGA) described in [14]. This scheme randomly chooses, with probability 1/32, one of the 32 points ±(2^{-15} A, 2^{-14} A, ..., 2^{0} A), where A defines the mutation range and is, in these simulations, set to 0.1 × coefficient range. The crossover operator involved selecting two parent filter structures at random and generating identical copies. Two cut points were randomly selected and coefficients lying between these limits were swapped between the offspring. The newly generated lattice filters were then inserted into the population replacing the two parent structures.
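The mutation and crossover operators described in this paragraph can be sketched as below. The sketch reads the 32 mutation points as 16 magnitudes, each taken with either sign; the clipping of mutated values to the allowed range, the function names, and the `rng` parameter are assumptions added for illustration.

```python
import random


def bga_mutate(value, mutation_range, low, high, rng=random):
    """Add one of 32 equally likely steps: +/- mutation_range * 2**-j, j = 0..15."""
    exponent = rng.randrange(16)
    sign = rng.choice((-1.0, 1.0))
    step = sign * mutation_range * 2.0 ** (-exponent)
    # Clipping to [low, high] is an assumption, not stated in the text.
    return min(high, max(low, value + step))


def two_point_crossover(parent_a, parent_b, rng=random):
    """Copy both parents, then swap the coefficients between two random cut points."""
    child_a, child_b = list(parent_a), list(parent_b)
    cut_1, cut_2 = sorted(rng.sample(range(len(child_a) + 1), 2))
    child_a[cut_1:cut_2], child_b[cut_1:cut_2] = (
        child_b[cut_1:cut_2],
        child_a[cut_1:cut_2],
    )
    return child_a, child_b
```

With lattice coefficients confined to (-1, 1), a mutation range of 0.1 times the coefficient range would correspond to `mutation_range=0.2`. Half of all mutations then move a coefficient by less than about 0.002, while the largest steps remain available after the population has converged.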

A measure of fitness of the new filter was obtained by

calculating the MSE for a block of current input and output

data. A block length of 10 input-output pairs was used for the

experiments reported below on a slowly varying system while

a length of 5 input-output pairs was used for the rapidly varying system. Fitness scaling was used, as described in Goldberg [15, page 77], and fitness proportional selection was implemented using Baker's stochastic universal sampling algorithm [16]. Elitism was used to preserve the best performing individual from each generation. Crossover and mutation rates were set to 0.1 and 0.6, respectively, and the population contained 400 models. It was hoped that the use of the BGA mutation scheme would give this algorithm a greater ability to follow system changes than that of a GA using a more conventional mutation scheme, as the BGA algorithm retains, even when the population has comparatively converged, significant probability of making substantial changes in the coefficients if the system that it is modelling is found to have changed.
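Stochastic universal sampling, the selection method named above, can be sketched in a few lines. It places equally spaced pointers over the cumulative fitness, using a single random offset. The function name and the `rng` parameter are illustrative, and the fitness values are assumed to be non-negative (for example, after fitness scaling).

```python
import random


def stochastic_universal_sampling(fitnesses, n_select, rng=random):
    """Return n_select indices chosen in proportion to non-negative fitness."""
    spacing = sum(fitnesses) / n_select
    start = rng.uniform(0.0, spacing)   # one random offset for all pointers
    selected = []
    index = 0
    cumulative = fitnesses[0]
    for j in range(n_select):
        pointer = start + j * spacing
        # Advance to the individual whose fitness slice contains the pointer.
        while cumulative < pointer and index < len(fitnesses) - 1:
            index += 1
            cumulative += fitnesses[index]
        selected.append(index)
    return selected
```

Because all pointers share one offset, each individual is picked either the floor or the ceiling of its expected number of times, which gives lower variance than repeated roulette-wheel draws.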

In competition with this genetic optimizer, the random

immigrants mechanism of Cobb and Grefenstette, discussed

above, was placed. For this set of simulation experiments,

20% of the population was replaced by randomly generated

individuals every 10 generations. The same controlling parameters were used for both GAs.
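The replacement schedule used here (20% of the population every 10 generations) can be sketched as follows. The text does not say which individuals are replaced, so the sketch makes two labelled assumptions: the individuals to replace are chosen at random, and index 0 holds the elite individual and is spared. The function name and the sampling range just inside (-1, 1) are also illustrative.

```python
import random


def inject_random_immigrants(population, generation, fraction=0.2, interval=10,
                             low=-0.999, high=0.999, rng=random):
    """Return a new population with some individuals replaced by random ones."""
    survivors = [list(individual) for individual in population]
    if generation % interval != 0:
        return survivors
    n_new = int(fraction * len(survivors))
    # Assumption: index 0 is the elite individual and is never replaced.
    for index in rng.sample(range(1, len(survivors)), n_new):
        survivors[index] = [rng.uniform(low, high) for _ in survivors[index]]
    return survivors
```

Drawing each new coefficient from inside the stable range keeps every immigrant a valid lattice filter.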

4.1.

making nonrandom alterations to the coefficients of a sixth-order all-pole lattice filter. In the case of slow and rapid time variations, the lattice coefficients were varied in a sinusoidal or cosinusoidal fashion taking in the full extent of the coefficient range (±1). Changes to the plant coefficients were effected at every sample instant with the precise magnitude of these variations reflected in the value of d for each environment. With measurement noise suitably scaled to give a signal-to-noise ratio of approximately 40 dB, the nonstationarity degrees of the slow and rapidly varying systems are 0.03 and 1.6, respectively.

Traditional (nonevolutionary) adaptive algorithms can

run into problems when called upon to track rapid time

variations (d permanently greater than one). When these

changes occur infrequently, however, the well-documented

[Figure 3: NMSE (dB) versus generations for the standard GA and the random immigrants GA in the rapidly varying environment (d = 1.6).]

to describe the time to convergence and excess MSE that results. In order to investigate the performance of the genetic adaptive algorithm under such conditions, an environment was constructed in which the time variations of the plant coefficients are occasional and are often large in magnitude. The system to be modelled was once again a sixth-order all-pole filter. The infrequent time variations were introduced by periodically negating one of the plant lattice coefficients. As a consequence, for much of the simulation, the unknown system is time invariant (d = 0) with the nonstationarity degree greater than zero only during the occasional step changes.
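The two kinds of test environment described in this section can be sketched as coefficient generators. Everything specific here is an assumption made for illustration: the period, the amplitude of 0.99 (chosen to stay strictly inside the stable range), the alternation of sine and cosine between coefficients, the jump interval, and the order in which coefficients are negated.

```python
import math


def sinusoidal_coefficients(n, order=6, period=1000.0, amplitude=0.99):
    """Lattice coefficients at sample n, alternating sine and cosine variation."""
    phase = 2.0 * math.pi * n / period
    return [
        amplitude * (math.sin(phase) if i % 2 == 0 else math.cos(phase))
        for i in range(order)
    ]


def jump_coefficients(n, base, jump_interval=200):
    """Negate one coefficient of `base` at every jump_interval samples up to n."""
    coefficients = list(base)
    for jump in range(n // jump_interval):
        index = jump % len(coefficients)
        coefficients[index] = -coefficients[index]
    return coefficients
```

A shorter `period` changes the coefficients faster at each sample instant and so raises the nonstationarity degree, while the jump generator leaves the plant time invariant between jumps.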

4.2. Results

The performance of the BGA-based algorithm and random immigrants GA was evaluated in each of the three time-varying environments detailed. In each case, fifty GA runs were performed using the same environment (time-varying system).

In both the slowly changing and the jump environments, the behaviour was more or less as expected. In the slowly changing environment, both algorithms were able to reduce the error to near the -40 dB noise floor (set by the level of noise added to the system) and inspection of the parameters shows them to be following the changes in the system well. In the case of the step changes, the random immigrants algorithm exhibited better behaviour, recovering more quickly when the system changed. The tracking of rapid changes, however, is more difficult than either of these, and hence of more interest, and in this neither of the algorithms is particularly successful. The error reduction performance of the

two adaptive algorithms is illustrated in Figure 3. In addi-

[Figure 4: coefficients a1, a2, and a3 versus generations for the standard GA and the random immigrants GA, with the true value of the coefficient, in the rapidly varying environment (d = 1.6).]

blocked input-output data, the extent to which the unknown system is correctly identified fluctuates on a more macroscopic scale. The normalised mean square error (NMSE) varies between the theoretical minimum of -40 dB and a maximum of around -8 dB, eventually settling down to a mean of around -20 dB.

These phenomena can be explained when one looks at a graph of the coefficient tracking performance (Figure 4). The graph shows the time evolutions of the first three direct-form coefficients of the plant (represented by a dotted line) and the best adaptive filter in the population. The coefficients generated by the standard floating-point GA are depicted by a gray line whilst those produced by the random immigrants GA are represented by a black line. Neither the standard floating-point GA nor the random immigrants GA was able to track the rapid variations in the plant coefficients throughout the entire run. The periods when the best adaptive filter coefficient values differed significantly from the optimal values correspond, in both cases, to the times when the identification was poor (see Figure 3).

5.

rapid changes system parameters would be useful. A possible

method is to devise a hybrid algorithm combining the global

properties of the GA with a local search method to follow


the local variations in the parameters. In this way, the two

major failings of the individual components of the hybrid

can be addressed. The GA is often capable of finding reasonable solutions to quite difficult problems but its characteristic slow finishing is legendary. Conversely, the huge array of

gradient-based and gradientless local search techniques run

the risk of becoming hopelessly entangled in local optima. In

combining these two methodologies, the hybrid GA has been

shown to produce improvements in performance over the

constituent search techniques in certain problem domains

[17, 18, 19, 20].

Goldberg [15, page 202] discusses a number of ways in

which local search and GAs may be hybridized. In one configuration, the hybrid is described in terms of a batch scheme.

The GA is run long enough for the population to become

largely homogeneous. At this point, the local optimization

procedure takes over and continues the search, from perhaps the best 5 or 10% of solutions in the population, until improvement is no longer possible. This method allows

the GA to determine the gross features of the solution space,

hopefully resulting in convergence to the basin of attraction

around the global optimum, before switching to a technique

better suited to fine tuning of the solutions. An alternative

approach is to embed the local search within the framework

of the GA, treating it rather like another genetic operator.

This is the scheme adopted by Kido et al. [18] (who combine GA, simulated annealing, and TABU search), Bersini

and Renders [20] (whose GA incorporates a hill-climbing

operator), and Miller et al. [19] (who employ a variety of

problem-specific local improvement operators). This second

hybrid configuration is better suited to the identification of

time-varying systems. In this case, the local search heuristic

is embedded within the framework of the EA and is treated

as another genetic operator. The local optimization scheme is

enabled for a certain number of iterations at regular intervals

in the GA run.

The hybrid approach utilizes a random hill-climbing

technique to perform periodic local optimization. This procedure is ideally suited to incorporation in the EA since it

does not require calculation of gradients or any other auxiliary information. Instead, the same evaluation function

can be employed to determine the merit of the newly sampled points in the coefficient space. Since the technique is

greedy, the locally optimized solution is always at least as

good as its genetic predecessor. In addition, once a change

in the unknown system has occurred and is detected by a

degradation of the model's performance, no new data samples are required. The hill-climbing method incorporated

here into the GA is the random search technique proposed

by Solis and Wets [21]. This algorithm randomly generates a new search point from a normal distribution centred about the current coefficient set. The standard deviation of the distribution, k, is expanded or contracted in

relation to the success of the algorithm in locating better

performing models. If the first-chosen new point is not an

improvement on the original point, the algorithm tests another point the same distance away in exactly the opposite

direction.

In detail, the structure of the algorithm as used here is as

follows. Firstly, the parameter k is updated, being increased

by a factor of 2 if the previous 5 iterations have all yielded

improved fitness, decreased by a factor of 2 if the previous

3 iterations have all failed to find an improved fitness, and

left unchanged if neither of these conditions has been met.

In the second step, a new candidate point in coefficient space

is obtained from a normal distribution of standard deviation

k centred on the current point. The fitness of this new point

is then evaluated. If the fitness is improved, the new point

is retained and becomes the current point; if the fitness is

not improved, the point an equal distance in the opposite

direction is tested; and if better, it becomes the current point.

If neither yields an improvement, the current point is kept

and the algorithm returns to the first step.
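The two steps above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `hill_climb`, the initial value of k (`sigma`), the assumption that a larger fitness is better, and the resetting of the success and failure counters after each change of k are all assumptions that the text does not specify.

```python
import random


def hill_climb(fitness, start, iterations, sigma=0.5, rng=None):
    """Greedy random hill climbing with an adaptive step size.

    fitness: callable mapping a coefficient list to a number (higher is better,
             an assumption made for this sketch).
    sigma:   the standard deviation called k in the text (initial value assumed).
    """
    rng = rng or random.Random()
    current = list(start)
    best = fitness(current)
    successes = failures = 0  # lengths of the current runs of outcomes

    for _ in range(iterations):
        # Step 1: update k from the recent history of successes and failures.
        if successes >= 5:
            sigma *= 2.0
            successes = 0  # counter reset is an assumption of this sketch
        elif failures >= 3:
            sigma /= 2.0
            failures = 0

        # Step 2: sample a normally distributed offset about the current point.
        step = [rng.gauss(0.0, sigma) for _ in current]

        improved = False
        for sign in (1.0, -1.0):  # try the new point, then its reflection
            candidate = [c + sign * s for c, s in zip(current, step)]
            value = fitness(candidate)
            if value > best:
                current, best = candidate, value
                improved = True
                break

        if improved:
            successes, failures = successes + 1, 0
        else:
            successes, failures = 0, failures + 1

    return current, best


if __name__ == "__main__":
    target = [1.0, -2.0]  # hypothetical optimum used only for the demo

    def negative_squared_error(coefficients):
        return -sum((c - t) ** 2 for c, t in zip(coefficients, target))

    point, value = hill_climb(negative_squared_error, [0.0, 0.0], 200,
                              rng=random.Random(0))
    print(point, value)
```

Because a candidate is accepted only when it improves the fitness, the returned solution is never worse than the starting point, which is the greedy property noted earlier.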

The use of this hybrid arrangement of EA and hill climber

introduces further control parameters into the adaptive system, namely, the number of structures to undergo local optimization and the number of iterations in each hill-climbing

episode. Two extremes were investigated. In the first, hybrid A, every model in the population underwent a limited

amount of hill climbing. The other configuration, hybrid B,

locally optimized only the best structure in the population at

each generational step. In order to allow for direct comparison with the results in the previous section, the population

size was reduced so that there would be approximately the

same number of function evaluations in each case. For hybrid A, each model in a population of 100 underwent three

iterations of the hill-climbing algorithm at every generational

step while for hybrid B the population was set to 300 and

then the best at each generation was optimized over approximately 100 iterations of the random hill-climbing procedure.
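The matching of computational effort between the two hybrids can be checked with a short calculation. The sketch below assumes one fitness evaluation per model per generation and one evaluation per hill-climbing iteration. The second assumption is a lower bound, since an iteration that also tests the reflected point costs two evaluations. The helper name `evaluations_per_generation` is hypothetical.

```python
def evaluations_per_generation(population, climbed, iterations,
                               evals_per_iteration=1):
    """Fitness evaluations in one generation: one per model in the population,
    plus the cost of hill climbing for the models that are locally optimized."""
    return population + climbed * iterations * evals_per_iteration


# Hybrid A: 100 models, every model climbs for 3 iterations.
hybrid_a = evaluations_per_generation(population=100, climbed=100, iterations=3)
# Hybrid B: 300 models, only the best model climbs for about 100 iterations.
hybrid_b = evaluations_per_generation(population=300, climbed=1, iterations=100)

print(hybrid_a, hybrid_b)  # 400 400
```

Under these assumptions both configurations cost about 400 evaluations per generation. If every hill-climbing iteration also tested the reflected point, the figures would rise to 700 and 500, so the equality is only approximate, as the text states.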

Simulation experiments indicated that both hybrids were

able to track the slowly varying environment requiring less

than two hundred generations to acquire near-optimal coefficient values. The smaller population size implemented in

each case resulted in poorer initial performance, but this was

offset by the increased rate of improvement brought about

by the local hill-climbing operator. In the case of intermittent step changes in the unknown system characteristics, the

performance of the two hybrids was observed to fall between

that of the standard and random immigrants GAs. Figure 5

compares the tracking performance of these two hybrid GA

configurations in a rapidly changing environment. Hybrid A

(development of every individual) is represented by a gray

line. The second hill-climbing/GA hybrid (development of

the best individual) is shown by a black solid line. Although

a slight bias in the estimated coefficients is sometimes in evidence, hybrid A is clearly able to track the qualitative behaviour of the plant coefficients. Development of the best individual, however, is not sufficient to induce reliable tracking

and the performance of hybrid B suffers as a result.

The addition of individual improvement within the EA

framework has resulted in an adaptive algorithm which is

able to track the coefficients of a rapidly varying system

(d > 1) with some success. This is a feat which poses considerable problems to conventional adaptive algorithms (see

Section 3.1). Wholesale local improvement was observed to

Figure 5: Tracking of the coefficients a1, a2, and a3 over 1000 generations for hybrid A (development of every individual) and hybrid B (development of best individual), compared with the true value of the coefficient, in a rapidly varying environment (d = 1.6).

latter technique leaves the remainder of the population trailing behind the best structure. As the nonstationarity degree

of the plant is increased, an adaptive algorithm relying solely

upon evolutionary principles will lag further behind the time

variations. This hybrid technique, however, permits the provision of greater local optimization flexibility (more iterations of the hill climber) when required.

Figure 6 illustrates the tracking performance of the hybrid GA subjected to a time-varying environment in which

the nonstationarity degree was three times greater than in

the previous experiment (d = 4.8). The population in this

case contained 400 models, each one undergoing ten local

optimization iterations at every generational step. The input-output block size was further reduced to just two samples

in order that the plant coefficients would not vary substantially within the duration of a data block. This resulted in

the coefficient estimates generated by the hybrid adaptive algorithm fluctuating about their trajectory to a greater extent. Individual evaluations of candidate models, however,

required far less computation. The overall tracking performance of the hybrid was observed to be less accurate in

this case but the mean estimates of the time-varying plant

coefficients were observed to express the correct qualitative

behaviour.

Figure 6: Tracking performance of the hybrid GA in a rapidly varying environment (d = 4.8).

With emphasis shifting away from the role of evolutionary improvement in the hybrid adaptive algorithm as the

time variations become more extreme, the balance of exploration and exploitation is

altered. This highlights that no single adaptation scheme is

likely to outperform all others on every class of time-varying

problem. On slowly varying systems, for example, a more

or less conventional EA provided good performance. When

the unknown system was affected by intermittent but large-scale time variations, the wider ranging search of the random immigrants operator was required. If the error surface is

multimodal, hill-climbing operators are unlikely to provide

the desired search characteristics. Conversely, with a rapidly

changing system, the fast local search engendered by the hill-climbing operator provides the necessary response since only

relatively minor changes to the optimal coefficients occur at

each generational step. However, this classification assumes

that the nature of the time variations affecting the unknown

system is known in advance. When such information is not

available or when more than one class of time variation is

present, some combination of techniques may be desirable.

6. CONCLUSIONS

are changing slowly (d ≪ 1), both the floating-point GA

and the random immigrants GA were able to track the time

variations. However, when the time variations were infrequent but large in magnitude (jump variations), the standard

GA was unable to react quickly to the changes in the coefficient values; but the random immigrants mechanism, on the

other hand, produced sufficient diversity in the population

to rapidly respond to such step-like time variations. Neither

algorithm was able to successfully track the plant coefficients

when the time variations were rapid and continuous (d > 1).

In the final section of the paper, a hybrid scheme is introduced and shown to be more effective than either of the earlier schemes for tracking these rapid variations.

REFERENCES

[1] D. M. Etter, M. J. Hicks, and K. H. Cho, "Recursive adaptive filter design using an adaptive genetic algorithm," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 82), vol. 2, pp. 635-638, IEEE, Paris, France, May 1982.

[2] R. Nambiar, C. K. K. Tang, and P. Mars, "Genetic and learning automata algorithms for adaptive digital filters," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 92), pp. 41-44, IEEE, San Francisco, Calif, USA, March 1992.

[3] K. Kristinsson and G. A. Dumont, "System identification and control using genetic algorithms," IEEE Trans. Systems, Man, and Cybernetics, vol. 22, no. 5, pp. 1033-1046, 1992.

[4] M. S. White and S. J. Flockton, "Adaptive recursive filtering using evolutionary algorithms," in Evolutionary Algorithms in Engineering Applications, D. Dasgupta and Z. Michalewicz, Eds., pp. 361-376, Springer-Verlag, Berlin, Germany, 1997.

[5] H. G. Cobb, "An investigation into the use of hypermutation as an adaptive operator in genetic algorithms having continuous, time-dependent nonstationary environments," Tech. Rep. 6760, Navy Center for Applied Research in Artificial Intelligence, Washington, DC, USA, December 1990.

[6] J. J. Grefenstette, "Genetic algorithms for changing environments," in Proc. 2nd International Conference on Parallel Problem Solving from Nature (PPSN II), R. Manner and B. Manderick, Eds., pp. 137-144, Elsevier, Amsterdam, September 1992.

[7] H. G. Cobb and J. J. Grefenstette, "Genetic algorithms for tracking changing environments," in Proc. 5th International Conference on Genetic Algorithms (ICGA 93), S. Forrest, Ed., pp. 523-530, Morgan Kaufmann, San Mateo, CA, USA, July 1993.

[8] A. Neubauer, "A comparative study of evolutionary algorithms for on-line parameter tracking," in Proc. 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds., pp. 624-633, Springer-Verlag, Berlin, Germany, September 1996.

[9] F. Vavak, T. C. Fogarty, and K. Jukes, "A genetic algorithm with variable range of local search for tracking changing environments," in Proc. 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds., pp. 376-385, Springer-Verlag, Berlin, Germany, September 1996.

[10] E. Pettit and K. M. Swigger, "An analysis of genetic-based pattern tracking and cognitive-based component tracking models of adaptation," in Proc. National Conference on Artificial Intelligence (AAAI 83), pp. 327-332, Morgan Kaufmann, San Mateo, CA, USA, August 1983.

[11] A. H. Gray Jr. and J. D. Markel, "Digital lattice and ladder filter synthesis," IEEE Transactions on Audio and Electroacoustics, vol. 21, no. 6, pp. 491-500, 1973.

[12] O. Macchi, Adaptive Processing: The Least Mean Squares Approach with Applications in Transmission, John Wiley & Sons, Chichester, UK, 1995.

[13] O. Macchi, N. Bershad, and M. Mboup, "Steady-state superiority of LMS over LS for time-varying line enhancer in noisy environment," IEE Proceedings F, vol. 138, no. 4, pp. 354-360, 1991.

[14] H. Muhlenbein and D. Schlierkamp-Voosen, "Predictive models for the breeder genetic algorithm I. Continuous parameter optimization," Evolutionary Computation, vol. 1, no. 1, pp. 25-49, 1993.

[15] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing, Reading, Mass, USA, 1989.

[16] J. E. Baker, "Reducing bias and inefficiency in the selection algorithm," in Genetic Algorithms and Their Applications: Proc. 2nd International Conference on Genetic Algorithms (ICGA 87), J. J. Grefenstette, Ed., pp. 14-21, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, July 1987.

[17] H. Muhlenbein, M. Schomisch, and J. Born, "The parallel genetic algorithm as a function optimizer," in Proc. 4th International Conference on Genetic Algorithms (ICGA 91), R. K. Belew and L. B. Booker, Eds., pp. 271-278, Morgan Kaufmann, University of California, San Diego, Calif, USA, July 1991.

[18] T. Kido, H. Kitano, and M. Nakanishi, "A hybrid search for genetic algorithms: Combining genetic algorithms, TABU search, and simulated annealing," in Proc. 5th International Conference on Genetic Algorithms (ICGA 93), S. Forrest, Ed., p. 641, Morgan Kaufmann, University of Illinois, Urbana-Champaign, Ill, USA, July 1993.

[19] J. A. Miller, W. D. Potter, R. V. Gandham, and C. N. Lapena, "An evaluation of local improvement operators for genetic algorithms," IEEE Trans. Systems, Man, and Cybernetics, vol. 23, no. 5, pp. 1340-1351, 1993.

[20] H. Bersini and J.-M. Renders, "Hybridizing genetic algorithms with hill-climbing methods for global optimization: two possible ways," in Proc. 1st IEEE Conference on Evolutionary Computation (ICEC 94), D. B. Fogel, Ed., vol. I, pp. 312-317, IEEE, Piscataway, NJ, USA, June 1994.

[21] F. J. Solis and R. J.-B. Wets, "Minimization by random search techniques," Mathematics of Operations Research, vol. 6, no. 1, pp. 19-30, 1981.

Michael S. White was a student at Royal Holloway, University of

London, where he received the B.S. and Ph.D. degrees. He is currently employed by a New York-based hedge fund.

Stuart J. Flockton received the B.S. and Ph.D. degrees from the

University of Liverpool. He is a Senior Lecturer at Royal Holloway,

University of London. His research interests centre around signal

processing and evolutionary algorithms.

© 2003 Hindawi Publishing Corporation

to Multiclass Object Detection Using

Genetic Programming

Mengjie Zhang

School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand

Email: mengjie@mcs.vuw.ac.nz

Victor B. Ciesielski

School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, 3001 Victoria, Australia

Email: vc@cs.rmit.edu.au

Peter Andreae

School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand

Email: pondy@mcs.vuw.ac.nz

Received 30 June 2002 and in revised form 7 March 2003

This paper describes a domain-independent approach to the use of genetic programming for object detection problems in which

the locations of small objects of multiple classes in large images must be found. The evolved program is scanned over the large

images to locate the objects of interest. The paper develops three terminal sets based on domain-independent pixel statistics

and considers two different function sets. The fitness function is based on the detection rate and the false alarm rate. We have

tested the method on three object detection problems of increasing difficulty. This work not only extends genetic programming

to multiclass-object detection problems, but also shows how to use a single evolved genetic program for both object classification

and localisation. The object classification map developed in this approach can be used as a general classification strategy in genetic

programming for multiple-class classification problems.

Keywords and phrases: machine learning, neural networks, genetic algorithms, object recognition, target detection, computer

vision.

1. INTRODUCTION

the need for programs which can find objects of interest in

a database of images is increasing. For example, it may be

necessary to find all tumors in a database of x-ray images,

all cyclones in a database of satellite images, or a particular

face in a database of photographs. The common characteristic of such problems can be phrased as "given subimage_1,

subimage_2, ..., subimage_n which are examples of the objects

of interest, find all images which contain this object and its

location(s)." Figure 10 shows examples of problems of this

kind. In the problem illustrated by Figure 10b, we want to

find centers of all of the Australian 5-cent and 20-cent coins

and determine whether the head or the tail side is up. Examples of other problems of this kind include target detection

problems [1, 2, 3], where the task is to find, say, all tanks,

trucks, or helicopters in an image. Unlike most of the cur-

detect only objects of one class [1, 4, 5], our objective is to

detect objects from a number of classes.

Domain independence means that the same method will

work unchanged on any problem, or at least on some range

of problems. This is very difficult to achieve at the current

state of the art in computer vision because most systems require careful analysis of the objects of interest and a determination of which features are likely to be useful for the detection task. Programs for extracting these features must then

be coded or found in some feature library. Each new vision

system must be handcrafted in this way. Our approach is to

work from the raw pixels directly or to use easily computed

pixel statistics such as the mean and variance of the pixels

in a subimage and to evolve the programs needed for object

detection.

Several approaches have been applied to automatic object detection and recognition problems. Typically, they use


multiple independent stages, such as preprocessing, edge detection, segmentation, feature extraction, and object classification [6, 7], which often results in some efficiency and effectiveness problems. The final results rely too much upon the

results of earlier stages. If some objects are lost in one of the

early stages, it is very difficult or impossible to recover them

in the later stage. To avoid these disadvantages, this paper introduces a single-stage approach.

There have been a number of reports on the use of genetic programming (GP) in object detection and classification [8, 9]. Winkeler and Manjunath [10] describe a GP

system for object detection in which the evolved functions

operate directly on the pixel values. Teller and Veloso [11]

describe a GP system and a face recognition application in

which the evolved programs have a local indexed memory.

All of these approaches are based on detecting one class of

objects or two-class classification problems, that is, objects

versus everything else. GP naturally lends itself to binary

problems as a program output of less than 0 can be interpreted as one class and greater than or equal to 0 as the other

class. It is not obvious how to use GP for more than two

classes. The approach in this paper will focus on object detection problems in which a number of objects in more than

two classes of interest need to be localised and classified.

1.1. Outline of the approach to object detection

A brief outline of the method is as follows.

(1) Assemble a database of images in which the locations

and classes of all of the objects of interest are manually

determined. Split these images into a training set and

a test set.

(2) Determine an appropriate size (n × n) of a square

which will cover all single objects of interest to form

the input field.

(3) Invoke an evolutionary process with images in the

training set to generate a program which can determine the class of an object in its input field.

(4) Apply the generated program as a moving window

template to the images in the test set and obtain the

locations of all the objects of interest in each class. Calculate the detection rate (DR) and the false alarm rate

(FAR) on the test set as the measure of performance.
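Step (4) of the outline can be illustrated with a small sketch. It assumes that the image is a list of rows of grey-level values and that the evolved program is available as a callable returning a class label, or `None` for background. The name `sweep` and the toy `program` are hypothetical and stand in for an evolved genetic program.

```python
def sweep(image, n, program):
    """Slide an n x n input field over the image and collect detections.

    Returns a dict mapping the centre (row, col) of each window that the
    program labels as an object to the reported class label.
    """
    rows, cols = len(image), len(image[0])
    detections = {}
    for r in range(rows - n + 1):
        for c in range(cols - n + 1):
            window = [row[c:c + n] for row in image[r:r + n]]
            label = program(window)
            if label is not None:
                detections[(r + n // 2, c + n // 2)] = label
    return detections


def program(window):
    """Toy stand-in for an evolved program: a uniformly bright window is an object."""
    pixels = [p for row in window for p in row]
    return "object" if min(pixels) == 9 else None


image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
print(sweep(image, 3, program))  # {(2, 2): 'object'}
```

The reported centres would then be compared with the manually determined object locations to obtain the DR and the FAR.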

1.2. Goals

The overall goal of this paper is to investigate a learning/adaptive, single-stage, and domain-independent approach to multiple-class object detection problems without

any preprocessing, segmentation, and specific feature extraction. This approach is based on a GP technique. Rather

than using specific image features, pixel statistics are used

as inputs to the evolved programs. Specifically, the following

questions will be explored on a sequence of detection problems of increasing difficulty to determine the strengths and

limitations of the method.

(i) What image features involving pixels and pixel statistics would make useful terminals?

(ii) Will the 4 standard arithmetic operators be sufficient

for the function set?

(iii) How can the fitness function be constructed, given that

there are multiple classes of interest?

(iv) How will performance vary with increasing difficulty

of image detection problems?

(v) Will the performance be better than a neural network

(NN) approach [12] on the same problems?

1.3. Structure

then describes the main components of this approach including the terminal set, the function set, and the fitness function. After describing the three image databases used here, we

present the experimental results and compare them with an

NN method. Finally, we analyse the results and the evolved

programs and present our conclusions.

2. LITERATURE REVIEW

2.1. Object detection

objects in large images. This includes both object classification and object localisation. Object classification refers to the

task of discriminating between images of different kinds of

objects, where each image contains only one of the objects of

interest. Object localisation refers to the task of identifying the

positions of all objects of interest in a large image. The object

detection problem is similar to the commonly used terms automatic target recognition and automatic object recognition.

We classify the existing object detection systems into

three dimensions based on whether the approach is segmentation free or not, domain independent or specific, and on

the number of object classes of interest in an image.

2.1.1

detection procedure, we divide the detection methods into

two categories.

(i) Segmentation-based approach, which uses multiple independent stages for object detection. Most research on object detection involves 4 stages: preprocessing, segmentation,

feature extraction, and classification [13, 14, 15], as shown in

Figure 1. The preprocessing stage aims to remove noise or

enhance edges. In the segmentation stage, a number of coherent regions and suspicious regions which might contain objects are usually located and separated from the entire

images. The feature extraction stage extracts domain-specific

features from the segmented regions. Finally, the classification stage uses these features to distinguish the classes of

the objects of interest. The algorithms or methods for these

stages are generally domain specific. Learning paradigms,

such as NNs and genetic algorithms/programming, have

usually been applied to the classification stage. In general,

each independent stage needs a program to fulfill that specific task and, accordingly, multiple programs are needed for

object detection problems. Success at each stage is critical

Figure 1: Stages of segmentation-based object detection: source databases, (1) preprocessing, (2) segmentation, (3) feature extraction, (4) classification.

trucks and tanks in visible, multispectral infrared, and synthetic aperture radar images [2], and recognition of tanks in

cluttered images [6] are two examples.

(ii) Single-stage approach, which uses only a single stage

to detect the objects of interest in large images. There is only a

single program produced for the whole object detection procedure. The major property of this approach is that it is segmentation free. Detecting tanks in infrared images [3] and

detecting small targets in cluttered images [16] based on a

single NN are examples of this approach.

While most recent work on object detection problems

concentrates on the segmentation-based approach, this paper focuses on the single-stage approach.

2.1.2 Domain-specific approach versus

domain-independent approach

In terms of the generalisation of the detection systems, there

are two major approaches.

(i) Domain-specific object detection, which uses specific

image features as inputs to the detector or classifier. These

features, which are usually highly domain dependent, are extracted from entire images or segmented images. In a lentil

grading and quality assessment system [17], for example, features such as brightness, colour, size, and perimeter are extracted and used as inputs to an NN classifier. This approach

generally involves a time-consuming investigation of good

features for a specific problem and a handcrafting of the corresponding feature extraction programs.

(ii) Domain-independent object detection, which usually

uses the raw pixels directly (no features) as inputs to the

detector or classifier. In this case, feature selection, extraction, and the handcrafting of corresponding programs can

be completely removed. This approach usually needs learning and adaptive techniques to learn features for the detection task. Directly using raw image pixel data as input to

NNs for detecting vehicles (tanks, trucks, cars, etc.) in infrared images [1] is such an example. However, long learning/evolution times are usually required due to the large

number of pixels. Furthermore, the approach generally requires a large number of training examples [18]. A special

case is to use a small number of domain-independent, pixel

level features (referred to as pixel statistics) such as the mean

and variance of some portions of an image [19].
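As an illustration of such pixel statistics, the sketch below computes the mean and variance of the pixels in a subimage. The function name `pixel_statistics` and the use of the population variance are assumptions of this sketch. The source does not state which regions or which variance estimator are used.

```python
from statistics import fmean, pvariance


def pixel_statistics(window):
    """Return (mean, variance) of the grey levels in a rectangular subimage.

    window: list of rows of pixel values. The population variance is used here
    as an assumption; a sample variance would serve equally well as a feature.
    """
    pixels = [p for row in window for p in row]
    return fmean(pixels), pvariance(pixels)


mean, variance = pixel_statistics([[1, 2], [3, 4]])
print(mean, variance)  # 2.5 1.25
```

Two numbers per region replace the full set of raw pixels, which is why this special case needs far fewer inputs than the raw-pixel approach.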

2.1.3 Multiple class versus single class

Regarding the number of object classes of interest in an image, there are two main types of detection problems.

(i) One-class object detection problem, where there may be multiple objects in each image, but they all belong to a single class. One special case in this category is that there is only one object of interest in each source image. In essence, these problems reduce to a binary classification problem: object versus nonobject, also called object versus background. Examples are detecting small targets in thermal infrared images [16] and detecting a particular face in photograph images [20].

(ii) Multiple-class object detection problem, where there

are multiple object classes of interest, each of which has multiple objects in each image. Detection of handwritten digits

in zip code images [21] is an example of this kind.

It is possible to view a multiclass problem as a series of binary problems. A problem with three object classes of interest can be implemented as class 1 against everything else, class 2 against everything else, and class 3 against everything else. However, these are not independent detectors, because some method of dealing with situations in which two detectors report an object at the same location must be provided.
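One possible way of handling such conflicts is sketched below. The rule used here, keeping the class whose detector gives the largest positive output, is an assumption chosen for illustration and is not taken from the source. The function name `resolve` and the class names are hypothetical.

```python
def resolve(outputs):
    """Combine one-against-the-rest detector outputs at a single location.

    outputs: dict mapping a class name to that detector's numeric output,
    where a positive value means "an object of my class is here".
    Returns the winning class name, or None if every detector says background.
    """
    label, value = max(outputs.items(), key=lambda item: item[1])
    return label if value > 0 else None


# Two detectors fire at the same location; the stronger response wins.
print(resolve({"class1": 0.2, "class2": 0.7, "class3": -1.0}))   # class2
# No detector fires, so the location is treated as background.
print(resolve({"class1": -0.3, "class2": -0.1, "class3": -2.0}))  # None
```

Any such rule couples the detectors together, which is why they cannot be regarded as independent.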

In general, multiple-class object detection problems are

more difficult than one-class detection problems. This paper

is focused on detecting multiple objects from a number of

classes in a set of images, which is particularly difficult. Most

research in object detection which has been done so far belongs to the one-class object detection problem.

2.2. Performance evaluation

In this paper, we use the DR and FAR to measure the performance of multiclass object detection problems. The DR

refers to the number of small objects correctly reported by a

detection system as a percentage of the total number of actual objects in the image(s). The FAR, also called false alarms

per object or false alarms/object [16], refers to the number

of nonobjects incorrectly reported as objects by a detection

system as a percentage of the total number of actual objects

in the image(s). Note that the DR is between 0 and 100%,

while the FAR may be greater than 100% for difficult object

detection problems.
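The two measures can be written out as a short calculation. The function name `detection_rates` and the example counts below are hypothetical and serve only to show why the FAR, unlike the DR, can exceed 100%.

```python
def detection_rates(actual_objects, correctly_reported, false_alarms):
    """Return (DR, FAR) as percentages of the number of actual objects."""
    dr = 100.0 * correctly_reported / actual_objects
    far = 100.0 * false_alarms / actual_objects
    return dr, far


# Hypothetical counts: 20 actual objects, 18 found, 30 nonobjects reported.
dr, far = detection_rates(actual_objects=20, correctly_reported=18,
                          false_alarms=30)
print(dr, far)  # 90.0 150.0
```

Both rates share the same denominator, the number of actual objects, so the DR is bounded by 100% while the FAR grows without limit as more nonobjects are reported.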

The main goal of object detection is to obtain a high DR

and a low FAR. There is, however, a trade-off between them

for a detection system. Trying to improve the DR often results

in an increase in the FAR, and vice versa. Detecting objects in

images with very cluttered backgrounds is an extremely difficult problem where FARs of 200-2000% (i.e., the detection

system suggests that there are 20 times as many objects as

there really are) are common [5, 16].

Most research which has been done in this area so far only

presents the results of the classification stage (only the final

stage in Figure 1) and assumes that all other stages have been

properly done. However, the results presented in this paper

are the performance for the whole detection problem (both

the localisation and the classification).


2.3. Related work: GP for object detection

Since the early 1990s, there has been only a small amount

of work on applying GP techniques to object classification,

object detection, and other vision problems. This, in part,

reflects the fact that GP is a relatively young discipline compared with, say, NNs.

2.3.1 Object classification

Tackett [9, 22] uses GP to assign detected image features to a

target or nontarget category. Seven primitive image features

and twenty statistical features are extracted and used as the

terminal set. The 4 standard arithmetic operators and a logic

function are used as the function set. The fitness function is

based on the classification result. The approach was tested

on US Army NVEOD Terrain Board imagery, where vehicles,

such as tanks, need to be classified. The GP method outperformed both an NN classifier and a binary tree classifier on

the same data, producing lower rates of false positives for the

same DRs.

Andre [23] uses GP to evolve functions that traverse an

image, calling upon coevolved detectors in the form of hit-miss matrices to guide the search. These hit-miss matrices

are evolved with a two-dimensional genetic algorithm. These

evolved functions are used to discriminate between two letters or to recognise single digits.

Koza in [24, Chapter 15] uses a turtle to walk over a

bitmap landscape. This bitmap is to be classified either as a

letter L, a letter I, or neither of them. The turtle has access to the values of the pixels in the bitmap by moving over

them and calling a detector primitive. The turtle uses a decision tree process, in conjunction with negative primitives, to

walk over the bitmap and decide which category a particular

landscape falls into. Using automatically defined functions as

local detectors and a constrained syntactic structure, some

perfect scoring classification programs were found. Further

experiments showed that detectors can be made for different

sizes and positions of letters, although each detector has to

be specialised to a given combination of these factors.

Teller and Veloso [11] use a GP method based on the

PADO language to perform face recognition tasks on a

database of face images in which the evolved programs have

a local indexed memory. The approach was tested on a

discrimination task between 5 classes of images [25] and

achieved up to 60% correct classification for images without

noise.

Robinson and McIlroy [26] apply GP techniques to the

problem of eye location in grey-level face images. The input data from the images is restricted to a 3000-pixel block

around the location of the eyes in the face image. This approach produced promising results on a very small training set: up to 100% true positive detection with no false positives on a three-image training set. On larger sets, however, the GP approach performed less well and could not match the performance of NN techniques.

Winkeler and Manjunath [10] produce genetic programs

to locate faces in images. Face samples are cut out and

scaled, then preprocessed for feature extraction. The statistics gleaned from these segments are used as terminals in GP

which evolves an expression returning how likely a pixel is

to be part of a face image. Separate experiments process the

grey-scale image directly, using low-level image processing

primitives and scale-space filters.

2.3.2 Object detection

All of the reported GP-based object detection approaches belong to the one-class object detection category. In these detection problems, there is only one object class of interest in the

large images.

Howard et al. [19] present a GP approach to automatic

detection of ships in low-resolution synthetic aperture radar

imagery. A number of random integer/real constants and

pixel statistics are used as terminals. The 4 arithmetic operators and min and max operators constitute the function

set. The fitness is based on the number of the true positive

and false positive objects detected by the evolved program.

A two-stage evolution strategy was used in this approach. In

the first stage, GP evolved a detector that could correctly distinguish the target (ship) pixels from the nontarget (ocean)

pixels. The best detector was then applied to the entire image and produced a number of false alarms. In the second

stage, a brand new run of GP was tasked to discriminate between the clear targets and the false alarms as identified in the

first stage and another detector was generated. This two-stage

process resulted in two detectors that were then fused using

the min function. These two detectors return a real number,

which if greater than zero denotes a ship pixel, and if zero or

less denotes an ocean pixel. The approach was tested on images chosen from commercial SAR imagery, a set of 50 m and

100 m resolution images of the English Channel taken by the

European Remote Sensing satellite. One of the 100 m resolution images was used for training, two for validation, and two

for testing. The training was quite successful with perfect DR

and no false alarms, while there was only one false positive

in each of the two test images and the two validation images

which contained 22, 22, 48, and 41 true objects.

Isaka [27] uses GP to locate mouth corners in small

(50 × 40) images taken from images of faces. Processing each

pixel independently using an approach based on relative intensities of surrounding pixels, the GP approach was shown

to perform comparably to a template matching approach on

the same data.

A list of object detection related work based on GP is

shown in Table 1.

3.

3.1. The GP system

multiple-class object detection problems. Figure 2 shows an

overview of this approach, which has a learning process and

a testing procedure. In the learning/evolutionary process, the

evolved genetic programs use a square input field which is

large enough to contain each of the objects of interest. The

programs are applied in a moving window fashion to the entire images in the training set to detect the objects of interest. In the test procedure, the best evolved genetic program obtained in the learning process is then applied to the entire images in the test set to measure object detection performance.

Table 1: Object detection related work based on GP.

Problems                Applications                      Authors                  Year   Source
Object classification   Tank detection (classification)   Tackett                  1993   [9]
                        Tank detection (classification)   Tackett                  1994   [22]
                        Letter recognition                Andre                    1994   [23]
                        Letter recognition                Koza                     1994   [24]
                        Face recognition                  Teller and Veloso        1995   [11]
                        Small target classification       [...]                    1998   [28]
                        [...]                             Winkeler and Manjunath   1997   [10]
                        Shape recognition                 [...]                    1995   [25]
                        Eye recognition                   Robinson and McIlroy     1995   [26]
Object detection        Ship detection                    Howard et al.            1999   [19]
                        Mouth detection                   Isaka                    1997   [27]
                        [...]                             Benson                   2000   [29]
                        Vehicle detection                 Howard et al.            2002   [30]
Other vision problems   Edge detection                    Lucier et al.            1998   [31]
                        Image analysis                    Koza                     1992   [32]
                        Image analysis                    Koza                     1993   [33]
                        Image analysis                    Howard et al.            2001   [34]
                        Image analysis                    Poli                     1996   [35]
                        Model interpretation              Lindblad et al.          2002   [36]
                        Stereoscopic vision               Graae et al.             2000   [37]
                        Image compression                 [...]                    1996   [38]

The learning/evolutionary process in our GP approach is

summarised as follows.

(1) Initialise the population.

(2) Repeat until a termination criterion is satisfied.

(2.1) Evaluate the individual programs in the current

population. Assign a fitness to each program.

(2.2) Until the new population is fully created, repeat

the following:

(i) select programs in the current generation;

(ii) perform genetic operators on the selected

programs;

(iii) insert the result of the genetic operations

into the new generation.

(3) Present the best individual in the population as the output: the learned/evolved genetic program.
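The steps above can be sketched in a few lines of Python. This is only an illustrative skeleton, not the authors' implementation: the individuals here are plain integers rather than program trees, the selection weights 1/(1 + fitness) are an assumed way of doing proportional selection when smaller fitness is better, and copying the best individual unchanged stands in for the reproduction operator. The names `evolve`, `toy_fitness`, and `toy_vary` are hypothetical.

```python
import random


def evolve(population, fitness, vary, max_generations, rng, n_elite=1):
    """Generational loop for steps (1)-(3); smaller fitness is better."""
    for _ in range(max_generations):                      # step (2)
        ranked = sorted(population, key=fitness)          # step (2.1)
        if fitness(ranked[0]) == 0:                       # problem solved
            break
        # Assumed proportional selection: better (smaller) fitness, larger weight.
        weights = [1.0 / (1.0 + fitness(p)) for p in ranked]
        new_population = ranked[:n_elite]                 # reproduction
        while len(new_population) < len(population):      # step (2.2)
            a, b = rng.choices(ranked, weights=weights, k=2)   # (i) select
            new_population.append(vary(a, b, rng))             # (ii), (iii)
        population = new_population
    return min(population, key=fitness)                   # step (3)


def toy_fitness(x):
    """Toy problem: distance of an integer from the target value 42."""
    return abs(x - 42)


def toy_vary(a, b, rng):
    """Toy 'crossover plus mutation': average two parents and perturb."""
    return (a + b) // 2 + rng.randint(-3, 3)


rng = random.Random(0)
start = [rng.randint(0, 255) for _ in range(20)]          # step (1)
best = evolve(start, toy_fitness, toy_vary, 30, rng)
```

Because the best individual is always carried over, the best fitness in the population can never get worse from one generation to the next.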

In this system, we used a tree-like program structure

to represent genetic programs. The ramped half-and-half

method was used for generating the programs in the initial

population and for the mutation operator. The proportional

selection mechanism and the reproduction, crossover, and

mutation operators were used in the learning process.

In the remainder of this section, we address the other aspects of the learning/evolutionary system: (1) determination

of the terminal set, (2) determination of the function set, (3)

development of a classification strategy, (4) construction of

the fitness measure, and (5) selection of the input parameters and determination of the termination strategy.

3.2.

For object detection problems, terminals generally correspond to image features. In our approach, we designed three

different terminal sets: local rectilinear features, circular features, and pixel features. In all these cases, the features are

statistical properties of regions of the image, and we refer to

them as pixel statistics.

3.2.1

in Table 2, are extracted from the input field as shown in

Figure 3. The input field must be sufficiently large to contain

the biggest object and some background, yet small enough to

include only a single object. In this way, the evolved program,

as a detector, could automate the human eye system of

identifying pixels/object centres which stand out from their

local surroundings.

In Figure 3, the grey-filled circle denotes an object of interest and the square A1 B1 C1 D1 represents the input field.

[Figure 2: Overview of the approach. Boxes shown: Entire images (detection training set); GP learning/evolutionary process; General programs; Entire images (detection test set); Detection results.]

are included. This assists the finding of object centres

in the sweeping procedure: if the evolved program is

considered as a moving window template, the match

between the template and the subimage forming the

input field will be better when the moving template is

close to the centre of an object.

(iii) They are domain-independent and easy to extract.

These features belong to the pixel level and can be part

of a domain-independent preexisting feature library of

terminals from which the GP evolutionary process is

expected to automatically learn and select only those

relevant to a particular domain. This is quite different

from the traditional image processing and computer

vision approaches where the problem-specific features

are often needed.

(iv) The number of these features is fixed. In this approach,

the number of features is always twenty no matter what

size the input field is. This is particularly useful for the

generalisation of the system implementation.

3.2.2

Table 2: Pixel statistics and the regions and lines of interest they are computed from.

Pixel statistics    Regions and lines of interest
Mean    SD
F1      F2          big square A1B1C1D1
F3      F4          small central square A2B2C2D2
F5      F6          upper left square A1E1OG1
F7      F8          upper right square E1B1H1O
F9      F10         lower left square G1OF1D1
F11     F12         lower right square OH1C1F1
F13     F14         central row of the big square G1H1
F15     F16         central column of the big square E1F1
F17     F18         central row of the small square G2H2
F19     F20         central column of the small square E2F2

pixel statistics will be computed. The 4 central lines (rows

and columns) are also used for a similar purpose.1 The mean

and standard deviation of the pixels comprising each of these

regions are used as two separate features. There are 6 regions

giving 12 features, F1 to F12. We also use pixels along the main axes (4 lines) of the input field, giving features F13 to F20.

In addition to these pixel statistics, we use a terminal

which generates a random constant in the range [0, 255].

This corresponds to the range of pixel intensities in grey-level

images.
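As a rough illustration of how the twenty rectilinear pixel statistics could be computed from a square input field, consider the sketch below. It rests on several assumptions that the text does not settle: the standard deviation is the population form, the small central square has side n/2 (integer division) and is placed as centrally as the grid allows, and for odd n the four quadrants leave out the central row and column. Following footnote 1, a central line over an even number of pixels is taken as two rows or columns. The function names are hypothetical.

```python
from statistics import mean, pstdev


def region_stats(window, rows, cols):
    """Mean and (population) standard deviation of a rectangular region."""
    pixels = [window[r][c] for r in rows for c in cols]
    return mean(pixels), pstdev(pixels)


def central_band(lo, hi):
    """Central line of the span [lo, hi): one index if odd, two if even."""
    size = hi - lo
    mid = lo + size // 2
    return range(mid, mid + 1) if size % 2 else range(mid - 1, mid + 1)


def rectilinear_features(window):
    """Return [F1, F2, ..., F20] for an n x n window (list of rows)."""
    n = len(window)
    h = n // 2                 # side of a quadrant and of the small square
    q = (n - h) // 2           # assumed offset of the small central square
    full, small = range(n), range(q, q + h)
    top, bottom = range(0, h), range(n - h, n)
    regions = [
        (full, full),                      # F1, F2: big square
        (small, small),                    # F3, F4: small central square
        (top, top),                        # F5, F6: upper left square
        (top, bottom),                     # F7, F8: upper right square
        (bottom, top),                     # F9, F10: lower left square
        (bottom, bottom),                  # F11, F12: lower right square
        (central_band(0, n), full),        # F13, F14: central row, big square
        (full, central_band(0, n)),        # F15, F16: central column, big square
        (central_band(q, q + h), small),   # F17, F18: central row, small square
        (small, central_band(q, q + h)),   # F19, F20: central column, small square
    ]
    features = []
    for rows, cols in regions:
        features.extend(region_stats(window, rows, cols))
    return features


flat = rectilinear_features([[7] * 14 for _ in range(14)])
corner = rectilinear_features([[1, 1, 0, 0], [1, 1, 0, 0],
                               [0, 0, 0, 0], [0, 0, 0, 0]])
```

Whatever the window size, the function returns exactly twenty values, which matches the fixed-size property claimed for this terminal set.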

These pixel statistics have the following characteristics.

(i) They are symmetrical.

1 These lines can be considered special local regions. If the input field size

n is an even number, each of these lines is a rectangle consisting of two

rows or two columns of pixels.

features, as shown in Figure 4. The features were computed based on a series of concentric circles centred in the input field. This terminal set focused on boundaries rather than regions. The gap between the radii of two neighbouring circles is one pixel. For instance, if the input field is 19 × 19 pixels, then the number of central circles will be ⌊19/2⌋ + 1 = 10 (the central pixel is considered as a circle with a zero radius); accordingly, there would be 20 features. Unlike the rectilinear terminal set, the number of circular features in this terminal set depends on the size of the input field.
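A small sketch of the circular feature extraction follows. How a circle is discretised on the pixel grid is not specified in the text; here a pixel is assigned to the circle whose radius equals its rounded Euclidean distance from the central pixel, which is an assumption, as is the use of the population standard deviation. The function name is hypothetical.

```python
from math import hypot
from statistics import mean, pstdev


def circular_features(window):
    """Mean and SD of the pixels on each concentric circle of an n x n window."""
    n = len(window)
    centre = n // 2
    rings = [[] for _ in range(n // 2 + 1)]      # radii 0, 1, ..., n // 2
    for y in range(n):
        for x in range(n):
            radius = round(hypot(y - centre, x - centre))
            if radius < len(rings):              # corner pixels fall outside
                rings[radius].append(window[y][x])
    features = []
    for ring in rings:
        features.extend((mean(ring), pstdev(ring)))
    return features


features_19 = circular_features([[5] * 19 for _ in range(19)])
features_14 = circular_features([[5] * 14 for _ in range(14)])
```

For a 19 × 19 input field this gives 10 circles and therefore 20 features, in line with the example in the text; a 14 × 14 field gives 8 circles and 16 features.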

3.2.3

pixels as terminals in GP. To decrease the computation cost,

we considered a 2 × 2 square, or 4 pixels, as a single pixel.

The average value of the 4 pixels in the square was used as

the value of this pixel, as shown in Figure 5.
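The 2 × 2 averaging can be written directly. The sketch below assumes non-overlapping 2 × 2 blocks taken from the top-left corner of the input field; the function name is hypothetical.

```python
def pixel_terminals(window):
    """Average each non-overlapping 2 x 2 block of an n x n window."""
    n = len(window)
    return [
        (window[y][x] + window[y][x + 1]
         + window[y + 1][x] + window[y + 1][x + 1]) / 4.0
        for y in range(0, n - 1, 2)
        for x in range(0, n - 1, 2)
    ]


def blank(n):
    """An n x n window of zeros, used only to count terminals."""
    return [[0] * n for _ in range(n)]


counts = [len(pixel_terminals(blank(n))) for n in (14, 24, 16)]
```

With input fields of 14 × 14, 24 × 24, and 16 × 16 pixels this yields 49, 144, and 64 terminals respectively.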

3.3.

arithmetic operations only, and a combination of arithmetic

and transcendental functions.

3.3.1

Function set I

were used to form the nonterminal nodes:

$$\mathrm{FuncSet1} = \{+, -, \times, /\}. \qquad (1)$$

addition, subtraction, and multiplication, while / represents

protected division which is the usual division operator

[Figure 3: The input field and the image regions and lines for feature selection in constructing terminals. The input field is the n × n square A1B1C1D1 with centre O. Squares: A1B1C1D1, A2B2C2D2, A1E1OG1, E1B1H1O, G1OF1D1, OH1C1F1. Lines: G1H1, E1F1, G2H2, E2F2, where G2H2 = A2B2 = E2F2 = B2C2 = n/2.]

[Figure 4: The input field and the image boundaries for feature extraction in constructing terminals. Each local boundary contributes a mean and an SD feature:

Mean       SD         Local boundary
F1         F2         central pixel
F3         F4         circular boundary C1
F5         F6         circular boundary C2
...        ...        ...
F(2i+1)    F(2i+2)    circular boundary Ci
...        ...        ...
F(2n+1)    F(2n+2)    circular boundary Cn]

a number of rectilinear terminals is shown in Figure 6. The

LISP form of this program is shown in Figure 7.

This program performed particularly well for the coin

images.

3.3.2

these functions takes two arguments. This function set was designed to investigate whether the 4 standard arithmetic functions are sufficient for the multiple-class object detection problems.
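To make the role of the four arithmetic functions concrete, the sketch below evaluates a small program tree over named terminals. The tuple representation and the names `protected_div`, `FUNC_SET_1`, and `evaluate` are illustrative assumptions. In particular, returning zero on division by zero is a common convention for protected division and is assumed here; the exact definition used in the paper is not recoverable from the text above.

```python
import operator


def protected_div(a, b):
    """Division that does not fail: assumed to return 0 when b is 0."""
    return a / b if b != 0 else 0.0


FUNC_SET_1 = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(node, terminals):
    """Evaluate a program tree given as nested (op, left, right) tuples."""
    if isinstance(node, tuple):
        op, left, right = node
        return FUNC_SET_1[op](evaluate(left, terminals),
                              evaluate(right, terminals))
    if isinstance(node, str):          # a feature terminal such as "F2"
        return terminals[node]
    return float(node)                 # a numeric constant terminal


# A subtree that appears in the LISP program of Figure 7:
# (- (+ F2 145.765) (/ F6 F11))
subtree = ("-", ("+", "F2", 145.765), ("/", "F6", "F11"))
value = evaluate(subtree, {"F2": 0.0, "F6": 10.0, "F11": 0.0})
```

Here F11 is zero, so the protected division contributes 0 and the subtree evaluates to the constant alone.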

Function set II

that convergence might be quicker if the function values were close to the range (−1, 1) and more functions might lead to better results if the 4 arithmetic functions were not sufficient. We introduced some transcendental functions, that is, the absolute function dabs, the trigonometric sine function sin, the logarithmic function log, and the exponent (to base e) function exp, to form the second function set:

$$\mathrm{FuncSet2} = \{+, -, \times, /, \mathrm{dabs}, \sin, \log, \exp\}. \qquad (2)$$

3.4.

The output of a genetic program in a standard GP system is a floating point number. Genetic programs can be

[Figure 6: A generated program written as an algebraic formula over rectilinear terminals and numeric constants; its LISP form is given in Figure 7.]

F14 ))) (- (* (- (* (* (* F9 F11 ) F1 ) F10 ) (* F9 F17 )) (/ F5 F18 )) ((+ (+ F17 (* (+ F11 F12 ) F20 )) (* (- (+ F2 145.765) (/ F6 F11 )) (133.082 F17 ))) (/ F11 (* F14 F20 ))))) (* (- (* (- (- F6 F5 ) (* F3

F6 )) (/ (+ (+ F1 145.765) (* F16 F10 )) F18 )) F12 ) (+ (+ F17 (* (+ F17

F12 ) F20 )) (* (+ F14 F12 ) (- (+ F1 F12 ) F17 )))))

Figure 7: LISP format of the generated program in Figure 6.

used to perform one-class object detection tasks by utilising the division between negative and nonnegative numbers of a genetic program output. For example, negative

numbers can correspond to the background and nonnegative numbers to the objects in the (single) class of interest. This is similar to binary classification problems in standard GP where the division between negative and nonnegative numbers acts as a natural boundary for a distinction

between the two classes. Thus, genetic programs generated

by the standard GP evolutionary process primarily have the

ability to represent and process binary classification or one-class object detection tasks. However, for the multiple-class

object detection problems described here, where more than

two classes of objects of interest are involved, the standard

GP classification strategy mentioned above cannot be applied.

In this approach, we develop a different strategy which uses a program classification map, as shown in Figure 8, for the multiple-class object detection problems. Based on the output of an evolved genetic program, this map identifies the class to which the object located in the current input field belongs. In this map, m refers to the number of object classes of interest, v is the output value of the evolved program, and T is a constant defined by the user, which plays the role of a threshold.
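A minimal sketch of such a classification map is given below. The treatment of values that fall exactly on a boundary is an assumption: each class i is given the half-open interval from (i − 1) × T up to, but not including, i × T, and the last class m takes everything from (m − 1) × T upwards. The function name `classify` is hypothetical.

```python
def classify(v, m, T):
    """Map a program output v to 0 (background) or a class index 1..m.

    Assumes T > 0 and half-open class intervals [(i - 1) * T, i * T).
    """
    if v < 0:
        return 0                          # background
    return min(int(v // T) + 1, m)        # classes 1..m, top class unbounded


labels = [classify(v, m=3, T=100) for v in (-0.1, 0, 99.9, 100, 250, 1e6)]
```

With three classes and T = 100, the outputs −0.1, 0, 99.9, 100, 250, and 1e6 map to background, class 1, class 1, class 2, class 3, and class 3.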

3.5. The fitness function

Since the goal of object detection is to achieve both a high DR

and a low FAR, we should consider a multiobjective fitness

function in our GP system for multiple-class object detection

problems. In this approach, the fitness function is based on

training set during the learning process. Figure 9 shows the

object detection procedure and how the fitness of an evolved

genetic program is obtained.

The fitness of a genetic program is obtained as follows.

(1) Apply the program as a moving n × n window template

(n is the size of the input field) to each of the training

images and obtain the output value of the program at

each possible window position. Label each window position with the detected object according to the object classification strategy described in Figure 8. Call

this data structure a detection map. An object in a detection map is associated with a floating point program output.

(2) Find the centres of objects of interest only. This is done

as follows. Scan the detection map for an object of interest. When one is found, mark this point as the centre

of the object and continue the scan n/2 pixels later in

both horizontal and vertical directions.

(3) Match these detected objects with the known locations

of each of the desired true objects and their classes. A

match is considered to occur if the detected object is

within tolerance pixels of its known true location. A

tolerance of 2 means that an object whose true location is (40, 40) would be counted as correctly located

at (42, 38) but not at (43, 38). The tolerance is a constant parameter defined by the user.

(4) Calculate the DR and the FAR of the evolved program.

(5) Compute the fitness of the program as follows:

$$\mathrm{fitness}(\mathrm{FAR}, \mathrm{DR}) = W_f \times \mathrm{FAR} + W_d \times (1 - \mathrm{DR}), \qquad (3)$$

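[Figure 8: Program classification map. The program output v is mapped to a class: background for v < 0; class 1 for v between 0 and T; class 2 for v between T and 2T; in general, class i for v between (i − 1) × T and i × T; and class m for v beyond (m − 1) × T.]

[Figure 9: Object detection and fitness calculation. Steps shown: sweep programs on training images; match objects; compute fitness.]

and fitness parameters.

3.6.1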

the relative importance of FAR versus DR.2

With this design, the smaller the fitness, the better the

performance. Zero fitness is the ideal case, which corresponds to the situation in which all of the objects of interest in each class are correctly found by the evolved program

without any false alarms.
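The matching and fitness steps can be illustrated with a short sketch. It makes several assumptions: DR and FAR are expressed as fractions of the number of true objects (which is consistent with the false alarm rates above 100% reported later, but is not defined in this section), each true object can be matched at most once, and the tolerance test is applied to each coordinate separately, which reproduces the (42, 38) versus (43, 38) example in step (3). The function names are hypothetical, and the default weights are the easy-image values from Table 3.

```python
def is_match(detected, truth, tolerance):
    """True if both coordinates differ by at most `tolerance` pixels."""
    return (abs(detected[0] - truth[0]) <= tolerance
            and abs(detected[1] - truth[1]) <= tolerance)


def detection_rates(detections, truths, tolerance):
    """Return (DR, FAR) for lists of (x, y, class) tuples."""
    unmatched = list(truths)
    hits = false_alarms = 0
    for x, y, cls in detections:
        for truth in unmatched:
            if truth[2] == cls and is_match((x, y), truth[:2], tolerance):
                unmatched.remove(truth)      # each true object matches once
                hits += 1
                break
        else:
            false_alarms += 1                # no true object explains this one
    return hits / len(truths), false_alarms / len(truths)


def fitness(far, dr, w_f=50.0, w_d=1000.0):
    """Equation (3): smaller is better, zero is the ideal case."""
    return w_f * far + w_d * (1.0 - dr)


truths = [(40, 40, 1), (100, 100, 2)]
detections = [(42, 38, 1), (43, 38, 1), (100, 101, 2)]
dr, far = detection_rates(detections, truths, tolerance=2)
score = fitness(far, dr)
```

In this example both true objects are found (DR = 1.0) and one detection is a false alarm (FAR = 0.5), so the fitness is 50 × 0.5 + 1000 × 0 = 25.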

3.6. Main parameters

Once a GP system has been created, one must choose a set

of parameters for a run. Based on the roles they play in the

learning/evolutionary process, we group these parameters

2 Theoretically, Wf and Wd could be replaced by a single parameter since they have only one degree of freedom. However, using a single parameter and using two parameters have different effects on stopping the evolutionary process. For convenience, we use two parameters.

Search parameters

The search parameters used here include the number of individuals in the population (population-size), the maximum

depth of the randomly generated programs in the initial population (initial-max-depth), the maximum depth permitted

for programs resulting from crossover and mutation operations (max-depth), and the maximum generations the evolutionary process can run (max-generations). These parameters control the search space and when to stop the learning

process. In theory, the larger these parameters, the greater the

chance of success. In practice, however, it is impossible to set

them very large due to the limitations of the hardware and

high cost of computation.

There is another search parameter, the size of the input

field (input-size), which decides the size of the moving window in which a genetic program is computed in the program

sweeping procedure.

3.6.2

Genetic parameters

The genetic parameters decide the number of genetic programs used/produced by different genetic operators in the mating pool to produce new programs in the next generation. These parameters include the percentage of the best individuals in the current population that are copied unchanged to the next generation (reproduction-rate), the percentage of individuals in the next generation that are to be produced by crossover (cross-rate), the percentage of individuals in the next generation that are to be produced by mutation (mutation-rate = 100% − reproduction-rate − cross-rate), the probability that, in a crossover operation, two terminals will be swapped (cross-term), and the probability that, in a crossover operation, random subtrees will be swapped (cross-func = 100% − cross-term).

3.6.3

Fitness parameters

in the object classification algorithm, a tolerance parameter

Table 3: Parameters used for GP training for the three databases.

Parameter kinds      Parameter names      Easy images   Coin images   Retina images
Search parameters    Population-size      100           500           700
                     Initial-max-depth    4             5             6
                     Max-depth            8             12            20
                     Max-generations      100           150           150
                     Input-size           14 × 14       24 × 24       16 × 16
Genetic parameters   Reproduction-rate    10%           1%            2%
                     Cross-rate           65%           74%           73%
                     Mutation-rate        25%           25%           25%
                     Cross-term           15%           15%           15%
                     Cross-func           85%           85%           85%
Fitness parameters   T                    100           100           100
                     Wf                   50            50            50
                     Wd                   1000          1000          3000
                     Tolerance (pixels)   2             2             2

parameters (Wf and Wd) reflecting the relative importance of the DR and the FAR in obtaining the fitness of a genetic program.

program is zero.

(ii) The number of generations reaches the predefined

number, max-generations. Max-generations was determined empirically in a number of preliminary runs as

a point before overtraining generally occurred. While

it would have been possible to use a validation set to

determine when to stop training, we have not done

this. Comparison of training and test DRs and FARs

indicated that overfitting was not significant.

Good selection of these parameters is crucial to success. The parameter values can be very different for various object detection tasks. However, there does not seem to be a reliable way of deciding these parameter values a priori. To obtain good results, these parameter values were carefully chosen through an empirical search in experiments. Values used are shown in Table 3.

For detecting circles and squares in the easy images, for example, we set the population size to 100. On each iteration, 10 programs are created by reproduction, 65 programs by crossover, and 25 by mutation. Of the 65 crossover programs, 10 (15%) are generated by swapping terminals and 55 (85%) by swapping subtrees. The programs are randomly initialised with a maximum depth of 4 at the beginning and the depth can be increased to 8 during the evolutionary process. We also use 100, 50, 1000, and 2 as the constant parameters T, Wf, Wd, and tolerance, which are used for the program classification and the calculation of the fitness function. The maximum number of generations permitted for the evolutionary process is 100 for this detection problem. The size of the input field is the same as that used in the NN approach [12], that is, 14 × 14.
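The arithmetic in this example can be checked with a few lines of code. The sketch assumes that fractional counts are rounded to the nearest whole program and that mutation and subtree crossover take whatever remains; the function name `operator_counts` is hypothetical.

```python
def operator_counts(population_size, reproduction_rate, cross_rate, cross_term):
    """Programs produced per operator; the rates are percentages."""
    n_reproduction = round(population_size * reproduction_rate / 100)
    n_crossover = round(population_size * cross_rate / 100)
    n_mutation = population_size - n_reproduction - n_crossover
    n_cross_term = round(n_crossover * cross_term / 100)   # swap terminals
    n_cross_func = n_crossover - n_cross_term               # swap subtrees
    return {
        "reproduction": n_reproduction,
        "crossover": n_crossover,
        "mutation": n_mutation,
        "cross_term": n_cross_term,
        "cross_func": n_cross_func,
    }


easy = operator_counts(100, reproduction_rate=10, cross_rate=65, cross_term=15)
```

For the easy images this gives 10, 65, and 25 programs by reproduction, crossover, and mutation, with the 65 crossover programs split into 10 terminal swaps (15% of 65 is 9.75, rounded to 10) and 55 subtree swaps.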

3.7. Termination criteria

In this approach, the learning/evolutionary process is terminated when one of the following conditions is met.

(i) The detection problem has been solved on the training

set, that is, all objects in each class of interest in the

training set have been correctly detected with no false

4.

We used three different databases in the experiments. Example images and key characteristics are given in Figure 10. The databases were selected to provide detection problems of increasing difficulty. Database 1 (easy) was generated to give well-defined objects against a uniform background. The pixels of the objects were generated using a Gaussian generator with different means and variances for each class. There are three classes of small objects of interest in this database: black circles (class1), grey squares (class2), and white circles (class3). The Australian coin images (database 2) were intended to be somewhat harder and were taken with a CCD camera over a number of days with relatively similar illumination. In these images, the background varies slightly in different areas of the image and between images, and the objects to be detected are more complex, but still regular. There are 4 object classes of interest: the head side of 5-cent coins (class head005), the head side of 20-cent coins (class head020), the tail side of 5-cent coins (class tail005), and the tail side of 20-cent coins (class tail020). All the objects in each class have a similar size. They are located at arbitrary positions and with some rotations. The retina images (database 3) were taken by a professional photographer with special apparatus at a clinic and contain very irregular objects on a very

[Figure 10: Example images and key characteristics of the three databases.

Database        Number of images   Object classes   Image size
Easy images     10                 3                700 × 700
Coin images     20                 4                640 × 680
Retina images   15                 2                1024 × 1024]

Table 4: The three groups of experiments.

Experiments   Terminal sets            Function sets
I             TermSet1 (rectilinear)   FuncSet1
              TermSet2 (circular)      FuncSet1
II            TermSet3 (pixels)        FuncSet1
III           TermSet1 (rectilinear)   FuncSet2

retinal pathologies: haemorrhages and microaneurisms. To

give a clear view of representative samples of the target objects in the retina images, one sample piece of these images is

presented in Figure 11. In this figure, haemorrhage and microaneurism examples are labeled using white surrounding

squares.

5. EXPERIMENTAL RESULTS

Table 4. The first group of experiments is based on the first

two terminal sets (rectilinear features and circular features)

and the first function set (the 4 standard arithmetic functions). The second group of experiments uses the third terminal set consisting of raw pixels and the first function set.

The third group of experiments uses the first terminal set

consisting of rectilinear features and the second function set

consisting of additional transcendental functions.

In these experiments, 4 out of 10 images in the easy image database are used for training and 6 for testing. For the

coin images, 10 out of 20 are used for training and 10 for

testing. For the retina images, 10 are used for training and

5 for testing. The total number of objects is 300 for the easy

image database, 400 for the Australian coin images, and 328

for the retina images. The results presented in this section

were achieved by applying the evolved genetic programs to

the images in the test sets.

5.1. Experiment I

This group constitutes the major part of the investigation. The main goal here is to investigate whether this

GP approach can be applied to multiple-class object detection problems of increasing difficulty. The parameters used

in these experiments are shown in Table 3 (Section 3.6.4).

The average performance of the best 10 genetic programs

(evolved from 10 runs) for the easy and the coin databases,

and the average performance of the best 5 genetic programs

(out of 5 runs, due to the high computational cost) for the

retina images are presented.

The results are compared with those obtained using an

NN approach for object detection on the same databases


[12, 39]. The NN method used was the same as the GP

method shown in Section 1.1, except that the evolutionary

process was replaced by a network training process in step

(3) and the generated genetic program was replaced by a

trained network. In this group of experiments, the networks

also used the same set of pixel statistics as TermSet1 (rectilinear) as inputs. Considerable effort was expended in determining the best network architectures and training parameters. The results presented here are the best results achieved

by the NNs and we believe that the comparison with the GP

approach is a fair one.

5.1.1 Easy images

Table 5 shows the best results of the GP approach with the

two different terminal sets (GP1 with TermSet1, GP2 with TermSet2) and the NN method for the easy images. For class1 (black circles) and class3 (white circles), all the three methods

achieved a 100% DR with no false alarms. For class2 (grey

squares), the two GP methods also achieved 100% DR with

zero false alarms. However, the NN method had an FAR of

91.2% at a DR of 100%.

5.1.2 Coin images

Experiments with coin images gave similar results to the easy

images. These are shown in Table 6. Detecting the heads and

tails of 5-cent coins (class head005, tail005) appears to be relatively straightforward. All the three methods achieved a 100% DR without any false alarms. Detecting heads and tails of 20-cent coins (class head020, tail020) is more difficult. While the NN method resulted in many false alarms, the two GP methods had much better results. In particular, the GP1 method

achieved the ideal results, that is, all the objects of interest

were correctly detected without any false alarms for all the 4

object classes.

5.1.3 Retina images

The results for the retina images are summarised in Table 7.

Compared with the results for the other image databases,

these results are not satisfactory.3 However, the FAR is greatly

improved over the NN method.

The results over the three databases show similar patterns: the GP-based method always gave a lower FAR than

the NN approach for the same detection rate. While GP2 also

gave the ideal results for the easy images, it produced a higher

FAR on both the coin and the retina images than the GP1

method. This suggests that the local rectilinear features are

more effective for these detection problems than the circular

features.

5.1.4 Training times

We performed these experiments on a 4-processor ULTRA-SPARC4. The training times for the three databases are very

3 With the current techniques applied in this area, detecting objects in images with a highly cluttered background is an extremely difficult problem [5, 16]. In fact, these results are quite competitive with other methods for very difficult detection problems. For a young discipline such as GP, achieving such results is quite promising.

Table 5: Comparison of the object detection results for the easy images: the GP approaches versus the NN approach. (Input field size = 14 × 14; repetitions = 10.)

Easy images                     class1   class2   class3
Best detection rate (%)         100      100      100
False alarm rate (%)    NN      0        91.2     0
                        GP1     0        0        0
                        GP2     0        0        0

Table 6: Comparison of the object detection results for the coin images: the GP approaches versus the NN approach. (Input field size = 24 × 24; repetitions = 10.)

Coin images                     head005   tail005   head020   tail020
Best detection rate (%)         100       100       100       100
False alarm rate (%)    NN      0         0         182       37.5
                        GP1     0         0         0         0
                        GP2     0         0         38.4      26.7

Table 7: Comparison of the object detection results for the retina images: the GP approaches versus the NN approach. (Input field size = 16 × 16; repetitions = 5.)

Retina images                   Haem     Micro
Best detection rate (%)         73.91    100
False alarm rate (%)    NN      2859     10104
                        GP1     1357     588
                        GP2     1857     732

different due to various degrees of difficulty of the detection problems. The average training times used in the GP

evolutionary process (GP1) for the easy, the coin, and the

retina images are 2 minutes, 36 hours, and 93 hours, respectively.4 This is much longer than the NN method, which took

2 minutes, 35 minutes, and 2 hours on average. However,

the GP method gave much better detection results on all the

three databases. This suggests that the GP method is particularly applicable to tasks where accuracy is the most important factor and training time is seen as relatively unimportant.

4 Even if the training time for difficult problems is very long, the time

spent on applying the learned genetic program to the test set is usually very

short, say, from several seconds to about one minute.

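Table 8: Best detection results for the three databases using the extended function set (FuncSet2).

                          Easy images              Coin images                Retina images
Best detection rate (%)   100    100    100        100    100    100    100   73.91    100
False alarm rate (%)      0      0      0          0      0      0      0     1214     463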

5.2. Experiment II

Instead of using rectilinear and circular features (pixel statistics) as in experiment I, experiment II directly uses the pixel values as terminals (the third terminal set). For the input field sizes of 14 × 14, 24 × 24, and 16 × 16 for the easy, the coin, and the retina images, the numbers of terminals are 49 (7 × 7), 144 (12 × 12), and 64 (8 × 8), respectively. For the easy images, the learning took about 70 hours on a 4-processor ULTRA-SPARC4 machine and 78 generations to reach perfect detection performance on the training set. The population size used was 1000, the maximum depth of the program was 30, the maximum initial depth was 10, and the maximum number of generations was 100. For the coin images and the retina images, the situation was worse. Since a large number of terminals were used, the maximum depth of the program trees was increased to 50 for the coin images and 60 for the retina images. The population size used for both databases was 3000, with a maximum of 100 generations. The evolutionary process took three weeks to complete 50 generations for the coin images and five weeks to complete 50 generations for the retina images. The best detection results were an overall 22% FAR at a 100% DR for the coin images, and about 850% FAR at a DR of 100% for microaneurisms in the retina images.

While these results are worse than those obtained by the GP1 and GP2 using the rectilinear and circular features, they are still better than the NN approach. In our experience, a larger population (e.g., 10000 or 50000), a larger program size (e.g., 100), and a larger number of generations (e.g., 300) could improve the results. This cannot be investigated with our current hardware, but it is a promising direction as more powerful hardware, for example, parallel or genetic hardware, becomes available.

5.3. Experiment III

Instead of using the four standard arithmetic functions,

this experiment focused on using the extended function

set (FuncSet2), as shown in Section 3.3.2. The parameters

shown in Table 3 (Section 3.6.4) were used in this experiment. The best detection results for the three databases are

shown in Table 8.
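As a rough illustration of what an extended function set of this kind might look like in code, the sketch below pairs the four arithmetic operators with dabs, sin, log, and exp. The precise definitions belong to Section 3.3.2 and are not repeated here, so the protected forms used below (division by zero returning 0, log of the absolute value, a clipped exponent) and the name `FUNC_SET_2` are assumptions, not the authors' definitions.

```python
import math

def protected_div(a, b):
    """Division that returns 0.0 when the divisor is zero (assumed convention)."""
    return a / b if b != 0 else 0.0

def protected_log(a):
    """Natural log of |a|, with 0.0 returned for a == 0 (assumed convention)."""
    return math.log(abs(a)) if a != 0 else 0.0

def protected_exp(a):
    """exp with the argument clipped to avoid overflow (assumed convention)."""
    return math.exp(min(a, 50.0))

# Hypothetical function table: name -> (arity, implementation).
FUNC_SET_2 = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, protected_div),
    "dabs": (1, abs),
    "sin": (1, math.sin),
    "log": (1, protected_log),
    "exp": (1, protected_exp),
}
```

Protecting the operators in some such way matters because evolved programs apply them to arbitrary subexpressions, so every function has to return a finite value for every input.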

As can be seen from Table 8, this function set also gave

ideal results for the easy and the coin images and a better

result for the retina images. The best DR for detecting micro

is 100% with a corresponding FAR of 463%. The best DR

for haem is still 73.91% but the FAR is reduced to 1214%. In

and retina images. This suggests that dabs, sin, log, and exp are particularly useful for more difficult problems.
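Because the false alarm rates reported here exceed 100%, the FAR is evidently measured relative to the number of true objects of a class rather than to the number of reports. A minimal sketch under that reading follows; the function names and the example counts are illustrative assumptions, not figures taken from the experiments.

```python
def detection_rate(num_detected, num_true_objects):
    """Percentage of the true objects of a class that were correctly reported."""
    return 100.0 * num_detected / num_true_objects

def false_alarm_rate(num_false_alarms, num_true_objects):
    """False alarms as a percentage of the true objects of the class.
    The value exceeds 100% when false alarms outnumber the true objects."""
    return 100.0 * num_false_alarms / num_true_objects

# Illustrative counts only: 23 true objects, 17 found, 100 false alarms.
print(detection_rate(17, 23))
print(false_alarm_rate(100, 23))
```

With this definition, a FAR of 463% at a DR of 100% means that every true object was found but each was accompanied, on average, by more than four false reports.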

6. DISCUSSION

6.1. Analysis of results on the retina images

images and the coin images, but resulted in some false alarms

on the retina images, particularly for the detection of objects

in class haem in which the FAR was very high and more than

a quarter of the real objects in this class were not detected by

the evolved genetic program.

We identified two possible reasons for the results on the

retina images being worse than the results on the easy and the

coin images. The first reason concerns the complexity of the

background. In the easy and coin images, the background is

relatively uniform, whereas in the retina images it is highly

cluttered. In particular, the background of the retina images

contains many objects, such as veins and other anatomical

features, that are not members of the two classes of interest (microaneurysms and haemorrhages). These objects of

noninterest must be classified as background, in just the

same way as the genuine background. The more complex the

boundary between classes in the input space, the more complex an evolved program has to be to distinguish the classes.

It may be that the more complex background class in the

retina images requires a more complex evolved program than

the GP system was able to discover. It may even be that the

set of terminals and functions is not adequate/sufficient to

represent an evolved program to distinguish the objects of

interest from such a rich background.

The second possible reason concerns the variation in size

of the objects. In the easy and coin images, all of the objects in a class have similar sizes, whereas in the retina images, the sizes of the objects in each class vary. This variation

means that the evolved genetic program must cover a more

complicated region of the input space. The sizes of the micro objects vary from 3 × 3 to 5 × 5 pixels and the sizes of

the haem objects vary from 6 × 6 to 14 × 14 pixels. Given

the size of the input field (16 × 16) and the choice of terminals, the variance in the size of the haem objects is particularly problematic since it ranges from just one quarter of

the input field (hence entirely inside the central detection region) to almost the entire input field. The fact that the performance on the haem class is worse than the performance

on the micro class (especially in experiment III) provides

Figure 12: Three sample generated programs (Program 1, Program 2, and Program 3) for simple object detection in the easy images.

F16 ) F18 )) (+ (* (/ F5 F5 ) F14 ) (/ F3 F5 ))) (+ (* (/ F3 F5 ) (/ (/ F5 F6 ) (/

F11 F15 ))) (/ (/ F19 F6 ) (/ F5 F15 ))))

Figure 13: LISP format of Program 1.

poor performance.

The first reason suggests that the current approach is limited on images containing cluttered backgrounds. One possible modification to address this limitation is to evolve multiple programs rather than a single program, either having

a separate program for each class of interest, or having several programs to exclude different parts of the background.

Another possible modification is to extend the terminal set

and/or function set to enrich the expressive power of the

evolved programs.

The second reason suggests that the current approach has

limited applicability to scale invariant detection problems.

This would not be surprising, given the current set of terminals and functions. In particular, although the pixel statistics

used in the rectilinear and circular terminal sets are robust

to small variations in scale, they are not robust to large variations. We will explore alternative pixel statistics that are more

robust to scale variations, and also function sets that would

allow disjunctive programs that could better represent classes

that contained objects of several different size ranges.

This section gives a brief analysis of the best generated programs for the three databases. The genetic programs evolved by the GP1 in experiment I are used as examples.

6.2.1 Easy images

Figure 12 shows three good sample evolved programs for the easy images. (These programs were the direct mathematical representations of the programs produced by the evolutionary process. The LISP format of the first program is, for example, shown in Figure 13. Note that we did not simplify them; simplification of evolved genetic programs is beyond the goal of this paper.) All of these programs achieved the ideal results: all of the circles and squares were correctly detected with no false alarms.

There are several things we can note about these programs. Firstly, the programs are not trivial, and are decidedly nonlinear. It is hard to interpret these programs even for the easy images. Secondly, the programs use many, but not all, of the terminals, but do not use any constants. There are no groups of the terminals that are unused: both the means and standard deviations of both the square regions and the lines are used in the programs, so it does not appear that any of the terminals could be safely removed. Thirdly, although the programs are not in their simplest form (e.g., the factor F5/F5 could be removed from the first program), there is not a large amount of redundancy, so the GP search is finding reasonably efficient programs.

6.2.2 Coin images

In addition to the program shown in Figure 6, we present another generated program in Figure 14, which also performed

perfectly for the coin images.

Compared with those for the easy images, these programs

are more complex, which reflects the greater difficulty of the

detection problem in the coin images. One difference is that

these programs also contain constants. The set of possible

programs is considerably expanded by allowing constants as

well as the terminals, but the search for good values for the constants is difficult. Our current GP is biased so that constants are only introduced rarely, but it is clear that the detection problem on the coin images is sufficiently difficult to require some of these constants.

Figure 14: A sample generated program for regular object detection in the coin images.

6.2.3 Retina images

One evolved genetic program for the retina images is presented in Figure 15. (The program is presented in LISP format rather than standard format because of its complexity.)

This program is much more complex than any of the programs for the easy and the coin images. The program uses

all 20 terminals and 8 constants. It does not seem possible

to make any meaningful interpretation of this program. It

may be that with high-level, domain-specific features and

domain-specific functions, it would be possible for the GP

system to construct simpler and more interpretable programs; however, this would be against one of the goals of

this paper which is to investigate domain-independent approaches.

Even the best programs for the retina images gave quite a

high number of false alarms, and it appears that the 20 terminals and 4 standard arithmetic functions are not sufficient

for constructing programs for such difficult detection problems. Nonetheless, the program above still had much better

performance than an NN with the same input features.
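The LISP-format programs in Figures 13 and 15 are prefix expressions over the feature terminals (F1 to F20) and numeric constants, so they can be evaluated mechanically even when they resist interpretation. The following is a small sketch of such an evaluator, applied to a fragment that appears in Figure 13. The feature values are made up for illustration, and returning 0 for division by zero is an assumed convention rather than one stated in this section.

```python
def tokenize(text):
    """Split a LISP-style expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume one expression from the front of the token list."""
    token = tokens.pop(0)
    if token != "(":
        return token
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing parenthesis
    return node

def protected_div(a, b):
    """Assumed convention: division by zero yields 0.0."""
    return a / b if b != 0 else 0.0

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}

def evaluate(node, features):
    """Evaluate a parsed expression against a dict of feature values."""
    if isinstance(node, list):
        op, left, right = node
        return OPS[op](evaluate(left, features), evaluate(right, features))
    if node in features:
        return features[node]
    return float(node)  # numeric constant such as 87.05

# Fragment taken from the LISP form of Program 1; feature values are invented.
expr = parse(tokenize("(+ (* (/ F5 F5) F14) (/ F3 F5))"))
value = evaluate(expr, {"F3": 6.0, "F5": 2.0, "F14": 10.0})
print(value)
```

In this fragment the subexpression (/ F5 F5) evaluates to 1 for any nonzero F5, which is the kind of removable factor noted for the first program in Section 6.2.1.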

6.3. Analysis of classification strategy

As described in Figure 8, we used a program classification

map as the classification strategy. In this map, a constant

T was used to give fixed-size ranges for determining the

classes of those objects from the output of the program. The

parameter can be regarded as a threshold or a class boundary

parameter. Using just a single value for T forces most of the

classes to have an equal possible range in the program output, which might lead to a relatively long time of evolution.

A natural question to raise is whether we can replace the single parameter T with a set of parameters, say, T1 , T2 , . . . , Tm ,

one for each class of interest.

To answer this question, we ran a set of experiments

on the easy images with three parameters, T1 , T2 , and, T3 ,

for the thresholds in the program classification map. The

experiments showed that some sets of values of the parameters resulted in an ideal performance but other sets of values

did not. Also, the learning/evolutionary process converged

very fast with some sets of values but very slowly with others. However, the results of the experiments gave no guidelines for selecting a good set of values for these parameters.

In some cases, using separate parameters for each threshold

may lead to a better performance than using a single parameter, but appropriate values for the parameters need to be

empirically determined. In practice, this is difficult because

there is no a priori knowledge in most cases for setting these

parameters.
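To make the role of the single threshold T concrete, the sketch below shows one plausible form of a program classification map. The map itself is defined in Figure 8, which is not reproduced in this section, so the boundaries used here (non-positive outputs are background, each class but the last occupies a range of width T, and the last class takes everything above) are an assumption, as is the function name `classify`.

```python
def classify(output, num_classes, T):
    """Map a program output to a class index.

    Returns 0 for background and i (1..num_classes) for class i.
    Assumed layout: (-inf, 0] background, ((i-1)T, iT] class i,
    and ((num_classes-1)T, +inf) for the last class.
    """
    if output <= 0:
        return 0
    for i in range(1, num_classes):
        if output <= i * T:
            return i
    return num_classes

# Three classes of interest with a single threshold T = 100.
for v in (-5.0, 50.0, 150.0, 250.0, 1000.0):
    print(v, classify(v, num_classes=3, T=100.0))
```

With a single T, every bounded class is given the same width in the program output. Replacing `i * T` with per-class boundaries would correspond to the multi-parameter variant discussed above, at the cost of more values to tune.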

We also tried an alternative classification strategy, which

we called multiple binary map, to classify multiple classes of

objects. In this method, we convert a multiple-class classification problem to a set of binary classification problems. Given

a problem L with m classes L = {c1, c2, ..., cm}, the problem is decomposed into L1 = {c1, other}, L2 = {c2, other}, ...,

Lm = {cm, other}, where ci denotes the ith class of interest and

other refers to the class of nonobjects of interest. In this way, a

multiple-class object detection problem is decomposed into

a set of one-class object detection tasks, and GP is applied to

each of the subsets to obtain the detection result for a particular class of interest. We tested this method on the detection

problems in the three image databases and the results were

similar to those of the original experiments.
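The decomposition used by the multiple binary map can be sketched as a relabelling of the training data: for each class ci, every label other than ci is replaced by "other". The helper name `decompose` and the example labels below are hypothetical and serve only to illustrate the idea.

```python
def decompose(labels, classes, other="other"):
    """Return one binary labelling per class of interest: L_i = {c_i, other}."""
    return {
        c: [label if label == c else other for label in labels]
        for c in classes
    }

# Hypothetical labels for five training examples.
labels = ["c1", "c2", "background", "c1", "c3"]
binary_problems = decompose(labels, classes=["c1", "c2", "c3"])
for c, relabelled in binary_problems.items():
    print(c, relabelled)
```

One genetic program would then be evolved for each of the relabelled problems, which is why the method requires several programs but may allow each of them to be simpler.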

One disadvantage of this method is that several genetic

programs have to be evolved. On the other hand, the genetic programs may be simpler, which may reduce the training time for each program. In fact, for the coin images problem, a considerably shorter total training time was required

to create a set of one-class programs than to create a single

multiple-class program. A more detailed discussion of this

method is outside the goal of this paper, and is left to future

work.

6.4.

should not be used in GP [32], while some others insist

that a high mutation rate would help the GP evolution converge [40, 41]. To investigate the effects of mutation in GP

for multiclass object detection problems, we carried out ten


(- (- F18 F17 ) (- F19 87.05))))

(+ 17.0792 (+ F9 F14 )))

(/ (+ F19 (* (+ (+ F11

(- (* (- (- F15 F18 ) (+ 40.58 F16 ))

(- (* F13 (+ (/ 57.64 F16 ) F13 ))

(- F9 F6 )))

(/ (* F3 F1 ) F1 )))

(* (- (* (- (/ (+ (+ F18 (+ (/ (/ F14 F6 )

(+ F6 F1 ))

89.70))

(* F10 F12 )) F2 ) F9 )

(+ (+ F16 14.75) F9 )) F18 )

(/ (/ F13 F1 ) (* (+ F6 F12 ) F9 ))))

(+ F16 F8 )))

(+ (- (- (+ (/ F10 (* F9 F6 )) F13 ) F10 ) F18 )

(+ (* (- (+ F1 F2 ) (+ F17 F8 )) F5 )

(* (* F20 F16 ) F10 )))))

(* (+ (- (* (+ F11

(+ (* F14 F3 )

(/ F15 (/ (+ (* F2 14.5251)

(* (* (/ (* F18

(/ (* F2 F13 ) F15 ))

F1 )

(/ (/ F11 F13 ) (/ F7 F5 )))

(+ (+ F18 (* F2 F13 ))

(/ F8 F12 ))))

F17 )))) F11 ) F16 )

(* (- F1 (+ F3 F8 )) F5 ))

(/ (+ (- F7 F20 ) F18 ) F20 ))))

(* (* (* (* F2 F13 ) F2 )

(/ (* F4 (/ (* F2 F13 ) F15 )) (* F18 F12 )))

(* F14 F2 )))

(+ (+ (- (+ (- F19 F3 ) F2 ) F7 ) (- (+ F8 F17 ) F18 ))

(/ (+ F15 60.10)

(* (* F1 (/ (/ F12 (- (+ (/ (/ F12 F13 ) (/ F15 F5 )) F17 ) F18 ))

(/ F7 F5 ))) F8 ))))

(* (/ (* F10 (/ (* F2 F13 ) F15 )) F18 )

(* (* (* (* F2 F2 ) (/ (/ (/ F18 (+ F1 F2 )) F13 )

(/ (/ (- F15 96.16) (* F4 14.53)) F5 ))) F4 )

(/ (/ F12 F13 ) (/ F1 (+ (/ F10 F1 ) F4 ))))))

Figure 15: A sample generated program for very difficult detection problems in the retina images.

on the easy images, as shown in Figure 16. The reproduction rate was held constant at 10%, and the mutation rate

varied from 0% to 40%. The graph shows the distribution

of the number of generations to convergence by a box-and-whisker plot with the limits of the central box at the 30th

and 70th percentiles. With both 0% and 40% mutation, the

search sometimes did not converge within the limit of 250

generations. There was a clear effect of the mutation rate on

the number of generations to convergence. The best mutation rate was 25%, where only 48 generations on average were

required to find a good solution, with slower convergence at

coin and the retina images gave a similar trend. This suggests

that, in GP for multiple-class object detection problems described in this paper, mutation plays an important role for

keeping the diversity of the population, and that convergence

could be sped up when an appropriate mutation rate was

used. However, such a good mutation rate is generally task

dependent, and 15%–30% is a good choice for similar tasks.
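The experimental setting can be pictured as a split of each new generation among the genetic operators. In the sketch below, reproduction is fixed at 10% and the mutation rate varies over the values tested. Treating crossover as the remainder is an assumption made for illustration (this section does not state the crossover rate), and `operator_rates` is a hypothetical helper.

```python
def operator_rates(mutation, reproduction=0.10):
    """Split a generation among reproduction, mutation, and crossover.
    Assumes crossover receives whatever the other two operators leave over."""
    crossover = 1.0 - reproduction - mutation
    if crossover < 0:
        raise ValueError("reproduction and mutation rates exceed 100%")
    return {
        "reproduction": reproduction,
        "mutation": mutation,
        "crossover": crossover,
    }

# Mutation rates examined in the experiments on the easy images.
for m in (0.0, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40):
    print(operator_rates(m))
```

Under this assumption, raising the mutation rate from 0% to 40% lowers the crossover share from 90% to 50%, so the experiments trade recombination of existing material against the introduction of new material.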

6.5. Analysis of reproduction

In early GP, the reproduction rule did a probabilistic selection of genetic programs from the current population based


into the new population. The better the fitness, the more

likely the individual program is to be selected [24, 42]. However, this mechanism does not guarantee that the best program will survive. An alternative reproduction rule is one

that removes the probabilistic element, and simply reproduces the best n genetic programs from the current population. We ran experiments on the easy images with both reproduction rules and plotted the best fitness in each generation (see Figure 17). The dotted curve shows the best fitness with the probabilistic reproduction rule. Over the 100

generations, there are 4 clear intervals (at generation 7, 22,

45, and 67) where the fitness got worse rather than better,

which delayed the convergence of learning. In contrast, the

deterministic reproduction rule had a steady improvement

in fitness. Furthermore, the deterministic reproduction rule

converged on an ideal program after just 71 generations,

while the probabilistic reproduction rule had still not converged on an ideal program after 100 generations. (In fact,

the fitness did not improve at all during the final 30 generations!) Clearly, the new reproduction rule greatly improved

the training speed and convergence.
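The difference between the two reproduction rules can be sketched as follows. The example treats fitness as an error to be minimised and uses weights of 1 / (1 + error) for the probabilistic rule. Both choices are assumptions made for illustration, as are the function names and the toy population.

```python
import random

def probabilistic_reproduction(population, errors, n, rng):
    """Copy n programs, each drawn with probability proportional to
    1 / (1 + error). The best program is not guaranteed to be drawn."""
    weights = [1.0 / (1.0 + e) for e in errors]
    return rng.choices(population, weights=weights, k=n)

def deterministic_reproduction(population, errors, n):
    """Copy the n programs with the smallest error, so the best always survives."""
    ranked = sorted(zip(errors, population), key=lambda pair: pair[0])
    return [program for _, program in ranked[:n]]

# Toy population: program names with made-up error values.
population = ["p0", "p1", "p2", "p3", "p4"]
errors = [12.0, 3.0, 0.5, 7.0, 9.0]

survivors = deterministic_reproduction(population, errors, 2)
sampled = probabilistic_reproduction(population, errors, 2, random.Random(0))
print(survivors, sampled)
```

Because the deterministic rule always carries the current best program forward, the best fitness in the population can never get worse from one generation to the next, which matches the steady improvement described above.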

7. CONCLUSIONS

learning/adaptive approach for detecting small objects of

multiple classes in large images based on GP. This goal was

achieved by the use of GP with a set of domain-independent

pixel statistics as terminals, a number of standard operators

as functions, and a linear combination of the DR and FAR

as the fitness measure. A secondary goal was to compare the

performance of this method with an NN method. Here the

GP approach outperformed the NN approach in terms of detection accuracy.
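A fitness measure built as a linear combination of the DR and FAR might look like the sketch below. The exact form and the weights are defined earlier in the paper and are not repeated in this section, so the expression `w_dr * (1 - dr) + w_far * far`, the default weights, and the function name are assumptions chosen only to show the idea.

```python
def fitness(dr, far, w_dr=1.0, w_far=1.0):
    """Linear combination of detection error and false alarm rate.

    dr and far are fractions (1.0 means 100%). Under this assumed form,
    smaller values are better and 0.0 corresponds to ideal performance.
    """
    return w_dr * (1.0 - dr) + w_far * far

# Rates quoted for the retina images in experiment III, as fractions.
print(fitness(dr=1.0, far=4.63))       # micro
print(fitness(dr=0.7391, far=12.14))   # haem
print(fitness(dr=1.0, far=0.0))        # ideal result
```

The relative size of the two weights determines how many false alarms the search will accept in exchange for one more detected object.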

The approach appears to be applicable to detection problems of varying difficulty as long as the objects are approximately the same size and the background is not too cluttered.

The paper differs from most work in object detection

in two ways. Most work addresses the one-class problem,

that is, object versus nonobject, or object versus background.

This paper has shown a way of solving a multiple-class object detection problem without breaking it into a collection of one-class problems. Also, most current research uses different algorithms in multiple independent stages to solve the localisation problem and the classification problem; in contrast, this paper uses a single learned genetic program for both object classification and object localisation.

Figure 17: Training easy images based on the old and the new reproduction rules (best fitness against generations).

The experiments showed that mutation does play an important role in the three multiple-class object detection tasks. This is in contrast to Koza's early claim that GP does not need mutation. For GP applied to multiple-class object detection problems, the experiments suggest that a 15%–30% mutation rate would be a good choice.

The experiments also identified some limitations of the particular approach taken in the paper. The first limitation concerns the choice of input features and the function set. For the simple and medium-difficulty object detection problems, the 20 regional/rectilinear features and 4 standard arithmetic functions performed very well; however, they were not adequate for the most difficult object detection task. In particular, they were not adequate for detecting classes of objects with a range of sizes. Further work will be required to discover more effective domain-independent features and function sets, especially ones that provide some size invariance.

A second limitation is the high training time required. One aspect of this training time is the experimentation required to find good values of the various parameters for each different problem. The GP method appears to be applicable to multiple-class object detection tasks where accuracy is the most important factor and training time is seen as relatively unimportant, as is the case in most industrial applications. Further experimentation may reveal more effective ways of determining parameters which will reduce the training times.

Subject to these limitations, the paper has demonstrated that GP can be used effectively for the multiple-class detection problem and provides more evidence that GP has a great potential for application to a variety of difficult problems in the real world.

ACKNOWLEDGMENTS

We would like to thank Dr. James Thom at RMIT University

and Dr. Zhi-Qiang Liu at the University of Melbourne for a

number of useful discussions. Thanks also to Peter Wilson

whose basic GP package was used in this project and to Chris

Kamusinski who provided and labelled the retina images.

REFERENCES

[1] P. D. Gader, J. R. Miramonti, Y. Won, and P. Coffield, "Segmentation free shared weight networks for automatic vehicle detection," Neural Networks, vol. 8, no. 9, pp. 1457–1473, 1995.

[2] A. M. Waxman, M. C. Seibert, A. Gove, et al., "Neural processing of targets in visible, multispectral IR and SAR imagery," Neural Networks, vol. 8, no. 7-8, pp. 1029–1051, 1995.

[3] Y. Won, P. D. Gader, and P. C. Coffield, "Morphological shared-weight networks with applications to automatic target recognition," IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 1195–1203, 1997.

[4] H. L. Roitblat, W. W. L. Au, P. E. Nachtigall, R. Shizumura, and G. Moons, "Sonar recognition of targets embedded in sediment," Neural Networks, vol. 8, no. 7-8, pp. 1263–1273, 1995.

[5] M. W. Roth, "Survey of neural network technology for automatic target recognition," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 28–43, 1990.

[6] D. P. Casasent and L. M. Neiberg, "Classifier and shift-invariant automatic target recognition neural networks," Neural Networks, vol. 8, no. 7-8, pp. 1117–1129, 1995.

[7] S. K. Rogers, J. M. Colombi, C. E. Martin, et al., "Neural networks for automatic target recognition," Neural Networks, vol. 8, no. 7-8, pp. 1153–1184, 1995.

[8] J. R. Sherrah, R. E. Bogner, and A. Bouzerdoum, "The evolutionary pre-processor: automatic feature extraction for supervised classification using genetic programming," in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza, K. Deb, M. Dorigo, et al., Eds., pp. 304–312, Morgan Kaufmann, Stanford, Calif, USA, July 1997.

[9] W. A. Tackett, "Genetic programming for feature discovery and image discrimination," in Proc. 5th International Conference on Genetic Algorithms, ICGA-93, S. Forrest, Ed., pp. 303–309, Morgan Kaufmann, Urbana-Champaign, Ill, USA, July 1993.

[10] J. F. Winkeler and B. S. Manjunath, "Genetic programming for object detection," in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza, K. Deb, M. Dorigo, et al., Eds., pp. 330–335, Morgan Kaufmann, Stanford, Calif, USA, July 1997.

[11] A. Teller and M. Veloso, "A controlled experiment: evolution for learning difficult image classification," in Proc. 7th Portuguese Conference On Artificial Intelligence, C. Pinto-Ferreira and N. J. Mamede, Eds., vol. 990 of Lecture Notes in Computer Science, pp. 165–176, Springer-Verlag, Funchal, Madeira Island, Portugal, October 1995.

[12] M. Zhang and V. Ciesielski, "Centred weight initialization in neural networks for object detection," in Computer Science '99: Proc. 22nd Australasian Computer Science Conference, J. Edwards, Ed., pp. 39–50, Springer-Verlag, Auckland, New Zealand, January 1999.

[13] T. Caelli and W. F. Bischof, Machine Learning and Image Interpretation, Plenum Press, New York, NY, USA, 1997.

[14] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, Cambridge, Mass, USA, 1993.

[15] E. Gose, R. Johnsonbaugh, and S. Jost, Pattern Recognition and Image Analysis, Prentice-Hall, Upper Saddle River, NJ, USA, 1996.

[16] M. V. Shirvaikar and M. M. Trivedi, "A neural network filter to detect small targets in high clutter backgrounds," IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 252–257, 1995.

[17] P. Winter, S. Sokhansanj, H. C. Wood, and W. Crerar, "Quality assessment and grading of lentils using machine vision," in Canadian Society of Agricultural Engineering Annual Meeting at the Agricultural Institute of Canada Annual Conference, Lethbridge, AB, Canada, July 1996, CSAE paper No. 96-310.

[18] E. Baum and D. Haussler, "What size net gives valid generalization?" Neural Computation, vol. 1, no. 1, pp. 151–160, 1989.

[19] D. Howard, S. C. Roberts, and R. Brankin, "Target detection in SAR imagery by genetic programming," Advances in Engineering Software, vol. 30, no. 5, pp. 303–311, 1999.

[20] S.-H. Lin, S.-Y. Kung, and L.-J. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 114–132, 1997.

[21] Y. LeCun, B. Boser, J. S. Denker, et al., "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.

[22] W. A. Tackett, Recombination, selection, and the genetic construction of computer programs, Ph.D. thesis, Faculty of the Graduate School, University of Southern California, Canoga Park, Calif, USA, April 1994.

[23] D. Andre, "Automatically defined features: the simultaneous evolution of 2-dimensional feature detectors and an algorithm for using them," in Advances in Genetic Programming, K. E. Kinnear, Jr., Ed., pp. 477–494, MIT Press, Cambridge, Mass, USA, 1994.

[24] J. R. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs, MIT Press, Cambridge, Mass, USA, 1994.

[25] A. Teller and M. Veloso, "PADO: learning tree structured algorithms for orchestration into an object recognition system," Tech. Rep. CMU-CS-95-101, Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pa, USA, 1995.

[26] G. Robinson and P. McIlroy, "Exploring some commercial applications of genetic programming," in Proc. AISB Workshop on Evolutionary Computing, T. C. Fogarty, Ed., vol. 993 of Lecture Notes in Computer Science (LNCS), pp. 234–264, Springer-Verlag, Sheffield, UK, April 1995.

[27] S. Isaka, "An empirical study of facial image feature extraction by genetic programming," in Late Breaking Papers at the 1997 Genetic Programming Conference, J. R. Koza, Ed., pp. 93–99, Stanford Bookstore, Stanford, Calif, USA, July 1997.

[28] S. A. Stanhope and J. M. Daida, "Genetic programming for automatic target classification and recognition in synthetic aperture radar imagery," in Evolutionary Programming VII: Proc. 7th Annual Conference on Evolutionary Programming, V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, Eds., vol. 1447 of Lecture Notes in Computer Science (LNCS), pp. 735–744, Springer-Verlag, San Diego, Calif, USA, March 1998.

[29] K. Benson, "Evolving finite state machines with embedded genetic programming for automatic target detection within SAR imagery," in Proc. 2000 Congress on Evolutionary Computation CEC00, pp. 1543–1549, IEEE Press, La Jolla, Calif, USA, July 2000.

[30] D. Howard, S. C. Roberts, and C. Ryan, "The boru data crawler for object detection tasks in machine vision," in Proc. EvoWorkshops 2002, Applications of Evolutionary Computing, S. Cagnoni, J. Gottlieb, E. Hart, M. Middendorf, and G. Raidl, Eds., vol. 2279 of Lecture Notes in Computer Science (LNCS), pp. 220–230, Springer-Verlag, Kinsale, Ireland, April 2002.

[31] B. J. Lucier, S. Mamillapalli, and J. Palsberg, "Program optimization for faster genetic programming," in Proc. 3rd Annual Conference on Genetic Programming (GP-98), J. R. Koza, W. Banzhaf, K. Chellapilla, et al., Eds., pp. 202–207, Morgan Kaufmann, Madison, Wis, USA, July 1998.

[32] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, Mass, USA, 1992.

[33] J. R. Koza, "Simultaneous discovery of reusable detectors and subroutines using genetic programming," in Proc. 5th International Conference on Genetic Algorithms (ICGA 93), S. Forrest, Ed., pp. 295–302, Morgan Kaufmann, Urbana-Champaign, Ill, USA, 1993.

[34] D. Howard, S. C. Roberts, and C. Ryan, "Evolution of an object detection ant for image analysis," in Genetic and Evolutionary Computation Conference Late Breaking Papers, E. D. Goodman, Ed., pp. 168–175, San Francisco, Calif, USA, July 2001.

[35] R. Poli, "Genetic programming for image analysis," in Proc. 1st Annual Conference on Genetic Programming (GP-96), J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, Eds., pp. 363–368, MIT Press, Stanford, Calif, USA, July 1996.

[36] F. Lindblad, P. Nordin, and K. Wolff, "Evolving 3d model interpretation of images using graphics hardware," in Proc. 2002 Congress on Evolutionary Computation CEC2002, pp. 225–230, Honolulu, Hawaii, USA, May 2002.

[37] C. T. M. Graae, P. Nordin, and M. Nordahl, "Stereoscopic vision for a humanoid robot using genetic programming," in Proc. EvoWorkshops 2000, Real-World Applications of Evolutionary Computing, S. Cagnoni, R. Poli, G. D. Smith, et al., Eds., vol. 1803 of Lecture Notes in Computer Science (LNCS), pp. 12–21, Springer-Verlag, Edinburgh, Scotland, UK, April 2000.

[38] P. Nordin and W. Banzhaf, "Programmatic compression of images and sound," in Proc. 1st Annual Conference on Genetic Programming (GP-96), J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, Eds., pp. 345–350, MIT Press, Stanford, Calif, USA, July 1996.

[39] N. Rai, Pixel statistics in neural networks for domain independent object detection, Minor thesis, Department of Computer Science, Faculty of Applied Science, RMIT University, 2001.

[40] M. Fuchs, "Crossover versus mutation: an empirical and theoretical case study," in Proc. 3rd Annual Conference on Genetic Programming (GP-98), J. R. Koza, W. Banzhaf, K. Chellapilla, et al., Eds., pp. 78–85, Morgan Kaufmann, Madison, Wis, USA, July 1998.

[41] K. Harries and P. Smith, "Exploring alternative operators and search strategies in genetic programming," in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza, K. Deb, M. Dorigo, et al., Eds., pp. 147–155, Morgan Kaufmann, Stanford, Calif, USA, July 1997.

[42] P. Wilson, "Development of genetic programming strategies for use in the robocup domain," Tech. Rep., Department of Computer Science, RMIT, 1998, Honours thesis.

Mengjie Zhang received a B.E. (mechanical engineering) and an M.E. (computer

applications) in 1989 and 1992 from the

Department of Mechanical and Electrical Engineering, Agricultural University of

Hebei, China, and a Ph.D. in computer

science from RMIT University, Melbourne,

Australia, in 2000. During 1992–1995, he

worked at the Artificial Intelligence Research Centre, Agricultural University of

Hebei, China. In 2000, he moved to Victoria University of Wellington, New Zealand. His research is focused on data mining, machine

learning, and computer vision, particularly genetic programming,

neural networks, and object detection. He is also interested in web

information extraction, and knowledge-based systems.

Victor B. Ciesielski received his B.S. and

M.S. degrees in 1972 and 1975, respectively,

from the University of Melbourne, Australia

and his Ph.D. degree in 1980 from Rutgers University, USA. He is currently Associate Professor at the School of Computer Science and Information Technology,

RMIT University, where he heads the Evolutionary Computation and Machine Learning Group. Dr. Ciesielski's research interests

include evolutionary computation, computer vision, data mining,

machine learning for robot soccer, and, in particular, genetic programming approaches to object detection and classification.

Peter Andreae received a B.E. (honours) in

electrical engineering from the University

of Canterbury, New Zealand, in 1977 and

a Ph.D. in artificial intelligence from MIT

in 1985. Since 1985, he has been teaching

computer science at Victoria University of

Wellington, New Zealand. His research interests are centered in the area of making

agents that can learn behaviour from experience, but he has also worked on a wide

range of topics, including reconstructing vasculature from x-rays, clustering algorithms, analysis of micro-array data, programming by demonstration, and software reuse.



