
Infosys Science Foundation Series

in Applied Sciences and Engineering

Rituparna Datta
Kalyanmoy Deb Editors

Evolutionary
Constrained
Optimization

Infosys Science Foundation Series


Applied Sciences and Engineering

More information about this series at http://www.springer.com/series/13554

Rituparna Datta Kalyanmoy Deb

Editors

Evolutionary Constrained
Optimization


Editors
Rituparna Datta
Department of Electrical Engineering
Korea Advanced Institute of Science
and Technology
Daejeon
Republic of Korea

Kalyanmoy Deb
Electrical and Computer Engineering
Michigan State University
East Lansing, MI
USA

ISSN 2363-6149
ISSN 2363-6157 (electronic)
Infosys Science Foundation Series
ISSN 2363-4995
ISSN 2363-5002 (electronic)
Applied Sciences and Engineering
ISBN 978-81-322-2183-8
ISBN 978-81-322-2184-5 (eBook)
DOI 10.1007/978-81-322-2184-5
Library of Congress Control Number: 2014957133
Springer New Delhi Heidelberg New York Dordrecht London
© Springer India 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer (India) Pvt. Ltd. is part of Springer Science+Business Media (www.springer.com)

To my parents, Ranjit Kumar Datta (father)


and Khela Datta (mother).
Rituparna Datta
To Sadhan Chandra Deb (Baro Mama,
Eldest Uncle) whose inspiration has always
shown me the way.
Kalyanmoy Deb

Preface

Optimization is an integral part of research in most scientific and engineering


problems.
The critical challenge in optimization lies in iteratively finding the best
combination of variables which minimize or maximize one or more objective
functions by satisfying the variable requirements and restrictions which are largely
known as constraints. Most optimization problems involve one or many constraints
due to the limitation in the availability of resources, physical viability, or other
functional requirements. The existence of constraints in problems in science and
engineering is continuously motivating researchers to develop newer and more
efficient methods of constraint handling in optimization.
Evolutionary optimization algorithms are population-based metaheuristic
techniques to deal with optimization problems. These algorithms have been
successively applied to a wide range of optimization problems due to their ability to
deal with nonlinear, nonconvex, and discontinuous objective and constraint
functions. Originally, evolutionary algorithms (EAs) were developed to solve
unconstrained problems. However, as demands for solving practical problems
arose, evolutionary algorithm researchers have been regularly devising new
and efficient constraint handling techniques. Out of these constraint handling
techniques, some are borrowed from the classical literature, while others use different strategies like preference of feasible solutions over infeasible ones, choice of
less constraint-violated solutions, separation of objective and constraint functions,
special operators, and hybrid classical evolutionary methods, to name a few.
In most top evolutionary computation conferences, a good number of papers are
regularly published to discuss various ways of handling constraints using different
EAs. Almost all books and journals on evolutionary computation contain one or more
topics on constrained optimization. In 2009, Springer Studies in Computational
Intelligence came up with a full monograph on EA-based constrained optimization
(Constraint-Handling in Evolutionary Optimization by Mezura-Montes; ISBN: 978-3-642-00618-0). This book takes the same direction as that monograph, and presents
a more updated view of the subject matter. Moreover, this book aims to serve as a
self-contained collection of the current research addressing general constrained

optimization. The book can also serve as a textbook for advanced courses and as a
guide to the future direction of research in the area. Many constraint handling
techniques that exist in bits and pieces are assembled together in the present
monograph. Hybrid optimization, which is gaining a lot of popularity today due to its
capability of bridging the gap between evolutionary and classical optimization, is
broadly covered here. These areas will be helpful for researchers, novices and experts
alike.
The book consists of ten chapters covering diverse topics of constrained
optimization using EAs.
Helio J.C. Barbosa, Afonso C.C. Lemonge, and Heder S. Bernardino review the
adaptive penalty techniques in the first chapter that mainly deals with constraints using
EAs. The penalty function approach is one of the most popular constraint handling
methodologies due to its simple working principle and its ease of integration with
any unconstrained technique. The study also indicates the need for implementation of
different adaptive penalty methods in a single search engine. It will provide better
information for the decision maker to choose a particular technique.
The theoretical understanding of constrained optimization is one of the key
features to select the best constraint handling mechanism for any problem.
To tackle this issue, Shayan Poursoltan and Frank Neumann have studied the
influence of the fitness landscape in Chap. 2. The study introduces different methods to
quantify the ruggedness of a given constrained optimization problem.
Rommel G. Regis proposes a constraint handling method to solve computationally expensive constrained black-box optimization using surrogate-assisted
evolutionary programming (EP) in Chap. 3. The proposed algorithm creates
surrogate models for the black-box objective function and inequality constraint
functions in every generation of the EP. Furthermore, at the end of each generation
a trust-region-like approach is used to refine the best solution. Hard and soft
constraints are common in constrained optimization problems.
In Chap. 4, Richard Allmendinger and Joshua Knowles point out a new type of
constraint known as ephemeral resource constraints (ERCs). The authors have
explained the presence of ERCs in real-world optimization problems.
A combination of multi-membered evolution strategy and an incremental
approximation strategy-assisted constraint handling method is proposed by Sanghoun Oh and Yaochu Jin in Chap. 5 to deal with highly constrained, tiny and
separated feasible regions in the search space. The proposed approach generates an
approximate model for each constraint function with increasing order of accuracy.
It starts with a linear model and successively increases the complexity until it
approaches that of the original constraint function.
Chapter 6, by Tetsuyuki Takahama and Setsuko Sakai, describes a method
combining the ε-constrained method and the estimated comparison. In this method,
rough approximation is utilized to approximate both the objective function as well
as constraint violation. The methodology is integrated with differential evolution
(DE) for its simple working principle and robustness.


Jeremy Porter and Dirk V. Arnold carry out a detailed analysis of the behavior of a
multi-recombinative evolution strategy that highlights both cumulative step size
adaptation and a simple constraint handling technique in Chap. 7. In order to obtain
the optimal solution at the cone's apex, a linear optimization problem is considered for
analysis, with a feasible region defined by a right circular cone, which is symmetric
about the gradient direction.
A niching technique is explored in conjunction with multimodal optimization by
Mohammad Reza Bonyadi and Zbigniew Michalewicz in Chap. 8 to locate feasible
regions, instead of searching for different local optima. Since in continuous
constrained optimization the feasible search space is likely to consist of many
disjoint regions, the global optimal solution might be located within any one
of them. A particle swarm optimizer is used as the search engine.
In Chap. 9, Rammohan Mallipeddi, Swagatam Das, and Ponnuthurai Nagaratnam
Suganthan present an ensemble of constraint handling techniques (ECHT). Due to
the nonexistence of a universal constraint handling method, an ensemble method can
be a suitable alternative. ECHT is combined with an improved differential evolution (DE)
algorithm and the proposed technique is known as EPSDE.
Rituparna Datta and Kalyanmoy Deb propose an adaptive penalty function
method using genetic algorithms (GA) in the concluding chapter (Chap. 10) of this
book. The proposed method amalgamates a bi-objective evolutionary approach
with the penalty function methodology in order to overcome their individual weaknesses.
The bi-objective approach is responsible for approximating an appropriate penalty
parameter and a starting solution for the unconstrained penalized function, which is
then solved by a classical method responsible for exact convergence.
We would like to thank the team at Springer. In particular we acknowledge the
contributions of our Editor, Swati Meherishi, and the editorial assistants, Kamya
Khatter and Aparajita Singh, who helped bring this manuscript to fruition.
Rituparna Datta would like to thank his wife Anima and daughter Riddhi for their
love and affection.
Daejeon, Korea, September 2014
East Lansing, MI, USA

Rituparna Datta
Kalyanmoy Deb

Acknowledgments to Reviewers

With deep gratitude we convey our heartfelt greetings and congratulations to the
following colleagues and key researchers who spared no pains for reviewing this
book to make it a signal success.
Richard Allmendinger, University College London, UK
Dirk Arnold, Dalhousie University, Canada
Helio J.C. Barbosa, Universidade Federal de Juiz de Fora, Brazil
Heder S. Bernardino, Laboratório Nacional de Computação Científica, Brazil
Hans-Georg Beyer, FH Vorarlberg, University of Applied Sciences, Austria
Fernanda Costa, University of Minho, Portugal
Dilip Datta, Tezpur University, India
Oliver Kramer, University of Oldenburg, Germany
Afonso Celso de Castro Lemonge, Federal University of Juiz de Fora, Brazil
Xiaodong Li, RMIT University, Australia
Rammohan Mallipeddi, Kyungpook National University, South Korea
Tomasz Oliwa, Toyota Technological Institute at Chicago, USA
Khaled Rasheed, University of Georgia, USA
Rommel G. Regis, Saint Joseph's University, USA


Contents

1  A Critical Review of Adaptive Penalty Techniques in Evolutionary Computation
   Helio J.C. Barbosa, Afonso C.C. Lemonge and Heder S. Bernardino

2  Ruggedness Quantifying for Constrained Continuous Fitness Landscapes
   Shayan Poursoltan and Frank Neumann

3  Trust Regions in Surrogate-Assisted Evolutionary Programming for Constrained Expensive Black-Box Optimization
   Rommel G. Regis

4  Ephemeral Resource Constraints in Optimization
   Richard Allmendinger and Joshua Knowles

5  Incremental Approximation Models for Constrained Evolutionary Optimization
   Sanghoun Oh and Yaochu Jin

6  Efficient Constrained Optimization by the Constrained Differential Evolution with Rough Approximation
   Tetsuyuki Takahama and Setsuko Sakai

7  Analyzing the Behaviour of Multi-recombinative Evolution Strategies Applied to a Conically Constrained Problem
   Jeremy Porter and Dirk V. Arnold

8  Locating Potentially Disjoint Feasible Regions of a Search Space with a Particle Swarm Optimizer
   Mohammad Reza Bonyadi and Zbigniew Michalewicz

9  Ensemble of Constraint Handling Techniques for Single Objective Constrained Optimization
   Rammohan Mallipeddi, Swagatam Das and Ponnuthurai Nagaratnam Suganthan

10 Evolutionary Constrained Optimization: A Hybrid Approach
   Rituparna Datta and Kalyanmoy Deb

About the Book

Index

About the Editors

Rituparna Datta is a postdoctoral research fellow with the Robot Intelligence


Technology (RIT) Laboratory at the Korea Advanced Institute of Science and
Technology (KAIST). He earned his Ph.D. in Mechanical Engineering at Indian
Institute of Technology (IIT) Kanpur and thereafter worked as a Project Scientist in
the Smart Materials, Structures, and Systems Lab at IIT Kanpur. His current research
work involves investigation of Evolutionary Algorithms-based approaches to constrained optimization, applying multiobjective optimization in engineering design
problems, memetic algorithms, derivative-free optimization, and robotics. He is a
member of ACM, IEEE, and IEEE Computational Intelligence Society. He has been
invited to deliver lectures in several institutes and universities across the globe,
including Trinity College Dublin (TCD), Delft University of Technology
(TUDELFT), University of Western Australia (UWA), University of Minho, Portugal, University of Nova de Lisboa, Portugal, University of Coimbra, Portugal, and
IIT Kanpur, India. He is a regular reviewer of IEEE Transactions on Evolutionary
Computation, Journal of Applied Soft Computing, Journal of Engineering Optimization, Journal of The Franklin Institute, and International Journal of Computer
Systems in Science and Engineering, and was in the program committee of Genetic
and Evolutionary Computation Conference (GECCO 2014), iNaCoMM2013,
GECCO 2013, GECCO 2012, GECCO 2011, eighth international conference on
Simulated Evolution And Learning (SEAL 2010), international conference on molecules to materials (ICMM-06), and some Indian conferences. He has also chaired
sessions in ACODS 2014 and UKIERI Workshop on Structural Health Monitoring
2012, GECCO 2011, IICAI 2011 to name a few. He was awarded an international
travel grant (Young Scientist) from Department of Science and Technology,
Government of India, in July 2011 and June 2012 and travel grants from Queensland
University, Australia, June 2012, GECCO Student Travel Grant, ACM, New York.


Prof. Kalyanmoy Deb is Koenig Endowed Chair Professor at the Department of


Electrical and Computer Engineering in Michigan State University (MSU), East
Lansing, USA. He also holds a professor position at the Department of Computer
Science and Engineering, and at the Department of Mechanical Engineering in
MSU. Prof. Deb's main research interests are in genetic and evolutionary optimization algorithms and their application in optimization, modeling, and machine
learning. He is largely known for his seminal research in developing and applying
Evolutionary Multi-objective Optimization. Prior to coming to MSU, he was
holding an endowed chair professor position at Indian Institute of Technology
Kanpur, India, where he established KanGAL (http://www.iitk.ac.in/kangal) to
promote research in genetic algorithms and multi-criterion optimization since 1997.
His Computational Optimization and Innovation (COIN) Laboratory (http://www.
egr.msu.edu/kdeb) at Michigan State University continues to act in the same spirit.
He has consulted with various industries and software companies in the past.
Prof. Deb was awarded the prestigious Infosys Prize in 2012, TWAS Prize in
Engineering Sciences in 2012, CajAstur Mamdani Prize in 2011, JC Bose
National Fellowship in 2011, Distinguished Alumni Award from IIT Kharagpur
in 2011, Edgeworth-Pareto award in 2008, Shanti Swarup Bhatnagar Prize in
Engineering Sciences in 2005, and Thomson Citation Laureate Award from Thomson Reuters. Recently, he has been awarded an Honorary Doctorate from the University
of Jyväskylä, Finland. His 2002 IEEE-TEC NSGA-II paper is judged as the Most
Highly Cited paper and a Current Classic by Thomson Reuters having more than
4,200+ citations. He is a fellow of IEEE, ASME, Indian National Science Academy
(INSA), Indian National Academy of Engineering (INAE), Indian Academy of
Sciences (IASc), and International Society of Genetic and Evolutionary Computation (ISGEC). He has written two text books on optimization and more than 375
international journal and conference research papers with Google Scholar citations
of 65,000+ with h-index of 85. He is in the editorial board on 20 major international
journals. More information about his research can be found from http://www.egr.
msu.edu/kdeb.

Chapter 1
A Critical Review of Adaptive Penalty Techniques in Evolutionary Computation

Helio J.C. Barbosa, Afonso C.C. Lemonge and Heder S. Bernardino

H.J.C. Barbosa
National Laboratory for Scientific Computing (LNCC), Petrópolis, Rio de Janeiro, RJ, Brazil
e-mail: hcbm@lncc.br

A.C.C. Lemonge
Department of Applied and Computational Mechanics, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil
e-mail: afonso.lemonge@ufjf.edu.br

H.S. Bernardino · H.J.C. Barbosa
Department of Computer Science, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil
e-mail: heder@ice.ufjf.br

Abstract Constrained optimization problems are common in the sciences, engineering, and economics. Due to the growing complexity of the problems tackled,
nature-inspired metaheuristics in general, and evolutionary algorithms in particular,
are becoming increasingly popular. As move operators (recombination and mutation)
are usually blind to the constraints, most metaheuristics must be equipped with a
constraint handling technique. Although conceptually simple, penalty techniques
usually require user-defined problem-dependent parameters, which often significantly impact the performance of a metaheuristic. A penalty technique is said to
be adaptive when it automatically sets the values of all parameters involved using
feedback from the search process without user intervention. This chapter presents a
survey of the most relevant adaptive penalty techniques from the literature, identifies
the main concepts used in the adaptation process, as well as observed shortcomings,
and suggests further work in order to increase the understanding of such techniques.
Keywords Adaptive techniques · Penalty techniques · Evolutionary computation

1.1 Introduction
Constrained optimization problems are common in the sciences, engineering, and
economics. Due to the growing complexity of the problems tackled, nature-inspired
metaheuristics in general, and evolutionary algorithms in particular, are becoming

increasingly popular. That is due to the fact that, in contrast to classical mathematical
programming techniques, they can be readily applied to situations where the objective
function(s) and/or constraints are not known as explicit functions of the decision
variables. This happens when potentially expensive computer models (generated by
means of the finite element method (Hughes 1987), for example) must be run in order
to compute the objective function and/or check the constraints every time a candidate
solution needs to be evaluated. For instance, in the design of truss structures, one
possible definition of the problem is to find the cross-section areas of the bars that
minimize the structures weight subject to limitations in the nodal displacements and
in the stress of each bar (Krempser et al. 2012). Notice that although the structures
weight can be easily calculated from the design variables, the values of the nodal
displacements and of the stress in each bar are determined by solving the equilibrium
equations defined by the finite element model.
As move operators (recombination and mutation) are usually blind to the
constraints (i.e., when operating upon feasible individual(s) they do not necessarily
generate feasible offspring) most metaheuristics must be equipped with a constraint
handling technique. In simpler situations, repair techniques (Salcedo-Sanz 2009),
special move operators (Schoenauer and Michalewicz 1996), or special decoders
(Koziel and Michalewicz 1998) can be designed to ensure that all candidate solutions are feasible.
We do not attempt to survey the current literature on constraint handling in this
chapter, and the reader is referred to survey papers of, e.g., Michalewicz (1995),
Michalewicz and Schoenauer (1996), Coello (2002), and Mezura-Montes and Coello
(2011) as well as to the other chapters in this book. Instead we consider the oldest, and
perhaps most general class of constraint handling methods: the penalty techniques,
where infeasible candidate solutions have their fitness value reduced and are allowed
to coexist and evolve with the feasible ones.
Although conceptually simple, penalty techniques usually require user-defined
problem-dependent parameters, which often significantly impact the performance of
a metaheuristic.
The main focus of this chapter is on adaptive penalty techniques, which automatically set the values of all parameters involved using feedback from the search
process without user intervention. This chapter presents a survey of the most relevant adaptive penalty techniques from the literature as well as a critical assessment of
their assumptions, rationale for the design choices made, and reported performance
on test-problems.
The chapter is structured as follows. Section 1.2 summarizes the penalty method,
Sect. 1.3 introduces the main taxonomy for strategy parameter control, and Sect. 1.4
reviews some representative proposals for adapting penalty parameters. Section 1.5
presents a discussion of the main findings and the chapter ends with some conclusions,
including suggestions for further work in order to increase the understanding of such
adaptive techniques.


1.2 The Penalty Method


We consider in this chapter the constrained optimization problem consisting in the minimization of a given objective function f(x), where x ∈ ℝ^n is the vector of decision/design variables, which are subject to inequality constraints g_p(x) ≤ 0, p = 1, 2, ..., p̄, as well as equality constraints h_q(x) = 0, q = 1, 2, ..., q̄. In many applications the variables are also subject to bounds x_i^L ≤ x_i ≤ x_i^U. However, this type of constraint is usually trivially enforced in an EA and is not considered here. The set of all feasible solutions is denoted by F, while d(x, F) is a distance measure of the element x to the set F. The definition of d(x, F) depends on the particular constraint-handling strategy adopted and is specified for each strategy independently.
The penalty method, which transforms a constrained optimization problem into
an unconstrained one, can be traced back at least to the paper by Courant (1943) in
the 1940s, and its adoption by the evolutionary computation community happened
very soon.
In this chapter, penalty techniques used within evolutionary computation
methods are classified as multiplicative or additive. In the multiplicative case, a
positive penalty factor p(v(x), T ) is introduced where v(x) denotes a measure of how
constraints are violated by the candidate solution x and T denotes a temperature.
The idea is to amplify the value of the fitness function of an infeasible individual (in
a minimization problem):
F(x) = p(v(x), T) · f(x).
One would have p(v(x), T ) = 1, for any feasible candidate solution x, and
p(v(x), T ) > 1 otherwise. Also, p(v(x), T ) increases with the temperature T and
with the magnitude of the constraint violation v(x). An initial value for the temperature is required together with the definition of a schedule for T such that T grows as
the evolution advances. This type of penalty has however received much less attention
in the EC community than the additive type. The most recent work seems to be by
Puzzi and Carpinteri (2008), where the technique introduced by Yokota et al. (1995)
and later modified in Gen and Cheng (1996), is also presented. Harrell and Ranjithan
(1999) compare additive and multiplicative penalty techniques for an instance of the
watershed management problem.
In the additive case, a penalty functional is added to the objective function in
order to define the fitness value of an infeasible element. They can be further divided
into: (a) interior techniques, when a barrier functional B(x), which grows rapidly as x ∈ F approaches the boundary of the feasible domain, is added to the objective function

F_k(x) = f(x) + (1/k) B(x)


and (b) exterior techniques, where a penalty functional is introduced

F_k(x) = f(x) + k P(d(x, F))                                              (1.1)

such that P(d(x, F)) = 0 if x is feasible (x ∈ F) and P(·) > 0 otherwise.
In both cases (a) and (b), under reasonable conditions, as k → ∞ any limit point of the sequence {x_k} of solutions of the unconstrained problem of minimizing F_k(x) is a solution of the original constrained problem (Luenberger and Ye 2008).
In order to define d(x, F) it is useful to define a measure of the violation of the jth constraint by a given candidate solution x ∈ ℝ^n. One possibility is to take

v_j(x) = |h_j(x)|,              for an equality constraint,
         max{0, g_j(x)},        otherwise                                 (1.2)

However, the equality constraints h_j(x) = 0 are often replaced by the inequalities |h_j(x)| − ε ≤ 0, for some small positive ε, and one would have

v_j(x) = max{0, |h_j(x)| − ε},  for an equality constraint,
         max{0, g_j(x)},        otherwise                                 (1.3)

For computational efficiency the violations v_j(x) are used to compute a substitute for d(x, F) in the design of penalty functions that grow with the vector of violations v(x) ∈ ℝ^m, where m = p̄ + q̄ is the number of constraints to be penalized. At this point it is easy to see that interior penalty techniques, in contrast to exterior ones, require feasible solutions (which are often hard to find), thus explaining the high popularity of the latter.
The most popular penalty function is perhaps (Luenberger and Ye 2008)
P(x) = Σ_{j=1}^{m} (v_j(x))²                                              (1.4)

where P(d(x, F)) is equal to the square of the Euclidean norm of v(x).
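As a concrete illustration of Eqs. (1.1)-(1.4), the sketch below shows one way the violation measures and the quadratic exterior penalty could be coded. It is only a minimal example: the function names, the fixed penalty parameter k, and the sample constraint are hypothetical and not taken from the chapter.

import numpy as np

def violations(x, ineq_constraints, eq_constraints, eps=1e-4):
    # Violation measures v_j(x) in the spirit of Eq. (1.3).
    # ineq_constraints: callables g_j with g_j(x) <= 0 when satisfied
    # eq_constraints:   callables h_j with h_j(x) == 0 when satisfied
    # eps:              small tolerance used to relax the equality constraints
    v_ineq = [max(0.0, g(x)) for g in ineq_constraints]
    v_eq = [max(0.0, abs(h(x)) - eps) for h in eq_constraints]
    return np.array(v_ineq + v_eq)

def penalized_fitness(x, f, ineq_constraints, eq_constraints, k=1e3):
    # Exterior penalty F_k(x) = f(x) + k * sum_j v_j(x)^2, Eqs. (1.1) and (1.4).
    v = violations(x, ineq_constraints, eq_constraints)
    return f(x) + k * np.sum(v ** 2)

# Hypothetical usage: minimize x1^2 + x2^2 subject to x1 + x2 >= 1
f = lambda x: x[0] ** 2 + x[1] ** 2
g = [lambda x: 1.0 - x[0] - x[1]]   # rewritten in the g(x) <= 0 form
print(penalized_fitness(np.array([0.2, 0.3]), f, g, []))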


Although it is conceptually easy to obtain the unconstrained problem, the definition of good penalty parameter(s) is usually a time-consuming, problem-dependent, trial-and-error process.
One must also note that even if both the objective function f(x) and the distance to the feasible set F (usually based on the constraint violations v_j(x)) are defined for all x, it is not possible to know in general which of two given infeasible solutions is closer to the optimum or should be operated upon or kept in the population. One can have f(x₁) > f(x₂) and v(x₁) = v(x₂), or f(x₁) = f(x₂) and v(x₁) > v(x₂), and still have x₁ closer to the optimum. Figure 1.1 illustrates these situations.


Fig. 1.1 Illustration of situations in which x₁ is closer to the optimum (x*) than x₂ even when: (a) f(x₁) = f(x₂) and v(x₁) > v(x₂); or (b) f(x₁) > f(x₂) and v(x₁) = v(x₂)

1.3 A Taxonomy
In order to organize the large amount of penalty techniques available in the literature
Coello (2002) proposed the following taxonomy: (a) static penalty, (b) dynamic
penalty, (c) annealing penalty, (d) adaptive penalty, (e) co-evolutionary penalty, and
(f) death penalty. We think however that the general definitions proposed by Eiben and
Smith (2003) with respect to the way strategy parameters are set within metaheuristics
in general and evolutionary algorithms in particular can be naturally adopted here.
Beyond the simplest static case where strategy parameters are defined by the user
and remain fixed during the run, dynamic schemes have been also used where an
exogenous schedule is proposed in order to define the strategy parameters at any
given point in the search process. It is easy to see that if setting fixed parameters is
not trivial, defining the way they should vary during the run seems to be even harder.
It is also felt that such strategy parameters should not be defined before the run but
rather vary according to what is actually happening in the search process. This gives
rise to the so-called adaptive techniques, where feedback from the search process is
used to define the current strategy parameters.
From the reasoning above, the death penalty can be included as a particular case
of static penalty, and the annealing penalty can be seen as a dynamic penalty scheme.
Co-evolutionary penalty techniques are considered in Sect. 1.5.2.
It should be noted here that the design of the adaptive mechanisms mentioned
above often involve meta-parameters or, at least, implicit design choices. The rationale here is that such meta-parameters should be easier to set appropriately; preferably fixed by the designer, with no posterior user intervention required. However,
the parameter setting in some adaptive techniques can be as hard as in the case of the
static ones (Coello 2002), contradicting the main objective of the adaptive penalty
methods.
Finally, an even more ambitious proposal can be found in the literature: the self-adaptive schemes. In this case, strategy parameters are coded together with the
candidate solution, and conditions are created so that the evolutionary algorithm


not only evolves increasingly better solutions but also better adapted strategy parameters. With this increasing sophistication in the design of the algorithms one not
only seeks to improve performance but also to relieve the user from the task of
strategy parameter setting and control.
However, as will be shown in the next section, another possibility, which has not
been contemplated in the taxonomy considered above, can be found in the literature
for the task of automatically setting strategy parameters. The idea is to maintain an
additional population with the task of co-evolving such strategy parameters (here
penalty coefficients) along with the standard population evolving the solutions to the
constrained optimization problem at hand.

1.4 Some Adaptive Techniques


In this section some selected papers from the literature are reviewed in order to
provide an overview of the diversity of techniques proposed for automatically setting
parameters involved in the various penalty schemes for constrained optimization.
Such techniques not only intend to relieve the user from the task of parameter setting
for each new application but also to improve the final performance in the case at hand
by adapting the values of those parameters along the search process in a principled
way. Table 1.1 presents a summary of the adaptive penalty techniques cited in this
section. Some references are not included in the table as their work extends a previous
one but does not require any additional information.
The main lines of reasoning have been identified and a few representative
proposals of each line have been grouped together in the following subsections.

1.4.1 The Early Years


A procedure where the penalty parameters change according to information gathered
during the evolution process was proposed by Bean and Alouane (1992). The fitness
function is again given by (1.1) but with the penalty parameter k = λ(t) adapted at each generation by the following rules:

λ(t + 1) = (1/β₁) λ(t),   if bᵢ ∈ F for all t − g + 1 ≤ i ≤ t
           β₂ λ(t),       if bᵢ ∉ F for all t − g + 1 ≤ i ≤ t
           λ(t),          otherwise

where bᵢ is the best element at generation i, F is the feasible region, β₁ ≠ β₂, and β₁, β₂ > 1. In this method the penalty parameter of the next generation, λ(t + 1),
1 , 2 > 1. In this method the penalty parameter of the next generation (t + 1)
decreases when all best elements in the last g generations were feasible, increases if
all best elements were infeasible and otherwise remains without change.



Table 1.1 Summary of the adaptive penalty techniques described here (reference: information used by the technique)

Bean and Alouane (1992): feasibility of the best individual.
Coit et al. (1996): degree of infeasibility; difference between the fitnesses of the best and the best feasible individuals.
Hamida and Schoenauer (2000): percentage of feasible individuals; ratio between the sum of the objective function values and constraint violations.
Nanakorn and Meesomklin (2001): mean of the objective function values of the feasible solutions.
Beaser et al. (2011): average of the objective function values; degree of infeasibility.
Barbosa and Lemonge (2003b), Lemonge and Barbosa (2004), Rocha and Fernandes (2009): average of the objective function values; average of the violation values of each constraint.
Farmani and Wright (2003): normalized violation values; objective function value of the worst solution.
Lin and Wu (2004): percentage of feasible solutions with respect to each constraint; rate between the objective function value and a given constraint violation; fitness of the best solution; number of objective function evaluations; difference between the medians of the objective function values of feasible and infeasible solutions; ratio of the previous value and the median of the constraint violations.
Tessema and Yen (2006, 2009): percentage of feasible solutions; average of the normalized constraint violation values; normalized objective function value.
Wang et al. (2009): degree of infeasibility; percentage of feasible solutions.
Gan et al. (2010): percentage of feasible solutions.
Costa et al. (2013): degree of infeasibility; objective function value of the worst solution; constraint violation of the equality constraints for the best solution.
Vincenti et al. (2010), Montemurro et al. (2013): objective function value of the best feasible solution; objective function value of the best infeasible solution; difference between the two previous values; ratio between the previous difference and the violation value of each constraint.


The method proposed by Coit et al. (1996) uses the fitness function F(x) written as

F(x) = f(x) + (F_feas − F_all) Σ_{j=1}^{m} [ d_j(x, F) / NFT_j ]^{K_j}

where f (x) is the unpenalized objective function for the solution x, Fall corresponds to
the best solution already found, Ffeas corresponds to the best feasible solution already
found, and d_j(x, F) returns the distance between x and the feasible region (dependent
on the problem). K_j and NFT_j, the near-feasible threshold of the jth constraint, are
user-defined parameters.
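A small sketch of this near-feasibility-threshold penalty follows; all names, and the way the distances d_j(x, F) are supplied, are assumptions for illustration only.

def coit_fitness(f_x, d_x, f_feas, f_all, nft, kappa):
    # Penalized fitness in the spirit of Coit et al. (1996).
    # f_x:    objective value of the current solution
    # d_x:    distances d_j(x, F), one per constraint
    # f_feas: best feasible objective value found so far
    # f_all:  best objective value found so far (feasible or not)
    # nft:    near-feasible thresholds NFT_j
    # kappa:  exponents K_j
    penalty = sum((d / t) ** k for d, t, k in zip(d_x, nft, kappa))
    return f_x + (f_feas - f_all) * penalty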
Rasheed (1998) proposed an adaptive penalty approach for handling constraints
within a GA. The strategy required the user to set a relatively small penalty parameter
and then it would increase or decrease it on demand as the optimization progresses.
The method was tested in a realistic continuous-variable conceptual design of a
supersonic transport aircraft, and the design of supersonic missile inlets, as well as
in benchmark engineering problems. The fitness of each individual was based on the
sum of an adequate measure of merit computed by a simulator (such as the take-off
mass of an aircraft). If the fitness value is between V and 10V, where V is a power of 10, the penalty coefficient starts with the value V/100. The proposed algorithm featured
two points: (i) the individual that has the least sum of constraint violations and
(ii) the individual that has the best fitness value. The penalty coefficient is considered
adequate if both individuals are the same and otherwise the penalty coefficient is
increased to make the two solutions have equal fitness values. The author concluded
that the idea of starting with a relatively small initial penalty coefficient and increasing
it or decreasing it on demand proved to be very good in the computational experiments
conducted.
Hamida and Schoenauer (2000) proposed an adaptive scheme named as Adaptive
Segregational Constraint Handling Evolutionary Algorithm (ASCHEA) employing:
(i) a function of the proportion of feasible individuals in the population; (ii) a seduction/selection strategy to mate feasible and infeasible individuals applying a specific
feasibility-oriented selection operator, and (iii) a selection scheme to give advantage
for a given number of feasible individuals. The ASCHEA algorithm was improved
(Hamida and Schoenauer 2002) by considering a niching technique with adaptive
radius to handle multimodal functions and also (i) a segregational selection that distinguishes between feasible and infeasible individuals, (ii) a constraint-driven recombination, where in some cases feasible individuals can only mate with infeasible ones,
and (iii) a population-based adaptive penalty method that uses global information
on the population to adjust the penalty coefficients. Hamida and Schoenauer (2002)
proposed the following penalty function:
P(x) = Σ_{j=1}^{m} α_j v_j(x)                                             (1.5)

where α_j is adapted as

α_j(t + 1) = α_j(t)/fact,      if τ_t(j) > τ_target
α_j(t + 1) = α_j(t) × fact,    otherwise                                  (1.6)

where fact > 1 and τ_target are to be defined by the user (although the authors suggest τ_target = 0.5), and τ_t(j) is the proportion of individuals which do not violate the jth constraint. The idea is to have feasible and infeasible individuals on both sides of the corresponding boundary. The adapted parameters α_j, with initial values α_j(0), are computed using the first population, trying to balance objective function and constraint violations:

α_j(0) = 1,                                         if Σ_i v_j(x^i) = 0
α_j(0) = [ Σ_i |f(x^i)| / Σ_i |v_j(x^i)| ] × 100,   otherwise             (1.7)
i

The early proposals reviewed here were not able in general to adequately deal with
the problem, suggesting that more information from the search process, at the price
of added complexity, was required.
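As an illustration of one of these early adaptive rules, the per-constraint ASCHEA update of Eq. (1.6) could be sketched as follows; this shows only the adaptation step (ASCHEA also relies on the segregational selection and mating rules described above), and all names and default values are hypothetical.

def adapt_aschea_coefficients(alphas, pop_violations, fact=1.1, tau_target=0.5):
    # Update one penalty coefficient per constraint, following Eq. (1.6).
    # alphas:         current coefficients alpha_j, one per constraint
    # pop_violations: matrix v[i][j], violation of constraint j by individual i
    n_ind = len(pop_violations)
    new_alphas = []
    for j, alpha in enumerate(alphas):
        # proportion of individuals that do not violate constraint j
        tau_j = sum(1 for v in pop_violations if v[j] == 0.0) / n_ind
        if tau_j > tau_target:
            new_alphas.append(alpha / fact)   # enough feasibility w.r.t. j: soften
        else:
            new_alphas.append(alpha * fact)   # too little feasibility w.r.t. j: harden
    return new_alphas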

1.4.2 Using More Feedback


Nanakorn and Meesomklin (2001) proposed an adaptive penalty function for a GA that is able to adjust itself during the evolutionary process. According to that method, the penalty is such that

F(x) ≤ λ(t) f_avg   for x ∉ F                                             (1.8)

where f_avg represents the average fitness value of all feasible individuals in the current generation and λ(t) depends on f_avg. Thus, the fitness function is defined as

F(x) = f(x) − φ(t) E(x)                                                   (1.9)

where

E(x) = Σ_{i=1}^{m} v_i(x)                                                 (1.10)

The adaptive parameter φ(t) is written as

φ(t) = max{ 0, max_{x ∉ F} [ f(x) − λ(t) f_avg ] / E(x) }                 (1.11)


The function λ(t) is defined according to the user-defined parameter ζ. If ζ ≥ 1 then

λ(t) = [ C f_avg + F_max (ζ − 1) − ζ f_avg ] / [ (C − 1) f_avg ]          (1.12)

where C is a user-defined parameter which is the maximum scaled fitness value assigned to the best feasible member. The scaled fitness values are used only in the selection procedure and will not be described here.
Otherwise (if ζ < 1), then λ(t) is defined by an iterative process which is initialized with λ(t) = 1 and is repeated until the value of λ(t) becomes unchanging. The steps of the procedure are
(i) to calculate φ(t) by means of Eq. (1.11)
(ii) to evaluate the candidate solutions according to Eq. (1.9)
(iii) to obtain x_min and x̂, where F_min = F(x_min) is the minimum value of F and x̂ is the candidate solution that leads to

φ(t) = [ F(x̂) − λ(t) f_avg ] / E(x̂)                                      (1.13)

(iv) λ(t) is updated by

λ(t) = [ (ζ − 1) E(x_min) F(x̂) + E(x̂) F(x_min) + f_avg F(x_min) ] / { f_avg [ E(x̂) + (ζ − 1) E(x_min) ] }          (1.14)

Beaser et al. (2011) update the adaptive penalty function theory proposed by Nanakorn and Meesomklin (2001), expanding its validity beyond maximization problems to minimization as well. The expanded technique, using a hybrid genetic algorithm, was applied to a problem in chemistry.
The first modification was introduced in Eq. (1.8):

F(x) ≥ λ(t) f_avg   for x ∉ F                                             (1.15)

Then, the modified value for the parameter φ(t) is defined as

φ(t) = min{ 0, min_{x ∉ F} [ f(x) − λ(t) f_avg ] / E(x) }                 (1.16)

An adaptive decision maker (ADM) proposed by Gan et al. (2010) is designed


in the form of an adaptive penalty function. The method decides which individual
is maintained in a Pareto optimal set and decides which individuals are going to be
replaced. The fitness function in this strategy is written as usual:
F(x) = f(x) + C · G(x)                                                    (1.17)


A parameter r_f is introduced denoting the proportion of feasible solutions in the population and C is designed as a function of r_f, i.e., C(r_f), and two basic rules need to be satisfied: (1) it should be a decreasing function, because the coefficient C decreases as r_f increases, and (2) when r_f varies from 0 to 1, C decreases sharply from a large number at the early stage, and decreases slowly to a small number at the late stage. The reason is that, with r_f increasing (meaning that there are more and more feasible solutions in the population), the search emphasis should shift from low constraint violations to good objective function values quickly. The proposed function that satisfies these two rules is expressed as C(r_f) = 10^{α(1−r_f)}, where α is a positive constant coefficient to be adjusted, and the fitness function is rewritten as

F(x) = f(x) + 10^{α(1−r_f)} G(x)                                          (1.18)

Besides, two properties are established: (1) the fitness assignment maps the two-dimensional vector into the real number space: in this way, it is possible to
compare the solutions in the Pareto optimal set, selecting which one is preferable
and (2) the penalty coefficient C varies with the feasibility proportion of the current
population and, if there are no feasible solutions in the population, this parameter
will receive a relatively large value in order to guide the population in the direction
of the feasible space.
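In code, this adaptive coefficient amounts to a single expression; the sketch below is a hypothetical illustration of Eq. (1.18), with alpha standing for the positive constant mentioned above.

def adm_fitness(f_value, violation, feasible_fraction, alpha=2.0):
    # Adaptive decision maker fitness, Eq. (1.18): F = f + 10^(alpha*(1 - r_f)) * G.
    # The coefficient is large when few solutions are feasible and shrinks as r_f -> 1.
    C = 10.0 ** (alpha * (1.0 - feasible_fraction))
    return f_value + C * violation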
The common need for user-defined parameters together with the difficulty of
finding adequate parameter values for each new application pointed the way to the
challenge of designing penalty techniques which do not require such parameters.

1.4.3 Parameterless Techniques


A parameterless adaptive penalty scheme for GAs was proposed by Barbosa and
Lemonge (2003b), which does not require the knowledge of the explicit form of the
constraints as a function of the design variables and is free of parameters to be set
by the user. In contrast with other approaches where a single penalty parameter is
used for all constraints, an adaptive scheme automatically sizes the penalty parameter
corresponding to each constraint along the evolutionary process. The fitness function
proposed is written as

F(x) = f(x),                              if x is feasible,
       f(x) + Σ_{j=1}^{m} k_j v_j(x),     otherwise                       (1.19)

The penalty parameter is defined at each generation by

k_j = |⟨f(x)⟩| ⟨v_j(x)⟩ / Σ_{l=1}^{m} [⟨v_l(x)⟩]²                         (1.20)

where ⟨f(x)⟩ is the average of the objective function values in the current population and ⟨v_l(x)⟩ is the violation of the lth constraint averaged over the current population.
The idea is that the values of the penalty coefficients should be distributed in a way
that those constraints that are more difficult to be satisfied should have a relatively
higher penalty coefficient.
With the proposed definition one can prove the following property: an individual
whose jth violation equals the average of the jth violation in the current population
for all j, has a penalty equal to the absolute value of the average fitness function of
the population.
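A compact sketch of the APM fitness assignment of Eqs. (1.19)-(1.20) is given below, assuming the objective values and per-constraint violations of the current population are available as NumPy arrays; the array names are illustrative.

import numpy as np

def apm_fitness(obj, viol):
    # Adaptive Penalty Method, Eqs. (1.19)-(1.20).
    # obj:  objective values f(x_i), shape (pop,)
    # viol: constraint violations v_j(x_i), shape (pop, m)
    mean_f = np.mean(obj)                        # <f(x)>
    mean_v = np.mean(viol, axis=0)               # <v_j(x)>, one entry per constraint
    denom = np.sum(mean_v ** 2)
    k = np.zeros_like(mean_v) if denom == 0.0 else abs(mean_f) * mean_v / denom

    fitness = obj.astype(float)
    infeasible = np.any(viol > 0.0, axis=1)
    # The later variant of Eq. (1.21) would use max(f(x), <f(x)>) here instead of f(x).
    fitness[infeasible] = obj[infeasible] + viol[infeasible] @ k
    return fitness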
The performance of the APM was examined using test problems from the evolutionary computation literature as well as structural engineering constrained optimization problems but the algorithm presented difficulties in solving some benchmark
problems, for example, the functions G2 , G6 , G7 and G10 proposed by Michalewicz
and Schoenauer (1996). That was improved in the conference paper (Barbosa and
Lemonge 2002), where f (x) in the definition of the objective function of the infeasible
individuals in Eq. (1.19) was changed to

f̄(x) = f(x),        if f(x) > ⟨f(x)⟩,
       ⟨f(x)⟩,       otherwise                                            (1.21)

and ⟨f(x)⟩ is the average of the objective function values in the current population.
and f (x) is the average of the objective function values in the current population.
The new version was tested (Lemonge and Barbosa 2004) in benchmark engineering
optimization problems and in the G-Suite (Michalewicz and Schoenauer 1996) with
a more robust performance.
The procedure proposed by Barbosa and Lemonge (2002), originally conceived
for a generational GA, was extended to the case of a steady-state GA (Barbosa and
Lemonge 2003a), where, in each generation, usually only one or two new individuals are introduced in the population. Substantial modifications were necessary to
obtain good results in a standard test-problem suite (Barbosa and Lemonge 2003a).
The fitness function for an infeasible individual is now computed according to the equation:

F(x) = H + Σ_{j=1}^{m} k_j v_j(x)                                         (1.22)

where H is defined as

H = f(x_worst),          if there is no feasible element in the population,
    f(x_bestFeasible),   otherwise                                        (1.23)

and the penalty coefficients are redefined as

k_j = H ⟨v_j(x)⟩ / Σ_{l=1}^{m} [⟨v_l(x)⟩]²                                (1.24)


Also, every time a better feasible element is found (or the number of new elements
inserted into the population reaches a certain level), H is redefined and all fitness
values are recomputed. The updating of each penalty coefficient is performed in
such a way that no reduction in its value is allowed. The fitness function value is
then computed using Eqs. (1.22)-(1.24). It is clear from the definition of H in (1.23)
that if no feasible element is present in the population, one is actually minimizing
a measure of the distance of the individuals to the feasible set, since the actual
value of the objective function is not taken into account. However, when a feasible
element is found then it immediately enters the population as, after updating all
fitness values using (1.19), (1.23), and (1.24), it becomes the element with the best
fitness value.
Later, APM variants were introduced with respect to the definition of the penalty
parameter kj (Barbosa and Lemonge 2008). The APM, as originally proposed,
computes the constraint violations in the initial population, and updates all penalty
coefficients, for each constraint, after a given number of offspring is inserted in
the population. A second variant, called sporadic APM with constraint violation
accumulation, accumulates the constraint violations during a given number of insertions of new offspring in the population, updates the penalty coefficients, and keeps
the penalty coefficients for the next generations. The APM with monotonic penalty
coefficients is the third variant, where the penalty coefficients are calculated as in
the original method, but no penalty coefficient is allowed to have its value reduced
along the evolutionary process. Finally, the penalty coefficients are defined by using
a weighted average between the previous value of a coefficient and the new value
predicted by the method. This variant is called the APM with damping. Besides that,
these variants of the APM were extended to the steady-state GA and presented in
Lemonge et al. (2012).
Rocha and Fernandes (2009) proposed alternative expressions for the APM penalty coefficients

k_j = [ (1/pop) Σ_{i=1}^{pop} |f(x^i)| ] × Σ_{i=1}^{pop} v_j(x^i) / Σ_{k=1}^{m} Σ_{i=1}^{pop} v_k(x^i)

and also

k_j = [ (1/pop) Σ_{i=1}^{pop} |f(x^i)| ] × [ exp( Σ_{i=1}^{pop} (v_j(x^i))^l / Σ_{k=1}^{m} Σ_{i=1}^{pop} v_k(x^i) ) − 1 ]

with l ∈ {1, 2}.


Farmani and Wright (2003) introduced a parameterless adaptive technique that uses information about the degree of infeasibility of solutions written as

u(x) = (1/m) Σ_{j=1}^{m} v_j(x) / v_j^max                                 (1.25)


where m is the total number of inequality and equality constraints, and v_j^max
is the maximum value of the jth violation in the current population. The x_worst of the
infeasible solutions is selected by comparing all infeasible individuals against the
best individual xbest . Two potential population distributions exist in relation to this:
(i) if one or more of the infeasible solutions have an objective function value that
is lower than the f (xbest ), the f (xworst ) of the infeasible solutions is taken as the
infeasible solution having the highest infeasibility value and an objective function
value that is lower than the f (xbest ) solution. If more than one individual exists
with the same highest degree of infeasibility, then f (xworst ) is taken as the solution
with maximum infeasibility value and the lower of the objective function values,
and (ii) when all of the infeasible solutions have an objective function value that is
greater than f (xbest ). Thus, f (xworst ) is identified as being the solution with the highest degree of infeasibility value. Having more than one individual in the population
with the same highest infeasibility value, then f (xworst ) is taken as the solution with
the maximum infeasibility value and the higher of the objective function values. The
highest objective function value in the current population to penalize the infeasible
individuals is defined as f_max. The method is applied in two stages, where the first stage considers the case where one or more infeasible solutions have a lower and potentially better objective function (minimization problem) than the x_best solution, i.e., {x | f(x) < f(x_max) and u(x) > 0.0}. A linear relationship between the degree of infeasibility of the x_best and x_worst is considered as
of infeasibility of the xbest and xworst is considered as
ũ(x) = [ u(x) − u(x_worst) ] / [ u(x_best) − u(x_worst) ]                 (1.26)

Thus, the fitness function F_1st(x), in the first stage, is written as

F_1st(x) = f(x) + ũ(x) ( f(x_max) − f(x_worst) )                          (1.27)

The second stage increases the objective function such that the penalized objective function of the worst infeasible individual, F_2nd(x), is equal to that of the worst objective individual (Eqs. (1.28) and (1.29)):

F_2nd(x) = F_1st(x) + γ | F_1st(x) | [ exp(2.0 u(x)) − 1 ] / [ exp(2.0) − 1 ]          (1.28)

and

γ = [ f(x_max) − f(x_best) ] / f(x_best),      if f(x_worst) ≤ f(x_best)
    0,                                         if f(x_worst) = f(x_max)
    [ f(x_max) − f(x_worst) ] / f(x_worst),    if f(x_worst) > f(x_best)          (1.29)

The scaling factor γ is introduced to ensure that the penalized value of the worst infeasible solution is equivalent to the highest objective function value in the current population. γ = 0 (second case in Eq. (1.29)) is used when the worst infeasible


individual has an objective function value equal to the highest in the population. In this
case, no penalty is applied since the infeasible solutions would naturally have a low
fitness and should not be penalized further. The use of absolute values of the fitness function in Eq. (1.29) is considered since minimization of objective functions may have negative values.
A self-organizing adaptive penalty strategy (SOAPS) is presented in Lin and Wu
(2004) featuring the following aspects: (1) The values of penalty parameters are
automatically determined according to the population distribution; (2) The penalty
parameter for each constraint is independently determined; (3) The objective and
constraint functions are automatically normalized; (4) No parameters need to be
defined by the user; (5) Solutions are maintained evenly distributed in both feasible
and infeasible parts of each constraint. The pseudo objective function defined by
the proposed algorithm is given as
F(x) = f(x) + P(x)                                                        (1.30)

where the penalty function P(x) is written as

P(x) = [ (100 + t)/100 ] · [ 1/(p̄ + 2q̄) ] Σ_{j=1}^{m} r_j^t v_j(x)       (1.31)

where t is the generation, r_j^t is the penalty parameter for the jth constraint at generation t, and p̄ and q̄ are the number of inequality and equality constraints, respectively.
The penalty parameter r_j^t for the jth constraint at the tth generation is set as

r_j^t = r_j^{t−1} [ 1 − ( ρ^{t−1}(j) − 0.5 ) / 5 ]                        (1.32)

where ρ^t(j) is the percentage of feasible solutions with respect to the jth constraint at the tth generation. This parameter will be adapted during the evolutionary process and its initial value is set as
r_j^0 = QR^1_obj / QR^1_con_j                                             (1.33)

where QR^1_obj and QR^1_con_j are the interquartile ranges of the objective function and the jth constraint function values, respectively, in the initial population.
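Under the reading of Eq. (1.32) given above, the generation-wise update of the per-constraint penalty parameters could be sketched as follows (illustrative names only, not code from the original paper).

def update_soaps_parameters(r_prev, pop_violations):
    # Update r_j per Eq. (1.32): r_j^t = r_j^(t-1) * (1 - (rho_j - 0.5) / 5).
    # r_prev:         penalty parameters r_j^(t-1), one per constraint
    # pop_violations: matrix v[i][j] of constraint violations in the current population
    n_ind = len(pop_violations)
    r_new = []
    for j, r in enumerate(r_prev):
        # fraction of solutions that satisfy constraint j
        rho_j = sum(1 for v in pop_violations if v[j] == 0.0) / n_ind
        r_new.append(r * (1.0 - (rho_j - 0.5) / 5.0))
    return r_new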
Although the proposed algorithm performed satisfactorily on constrained optimization problems with inequality constraints, it had difficulties in solving problems
with equality constraints. The authors presented in the same paper (Wu and Lin
2004) a modification (with added complexity) of the first version of the algorithm.
They detected that the initial penalty parameter for a constraint may become undesirably large due to the poor initial population distribution. A sensitivity analysis of


the parameter rj0 was done by the authors and they concluded that enlarged penalties
undesirably occur because solutions with these unexpected large constraint violations
are not evenly sampled in the initial population. The value for F(x) in the second
generation of SOAPS is written as

F(x) = f(x),                                            if x ∈ F
       f(x) (1 − r_GEN) + F_BASE r_GEN + P(x),          otherwise          (1.34)

where FBASE means the minimum value of all feasible solutions or, in the absence
of them, the infeasible solutions with the smallest amount of constraint violation.
The value of rGEN is given by the number of function evaluations performed so far
divided by the total number of function evaluations. The expression for P(x) is
P(x) = Σ_j r_j v_j(x)                                                     (1.35)

The modified initial penalty coefficient is rewritten as

r_j^0 = [ med^1_obj,feas_j − med^1_obj,infeas_j ] / med^1_con_j,          if med^1_obj,feas_j ≥ med^1_obj,infeas_j
        0.5 [ med^1_obj,infeas_j − med^1_obj,feas_j ] / med^1_con_j,      otherwise          (1.36)

where med^1_obj,feas_j is the median of the objective function values of the feasible solutions, and med^1_obj,infeas_j is the median of those of all infeasible solutions with respect to the jth constraint, in the initial population. The value med^1_con_j represents the median of all constraint violations of the jth constraint in the initial population. The value of med_obj,feas, used in Eq. (1.36), is written as

med_obj,feas = med_Φ,feas = med_Φ,infeas = med_obj,infeas + r · med_con          (1.37)

where med_Φ,feas is the median of the pseudo-objective function values of feasible designs, and med_Φ,infeas is the median of the pseudo-objective function values of infeasible designs. The latter, med_Φ,infeas, consists of med_obj,infeas, the median of objective function values of all infeasible designs, and med_con, the median of constraint violations of all infeasible designs. The second generation of SOAPS was
constraint violations of all infeasible designs. The second generation of SOAPS was
tested in two numerical illustrative problems and one engineering problem.
Tessema and Yen (2006) proposed an adaptive penalty function for solving constrained optimization problems using a GA. A new fitness value, called distance
value, in the normalized fitness-constraint violation space, and two penalty values
are applied to infeasible individuals so that the algorithm would be able to identify the best infeasible individuals in the current population. The performance of the
algorithm was tested on the G1 to G13 test-problems and the algorithm was considered
able to find competitive results when compared with others from the literature.


In (Tessema and Yen 2009) an algorithm that aims to exploit infeasible individuals
with low objective value and low constraint violation was proposed. The fraction
of feasible individuals in the population is used to guide the search process either
toward finding more feasible individuals or searching for the optimum solution. The
objective function of all individuals in the current population will be evaluated first,
and the smallest and the largest values will be identified as fmin and fmax , respectively.
The fitness function of each individual is normalized as
f̃(x) = [ f(x) − f_min ] / [ f_max − f_min ]                              (1.38)

The normalized constraint violation of each infeasible individual is evaluated by
Eq. (1.25) and the modified fitness function is then written as

F(x) = \begin{cases} \tilde{f}(x), & \text{for a feasible solution} \\ u(x), & \text{if there is no feasible individual in the population} \\ \sqrt{\tilde{f}(x)^2 + u(x)^2} + \left[ (1 - r_f)\, u(x) + r_f\, \tilde{f}(x) \right], & \text{otherwise} \end{cases}

where r_f ∈ [0, 1] is the fraction of feasible individuals in the population, and u(x) is
the average of the normalized violations v_j(x).
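As an illustration, a small Python sketch of this fitness assignment is given below. The function and variable names are ours, not from the original paper, and the constraint violations are assumed to be already normalized per constraint (as in Eq. 1.25), with u(x) taken as their mean.

```python
import numpy as np

def modified_fitness(obj, viol):
    """Sketch of the adaptive penalty of Tessema and Yen (2009).

    obj  : array of objective values, one per individual (minimization).
    viol : array of shape (pop, m) with the violation of each constraint,
           already normalized to [0, 1] per constraint.
    """
    f_min, f_max = obj.min(), obj.max()
    f_norm = (obj - f_min) / (f_max - f_min + 1e-12)   # normalized fitness
    u = viol.mean(axis=1)                              # average normalized violation
    feasible = u == 0.0
    r_f = feasible.mean()                              # fraction of feasible individuals

    F = np.empty_like(f_norm)
    if r_f == 0.0:
        # no feasible individual in the population: rank purely by violation
        F[:] = u
    else:
        F[feasible] = f_norm[feasible]
        d = np.sqrt(f_norm**2 + u**2)                  # "distance" value
        F[~feasible] = d[~feasible] + ((1 - r_f) * u[~feasible]
                                       + r_f * f_norm[~feasible])
    return F
```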
A hybrid evolutionary algorithm and an adaptive constraint-handling technique
are presented by Wang et al. (2009). The hybrid evolutionary algorithm simultaneously uses simplex crossover and two mutation operators to generate the
offspring population. The proposed method operates upon three types of population:
(1) a population that contains only infeasible solutions (infeasible situation), (2) a
population that contains feasible and infeasible solutions (semi-feasible situation),
and (3) a population that contains only feasible solutions (feasible situation).
Denoting G(x) = \sum_{j=1}^{m} G_j(x) as the degree of constraint violation of the individual x, one has
1. Infeasible situation: the constrained optimization problem is treated as a constraint satisfaction problem. Thus, finding feasible solutions is the most important objective in this situation. To achieve this, only the constraint violations G(x) of
the individuals in the population are considered, and the objective function f(x) is disregarded
completely. First, the individuals in the parent population are ranked based on
their constraint violations in ascending order, and then the individuals with the
least constraint violations are selected to form the offspring population.
2. Semi-feasible situation: the population is divided into the feasible group K1 and
the infeasible group K2. After that, the best feasible solution x_best and the worst feasible
solution x_worst are identified from the feasible group K1. Then, the objective
function f(x) of a candidate solution is rewritten as

f'(x_i) = \begin{cases} f(x_i), & \text{if } x_i \in K_1 \\ \max\{\varphi\, f(x_{best}) + (1 - \varphi)\, f(x_{worst}),\; f(x_i)\}, & \text{if } x_i \in K_2 \end{cases}    (1.39)


where φ is the proportion of feasible solutions in the last population P(t).
The normalized objective function is obtained using Eq. (1.38). Also, the
normalized constraints are written as

\tilde{G}(x_i) = \begin{cases} 0, & \text{if } x_i \in K_1 \\[4pt] \dfrac{G(x_i) - \min_{x \in K_2} G(x)}{\max_{x \in K_2} G(x) - \min_{x \in K_2} G(x)}, & \text{if } x_i \in K_2 \end{cases}    (1.40)

If only one infeasible solution appears in the population, its normalized constraint
violation \tilde{G} will always be equal to 0. To avoid this, the normalized constraint
violation \tilde{G} of such an individual is set to a value uniformly chosen between
0 and 1. The fitness function is defined by adding the normalized objective function
values and constraint violations, and is given by

F(x_i) = \tilde{f}(x_i) + \tilde{G}(x_i)    (1.41)

3. Feasible situation: in this case, the comparisons of individuals are based only on
the objective function f (x).
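The three situations can be translated into a short Python sketch as follows. The helper names are ours, the sketch covers only the fitness assignment (not the simplex crossover and mutation operators of the full algorithm), and it follows Eqs. (1.39)-(1.41) as reconstructed above.

```python
import numpy as np

def assign_fitness(obj, G, phi_prev):
    """Sketch of the adaptive constraint handling of Wang et al. (2009).

    obj      : objective values f(x) of the current population (minimization).
    G        : degree of constraint violation G(x) >= 0 of each individual.
    phi_prev : proportion of feasible solutions in the last population P(t).
    Returns an array of fitness values (smaller is better).
    """
    feasible = G == 0.0
    if not feasible.any():
        # infeasible situation: rank by constraint violation only
        return G.copy()
    if feasible.all():
        # feasible situation: rank by the objective function only
        return obj.copy()

    # semi-feasible situation (Eqs. 1.39-1.41)
    f_best, f_worst = obj[feasible].min(), obj[feasible].max()
    f_adj = obj.copy()
    conv = phi_prev * f_best + (1.0 - phi_prev) * f_worst
    f_adj[~feasible] = np.maximum(conv, obj[~feasible])          # Eq. (1.39)

    f_norm = (f_adj - f_adj.min()) / (f_adj.max() - f_adj.min() + 1e-12)

    G_norm = np.zeros_like(G)
    g_inf = G[~feasible]
    G_norm[~feasible] = (g_inf - g_inf.min()) / (g_inf.max() - g_inf.min() + 1e-12)
    if (~feasible).sum() == 1:
        G_norm[~feasible] = np.random.uniform(0.0, 1.0)          # avoid a constant 0

    return f_norm + G_norm                                       # Eq. (1.41)
```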
Costa et al. (2013) proposed an adaptive constraint handling technique where the
fitness function of an infeasible individual is defined as

F(x) = f_{max} + \sum_{j=1}^{m} v_j(x)    (1.42)

and v_j(x) is defined as in Eq. (1.3). An adaptive tolerance was introduced in order
to handle equality constraints. An initial tolerance ε_0 is defined and it is adaptively
updated along the evolutionary process, with a given periodicity of generations, according
to the expression

\varepsilon_{k+1} = \alpha\, \varepsilon_k + (1 - \alpha)\, \| C_{best} \|_2    (1.43)

where α is a smoothing factor, C_best is the vector of equality constraints for the best
point in the population, and ‖·‖₂ is the Euclidean norm.
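A minimal Python sketch of this scheme is given below; the function names are ours, and the smoothing factor alpha = 0.9 is an illustrative value rather than a setting taken from the original paper.

```python
import numpy as np

def infeasible_fitness(f_max_feasible, violations):
    """Eq. (1.42): worst feasible objective plus the summed violations v_j(x)."""
    return f_max_feasible + np.sum(violations)

def update_tolerance(eps, c_best, alpha=0.9):
    """Eq. (1.43): eps_{k+1} = alpha * eps_k + (1 - alpha) * ||C_best||_2,
    where c_best holds the equality-constraint values of the best point."""
    return alpha * eps + (1.0 - alpha) * np.linalg.norm(c_best)
```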
A parameterless adaptive penalty technique used within a GA has been proposed
in Vincenti et al. (2010) and Montemurro et al. (2013), where the basic idea is that some
good infeasible individuals (in the sense of having good objective function values)
can be useful to attract the exploration toward the boundary of the feasible domain,
as the optimum usually has some active constraints. The penalty coefficients c_i and q_j
(for inequality and equality constraints, respectively) are computed at each generation
t as

c_i(t) = \frac{\left| f^{NF}_{best} - f^{F}_{best} \right|}{(g_i)^{NF}_{best}}, \quad i = 1, \ldots, q \qquad \text{and} \qquad q_j(t) = \frac{\left| f^{NF}_{best} - f^{F}_{best} \right|}{\left| h_j \right|^{NF}_{best}}, \quad j = 1, \ldots, p    (1.44)


where the superscripts F and NF stand for feasible and non-feasible, respectively.
f^F_best and f^NF_best are the values of the objective function for the best individuals within
the feasible and the infeasible sides of the domain, respectively, while (g_i)^NF_best and
|h_j|^NF_best represent the violation of the inequality and equality constraints, respectively,
for the best infeasible solution.
Individuals that are infeasible with respect to the kth constraint are grouped and
ranked with respect to their objective function values: the objective function of the
best individual of such a group is f^NF_best, while the individuals that are feasible with
respect to the kth constraint are grouped and ranked with respect to their objective
function values: the objective function of the best individual of this group is f^F_best.
When no feasible individuals are available in the population with respect to the
kth constraint, the population is then sorted into two groups: individuals having
smaller values of the kth constraint violation (10 % of the population) are grouped
as virtually feasible while the rest are grouped as infeasible and ranked in terms of
their objective function values: the objective function of the best individual of such
a group is f^NF_best.
It is worth noting that the definition in Eq. (1.44) forces the value of the objective
function of the best infeasible individual to be equal to that of the best feasible
individual. In the next section, further (perhaps less popular) ways of implementing
penalty techniques are briefly described.

1.5 Related Techniques


1.5.1 Self-adapting the Parameters
The direct implementation of a standard self-adaptive penalty technique (following
Eiben and Smith (2003)) would entail the encoding of one (or more) penalty
coefficients in the same chromosome where the candidate solution is encoded. They
are then subject to the evolutionary process, undergoing recombination and mutation
just as the problem variables in the chromosome. However, evolution would discover that the best strategy is to drive down all penalty coefficients of an individual
to zero (thus eliminating any reduction in the fitness of the corresponding candidate
solution) and actually finding the solution of the unconstrained problem (Eiben and
Smith 2003).
Eiben et al. (2000) proposed a scheme to prevent EAs from cheating when
solving constraint satisfaction problems (CSPs). When solving CSPs by means of
EAs, weights are associated with each constraint to add a penalty to the individual
if that constraint is not satisfied. Changes in the weights along the run will cause
the EA to put more pressure into the satisfaction of the corresponding constraint.
Eiben et al. introduced a tournament selection that uses the maximum of each of
the weights, across all competitors, as a way to eliminate cheating in the CSP case,
without resorting to any feedback mechanism from the search process. Unfortunately,


to the best of our knowledge, no strict self-adaptive technique has been applied so
far to constrained optimization problems in Rn .

1.5.2 Coevolving the Parameters


Coello (2000) introduced a co-evolutionary algorithm to adapt the penalty coefficients of a fitness function in a GA with two populations P1 (size M1 ) and P2
(size M2 ). The fitness function is written as
F(x) = f(x) - \left( sum\_viol(x) \cdot w_1 + num\_viol(x) \cdot w_2 \right)    (1.45)

where w_1 and w_2 are two (integer) penalty coefficients, and sum_viol(x) and
num_viol(x) are, respectively, the sum of the violations and the number of constraints
which are violated by the candidate solution x. The first population, P1, evolves the
candidate solutions, whereas the second population, P2, encodes the set of weight
combinations (w_1 and w_2), that is, the penalty coefficients that will be used in the
fitness function evaluation of the individuals in P1. Benchmark problems
from the literature, especially from mechanical engineering optimization, are used in the
numerical tests, but only inequality constraints were considered in the experiments.
The co-evolutionary idea was also analyzed in He and Wang (2007) and He et al.
(2008). In these works, the penalty factors are adapted by a co-evolutionary particle
swarm optimization approach (CPSO). Two kinds of swarms are used in He and
Wang (2007) and He et al. (2008): one population of multiple swarms is used to
solve the search problem and the other one is responsible for adapting the penalty factors.
Each particle j in the second population represents the penalty coefficients for a set
of particles in the first one. The two populations evolve by a given G1 and G2 number
of generations. The adopted fitness function is the one proposed by Richardson et al.
(1989), where not only the amount of violation contributes to the quality of a given
candidate solution but also the number of violated constraints. According to He
and Wang (2007) and He et al. (2008),

F_j(x) = f(x) + sum\_viol(x) \cdot w_{j,1} + num\_viol(x) \cdot w_{j,2},

where f(x) is the objective function value, and w_{j,1} and w_{j,2} are the penalty coefficients from particle j in the second swarm population. The penalty factors w_{j,1}
and w_{j,2} are evolved according to the following fitness:

sum_feas
num_feas
num_feas, if there is at least one feasible solution in the subset

pop
i

pop
G(j) =
i=1 sum_viol(x )
max(Gvalid ) +
pop
i=1 num_viol(x i ), otherwise,
num_viol(x i )
i=1

where sum_feas denotes the sum of the objective function values of the feasible solutions,
num_feas is the number of feasible individuals, and max(G_valid) denotes the maximum
G over all valid particles; the valid particles are those which operate over a subset
of particles where there is at least one feasible solution.
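The fitness used to evolve the penalty coefficients may be easier to follow in code. The Python sketch below uses our own names and mirrors the expression for G(j) as reconstructed above; each coefficient particle j is assumed to be evaluated on the subset of solution particles it serves.

```python
import numpy as np

def coefficient_fitness(obj, sum_viol, num_viol, g_valid_max):
    """Sketch of G(j) for one penalty-coefficient particle (He and Wang 2007).

    obj        : objective values of the solution particles in the subset.
    sum_viol   : total constraint violation of each solution particle.
    num_viol   : number of violated constraints of each solution particle.
    g_valid_max: maximum G over all valid coefficient particles (those whose
                 subset contains at least one feasible solution).
    Smaller G(j) means a better set of penalty coefficients.
    """
    feasible = num_viol == 0
    if feasible.any():
        # reward feasibility: average feasible objective minus the number of
        # feasible solutions in the subset
        return obj[feasible].sum() / feasible.sum() - feasible.sum()
    # no feasible solution in the subset: penalize on top of the worst valid G
    return (g_valid_max
            + sum_viol.sum() / num_viol.sum()
            + num_viol.sum())
```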

1.5.3 Using Other Tools


It is interesting to note that, despite all the effort that has been devoted to the research
of penalty techniques in the context of nature inspired metaheuristics in the last 20
years or so, the subject still draws the attention of the researchers, and new tools are
being constantly introduced to this arena. Fuzzy logic and rough set theory are just
two recent examples that will be mentioned in the following.
Wu et al. (2001) proposed a fuzzy penalty function strategy using information
contained in the individuals. The fitness function of an infeasible individual is

F(x) = f(x) + r\, G(x)    (1.46)

where G(x) is the amount of constraint violation from the inequality and equality constraints, and r is the penalty coefficient.
f and G are taken as fuzzy variables with corresponding linguistic values such
as very large, large, small, very small, etc. The ranges of f and G are defined by
D_f = [f_min, f_max] and D_G = [G_min, G_max]. Those ranges must then be partitioned
(a problem-dependent, non-trivial task) and linguistic values are associated
with each part. The sets A and B are introduced as fuzzy sets for f and G, respectively,
and r^k, k = 1, ..., l, is defined as a fuzzy singleton for r, which is inferred from
appropriate membership functions and finally used in (1.46).
In their numerical experiments, three partitions were used for both f and G, with
triangular membership functions, and five points were used for the output. The rule
base contained 9 rules of the form
If f is A_i and G is B_j then r = r^k.
Lin (2013) proposed perhaps the first constraint-handling approach which applies
the information granulation of rough set theory to address the indiscernibility relation
among penalty coefficients in constrained optimization. Adaptive penalty coefficients
w_k^t, k = 1, ..., m, were defined for each constraint in such a way that a high penalty is
assigned to the coefficient of the most difficult constraint. In addition, the coefficients
also depend on the current generation number t. Using the standard definition
of the violation of the kth constraint, v_k(x), the fitness function reads

F(x) = f(x) + \sum_{k=1}^{m} w_k^t\, v_k^2(x)


where w_k^t = (C \cdot t)^{\gamma(k,t)} and C is a severity factor. The exponent γ(k, t), initialized
as γ(k, 0) = 2 for all k, is defined as

\gamma(k, t) = \begin{cases} \gamma(k, t-1) \cdot \lambda_k, & \text{if the } k\text{th constraint is discernible} \\ \gamma(k, t-1), & \text{otherwise} \end{cases}

according to the discernible mask and the representative attribute value λ_k of the
superior class X_good (see the paper for details). If the kth constraint is discernible,
the exponent γ(k, t) is adjusted by the representative attribute value λ_k;
otherwise, the exponent retains the same value as in the previous generation.

1.6 Discussion
1.6.1 User-Defined Parameters
Some of the proposals considered do not require from the user the definition of penalty
parameters, and can as such be considered parameterless. This is very useful for
the practitioner. However, it should be noted that essentially all proposals do embody
some fixed values that are hidden from the user and, as a result, cannot be changed.
Furthermore, all proposals involve design decisions which were made (with varying levels of justification) and incorporated into the definition of the technique. It
seems natural to assume that some of those could possibly be changed (a research
opportunity), leading to improved results.

1.6.2 Comparative Performance


In order to test the performance of a constraint handling technique, several test-problems have been used over the years. The most popular suite of continuous
constrained optimization problems is that containing the 24 problems used for the
competition held during the 2006 IEEE Congress on Evolutionary Computation
which are described in Liang et al. (2006). Later, larger problems were considered
in another competition, held during the 2010 edition of the same conference. The
details can be found in Mallipeddi and Suganthan (2010).
It can be noticed that the claims concerning the performance of each proposal in
the papers reviewed have been deliberately omitted. This is due to several factors. One
of them is that a statistical study establishing a statistically significant
superiority of the proposed technique over others from the literature is often missing.
Another criticism is that often the claimed superiority of the proposed technique can
only be observed after the fourth or fifth significant digit of the final results, with
no consideration for the facts (i) that the original model itself may not have such
accuracy, and (ii) that the compared solutions may be indistinguishable from the
practical point of view.


Another major issue that makes it impossible to rigorously assess the relative
performance of the adaptive penalty techniques (APTs) reviewed is that the final
results depend not only on the penalty technique considered but also on the search
engine (SE) adopted. The competing results often derive from incomparable arrangements such as APT-1 embedded in SE-1 (a genetic algorithm, for instance) versus
APT-2 applied to SE-2 (an evolution strategy, for instance). The results using stochastic ranking (SR) within an evolution strategy (ES) (Runarsson and Yao 2000) were
shown to outperform APM embedded in a binary-coded genetic algorithm (GA)
(Lemonge and Barbosa 2004) when applied to a standard set of benchmark constrained optimization problems in R^n. This seems to be due, at least in part, to the
fact that the ES adopted performs better in this continuous domain than a standard GA.
A proper empirical assessment of the constraint handling techniques considered
(SR versus APM) should be performed by considering settings such as (SR+GA versus APM+GA) and (SR+ES versus APM+ES). An attempt to clarify this particular
question is presented by Barbosa et al. (2010b). It is clear that there is a need for
more studies of this type in order to better assess the relative merits of the proposals
reviewed here.
The standard way of assessing the relative performance of a set A of n_a
algorithms a_i, i ∈ {1, ..., n_a}, is to define a set P of n_p representative problems p_j,
j ∈ {1, ..., n_p}, and then test all algorithms against all problems, measuring the
performance t_{p,a} of algorithm a ∈ A when applied to problem p ∈ P.
In order to evaluate tp,a one can alternatively (i) define a meaningful goal
(say, level of objective function value) and then measure the amount of resources
(say, number of function evaluations) required by the algorithm to achieve that goal,
or (ii) fix a given amount of resources to be allocated to each algorithm and then
measure the goal attainment.
Considering that t_{p,a} is the CPU time spent by algorithm a to reach the stated goal
in problem p, a performance ratio can be defined as

r_{p,a} = \frac{t_{p,a}}{\min\{ t_{p,a} : a \in A \}}.    (1.47)

Although each t_{p,a} or r_{p,a} is worth considering by itself, one would like to be able
to assess the performance of the algorithms in A on a large set of problems P in a
user-friendly graphical form. This has been achieved by Dolan and Moré (2002), who
introduced the so-called performance profiles, an analytical tool for the visualization
and interpretation of the results of benchmark experiments. For more details and an
application in the constrained optimization case, see Barbosa et al. (2010a).
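A compact Python sketch of the performance ratio of Eq. (1.47) and of the resulting performance profile is shown below. The data layout is hypothetical: a matrix t of costs with one row per problem and one column per algorithm, where failures can be encoded as infinity.

```python
import numpy as np

def performance_profile(t, taus):
    """Compute performance profiles in the sense of Dolan and More (2002).

    t    : array of shape (n_problems, n_algorithms) with t[p, a] the cost
           (e.g., CPU time or function evaluations) of algorithm a on problem p.
    taus : increasing array of ratio thresholds (tau >= 1).
    Returns rho of shape (len(taus), n_algorithms), where rho[k, a] is the
    fraction of problems on which algorithm a is within a factor taus[k] of
    the best algorithm.
    """
    r = t / t.min(axis=1, keepdims=True)            # Eq. (1.47), per problem
    rho = np.array([(r <= tau).mean(axis=0) for tau in taus])
    return rho

# toy usage: 3 problems, 2 algorithms
times = np.array([[1.0, 2.0],
                  [3.0, 1.5],
                  [2.0, 2.0]])
print(performance_profile(times, taus=np.array([1.0, 1.5, 2.0])))
```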
One has also to consider that it is not an easy task to define a set P which is
representative of the domain of interest, as one would like P (i) to span the target
problem-space and, at the same time, (ii) to be as small as possible, in order to
alleviate the computational burden of the experiments. Furthermore, it would also
be interesting to assess the relative performance of the test-problems themselves
with respect to the solvers. Are all test-problems relevant to the final result? Are
some test-problems too easy (or too difficult) so that they do not have the ability to


discriminate the solvers? Efforts in this direction, exploring the performance profile
concept, were attempted in Barbosa et al. (2013).

1.6.3 Implementation Issues


Although not always considered in the papers reviewed, the simplicity of the technique (both conceptually and from the implementation point of view) is relevant. It
seems quite desirable that the proposed technique could be easily implemented as
an additional module to any existing metaheuristic for unconstrained optimization
with a minimum interference with the current code. In this respect, techniques resorting to coevolution would typically require another population, an additional set of
parameters, and would lead to more interference and modifications to the original
code.

1.6.4 Extensions
It seems natural to expect that most of, if not all, the proposals reviewed here can
be easily extended to the practically important case of constrained multi-objective
optimization. Although papers presenting such extension have not been reviewed
here, it seems that there is room, and indeed a need, to explore this case.
The same can perhaps be said of the relevant case of mixed (discrete and
continuous) decision variables, as well as the more complex problem of constrained
multi-level optimization.

1.7 Conclusion
This chapter presented a review of the main adaptive penalty techniques available
for handling constraints within nature inspired metaheuristics in general and evolutionary techniques in particular. The main types of evidence taken from the search
process in order to inform the decision-making process of continuously adapting the
relevant parameters of the penalty technique have been identified.
As the different adaptive techniques have not been implemented on a single
given search engine, the existing comparative studies, which are usually based on
the final performance on a set of benchmark problems, are not very informative of the
relative performance of each penalty technique, as the results are also affected by the
different search engines adopted in each proposal. The need for better comparative
studies investigating the relative performance of the different adaptive techniques
when applied within a single search engine in larger and more representative sets of
benchmark problems are also identified.


Acknowledgments The authors thank the reviewers for their comments, which helped improve
the quality of the final version, and acknowledge the support from CNPq (grants 308317/2009-2,
310778/2013-1, 300192/2012-6 and 306815/2011-7) and FAPEMIG (grant TEC 528/11).

References
Barbosa HJC, Lemonge ACC (2002) An adaptive penalty scheme in genetic algorithms for constrained optimization problems. In: Langdon WB, Cantú-Paz E, Mathias KE, Roy R, Davis D, Poli R, Balakrishnan K, Honavar V, Rudolph G, Wegener J, Bull L, Potter MA, Schultz AC, Miller JF, Burke EK (eds) Proceedings of the genetic and evolutionary computation conference (GECCO). Morgan Kaufmann, San Francisco
Barbosa HJC, Lemonge ACC (2003a) An adaptive penalty scheme for steady-state genetic algorithms. In: Cantú-Paz E, Foster JA, Deb K, Davis LD, Roy R, O'Reilly U-M, Beyer H-G, Standish R, Kendall G, Wilson S, Harman M, Wegener J, Dasgupta D, Potter MA, Schultz AC, Dowsland KA, Jonoska N, Miller J (eds) Genetic and evolutionary computation (GECCO). Lecture Notes in Computer Science. Springer, Berlin, pp 718–729
Barbosa HJC, Lemonge ACC (2003b) A new adaptive penalty scheme for genetic algorithms. Inf Sci 156:215–251
Barbosa HJC, Lemonge ACC (2008) An adaptive penalty method for genetic algorithms in constrained optimization problems. Front Evol Robot 34:9–34
Barbosa HJC, Bernardino HS, Barreto AMS (2010a) Using performance profiles to analyze the results of the 2006 CEC constrained optimization competition. In: 2010 IEEE congress on evolutionary computation (CEC), pp 1–8
Barbosa HJC, Lemonge ACC, Fonseca LG, Bernardino HS (2010b) Comparing two constraint handling techniques in a binary-coded genetic algorithm for optimization problems. In: Deb K, Bhattacharya A, Chakraborti N, Chakroborty P, Das S, Dutta J, Gupta SK, Jain A, Aggarwal V, Branke J, Louis SJ, Tan KC (eds) Simulated evolution and learning. Lecture Notes in Computer Science. Springer, Berlin, pp 125–134
Barbosa HJC, Bernardino HS, Barreto AMS (2013) Using performance profiles for the analysis and design of benchmark experiments. In: Di Gaspero L, Schaerf A, Stutzle T (eds) Advances in metaheuristics. Operations Research/Computer Science Interfaces Series, vol 53. Springer, New York, pp 21–36
Bean J, Alouane A (1992) A dual genetic algorithm for bounded integer programs. Technical Report TR 92-53, Department of Industrial and Operations Engineering, The University of Michigan
Beaser E, Schwartz JK, Bell CB, Solomon EI (2011) Hybrid genetic algorithm with an adaptive penalty function for fitting multimodal experimental data: application to exchange-coupled non-Kramers binuclear iron active sites. J Chem Inf Model 51(9):2164–2173
Coello CAC (2000) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113–127
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Coit DW, Smith AE, Tate DM (1996) Adaptive penalty methods for genetic optimization of constrained combinatorial problems. INFORMS J Comput 8(2):173–182
Costa L, Santo IE, Oliveira P (2013) An adaptive constraint handling technique for evolutionary algorithms. Optimization 62(2):241–253
Courant R (1943) Variational methods for the solution of problems of equilibrium and vibrations. Bull Am Math Soc 49:1–23
Dolan E, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213
Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, New York
Eiben AE, Jansen B, Michalewicz Z, Paechter B (2000) Solving CSPs using self-adaptive constraint weights: how to prevent EAs from cheating. In: Whitley LD (ed) Proceedings of the genetic and evolutionary computation conference (GECCO). Morgan Kaufmann, San Francisco, pp 128–134
Farmani R, Wright J (2003) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445–455
Gan M, Peng H, Peng X, Chen X, Inoussa G (2010) An adaptive decision maker for constrained evolutionary optimization. Appl Math Comput 215(12):4172–4184
Gen M, Cheng R (1996) Optimal design of system reliability using interval programming and genetic algorithms. Comput Ind Eng 31(1–2):237–240 (Proceedings of the 19th international conference on computers and industrial engineering)
Hamida H, Schoenauer M (2000) Adaptive techniques for evolutionary topological optimum design. In: Parmee I (ed) Proceedings of the international conference on adaptive computing in design and manufacture (ACDM). Springer, Devon, pp 123–136
Hamida S, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint handling. In: Proceedings of the IEEE congress on evolutionary computation (CEC), vol 1. IEEE Service Center, Piscataway, New Jersey, pp 884–889
Harrell LJ, Ranjithan SR (1999) Evaluation of alternative penalty function implementations in a watershed management design problem. In: Proceedings of the genetic and evolutionary computation conference (GECCO), vol 2. Morgan Kaufmann, pp 1551–1558
He Q, Wang L (2007) An effective co-evolutionary particle swarm optimization for constrained engineering design problems. Eng Appl Artif Intell 20(1):89–99
He Q, Wang L, Huang F-z (2008) Nonlinear constrained optimization by enhanced co-evolutionary PSO. In: IEEE congress on evolutionary computation, CEC 2008 (IEEE World Congress on Computational Intelligence), pp 83–89
Hughes T (1987) The finite element method: linear static and dynamic finite element analysis. Prentice Hall Inc, New Jersey
Koziel S, Michalewicz Z (1998) A decoder-based evolutionary algorithm for constrained parameter optimization problems. In: Eiben A, Bäck T, Schoenauer M, Schwefel H-P (eds) Parallel problem solving from nature (PPSN). LNCS, vol 1498. Springer, Berlin, pp 231–240
Krempser E, Bernardino H, Barbosa H, Lemonge A (2012) Differential evolution assisted by surrogate models for structural optimization problems. In: Proceedings of the international conference on computational structures technology (CST). Civil-Comp Press, p 49
Lemonge ACC, Barbosa HJC (2004) An adaptive penalty scheme for genetic algorithms in structural optimization. Int J Numer Methods Eng 59(5):703–736
Lemonge ACC, Barbosa HJC, Bernardino HS (2012) A family of adaptive penalty schemes for steady-state genetic algorithms. In: 2012 IEEE congress on evolutionary computation (CEC). IEEE, pp 1–8
Liang J, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan P, Coello CC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Lin C-H (2013) A rough penalty genetic algorithm for constrained optimization. Inf Sci 241:119–137
Lin C-Y, Wu W-H (2004) Self-organizing adaptive penalty strategy in constrained genetic search. Struct Multidiscip Optim 26(6):417–428
Luenberger DG, Ye Y (2008) Linear and nonlinear programming. Springer, New York
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Michalewicz Z (1995) A survey of constraint handling techniques in evolutionary computation methods. In: Proceedings of the 4th annual conference on evolutionary programming. MIT Press, pp 135–155
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Montemurro M, Vincenti A, Vannucci P (2013) The automatic dynamic penalisation method (ADP) for handling constraints with genetic algorithms. Comput Methods Appl Mech Eng 256:70–87
Nanakorn P, Meesomklin K (2001) An adaptive penalty function in genetic algorithms for structural design optimization. Comput Struct 79(29–30):2527–2539
Puzzi S, Carpinteri A (2008) A double-multiplicative dynamic penalty approach for constrained evolutionary optimization. Struct Multidiscip Optim 35(5):431–445
Rasheed K (1998) An adaptive penalty approach for constrained genetic-algorithm optimization. In: Koza J, Banzhaf W, Chellapilla K, Deb K, Dorigo M, Fogel D, Garzon M, Goldberg D, Iba H, Riolo R (eds) Proceedings of the third annual genetic programming conference. Morgan Kaufmann, San Francisco, pp 584–590
Richardson JT, Palmer MR, Liepins GE, Hilliard M (1989) Some guidelines for genetic algorithms with penalty functions. In: Proceedings of the international conference on genetic algorithms. Morgan Kaufmann, San Francisco, pp 191–197
Rocha AMAC, Fernandes EMDGP (2009) Self-adaptive penalties in the electromagnetism-like algorithm for constrained global optimization problems. In: Proceedings of the 8th world congress on structural and multidisciplinary optimization, Lisbon, Portugal
Runarsson T, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Salcedo-Sanz S (2009) A survey of repair methods used as constraint handling techniques in evolutionary algorithms. Comput Sci Rev 3(3):175–192
Schoenauer M, Michalewicz Z (1996) Evolutionary computation at the edge of feasibility. In: Proceedings of parallel problem solving from nature (PPSN). LNCS, Springer, pp 245–254
Tessema B, Yen GG (2006) A self adaptive penalty function based algorithm for constrained optimization. In: IEEE congress on evolutionary computation, CEC 2006. IEEE, pp 246–253
Tessema B, Yen G (2009) An adaptive penalty formulation for constrained evolutionary optimization. IEEE Trans Syst Man Cybern Part A: Syst Hum 39(3):565–578
Vincenti A, Ahmadian MR, Vannucci P (2010) BIANCA: a genetic algorithm to solve hard combinatorial optimisation problems in engineering. J Glob Optim 48(3):399–421
Wang Y, Cai Z, Zhou Y, Fan Z (2009) Constrained optimization based on hybrid evolutionary algorithm and adaptive constraint-handling technique. Struct Multidiscip Optim 37(4):395–413
Wu B, Yu X, Liu L (2001) Fuzzy penalty function approach for constrained function optimization with evolutionary algorithms. In: Proceedings of the 8th international conference on neural information processing. Citeseer, pp 299–304
Wu W-H, Lin C-Y (2004) The second generation of self-organizing adaptive penalty strategy for constrained genetic search. Adv Eng Softw 35(12):815–825
Yokota T, Gen M, Ida K, Taguchi T (1995) Optimal design of system reliability by an improved genetic algorithm. Trans Inst Electron Inf Comput Eng J78-A(6):702–709 (in Japanese)

Chapter 2

Ruggedness Quantifying for Constrained Continuous Fitness Landscapes

Shayan Poursoltan and Frank Neumann

Abstract Constrained optimization problems appear frequently in important
real-world applications. In this chapter, we study algorithms for constrained
optimization problems from a theoretical perspective. Our goal is to understand
how the fitness landscape influences the success of certain types of algorithms.
One important feature for analyzing and classifying fitness landscapes is ruggedness. It is generally assumed that rugged landscapes make the optimization process of
bio-inspired computing methods much harder than smooth landscapes, which give
clear hints toward an optimal solution. We introduce different methods for quantifying the ruggedness of a given constrained optimization problem. They, in particular,
take into account how to deal with infeasible regions in the underlying search space.

Keywords Constrained optimization · Continuous optimization · Fitness landscapes · Ruggedness

2.1 Introduction
Constrained optimization problems (COPs), especially nonlinear ones, are important
and widespread in many real-world applications such as chemical engineering, VLSI
chip design, and structural design (Floudas and Pardalos 1990). Various algorithmic
approaches have been introduced to tackle constrained optimization problems. The
major component of these optimization algorithms is devoted to the handling of the
involved constraints.

S. Poursoltan (B) F. Neumann


Optimisation and Logistics, School of Computer Science, University of Adelaide,
Adelaide, SA 5005, Australia
e-mail: shayan.poursoltan@adelaide.edu.au
F. Neumann
e-mail: frank.neumann@adelaide.edu.au

Different types of evolutionary algorithms such as evolutionary strategies


(Schwefel 1993), differential evolution (Storn and Price 1997), and particle swarm
optimization (PSO) (Eberhart and Kennedy 1995) have been applied to constrained
continuous optimization problems. Constraint handling mechanisms that are
frequently used include penalty functions, decoder-based methods, and special operators that separate the treatment of the objective function and the constraints. We
refer the reader to Mezura-Montes and Coello Coello (2011) for an overview of the
different types of methods. Among the various types of optimization algorithms,
penalty methods are well known as one of the most successful and popular approaches
for dealing with constraints. They penalize the violation of constraints by adding
penalty values to the fitness value of a given solution. Effectively, this transforms the
constrained problem into an unconstrained one. Turning constrained optimization
problems into unconstrained ones by using penalty functions makes the problem
easily accessible to a wide range of methods for unconstrained optimization and can
be regarded as one of the major reasons for the popularity of penalty functions.
There is a wide range of optimization algorithms for constrained continuous optimization problems, and their performance is usually evaluated based on the results
of popular benchmark problems (Liang et al. 2006; Mallipeddi and Suganthan 2010).
These benchmark problems are designed to impose different types of difficulties for
optimization algorithms. As evolutionary algorithms make heavy use of random decisions, it is hard to understand the behavior of these algorithms from an analytical
perspective. More importantly, it is hard to predict which algorithm would perform the
best for a newly given real-world optimization problem. Mersmann et al. (2011) have
proposed the following steps to select the best possible algorithm from a given suite
of algorithms. First, one has to extract important problem properties from the class of
problems under investigation. Secondly, it is necessary to analyze the performance
of different algorithms based on the problem properties and build a prediction model
that allows one to select the best possible algorithm based on problem characteristics.
There are various problem properties associated with the fitness landscape. In
other words, analyzing the fitness landscape helps us to classify them with related
characteristics that make problems easy or hard to solve by certain types of algorithms. In recent years, fitness landscape analysis has become very popular to describe
the characteristics of optimization problems. Important attributes that are associated
with fitness landscapes and that impact the optimization process of evolutionary
algorithms include the smoothness, multi-modality, feasibility rate, and variable
separability of the landscape and the considered problem (Naudts and Kallel 2000).
Among several characteristics associated with fitness landscapes, the notion of
fitness landscape ruggedness plays a vital role in determining the problem difficulty.
If the objective function is unsteady and goes up and down frequently, choosing
the right direction to continue becomes difficult for many solvers. Since ruggedness
and problem difficulty are closely related to each other, many studies have been
conducted to analyze this feature. For discrete landscapes, one important approach
is to consider autocorrelations by calculating the correlation of fitness values of


search points that are visited by a random walk on the landscape (Weinberger 1990).
Furthermore, there have been many studies that extend the basic autocorrelation
approach to provide additional insights into fitness landscapes (Box et al. 2013;
Hordijk 1996). One of the drawbacks of using autocorrelation by these statistical
analysis techniques is that the calculated value is a vague notion that does not clearly
reflect the landscape ruggedness. Thus, Vassilev proposed a new technique based on
the assumption that each landscape is an ensemble of different objects (the nodes
seen by a random walk on the fitness landscape), which can be grouped by their form,
size, and distribution (Vassilev et al. 2000). Vassilev's approach was applicable to
discrete problems. For real parameter landscapes, Malan and Engelbrecht (2009) used
Vassilev's information-theoretic analysis to measure the fitness landscape ruggedness
in the continuous domain. So far, these landscape analysis techniques have been
conducted only for unconstrained or discrete problems. Measuring the landscape
ruggedness for constrained continuous problems imposes additional challenges and
we will propose how to tackle them in this chapter.
We propose an approach to measure the fitness landscape ruggedness of constrained continuous optimization problems. The quantification of ruggedness
combined with other analytical problem characteristics can help to build an algorithm
selection model based on the relation of different algorithms and problem properties.
This chapter includes a methodology for quantifying fitness landscape ruggedness
of constrained continuous problems. In order to do this, we extend Malan's approach
to quantify the fitness landscape ruggedness of constrained continuous problems.
The information obtained by using simple random walks on a constrained problem's
landscape is not useful enough since it is mostly related to infeasible areas that are
unlikely to be seen by the solver. To cope with constraints in nearly infeasible problems, our approach replaces Malan's random walk with a biased one. The obtained
samples are used to quantify the ruggedness of landscapes using the approach of
Vassilev et al. (2000). We evaluate our approach on well-known benchmarks taken
from the recent CEC competitions (Mallipeddi and Suganthan 2010) and discuss
the benefits and drawbacks of our new approach.
The remainder of this chapter is organized as follows: In Sect. 2.2, we introduce
constrained continuous optimization and discuss approaches that have been used to
analyze the ruggedness of unconstrained fitness landscapes. We present our approach
for quantifying ruggedness of constrained continuous fitness landscapes in Sect. 2.3
and the results of our experimental investigations in Sect. 2.4. Finally, we end our
research with some concluding remarks.

2.2 Preliminaries
In this section, we introduce basic notations and summarize the previous works on
measuring the ruggedness of fitness landscapes.


2.2.1 Constrained Continuous Optimization Problem


Constrained continuous optimization problems are optimization problems where a
function on real-valued variables should be optimized with respect to a given set of
constraints. Constraints are usually given by a set of inequalities and/or equalities.
Without loss of generality, we present our approach for minimization problems.
Formally, we consider single-objective functions f : S → R, with S ⊆ R^n. The
constraints impose a feasible subset F ⊆ S of the search space S and the goal is to
find an element x ∈ S ∩ F that minimizes f.
We consider problems of the following form:

Minimize f(x), x = (x_1, ..., x_n) ∈ R^n    (2.1)

such that x ∈ S ∩ F.
The feasible region F ⊆ S of the search space S is defined as

l_i ≤ x_i ≤ u_i, 1 ≤ i ≤ n    (2.2)

where l_i and u_i are lower and upper bounds on the variable x_i, 1 ≤ i ≤ n. Additional
constraints are given by the functions

g_i(x) ≤ 0, 1 ≤ i ≤ q,
h_i(x) = 0, q + 1 ≤ i ≤ p.

In order to work with iterative optimization algorithms for these problems, it is
common to relax the equality constraints

h_i(x) = 0, q + 1 ≤ i ≤ p,

to

|h_i(x)| ≤ ε, q + 1 ≤ i ≤ p    (2.3)

where ε is a very small positive value that determines how much the original
constraints can be violated. In our experimental study, we work with ε = 0.0001,
which is the same setting as used in Mallipeddi and Suganthan (2010).
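To make this formulation operational, feasibility is typically checked through an aggregated constraint violation, with the equality constraints relaxed by the tolerance ε of Eq. (2.3). The following minimal Python sketch (our own function names) illustrates this.

```python
import numpy as np

def constraint_violation(x, ineq, eq, eps=1e-4):
    """Total violation of a candidate x for the COP of Eqs. (2.1)-(2.3).

    ineq : list of functions g_i with g_i(x) <= 0 required for feasibility.
    eq   : list of functions h_i with |h_i(x)| <= eps required for feasibility.
    Returns 0.0 for feasible points and a positive number otherwise.
    """
    v = sum(max(0.0, g(x)) for g in ineq)
    v += sum(max(0.0, abs(h(x)) - eps) for h in eq)
    return v

# toy usage: the unit disc with the additional equality constraint x_1 + x_2 = 1
g = [lambda x: x[0]**2 + x[1]**2 - 1.0]
h = [lambda x: x[0] + x[1] - 1.0]
print(constraint_violation(np.array([0.5, 0.5]), g, h))   # 0.0, i.e., feasible
```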

2.2.2 Fitness Landscape Ruggedness Analysis Using the Entropy Measure

A fitness landscape (see Stadler et al. 1995) is given by a search space S, a fitness
function f : S → R which assigns a value f(s) to each search point s ∈ S, and a
neighborhood function N : S → 2^S that assigns to each search point s a set N(s) ⊆ S of search
points. The elements in N(s) are called the neighbors of s.
Various techniques have been used for the statistical analysis of fitness landscapes.
Popular techniques measure the correlation of the search points visited by a random
walk algorithm (Lipsitch 1991; Manderick et al. 1991; Weinberger 1990). However,
it has been shown that this information is very basic and not very useful to reflect
problem difficulty (Mattfeld et al. 1999). Vassilev et al. (2000) proposed an information-theoretic approach to quantify fitness landscape ruggedness. The difference
between Vassilev's and the previous approaches is that his technique focuses on
the relation between ruggedness and neutrality of the problem landscape. Vassilev's
method performs a random walk on a fitness landscape to generate a sequence of
fitness values {f_t}_{t=0}^{n}. This random walk starts from a random position on a discrete
landscape and moves to its neighbor using bit flips. The aim of this method is to
extract an ensemble of objects from a sequence of fitness values. These objects can
be classified into three categories:
Flat objects: The fitness value of each point is similar to its two visited neighbors
(predecessor and successor).
Isolated objects: Each point has higher or lower fitness value compared to its two
neighbors.
Points that do not belong to the former two groups.
The aim of the approach is to extract the ensemble of objects mentioned above from
the values in a sequence of fitness values. The following function represents the time
series as a set of objects. The ensemble is defined as a string S(ε) = (s_1 s_2 s_3 ... s_n)
with s_i ∈ {−1, 0, 1}, given by

s_i = \Psi_{f_t}(i, \varepsilon) = \begin{cases} -1, & \text{if } f_i - f_{i-1} < -\varepsilon \\ 0, & \text{if } |f_i - f_{i-1}| \leq \varepsilon \\ 1, & \text{if } f_i - f_{i-1} > \varepsilon \end{cases}    (2.4)

where the parameter ε is a real positive number that represents the accuracy of
the calculation of the string S(ε). According to the function, if ε = 0 then the
function will be sensitive to the differences in adjacent points. It can be observed
that increasing the value of ε reduces the sensitivity of the function. Therefore, if the
value of ε equals the difference of the highest and lowest points in the walk, then the
fitness sequence will only consist of zeros.
To measure the ruggedness, the entropy of the string S(ε) is calculated as follows:

H(S(\varepsilon)) = - \sum_{p \neq q} P_{[pq]} \log_6 P_{[pq]}    (2.5)

where pq is a substring of the string S(ε) consisting of two elements. Furthermore,
H(S(ε)) is the information content, which is an estimation of the variety of different shapes within the string S(ε). This measurement is used to characterize the


landscape ruggedness with respect to the flat areas where neutrality is present. P_[pq]
refers to the frequency of the blocks where p and q have different values (p ≠ q):

P_{[pq]} = \frac{n_{[pq]}}{n}    (2.6)

In other words, in order to measure ruggedness with respect to neutrality, it is
necessary to include only the rugged blocks in this estimation (p ≠ q). Thus, sub-blocks
with two similar elements are excluded from this function (case p = q). The formula calculates the frequencies of sub-blocks with different symbols. As discussed
above, since there are six different possibilities of rugged sub-blocks in the string
(according to Table 2.1), the logarithm base is set to 6. The different possibilities of
rugged objects are considered as isolated areas where each point has different values.
Tables 2.1 and 2.2 show the different possibilities of rugged and flat sub-blocks pq in
the string S(ε).
As discussed earlier, the variable ε controls the sensitivity of the function
(see Eq. 2.4). It can be observed that greater values of ε lead to more neutrality in
the measurement. It is suggested that using smaller values of ε makes the behaviour of
H(S(ε)) significant for characterising the ruggedness with respect to the landscape
neutrality (Vassilev et al. 2003). Therefore, for comparing various problems with
different fitness ranges, smaller values of ε are used for H(S(ε)). The values of ε
used in Malan and Engelbrecht (2009) are
\varepsilon = \varepsilon^* \cdot 2^{-k} \quad (k = 1, 2, \ldots, 8)    (2.7)

in which ε* is the smallest value that generates all sub-blocks as zeros, so that the
landscape becomes flat. The values k = 1, ..., 8 are used to calculate smaller
values of ε. Note that the parameter ε* can be calculated as the difference between the
highest and lowest fitness values found in the random walk.
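The entropic measure can be computed directly from a sequence of sampled fitness values. The Python sketch below (our own function names) follows Eqs. (2.4)-(2.6): it builds the symbol string S(ε) and then accumulates the frequencies of the six rugged two-symbol blocks using base-6 logarithms.

```python
import numpy as np

def symbol_string(fitness, eps):
    """Eq. (2.4): map consecutive fitness differences to symbols in {-1, 0, 1}."""
    diffs = np.diff(np.asarray(fitness, dtype=float))
    s = np.zeros(len(diffs), dtype=int)
    s[diffs > eps] = 1
    s[diffs < -eps] = -1
    return s

def information_content(fitness, eps):
    """Eqs. (2.5)-(2.6): entropy H(S(eps)) over the six rugged sub-blocks pq with p != q."""
    s = symbol_string(fitness, eps).tolist()
    n = len(s) - 1
    if n <= 0:
        return 0.0
    pairs = list(zip(s[:-1], s[1:]))
    h = 0.0
    for p in (-1, 0, 1):
        for q in (-1, 0, 1):
            if p == q:
                continue                        # flat blocks are excluded
            prob = pairs.count((p, q)) / n      # P[pq], Eq. (2.6)
            if prob > 0.0:
                h -= prob * np.log(prob) / np.log(6.0)   # logarithm base 6
    return h

# toy usage: a noisy walk is more rugged than a smooth ramp
rng = np.random.default_rng(0)
smooth = np.linspace(0.0, 1.0, 200)
noisy = smooth + rng.normal(scale=0.05, size=200)
print(information_content(smooth, eps=1e-3), information_content(noisy, eps=1e-3))
```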
An entropic measure H(S(ε)) requires a sequence of fitness values. In order
to generate such a time series, a simple random walk on the landscape can be
used (see Algorithm 1).
The above method was used for measuring the ruggedness of discrete problems.
The major issue of using this approach for continuous problems is that (unlike the
discrete problems) it is not possible to generate or access all possible neighbors of the
Table 2.1 Various sub-blocks in S(ε) considered as rugged objects
Sub-block:   0 1    0 -1    1 0    -1 0    1 -1    -1 1
Object type: Rugged in all six cases

Table 2.2 Various sub-blocks in S(ε) considered as flat objects
Sub-block:   0 0    -1 -1    1 1
Object type: Flat in all three cases

1. Choose a random place within the bounds as the starting point
2. Generate all the neighbors of the chosen point using permutation
3. Choose one neighbor randomly and save its value
4. Go back to step 2

Algorithm 1: Random walk


1. Input: the problem domain (domain), the number of dimensions (dimension) and the number of steps (MaxStepNumber) for the walk
2. Calculate the maximum step size: MaxStepSize = (Range of the problem domain) / 100
3. Set counter = 0 and create an array steps to save the steps in the walk
4. Assign a random position to steps[0] within the boundaries of the problem
5. Repeat
6.   For every dimension i of the problem
7.     currentStep = random(0, MaxStepSize);
8.     steps(counter) = steps(counter-1) + currentStep;
9.     If steps(counter) > boundaries
10.      steps(counter) = steps(counter-1) - (Range of the problem domain);
11.    Endif
12.  Endfor
13. Until (counter < MaxStepNumber)

Algorithm 2: Random increasing walk algorithm

visited individual. Thus, Malan and Engelbrecht (2009) modified the approach to use
it for unconstrained continuous problems. The proposed approach adopts a random
increasing walk which increases the step size over time. Furthermore, the step size
is decreased if the algorithm produces a solution that is not within the boundaries
given by the constraints. The algorithm for the random increasing walk proposed in
Malan and Engelbrecht (2009) is given in Algorithm 2. Here, we assume that the
variable range is the same for all dimensions, which implies that the maximum step
size is the same for all dimensions. The algorithm can be easily adjusted to problems
with different variable ranges by using a maximum step size for each variable.
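A compact Python version of such a random increasing walk is sketched below, assuming a single common range [lower, upper] for all dimensions; out-of-range coordinates wrap around by subtracting the variable range, which is one possible reading of the boundary handling in Algorithm 2.

```python
import numpy as np

def random_increasing_walk(lower, upper, dim, num_steps, seed=None):
    """Sketch of a random increasing walk in the spirit of Algorithm 2,
    with one common range [lower, upper] for every dimension."""
    rng = np.random.default_rng(seed)
    span = upper - lower
    max_step = span / 100.0                 # maximum step size, as in Algorithm 2
    walk = np.empty((num_steps, dim))
    walk[0] = rng.uniform(lower, upper, size=dim)
    for t in range(1, num_steps):
        step = rng.uniform(0.0, max_step, size=dim)
        pos = walk[t - 1] + step
        pos[pos > upper] -= span            # wrap around at the upper boundary
        walk[t] = pos
    return walk

# toy usage: 1000 steps in the 2-dimensional box [-5, 5]^2
print(random_increasing_walk(-5.0, 5.0, dim=2, num_steps=1000, seed=1).shape)
```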

2.3 Ruggedness Quantification for Constrained Continuous Optimization
In this section, we present a new approach for quantifying the ruggedness of a fitness
landscape of a constrained continuous optimization problem. Since we are working
on constrained optimization problems, dealing with infeasible areas is the important


and challenging part. Often in these problems, the infeasibility rate is high and it
might be even very hard to find one feasible solution. This implies that random walk
methods are usually not very helpful as they would produce infeasible solutions most
of the time. Most constraint handling methods direct the search process to feasible
regions of the search space and therefore often allow optimization in the feasible
region of the search space, which might be a very small proportion of the size of the
overall space.

2.3.1 Ruggedness Quantification


In the following, we discuss the drawbacks of applying the previous approaches
for ruggedness quantification when dealing with constrained continuous optimization problems. Later, we explain the solution to these issues by following our new
approach. As mentioned in the previous section, random walk algorithms have been
used to measure the ruggedness of fitness landscapes. However, random walk algorithms are often not useful when it comes to constrained optimization problems. We
discuss the different reasons below.
A random walk algorithm is not accurate enough to reflect the fitness landscape as
a whole, which is already true for unconstrained optimization, but becomes even more
evident when dealing with constrained problems. Random walk algorithms cannot
discriminate accurately between two different search spaces (feasible and infeasible
space) since they do not make decisions based on the fitness values. Experiments show
that the statistics obtained by random walks on landscapes are biased to areas with
low fitness (Smith et al. 2002). Hence, landscapes that differ in their high-fitness
areas but share the same low-fitness areas generate similar walk data and, consequently,
the obtained ruggedness measures are within the same range when using the previous
methodologies. To address this issue, we introduce methods that take into account
the individual fitness values in the sampling process. Using this method forces the
algorithm to explore higher fitness values in the landscape, which is more interesting for
optimization algorithms. Therefore, the calculated fitness landscape ruggedness is
more interesting as it reflects the landscape structure in regions of the search space
that are crucial for optimization.
The chance of finding even a few feasible individuals when using random walk
algorithms is likely to be very low for highly infeasible landscapes. Since the majority of constrained optimization problems are nearly infeasible, it is more likely to
have more infeasible individuals when using a random walk to explore the landscape. Optimization algorithms prefer to move and search in feasible regions. In
order to solve this problem, the sampling method for exploring fitness landscapes
of constrained optimization problems needs to move toward feasible areas in the
search space. Our remedy for this issue is that we introduce methods that have the
ability to distinguish between feasible and infeasible individuals when choosing the
next step in the walk. Our method is flexible and can be tuned such that the walk
contains more or less feasible individuals in it.


2.3.2 Biased Sampling Using Evolution Strategies


We use a biased walk in our approach to quantify the ruggedness of a constrained
problem's fitness landscape. Considering the fitness values of individuals in the
sampling process improves the reliability of the calculated measure. Our biased
walk uses a simple evolution strategy (Schwefel 1993). Since the adjacent steps
in the walk should be different, we use a (μ,λ)-ES. This means that the selection
is performed among the offspring and the parents are excluded from the new
generation.
In the (μ,λ)-ES, each individual (both parents and offspring) is a vector (x_i, σ_i)
consisting of the coordinates of the search point and the step sizes for the different
coordinates. The initial population is generated by choosing solutions uniformly
at random from the search space, and the initial step size of variable j in individual
i is given as

\sigma_{i,j}^{(0)} = \frac{\Delta x_{i,j}}{\sqrt{n}}

in which σ_{i,j} refers to the jth component of vector σ_i and Δx_{i,j} is the difference between the
upper and lower bounds on x_{i,j} (Schwefel 1993). It is noteworthy that the calculated
strategy parameters for each generation are used in the next generation. The step
sizes for each generation are updated as follows:

\sigma_{i,j}(t+1) = \sigma_{i,j}(t)\, e^{\tau' N(0,1) + \tau N_j(0,1)}

where τ' = 1/\sqrt{2n} and τ = 1/\sqrt{2\sqrt{n}} are learning rates, N(0, 1) is a normally distributed
random variable, and N_j(0, 1) denotes that a new value is drawn for each component
of σ.
By calculating the next generation strategy parameters (as above), each parent
produces λ new individuals as

x'_{h,j} = x_{i,j}(t) + N_j(0, 1)\, \sigma_{h,j}(t+1)

where h ∈ {1, ..., λ} and i ∈ {1, ..., μ}. The pseudo-code for the (μ,λ)-ES is shown
in Algorithm 3. In this chapter, we use μ = 1, i.e., a (1,λ)-ES. This implies that each
search point in the sequence we are generating is an offspring of the previous point
in this sequence.
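A minimal Python sketch of one sampling step of this (1,λ)-ES, with log-normal self-adaptation of the step sizes and the learning rates given above, is shown below (the function names are ours).

```python
import numpy as np

def es_offspring(x, sigma, lam, rng):
    """Generate lam offspring from one parent (x, sigma) of a (1,lambda)-ES
    with log-normal self-adaptation of the step sizes."""
    n = len(x)
    tau_prime = 1.0 / np.sqrt(2.0 * n)          # global learning rate
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))       # coordinate-wise learning rate
    offspring = []
    for _ in range(lam):
        global_z = rng.normal()
        new_sigma = sigma * np.exp(tau_prime * global_z + tau * rng.normal(size=n))
        new_x = x + new_sigma * rng.normal(size=n)
        offspring.append((new_x, new_sigma))
    return offspring

# toy usage: one parent in 5 dimensions, 7 offspring
rng = np.random.default_rng(0)
x0 = rng.uniform(-5.0, 5.0, size=5)
sigma0 = np.full(5, 10.0 / np.sqrt(5))          # initial step size (range / sqrt(n))
print(len(es_offspring(x0, sigma0, lam=7, rng=rng)))
```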

2.3.3 Dealing with Infeasible Areas


Among all categories of constraint handling methods, it has been shown that penalty
methods in general have good performance (Mallipeddi and Suganthan 2010). Some
methods calculate the constraint violation as the sum of the violations of all constraints and
integrate it into the objective function.



1. Initialize the strategy parameters, set generationCounter = 0
2. Initialize and create the population of solutions x using a uniform n-dimensional probability distribution on the problem search space (μ individuals)
3. Evaluate the fitness of the population
4. Repeat
5.   Generate λ offspring using the step-size and mutation equations of Sect. 2.3.2 (mutation)
6.   Evaluate the fitness of the offspring
7.   Apply the selection process to select μ individuals from the λ offspring for the next generation (selection)
8.   generationCounter = generationCounter + 1
9. Until the stopping condition is true

Algorithm 3: (μ,λ)-ES used as the biased walk

When integrating constraint violations into the objective function, the main problem is to choose an appropriate penalty coefficient that determines how strongly the
constraint violation influences the objective value. There are also penalty methods
that use the constraint violation and objective functions separately. In this case, they
optimize the constraint violation and objective function in lexicographic order so
that the main goal is to obtain a feasible solution.
As discussed earlier, to deal with nearly infeasible problems, there is a need to use
a walk with the ability to distinguish between feasible and infeasible individuals. We
choose the stochastic ranking method proposed by Runarsson and Yao (2000) as our
constraint handling mechanism to sample and collect individuals for the time series
S(ε). It has been observed that there should be a balance between accepting infeasible
individuals and preserving feasible ones. Hence, neither over- nor under-penalizing
infeasible solutions is a proper choice as a constraint handling method (Gen and Cheng
2000). It is worth noting that all penalty methods try to adjust the balance between the
objective and the penalty function. The proposed stochastic ranking method adjusts
this balance in a direct way. By using this method, the walk is directed toward feasible
areas of the search space.
The stochastic ranking method is used to rank the offspring in the evolution
strategy discussed earlier (see Algorithm 4). Ranking is achieved by comparing
adjacent individuals in at least λ sweeps. Ranking is terminated once no change
occurs during a whole sweep. To determine the balance of offspring selection, the
probability P_f is introduced in Runarsson and Yao (2000). In other words, P_f
is the probability of comparing two adjacent individuals based on their objective
function. It is obvious that if the two individuals being compared are feasible, then P_f is 1.

2.3.4 Ruggedness Quantifying Method Using a Constraint-Handling Biased Walk
We already explained how we use a biased walk that can distinguish between feasible
and infeasible individuals. In order to obtain more interesting individuals, we need

1. Initialize probability P_f
2. I_j = j for all j ∈ {1, . . . , λ}
3. For i = 1 to N
4.   For j = 1 to λ − 1
5.     Generate a random number U in the range (0, 1)
6.     If (φ(I_j) = φ(I_{j+1}) = 0) or (U < P_f)
7.       If f(I_j) > f(I_{j+1})
8.         swap(I_j, I_{j+1})
9.       End if
10.    Else
11.      If φ(I_j) > φ(I_{j+1})
12.        swap(I_j, I_{j+1})
13.      End if
14.    End if
15.  End for
16.  Break if no changes occurred within a complete sweep
17. End for

Algorithm 4: Stochastic ranking for dealing with infeasible areas. N is the number of sweeps performed over the whole population, λ is the number of individuals that are ranked, and φ is a real-valued function that imposes the penalty (constraint violation)
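A minimal Python sketch of Algorithm 4 is given below. The default of λ sweeps and the exact form of the penalty values phi are illustrative assumptions; the chapter itself only fixes P_f = 0.4 for the experiments.

import numpy as np

def stochastic_rank(f, phi, p_f=0.4, n_sweeps=None):
    """Stochastic ranking (Runarsson and Yao 2000): a minimal sketch of Algorithm 4.

    f    : array of objective values, one per individual
    phi  : array of penalty (constraint violation) values, phi >= 0, phi == 0 if feasible
    p_f  : probability of comparing two adjacent individuals by objective value
    Returns the indices of the individuals in ranked order (best first).
    """
    lam = len(f)
    n_sweeps = lam if n_sweeps is None else n_sweeps
    idx = np.arange(lam)                       # I_j = j for j = 1..lambda
    for _ in range(n_sweeps):
        swapped = False
        for j in range(lam - 1):
            u = np.random.rand()
            a, b = idx[j], idx[j + 1]
            if (phi[a] == 0 and phi[b] == 0) or u < p_f:
                if f[a] > f[b]:                # compare by objective value
                    idx[j], idx[j + 1] = b, a
                    swapped = True
            else:
                if phi[a] > phi[b]:            # compare by constraint violation
                    idx[j], idx[j + 1] = b, a
                    swapped = True
        if not swapped:                        # no change within a complete sweep
            break
    return idx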
1. Parameter setting: P_f = 0.4, MaxStepNumber = 5000
2. Set counter = 0 and create an array steps[] to save the steps of the walk
3. Repeat
4.   Produce new individuals using the evolutionary strategy (ES) in Algorithm 3
5.   Rank the generated offspring by employing the stochastic ranking method in Algorithm 4
6.   Save the fitness of the highest-ranked individual (feasible or infeasible) in steps[counter]
7.   counter = counter + 1
8. Until (counter ≥ MaxStepNumber)
9. Set ε* = max(steps[]) − min(steps[])
10. Generate the ensemble of objects (Eq. 2.4)
11. Calculate the entropic measure H(S(ε)) (Eq. 2.5)

Algorithm 5: Ruggedness quantifying for constrained continuous fitness landscape problems

to use a biased walk that moves through good regions of the fitness landscape. It is necessary to have feasible solutions within the walk steps in order to obtain an effective ruggedness measure. Therefore, our approach uses a biased walk based on constraint handling methods, which makes it possible to have feasible individuals in the path. In the algorithm, the individuals that are found by the simple evolutionary strategy are ranked by the stochastic ranking method. Then, the highest-ranked individual is selected as the next step of the walk. The pseudo-code of our methodology to quantify the ruggedness of constrained continuous fitness landscapes is given in Algorithm 5.
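The following Python sketch outlines the biased walk of Algorithm 5 up to the point where the step series and ε* are obtained. It assumes the es_step-style mutation and the stochastic_rank sketch shown earlier; the helper names objective and violation are illustrative and not part of the chapter.

import numpy as np

def biased_walk(x0, sigma0, lam, objective, violation, max_steps=5000, p_f=0.4):
    """Sketch of Algorithm 5: collect the fitness time series of a biased walk."""
    x, sigma = np.array(x0, float), np.array(sigma0, float)
    steps = np.empty(max_steps)
    n = len(x)
    tau_p, tau = 1.0 / np.sqrt(2.0 * n), 1.0 / np.sqrt(2.0 * np.sqrt(n))
    for t in range(max_steps):
        # produce lambda offspring from the single current point, (1,lambda)-ES style
        sigmas = sigma * np.exp(tau_p * np.random.randn(lam, 1)
                                + tau * np.random.randn(lam, n))
        xs = x + sigmas * np.random.randn(lam, n)
        f = np.array([objective(xi) for xi in xs])
        phi = np.array([violation(xi) for xi in xs])
        best = stochastic_rank(f, phi, p_f)[0]   # highest-ranked offspring
        x, sigma = xs[best], sigmas[best]
        steps[t] = f[best]                       # record its fitness (feasible or not)
    eps_star = steps.max() - steps.min()         # epsilon* used for the entropic measure
    return steps, eps_star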


As mentioned in the previous section, P_f controls the probability of comparing two adjacent individuals x and y based on their objective function. According to Runarsson and Yao (2000), the probability of winning for x is given by

P_w = P_{fw} P_f + P_{φw} (1 − P_f)                (2.8)

where P_{fw} is the probability that individual x wins when x and y are compared according to their objective function value, and P_{φw} is the probability that x wins when they are compared according to the penalty function.
As discussed in Sect. 2.3.1, the walking algorithm should consider both feasible and infeasible areas. Thus, P_f determines whether the comparison is based on the objective or on the penalty function. Of course, the impact of this parameter setting depends on the fitness landscape under investigation. By adjusting the parameter P_f, we can control the number of feasible or infeasible individuals in the walk and, consequently, whether the calculated ruggedness measure is based more on the feasible or on the infeasible regions.

2.4 Experimental Studies


In this part, we describe experimental studies to evaluate our approach for measuring
the ruggedness of a constrained continuous fitness landscape. We carry out experimental investigations on two different types of problems. The first consists of a
constrained version of the classical Sphere function. Imposing constraints that lead
to different infeasible areas, we examine our approach with respect to the number
of feasible solutions obtained during the run of the algorithm and compare it to the
other approaches outlined in Sect. 2.2.2. Then, we examine our approach on different
benchmark functions taken from the special session on single objective constrained
real parameter optimization (Mallipeddi and Suganthan 2010) at CEC 2010.

2.4.1 Constrained Sphere Function


To investigate the proposed method, we first consider the following constrained version of the two-dimensional classical Sphere function:

min Sphere(x) = Σ_{i=1}^{n} x_i²,   −5.12 ≤ x_i ≤ 5.12

subject to g(x) ≤ 0

where g(x) imposes the constraints of the two-dimensional sphere function. We construct three different problems that differ from each other by using each of the

2 Ruggedness Quantifying for Constrained Continuous Fitness Landscapes

41

following constraints:

g1(x) = 10 (Σ_{i=1}^{n} |cos³(x_i − 40)|) − 4,
g2(x) = 10 (Σ_{i=1}^{n} |cos³(x_i − 40)|) − 8,
g3(x) = 10 (Σ_{i=1}^{n} |cos³(x_i − 40)|) − 12.

The resulting optimization problems (Sphere_g1, Sphere_g2, Sphere_g3) have low, medium, and high infeasibility rates, respectively. In this experiment, we consider the two-dimensional sphere function so that the results can be analyzed more accurately. Figures 2.1 and 2.2 show the feasible areas of these three functions (n = 2).
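For illustration, the three problems can be coded in a few lines of Python. The constraint form below follows the reconstruction given above (in particular the shift by 40 inside the cosine) and should be checked against the original source before reuse.

import numpy as np

def sphere(x):
    # classical Sphere objective
    return np.sum(np.asarray(x) ** 2)

def g(x, c):
    # c = 4, 8, 12 gives Sphere_g1, Sphere_g2, Sphere_g3; feasible when g(x, c) <= 0
    x = np.asarray(x)
    return 10.0 * np.sum(np.abs(np.cos(x - 40.0) ** 3)) - c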
We apply and compare the random increasing walk (see Algorithm 2) with our methodology on these problems with different feasibility rates. In this experiment, we use a (1,7)-ES and P_f = 0.4, which means the ES has a tendency to focus on feasible solutions. We performed 20 independent runs consisting of 1,000 steps each, and for each problem the percentage of feasible solutions is reported in Table 2.3.
Due to the stochastic nature of evolutionary optimization, the above test is repeated 20 times and a two-tailed t-test is performed. In all tests, the significance level is set to 0.05. The p-values for each function are reported in Table 2.4. The results show that the differences in means are significant (all p-values are less than 0.05).
Clearly, our methodology is less influenced by an increase in the infeasibility rate of the problem. Also, comparing both walks shows that using our biased walk makes it more likely to obtain feasible individuals (steps) in the walk (see Table 2.3). The standard

Fig. 2.1 Two-dimensional constrained sphere function using the constraint functions g1, g2, and g3


Fig. 2.2 Two-dimensional space of the constrained sphere functions with infeasible areas marked white: (a) Sphere_g1, (b) Sphere_g2, (c) Sphere_g3, having low, medium, and high infeasibility rates
Table 2.3 Percentage of feasible individuals in the walks

                          Sphere_g1   Sphere_g2   Sphere_g3
Random increasing walk    71.3        55.8        28.7
Biased walk               75.8        68.1        48.7

Table 2.4 p-values for the significance of the difference between the two means when running the random increasing and biased walks over the three functions

            Sphere_g1   Sphere_g2    Sphere_g3
p-value     0.0043      7.0834E−06   9.4817E−06

deviations of the feasible individuals in both walks are shown in Fig. 2.3. It is clear that the standard deviation of feasible individuals is higher for the random walks.
Thus, the obtained ruggedness measure is related to the feasible parts of the landscape, which are more likely to be seen by a solver.


Fig. 2.3 Standard deviation for average percentage of feasible individuals in walks using random
increasing and biased walks

2.4.2 CEC Benchmark Problems


Also, we investigate our new method on benchmark problems from the CEC 2010 competition (Mallipeddi and Suganthan 2010). First, we compare our method with the random increasing walk in terms of the number of feasible individuals (steps) in the walk. To do this, we use a (1,7)-ES and P_f is set to 0.4, which biases the walk toward feasible areas (see Eq. 2.8). We calculate the number of feasible steps (individuals) taken by the walking algorithm within 5,000 steps for nearly infeasible problems. Figure 2.4 shows the results of 30 independent runs on the CEC problems. It can be observed that for nearly infeasible problems (Mallipeddi and Suganthan 2010), our method performs better at including feasible individuals in the steps (see Fig. 2.4).
Also, to test the ability of our methodology in quantifying ruggedness, we used different CEC benchmark problems with D = 10. To quantify the ruggedness, we calculate the entropic measure H(S(ε)) for different values of ε (Eq. 2.7). Table 2.5 shows our experimental results. The results indicate the mean value of H(S(ε)) for different values of ε over 30 runs. Based on Malan and Engelbrecht (2009), the ruggedness feature of a problem is taken as the maximum value of H(S(ε)) among all the different ε values. These numbers describe the ruggedness of each problem's fitness landscape with respect to neutrality. Also, the standard deviations for the different ε values are shown in Table 2.6.
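For reference, a short sketch of the entropic measure computation is given below. It follows the standard definition of Vassilev et al. (2000) and Malan and Engelbrecht (2009); since Eqs. 2.4–2.7 are defined earlier in the chapter and not repeated here, the exact form used is an assumption.

import numpy as np

def entropic_measure(steps, eps):
    """Entropy H(S(eps)) of a fitness time series; a sketch following
    Vassilev et al. (2000) and Malan and Engelbrecht (2009)."""
    diff = np.diff(np.asarray(steps, float))
    # three-symbol string S(eps): -1 (decrease), 0 (neutral), 1 (increase)
    s = np.where(diff < -eps, -1, np.where(diff > eps, 1, 0))
    # count the six kinds of "rugged" sub-blocks pq with p != q
    counts = {}
    for p, q in zip(s[:-1], s[1:]):
        if p != q:
            counts[(p, q)] = counts.get((p, q), 0) + 1
    total = len(s) - 1
    h = 0.0
    for c in counts.values():
        prob = c / total
        h -= prob * (np.log(prob) / np.log(6.0))   # log base 6
    return h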


Fig. 2.4 Percentage of feasible individuals in walks for nearly infeasible CEC benchmark problems

Table 2.5 Ruggedness results for functions in CEC 2010 benchmarks (10D)

Function (10D)   –     2      4      8      16     32     64     128    256    Ruggedness
C01              0     0.001  0.005  0.013  0.024  0.035  0.060  0.102  0.153  0.153
C02              0     0.001  0.003  0.004  0.006  0.010  0.015  0.023  0.035  0.035
C03              0     0.000  0.001  0.004  0.009  0.011  0.014  0.014  0.013  0.014
C06              0     0.006  0.010  0.012  0.014  0.018  0.023  0.035  0.027  0.027
C07              0     0.001  0.004  0.006  0.007  0.009  0.012  0.013  0.015  0.015
C09              0     0.001  0.002  0.003  0.005  0.006  0.009  0.012  0.014  0.014
C10              0     0.002  0.002  0.003  0.004  0.006  0.007  0.010  0.012  0.012
C17              0     0.002  0.003  0.005  0.008  0.013  0.015  0.011  0.019  0.019
C18              0     0.001  0.002  0.003  0.004  0.007  0.009  0.012  0.017  0.017

The values for the different ε's are mean values over 30 independent runs

Table 2.6 Standard deviation values for the different ε's in 30 independent runs

Function (10D)   –     2      4      8      16     32     64     128    256
C01              0     0.002  0.005  0.006  0.009  0.016  0.028  0.044  0.058
C02              0     0.002  0.003  0.003  0.005  0.008  0.014  0.022  0.035
C03              0     0.000  0.000  0.000  0.001  0.002  0.003  0.004  0.009
C06              0     0.013  0.016  0.016  0.017  0.019  0.024  0.035  0.028
C07              0     0.001  0.002  0.003  0.004  0.006  0.007  0.009  0.009
C09              0     0.001  0.001  0.002  0.002  0.004  0.006  0.010  0.011
C10              0     0.001  0.001  0.002  0.003  0.004  0.005  0.007  0.009
C17              0     0.002  0.002  0.005  0.011  0.022  0.041  0.008  0.009
C18              0     0.001  0.001  0.002  0.002  0.004  0.004  0.006  0.010


To interpret this table, it is convenient to classify the problems based on their objective functions. Problems C17 and C18 are similar according to their objective functions and present close values for their ruggedness. For problems C03, C07, C09, and C10 (which share the same objective function), the ruggedness measure is in the same range. C02 and C06, which share the same objective function, have different ruggedness measures compared to C01, which has the largest ruggedness value. Therefore, it can be concluded that similar problems are likely to have similar ruggedness measures. Based on the table, we can conclude that C01 is more rugged than the other categories.

2.5 Conclusions

In this chapter, we have reviewed the literature on measuring the ruggedness of fitness landscapes and discussed the drawbacks of the current methods when dealing with constrained problems. In order to address constrained continuous optimization problems, we have presented a new technique to quantify the ruggedness of constrained continuous problem landscapes. The modification is based on replacing the random sampling of data by a biased walk using a (1,λ)-evolution strategy, which can distinguish between feasible and infeasible individuals. We evaluated our approach on different benchmark functions and showed that it produces more feasible solutions during its run. Furthermore, we evaluated our method on CEC 2010 benchmark problems and discussed the results.

Appendix

The benchmark functions used in the experiments, described in Mallipeddi and Suganthan (2010), are summarised here. In this experiment, ε is set to 0.0001.

C01
Minimize

f(x) = − | (Σ_{i=1}^{D} cos⁴(z_i) − 2 Π_{i=1}^{D} cos²(z_i)) / √(Σ_{i=1}^{D} i z_i²) |,   z = x − o


subject to

g1(x) = 0.75 − Π_{i=1}^{D} z_i ≤ 0

g2(x) = Σ_{i=1}^{D} z_i − 0.75 D ≤ 0

x ∈ [0, 10]^D

C02
Minimize

f(x) = max(z),   z = x − o,   y = z − 0.5

subject to

g1(x) = 10 − (1/D) Σ_{i=1}^{D} [z_i² − 10 cos(2π z_i) + 10] ≤ 0

g2(x) = (1/D) Σ_{i=1}^{D} [z_i² − 10 cos(2π z_i) + 10] − 15 ≤ 0

h(x) = (1/D) Σ_{i=1}^{D} [y_i² − 10 cos(2π y_i) + 10] − 20 = 0

x ∈ [−5.12, 5.12]^D

C03
Minimize

f(x) = Σ_{i=1}^{D−1} (100 (z_i² − z_{i+1})² + (z_i − 1)²),   z = x − o

subject to

h(x) = Σ_{i=1}^{D−1} (z_i − z_{i+1})² = 0

x ∈ [−1000, 1000]^D

C06
Minimize

f(x) = max(z),   z = x − o,
y = (x + 483.6106156535 − o) M − 483.6106156535

subject to

h1(x) = (1/D) Σ_{i=1}^{D} y_i sin(√|y_i|) = 0

h2(x) = (1/D) Σ_{i=1}^{D} y_i cos(0.5 √|y_i|) = 0

x ∈ [−600, 600]^D

C07
Minimize

f(x) = Σ_{i=1}^{D−1} (100 (z_i² − z_{i+1})² + (z_i − 1)²),
z = x + 1 − o,   y = x − o

subject to

g(x) = 0.5 − exp(−0.1 √((1/D) Σ_{i=1}^{D} y_i²)) − 3 exp((1/D) Σ_{i=1}^{D} cos(0.1 y_i)) + exp(1) ≤ 0

x ∈ [−140, 140]^D

C09
Minimize

f(x) = Σ_{i=1}^{D−1} (100 (z_i² − z_{i+1})² + (z_i − 1)²),
z = x + 1 − o,   y = x − o

subject to

h1(x) = Σ_{i=1}^{D} y_i sin(√|y_i|) = 0

x ∈ [−500, 500]^D

C10
Minimize

f(x) = Σ_{i=1}^{D−1} (100 (z_i² − z_{i+1})² + (z_i − 1)²),
z = x + 1 − o,   y = (x − o) M

subject to

h1(x) = Σ_{i=1}^{D} y_i sin(√|y_i|) = 0

x ∈ [−500, 500]^D

C17
Minimize

f(x) = Σ_{i=1}^{D−1} (z_i − z_{i+1})²,   z = x − o

subject to

g1(x) = Π_{i=1}^{D} z_i ≤ 0

g2(x) = Σ_{i=1}^{D} z_i ≤ 0

h(x) = Σ_{i=1}^{D} z_i sin(4 √|z_i|) = 0

x ∈ [−10, 10]^D

C18
Minimize

f(x) = Σ_{i=1}^{D−1} (z_i − z_{i+1})²,   z = x − o

subject to

g(x) = −Σ_{i=1}^{D} z_i sin(√|z_i|) ≤ 0

h(x) = Σ_{i=1}^{D} z_i sin(√|z_i|) = 0

x ∈ [−50, 50]^D

References

Box GE, Jenkins GM, Reinsel GC (2013) Time series analysis: forecasting and control. Wiley
Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proceedings of the sixth international symposium on micro machine and human science, MHS'95. IEEE, pp 39–43
Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms, vol 455. Springer, Berlin
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization, vol 7. Wiley, New York
Hordijk W (1996) A measure of landscapes. Evol Comput 4(4):335–360
Liang J, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan P, Coello CC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. J Appl Mech 41
Lipsitch M (1991) Adaptation on rugged landscapes generated by local interactions of neighboring genes. In: Proceedings of the fourth international conference on genetic algorithms. San Mateo
Malan KM, Engelbrecht AP (2009) Quantifying ruggedness of continuous landscapes using entropy. In: IEEE congress on evolutionary computation, CEC'09, pp 1440–1447
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Nanyang Technological University, Singapore
Manderick B, de Weger M, Spiessens P (1991) The genetic algorithm and the structure of the fitness landscape. In: Proceedings of the fourth international conference on genetic algorithms. Morgan Kaufmann, San Mateo, pp 143–150
Mattfeld DC, Bierwirth C, Kopfer H (1999) A search space analysis of the job shop scheduling problem. Ann Oper Res 86:441–453
Mersmann O, Bischl B, Trautmann H, Preuss M, Weihs C, Rudolph G (2011) Exploratory landscape analysis. In: Proceedings of the 13th annual conference on genetic and evolutionary computation. ACM, pp 829–836
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Naudts B, Kallel L (2000) A comparison of predictive measures of problem difficulty in evolutionary algorithms. IEEE Trans Evol Comput 4(1):1–15
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Schwefel HP (1993) Evolution and optimum seeking: the sixth generation. Wiley, New York
Smith T, Husbands P, Layzell P, O'Shea M (2002) Fitness landscapes and evolvability. Evol Comput 10(1):1–34
Stadler PF et al (1995) Towards a theory of landscapes. In: Complex systems and binary networks. Springer, Heidelberg, pp 78–163
Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Vassilev VK, Fogarty TC, Miller JF (2000) Information characteristics and the structure of landscapes. Evol Comput 8(1):31–60
Vassilev VK, Fogarty TC, Miller JF (2003) Smoothness, ruggedness and neutrality of fitness landscapes: from theory to application. In: Advances in evolutionary computing. Springer, pp 3–44
Weinberger E (1990) Correlated and uncorrelated fitness landscapes and how to tell the difference. Biol Cybern 63(5):325–336

Chapter 3
Trust Regions in Surrogate-Assisted Evolutionary Programming for Constrained Expensive Black-Box Optimization

Rommel G. Regis

Abstract This paper develops a new surrogate-assisted evolutionary programming


(EP) algorithm for computationally expensive constrained black-box optimization.
The proposed algorithm, TRICEPS (Trust Regions In Constrained Evolutionary
Programming using Surrogates), builds surrogates for the black-box objective function and inequality constraint functions in every generation of the EP and uses a trust-region-like approach to refine the best solution at the end of each generation. Each
parent produces a large number of trial offspring in each generation, and then
the surrogates are used to identify promising trial offspring, which become the
actual offspring where the objective and constraint functions are evaluated. After the
function evaluations at these offspring, TRICEPS finds a minimizer of the surrogate
of the objective function within a trust region centered at the current best solution and
subject to surrogate inequality constraints with a small margin and with a distance
requirement from previously evaluated points. The trust region is either expanded
or reduced depending on whether the subproblem solution turned out to be feasible and whether the ratio of the actual improvement to the predicted improvement
exceeds or falls below certain thresholds. TRICEPS is implemented using a cubic
radial basis function (RBF) model with a linear polynomial tail and is compared to
an RBF-assisted EP called CEP-RBF (Regis 2014b) and to other alternatives on 18
benchmark problems and on an automotive application with 124 decision variables
and 68 black-box inequality constraints. Performance and data profiles show that
TRICEPS is a substantial improvement over CEP-RBF and it is much better than the
other alternatives on the test problems used.
Keywords Constrained optimization · Evolutionary programming · Surrogate-assisted evolutionary algorithm · Radial basis function · Trust region · Large-scale optimization

R.G. Regis
Department of Mathematics, Saint Joseph's University, Philadelphia, PA 19131, USA
e-mail: rregis@sju.edu

3.1 Introduction
In many real-world engineering optimization problems, the values of the objective
and constraint functions are outputs of computationally expensive simulations. These
types of optimization problems are found in the automotive and aerospace industries
(e.g., Jones 2008; Ong et al. 2003) and in various parameter estimation problems
(e.g., Mugunthan et al. 2005; Tolson and Shoemaker 2007). A reasonable strategy
for solving these problems is to use surrogate-based or surrogate-assisted optimization methods, including surrogate-assisted evolutionary algorithms (EAs) (Jin 2011),
where the algorithm uses surrogate models to approximate the black-box objective and constraint functions. For instance, Regis (2014b) successfully developed
a surrogate-assisted Evolutionary Programming (EP) algorithm and applied it to a
large-scale automotive optimization application with 124 decision variables and 68
black-box inequality constraints given a severely limited computational budget of
only 1,000 simulations, where one simulation yields the objective function value
and each of the constraint function values at a given input vector. The purpose of
this paper is two-fold: (1) To develop a new surrogate-assisted EP for constrained
black-box optimization that improves on the algorithm by Regis (2014b) on a set of
benchmark problems, including the above-mentioned large-scale automotive application and (2) To compare the new approach with alternative methods, including a
mathematically rigorous penalty derivative-free algorithm, on the same problems.
This chapter focuses on constrained black-box optimization problems of the
following form:
min f (x)
s.t.
(3.1)
x Rd
gi (x) 0, i = 1, 2, . . . , m
axb
Here, f is the black-box objective function and g1 , . . . , gm are black-box inequality
constraint functions and a, b Rd define the bound constraints of the problem.
Throughout this paper, assume that for any input x [a, b] Rd , the values of
f (x), g1 (x), . . . , gm (x) are obtained by running a time-consuming simulator (a computer code) at the input x. Moreover, assume that f , g1 , . . . , gm are all deterministic
and that their gradients are not available. Furthermore, for simplicity, assume that
[a, b] Rd is a hypercube since any hyper-rectangle can be easily transformed to
the unit hypercube [0, 1]d . Problems with equality constraints or noisy functions will
be treated in future work.
Problem (3.1) is difficult when the dimension d and the number of black-box
constraints m are large, and it is even more difficult when the computational budget
is relatively limited. Although much progress has been made in the development of
constraint handling techniques for EAs (Mezura-Montes and Coello Coello 2011),
most of these approaches require a large number of simulations even on problems of
moderate size, and hence they are not appropriate in the computationally expensive


setting. As mentioned above, a sensible approach is to learn the structure of these


black-box functions by constructing and updating dynamic surrogate models, one
for the objective and one for each of the constraint functions as was done in Regis
(2014b). These surrogates are then used to identify promising offspring or other
promising points in the search space and the expensive simulations are performed
only on these points.
This paper develops a new surrogate-assisted EP for computationally expensive
constrained black-box optimization. The proposed algorithm, called TRICEPS (Trust
Regions In Constrained Evolutionary Programming using Surrogates), does not use
a penalty to handle constraints but builds surrogates for the black-box objective and
constraint functions in every generation of the EP. Moreover, it incorporates a trust-region-like approach to refine the best parent solution at the end of each generation.
As in the surrogate-assisted EP by Regis (2014b), each parent in TRICEPS produces a large number of trial offspring in each generation, and then the surrogates
are used to identify promising trial offspring, which become the actual offspring
where the objective and constraint functions are evaluated. After performing simulations at the offspring of the current generation, TRICEPS solves a trust-region-like
subproblem where it finds a minimizer of the surrogate of the objective function
within a trust region centered at the current best solution and subject to surrogate
inequality constraints with a small margin and with a distance requirement from
previously evaluated points. The margin on the surrogate constraints is meant to
increase the chances of obtaining feasible points. In addition, the trust region is
either expanded or reduced depending on whether the subproblem solution turned
out to be feasible and whether the ratio of the actual improvement to the predicted
improvement exceeds or falls below certain thresholds. In the numerical experiments,
TRICEPS is implemented using a cubic radial basis function (RBF) model with a
linear polynomial tail and is compared to the previously developed RBF-assisted
EP called CEP-RBF (Regis 2014b) and to other alternatives, including the mathematically rigorous penalty derivative-free algorithm SDPEN (Liuzzi et al. 2010),
on 18 benchmark problems and on an automotive application with 124 decision
variables and 68 black-box inequality constraints proposed by Jones (2008) during
the MOPTA08 conference. Performance and data profiles show that TRICEPS with
RBF surrogates is a substantial improvement over CEP-RBF and it is much better
than the other alternatives, including SDPEN, on the test problems used when the
computational budget is relatively limited.
Although this paper is about a surrogate-assisted EP, it is also possible to develop
other surrogate-assisted EAs, including surrogate-assisted evolution strategies (ES),
for constrained black-box optimization using the ideas presented here. However,
when the problem is highly constrained and the computational budget is severely
limited, Regis (2014b) suggests using EAs that mainly use conservative mutation
operators since recombination might have a tendency to produce offspring that violate
one of the many constraints. For example, the (1+1)-CMA-ES by Arnold and Hansen
(2012) would be another good candidate to combine with a surrogate since it only
uses mutation operators.


This paper is organized as follows. Section 3.2 provides a review of the relevant
literature. Section 3.3 describes the proposed TRICEPS algorithm and the RBF surrogate model used. Sections 3.4 and 3.5 discuss the numerical experiments and results.
Finally, Sect. 3.6 provides some conclusions.

3.2 Review of Literature


Various constraint handling techniques have been used with EAs. Some of the
most common techniques use penalty functions (e.g., Mezura-Montes et al. 2003;
Runarsson and Yao 2000; Tessema and Yen 2006), multi-objective optimization
(Wang and Cai 2012), a combination of a bi-objective optimization and a penalty
approach (Datta and Deb 2013; Deb and Datta 2013), the epsilon constrained method
(Takahama and Sakai 2012), cultural algorithms (Coello Coello and Landa-Becerra
2004), and those that distinguish between feasible and infeasible solutions (Mezura-Montes and Coello Coello 2005). A recent survey on constraint-handling techniques
in evolutionary and swarm algorithms is given by Mezura-Montes and Coello Coello
(2011) and a tutorial is given by Coello Coello (2012).
As mentioned above, surrogates or metamodels of the objective and constraint
functions have been used to assist EAs for computationally expensive black-box
optimization. In particular, surrogates for the objective function have been used to
approximate objective function values (e.g., Regis and Shoemaker (2004)), while
surrogates for the constraint functions have been used by Kramer et al. (2009) to
check feasibility, repair infeasible solutions, and rotate the mutation ellipsoid in
CMA-ES. Examples of surrogate models that have been used with EAs include
multivariate quadratic polynomials (Araujo et al. 2009; Regis and Shoemaker 2004;
Shi and Rasheed 2008; Wanner et al. 2005), multilayer perceptron neural networks
(Jin et al. 2002), kriging and Gaussian process models (Emmerich et al. 2002; Zhou
et al. 2007), radial basis functions (Isaacs et al. 2007, 2009; Ong et al. 2003; Regis
2014b; Regis and Shoemaker 2004; Zhou et al. 2007), support vector machines
(SVM) (Gieseke and Kramer 2013; Loshchilov et al. 2012; Shi and Rasheed 2008)
and nearest neighbors regression (Runarsson 2004). Moreover, multiple surrogates
may be used to balance exploration and exploitation in an evolutionary algorithm
(e.g., Montaño et al. 2012). A recent survey on surrogate-assisted EAs is provided
by Jin (2011).
Penalty functions are also commonly used to handle constraints in surrogate-assisted EAs. For example, Shi and Rasheed (2008) use a stochastic penalty function
and an adaptive mechanism for switching from lower complexity polynomial models
to higher complexity SVM models while Runarsson (2004) uses a penalty-based
Stochastic Ranking ES combined with a nearest neighbor regression model. However,
Powell (1994) notes that the use of a penalty might not be the most effective way to
handle expensive black-box constraints since information about individual constraint


violations is lost. In fact, some numerical evidence to support this idea can be found in
Regis (2014b). Instead, Powell (1994) suggests treating the constraints individually
by building individual surrogates, one for each constraint.
According to Mezura-Montes and Coello Coello (2011), surrogates are still
seldom used to approximate constraints in nature-inspired algorithms. One example
is a GA combined with Feasible Sequential Quadratic Programming (FSQP) developed by Ong et al. (2003), where local RBF surrogates are used to model the objective
and constraint functions. Other examples are given by Araujo et al. (2009) and Wanner et al. (2005), where quadratic models are used to approximate the objective and
constraint functions in GAs. Moreover, Isaacs et al. (2007, 2009) used RBF networks to model objective and constraint functions in evolutionary multi-objective
optimization. In addition, Emmerich et al. (2006) proposed using local Gaussian
Random Field Metamodels for modeling constraint functions in single- and multiobjective evolutionary optimization. More recently, Gieseke and Kramer (2013) used
SVMs to estimate nonlinear constraints in CMA-ES for expensive optimization.
While there are relatively few algorithms that use surrogates to approximate
black-box constraints, there are even fewer algorithms that have been used on
high-dimensional (more than a hundred decision variables) and highly constrained
problems. In Ong et al. (2003), the GA coupled with FSQP that uses local RBF surrogates was tested only on problems with at most 20 decision variables and at most 4
inequality constraints. The metamodel-based CiMPS method (Kazemi et al. 2011)
was only tested on problems with at most 13 decision variables and 9 inequality
constraints. On the other hand, ConstrLMSRBF (Regis 2011), CEP-RBF (Regis
2014b) and COBRA (Regis 2014a) all use global RBF surrogates and were all successful compared to alternatives on well-known benchmark problems and on the
MOPTA08 automotive application with 124 decision variables and 68 black-box
inequality constraints (Jones 2008). One of the goals of this paper is to develop a
new surrogate-assisted EP that improves upon the surrogate-assisted EP in Regis
(2014b) on benchmark test problems and on the MOPTA08 automotive problem.

3.3 Trust Regions in Constrained Evolutionary Programming Using Surrogates

3.3.1 Overview
This section describes a pseudo-code for the proposed TRICEPS algorithm, which
is a new surrogate-assisted EP for optimization problems with black-box inequality constraints. A detailed description is given in the next subsection. Unlike many
constrained EAs in the literature, TRICEPS does not use a penalty function. Instead,
it is similar to the surrogate-assisted EP by Regis (2014b) in that it treats each inequality constraint separately and builds and updates a surrogate model for each constraint
function using all previously evaluated points (both feasible and infeasible points).


Moreover, as in Regis (2014b), each parent generates multiple trial offspring in every
generation and then the surrogates for the objective and constraint functions are used
to rank these trial offspring according to rules that favor offspring with the best
predicted objective function values among those with the minimum number of predicted constraint violations. The computationally expensive simulations (evaluations
of the objective and constraint functions) are then carried out only on the most
promising offspring of each parent.
TRICEPS differs from the surrogate-assisted EP by Regis (2014b) in that it incorporates a trust-region-like approach to refine the best solution at the end of each
generation. That is, after performing simulations at the offspring of the current
generation, TRICEPS solves a trust-region-like subproblem where it finds a minimizer of the surrogate of the objective function within a trust region centered at
the current best solution and subject to surrogate inequality constraints with a small
margin and with a distance requirement from previously evaluated points. The idea
of refining the best solution at the end of each generation has been implemented in
surrogate-assisted particle swarm algorithms for bound constrained problems (e.g.,
Parno et al. (2012); Regis (2014c)). However, these previous approaches did not use
trust regions that can be expanded or reduced. In TRICEPS, the adjustment of the
trust region depends on whether the subproblem solution turned out to be feasible,
whether the ratio of the actual improvement to the improvement predicted by the
surrogate exceeds or falls below certain thresholds, and also whether the number of
consecutive successful local refinements or the number of consecutive unsuccessful
local refinements have reached certain thresholds. Also, the idea of using a margin
on the surrogate inequality constraints was first proposed by Regis (2014a) and its
purpose is to increase the chances of obtaining feasible points.
When the optimization problem has a large number of decision variables and has
many black-box inequality constraints, Regis (2011, 2014b) implemented a Block
Coordinate Search (BCS) strategy where new trial solutions (or offspring) are generated by perturbing only a small fraction of the coordinates of the current solution
under consideration (i.e., a particular parent solution, including possibly the current
best feasible solution). The BCS strategy resulted in a dramatic improvement for
the ConstrLMSRBF (Regis 2011) and CEP-RBF (Regis 2014b) when applied to
the MOPTA08 benchmark problem from the auto industry proposed by Jones (2008)
involving 124 decision variables and 68 black-box inequality constraints. When only
a small number of coordinates of a parent solution are perturbed, fewer constraint
violations are likely to be introduced in the trial offspring and the trial offspring will
tend to be closer to the parent solution. If this parent solution is feasible, many of the
trial offspring will tend to be feasible thereby making it more likely to find a feasible
solution with an improved objective function value. Hence, the BCS strategy is also
implemented in TRICEPS when it is used for high-dimensional problems with many
black-box inequality constraints.
Figure 3.1 presents a flowchart that shows the main steps of the TRICEPS algorithm. The algorithm begins by initializing the parent population and algorithm
parameters and then calculating the objective and constraint functions at the initial

[Fig. 3.1 Flowchart of the TRICEPS algorithm. The main steps shown are: initialize the parent population and algorithm parameters; evaluate the objective and constraint functions at the initial population; then, until the computational budget is reached, fit or update the surrogates for the objective and constraints, generate trial offspring for each parent, evaluate the surrogates at the trial offspring, select the best offspring for each parent and evaluate the objective and constraint functions there, update the surrogates, solve the trust-region subproblem, evaluate the objective and constraint functions at the trust-region point, and update the trust region.]

population. Then TRICEPS goes through a main loop that terminates only when the
computational budget (i.e., maximum number of function evaluations) is reached.
In the first part of the loop, TRICEPS performs the same steps as in CEP-RBF
(Regis 2014b). That is, TRICEPS fits the surrogates for the objective and constraint
functions, generates a large number of trial offspring for each parent, and then uses
the surrogates to select only the most promising trial offspring and this is where
the function evaluations are performed. In the second part of the loop, TRICEPS
performs a trust-region-like refinement of the best parent solution. That is, the surrogates are updated using information from recently evaluated points, the trust-region
subproblem is solved, then function evaluations are performed on the solution to the
trust-region subproblem, and finally, the algorithm parameters and the trust region
are updated. Note that the surrogates are updated twice in a single iteration, once
before the trial offspring are generated and once before the trust-region step. Hence,
surrogate modeling is integrated into the optimization process in two ways by using
it: (1) to select the most promising among multiple trial offspring for each parent
solution and (2) to identify a local refinement point for the current best solution
during the trust-region step.


3.3.2 Algorithm Description


The main input to TRICEPS is an optimization problem of the form (3.1) together
with a simulator (a computer code) that yields the values of f (x), g1 (x), . . . , gm (x)
for any input x [a, b] Rd . Moreover, assume that a feasible starting point x0 is
provided. This assumption is not unreasonable since for some real-world engineering
optimization problems, an initial feasible solution to the problem is provided and the
goal is simply to find a better feasible solution. If a feasible solution is not initially
available, then one can develop an extension of TRICEPS that can handle infeasible starting points by using an approach that is similar to the two-phase approach
described in Regis (2014a). The first phase finds a feasible point while the second
phase improves on this feasible point. This two-phase approach will be included in
future work.
Below is a detailed description of the TRICEPS algorithm. It has several user-specified parameters:
(1) μ (the number of parents in one generation)
(2) ν (the number of trial offspring generated for each parent)
(3) σ_init (the initial standard deviation of the Gaussian mutations)
(4) p_mut (the probability of perturbing a coordinate of a parent solution when generating a trial offspring)
(5) Δ_init, Δ_min, Δ_max (the initial, minimum, and maximum trust-region radii, respectively)
(6) η_0, η_1 (the ratio thresholds indicating whether the trust-region iterations were successful or not)
(7) 0 < γ_0 < 1 < γ_1 (contraction and expansion factors for the trust region, respectively)
(8) T_fail (tolerance for the number of consecutive unsuccessful trust-region iterations before the trust region is reduced)
(9) T_success (threshold for the number of consecutive successful trust-region iterations before the trust region is expanded)
(10) ε_init > 0 (initial margin for the surrogate inequality constraints)
(11) T_infeas (threshold for the number of consecutive generations where a feasible solution to the trust-region subproblem was not found)
(12) ξ (distance requirement from previous sample points)

Each individual is a pair of d-dimensional vectors (x_i(t), σ_i(t)), where t is the generation number, i is the index of the individual in the current population, x_i(t) is the vector of values of the decision variables, and σ_i(t) is the vector of standard deviations for the Gaussian mutations.
(μ + μ)-TRICEPS for Constrained Black-Box Optimization
(1) Set generation counter t = 0 and set the initial population P(0) = {(x_1(0), σ_1(0)), . . . , (x_μ(0), σ_μ(0))}, where σ_i(0) = σ_init for i = 1, . . . , μ and x_1(0) = x_0 (feasible starting point).


(2) Initialize the trust-region radius Δ_0 = Δ_init and the margin ε_0 = ε_init.
(3) Initialize the counters for the number of consecutive successful local refinements, C_success = 0, and the number of consecutive unsuccessful local refinements, C_fail = 0. Also, initialize the counter for the number of consecutive generations where a feasible solution for the trust-region subproblem was not found: C_infeas = 0.
(4) Evaluate the objective and constraint functions at the points in P(0): For each i = 1, . . . , μ, run the simulator to determine f(x_i(0)), g_1(x_i(0)), . . . , g_m(x_i(0)). Relabel the subscripts of the individuals in P(0) so that x_1(0) is the best point in P(0).
(5) While the termination criteria are not satisfied do
(5.1) Fit or update the surrogates s_t^(0), s_t^(1), . . . , s_t^(m) for the objective and constraint functions f, g_1, . . . , g_m, respectively, using all available function values from previous simulations (see Sect. 3.3.3).
(5.2) For i = 1, . . . , μ
(5.2(a)) For j = 1, . . . , ν, generate (x′_ij(t), σ′_ij(t)) = Mutate((x_i(t), σ_i(t)), p_mut).
(5.2(b)) Evaluate the surrogates of the objective and constraint functions at the points in P′_i(t) = {(x′_i1(t), σ′_i1(t)), . . . , (x′_iν(t), σ′_iν(t))}: For j = 1, . . . , ν, calculate s_t^(0)(x′_ij(t)), s_t^(1)(x′_ij(t)), . . . , s_t^(m)(x′_ij(t)).
(5.2(c)) (x′_i(t), σ′_i(t)) = Select(P′_i(t)).
(5.2(d)) Evaluate the objective and constraint functions at the selected point: Run the simulator to determine f(x′_i(t)), g_1(x′_i(t)), . . . , g_m(x′_i(t)).
End
(5.3) P(t + 1) = Select(P(t) ∪ P′(t)), where P′(t) = {(x′_1(t), σ′_1(t)), . . . , (x′_μ(t), σ′_μ(t))}. Relabel the subscripts of the individuals in P(t + 1) so that x_1(t + 1) is the best point in P(t + 1).
(5.4) Update the surrogates s_t^(0), s_t^(1), . . . , s_t^(m) for the objective and constraint functions f, g_1, . . . , g_m, respectively, using the newly obtained function values from simulations on the current offspring.
(5.5) Relabel all previously evaluated points by v_1, . . . , v_n and let v_n be the best feasible point so far. Solve the subproblem below and let x̃(t) be the solution obtained:

    min s_t^(0)(x)
    s.t. x ∈ R^d, a ≤ x ≤ b
         ‖x − v_n‖ ≤ Δ_t
         s_t^(i)(x) + ε_t ≤ 0,  i = 1, 2, . . . , m
         ‖x − v_j‖ ≥ ξ,  j = 1, . . . , n                (3.2)

(5.6) If a feasible solution is found for Problem (3.2), then let x̃(t) be the solution obtained and reset C_infeas = 0. Otherwise, let x̃(t) be the best solution (infeasible for (3.2)) among a set of randomly generated points within the trust region {x ∈ [a, b] : ‖x − v_n‖ ≤ Δ_t} and reset C_infeas = C_infeas + 1.


(5.7) Evaluate the objective and constraint functions at x̃(t): Run the simulator to determine f(x̃(t)), g_1(x̃(t)), . . . , g_m(x̃(t)).
(5.8) If x̃(t) is feasible, then do
(5.8(a)) Calculate the predicted and actual improvements: Δ_pred = s_t^(0)(x_1(t + 1)) − s_t^(0)(x̃(t)) and Δ_actual = f(x_1(t + 1)) − f(x̃(t)).
(5.8(b)) If f(x̃(t)) < f(x_1(t + 1)), then replace x_1(t + 1) (the best parent in the next generation) by x̃(t) and reset C_success = C_success + 1 and C_fail = 0. Otherwise, reset C_success = 0 and C_fail = C_fail + 1.
(5.8(c)) If Δ_pred > 0, then do
    (i) If ρ_t = Δ_actual/Δ_pred ≥ η_1 and C_success ≥ T_success, then Δ_{t+1} = min(γ_1 Δ_t, Δ_max) and reset C_success = 0.
    (ii) Else if ρ_t = Δ_actual/Δ_pred < η_0 and C_fail ≥ T_fail, then Δ_{t+1} = max(γ_0 Δ_t, Δ_min) and reset C_fail = 0.
Else
    (iii) If C_fail ≥ T_fail, then Δ_{t+1} = max(γ_0 Δ_t, Δ_min) and reset C_fail = 0.
End.
Else
(5.8(d)) Set C_success = 0 and C_fail = C_fail + 1.
(5.8(e)) If C_fail ≥ T_fail, then Δ_{t+1} = max(γ_0 Δ_t, Δ_min) and reset C_fail = 0.
End.
(5.9) If C_infeas ≥ T_infeas, then reduce the margin ε_{t+1} = ε_t/2 and reset C_infeas = 0. Otherwise, ε_{t+1} = ε_t.
(5.10) Increment the generation counter: t ← t + 1.
End While
(6) Return the best solution found.
As in the surrogate-assisted Constrained EP in Regis (2014b), Step 1 of TRICEPS
generates the initial parent population and initializes the standard deviations of the
mutations. Step 2 initializes the trust-region radius and the margin for the surrogate inequality constraints while Step 3 initializes the counters that keep track of
the number of consecutive successful local refinements, the number of consecutive
unsuccessful local refinements and the number of consecutive generations where the
trust-region subproblem did not yield any feasible points. Then, in Step 4, the simulator is run μ times to determine the objective and constraint function values of the
initial parent solutions. For convenience, the initial parent population is reordered
so that the first one is the best point. Since a feasible starting point is provided, the
best parent solution must be feasible.
Next, in Step 5, TRICEPS loops through the generations. At the beginning of each generation, surrogates for the objective and constraint functions are built using all available function values from previous simulations (Step 5.1). In particular, s_t^(0) is the surrogate for f while s_t^(1), . . . , s_t^(m) are the surrogates for g_1, . . . , g_m, respectively. Next, for each of the parent solutions, ν trial offspring are generated by mutation (Step 5.2(a)). Then, the surrogates for the objective and constraints are evaluated


at each trial offspring (Step 5.2(b)) and the most promising of the trial offspring from each parent is chosen (Step 5.2(c)). Next, the simulator is run to determine the objective and constraint function values at the selected offspring (Step 5.2(d)). Then, the algorithm selects the parent population for the next generation (Step 5.3). As before, the new parent population is reordered so that the first one is the best point.
The next several steps attempt to refine the current best solution, which is the best parent in the next generation x_1(t + 1). In Step 5.4, the surrogates for the objective and constraints are updated using the newly obtained function values at the offspring of the current generation. In Step 5.5, a trust-region subproblem (3.2) is solved. For convenience, all points in the search space where the simulator has been run are relabeled as v_1, . . . , v_n and v_n is the best feasible point found so far. Because of the previous relabeling, v_n = x_1(t + 1). In this step, the algorithm finds a local minimizer of the surrogate of the objective within the trust region of radius Δ_t centered at the current best solution and subject to the surrogate inequality constraints with a small margin ε_t and subject to a distance requirement from previously evaluated points.
Then, in Step 5.6, x̃(t) is either a solution to the trust-region subproblem (3.2) or it is the best infeasible solution to (3.2) from a set of randomly generated points within the trust region. Here, x̃(t) is referred to as the local refinement point. In Step 5.7, the simulator is run to determine the objective and constraint function values at the local refinement point x̃(t). Then, in Step 5.8, the local refinement point replaces the best parent in the next generation (which is also the current best solution) if the former is a better point than the latter. Moreover, the trust-region radius is either expanded or reduced depending on whether the local refinement point x̃(t) is feasible, whether the ratio of the actual improvement to the improvement originally predicted by the surrogate for x̃(t) exceeds η_1 or falls below η_0, and also whether the counters C_success or C_fail have reached the thresholds T_success or T_fail. In addition, in Step 5.9, the margin for the surrogate inequality constraints is reduced if the counter C_infeas has reached the threshold T_infeas. Then, Step 5.10 increments the generation counter and the algorithm goes back into the loop until a stopping criterion is satisfied. Finally, the best solution found is returned in Step 6. As with the surrogate-assisted EP in Regis (2014b), the stopping criterion is a fixed number of simulations.
As in Regis (2014b), each parent generates ν trial offspring, only one of which becomes an actual offspring for the current generation. The value of the parameter ν is chosen to be large so that the expensive simulations are only run on trial offspring that
are very promising as predicted by the surrogates. Moreover, TRICEPS allows for the
possibility of using the BCS strategy from Regis (2011, 2014b) for high-dimensional
or highly constrained problems. In BCS, the mutations are more conservative in that
only a fraction of the components of the parent vector is perturbed when generating
the trial solutions so the probability of perturbing any component pmut < 1. (When
pmut = 1, the algorithm does not use the BCS strategy.) As explained in Regis
(2011, 2014b), the BCS strategy is helpful for high-dimensional problems or highly
constrained problems because perturbing too many components of a parent vector
that is already good is either likely to make the objective function value worse or it
is likely to result in more constraint violations.


More precisely, in Step 5.2(a), each parent (x_i(t), σ_i(t)) in generation t creates exactly ν trial offspring (x′_ij(t), σ′_ij(t)) for j = 1, . . . , ν as follows: For k = 1, . . . , d,

(1) Generate a random number u from the uniform distribution on [0, 1].
(2) If u ≤ p_mut, then
      x′_ij(t)^(k) = x_i(t)^(k) + σ_i(t)^(k) N_k(0, 1),
      σ′_ij(t)^(k) = σ_i(t)^(k) exp(τ′ N(0, 1) + τ N_k(0, 1)).
    Else
      x′_ij(t)^(k) = x_i(t)^(k),
      σ′_ij(t)^(k) = σ_i(t)^(k).
    End.
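A minimal Python sketch of this mutation operator (including the Block Coordinate Search behavior obtained when p_mut < 1) might look as follows; the function and argument names are illustrative and not part of the chapter.

import numpy as np

def mutate(x, sigma, p_mut, tau, tau_prime):
    """Step 5.2(a): generate one trial offspring from parent (x, sigma).

    With probability p_mut each coordinate is perturbed (BCS when p_mut < 1);
    otherwise the coordinate is copied unchanged.
    """
    d = len(x)
    new_x, new_sigma = x.copy(), sigma.copy()
    global_draw = np.random.randn()          # the N(0,1) term shared across coordinates
    for k in range(d):
        if np.random.rand() <= p_mut:
            new_x[k] = x[k] + sigma[k] * np.random.randn()
            new_sigma[k] = sigma[k] * np.exp(tau_prime * global_draw
                                             + tau * np.random.randn())
    return new_x, new_sigma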
In Step 5.2(c), the trial offspring solutions are ranked in the same manner as in
Regis (2014b):
(1) Between two solutions that are predicted to be feasible, the one with the better
predicted objective value wins.
(2) Between a solution that is predicted to be feasible and a solution that is predicted
to be infeasible, the former wins.
(3) Between two solutions that are predicted to be infeasible, the one with the fewer
number of predicted constraint violations wins.
(4) Between two solutions that are predicted to be infeasible with the same number
of predicted constraint violations, the one with the better predicted objective
value wins.
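These four rules can be implemented compactly as a sort key, as in the following sketch; the surrogate callables s0 and s_cons in the usage comment are placeholders, not names from the chapter.

import numpy as np

def rank_key(pred_obj, pred_cons):
    """Sort key implementing the four ranking rules for trial offspring.

    pred_obj  : surrogate prediction of the objective at a trial offspring
    pred_cons : array of surrogate predictions of the inequality constraints
    Sorting ascending by this key puts predicted-feasible offspring first,
    breaking ties by the number of predicted violations and then by the
    predicted objective value.
    """
    n_viol = int(np.sum(np.asarray(pred_cons) > 0))   # predicted constraint violations
    return (n_viol, pred_obj)

# usage sketch: best = min(trial_offspring, key=lambda o: rank_key(s0(o), s_cons(o)))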
In implementing TRICEPS, a continuously differentiable surrogate whose gradient is easy to compute is highly recommended so that efficient gradient-based techniques can be used to solve the trust-region subproblem (3.2). One such example of a surrogate is provided in the next section. Note that the gradients of the trust-region constraints and the distance constraints are easy to calculate. In particular, for the trust-region constraint T_t(x) = ‖x − v_n‖ − Δ_t ≤ 0 and the distance constraints D_{t,j}(x) = ξ − ‖x − v_j‖ ≤ 0 for j = 1, . . . , n, the gradients are given by:

∇T_t(x) = (x − v_n)/‖x − v_n‖   and   ∇D_{t,j}(x) = −(x − v_j)/‖x − v_j‖.

3.3.3 Radial Basis Function Interpolation


TRICEPS can be implemented using any type of surrogate but, as pointed out above,
it is recommended to use one that is continuously differentiable and whose gradients


are easy to calculate. One popular choice is kriging or Gaussian process modeling, but this method is computationally intensive and requires an enormous amount of memory in high dimensions. This study uses the simpler radial basis function (RBF) model in Powell (1992) that has been successfully used to develop various RBF methods (e.g., Björkman and Holmström 2000; Gutmann 2001; Regis 2011; Regis and Shoemaker 2007; Wild et al. 2008). Fitting this model differs from the training method typically used for RBF networks. It involves solving a linear system that possesses good theoretical properties that can be taken advantage of to solve the system in a stable and efficient manner.
Given n distinct points x_1, . . . , x_n ∈ R^d and the function values u(x_1), . . . , u(x_n), where u(x) could be the objective function or one of the constraint functions, TRICEPS is implemented below using an interpolant of the form

s(x) = Σ_{i=1}^{n} λ_i φ(‖x − x_i‖) + p(x),   x ∈ R^d,

where ‖·‖ is the Euclidean norm, λ_i ∈ R for i = 1, . . . , n, p(x) is a linear polynomial in d variables, and φ can take one of several forms, including φ(r) = r³ (cubic), φ(r) = r² log r (thin plate spline), φ(r) = √(r² + γ²) (multiquadric), and φ(r) = exp(−γr²) (Gaussian). Here, γ is a parameter to be determined.
In the numerical experiments, a cubic RBF model is used because it has been successfully used in various surrogate-based and surrogate-assisted optimization algorithms (e.g., Björkman and Holmström 2000; Gutmann 2001; Regis and Shoemaker 2004; Wild et al. 2008), including those that performed relatively well on the 124-dimensional MOPTA08 problem (Regis 2011, 2014a, b) and on problems with 200 decision variables (Regis and Shoemaker 2013b). One advantage of this cubic RBF model over the Gaussian RBF model is that it does not require a γ parameter. The γ parameter in the Gaussian RBF is typically found using leave-one-out cross-validation and this adds to the computation time for fitting the model. Moreover, recent work by Wild and Shoemaker (2011) suggests that cubic RBFs might be more suitable than Gaussian RBFs for surrogate-based optimization. Finally, in preliminary numerical experiments, some settings of the γ parameter result in Gaussian RBF models that have many more local minima than the black-box functions that they are trying to approximate. In contrast, this did not seem to be a problem for the cubic RBF model.
To fit the above cubic RBF model, define the matrix Φ ∈ R^{n×n} by Φ_ij := φ(‖x_i − x_j‖), i, j = 1, . . . , n. Also, define the matrix P ∈ R^{n×(d+1)} so that its ith row is [1, x_i^T]. Now, the cubic RBF model that interpolates the points (x_1, u(x_1)), . . . , (x_n, u(x_n)) is obtained by solving the system

    [ Φ     P ] [ λ ]   [ U       ]
    [ P^T   0 ] [ c ] = [ 0_{d+1} ]                (3.3)


where the lower-right block 0 ∈ R^{(d+1)×(d+1)} is a matrix of zeros, U = (u(x_1), . . . , u(x_n))^T, 0_{d+1} ∈ R^{d+1} is a vector of zeros, λ = (λ_1, . . . , λ_n)^T ∈ R^n, and c = (c_1, . . . , c_{d+1})^T ∈ R^{d+1} consists of the coefficients of the linear polynomial p(x). The coefficient matrix in (3.3) is invertible if and only if rank(P) = d + 1 (Powell 1992). This condition is equivalent to having a subset of d + 1 affinely independent points among the points {x_1, . . . , x_n}.
The above RBF model is used to construct surrogates for the objective function
f (x) and each of the constraint functions g1 (x), . . . , gm (x) in every generation. For a
given set of data points where the objective and constraint function values are known,
the same interpolation matrix is used so fitting multiple RBF models can be done
relatively efficiently even when m is large by means of standard matrix factorizations.
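A minimal NumPy sketch of this fitting step is shown below. It solves the dense system (3.3) directly with a single factorization shared by all right-hand sides (one column per surrogate), which mirrors the efficiency argument above; the stable factorization strategies mentioned earlier are not implemented here, and all names are illustrative.

import numpy as np

def fit_cubic_rbf(X, U):
    """Fit cubic RBF interpolants with a linear tail by solving system (3.3).

    X : (n, d) array of evaluated points
    U : (n, m+1) array of function values (objective and m constraints share
        the same interpolation matrix, so all surrogates are fit in one solve)
    Returns (lam, c): RBF coefficients (n, m+1) and polynomial coefficients (d+1, m+1).
    """
    n, d = X.shape
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    Phi = r ** 3                                                 # cubic radial function
    P = np.hstack([np.ones((n, 1)), X])                          # rows [1, x_i^T]
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.vstack([U, np.zeros((d + 1, U.shape[1]))])
    coeff = np.linalg.solve(A, rhs)                              # one solve, many RHS
    return coeff[:n], coeff[n:]

def eval_rbf(X, lam, c, x):
    """Evaluate all fitted surrogates at a single point x."""
    r = np.linalg.norm(X - x, axis=1)
    return (r ** 3) @ lam + np.concatenate(([1.0], x)) @ c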
For the local refinement step, the gradients of the RBF surrogates for the objective and constraint functions are used to solve the trust-region subproblem. The gradient of the above RBF model is given by

∇s(x) = Σ_{i=1}^{n} λ_i φ′(‖x − v_i‖) (x − v_i)/‖x − v_i‖ + ∇p(x),   x ∈ R^d,  x ≠ v_i for all i,

where φ′(r) is the derivative of the radial function φ(r).

3.4 Numerical Experiments


3.4.1 Benchmark Constrained Optimization Problems
The proposed TRICEPS-RBF algorithm is tested on 18 well-known benchmark test
problems, mostly from Mallipeddi and Suganthan (2010), Michalewicz and Schoenauer (1996), and on a large-scale black-box optimization problem from the auto industry proposed by Don Jones (2008) at the MOPTA (Modeling and Optimization: Theory and Applications) 2008 conference. The test problems have 2–30 decision variables and 1–11 inequality constraints and they are given in Appendix A and
also in Regis (2014b). They include four 30-dimensional problems from Mallipeddi
and Suganthan (2010) and many of the problems from Michalewicz and Schoenauer
(1996) that only have inequality constraints or bound constraints. As explained in
Regis (2014b), the constraint functions of some of these test problems are rescaled by
either dividing by some positive constant or by applying a logarithmic transformation
without changing the feasible region.
The automotive optimization problem from Jones (2008) is called MOPTA08 and
it is available as a Fortran code at http://anjos.mgi.polymtl.ca/MOPTA2008Benchmark.html. The MOPTA08 problem has a single black-box objective function to be
minimized, 124 decision variables normalized to [0, 1], and 68 black-box inequality
constraints that are well normalized (Jones 2008). It is much larger and more complex


than the problems typically used in surrogate-based or surrogate-assisted optimization (e.g., Basudhar et al. 2012; Egea et al. 2009; Viana et al. 2010). The goal of this
problem is to determine the values of the decision variables (e.g., shape variables)
that minimize the mass of the vehicle subject to performance constraints (e.g., crashworthiness, durability). The MOPTA08 problem is a relatively inexpensive model
of an actual automotive design problem. It is based on kriging response surfaces to
a real automotive problem. Each simulation of this problem takes about 0.32 s on
an Intel(R) Core(TM) i7 CPU 860 2.8 GHz desktop machine, while each simulation of the real version could take 1-3 days (Jones 2008). However, as in Regis (2011,
2014b) the different algorithms are compared by assuming that the simulations are
expensive.

3.4.2 Alternative Methods


The effectiveness of the proposed TRICEPS-RBF algorithm is evaluated by
comparing it with a previously developed surrogate-assisted EP called CEP-RBF
(Regis 2014b) and also with a standard EP for constrained problems described in
Regis (2014b). Moreover, TRICEPS-RBF is compared with Stochastic Ranking Evolution Strategy (SRES) (Runarsson and Yao 2000), Scatter Search (eSS) (Egea et
al. 2007), and with an RBF-assisted EP for bound constrained problems that has
been modified to handle the inequality constraints via a penalty approach (Regis
2014b). In addition, the proposed method is compared with the ConstrLMSRBF
(Regis 2011) heuristic and with a sequential penalty derivative-free algorithm called
SDPEN (Liuzzi et al. 2010) that has a mathematically rigorous convergence guarantee. Although there are other surrogate-assisted evolutionary algorithms for constrained optimization in the literature (e.g., kriging-assisted scatter search (Egea et al.
2009) and surrogate-assisted SRES (Runarsson 2004)), the codes for these methods
are not yet publicly available.

3.4.3 Experimental Setup and Parameter Settings


In the results below, the TRICEPS-RBF algorithm is labeled as (μ + λ)-TRICEPS-RBF while the previously developed RBF-assisted EP from Regis (2014b) is labeled as (μ + λ)-CEP-RBF. Moreover, this paper uses the algorithm labels from Regis (2014b) such as the (μ + λ)-CEP for the standard constrained EP and the (μ + λ)-PenCEP-RBF for the RBF-assisted penalty-based constrained EP. In addition, an algorithm label is given a BCS suffix if the algorithm uses the BCS strategy that is meant for high-dimensional problems. As in Regis (2014b), the BCS strategy is applied only to the 124-dimensional highly constrained MOPTA08 problem.

Table 3.1 Parameter settings for TRICEPS-RBF

Parameter | Value
μ | 2 or 5
ν | min(10^3 d, 10^4)
σ_init | 0.05 ℓ([a, b])
p_mut | 0.1 (with BCS) or 1 (without BCS)
Δ_init | 0.05 ℓ([a, b])
Δ_min | 0.0125 ℓ([a, b])
Δ_max | 0.1 ℓ([a, b])
η_0 | 0
η_1 | 0.5
γ_0 | 0.5
γ_1 | 2
T_fail | min(max(p_mut d, 5), 30)
T_success | 2
ε_init | 0.0005 ℓ([a, b])
T_infeas | max(3, √d)
ξ (distance requirement from previously evaluated points) | 0.0005 ℓ([a, b])

The number of parents in each generation for the EP methods (including the RBF-assisted ones) is μ = 2 or 5 and the initial standard deviation of the Gaussian mutations is σ_init = 0.2 ℓ([a, b]), where ℓ([a, b]) is the side length of the hypercube [a, b] in (3.1). For the RBF-assisted EPs (TRICEPS-RBF, CEP-RBF and PenCEP-RBF), the number of trial offspring for each parent is ν = min(10^3 d, 10^4). Moreover, when applying the BCS strategy, the probability of perturbing a coordinate is p_mut = 0.1 as in Regis (2014b). The other parameters for the (μ + λ)-TRICEPS-RBF are summarized in Table 3.1.
All algorithms are run on Matlab 7.12 using an Intel(R) Core(TM) i7 CPU
860 2.8 GHz desktop machine. In particular, a Matlab version of SDPEN, called
SDPENm, is used on the test problems. Each algorithm is run for 10 trials on the
MOPTA08 problem and 30 trials on each of the other test problems. Moreover, each
trial of each algorithm is run for 1,000 simulations on the MOPTA08 problem, 300
simulations on the 30-dimensional test problems, and 200 simulations on the remaining (mostly lower dimensional) problems. Each trial begins with a feasible point that
is the same for all algorithms. For the MOPTA08 problem, only one feasible starting
point is given in Jones (2008) so all trials use this point. This feasible point has an
objective function value of 251.0706, and according to Jones (2008), any algorithm
that can achieve a feasible objective function value of 228 or lower within a relatively
limited number of simulations (say a few thousand simulations) is a good algorithm
for this problem. Moreover, each trial of an EP (with or without RBF surrogates)
begins with the feasible initial point together with a randomly generated Latin hypercube design (LHD) consisting of d + 1 affinely independent points, none of which

are guaranteed to be feasible. The case where no feasible point is available at the
beginning will be dealt with in future work. In addition, all EP algorithms (with or without
RBF surrogates) use the same LHD in a given trial and their initial parent populations
consist of the best points from d + 2 points: the d + 1 LHD points and the feasible
starting point.
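A minimal sketch of this initialization is given below, under the assumption that affine independence is checked through the rank condition rank(P) = d + 1 from (3.3); the function name and the resampling loop are illustrative, not the chapter's exact procedure.

```python
# Illustrative sketch: Latin hypercube design of d+1 affinely independent points in [a, b].
import numpy as np

def lhd_initial_design(lower, upper, rng):
    lo, hi = np.asarray(lower, float), np.asarray(upper, float)
    d = len(lo)
    while True:
        # one stratum per coordinate: (d+1) rows, d columns
        perm = np.array([rng.permutation(d + 1) for _ in range(d)]).T
        unit = (perm + rng.random((d + 1, d))) / (d + 1)
        X = lo + unit * (hi - lo)
        P = np.hstack([np.ones((d + 1, 1)), X])
        if np.linalg.matrix_rank(P) == d + 1:     # affinely independent points
            return X

rng = np.random.default_rng(0)
X0 = lhd_initial_design(np.zeros(5), np.ones(5), rng)   # example usage
```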
The settings for the alternative methods are the same as those used in Regis
(2014b). For example, for SRES (Runarsson and Yao 2000), μ = 8 and λ = 50 for the regular test problems and μ = 20 and λ = 140 for the MOPTA08 problem.
The initial population consists of the best points from the same initial points used
by the EP algorithms and the default values are used for the other parameters. For
the eSS code (Egea et al. 2007), the default parameters are modified to reduce the
time spent on the initialization phase. For example, the number of solutions generated by the diversificator is set to 2d, whereas the default is 10d. In addition,
ConstrLMSRBF is initialized by the LHDs used by the RBF-assisted EPs so it is
labeled as ConstrLMSRBF-LHD. Finally, SDPEN has no user-specified parameters
but it requires an initial point, which is the best point among the LHD points and the
feasible starting point.

3.5 Results and Discussion


3.5.1 Performance and Data Profiles
TRICEPS-RBF is compared to other methods using performance and data profiles
(Moré and Wild 2009) instead of the average progress curves used in Regis (2011,
2014b). An average progress curve is a plot of the mean of the best feasible objective
function value obtained by an algorithm versus the number of simulations. It has the
disadvantage of providing a somewhat inaccurate picture of the comparisons when
the distributions of the best feasible objective function values are strongly skewed,
thereby making the mean inaccurate as a measure of the center of a distribution.
Performance and data profiles do not have this difficulty and they greatly simplify
the comparisons in that the analysis can be done for an entire collection of test
problems instead of doing separate analysis for each test problem.
Let P be the set of problems where a given problem p corresponds to a particular test problem and a particular feasible starting point. Since there are 18 test problems and 30 feasible starting points (corresponding to the 30 trials), there are 18 × 30 = 540 problems for the profiles. Moreover, let S be the set of solvers (e.g., (2+2)-TRICEPS-RBF, (2+2)-CEP-RBF, (2+2)-PenCEP-RBF, (2+2)-CEP, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES, and SDPEN). For any pair (p, s) of a problem p and a solver s, the performance ratio is

r_{p,s} = t_{p,s} / min{t_{p,s} : s ∈ S},

where t_{p,s} is the number of simulations required to satisfy the convergence test defined below. Here, one simulation means one evaluation of the objective and each of the inequality constraint functions. Clearly, r_{p,s} ≥ 1 for any p ∈ P and s ∈ S, and the best solver for a given problem attains r_{p,s} = 1. By convention, r_{p,s} = ∞ whenever solver s fails to yield a solution that satisfies the convergence test.
Now, for any solver s ∈ S and for any α ≥ 1, the performance profile of s with respect to α is the fraction of problems where the performance ratio is at most α, i.e.,

ρ_s(α) = (1/|P|) |{p ∈ P : r_{p,s} ≤ α}|.

For any solver s ∈ S, the performance profile curve of s is the graph of the performance profiles of s for a range of values of α.
In derivative-free, constrained expensive black-box optimization, algorithms are compared given a fixed and relatively limited number of simulations. Hence, the convergence test by Moré and Wild (2009) uses a tolerance τ > 0 and the minimum feasible objective function value f_L obtained by any of the solvers on a particular problem within a given number of simulations, and it checks whether a feasible point x obtained by a solver satisfies

f(x^(0)) − f(x) ≥ (1 − τ)(f(x^(0)) − f_L),

where x^(0) is a feasible starting point corresponding to the given problem. That is, x is required to achieve a reduction that is at least 1 − τ times the best possible reduction f(x^(0)) − f_L. Here, feasibility is determined according to some constraint tolerance, which is set to 10^-6 ℓ([a, b]) in this study. Moreover, the parameter τ is set to 0.05 in the numerical experiments.
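The convergence test can be expressed compactly in code. The sketch below (a hypothetical helper, not part of the original experiments) returns the number of simulations t_{p,s} a single run needs to satisfy the test, given the trajectory of best feasible objective values:

```python
# Illustrative sketch: More-Wild convergence test applied to one run.
import numpy as np

def simulations_to_converge(best_feasible, f0, fL, tau=0.05):
    """best_feasible[k]: best feasible objective value after k+1 simulations
    (np.inf if no feasible point yet); f0: value of the feasible starting point;
    fL: best value found by any solver. Returns t_{p,s} or np.inf if never met."""
    target = f0 - (1.0 - tau) * (f0 - fL)     # required objective value
    for k, fbest in enumerate(best_feasible):
        if fbest <= target:
            return k + 1
    return np.inf
```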
Next, given a solver s ∈ S and κ > 0, the data profile of s with respect to κ (Moré and Wild 2009) is given by

d_s(κ) = (1/|P|) |{p ∈ P : t_{p,s}/(n_p + 1) ≤ κ}|,

where t_{p,s} is the number of simulations required by solver s to satisfy the convergence test on problem p and n_p is the number of decision variables in problem p. For any solver s ∈ S, the data profile curve of s is the graph of the data profiles of s for a range of values of κ. For a given solver s and any κ > 0, d_s(κ) is the fraction of problems solved (i.e., problems where the solver generated a feasible point satisfying the convergence test) by s within κ(n_p + 1) simulations (equivalent to κ simplex gradient estimates (Moré and Wild 2009)).
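Given the matrix of t_{p,s} values for all problems and solvers, both profiles reduce to a few lines of code. The following Python sketch (illustrative; the array layout and names are assumptions) computes ρ_s(α) and d_s(κ) over grids of α and κ values:

```python
# Illustrative sketch: performance and data profiles from t_{p,s} values.
# T: (|P|, |S|) array of t_{p,s}, with np.inf where the test was never satisfied.
import numpy as np

def performance_profile(T, alphas):
    best = np.min(T, axis=1, keepdims=True)   # best solver per problem
    ratios = T / best                         # r_{p,s} (problems unsolved by every
                                              # solver would need special handling)
    return np.array([[np.mean(ratios[:, s] <= a) for s in range(T.shape[1])]
                     for a in alphas])        # rho_s(alpha), one row per alpha

def data_profile(T, n_p, kappas):
    budgets = T / (np.asarray(n_p)[:, None] + 1.0)   # simplex-gradient units
    return np.array([[np.mean(budgets[:, s] <= k) for s in range(T.shape[1])]
                     for k in kappas])        # d_s(kappa), one row per kappa
```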
Moré and Wild (2009) point out that data profiles are more suitable for comparing optimization algorithms when function evaluations are computationally expensive. This is because performance profiles can only compare algorithms at a fixed

computational budget (say after 200 simulations) while data profiles can compare
algorithms at different computational budgets and this is more valuable to users in
the computationally expensive setting where the short-term behavior of algorithms is
more important than long-term behavior. Moreover, since the number of simulations
needed to satisfy the above convergence test typically grows with the problem size,
data profiles take into account the number of decision variables in the problems.
On the other hand, performance profiles ignore problem size. Hence, in some cases
below, only the data profiles are shown to avoid clutter in the presentation of results.

3.5.2 Comparisons Between TRICEPS-RBF and CEP-RBF on the Benchmark Test Problems
First, TRICEPS-RBF is compared with CEP-RBF (Regis 2014b), which is a recently
developed RBF-assisted EP, and also with a standard constrained EP described in
Regis (2014b). Figure 3.2 shows the performance and data profile curves of (2+2)-TRICEPS-RBF, (5+5)-TRICEPS-RBF, (2+2)-CEP-RBF, (5+5)-CEP-RBF, (2+2)-CEP and (5+5)-CEP after 200 simulations on the 18 test problems. It is clear from both profiles that the RBF-assisted EPs (TRICEPS-RBF and CEP-RBF) are dramatically better than the corresponding standard EPs. Moreover, (2+2)-TRICEPS-RBF is better than (2+2)-CEP-RBF, but (5+5)-TRICEPS-RBF
does not seem to have any advantage over (5 + 5)-CEP-RBF. However, when the
set of problems is restricted to the 30-dimensional test problems from Mallipeddi
and Suganthan (2010) (30 trials with different feasible starting points on C07, C08,
C14, and C15), the resulting performance and data profiles after 300 simulations in
Fig. 3.3 show that the two TRICEPS-RBF algorithms are now both better than the
corresponding CEP-RBF algorithms. Also, the advantage of (2 + 2)-TRICEPS-RBF
over (2 + 2)-CEP-RBF is more pronounced. Moreover, a similar result is obtained
when the set of problems is restricted to test problems that have at least 5 inequality
constraints or problems that have at least 20 decision variables (30 trials on Speed
Reducer, Welded Beam, G3MOD, G7, G10, Hesse, C07, C08, C14, and C15) as
can be seen from the resulting performance and data profiles in Fig. 3.4. A possible
explanation for this is that the more thorough local refinement step that uses the
gradients of the RBF models of the objective and constraint functions is able to yield
a more promising point than the one provided by the simpler sampling procedure
in CEP-RBF on more difficult problems (either high-dimensional or highly constrained). These results provide evidence that the trust-region-like local refinement
step in TRICEPS-RBF yields better results than the previously developed CEP-RBF
on the higher dimensional or more highly constrained problems.
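The exact subproblem used by TRICEPS is defined earlier in the chapter; the sketch below only conveys its flavor in a hypothetical Python formulation using SciPy's SLSQP solver. Here s_obj and s_con stand for the RBF surrogates of the objective and constraints, and Delta, eps and xi stand for a trust-region radius, a constraint margin and a distance requirement from previously evaluated points; all names and the solver choice are placeholders, not the chapter's.

```python
# Illustrative sketch: trust-region-like local refinement on RBF surrogates.
import numpy as np
from scipy.optimize import minimize

def local_refinement(xbest, s_obj, s_con, evaluated, Delta, eps, xi, bounds):
    cons = [
        # surrogate inequality constraints with a small margin: s_con(x) + eps <= 0
        {"type": "ineq", "fun": lambda x: -(s_con(x) + eps)},
        # stay within the trust region around the current best point
        {"type": "ineq", "fun": lambda x: Delta - np.linalg.norm(x - xbest)},
        # keep a minimum distance xi from previously evaluated points
        {"type": "ineq",
         "fun": lambda x: np.min(np.linalg.norm(evaluated - x, axis=1)) - xi},
    ]
    res = minimize(s_obj, xbest, method="SLSQP", bounds=bounds, constraints=cons)
    return res.x if res.success else xbest
```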

Fig. 3.2 Performance and data profiles for (μ + λ)-TRICEPS-RBF, (μ + λ)-CEP-RBF and (μ + λ)-CEP on all test problems (top panel: performance profiles after 200 simulations; bottom panel: data profiles up to 50 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.3 Performance and data profiles for (μ + λ)-TRICEPS-RBF, (μ + λ)-CEP-RBF and (μ + λ)-CEP on the 30-dimensional test problems (top panel: performance profiles after 300 simulations; bottom panel: data profiles up to 10 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.4 Performance and data profiles for (μ + λ)-TRICEPS-RBF, (μ + λ)-CEP-RBF and (μ + λ)-CEP on test problems with at least 20 decision variables or with at least 5 inequality constraints (top panel: performance profiles after 200 simulations; bottom panel: data profiles up to 50 simplex gradients; constraint tolerance = 10^-6)


3.5.3 Comparisons Between TRICEPS-RBF and Alternative Methods on the Benchmark Test Problems
The (2 + 2)-TRICEPS-RBF is also compared with alternative methods including
(2 + 2)-PenCEP-RBF, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES, and
SDPEN. The performance profiles on the test problems after 200 simulations and
the data profiles up to a maximum number of simulations equivalent to 50 simplex
gradients are shown in Fig. 3.5. It is clear from the performance and data profiles
that the (2 + 2)-TRICEPS-RBF is generally much better than other alternatives,
including the mathematically rigorous sequential penalty derivative-free algorithm
SDPEN that is published in a prestigious optimization journal. However, to be fair,
Scatter Search, Stochastic Ranking ES and SDPEN do not use surrogates and it would
be interesting to see how their performance would change if they are also combined
with surrogates.
To get some idea of how the different algorithms compare on individual test
problems, figures in Appendix B show the data profiles on some of the test problems.
For example, Figs. 3.10, 3.11, 3.12, 3.13 and 3.14 show the data profiles on some
problems where the (2 + 2)-TRICEPS-RBF performed very well in comparison with
the alternatives. However, although the (2 + 2)-TRICEPS-RBF is generally much
better than the alternatives on the test problems, Figs. 3.15, 3.16 and 3.17 show some
test problems where its performance is not as good as some of the alternatives.

3.5.4 Comparisons Between TRICEPS-RBF and Alternatives on the MOPTA08 Automotive Application Problem
Table 3.2 provides the statistics on the best feasible objective function value (over
10 trials) obtained by TRICEPS-RBF and the alternative methods after 1,000 simulations of the MOPTA08 problem. Some of these results are taken from Regis
(2014a, b). It is clear from this table that the (2 + 2)-TRICEPS-RBF-BCS is the best
among the different algorithms used on the MOPTA08 problem. In particular, the
(2 + 2)-TRICEPS-RBF-BCS is an improvement over the (2 + 2)-CEP-RBF-BCS
and it is better than ConstrLMSRBF-LHD-BCS (Regis 2011) on the MOPTA08
problem. Moreover, (2 + 2)-TRICEPS-RBF (without the BCS strategy) is a substantial improvement over (2 + 2)-CEP-RBF (without BCS). This suggests that the
trust-region-like local refinement step in TRICEPS-RBF is also helpful for the larger
and more complex MOPTA08 problem. As before, it is of interest to note that the
(2 + 2)-TRICEPS-RBF-BCS, (2 + 2)-TRICEPS-RBF, and (2 + 2)-CEP-RBF-BCS
performed much better than SDPEN, which is a sequential penalty derivative-free
algorithm with a mathematically rigorous convergence guarantee.

Fig. 3.5 Performance and data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on all test problems (top panel: performance profiles after 200 simulations; bottom panel: data profiles up to 50 simplex gradients; constraint tolerance = 10^-6)


Table 3.2 Statistics on best feasible objective function value after 1,000 simulations of the MOPTA08 problem (10 trials)

Algorithm                 | Best   | Median | Worst  | Mean   | Std Error
(2+2)-TRICEPS-RBF         | 227.27 | 228.18 | 228.76 | 228.20 | 0.14
(2+2)-TRICEPS-RBF-BCS     | 225.48 | 226.19 | 227.42 | 226.43 | 0.22
(2+2)-CEP-RBF             | 231.18 | 238.62 | 251.07 | 240.13 | 2.10
(2+2)-CEP-RBF-BCS         | 226.76 | 228.51 | 228.92 | 228.16 | 0.23
(2+2)-PenCEP-RBF          | 251.07 | 251.07 | 251.07 | 251.07 | 0.00
(2+2)-PenCEP-RBF-BCS      | 246.96 | 247.84 | 248.99 | 247.84 | 0.22
(2+2)-CEP                 | 251.07 | 251.07 | 251.07 | 251.07 | 0.00
Stochastic Ranking ES     | 251.07 | 251.07 | 251.07 | 251.07 | 0.00
Scatter Search (eSS)      | 251.07 | 251.07 | 251.07 | 251.07 | 0.00
ConstrLMSRBF-LHD-BCS      | 225.75 | 227.30 | 228.64 | 227.27 | 0.26
SDPEN                     | 231.77 | 231.77 | 231.77 | 231.77 | 0

There is only one trial for SDPEN because it is deterministic

3.5.5 Sensitivity of TRICEPS-RBF to Algorithm Parameters


As can be seen from Sect. 3.3.2, TRICEPS depends on many user-specified parameters. This section analyzes how sensitive TRICEPS-RBF is to some of these parameters. In particular, the (2 + 2)-TRICEPS-RBF is run on the same test problems by
varying the values of the parameters ν (the number of trial offspring generated for each parent), σ_init (the initial standard deviation of the Gaussian mutations), and Δ_init (the initial trust-region radius). As before, the (2 + 2)-TRICEPS-RBF using a given set of parameters is run for 30 trials for each test problem. The sensitivity analysis is performed on only three of the parameters because a full analysis of all parameters is computationally prohibitive: the use of surrogates in TRICEPS-RBF already incurs substantial computing cost.
Figure 3.6 shows the data profiles of (2+2)-TRICEPS-RBF with ν = min(1,000d, 10^4) (default), ν = min(500d, 10^4), and ν = min(100d, 10^4). Note that there does not seem to be much difference in performance between the default and ν = min(500d, 10^4), but there was some deterioration in performance for the much smaller value ν = min(100d, 10^4). This indicates that (2 + 2)-TRICEPS-RBF is not very sensitive to ν when it is reasonably large. This is somewhat expected because when the value of ν is large enough to generate trial offspring that adequately sample the neighborhood of a parent solution, adding more trial offspring is not expected to improve performance. However, a much smaller value of ν could result in a less thorough search for promising offspring for each parent solution, thereby resulting in diminished performance.
Figure 3.7 shows the data profiles of (2 + 2)-TRICEPS-RBF with σ_init = 0.05 ℓ([a, b]) (default), σ_init = 0.1 ℓ([a, b]), and σ_init = 0.2 ℓ([a, b]) on all test problems. (Recall that for all test problems, [a, b] = [0, 1]^d so ℓ([a, b]) = 1.) Moreover, Fig. 3.8 shows the data profiles of the same algorithms on the problems with at least 5
76

R.G. Regis
6

Data profiles up to 30 simplex gradients (constraint tolerance = 10 )

1
0.9
0.8
0.7
0.6

d () 0.5
s
0.4
0.3
0.2

(2+2)TRICEPSRBF ( = min(1000d, 104) )

0.1

(2+2)TRICEPSRBF ( = min(500d, 104) )

(2+2)TRICEPSRBF ( = min(100d, 104) )

10

15

20

25

30

Number of Simplex Gradients [Function Evaluations/(d+1)]

Fig. 3.6 Data profiles for (2 + 2)-TRICEPS-RBF with different values of on all test problems

Fig. 3.7 Data profiles for (2 + 2)-TRICEPS-RBF with different values of σ_init on all test problems (data profiles up to 30 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.8 Data profiles for (2 + 2)-TRICEPS-RBF with different values of σ_init on problems with at least 5 inequality constraints (data profiles up to 30 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.9 Data profiles for (2 + 2)-TRICEPS-RBF with different values of Δ_init on all test problems (data profiles up to 30 simplex gradients; constraint tolerance = 10^-6)


inequality constraints. Note that the (2 + 2)-TRICEPS-RBF appears to be somewhat sensitive to the choice of the initial standard deviation of the Gaussian mutations σ_init, with the default setting (the smallest σ_init) being the best choice among the three settings for the test problems used. A possible explanation for this is that, in a constrained problem, it makes sense to be conservative with the mutations when starting from a feasible point. Larger values of σ_init are more likely to generate points that violate one of the constraints, especially when there are many constraints.

Finally, Fig. 3.9 shows the data profiles of (2 + 2)-TRICEPS-RBF with Δ_init = 0.05 ℓ([a, b]) (default), Δ_init = 0.1 ℓ([a, b]), and Δ_init = 0.2 ℓ([a, b]). Note that the (2 + 2)-TRICEPS-RBF appears to be somewhat sensitive to the choice of the initial trust-region radius Δ_init. In particular, on the test problems used, a larger initial trust-region radius than the default value seems to result in better performance, possibly because it allows for larger steps.

3.6 Conclusions
This paper developed the TRICEPS algorithm, which is a surrogate-assisted Evolutionary Programming (EP) algorithm for computationally expensive constrained
optimization problems having only black-box inequality constraints and bound constraints. It is meant to be an improvement over CEP-RBF (Regis 2014b) in that the
algorithm performs a trust-region-like local refinement step at the end of every generation where it finds a minimizer of the surrogate model of the objective within a trust
region subject to surrogate inequality constraints with a small margin and subject to
some distance requirement from previously evaluated points. Moreover, TRICEPS
is implemented using a cubic RBF with a linear polynomial tail and a gradient-based
algorithm is used to solve the trust-region-like subproblem. TRICEPS-RBF and CEP-RBF are among the few surrogate-assisted EAs that use surrogates to approximate
the constraints and that have been successfully applied to a problem that is considered
large-scale in surrogate-based or surrogate-assisted optimization. TRICEPS-RBF is
compared with alternatives, including CEP-RBF and the mathematically rigorous
sequential penalty derivative-free algorithm SDPEN (Liuzzi et al. 2010), on 18 well-known benchmark problems and on the MOPTA08 automotive application with 124
decision variables and 68 black-box inequality constraints, which is much larger than
the typical problem used in this area.
TRICEPS-RBF and the alternatives are compared on the 18 test problems using
performance and data profiles (Moré and Wild 2009) instead of average progress
curves such as the ones used in Regis (2014b). Moreover, the algorithms are compared in terms of the best feasible objective function value obtained after only 1,000
simulations on the MOPTA08 problem. The profile curves show that TRICEPS-RBF
is an improvement over CEP-RBF on problems that are either high-dimensional or
highly constrained. Moreover, the results confirm the previous findings in Regis
(2014b) that using an RBF surrogate can dramatically improve the performance of a
constrained EP. Furthermore, the (2 + 2)-TRICEPS-RBF algorithm is substantially


and consistently much better than the SDPEN algorithm, an RBF-assisted penaltybased EP, Stochastic Ranking Evolution Strategy (SRES) and Scatter Search (eSS)
on the problems in this study when the algorithms are given a very limited computational budget. In addition, TRICEPS-RBF is also better than the ConstrLMSRBFLHD heuristic (Regis 2011). Finally, sensitivity analyses of TRICEPS-RBF to some
of the user-specified parameters on the test problems suggest that it is somewhat
sensitive to the choice of the initial standard deviation of the Gaussian mutations and
the initial trust-region radius, but not so much to the number of trial offspring for
each parent solution.
On the MOPTA08 problem, (2 + 2)-TRICEPS-RBF-BCS is better than both
(2+2)-CEP-RBF-BCS (Regis 2014b) and ConstrLMSRBF-LHD-BCS (Regis 2011)
while requiring much less computational overhead than ConstrLMSRBF-LHD-BCS.
Moreover, both (2 + 2)-TRICEPS-RBF-BCS and (2 + 2)-CEP-RBF-BCS are much
better than the other alternatives, including SDPEN, on the MOPTA08 problem.
In addition, the results also confirm the previous finding in Regis (2014b) that the
BCS strategy (Regis 2011, 2014b) is very promising for high-dimensional problems
and highly constrained problems. Overall, TRICEPS-RBF is very promising for
computationally expensive constrained black-box optimization and it helps push the
frontier of surrogate-assisted constrained evolutionary optimization.
Acknowledgments Special thanks to Don Jones from General Motors Product Development for
proposing the MOPTA08 benchmark problem and for making a Fortran simulation code for this
problem publicly available. I would also like to thank Prof. Thomas Philip Runarsson for the Matlab
code for Stochastic Ranking Evolution Strategy, Dr. Julio Banga's research group for the Matlab
code for Scatter Search, and Drs. Mallipeddi and Suganthan for the codes that implement the
benchmark problems from the CEC 2010 competition.

Appendix
A. Test Problems
There are four engineering design test problems: Welded Beam Design Problem
(WB4) (Coello Coello and Mezura-Montes 2002; Hedar 2004), Pressure Vessel
Design Problem (PVD4) (Coello Coello and Mezura-Montes 2002; Hedar 2004),
Gas Transmission Compressor Design Problem (GTCD) (Beightler and Phillips
1976), and Speed Reducer Design for small aircraft engine (SR7) (Floudas and
Pardalos 1990). Nine of the test problems are from the well-known constrained optimization test problems in Michalewicz and Schoenauer (1996). These are labeled
G2, G3MOD, G4, G5MOD, G6, G7, G8, G9, and G10. The G3MOD and G5MOD
problems are obtained from G3 and G5 by replacing all equality constraints with
inequality constraints. The Hesse problem is from Hesse (1973). Finally, four of the
test problems are the 30-dimensional versions of the problems C07, C08, C14 and
C15 from Mallipeddi and Suganthan (2010).


As mentioned earlier, some of the constraint functions are modified by either


dividing by a positive constant or by applying a logarithmic transformation without
changing the feasible region. A similar modification of the constraint functions was
performed by Jones (2008) on the MOPTA08 problem so that the constraints are well-normalized. The plog transformation used in some of the constraints was introduced
in Regis and Shoemaker (2013a) and it is defined by

plog(x) = log(1 + x)   if x ≥ 0,
plog(x) = -log(1 - x)  if x < 0,

where log is the natural logarithm. The mathematical properties of this transformation
are discussed in Regis and Shoemaker (2013a). In particular, it is strictly increasing,
symmetric with respect to the origin, and it tones down extremely high or extremely
negative function values without changing the location of the local minima and
maxima.
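A one-line implementation of this transformation (illustrative Python, not the original Matlab code) follows; note that plog(x) = sign(x) log(1 + |x|), which matches the two cases above:

```python
# Illustrative sketch of the plog transformation (Regis and Shoemaker 2013a).
import numpy as np

def plog(x):
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))   # log(1+x) for x >= 0, -log(1-x) for x < 0
```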
Welded Beam (WB4) (Coello Coello and Mezura-Montes 2002; Hedar 2004):

f(x) = 1.10471 x1^2 x2 + 0.04811 x3 x4 (14.0 + x2)
s.t.
P = 6,000, L = 14, E = 30 × 10^6, G = 12 × 10^6
t_max = 13,600, s_max = 30,000, x_max = 10, d_max = 0.25
M = P(L + x2/2), R = √(0.25(x2^2 + (x1 + x3)^2))
J = √2 x1 x2 (x2^2/12 + 0.25(x1 + x3)^2)
Pc = (4.013 E x3 x4^3 / (6 L^2)) (1 - 0.25 x3 √(E/G) / L)
t1 = P/(√2 x1 x2), t2 = M R / J
t = √(t1^2 + t1 t2 x2 / R + t2^2)
s = 6 P L / (x4 x3^2)
d = 4 P L^3 / (E x4 x3^3)
g1(x) = (t - t_max)/t_max ≤ 0
g2(x) = (s - s_max)/s_max ≤ 0
g3(x) = (x1 - x4)/x_max ≤ 0
g4(x) = (0.10471 x1^2 + 0.04811 x3 x4 (14.0 + x2) - 5.0)/5.0 ≤ 0
g5(x) = (d - d_max)/d_max ≤ 0
g6(x) = (P - Pc)/P ≤ 0
0.125 ≤ x1 ≤ 10, 0.1 ≤ xi ≤ 10 for i = 2, 3, 4


Pressure Vessel Design (PVD4) (Coello Coello and Mezura-Montes 2002; Hedar 2004):

f(x) = 0.6224 x1 x3 x4 + 1.7781 x2 x3^2 + 3.1661 x1^2 x4 + 19.84 x1^2 x3
s.t.
g1(x) = -x1 + 0.0193 x3 ≤ 0
g2(x) = -x2 + 0.00954 x3 ≤ 0
g3(x) = plog(-π x3^2 x4 - (4/3) π x3^3 + 1,296,000) ≤ 0
0 ≤ x1, x2 ≤ 1, 0 ≤ x3 ≤ 50, 0 ≤ x4 ≤ 240
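As an example of how these scaled formulations translate into black-box evaluations, the sketch below implements PVD4 in Python (an illustrative encoding; the original experiments used Matlab implementations of these problems):

```python
# Illustrative sketch: PVD4 as a black box returning f(x) and the g_i(x) <= 0 values.
import numpy as np

def plog(x):  # same transformation as sketched above
    return np.sign(x) * np.log1p(np.abs(x))

def pvd4(x):
    x1, x2, x3, x4 = x
    f = (0.6224 * x1 * x3 * x4 + 1.7781 * x2 * x3 ** 2
         + 3.1661 * x1 ** 2 * x4 + 19.84 * x1 ** 2 * x3)
    g1 = -x1 + 0.0193 * x3
    g2 = -x2 + 0.00954 * x3
    g3 = plog(-np.pi * x3 ** 2 * x4 - (4.0 / 3.0) * np.pi * x3 ** 3 + 1296000.0)
    return f, np.array([g1, g2, g3])
```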
Speed Reducer (SR7) (Floudas and Pardalos 1990):

f(x) = 0.7854 x1 x2^2 A - 1.508 x1 B + 7.477 C + 0.7854 D
where
A = 3.3333 x3^2 + 14.9334 x3 - 43.0934
B = x6^2 + x7^2
C = x6^3 + x7^3
D = x4 x6^2 + x5 x7^2
s.t.
g1(x) = (27 - x1 x2^2 x3)/27 ≤ 0
g2(x) = (397.5 - x1 x2^2 x3^2)/397.5 ≤ 0
g3(x) = (1.93 - (x2 x6^4 x3)/x4^3)/1.93 ≤ 0
g4(x) = (1.93 - (x2 x7^4 x3)/x5^3)/1.93 ≤ 0
A1 = ((745 x4/(x2 x3))^2 + 16.91 × 10^6)^0.5
B1 = 0.1 x6^3
g5(x) = ((A1/B1) - 1100)/1100 ≤ 0
A2 = ((745 x5/(x2 x3))^2 + 157.5 × 10^6)^0.5
B2 = 0.1 x7^3
g6(x) = ((A2/B2) - 850)/850 ≤ 0
g7(x) = (x2 x3 - 40)/40 ≤ 0
g8(x) = (5 - (x1/x2))/5 ≤ 0
g9(x) = ((x1/x2) - 12)/12 ≤ 0
g10(x) = (1.9 + 1.5 x6 - x4)/1.9 ≤ 0
g11(x) = (1.9 + 1.1 x7 - x5)/1.9 ≤ 0
2.6 ≤ x1 ≤ 3.6, 0.7 ≤ x2 ≤ 0.8, 17 ≤ x3 ≤ 28
7.3 ≤ x4, x5 ≤ 8.3, 2.9 ≤ x6 ≤ 3.9, 5.0 ≤ x7 ≤ 5.5


Gas Transmission Compressor Design (GTCD) (Beightler and Phillips 1976):

f(x) = (8.61 × 10^5) x1^(1/2) x2 x3^(-2/3) x4^(-1/2) + (3.69 × 10^4) x3 + (7.72 × 10^8) x1^(-1) x2^0.219 - (765.43 × 10^6) x1^(-1)
s.t.
g1(x) = x4 x2^(-2) + x2^(-2) - 1 ≤ 0
20 ≤ x1 ≤ 50, 1 ≤ x2 ≤ 10, 20 ≤ x3 ≤ 50, 0.1 ≤ x4 ≤ 60

G2 (Michalewicz and Schoenauer 1996) (d = 10):

f(x) = -| (Σ_{i=1}^{d} cos^4(xi) - 2 ∏_{i=1}^{d} cos^2(xi)) / √(Σ_{i=1}^{d} i xi^2) |
s.t.
g1(x) = plog(-∏_{i=1}^{d} xi + 0.75) / plog(10^d) ≤ 0
g2(x) = (Σ_{i=1}^{d} xi - 7.5d)/(2.5d) ≤ 0
0 ≤ xi ≤ 10 for i = 1, 2, . . . , d
G3MOD (Michalewicz and Schoenauer 1996) (d = 20):

f(x) = -plog((√d)^d ∏_{i=1}^{d} xi)
s.t.
g1(x) = Σ_{i=1}^{d} xi^2 - 1 ≤ 0
0 ≤ xi ≤ 1 for i = 1, 2, . . . , d
G4 (Michalewicz and Schoenauer 1996):

f(x) = 5.3578547 x3^2 + 0.8356891 x1 x5 + 37.293239 x1 - 40792.141
s.t.
u = 85.334407 + 0.0056858 x2 x5 + 0.0006262 x1 x4 - 0.0022053 x3 x5
g1(x) = -u ≤ 0
g2(x) = u - 92 ≤ 0
v = 80.51249 + 0.0071317 x2 x5 + 0.0029955 x1 x2 + 0.0021813 x3^2
g3(x) = -v + 90 ≤ 0
g4(x) = v - 110 ≤ 0
w = 9.300961 + 0.0047026 x3 x5 + 0.0012547 x1 x3 + 0.0019085 x3 x4
g5(x) = -w + 20 ≤ 0
g6(x) = w - 25 ≤ 0
78 ≤ x1 ≤ 102, 33 ≤ x2 ≤ 45, 27 ≤ xi ≤ 45 for i = 3, 4, 5

G5MOD (Michalewicz and Schoenauer 1996):

f(x) = 3 x1 + 10^(-6) x1^3 + 2 x2 + (2 × 10^(-6)/3) x2^3
s.t.
g1(x) = x3 - x4 - 0.55 ≤ 0
g2(x) = x4 - x3 - 0.55 ≤ 0
g3(x) = 1,000 sin(-x3 - 0.25) + 1,000 sin(-x4 - 0.25) + 894.8 - x1 ≤ 0
g4(x) = 1,000 sin(x3 - 0.25) + 1,000 sin(x3 - x4 - 0.25) + 894.8 - x2 ≤ 0
g5(x) = 1,000 sin(x4 - 0.25) + 1,000 sin(x4 - x3 - 0.25) + 1294.8 ≤ 0
0 ≤ x1, x2 ≤ 1,200, -0.55 ≤ x3, x4 ≤ 0.55

G6 (Michalewicz and Schoenauer 1996):

f(x) = (x1 - 10)^3 + (x2 - 20)^3
s.t.
g1(x) = (-(x1 - 5)^2 - (x2 - 5)^2 + 100)/100 ≤ 0
g2(x) = ((x1 - 6)^2 + (x2 - 5)^2 - 82.81)/82.81 ≤ 0
13 ≤ x1 ≤ 100, 0 ≤ x2 ≤ 100
G7 (Michalewicz and Schoenauer 1996):

f(x) = x1^2 + x2^2 + x1 x2 - 14 x1 - 16 x2 + (x3 - 10)^2 + 4(x4 - 5)^2
  + (x5 - 3)^2 + 2(x6 - 1)^2 + 5 x7^2 + 7(x8 - 11)^2
  + 2(x9 - 10)^2 + (x10 - 7)^2 + 45
s.t.
g1(x) = (4 x1 + 5 x2 - 3 x7 + 9 x8 - 105)/105 ≤ 0
g2(x) = (10 x1 - 8 x2 - 17 x7 + 2 x8)/370 ≤ 0
g3(x) = (-8 x1 + 2 x2 + 5 x9 - 2 x10 - 12)/158 ≤ 0
g4(x) = (3(x1 - 2)^2 + 4(x2 - 3)^2 + 2 x3^2 - 7 x4 - 120)/1258 ≤ 0
g5(x) = (5 x1^2 + 8 x2 + (x3 - 6)^2 - 2 x4 - 40)/816 ≤ 0
g6(x) = (0.5(x1 - 8)^2 + 2(x2 - 4)^2 + 3 x5^2 - x6 - 30)/834 ≤ 0
g7(x) = (x1^2 + 2(x2 - 2)^2 - 2 x1 x2 + 14 x5 - 6 x6)/788 ≤ 0
g8(x) = (-3 x1 + 6 x2 + 12(x9 - 8)^2 - 7 x10)/4048 ≤ 0
-10 ≤ xi ≤ 10 for i = 1, 2, . . . , 10
G8 (Michalewicz and Schoenauer 1996):

f(x) = -sin^3(2π x1) sin(2π x2) / (x1^3 (x1 + x2))
s.t.
g1(x) = x1^2 - x2 + 1 ≤ 0
g2(x) = 1 - x1 + (x2 - 4)^2 ≤ 0
0 ≤ x1, x2 ≤ 10
G9 (Michalewicz and Schoenauer 1996):

f(x) = (x1 - 10)^2 + 5(x2 - 12)^2 + x3^4 + 3(x4 - 11)^2
  + 10 x5^6 + 7 x6^2 + x7^4 - 4 x6 x7 - 10 x6 - 8 x7
s.t.
g1(x) = (2 x1^2 + 3 x2^4 + x3 + 4 x4^2 + 5 x5 - 127)/127 ≤ 0
g2(x) = (7 x1 + 3 x2 + 10 x3^2 + x4 - x5 - 282)/282 ≤ 0
g3(x) = (23 x1 + x2^2 + 6 x6^2 - 8 x7 - 196)/196 ≤ 0
g4(x) = 4 x1^2 + x2^2 - 3 x1 x2 + 2 x3^2 + 5 x6 - 11 x7 ≤ 0
-10 ≤ xi ≤ 10 for i = 1, . . . , 7
G10 (Michalewicz and Schoenauer 1996):

f(x) = x1 + x2 + x3
s.t.
g1(x) = -1 + 0.0025(x4 + x6) ≤ 0
g2(x) = -1 + 0.0025(-x4 + x5 + x7) ≤ 0
g3(x) = -1 + 0.01(-x5 + x8) ≤ 0
g4(x) = plog(100 x1 - x1 x6 + 833.33252 x4 - 83333.333) ≤ 0
g5(x) = plog(x2 x4 - x2 x7 - 1,250 x4 + 1,250 x5) ≤ 0
g6(x) = plog(x3 x5 - x3 x8 - 2,500 x5 + 1,250,000) ≤ 0
10^2 ≤ x1 ≤ 10^4, 10^3 ≤ x2, x3 ≤ 10^4,
10 ≤ xi ≤ 10^3 for i = 4, 5, . . . , 8

Hesse (1973):

f(x) = -25(x1 - 2)^2 - (x2 - 2)^2 - (x3 - 1)^2 - (x4 - 4)^2 - (x5 - 1)^2 - (x6 - 4)^2
s.t.
g1(x) = (2 - x1 - x2)/2 ≤ 0
g2(x) = (x1 + x2 - 6)/6 ≤ 0
g3(x) = (-x1 + x2 - 2)/2 ≤ 0
g4(x) = (x1 - 3 x2 - 2)/2 ≤ 0
g5(x) = (4 - (x3 - 3)^2 - x4)/4 ≤ 0
g6(x) = (4 - (x5 - 3)^2 - x6)/4 ≤ 0
0 ≤ x1 ≤ 5, 0 ≤ x2 ≤ 4, 1 ≤ x3 ≤ 5
0 ≤ x4 ≤ 6, 1 ≤ x5 ≤ 5, 0 ≤ x6 ≤ 10

C07 (Mallipeddi and Suganthan 2010):

f(x) = Σ_{i=1}^{d-1} [100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2]
where z = x + 1 - o, y = x - o, and o is given in the code by Mallipeddi and Suganthan (2010)
s.t.
g1(x) = 0.5 - exp(-0.1 √((1/d) Σ_{i=1}^{d} yi^2)) - 3 exp((1/d) Σ_{i=1}^{d} cos(0.1 yi)) + exp(1) ≤ 0
-140 ≤ xi ≤ 140, i = 1, . . . , d
C08 (Mallipeddi and Suganthan 2010):

f(x) = Σ_{i=1}^{d-1} [100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2]
where z = x + 1 - o, y = (x - o)M, and o and M are given in the code by Mallipeddi and Suganthan (2010)
s.t.
g1(x) = 0.5 - exp(-0.1 √((1/d) Σ_{i=1}^{d} yi^2)) - 3 exp((1/d) Σ_{i=1}^{d} cos(0.1 yi)) + exp(1) ≤ 0
-140 ≤ xi ≤ 140, i = 1, . . . , d

C14 (Mallipeddi and Suganthan 2010):

f(x) = Σ_{i=1}^{d-1} [100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2]
where z = x + 1 - o, y = x - o, and o is given in the code by Mallipeddi and Suganthan (2010)
s.t.
g1(x) = Σ_{i=1}^{d} (-yi cos(√|yi|)) - d ≤ 0
g2(x) = Σ_{i=1}^{d} (yi cos(√|yi|)) - d ≤ 0
g3(x) = Σ_{i=1}^{d} (yi sin(√|yi|)) - 10d ≤ 0
-1,000 ≤ xi ≤ 1,000, i = 1, . . . , d

C15 (Mallipeddi and Suganthan 2010):

f(x) = Σ_{i=1}^{d-1} [100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2]
where z = x + 1 - o, y = (x - o)M, and o and M are given in the code by Mallipeddi and Suganthan (2010)
s.t.
g1(x) = Σ_{i=1}^{d} (-yi cos(√|yi|)) - d ≤ 0
g2(x) = Σ_{i=1}^{d} (yi cos(√|yi|)) - d ≤ 0
g3(x) = Σ_{i=1}^{d} (yi sin(√|yi|)) - 10d ≤ 0
-1,000 ≤ xi ≤ 1,000, i = 1, . . . , d

B. Additional Data Profiles


Figures 3.10, 3.11, 3.12, 3.13, 3.14, 3.15, 3.16 and 3.17.
Fig. 3.10 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the G3MOD problem (data profiles up to 10 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.11 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the C07 problem (data profiles up to 10 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.12 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the Hesse problem (data profiles up to 30 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.13 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the G8 problem (data profiles up to 100 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.14 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the Speed Reducer (SR7) problem (data profiles up to 30 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.15 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the C08 problem (data profiles up to 10 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.16 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the G9 problem (data profiles up to 30 simplex gradients; constraint tolerance = 10^-6)

Fig. 3.17 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the Pressure Vessel Design (PVD4) problem (data profiles up to 50 simplex gradients; constraint tolerance = 10^-6)

References
Araujo MC, Wanner EF, Guimarães FG, Takahashi RHC (2009) Constrained optimization based on quadratic approximations in genetic algorithms. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Studies in Computational Intelligence, vol 198, Chapter 9. Springer, Berlin, pp 193-217
Arnold DV, Hansen N (2012) A (1+1)-CMA-ES for constrained optimisation. In: 2012 genetic and evolutionary computation conference (GECCO 2012), Philadelphia, July 2012. ACM Press, pp 297-304
Basudhar A, Dribusch C, Lacaze S, Missoum S (2012) Constrained efficient global optimization with support vector machines. Struct Multidiscip Optim 46(2):201-221
Beightler CS, Phillips DT (1976) Applied geometric programming. Wiley, New York
Björkman M, Holmström K (2000) Global optimization of costly nonconvex functions using radial basis functions. Optim Eng 1(4):373-397
Coello Coello CA (2012) Constraint-handling techniques used with evolutionary algorithms. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2012) companion, pp 849-872
Coello Coello CA, Mezura-Montes E (2002) Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Adv Eng Inform 16(3):193-203
Coello Coello CA, Landa-Becerra R (2004) Efficient evolutionary optimization through the use of a cultural algorithm. Eng Optim 36(2):219-236
Datta R, Deb K (2013) Individual penalty based constraint handling using a hybrid bi-objective and penalty function approach. In: 2013 IEEE congress on evolutionary computation (CEC 2013), Cancún, México, June 2013. IEEE Press, pp 2720-2727
Deb K, Datta R (2013) A bi-objective constrained optimization algorithm using a hybrid evolutionary and penalty function approach. Eng Optim 45(5):503-527
Egea JA, Rodriguez-Fernandez M, Banga JR, Martí R (2007) Scatter search for chemical and bioprocess optimization. J Glob Optim 37(3):481-503
Egea JA, Vazquez E, Banga JR, Martí R (2009) Improved scatter search for the global optimization of computationally expensive dynamic models. J Glob Optim 43(2-3):175-190
Emmerich MTM, Giannakoglou K, Naujoks B (2006) Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Trans Evol Comput 10(4):421-439
Emmerich M, Giotis A, Özdemir MM, Bäck T, Giannakoglou K (2002) Metamodel-assisted evolution strategies. In: Parallel problem solving from nature VII, pp 362-370
Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms. Springer, Berlin
Gieseke F, Kramer O (2013) Towards non-linear constraint estimation for expensive optimization. In: Esparcia-Alcázar AI (ed) EvoApplications. Lecture Notes in Computer Science, vol 7835. Springer, Berlin, pp 459-468
Gutmann H-M (2001) A radial basis function method for global optimization. J Glob Optim 19(3):201-227
Hedar A (2004) Studies on metaheuristics for continuous global optimization problems. PhD thesis, Kyoto University, Japan
Hesse R (1973) A heuristic search procedure for estimating a global solution of nonconvex programming problems. Oper Res 21:1267-1280
Isaacs A, Ray T, Smith W (2007) An evolutionary algorithm with spatially distributed surrogates for multiobjective optimization. In: Randall M et al (eds) Proceedings of the 3rd Australian conference on progress in artificial life (ACAL 2007). Lecture Notes in Computer Science, vol 4828. Springer, pp 257-268
Isaacs A, Ray T, Smith W (2009) Multiobjective design optimization using multiple adaptive spatially distributed surrogates. Int J Prod Dev 9(1-3):188-217
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61-70
Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evol Comput 6(5):481-494
Jones DR (2008) Large-scale multi-disciplinary mass optimization in the auto industry. In: MOPTA 2008: modeling and optimization: theory and applications conference, Ontario, Canada, August 2008
Kazemi M, Wang GG, Rahnamayan S, Gupta K (2011) Metamodel-based optimization for problems with expensive objective and constraint functions. ASME J Mech Des 133(1):014505
Kramer O, Barthelmes A, Rudolph G (2009) Surrogate constraint functions for CMA evolution strategies. In: Mertsching B, Hund M, Aziz MZ (eds) KI. Lecture Notes in Computer Science, vol 5803. Springer, pp 169-176
Liuzzi G, Lucidi S, Sciandrone M (2010) Sequential penalty derivative-free methods for nonlinear constrained optimization. SIAM J Optim 20(5):2614-2635
Loshchilov I, Schoenauer M, Sebag M (2012) Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2012), pp 321-328
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Mezura-Montes E, Coello Coello CA (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1-17
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173-194
Mezura-Montes E, Coello Coello CA, Landa-Becerra R (2003) Engineering optimization using a simple evolutionary algorithm. In: Proceedings of the 15th IEEE international conference on tools with artificial intelligence, November 2003, pp 149-156
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1-32
Montaño AA, Coello Coello CA, Mezura-Montes E (2012) Multi-objective airfoil shape optimization using a multiple-surrogate approach. In: Proceedings of the IEEE congress on evolutionary computation 2012. IEEE Press, pp 1188-1195
Moré J, Wild S (2009) Benchmarking derivative-free optimization algorithms. SIAM J Optim 20(1):172-191
Mugunthan P, Shoemaker CA, Regis RG (2005) Comparison of function approximation, heuristic and derivative-based methods for automatic calibration of computationally expensive groundwater bioremediation models. Water Resour Res 41:W11427
Ong YS, Nair PB, Keane AJ (2003) Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA J 41(4):687-696
Parno MD, Hemker T, Fowler KR (2012) Applicability of surrogates to improve efficiency of particle swarm optimization for simulation-based problems. Eng Optim 44(5):521-535
Powell MJD (1992) The theory of radial basis function approximation in 1990. In: Light W (ed) Advances in numerical analysis, volume 2: wavelets, subdivision algorithms and radial basis functions. Oxford University Press, Oxford, pp 105-210
Powell MJD (1994) A direct search optimization method that models the objective and constraint functions by linear interpolation. In: Gomez S, Hennart JP (eds) Advances in optimization and numerical analysis. Kluwer, Dordrecht, pp 51-67
Regis RG (2011) Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Comput Oper Res 38(5):837-853
Regis RG (2014a) Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points. Eng Optim 46(2):218-243
Regis RG (2014b) Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans Evol Comput 18(3):326-347
Regis RG (2014c) Particle swarm with radial basis function surrogates for expensive black-box optimization. J Comput Sci 5(1):12-23
Regis RG, Shoemaker CA (2004) Local function approximation in evolutionary algorithms for costly black box optimization. IEEE Trans Evol Comput 8(5):490-505
Regis RG, Shoemaker CA (2007) A stochastic radial basis function method for the global optimization of expensive functions. INFORMS J Comput 19(4):497-509
Regis RG, Shoemaker CA (2013a) A quasi-multistart framework for global optimization of expensive functions using response surface models. J Glob Optim 56(4):1719-1753
Regis RG, Shoemaker CA (2013b) Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Eng Optim 45(5):529-555
Runarsson TP (2004) Constrained evolutionary optimization by approximate ranking and surrogate models. In: Parallel problem solving from nature VIII (PPSN 2004). Lecture Notes in Computer Science, vol 3242. Springer, pp 401-410
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284-294
Shi L, Rasheed K (2008) ASAGA: an adaptive surrogate-assisted genetic algorithm. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2008), pp 1049-1056
Takahama T, Sakai S (2012) Efficient constrained optimization by the epsilon constrained rank-based differential evolution. In: Proceedings of 2012 IEEE congress on evolutionary computation (CEC 2012), Brisbane, pp 62-69
Tessema B, Yen GG (2006) A self adaptive penalty function based algorithm for constrained optimization. In: IEEE congress on evolutionary computation (CEC 2006), pp 246-253
Tolson BA, Shoemaker CA (2007) Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resour Res 43:W01413
Viana FAC, Haftka RT, Watson LT (2010) Why not run the efficient global optimization algorithm with multiple surrogates? In: 51st AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics, and materials conference, Orlando
Wang Y, Cai Z (2012) Combining multiobjective optimization with differential evolution to solve constrained optimization problems. IEEE Trans Evol Comput 16(1):117-134
Wanner EF, Guimarães FG, Takahashi RH, Saldanha RR, Fleming PJ (2005) Constraint quadratic approximation operator for treating equality constraints with genetic algorithms. In: 2005 IEEE congress on evolutionary computation (CEC 2005), vol 3. IEEE Press, Edinburgh, pp 2255-2262
Wild SM, Shoemaker CA (2011) Global convergence of radial basis function trust region derivative-free algorithms. SIAM J Optim 21(3):761-781
Wild SM, Regis RG, Shoemaker CA (2008) ORBIT: optimization by radial basis function interpolation in trust-regions. SIAM J Sci Comput 30(6):3197-3219
Zhou Z, Ong YS, Nair PB, Keane AJ, Lum KY (2007) Combining global and local surrogate models to accelerate evolutionary optimization. IEEE Trans Syst Man Cybern Part C: Appl Rev 37(1):66-76

Chapter 4

Ephemeral Resource Constraints in Optimization

Richard Allmendinger and Joshua Knowles

Abstract Constraints in optimization come traditionally in two types familiar to most readers: hard and soft. Hard constraints delineate absolutely between feasible and infeasible solutions, whereas soft constraints essentially specify additional objectives. In this chapter, we describe a third type of constraint, much less familiar and only investigated recently, which we call ephemeral resource constraints (ERCs). ERCs differ from the other constraints in three major ways. (i) The constraints are dynamic or temporary (i.e., may be active or not active), and occur only during optimization; they do not affect the feasibility of final solutions. (ii) Solutions violating the constraints cannot be evaluated on the objective function; in fact, that is their main defining property. (iii) The constraints that are active are usually a function of previous solutions evaluated, bringing in a time-linkage aspect to the optimization. We explain with examples how these constraints arise in real-world optimization problems, especially when solution evaluation depends on experimental processes (i.e., in closed-loop optimization). Using a theoretical model based on Markov chains, the effects of these constraints on evolutionary search, e.g., drift effects on the search direction, are described. Next, a number of strategies for coping with ERCs are summarized, and evidence for their robustness is provided. In the final section, we look to the future and consider the many open questions there are in this new area.
Keywords Closed-loop optimization · Constrained optimization · Dynamic optimization · Evolutionary computation · Instrument setup optimization · Optimization

R. Allmendinger (B)
Department of Biochemical Engineering, University College London,
Torrington Place, London WC1E 7JE, UK
e-mail: r.allmendinger@ucl.ac.uk
URL: http://www.ucl.ac.uk/ucberal
J. Knowles
University of Manchester, School of Computer Science, Oxford Road,
Manchester M13 9PL, UK
e-mail: j.knowles@manchester.ac.uk
URL: http://www.cs.man.ac.uk/jknowles
© Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,
Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_4


4.1 Introduction
In this chapter, we discuss a new and broad class of constraint that departs quite
strongly from those considered usually in optimization. While typical or standard
constraints place limits on the feasible region (hard constraints), or suggest strong
preferences on solutions (soft constraints), the constraints we describe here instead
pose limits on which solutions in a search space are evaluable. That is to say, when
a solution violates one or more of these constraints, it is not possible to evaluate
that solution on the objective function, even though it may later turn out to be a
good solution to the problem, and one that is feasible in the normal sense. The type
of constraint we discuss here is called an ephemeral resource constraint (or ERC),
and we have introduced it in a number of recent papers (Allmendinger and Knowles
2010, 2011, 2013).
As the name suggests, ERCs arise only temporarily or dynamically during optimization (i.e., are ephemeral) and come about due to limitations on the resources
needed to evaluate (or construct) a solution. As we will explain in detail below, the
motivation for these constraints comes about from considering (mainly though not
exclusively) problems sometimes referred to as closed-loop optimization problems.1
In a closed-loop problem, candidate solutions are evaluated experimentally, and may
need to be realized physically, chemically, or in some other tangible way, thus requiring the use or availability of resources. From this reliance on resources, which may be limited, it follows that candidate solutions cannot be guaranteed to be evaluable
(realizable) at all times during optimization. Thus, both evaluable and non-evaluable
solutions can coexist in the search space, and the boundaries between them can be
described as dynamic (or ephemeral) constraints.
These constraints, and the non-evaluability of solutions, are not rare in practical
applications; for example, Finkel and Kelley (2009) lists eight references where
solutions were non-evaluable, and more examples are given in Knowles (2009),
Allmendinger (2012), as well as later in this chapter. We are also aware from
personal communication that such resourcing issues have been faced by Schwefel
(in his famous jet nozzle optimization experiments from the 70s) (Schwefel 1968;
Klockgether and Schwefel 1970) and others, even if not always reported in the literature. Since closed-loop problems are quite various (see, e.g.,Schwefel (1968),
Klockgether and Schwefel (1970), Judson and Rabitz (1992), Shir (2008), Caschera
et al. (2010), Small et al. (2011), Vaidyanathan et al. (2003), OHagan et al. (2005,
2007), Thompson (1996), Herdy (1997), Knowles (2009) and the tutorials Shir and
Bck (2009), Bck et al. (2010)) and are growing in importance in a number of
domains (e.g., high-throughput automated science, as in Bedau (2010)), it seems
timely to consider the effects these resourcing issues (ERCs) can have on optimization performance, and this has been our objective in recent work.
In this chapter, our aims are threefold. First, we wish to summarize the terminology
and framework for describing ERCs reported in earlier papers (Sects. 4.2 and 4.3).
¹ When an EA is used, closed-loop optimization may also be referred to as evolutionary experimentation (Rechenberg 2000) or experimental evolution.


Secondly, we wish to augment this earlier work with a theoretical study that considers
the fundamental effects of ERCs on simple evolutionary algorithms (Sect. 4.4). Third,
we evaluate some of the methods we have proposed for handling ERCs and consider
how these can be developed further (Sects. 4.5-4.8).

4.2 Ephemeral Resource-Constrained Optimization Problems (ERCOPs) in Overview
Ephemeral Resource-Constrained Optimization Problems (ERCOPs) are best seen as
standard constrained or unconstrained optimization problems2 augmented with one
or more resource constraints, which cause some candidate solutions to be temporarily
non-evaluable. Figure 4.1 shows the loop of an optimization process in which candidate solutions are designed or specified on a computer, but realized and/or evaluated
ex-silico. This is the main type of setup in which ERCs arise, although they can arise
even when computer simulations are used for evaluation too. The resources required
to evaluate solutions (such as equipment, operators, consumables) might run out,
break down or be unavailable, e.g., as a function of time, or previous actions taken
(or both).

Fig. 4.1 Schematic of closed-loop optimization. The genotype of a candidate solution x is generated on the computer but its phenotype is experimentally prototyped (e.g., by mixing drugs, adjusting an instrument, or running an expensive simulation). The quality or fitness f(x) of a solution may be obtained experimentally too and thus may be subject to measurement errors (noise)
2 Indeed, we can consider any optimization problem or benchmark.


The main job in defining ERCOPs, and simulating them so that they can be studied,
is to specify what happens when a candidate solution cannot be evaluated. In a real
situation, when a candidate solution proposed by the optimization algorithm is found
to be non-evaluable, an operator or scientist within the loop (if there is such a person)
may notice, and can choose to ignore this solution (to miss it out). This may seem
to be an adequate solution, but there are several issues here. We need to consider
at what time it is known that a solution cannot be evaluated, for how long it can
remain non-evaluable, whether new resources can be requested in order to fulfill the
optimizer's request to evaluate that solution, whether the optimizer is informed that
the solution could not be evaluated, and so on.
If we are able to specify these things, then we can also imagine a range of possible
(automated) remedial actions that the optimizer can take when it is informed about
non-evaluable solutions. It could automatically order more resources, it could wait
(stopping all solution evaluations until the non-evaluable one is again evaluable), it
could carry on and assign the non-evaluable solution a dummy value (or no value
at all), or it could place the non-evaluable solution in a queue to be evaluated later on.
All these types of responses need to be possible within the framework that we use
to describe ERCOPs. To keep things as general and flexible as possible, our ERCOP
framework consists of just two essentials: (1) ERCs are functions of a number of
(visible or hidden) variables which determine when they are switched on and (2) the
optimizer has access to a number of additional functions that allow it to operate in
a well-defined manner when a solution is non-evaluable. To achieve this, and to be
able to talk meaningfully about the performance of optimizers, we also embed the
optimization process in a global clock, so that every action is synchronized and its
time cost can be accounted for. In the following, we put these essentials in a more
mathematical form.

4.2.1 Mathematical Formulation of ERCOPs


ERCOPs can be defined generically as follows:

maximize y = f(x) subject to x ∈ X,

where x = (x_1, ..., x_l) is a solution vector, X the feasible search space, and f the objective function, with the additional side-condition (only relevant during optimization) that for optimization time steps t = 1, ..., T,

y_t = f(x_t) if x_t ∈ E(θ_t) ⊆ X, and y_t = null otherwise,

where E(θ_t) represents the set of evaluable solutions (or evaluable region) at time step t. The set E(θ_t) changes over time as a function of a set of problem-specific and time-evolving parameters represented concisely by θ_t. To instantiate a particular ERCOP, information about how the resource constraints should evolve over time, and depend on resource levels, random events, and so on, is encoded in θ_t and E(θ_t).
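To make this formulation concrete, the following minimal Python sketch (our own illustration, not code from the original framework; the names ERCOP, is_evaluable, and evaluate are hypothetical) wraps a static objective function with a time-indexed evaluable region defined by a set of ERC predicates and a global clock:

    # Minimal sketch of an ERCOP: a static objective f(x) plus ERCs that make
    # some solutions temporarily non-evaluable. Each ERC is a predicate
    # erc(x, t) -> bool saying whether x lies in the evaluable region at time t.

    class ERCOP:
        def __init__(self, objective, ercs, total_time):
            self.objective = objective    # static objective function f
            self.ercs = ercs              # list of ERC predicates
            self.T = total_time           # total optimization time (global clock)
            self.t = 0                    # current time step

        def is_evaluable(self, x):
            # x is evaluable only if it satisfies every currently active ERC
            return all(erc(x, self.t) for erc in self.ercs)

        def evaluate(self, x):
            # advance the global clock; return f(x), or None (null) if x is
            # non-evaluable at the current time step
            self.t += 1
            if self.t > self.T:
                raise RuntimeError("optimization time exhausted")
            return self.objective(x) if self.is_evaluable(x) else None

An optimizer interacting with such a wrapper must decide what to do whenever evaluate returns None, which is exactly the role of the constraint-handling strategies discussed later in this chapter.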

4.2.2 Review of Basic ERCOP Properties


The purpose of an ERCOP, as defined mathematically above, is to simulate real
experimental optimization scenarios, in particular the way that non-evaluable solutions arise (i.e., as a function of parameters such as time, search history, or costs),
and how they are to be handled.
From the definition, we can now see there are three major differences between
ERCOPs and other constrained and dynamic optimization problems:
• While the objective function f (thus also the global optimum) is static and does not change over time in a standard ERCOP, ERCs are dynamic or temporary (i.e., may be active or not active), and occur only during optimization; they do not affect the feasibility of final solutions. This feature makes ERCOPs materially different from traditional dynamic optimization problems (Branke 2001) because the objective space in ERCOPs does not change over time and thus the optimal solution does not need to be tracked.
• Compared to standard soft and hard constraints (Michalewicz and Schoenauer 1996; Nocedal and Wright 1999; Coello 2002) as well as dynamic constraints (Nguyen 2010), the meaning of ERCs is different: a solution x that violates an ERC at time t is not infeasible but non-evaluable at time step t. That is, the experiment that is associated with x cannot be conducted, thus causing the fitness of solution x at time t to be undefined (or null).
• The constraints that are active are usually a function of previous solutions evaluated, bringing a time-linkage aspect into the optimization (see, e.g., commitment relaxation ERCs in Sect. 4.3.1). Moreover, time in an ERCOP can be seen as the simulated time defined by the real closed-loop experimental problem that is to be simulated. Hence, time may refer not only to function evaluations of single solutions, as is the case in standard optimization problems, but also, e.g., to real time units (e.g., seconds) or cost units (e.g., pounds). Although we find an interesting parallel with some work on online (dynamic) optimization problems (Borodin and El-Yaniv 1998; Bosman and Poutré 2007), which exhibits time-linkage too, there are clear and important differences to our problem: most importantly, the aim in online (dynamic) optimization is to improve a cumulative score over some period of time, whereas ours is to find a single optimal (and ultimate) solution.
Despite these core differences between ERCs/ERCOPs and other related areas, as explained above, we believe that techniques for coping with ERCs, and inspiration for their design, can carry over from these areas into our work. For a more formal problem definition of ERCOPs please refer to Allmendinger (2012) and Allmendinger and Knowles (2013).


4.3 ERCs in More Detail


Ephemeral resource constraints arise in practical optimization problems for a number of different reasons: periodic availabilities of equipment or people; consumable resources that may run out; commitments to particular configurations due
to the cost of changing a configuration; and random breakdowns or other random
events. Considering these distinct reasons, we have in earlier work (Allmendinger
2012; Allmendinger and Knowles 2013) defined a number of fundamental classes
of ERCs, which we now describe. Technically, the constraints differ in how they are
triggered (switched on and off), and how they relate to the search space and other
basic properties. Before we summarize these details for three different ERC types,
we first set out some defining terms common to all ERC types: the constraint time
frame, the activation period, and the constraint schema.
Constraint time frame: The constraint time frame (ctf) of a constraint ERC_i is {t | t_ctf^start(ERC_i) ≤ t < t_ctf^end(ERC_i)}, where t represents some counter unit (e.g., function evaluations of solutions). The constraint ERC_i may be active only during the ctf, i.e., E(θ_t) ⊆ X for t ∈ ctf, and not outside of the ctf, i.e., E(θ_t) = X for t ∉ ctf. The periods of time 0 ≤ t < t_ctf^start and t_ctf^end ≤ t ≤ T (T is the total optimization time) are the preparation period and the recovery period, respectively (see Fig. 4.2).

Activation period: The activation period k(ERC_i) of ERC_i, k ∈ Z^+, is the number of counter units for which that ERC remains active once it is switched on.

Constraint schema: For convenience reasons we define the evaluable search region E(θ_t) by a set of constraint schemata H(ERC_i) into which solutions have to fall in order to be evaluable. For instance, if we are dealing with a binary search space, or X ⊆ {0, 1}^l, and an ERC is associated with a schema H = (∗1∗∗0∗ ... ∗), then a solution is deemed evaluable only if it has a 1 and 0-bit at positions 2 and 5, respectively; the wildcard symbol ∗ gives a bit position the freedom to take on any possible value, i.e., 0 and 1 in the binary case. In non-discrete spaces, H might restrict solution parameters to lie within or out of certain parameter value ranges rather than to take specific parameter values. Two general properties of a schema are its order o(H) and length l(H), representing the number of defined bit positions and the distance between the first and last defined bit position, respectively (Reeves and Rowe 2003); for the above example we have o(H) = 2 and l(H) = 3.
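As a small illustration of the schema notation (our own sketch; the helper names are hypothetical), the following Python functions test membership of a binary solution in a constraint schema H and compute the order and length of H:

    # Constraint schemata over binary strings: '*' is the wildcard, any other
    # character is a defined bit value.

    def matches(schema, x):
        """True if the bit-string x (sequence of 0/1) falls into schema H."""
        return all(s == '*' or int(s) == xi for s, xi in zip(schema, x))

    def order(schema):
        """o(H): number of defined (non-wildcard) bit positions."""
        return sum(1 for s in schema if s != '*')

    def length(schema):
        """l(H): distance between the first and last defined bit position."""
        defined = [i for i, s in enumerate(schema) if s != '*']
        return defined[-1] - defined[0] if defined else 0

    H = "*1**0*"                              # 1-bit at position 2, 0-bit at position 5
    print(matches(H, [0, 1, 1, 0, 0, 1]))     # True: this solution is evaluable
    print(order(H), length(H))                # 2 3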

Fig. 4.2 An illustration of how the available optimization time T can be divided into the preparation period 0 ≤ t < t_ctf^start, the constraint time frame t_ctf^start ≤ t < t_ctf^end, and the recovery period t_ctf^end ≤ t ≤ T

4.3.1 Commitment Relaxation ERCs


A commitment relaxation ERC commits (forces) an optimizer to a specific variable value combination (i.e., constraint schema) for some (variable) period of time whenever it uses this particular combination. Forcing a variable or linked combination of variables to be fixed for some time models real-world problems involving (large) change-over costs, such as a cleaning step or a component replacement. We refer to the period of time during which some variable(s) setting (or schema) H is forbidden from changing as an epoch, and denote its duration by V. We define the activation period k(j), 0 ≤ k(j) ≤ V, to be the duration of the period of time we have to commit to a particular setting H during the jth epoch. Figure 4.3 illustrates the partition of the optimization time into epochs, and a possible distribution of activation periods. Imagine the six epochs illustrated by the figure to represent six working days, each consisting of V = 9 h (assuming working hours to be from 8 am to 5 pm). The limitation that causes the commitment relaxation ERC to arise can be:

In an optimization problem involving the selection of instrument settings, the configuration b, once set, cannot be changed during the remainder of the working day.

In the above example, the constraint schema H represents the parameter combination that corresponds to instrument configuration b. The length of an activation period is bounded by 0 ≤ k(j) ≤ 9. For instance, imagine we select instrument configuration b in the middle of the day, say at 1 pm, as indicated by epoch j = 1 in the figure. This will activate the ERC for a period of k(1) = 4 (= 5 pm − 1 pm) hours (indicated by the dashed part). Activating the ERC later, earlier, or not at all during a working day changes k(j) accordingly.
We denote commitment relaxation ERCs by commRelaxERC(t_ctf^start, t_ctf^end, V, H).
Fig. 4.3 An illustration of how a commitment relaxation ERC may partition the optimization time into epochs of length V, and how it may potentially be activated. The activation period k(j) during the jth epoch is represented by the dashed part

An extension to this simple commitment relaxation ERC is to maintain not only

one but several commitment relaxation ERCs with different constraint schemata H_i.
In this case, we need to consider three aspects: (i) a solution is non-evaluable if it
violates at least one ERC, (ii) a repaired solution has to satisfy all activated ERCs
and not only the ones that were violated, and (iii) it needs to be checked whether a
repaired solution activates an ERC that was not activated before. This extension will
be considered later in Sect. 4.6.

4.3.2 Periodic ERCs


A periodic ERC models the availability of a specific resource, represented by
a constraint schema H, at regular time intervals. That is, the ERC is activated
every P time steps (period length) for an activation period of exactly k time steps
(see Fig. 4.4). As the ERC models the availability of resources, an individual has
to be a member of H during the activation period. An example of a periodic
ERC is:
In an optimization problem requiring skilled engineers to operate instruments, on Mondays,
only engineer eng_i is available.

In the above example, the activation period is k = 1 (assuming a time step is a day), the period length is P = 7 (i.e., a week), and the constraint schema H represents the parameter combination that corresponds to the instruments (or their settings) operated by engineer eng_i. We denote periodic ERCs by perERC(t_ctf^start, t_ctf^end, k, P, H).
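As an illustration (our own sketch, reusing the matches helper from above; it assumes each activation starts at the beginning of a period), a periodic ERC can be expressed as a predicate over a solution and the current counter value:

    # Sketch of a periodic ERC perERC(t_start, t_end, k, P, H): within the
    # constraint time frame the ERC is active for the first k counter units of
    # every period of length P; while active, solutions must fall into H.

    def periodic_erc(t_start, t_end, k, P, H):
        def erc(x, t):
            in_ctf = t_start <= t < t_end
            active = in_ctf and ((t - t_start) % P) < k
            return matches(H, x) if active else True   # inactive: everything evaluable
        return erc

    # e.g., an ERC similar to those used in the experiments later in this chapter
    erc = periodic_erc(0, 700, 20, 50, "00" + "*" * 18)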

4.3.3 Commitment Composite ERCs


The last type of ERCs we cover here are commitment composite ERCs. This ERC type
is slightly more complex than the other two types because it combines several real-world limitations. A commitment composite ERC occurs when some variables of a
candidate solution define a composite that requires resources to be locally available
(e.g., in a cache) in order for the solution as a whole to be realized and/or evaluated.
We use the notion of schemata to describe the resource-requiring composite part of

Fig. 4.4 An illustration of a periodic ERC perERC(t_ctf^start, t_ctf^end, k, P, H). The ERC is activated every P time steps for an activation period of always k time steps


a solution. For example, we would use H# = (∗∗###∗∗∗∗∗##∗ ... ∗) to state that bit positions 3, 4, 5, 11, and 12 define a composite; we refer to the bit positions denoted by # as the composite-defining bits, and we take the order o(H#) to be the number of composite-defining bits in the schema (we refer to H# as the high-level constraint schema). Here, the composite-defining bits are static, and form a part of the ERC problem definition.
When a solution is to be evaluated, we must look at the composite-defining bits
of its genotype and compare them to a local cache of composites. Each composite in
the cache is indexed by a bit-string of the same length as the order of the high-level
constraint schema. If there is a match, the solution can be evaluated. Otherwise, the
solution may not be evaluated at the current time step.
We define the cache to be made up of a number of storage cells, #SC. Typically, the number of storage cells is smaller than the space of possible composites, which is 2^o(H#) in a binary search space. A composite available in a storage cell may be used in the evaluation of more than one solution: each composite may be used up to RN (reuse number) times and has a shelf life of SL time steps, and we assume SL ≥ RN. Finally, the composites available in the cache at time t are a function of previous purchase orders made, and a fixed time lag TL between a purchase being made and its arrival. When composites arrive at a particular time, they are immediately put in a storage cell (and any existing composite in that cell is discarded); which storage cell is selected is defined either at the time of purchase or at the time of arrival.
To make the constraint more realistic we associate costs of c_order and c_time_step units with each submitted composite order and time step, respectively. The available budget, which cannot be exceeded, is denoted by C. Any composite can be purchased as often as desired, as long as we are within the budget. Figure 4.5 gives a visual example of the ERC.
An example of a commitment composite ERC is:
In an optimization problem involving the selection/design of vehicle parts least harmful to pedestrians in case of a crash, we wish, among other things, to identify the most suitable configuration for the tyres of the vehicle. A tyre is described by several parameters, such as size, thickness, and rubber material. Upon defining these parameters, we order the tyres, which is associated
with a fixed cost of 500 and a delivery period of 3 days. To allow for a valid assessment of
tyres, a set of tyres can be involved in at most five crash test trials, and can be kept in storage
for not more than 1 month. The storage itself is limited in size to 10 sets of tyres. Every day
of crash testing involves a fixed charge of 3,000 including things like labor, rent of venue,
and electricity.

In this example a composite is a tyre and the composite-defining bits are the
variables defining a tyre. Ordering tyres is associated with a time lag of TL = 3
(assuming a time step is one day), and tyres have a reuse number of RN = 5 and
a shelf life of SL = 30 (assuming one month consists of 30 days). The number of
storage cells is #SC = 10, and the costs associated with a composite order and time
step are c_order = 500 and c_time_step = 3,000, respectively.



Fig. 4.5 A visual example of the commitment composite ERC commCompERC(H# = (###∗∗ ... ∗), #SC = 4, TL = 1, RN = 10, SL = 20); each composite order and time step costs c_order and c_time_step units, respectively. The evaluation step at time step t reduces the reuse number of the composite in cell 2. At the same time step, the shelf life of the composite in cell 4 expires, and two new composites are ordered. One time step later, at t + 1, the ordered composites arrive and are put into cells determined by the EA

We denote a commitment composite ERC by commCompERC(H#, #SC, TL, RN, SL).3 For a more formal description of this ERC please refer to Allmendinger and Knowles (2010).

3 We leave out the variables t_ctf^start, t_ctf^end, c_order, c_time_step, and C from commCompERC(...) for ease of presentation. They will be specified where appropriate.
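To illustrate the bookkeeping behind this ERC, the following much-simplified Python sketch (ours; order placement, budget accounting, and the choice of storage cells are omitted, and all names are hypothetical) shows the cache lookup performed before an evaluation:

    # Simplified composite cache of a commitment composite ERC. Each cell holds
    # a composite (indexed by a bit-string over the composite-defining bits)
    # together with its remaining reuse number (RN) and shelf life (SL).

    class CompositeCache:
        def __init__(self, num_cells):
            self.cells = [None] * num_cells

        def tick(self):
            # one time step passes: shelf lives decrease, expired composites vanish
            for i, c in enumerate(self.cells):
                if c is not None:
                    c["SL"] -= 1
                    if c["SL"] <= 0:
                        self.cells[i] = None

        def store(self, cell, composite, RN, SL):
            # an ordered composite arrives (TL time steps after purchase) and
            # overwrites whatever currently occupies the chosen cell
            self.cells[cell] = {"id": composite, "RN": RN, "SL": SL}

        def try_use(self, composite):
            # a solution is evaluable only if its composite part matches a cached
            # composite with reuses left; a successful use consumes one reuse
            for c in self.cells:
                if c is not None and c["id"] == composite and c["RN"] > 0:
                    c["RN"] -= 1
                    return True
            return False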

4.4 Theoretical Analysis of ERCs


Having defined ERCOPs and several ERCs, we conduct in this section an initial
theoretical analysis on the impact of ERCs on evolutionary search. The analysis uses
the concept of Markov chains to investigate the impact of periodic ERCs on two
selection and reproduction schemes commonly used within EAs. After giving a brief
introduction to Markov chains and their application to EAs, the Markov model (transition probabilities) that accounts for periodic ERCs is derived, and subsequently the

simulation results are analyzed and summarized. The Markov chain model presented
here is based on an analysis we carried out in Allmendinger (2012).

4.4.1 Markov Chains


A Markov process is a random process that has no memory of where it has been in
the past such that only the current state of the process can influence the next state.
If the process can assume only a finite or countable set of states, then it is usual to
refer to it as a Markov chain (Norris 1998).
One can think of a Markov chain as a sequence X_0, X_1, X_2, ... of random events occurring in time (Reeves and Rowe 2003). Suppose S_0, ..., S_μ are the μ + 1 possible values that each of the random variables X_t can take. Then, a chain moves from a state S_m at time t to a state S_r at time t + 1 with a probability of p_mr = P(X_{t+1} = S_r | X_t = S_m). The probabilities p_mr (m, r = 0, ..., μ) are called transition probabilities and form the (μ + 1) × (μ + 1) matrix P, the transition matrix. Thus, the probability that the chain is in state S_r at time t is the rth entry in the probability vector

u_t = u_0 P^t,      (4.1)

where u_0 is the (μ + 1)-dimensional probability vector that represents the initial distribution over the set of states.
When an EA is modeled by a Markov chain it is easy to see that the population is
the natural choice for describing a state. The transition probabilities then express the
likelihoods that an EA changes from a current population to any other possible population after applying the stochastic effects of selection, crossover, and/or mutation.
It is also possible to consider other effects such as noisy fitness functions (Nakama
2008), niching (Horn 1993) and elitism (He and Yao 2002). Once the transition
matrix is calculated it can be used to calculate a variety of measurements, such as
the first hitting time of a particular state or the probability of hitting a state at all. An
overview of tools of Markov chain analysis can be found in any general textbook on
stochastic processes, such as Norris (1998), Doob (1953).
The drawback of modeling EAs with Markov chains is that the size of the required
transition matrix grows exponentially in both the population size and string length. To
keep Markov chain models manageable it is therefore common to use small population sizes and string lengths (Goldberg and Segrest 1987; Horn 1993). Other options,
which allow the modeling of more realistic EAs, are to make simplifying assumptions about the state space (Mahfoud 1991) or to use matrix notation only (Vose and
Liepins 1991; Nix and Vose 1992; Davis and Principe 1993).
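As a quick numerical illustration of Eq. (4.1) (our own sketch, not part of the original analysis), the state distribution of a small chain can be propagated with a few lines of numpy:

    import numpy as np

    # Toy two-state chain: from state 0 we stay with probability 0.9,
    # from state 1 we stay with probability 0.8 (rows sum to 1).
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    u0 = np.array([1.0, 0.0])     # start in state 0 with certainty

    t = 5
    ut = u0 @ np.linalg.matrix_power(P, t)   # u_t = u_0 P^t
    print(ut)                                # distribution over the states after t steps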

4.4.2 Modeling ERCs with Markov Models


In this section we derive the transition probabilities for EAs optimizing in the presence
of periodic ERCs. Our Markov chain model is based on the model of Goldberg and


Segrest (1987), which considers a simple environment composed of two individual types: type A always has a fixed objective value (or fitness) of f(A), while type B has a fitness of f(B). This limitation allows for an intuitive definition of states. For a fixed population size of μ, there are μ + 1 possible states, where state S_m represents a population with m type A individuals and μ − m type B individuals. Furthermore, in this simple EA model we do not apply mutation and crossover such that an offspring shall be simply a copy of the selected parent.
Goldberg and Segrest (1987) used this model to investigate the effect of drift for a
simple EA that used a generational reproduction scheme combined with fitness proportionate selection. They also extended the model to include mutation. Horn (1993)
extended it further to include niching. We extend it to include periodic ERCs and use
the resulting model to analyze the impact of the ERC on two selection strategies, fitness proportionate and binary tournament selection, and two reproduction schemes,
generational and steady-state reproduction, both without elitism.
Readers not interested in the technical details of this Markov chain model can
safely skip to Sect. 4.4.3 where the results of simulations are presented.

4.4.2.1 Selection Probabilities


Under fitness proportionate selection (FPS) we choose an individual of the current
population to serve as a parent (in our environment, to be in the next population) with
a probability that is proportional to its (relative) fitness. In our simple environment,
the probability of choosing a type A individual for the next population while being
in a state S_m is simply

P_m(A) = m f(A) / (m f(A) + (μ − m) f(B)).      (4.2)

As there are only two individual types in total, the probability of choosing a type B individual is P_m(B) = 1 − P_m(A). From the above equation it is apparent that once a uniform population is reached, i.e., m = 0 or μ, there is no chance of selecting individuals from the other type. Thus, the two corresponding states S_0 and S_μ are absorbing states.
Under tournament selection we first randomly select a number of individuals from
the population (with replacement) and then perform a tournament among them with
the fittest one serving subsequently as a parent. It is common to use a tournament
size of two, which will also be used here; this selection strategy is known as binary
tournament selection (BTS). The result of a tournament is clear: the individual with
the higher fitness wins the tournament; there is a draw if an individual meets another
individual with the same fitness in which case the winner is randomly determined; and
an individual will be the winner of a tournament with itself. We distinguish two cases
regarding the fitness of the individual types: (i) f (A) = f (B) and (ii) f (A) > f (B).


The following selection probabilities are obtained for each of the cases:

f(A) = f(B):   P_m(A) = (m/μ)^2 + m(μ − m)/μ^2,
f(A) > f(B):   P_m(A) = (m/μ)^2 + 2 m(μ − m)/μ^2.      (4.3)
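The following small Python sketch (ours, for illustration; mu denotes the population size μ) computes these selection probabilities directly from Eqs. (4.2) and (4.3):

    # Probability of selecting a type A parent from a population containing
    # m type A individuals (out of mu), under FPS and BTS.

    def p_select_A_fps(m, mu, fA, fB):
        return m * fA / (m * fA + (mu - m) * fB)

    def p_select_A_bts(m, mu, fA, fB):
        both_A = (m / mu) ** 2          # tournament contains two A individuals
        mixed = m * (mu - m) / mu ** 2  # one A and one B (in one order)
        if fA == fB:
            return both_A + mixed       # ties in mixed tournaments broken at random
        elif fA > fB:
            return both_A + 2 * mixed   # A wins every mixed tournament
        else:
            return both_A               # A only wins all-A tournaments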

4.4.2.2 Transition Probabilities


In our environment, the transition probabilities depend on the selected reproduction scheme, which in turn depends on the selected selection strategy. We first consider a generational reproduction scheme as already used in the original genetic algorithm of Holland (1975); we denote this scheme by GGA. With GGA, the entire current population is replaced by the offspring population. That is, μ selection steps are carried out per time step (with replacement). Using the selection probability P_m(A), either for FPS or BTS, the transition probabilities p_mr = P(X_{t+1} = S_r | X_t = S_m) for GGA of moving at time t from a state S_m with m type A individuals to a state S_r with r type A individuals at time t + 1 are defined as follows:
For m = 0:
  p_mm = 1,
  p_mr = 0, r = 1, ..., μ.      (4.4)

For 0 < m < μ and 0 ≤ r ≤ μ:
  p_mr = (μ choose r) P_m(A)^r (1 − P_m(A))^(μ−r).

For m = μ:
  p_mr = 0, r = 0, ..., μ − 1,
  p_mm = 1.
With steady state reproduction, the population is updated after each selection
step. Usually, an offspring individual replaces the worst individual in the population. This replacement strategy, however, is elitist and ensures that the number of
the less fit individual type in the population does not increase. Thus, to allow for a
fair comparison with GGA, an offspring does not replace the worst individual in the
population but a randomly chosen one regardless of its fitness; we denote this reproduction scheme by SSGA (rri), where rri refers to replacing a random individual. It
has been shown elsewhere (Syswerda 1991) that GGA and SSGA (rri) yield similar
performance. Bearing in mind that one time step corresponds to one selection step
with SSGA (rri), we obtain the following transition probabilities:


For m = 0:
  p_mm = 1,
  p_mr = 0, r = 1, ..., μ.      (4.5)

For 0 < m < μ:
  p_mr = 0, r = 0, ..., m − 2,
  p_{m,m−1} = (1 − P_m(A)) m/μ,
  p_mm = P_m(A) m/μ + (1 − P_m(A)) (μ − m)/μ,
  p_{m,m+1} = P_m(A) (μ − m)/μ,
  p_mr = 0, r = m + 2, ..., μ.

For m = μ:
  p_mr = 0, r = 0, ..., μ − 1,
  p_mm = 1.
The transition probabilities of either GGA or SSGA (rri) will be the entries of the
transition matrix P.
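A direct way to experiment with this model is to assemble the transition matrices numerically. The sketch below (our own illustration, reusing the selection-probability functions from the previous sketch) builds P for both reproduction schemes following Eqs. (4.4) and (4.5):

    import numpy as np
    from scipy.stats import binom   # binomial pmf for the GGA rows

    def transition_matrix_gga(mu, p_select_A):
        # p_select_A(m): probability of selecting a type A parent in state S_m
        P = np.zeros((mu + 1, mu + 1))
        P[0, 0] = P[mu, mu] = 1.0            # absorbing states S_0 and S_mu
        for m in range(1, mu):
            P[m, :] = binom.pmf(np.arange(mu + 1), mu, p_select_A(m))
        return P

    def transition_matrix_ssga(mu, p_select_A):
        P = np.zeros((mu + 1, mu + 1))
        P[0, 0] = P[mu, mu] = 1.0
        for m in range(1, mu):
            pA = p_select_A(m)
            P[m, m - 1] = (1 - pA) * m / mu           # B offspring replaces an A
            P[m, m + 1] = pA * (mu - m) / mu          # A offspring replaces a B
            P[m, m] = 1.0 - P[m, m - 1] - P[m, m + 1]
        return P

For example, transition_matrix_gga(50, lambda m: p_select_A_bts(m, 50, 1.0, 1.3)) gives an unconstrained GGA matrix for BTS with μ = 50 and the fitness values used later in this section.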

4.4.2.3 Constrained Transition Probabilities for a Periodic ERC


We have mentioned in the previous section that GGA performs μ selection steps per
time step, while SSGA (rri) performs one selection step per time step. To be able
to compare the effect of an ERC on the two reproduction schemes, we thus express
ERCs in this section in terms of selection steps rather than time steps.
Let us now derive the transition probabilities in the presence of a periodic ERC.
For this, consider the general periodic ERC perERC(iμ, (i + 1)μ, k, μ, H = (A)) (i ∈ ℕ, 0 ≤ k ≤ μ), which is activated at selection step iμ for a period of μ selection steps, i.e., one time step (or generation) for GGA and μ time steps for SSGA (rri). During the activation period of k selection steps, we can only
select (and evaluate) type A individuals. Let us assume that if we select a type B
individual during this period, this individual is repaired by simply forcing it into the
right schema; i.e., it is converted into a type A individual. This repairing procedure
is a simple constraint-handling strategy for dealing with non-evaluable solutions;
alternative constraint-handling strategies will be introduced in the following sections.
Before we derive the constrained transition probabilities for GGA we want to point
out a few aspects:


• If we are in state S_0 and the ERC is activated, then S_0 is not an absorbing state anymore and we move directly to state S_k.
• As a population contains at least k type A individuals after lifting the constraint, we are not able to move to a state S_r with r < k during the constrained generation (time step).
• The ERC reduces the number of freely selected offspring down to μ_new = μ − k.
• Moving to a state S_r with r > k is already achieved by selecting r_new = r − k (instead of r) type A individuals from the current population.
Considering these points, we derive for the time step for which the ERC is activated
the following constrained transition probabilities for GGA:
For m = 0:
  p_mr = 0, r = 0, ..., k − 1, k + 1, ..., μ,
  p_mk = 1.      (4.6)

For 0 < m < μ and 0 ≤ r < k:
  p_mr = 0.

For 0 < m < μ and k ≤ r ≤ μ:
  p_mr = (μ_new choose r_new) P_m(A)^(r_new) (1 − P_m(A))^(μ_new − r_new).

For m = μ:
  p_mr = 0, r = 0, ..., μ − 1,
  p_mm = 1.
The above periodic ERC is set such that the activation period of k selection steps is upper bounded by the population size μ, and, in the case of GGA, starts and ends within a single time step (generation). This need not necessarily be the case. In fact, a periodic ERC can feature an activation period k that is so long that it constrains selection steps within two or more successive generations, or so short that several activation periods may start during a single generation. In such scenarios, one needs to constrain all generations that are subject to constrained selection steps. The number of constrained selection steps within a generation, referred to as k in Eq. (4.6), is then simply the sum of all selection steps that happen to be constrained during any particular generation. That is, depending on the ERC, the number of constrained selection steps may change between generations.
With SSGA (rri), the population is updated after each selection step, which, remember, is a single time step with this scheme. This means that we need to determine for
each selection step (time step) separately whether it lies within the activation period
and thus is constrained or not. During the activation period, the periodic ERC of


above prevents us from moving from a current state S_m to a state S_{m−1}, which can
only be reached if a type B individual replaces a type A individual. As above, if the
constraint is active, then the state S0 is not an absorbing state anymore, and we move
directly to state S1 . We obtain the following new transition probabilities for each of
the k constrained time steps:
For any m = 0:
  p_mr = 0, r = 0, 2, 3, ..., μ,
  p_m1 = 1.      (4.7)

For any 0 < m < μ:
  p_mr = 0, r = 0, ..., m − 1,
  p_mm = m/μ,
  p_{m,m+1} = (μ − m)/μ,
  p_mr = 0, r = m + 2, ..., μ.

For any m = μ:
  p_mr = 0, r = 0, ..., μ − 1,
  p_mm = 1.
We will denote the transition matrix with the constrained transition probabilities
by P_c.

4.4.2.4 Calculating Proportions of Individual Types in a Population


One way to analyze the impact of an ERC on different selection and reproduction schemes is to monitor the proportion of the two individual types in a population. To do so one needs to first calculate the probability of ending up in any of the possible states S_i, i = 0, ..., μ, after t time steps. In an unconstrained environment, this can be done according to Eq. (4.1) (see Sect. 4.4.1) using the transition matrix P; in this equation, the μ + 1 state probabilities at time t are represented in the form of the probability vector u_t. In a constrained environment we cannot use the transition matrix P across all t time steps but have to swap it with the constrained transition matrix P_c for time steps that consist of constrained selection steps; this dependence of the transition matrix on time makes it a non-homogeneous Markov chain (Norris 1998). Let us consider the same periodic ERC as in the previous section but this time with a constraint time frame spanning over g ∈ ℕ periods (as opposed to exactly one), i.e., perERC(iμ, (i + g)μ, k, μ, H = (A)). For this ERC we can calculate the

probability vector at any time step t for GGA as follows:

u_t = u_0 P^t,                              0 ≤ t < i,
u_t = u_0 P^i P_c^(t−i),                    i ≤ t < g + i,
u_t = u_0 P^i P_c^g P^(t−g−i),              g + i ≤ t,

where the entries of the transition matrices P and P_c are calculated using Eqs. (4.4) and (4.6), respectively. The probability vector u_0 of the initial state distribution has a value of 1 at the ith entry and a value of 0 in the others, if we want to start with a population of exactly i type A individuals.
One time step with GGA corresponds to μ time steps with SSGA (rri). To compute the probability vector u for SSGA (rri) we thus need to look at the state distributions at time step tμ:

u_tμ = u_0 P^(tμ),                               0 ≤ t < i,
u_tμ = u_0 P^(iμ) (P_c^k P^(μ−k))^(t−i),         i ≤ t < g + i,
u_tμ = u_0 P^(iμ) (P_c^k P^(μ−k))^g P^((t−g−i)μ), g + i ≤ t,

where the transition matrices P and P_c are calculated according to Eqs. (4.5) and (4.7), respectively.
Having obtained the probabilities of ending up in all the different states, we can calculate the expected proportions c_t(A) and c_t(B) of type A and B individuals in a population at time step t (or tμ in the case of SSGA (rri)) as follows:

c_t(A) = (1/μ) Σ_{i=0}^{μ} i u_t^i,   c_t(B) = 1 − c_t(A),

where u_t^i is the ith entry of the probability vector u_t.
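Putting the pieces together, the following sketch (ours; it assumes the matrix-building helpers from the previous sketch and a constrained matrix Pc built analogously from Eq. (4.6)) propagates the initial distribution through unconstrained and constrained generations for GGA and reads off the expected proportion of type B individuals:

    import numpy as np

    def expected_proportion_B(P, Pc, mu, i, g, t, u0):
        """c_t(B) for GGA under perERC(i*mu, (i+g)*mu, k, mu, H = (A)).

        P, Pc : unconstrained and constrained transition matrices,
        i, g  : first constrained generation and number of constrained generations,
        t     : generation at which the proportion is evaluated,
        u0    : initial distribution over the states S_0, ..., S_mu.
        """
        u = u0.copy()
        for gen in range(t):
            u = u @ (Pc if i <= gen < i + g else P)   # swap in Pc when constrained
        c_A = np.dot(np.arange(mu + 1), u) / mu       # expected fraction of type A
        return 1.0 - c_A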

4.4.3 Simulation Results


This section uses the measure of the expected individual type proportion to analyze the impact of periodic ERCs on two selection strategies, FPS and BTS, and two reproduction schemes, GGA and SSGA (rri). We consider first the case where both individual types have equal fitness values, and then the case where they are different. If not otherwise stated, the population size is set to μ = 50.
4.4.3.1 Identical Fitness Values: f (A) = f (B)
In this case there is no selection pressure and thus both selection strategies behave
identically. Ideally, an EA maintains an equal proportion of the two individual

Fig. 4.6 A plot showing the proportion of type B individuals c_t(B) for GGA and SSGA (rri) as a function of the number of selection steps for the ERC perERC(400, 450, 20, 50, H = (A)). Both individual types have equal fitness and the constraint settings used are given above the plot. The terms real and expected refer to proportions obtained by actually running the EA and by running the Markov chain, respectively. The EA results are averaged across 500 independent runs

types in the population. However, because of genetic drift this is impossible and
an EA eventually converges to a uniform population (i.e., state S_0 or S_μ). As
the probability of ending up in one of the two states is proportional to the initial
state, the expected individual type proportion is identical to the initial proportion,
which is specified by u0 . Thus, for a random initialization, the expected proportion
is 0.5.
From Fig. 4.6 we can see that an expected proportion of 0.5 is achieved until
selection step 400 at which we activate the periodic ERC, perERC(400, 450, 20, 50,
H = (A)), which has a single activation period of k = 20 selection steps.4 This ERC forces us to evaluate k = 20 type A individuals and subsequently reduces
(increases) the proportion of type B (A) individuals in the population. After the ERC
is lifted at selection step 420, the expected individual type proportion does not get
back to the initial proportion. Although this effect can be put down to the specifics
of the model (no selection pressure toward either individual type), we will see in the
following theoretical and experimental studies several results which display a similar
pattern. That is, a constraint can have a permanent or long-lived effect on search
performance even if it was active for a short time only.
From the figure we can also see that the proportion is affected more severely for
GGA than for SSGA (rri). The reason that SSGA (rri) is more robust is that with this
reproduction scheme there is a chance that an offspring of type A replaces another type
A individual that is currently in the population. Of course, if an offspring replaces
a solution of the same type, then this will not affect the proportion. By contrast,
with GGA, all offspring are carried over to the population of the next generation.
4 Note, in an EA performing optimization of a function, the number of performed selection steps
displayed on the x-axes of Fig. 4.6 would be equivalent to the number of performed function evaluations.

Fig. 4.7 A plot showing the proportion of type B individuals c_t(B) for GGA and SSGA (rri) at selection step 200 as a function of the activation period k for the ERC perERC(50, 200, k, 150, H = (A)). Both individual types have equal fitness

This causes the proportion of type B individuals in the population to be a linear function of the activation period. This effect is also apparent from Fig. 4.7, where the
performance of both reproduction schemes is shown as a function of the activation
period k. From the figure one can see that SSGA (rri) is able to maintain a proportion
of around ct (B) = 0.2 after an activation period of k = 50, which is equal to the
population size. On the other hand, GGA cannot maintain a single type B individual
in the population because of its linear dependence on k. Note, in the case where
k > 50, the constraint is activated for more than one time step when using GGA. For
example, for k = 70 the constraint restricts all 50 selection steps within one time
step and 20 selection steps within the subsequent one.
As the Markov chain results are exact we omit the experimentally obtained proportions in the following plots.
4.4.3.2 Different Fitness Values: f(A) ≠ f(B)
When both individual types have different fitness values, the aim of an EA is to
converge as quickly as possible to a population state consisting only of the fitter
individual type. We focus our investigations mainly on the more interesting case
where an ERC has a negative effect on the convergence behavior. Hence, the fitness
of the individual type that we have to select during the activation period, in our case
type A, needs to be lower than the fitness of type B individuals. If not otherwise
stated, the fitness values are set to f (A) = 1.0 and f (B) = 1.3.
As the basis for our analysis we use the periodic ERC perERC(50, 400, 20, 50,
H = (A)). This ERC is activated after the initialization (i.e., at selection or evaluation
step 50) for seven periods, each consisting of P = 50 selection steps whereby k = 20
of them are constrained. Figure 4.8 shows the impact of the periodic ERC on the
expected proportion ct (B) for all combinations of the selection and reproduction

schemes: GGA with FPS and SSGA (rri) with FPS (top plot), and GGA with BTS and SSGA (rri) with BTS (bottom plot).5

Fig. 4.8 Plots showing the proportion of type B individuals c_t(B) for FPS (top) and BTS (bottom) as a function of the number of selection steps for the ERC perERC(50, 400, 20, 50, H = (A)). The term unconstrained refers to the proportions obtained in an ERC-free environment
We want to point out that during activation periods, SSGA (rri) with BTS and
FPS perform identically, since independently of selection type, an A offspring will
replace an individual selected at random. But during the inactive periods, the stronger
selection pressure of BTS recovers more of the B-to-A replacements, so that overall
BTS maintains a higher proportion of Bs. This behavior can be seen in the zigzag
shape, where there is the same steep falloff of fitness in both methods, but a steeper
recovery for BTS. Overall, the same is true for GGA, (BTS is better for the same
reason) but it is not possible to see this so clearly in the plots.
5 We get the zigzag-shaped line for SSGA (rri) during the constraint time frame because c_t(B) is plotted after each time step, consisting here of one selection step. For GGA the change in c_t(B) is smooth because a time step consists of μ selection steps.

Fig. 4.9 Plots showing the proportion of type B individuals c_t(B) at selection step 1,500 as a function of the start of the constraint time frame t_ctf^start (left) and the activation period k (right) for the ERCs perERC(t_ctf^start, t_ctf^start + 350, 20, 50, H = (A)) and perERC(50, 400, k, 50, H = (A)), respectively
Fig. 4.10 Plots showing the proportion of type B individuals c_t(B) at selection step 1,500 as a function of the fitness ratio f(B)/f(A) for the ERCs perERC(50, 400, 20, 50, H = (A)) (left) and perERC(50, 550, 25, 50, H = (A)) (right)

Figures 4.9 and 4.10 indicate how the proportion of type B individuals is affected
when altering the constraint parameters. We can observe that:
• Longer activation periods degrade the performance of all EAs (see right plot of Fig. 4.9).
• Fixing the constraint time frame duration, but translating it (see left plot of Fig. 4.9), yields a non-monotonic effect on performance (of all EAs, but most apparently with FPS): more preparation time gives more time to fill the population with fit individuals, whereas little recovery time harms final fitness. These two effects trade off against each other.
• Changing the fitness ratio (see Fig. 4.10) has only a switching effect on BTS (when the fitter individual changes), but for FPS the ratio smoothly affects the final proportion up to a saturation point.


Overall, comparing GGA with SSGA we see that SSGA achieves the higher proportion of fit individuals during the constraint time frame, and it recovers more
rapidly after the constraint is lifted, but its rate of recovery does not reach the rate
achieved by GGA, and ultimately GGA reaches a higher proportion (see Figs. 4.7
and 4.8). This can be explained by the replacement strategy of SSGA (rri): offspring
may replace individuals in the population that are of the same type. During the
activation period, this is beneficial as the number of poor type A individuals in the
population does not increase linearly with the activation period. However, during
the unconstrained selection steps, this may be disruptive in the sense that fit type
B offspring may replace other type B individuals of the current population, which
slows down the convergence.

4.4.4 Summary of Theoretical Study


We used Markov chains to analyze the impact of periodic ERCs for a simple environment and EA model. The environment was composed of only two individual types
and the EA model applied only a selection operator. In the EA model we considered two selection strategies, FPS and BTS, and two reproduction schemes, GGA and
SSGA (rri). We observed that for one and the same reproduction scheme, BTS is more
robust than FPS due to its independence from the fitness values of the individual types.
However, FPS was able to match and even outperform the performance of BTS if the
ratio of the individual type fitnesses was high, i.e., if a larger selection pressure than
for BTS was obtained. The crucial difference between the two reproduction schemes
we considered is that GGA carries out many selection steps before the population is
updated, while SSGA (rri), or steady-state reproduction in general, carries out only
a single one. This enables SSGA (rri) during the activation periods to replace less fit
individuals with other less fit individuals of the current population, but also prevents
SSGA (rri) in the long run from a quicker convergence in the remaining periods. By
contrast, the performance of GGA depends linearly on the activation period but there
are no drawbacks if the ERC is not activated. This crucial difference between the
reproduction schemes means that SSGA (rri) is able to outperform GGA during the
activation period and in situations where the advantage over GGA gained in the activation period(s) can be maintained until the next activation period or until the end of the
optimization. In terms of the constraint parameters, this occurs when there is a long
activation period, a short recovery period, and the constraint time frame is set late.

4.5 Static Constraint-Handling Strategies


In this section we summarize five static constraint-handling strategies (three repairing
and two non-repairing strategies) and showcase their robustness for commitment
relaxation ERCs and periodic ERCs (though the strategies are applicable in similar


Fig. 4.11 A depiction of the current population Pop (filled circles and squares) and an offspring individual x_t, which is feasible but not evaluable (because it is in X but not in E(θ_t)). Solutions indicated by the filled squares coexist in both the actual EA population Pop and the population SP maintained by the subpopulation strategy. The three solutions x_t,repaired indicate repaired solutions that might have resulted after applying one of the three repairing strategies to x_t: while forcing simply flips incorrectly set bits of x_t and thus creates a repaired solution that is as close as possible to x_t but not necessarily fit, regenerating creates a new solution in E(θ_t) using the genetic material available in Pop. Similarly, the subpopulation strategy also creates a new solution but uses the genetic material available in the subpopulation SP (empty and filled squares), which contains only solutions from E(θ_t)

form to other ERCs). The strategies are static in the sense that they deal with a non-evaluable solution always in the same pre-specified way, as opposed to learning-based strategies that switch between different static strategies during search (see
Sect. 4.6). Some of the static strategies are based on constraint-handling strategies
developed for standard constraints, and this will be pointed out where applicable.
Figure 4.11 depicts how the three repairing strategies, forcing, regenerating, and the
subpopulation strategy, may handle a non-evaluable solution. Below we describe
each static strategy in detail.
1. Forcing. Upon encountering a non-evaluable solution, this strategy forces it into
the constraint schemata Hi of all activated ERCs ERC i , i = 1, . . . , r by flipping
all solution bits that are different from the order-defining bit values of Hi . Similar
repairing strategies have been proposed, e.g., in Liepins and Potter (1991).
2. Regenerating. This strategy, which is similar to the death penalty method
(Schwefel 1975), avoids the evaluation of a non-evaluable solution by iteratively
creating new solutions, based on the current parent population, until an evaluable
one has been created or until L regeneration trials have passed without success. In
the latter case, we pick the solution created within the L trials that has the smallest
sum of Hamming distances to the schemata H_i of all activated ERCs and apply forcing to it. The goal of this strategy is to avoid the potential drawback of forcing, namely destroying good genotypes by enforcing changes in decision variable values. On the


other hand, the potential drawback of regenerating is that it can be computationally


expensive for large L, while for small L it may often reduce to the forcing strategy.
3. Subpopulation strategy. Assuming the presence of a single ERC, i.e., r = 1, this strategy keeps a record of the fittest J solutions from H_1 evaluated so far, and stores them in a subpopulation (which is maintained alongside the actual population). Upon encountering a non-evaluable solution, a new solution is created by applying one selection and variation step to the subpopulation. In case the new solution is non-evaluable, which may happen due to mutation, forcing is applied to it. If multiple ERCs are present, then (i) the number of subpopulations maintained is upper-bounded by 2^r (r is the number of ERCs), the size of the power set of the set of ERCs, and (ii) a solution is created using the subpopulation defined by the (set of) schemata H_i of activated ERCs.
4. Waiting. This strategy avoids repairing a non-evaluable solution by freezing the
optimization (i.e., incrementing the time counter without evaluating a solution) until
the activation periods of all ERCs violated by the solution have passed. It is easy to
see that waiting prevents drift-like effects in the search direction caused by ERCs, but
this might be associated with a smaller number of solutions being evaluated, which
can be a drawback if optimization time is limited.
5. Penalizing. Similar to waiting, this strategy avoids repairing but, instead of freezing the optimization, a non-evaluable solution is penalized by assigning a poor objective value c to it. The effect is that non-evaluated solutions will be allowed to enter the
population but are unlikely to survive for many generations or be selected as parents
due to their poor quality. This strategy can be regarded as a static penalty function
method (Coello 2002).
The advantage of penalizing over waiting is that the optimization does not freeze
upon encountering a non-evaluable solution; i.e., the solution generation process
continues and thus solutions might actually be evaluated (without needing to penalize
them) during an activation period. However, since evaluated solutions will have to
fall into the schemata Hi of all currently activated ERCs, penalizing might be subject
to drift-like effects, thus potentially losing the advantage of waiting.
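To make the difference between these strategies concrete, here is a small Python sketch (ours; it reuses the matches helper from Sect. 4.3 and hypothetical function names) of the two simplest ones, forcing and penalizing, for a single binary-coded ERC with constraint schema H:

    # Two static constraint-handling strategies for a non-evaluable solution x:
    # 'forcing' repairs x into the constraint schema H before evaluation,
    # 'penalizing' leaves x unchanged but assigns it a poor dummy fitness c.

    def force_into_schema(x, H):
        """Flip every bit of x that differs from a defined bit of H."""
        return [int(h) if h != '*' else xi for h, xi in zip(H, x)]

    def evaluate_with_strategy(x, f, erc_active, H, strategy, c=0.0):
        if not erc_active or matches(H, x):
            return x, f(x)                  # evaluable as it is
        if strategy == "forcing":
            x = force_into_schema(x, H)     # repair, then evaluate
            return x, f(x)
        if strategy == "penalizing":
            return x, c                     # keep x, assign the penalty value
        raise ValueError("unknown strategy")

Waiting, regenerating, and the subpopulation strategy additionally require access to the optimizer's clock and population and are therefore not shown here.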

4.5.1 Evaluation of Static Constraint-Handling Strategies


Experimental setup. To evaluate the different strategies for commitment relaxation and periodic ERCs we augment them onto a standard EA that uses a (μ + λ)-ES reproduction scheme for environmental selection, binary tournament selection (with replacement) for parental selection (which was shown to be a robust operator in the theoretical study), uniform crossover (Syswerda 1989), and bit-flip mutation. The parameter settings of the EA are given in Table 4.1. Regarding the constraint-handling strategies, regenerating uses L = 10,000 regeneration trials (before applying forcing),
Table 4.1 EA parameter settings as used in the study of static constraint-handling strategies

Parameter                        Setting
Parent population size μ         50
Offspring population size λ      50
Per-bit mutation probability     1/l
Crossover probability            0.7

the subpopulation strategy a subpopulation size of J = 30, and penalizing a fitness value of c = 0 for non-evaluable solutions; these settings have been found to yield generally robust and good results.
With regard to test functions, it might be considered ideal to use a set of real
experimental problems featuring real resource constraints. However, this approach
is generally not realistic due to the time and/or budgetary burden associated with
physical experimentation. Hence, our studies presented in this and the subsequent
sections will use a range of more familiar artificial test problems. In this section we
show results obtained on the OneMax problem, augmented with ERCs. However,
the impact of the same ERC type on performance tends to be similar for different
problem types, and the interested reader is referred to Allmendinger and Knowles
(2013) for additional results obtained for TwoMax, MAX-SAT, and NK landscapes,
as well as a study involving data and ERCs from a real closed-loop problem.
Experimental results. Figure 4.12 shows how different configurations of a commitment relaxation ERC impact the performance of the static constraint-handling
strategies on the OneMax problem; in this experiment the order-defining bits of a
constraint schema H represented poor genetic material, i.e., 0-bits on the OneMax
problem. From the figure it is apparent that ERCs impact search performance negatively, and clear patterns emerge relating ERC parameters to performance effects:
• Altering the order of the constraint schema o(H) controls the trade-off between the probability of activating an ERC (probability decreases exponentially with o(H)) and the probability that an activation causes a performance impact (probability is greater for low orders o(H)). This causes the performance to degrade up to an order of o(H) ≈ 4 for strategies that apply repairing, and lower orders for waiting and penalizing, and then again to improve for higher orders (see top left plot). The performance of waiting only depends on the probability of activating an ERC. As this probability is largest at o(H) = 1, the performance is poorest at o(H) = 1 and improves exponentially thereafter.
• The epoch duration V is correlated positively with the length of an activation period, causing the performance of a strategy to decrease with increasing V (see top right plot). Longer activation periods cause waiting to freeze the optimization for longer and thus result in a poorer performance. The performance of the other strategies reduces until a certain level beyond which further increases in V have no effect.
• Increasing the recovery time improves the performance of all strategies with recovery speed being a function of the effort needed to escape from a (semi-)homogeneous population state (see bottom left plot).



Fig. 4.12 Plots showing the average best solution fitness found (across 500 EA runs) and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the epoch duration V (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right). Note, while the optimization time in the top plots is fixed to T = 700 evaluations, the parameter T varies in the bottom plots. For each setting shown on the abscissa, a Friedman test (significance level of 5 %) has been carried out. In the top left plot, waiting performs best in the range 2 < o(H) < 6, while, in the top right plot, it performs best in the range 2 < V < 12 with the subpopulation strategy being best in the range V > 12. In the bottom left plot, the subpopulation strategy performs best for T = 750, while in the bottom right plot, waiting performs best in the range 0 < t_ctf^start < 300. There is no clear winner for the other settings

Shifting the start time of the constraint time frame further to the end of the optimization decreases the probability of activating a commitment relaxation ERC
that is associated with a poor constraint schema and thus has a beneficial impact
on the performance of all strategies (see bottom right plot).
Figure 4.13 analyzes the performance impact of ERCs with constraint schemata that
represent both good and poor genetic material, i.e., 0 and 1-bits are present in H. It is
obvious from the figure that the performance is affected most significantly for low-order schemata regardless of the quality of the genetic material they represent, and for schemata of higher order provided they represent good genetic material (i.e., schemata along or near the diagonal). Other schemata setups have little or no performance impact as they do not lie on an optimizer's search path, reducing the probability of
activating the associated ERC.

Fig. 4.13 Plots showing the average best solution fitness obtained (across 500 EA runs) by forcing (left) and waiting (right) on OneMax (with l = 30 bits) as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0, 700, 15, H). The straight line represents the expected performance when picking a schema (i.e., the order-defining bits and their values) with a particular order at random. The performance obtained in an unconstrained environment is represented by the square at o(H) = #1s = 0

From Fig. 4.14 we can see that the performance of the strategies is affected differently when the activation period is set deterministically, as done by periodic ERCs. From the left plot we can clearly see that waiting performs worst for all ERC settings. This is due to the high probability of encountering a non-evaluable solution during the activation period and subsequently freezing the optimization regardless of the order and genetic material represented by a constraint schema. The performance of the other strategies decreases more smoothly as a function of the order and the quality of the genetic material represented, as can be seen from the right plot for the subpopulation strategy.

Fig. 4.14 The left plot shows the average best solution fitness found and its standard error (across 500 EA runs) on OneMax (with l = 30 bits) as a function of the order of the constraint schema o(H). For each setting shown on the abscissa, a Friedman test (significance level of 5 %) has been carried out revealing that the subpopulation strategy performs best for o(H) = 2; there are no clear winners for the other settings. The right plot shows the average best solution fitness obtained by the subpopulation strategy as a function of both o(H) and the number of order-defining bits in H with value 1 for the ERC perERC(0, 700, 20, 50, H). The straight line represents the expected performance when picking a schema (i.e., the order-defining bits and their values) with a particular order at random

4.6 Learning-Based Constraint-Handling Strategies


The previous section provided evidence that it is possible to select a suitable (static)
constraint-handling strategy for an ERCOP offline if the ERCs are known in advance.
Inspired by this observation, this section outlines two strategies that learn either
offline (using a reinforcement learning approach) or online (using a multi-armed
bandit algorithm) when to switch between the static constraint-handling strategies
during the optimization process. Finally, the strategies are investigated for commitment relaxation ERCs.
Offline learning-based strategy. To learn offline when to switch between static constraint-handling strategies during an optimization run, we use the tabular reinforcement learning (RL) algorithm Sarsa(λ) (Rummery and Niranjan 1994; Sutton and Barto 1998). The general goal of an RL algorithm is to learn some optimal policy, a mapping from an environmental state s ∈ S to an action a ∈ A(s), so as to maximize some reward R. Sarsa(λ) achieves this goal by estimating a so-called action-value function Q(s, a), which represents the expected reward received after taking action a in state s and following some policy thereafter.
To employ an RL algorithm we need to define a state s, the possible actions a,
and the reward R. Here, we characterize a state by the current population average
fitness and the current time step; we assume that fitness values lie in the interval
[0, 1], and that the optimization time is limited by T . To keep the number of total
states manageable, we bin both variables into 5 equally-sized intervals, resulting
in 25 states in total. In each state, we provide the agent with 5 actions, which are
the static constraint-handling strategies. The reward shall be the average fitness of
the final population to reflect our aim of performing well at the end of the search.
Alternatively, the reward may be the best solution fitness found.
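To make the state, action, and reward definitions above concrete, the following minimal sketch (our illustration, not the authors' implementation) bins the population average fitness and the elapsed time into 5 intervals each, giving the 25 tabular states, and keeps one action-value estimate per static constraint-handling strategy. The names, the ε-greedy selection, and the plain one-step Sarsa update (eligibility traces omitted) are simplifying assumptions.

```python
import random

STRATEGIES = ["forcing", "regenerating", "waiting", "subpopulation", "penalizing"]
N_BINS = 5          # 5 fitness bins x 5 time bins = 25 states
ALPHA, GAMMA, EPS = 0.1, 1.0, 0.1   # learning rate, discount, exploration (illustrative)

def state(avg_fitness, t, T):
    """Map (population average fitness, time step) to one of the 25 discrete states."""
    f_bin = min(int(avg_fitness * N_BINS), N_BINS - 1)   # fitness assumed to lie in [0, 1]
    t_bin = min(int(t / T * N_BINS), N_BINS - 1)         # optimization time limited by T
    return f_bin * N_BINS + t_bin

# Q[s][a]: estimated reward for applying static strategy a in state s
Q = [[0.0] * len(STRATEGIES) for _ in range(N_BINS * N_BINS)]

def select_action(s):
    """Epsilon-greedy choice among the five static constraint-handling strategies."""
    if random.random() < EPS:
        return random.randrange(len(STRATEGIES))
    return max(range(len(STRATEGIES)), key=lambda a: Q[s][a])

def sarsa_update(s, a, reward, s_next, a_next):
    """One-step Sarsa update (eligibility traces of Sarsa(lambda) omitted for brevity)."""
    Q[s][a] += ALPHA * (reward + GAMMA * Q[s_next][a_next] - Q[s][a])
```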
We want to point out that some aspects need further consideration when
applying RL to dynamic constraints, such as ERCs. First, the number and set of
states visited during the optimization depend on how often and when non-evaluable
solutions are encountered during the search, and thus may vary with each optimization run. Secondly, if a non-evaluable solution is encountered, then the first action
(i.e., constraint-handling strategy) selected in a particular state is applied to all non-evaluable solutions encountered in that state.
Online learning-based strategy. To learn online when to switch between static
constraint-handling strategies, we consider the learning problem as a multi-armed
bandit (MAB) problem with the static strategies serving as independent arms. To
tackle the problem we employ an adaptive operator selection method known as the

4 Ephemeral Resource Constraints in Optimization

123

dynamic multi-armed bandit (D-MAB) algorithm (Hartland et al. 2006, 2007; Costa
et al. 2008). The goal of the algorithm is to maximize the sum of rewards received
over a number of actions (or arms played) taken. D-MAB is dynamic in the sense
that it monitors the sequence of rewards obtained using statistical testing, and then
restarts the MAB on detecting a significant deviation in the sequence.6
Unlike the RL agent, a MAB algorithm requires that the play of an arm is followed
by a subsequent reward. We provide a reward immediately after the play of an arm,
and it is the raw fitness of the resulting solution, which is a common credit assignment
scheme.
Note that some alternative common credit assignment schemes are not directly applicable in the presence of ERCs, such as ones that assign a credit based on the fitness improvement of an offspring compared to its parent after applying a variation operator to it. With ERCs, the parent would be the individual that is to be repaired and the offspring the repaired individual after applying a constraint-handling strategy to the parent. As we do not know the fitness of the parent, because it is non-evaluable, we cannot quantify by how much its fitness differs from that of the repaired
individual.
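The sketch below illustrates the online alternative under simplifying assumptions: arms (the static strategies) are chosen with a UCB-style rule, the raw fitness of the produced solution is used as the reward, and a Page-Hinkley test on the reward sequence triggers a restart of the bandit statistics, mirroring the "dynamic" restart idea of D-MAB. The class name and threshold values are illustrative, not those used by Hartland et al. or in this chapter.

```python
import math

class DynamicBandit:
    """Simplified dynamic multi-armed bandit over the static strategies (sketch only)."""

    def __init__(self, n_arms, ph_threshold=0.1, ph_tolerance=0.01, scale=1.0):
        self.n_arms, self.scale = n_arms, scale
        self.ph_threshold, self.ph_tolerance = ph_threshold, ph_tolerance
        self.reset()

    def reset(self):
        # per-arm statistics plus the Page-Hinkley accumulators
        self.counts = [0] * self.n_arms
        self.values = [0.0] * self.n_arms
        self.ph_sum, self.ph_min, self.reward_mean, self.n_rewards = 0.0, 0.0, 0.0, 0

    def select(self):
        """UCB-style arm selection; each arm is played once before the bound is used."""
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        total = sum(self.counts)
        ucb = [self.values[a] + self.scale * math.sqrt(2 * math.log(total) / self.counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        """Reward = raw fitness of the solution produced by the chosen static strategy."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        # simplified Page-Hinkley change detection on the reward sequence
        self.n_rewards += 1
        self.reward_mean += (reward - self.reward_mean) / self.n_rewards
        self.ph_sum += self.reward_mean - reward - self.ph_tolerance
        self.ph_min = min(self.ph_min, self.ph_sum)
        if self.ph_sum - self.ph_min > self.ph_threshold:
            self.reset()        # restart the bandit when the reward distribution shifts
```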

4.6.1 Evaluation of Learning-Based Strategies


Experimental setup. To evaluate the learning-based strategies for commitment relaxation ERCs we use the same experimental setup as used in the previous section (see Table 4.1) with the difference that the EA is equipped with an elitist reproduction scheme (the corresponding parameter is set to 1). The reason for using a modified setup is that we specifically tuned the EA to perform well on the test problems considered in this section.
For the RL-based strategy, denoted here by RL-EA, we use a training and testing scheme (similar to Pettinger and Everson (2003)). In the training phase (consisting of 5,000 EA runs), the RL agent estimates the action-value function Q(s, a), while in the testing phase (consisting of 100 EA runs), the Q-function is frozen and the greedy actions a* are always selected in each state.7
Experimental results. Suppose we are faced with a closed-loop scenario that is subject to the following two, a priori known, commitment relaxation ERCs: commRelaxERC(0, 2000, 20, H = (10101*** . . .)) and commRelaxERC(0, 2000, 20, H = (* . . . **101)). That is, one ERC constrains the first 5 solution bits, while the other the last 3 bits. These two ERCs are inspired by change-over restrictions of instrument parameters encountered in the closed-loop work by O'Hagan et al. (2005, 2007).
6 For D-MAB we set the threshold parameter to PH = 0.1, the tolerance parameter to 0.01, and the scaling factor to C = 1.
7 RL-EA also employed the ε-greedy action selection method (ε = 0.1), optimistic initial values for the action-value estimates, and replacing eligibility traces, with the eligibility trace being set to 0 at the beginning of each EA run. The decay factor was set to 1, the discount factor to 1, and the learning rate to 0.1.

Fig. 4.15 A plot showing the greedy actions a* learnt by the RL agent for each state s. Training was done across 5,000 different NK landscapes with N = 30 and K = 2. (For unvisited states, a default strategy would need to be selected)

It is unknown whether the schemata associated with the two ERCs represent good
or poor instrument setups. As in O'Hagan et al. (2005, 2007) we assume that the fitness landscape to be optimized is subject to epistasis. Please refer to O'Hagan
et al. (2005, 2007), Allmendinger and Knowles (2011), Allmendinger (2012) for a
detailed description of the closed-loop problem and the ERCs.
We use NK landscapes (Kauffman 1989) to investigate the impact of the two ERCs
as a function of different levels of epistasis. Prior to applying RL-EA online we train
the RL agent offline on 5,000 different NK landscapes with N = 30 and K = 2,
which represent problems with low epistasis. Figure 4.15 shows the greedy actions (optimal static strategies) a* learnt by the agent for each state s during the training
phase. Clear patterns can be observed from the plot: the agent learned to use mainly
waiting at the beginning of the optimization process (to avoid introducing a search
bias early on), penalizing in the middle part of the optimization, and, depending on
the population average fitness, either forcing, waiting, or the subpopulation strategy,
in the final part of the optimization. Other policies, such as using only a repairing
strategy at the beginning of the optimization, were not learnt by the agent as they are
associated with the risk of converging to a homogeneous population state from which it is difficult to escape if needed (e.g., if schemata represent poor genetic material).
Figure 4.16 compares how the policy learned by the RL agent fares against the online-learning approach, D-MAB, and the static strategies themselves for NK landscapes with N = 30 and K = {3, 4}; using different problems for training and testing allows us to assess the robustness of the policy learned. We can see from the plots that although RL-EA performs poorly at the beginning of the search, at time step t ≈ 800 the performance kicks up due to a change in the static strategy employed, allowing RL-EA to be the best performing strategy at the end of the search. D-MAB is not able to perform as well as RL-EA because it selects the currently most useful static strategy (which is typically a repairing strategy) without accounting for the future consequences this might have. RL-EA, on the other hand, is tuned here to optimize the final performance only, allowing it to adjust to the problem at hand. For instance, if the optimization time T were shortened, then the RL agent would learn a different policy, while D-MAB would behave the same.

Fig. 4.16 Plots showing the population average fitness (we do not show the standard error as it was negligible) obtained by the different constraint-handling strategies on NK landscapes with N = 30 and K = 3 (left) and K = 4 (right) as a function of the time counter t; results are averaged over 100 independent runs using a different randomly generated NK problem instance for each run. All instances were subject to the commitment relaxation ERCs commRelaxERC(0, 2000, 20, H = (10101*** . . .)) and commRelaxERC(0, 2000, 20, H = (* . . . **101)). The results of Unconstrained EA were obtained by running the EA on the same problem instances but without the ERCs. According to the Kruskal-Wallis test (significance level of 5 %), the final population average fitness obtained by RL-EA is significantly better than the one obtained with the second best strategy, waiting, for both problems
Overall, the strong performance of the RL-EA is encouraging, but we want to
mention that in order to achieve that performance, some tuning of the agent may be
required. For a more in-depth discussion on this topic and an experimental analysis
of alternative agent settings please refer to Allmendinger and Knowles (2011).

4.7 Online Resource-Purchasing Strategies


In this section, our focus shifts to online resource-purchasing strategies to cope with
commitment composite ERCs (see Sect. 4.3.3 for a description of the ERC). We give a
brief description of the strategies only, and refer the interested reader to Allmendinger
and Knowles (2010) for details.
To deal with this ERC a strategy needs to address three aspects:
1. Decide when and which composite (defined by a high-level constraint schema
H# ) is ordered thereby accounting for a lag of TL time steps for the composite to
arrive, and a budget of C limiting the usage of the composites.
2. Determine the storage cell into which a composite is stored once it arrives. As the number of composites that can be maintained simultaneously is limited by the number of storage cells #SC, this may also mean deciding which of the storage cells is to be emptied, i.e., which composite is removed, to make space for a newly arrived composite. Recall that a composite is removed automatically from a storage cell after a shelf life of SL time steps and/or after it has been reused RN times.
3. Deal with non-evaluable solutions, e.g., by selecting an alternative composite
from the storage.
We summarize and evaluate three resource-purchasing strategies (for use in a generational EA) that address the above-mentioned aspects in different ways: a just-in-time
strategy, a just-in-time strategy with repairing, and a sliding window strategy.
Just-in-time (JIT) strategy. This strategy avoids repairing by first scheduling the
evaluation of solutions intelligently and then making purchase orders so that composites arrive just in time for the scheduled experiment time. The scheduling involves arranging the solutions of a population into contiguous groups based on the composites they require so as to maximize the availability of resources. For example, if a, b, c,
and d represent different composites required by solutions, then a potential grouping
would be bbbaddcc . . . . If composites are available in the storage cells because we
have ordered them previously (we call such composites old composites), then the
scheduler aims at using up these first so as to reduce the number of purchase orders
made. For example, suppose the composites aadcac are required, and composite c
is available in one of the storage cells and has 3 uses and 5 time steps of its shelf life
remaining. Then, by evaluating the solutions requiring c first, the evaluation schedule
ccdaaa will save us a purchase order since only two c composites are needed. At
any given time, JIT (and JIT with repairing) ensure that non-identical composites are
kept in storage.
Once an ordered composite arrives, it is stored in an empty storage cell or, if no
cell is empty, replaces an old composite that can be used in the smallest number of
evaluations within the subsequent generation. That is, in the latter case we account
for the remaining reuses and shelf lives of old composites.
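As a rough illustration of the scheduling step, the sketch below groups the evaluations of a population by the composite they require and puts solutions whose composite is already in storage first. The function name and data representation (solution ids mapped to composite labels) are our own assumptions, not part of the original strategy description.

```python
from collections import Counter

def jit_schedule(required, stored):
    """Arrange solutions into contiguous evaluation groups by required composite.

    `required` maps solution id -> composite it needs; `stored` is the set of
    composites currently held in storage cells ("old" composites).  Solutions
    whose composite is already in storage are scheduled first so that old
    composites are used up before their shelf life or reuse count runs out.
    """
    demand = Counter(required.values())

    def key(sol):
        comp = required[sol]
        # stored composites first, then larger groups, then group by composite label
        return (comp not in stored, -demand[comp], comp)

    return sorted(required, key=key)

# Example: composite 'c' is already stored, so the two solutions needing 'c' go first.
schedule = jit_schedule({1: 'a', 2: 'a', 3: 'd', 4: 'c', 5: 'a', 6: 'c'}, stored={'c'})
print(schedule)   # [4, 6, 1, 2, 5, 3]: the 'c' evaluations are scheduled first
```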
Just-in-time strategy with repairing (JITR). Avoiding repairing as done by JIT may result in a waste of composite reuses as well as optimization time spent waiting for composite orders to arrive. For example, suppose each solution of a population requires a different composite; then up to (RN − 1) reuses might be wasted. The JIT with repairing (JITR) strategy aims at reducing wastage by repairing solutions such that they use a composite that is nearly the one required (while maintaining the remaining mechanisms of JIT). Solutions to be repaired are identified by first clustering their composites using k-medoids (Kaufman and Rousseeuw 1990), and then trying to find an assignment of solutions to clusters that minimizes the total Hamming distance of all repairs. The medoid composite of a cluster is the composite that would be used to repair (using the static constraint-handling strategy, forcing) all solutions in that cluster that require a different composite. To be able to control the number of repairs that need to be performed, we perform several rounds of clustering and solution-to-cluster assignments for different values of k. The cluster configuration with the smallest weighted sum score of the total Hamming distance of all repairs and the number of clusters k is the one according to which we repair. Annealing the weighting factor involved in the weighted sum as a function of the optimization time allows us to keep the number of repairs low at the beginning of the search (i.e., strive for cluster configurations with many clusters and small total Hamming distances) and to increase it toward the end (i.e., strive for cluster configurations with few clusters and a large total Hamming distance), which is a good strategy as we have seen in the previous section.
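The following sketch illustrates how such a cluster configuration could be chosen, under the assumptions noted in the code: kmedoids is a placeholder for any k-medoids routine returning medoids and assignments, composites are equal-length bit strings, and weight is the annealed factor that trades off the total Hamming repair distance against the number of clusters k (a large weight on the distance favours many clusters and hence few repairs).

```python
def hamming(a, b):
    """Number of differing positions between two equal-length composites."""
    return sum(x != y for x, y in zip(a, b))

def choose_repair_configuration(composites, kmedoids, weight):
    """Pick the clustering of required composites used for repairing (sketch).

    `composites` lists the composites required by the population and
    `kmedoids(points, k)` is assumed to return (medoids, assignment); it stands
    in for any k-medoids implementation.  The configuration minimizing the
    weighted sum of total repair distance and cluster count k is returned.
    """
    best = None
    for k in range(1, len(set(composites)) + 1):
        medoids, assignment = kmedoids(composites, k)
        total_distance = sum(hamming(c, medoids[assignment[i]])
                             for i, c in enumerate(composites))
        score = weight * total_distance + (1 - weight) * k
        if best is None or score < best[0]:
            best = (score, medoids, assignment)
    return best[1], best[2]
```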
Sliding window (SW) strategy. Unlike JIT and JITR, the sliding window (SW) strategy submits solutions for evaluation in the order they are generated by the EA, and
non-evaluable solutions are always repaired. To facilitate this process, the strategy
aims to maintain the most useful composites in storage by (i) ordering composites
pre-emptively every min(RN, SL) time steps so as to avoid empty storage cells and
(ii) ensuring that storage cells are filled with composites that were recently requested
by the optimizer.
To achieve the second aspect we maintain a sliding window defined here as a set
(t) containing composites that were requested most recently but were unavailable
at the time of the request. Consequently, whenever new composites are needed we
order the ones from (t) that have been added to this set most recently. To avoid
ordering the same composites, which results in a loss of the population diversity,
we apply mutation to the composites from (t) before ordering them (for simplicity
reasons we use a fixed per-bit mutation rate of 0.05).
We replace all composites in the storage cells upon the arrival of new composites.
In case a non-evaluable solution is encountered, we repair it by forcing it to use
a composite from the storage cell that has the smallest Hamming distance to the
actually required composite.
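A minimal sketch of the SW bookkeeping is given below, assuming composites are represented as bit strings; the class and method names are illustrative, and the actual strategy additionally tracks shelf lives, reuse counters, and the pre-emptive ordering schedule.

```python
import random

class SlidingWindowStrategy:
    """Minimal sketch of the sliding-window bookkeeping (illustrative only)."""

    def __init__(self, n_storage_cells, mutation_rate=0.05):
        self.window = []                 # composites requested but unavailable, newest last
        self.storage = []                # composites currently held in the storage cells
        self.n_cells = n_storage_cells
        self.mutation_rate = mutation_rate

    def request(self, composite):
        """Record a composite the optimizer asked for but could not obtain."""
        if composite not in self.storage:
            self.window.append(composite)

    def place_orders(self):
        """Order the most recently requested composites, mutated to preserve diversity."""
        recent = self.window[-self.n_cells:]
        return [self._mutate(c) for c in recent]

    def repair(self, composite):
        """Force a non-evaluable solution onto the closest stored composite."""
        return min(self.storage,
                   key=lambda s: sum(a != b for a, b in zip(s, composite)))

    def _mutate(self, composite):
        # fixed per-bit mutation rate, as in the description above
        return ''.join(('1' if b == '0' else '0') if random.random() < self.mutation_rate else b
                       for b in composite)
```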

4.7.1 Evaluation of Online Resource-Purchasing Strategies


Experimental setup. We combine the three online resource-purchasing strategies with the same elitist generational EA as used in Sect. 4.5. As the test problem we consider
the same elitist generational EA as used in Sect. 4.5. As the test problem we consider
a MAX-SAT (Zhang 2001) problem instance with l = 50 binary variables.8 We
choose the order-defining bits of the high-level constraint schema H# at random at
each run but, of course, use the same schemata across the strategies analyzed.
8 The instance considered is a uniform random 3-SAT problem and can be downloaded online at http://people.cs.ubc.ca/~hoos/SATLIB/benchm.html; the name of the instance is uf50-218/uf50-01.cnf. The instance consists of 218 clauses and is satisfiable. We treat this 3-SAT instance as a MAX-SAT optimization problem, with fitness calculated as the proportion of satisfied clauses.

Experimental results. First, we want to investigate how the key parameters of a commitment composite ERC affect the performance of the three conceptually different online resource-purchasing strategies. With SW the performance depends crucially on the number of storage cells #SC and the reuse number RN, as can also be observed from the left plot of Fig. 4.17; SW performs better as the number of storage cells increases and/or the reuse number decreases. The reason for this pattern is that, with SW, more storage cells means that the probability of having a required composite available increases, which in turn reduces the number of repairs. On the other hand, a smaller reuse number (or shorter shelf life SL) shortens the time gap between asking for a composite, i.e., adding it to the sliding window, and having it available in a storage cell.

Fig. 4.17 Plots showing the probability of SW (left) and JIT (right) of achieving the population average fitness of our base algorithm obtained in an ERC-free environment given a budget and time limit of C = T = 1,500. For SW this probability is shown as a function of #SC and RN for the ERC commCompERC(o(H#) = 30, #SC, TL = 10, RN, SL = RN), and for JIT it is shown as a function of TL and RN for the ERC commCompERC(o(H#) = 10, #SC = 10, TL, RN, SL = RN); cost parameters were set to corder = 0, ctime_step = 1, and C = 1,500
The performance of a just-in-time strategy, such as JIT and JITR, depends largely
on the time it takes for a resource to arrive once ordered. Consequently, we observe
from the right plot of Fig. 4.17 that the performance of JIT (also for JITR) improves
with shorter time lags TL. An increase in the reuse number RN (or shelf life SL)
yields a slight performance improvement too. The reason for this is that composites
can be kept for longer in the storage cells and thus allow for a more efficient usage of
old composites. A similar effect can be achieved by increasing the number of storage
cells SC (results not shown here).
While JIT and JITR perform similarly for large budgets, there are differences
for scenarios where budget is a limiting factor as can be seen from the right plot of
Fig. 4.18. For small budgets, in the range 0 < c 600, 0 ctime_step 0.5, JITR is
able to outperform JIT as repairing allows the evaluation of more solutions while JIT
would have to wait for suitable composites to arrive. The weak performance of JIT
for small budgets is also apparent when comparing it to SW (left plot of Fig. 4.18).
For large budgets c > 1,200, JIT is able to match and sometimes even outperform
JITR and SW as it does not introduce any search bias coming from repairing.
In the previous experiment, the number of storage cells was relatively low, which
is beneficial for SW. An increase in #SC means that more composites are regularly
ordered to fill all the storage cells. This approach is expensive and dampens the
performance of SW when compared to JIT (and JITR) as can be observed from
Fig. 4.19.

Fig. 4.18 Plots showing the ratio P(f(x) > fJIT)/P(f(x) > fSW) (left) and P(f(x) > fJITR)/P(f(x) > fJIT) (right) as a function of c and ctime_step for the ERC commCompERC(o(H#) = 10, #SC = 5, TL = 5, RN = 30, SL = 30) and corder = 1. Here, x is a random variable that represents solutions drawn uniformly at random from the search space and fπ the population average fitness obtained with strategy π. If P(f(x) > fπ)/P(f(x) > fπ′) > 1, then strategy π′ is able to achieve a higher average best solution fitness than strategy π and a greater advantage of π′ is indicated by a darker shading in the heat maps; similarly, if P(f(x) > fπ)/P(f(x) > fπ′) < 1, then π is better than π′ and a lighter shading indicates a greater advantage of π

4.8 Conclusion
In this chapter we have considered a new type of (dynamic or temporary) constraint
that differs in several aspects from the traditional hard and soft constraints. Hard
constraints define the feasible region in the search space, and soft constraints express
objectives or preferences on solutions, while the constraints we discussed here specify
the set of solutions in the search space that can be evaluated at any moment in time.
That is, a solution that violates one of these constraints cannot be evaluated at the
moment although it may be a feasible solution to the problem. This constraint type
is called ephemeral resource constraint (or ERC) and is commonly encountered
in closed-loop optimization problems, where it models limitations on the resources
needed to construct and/or evaluate solutions.

Fig. 4.19 A plot showing the ratio P(f(x) > fJIT)/P(f(x) > fSW) as a function of the number of storage cells #SC and o(H#) for the ERC commCompERC(o(H#), #SC, TL = 25, RN = 25, SL = 25), corder = ctime_step = 1, C = 1,500. Please refer to the caption of Fig. 4.18 for an explanation of the performance metric

We pursued three goals in this chapter. First, we have summarized the framework
and terminology for describing ERC problems, and defined three ERC types that arise
commonly in practical applications including (i) absence of resources at regular time
intervals (periodic ERCs), (ii) temporary commitment to a certain resource triggered
on using that resource (commitment relaxation ERCs), and (iii) an ERC where costly
resources need to be purchased in advance, kept in capacity-limited storage, and
used up within a certain number of experiments or a fixed time frame (commitment
composite ERCs).
Secondly, we have extended our previous work with a theoretical study focused on
understanding the fundamental effects of ERCs on simple evolutionary algorithms
(EAs). Using the concept of Markov chains, the study concluded that (i) an order
relation-based selection operator, such as tournament selection, is more robust to
simple ERCs than a fitness proportionate-based selection operator, and (ii) while an
EA with a non-elitist generational reproduction scheme converges more quickly to
some optimal population state than with a non-elitist steady state scheme when the
ERC is active, the opposite is the case when the ERC is inactive. This result implies
that ERCs should be accounted for when tuning EAs for ERCOPs.
Third, we have summarized and evaluated empirically several of the constraint-handling methods we have proposed for handling ERCs including static and learning-based strategies (Sects. 4.5 and 4.6), as well as resource-purchasing strategies for
dealing with commitment composite ERCs (Sect. 4.7). Generally, the empirical study
revealed that ERCs affect the performance of an optimizer and that different strategies should be favored as a function of the ERC and its parameters. Moreover, we
have demonstrated here and in more detail in our previous work (Knowles 2009;
Allmendinger and Knowles 2010, 2011, 2013) that the effect of a particular ERC
is similar across different problem types, meaning that knowing about the ERC is
sufficient to select a constraint-handling strategy. Overall, we can therefore say that
if the ERCs are known in advance, then a promising strategy is one that learns offline
how to deal best with the ERCs during the optimization. As an example, in this
chapter we have seen that good results can be achieved with a reinforcement learning approach that learns offline when to switch between different static strategies
during the optimization.

4.8.1 Future Work


Although we have established some of the building blocks for dealing with ERCs,
there remains much else to learn about the effects of ERCs on search and how to
handle them. We now discuss several directions for future research toward achieving
this goal.
Gaining a more robust understanding for the search strategies developed. To
gain a more robust understanding of the behavior of the search strategies developed, it
would be beneficial to consider further and perhaps more realistic fitness landscapes (featuring also real or mixed integer variables) than the ones we considered so far. Of
course, it would be ideal to validate the search strategies on real-world closed-loop
problems featuring real resource constraints. However, this approach is generally not
realistic due to time and/or budgetary requirements. The next best thing we can do is
to simulate a fitness landscape based on data obtained from real-world experiments.
This is the approach we have taken in Allmendinger and Knowles (2011), and more
studies of this kind are needed.
Further theoretical analysis of resourcing issues. In Sect. 4.4 we have used Markov
chains to analyze theoretically the effect of a particular ERC type on simple EAs.
Although our analysis used a simplified optimization environment (two solution types
only), valuable observations were made with respect to the applicability of different
selection and reproduction schemes. We also gained some understanding about the
impact of ERCs on evolutionary search, which ultimately, may help us in the design
of effective and efficient search strategies for closed-loop optimization. However,
our theoretical results were limited in the sense that we did not derive mathematical equations relating, for instance, ERC configurations to optimal EA parameter
settings. It remains to be seen whether it is possible to derive such expressions, and
how applicable they would be in practice. A number of recent advances in EA theory
might present the possibility of understanding ERCs more deeply, including drift
analysis (Auger and Doerr 2011) and the fitness level method (Chen et al. 2009;
Lehre 2011).
Understanding the effects of non-homogeneous experimental costs in closed-loop optimization. So far, we have made the assumption that all solution evaluations
take equal time or resources. This need not be the case. For instance, when dealing
with commitment composite ERCs, it is a very realistic scenario that the composites
to be ordered vary in their prices and delivery periods. Under a limited budget, this
scenario might cause an optimizer not only to follow fitness gradients but also to
account for variable experimental costs. Hence, further work should investigate how
to trade-off these two aspects effectively. For inspiration, we may look at strategies
employed in the Robot Scientist study (King et al. 2004), where this scenario has
been encountered within an inference problem rather than an optimization problem.
Broadening the application of machine learning and surrogate modeling techniques in closed-loop optimization. We have shown (in Sect. 4.6) that evolutionary
search augmented with machine learning techniques, such as reinforcement learning
(RL), can be a powerful optimization tool to cope with ERCs. To increase the applicability of learning-based optimizers to different types of optimization problems, one
could also try combining offline learning with online learning. For instance, RL
can be used to learn offline a policy until some distant point in time, and this policy can then be refined or slightly modified online using the anticipation approach
of Bosman (2005). Another avenue worth pursuing is to extend an optimizer with
surrogate modeling techniques (Jin 2011) in order to help cope with ERCs. In the
simplest case, surrogate modeling would be used to approximate the objective values
of solutions that cannot be evaluated due to a lack of resources. More sophisticated approaches might use surrogate modeling to scan the search space for promising
regions from which solutions are then created. If the active ERCs are known, or can
be well predicted, then scanning can be used to avoid the non-evaluable parts of the
search space, while still concentrating the search on the most promising areas in
terms of fitness.

References
Allmendinger R (2012) Tuning evolutionary search for closed-loop optimization. PhD thesis,
Department of Computer Science, University of Manchester, UK
Allmendinger R, Knowles J (2010) On-line purchasing strategies for an evolutionary algorithm
performing resource-constrained optimization. In: Proceedings of parallel problem solving from
nature, pp 161170
Allmendinger R, Knowles J (2011) Policy learning in resource-constrained optimization. In: Proceedings of the genetic and evolutionary computation conference, pp 19711978
Allmendinger R, Knowles J (2013) On handling ephemeral resource constraints in evolutionary
search. Evol Comput 21(3):497531
Auger A, Doerr B (2011) Theory of randomized search heuristics. World Scientific, Singapore
Bäck T, Knowles J, Shir OM (2010) Experimental optimization by evolutionary algorithms. In: Proceedings of the genetic and evolutionary computation conference (companion), pp 2897–2916
Bedau MA (2010) Coping with complexity: machine learning optimization of highly synergistic
biological and biochemical systems. In: Keynote talk at the international conference on genetic
and evolutionary computation
Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University
Press, Cambridge
Bosman PAN (2005) Learning, anticipation and time-deception in evolutionary online dynamic
optimization. In: Proceedings of genetic and evolutionary computation conference, pp 3947
Bosman PAN, Poutré HL (2007) Learning and anticipation in online dynamic optimization with evolutionary algorithms: the stochastic case. In: Proceedings of genetic and evolutionary computation conference, pp 1165–1172
Branke J (2001) Evolutionary optimization in dynamic environments. Kluwer Academic Publishers,
Dordrecht
Caschera F, Gazzola G, Bedau MA, Moreno CB, Buchanan A, Cawse J, Packard N, Hanczyc MM
(2010) Automated discovery of novel drug formulations using predictive iterated high throughput
experimentation. PLoS ONE 5(1):e8546
Chen T, He J, Sun G, Chen G, Yao X (2009) A new approach for analyzing average time complexity
of population-based evolutionary algorithms on unimodal problems. IEEE Trans Syst Man Cybern
B 39(5):10921106
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng
191(1112):12451287
Costa LD, Fialho A, Schoenauer M, Sebag M (2008) Adaptive operator selection with dynamic
multi-armed bandits. In: Proceedings of genetic and evolutionary computation conference,
pp 913920
Davis TE, Principe JC (1993) A Markov chain framework for the simple genetic algorithm. Evol
Comput 1(3):269288
Doob JL (1953) Stochastic processes. Wiley, New York
Finkel DE, Kelley CT (2009) Convergence analysis of sampling methods for perturbed Lipschitz
functions. Pac J Optim 5:339350


Goldberg DE, Segrest P (1987) Finite Markov chain analysis of genetic algorithms. In: Proceedings
of the international conference on genetic algorithms, pp 18
Hartland C, Gelly S, Baskiotis N, Teytaud O, Sebag M (2006) Multi-armed bandits, dynamic
environments and meta-bandits. In: NIPS workshop online trading of exploration and exploitation
Hartland C, Baskiotis N, Gelly S, Sebag M, Teytaud O (2007) Change point detection and meta-bandits for online learning in dynamic environments. In: CAp, pp 237–250
He J, Yao X (2002) From an individual to a population: an analysis of the first hitting time of
population-based evolutionary algorithms. IEEE Trans Evol Comput 6(5):495511
Herdy M (1997) Evolutionary optimization based on subjective selection-evolving blends of coffee.
In: European congress on intelligent techniques and soft computing, pp 640644
Holland JH (1975) Adaptation in natural and artificial systems. MIT Press, Boston
Horn J (1993) Finite Markov chain analysis of genetic algorithms with niching. In: Proceedings of
the international conference on genetic algorithms, pp 110117
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges.
Swarm Evol Comput 1(2):6170
Judson RS, Rabitz H (1992) Teaching lasers to control molecules. Phys Rev Lett 68(10):15001503
Kauffman S (1989) Adaptation on rugged fitness landscapes. In: Lecture notes in the sciences of
complexity, pp 527618
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley,
New York
King RD, Whelan KE, Jones FM, Reiser PGK, Bryant CH, Muggleton SH, Kell DB, Oliver SG
(2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature
427:247252
Klockgether J, Schwefel H-P (1970) Two-phase nozzle and hollow core jet experiments. In: Engineering aspects of magnetohydrodynamics, pp 141148
Knowles J (2009) Closed-loop evolutionary multiobjective optimization. IEEE Comput Intell Mag
4(3):7791
Lehre PK (2011) Fitness-levels for non-elitist populations. In: Proceedings of the conference on
genetic and evolutionary computation, pp 20752082
Liepins GE, Potter WD (1991) A genetic algorithm approach to multiple-fault diagnosis. In: Handbook of genetic algorithms, pp 237250
Mahfoud SW (1991) Finite Markov chain models of an alternative selection strategy for the genetic
algorithm. Complex Syst 7:155170
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):132
Nakama T (2008) Theoretical analysis of genetic algorithms in noisy environments based on
a Markov model. In: Proceedings of the genetic and evolutionary computation conference,
pp 10011008
Nguyen TT (2010) Continuous dynamic optimisation using evolutionary algorithms. PhD thesis,
University of Birmingham
Nix A, Vose MD (1992) Modeling genetic algorithms with Markov chains. Ann Math Artif Intell
5:7988
Nocedal J, Wright SJ (1999) Numerical optimization. Springer, New York
Norris JR (1998) Markov chains (Cambridge Series in Statistical and Probabilistic Mathematics).
Cambridge University Press, Cambridge
O'Hagan S, Dunn WB, Brown M, Knowles J, Kell DB (2005) Closed-loop, multiobjective optimization of analytical instrumentation: gas chromatography/time-of-flight mass spectrometry of the metabolomes of human serum and of yeast fermentations. Anal Chem 77(1):290–303
O'Hagan S, Dunn WB, Knowles J, Broadhurst D, Williams R, Ashworth JJ, Cameron M, Kell DB (2007) Closed-loop, multiobjective optimization of two-dimensional gas chromatography/mass spectrometry for serum metabolomics. Anal Chem 79(2):464–476
Pettinger JE, Everson RM (2003) Controlling genetic algorithms with reinforcement learning. Technical report, The University of Exeter


Rechenberg I (2000) Case studies in evolutionary experimentation and computation. Comput Methods Appl Mech Eng 24(186):125140
Reeves CR, Rowe JE (2003) Genetic algorithms - principles and perspectives: a guide to GA theory.
Kluwer Academic Publishers, Boston
Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical report
CUED/F-INFENG/TR 166, Cambridge University Engineering Department
Schwefel H-P (1968) Experimentelle Optimierung einer Zweiphasendüse, Teil 1. AEG Research
Institute Project MHD-Staustrahlrohr 11.034/68, Technical report 35, Berlin
Schwefel H-P (1975) Evolutionsstrategie und numerische Optimierung. PhD thesis, Technical University of Berlin
Shir O, Bäck T (2009) Experimental optimization by evolutionary algorithms. In: Tutorial at the
international conference on genetic and evolutionary computation
Shir OM (2008) Niching in derandomized evolution strategies and its applications in quantum
control: a journey from organic diversity to conceptual quantum designs. PhD thesis, University
of Leiden
Small BG, McColl BW, Allmendinger R, Pahle J, López-Castejón G, Rothwell NJ, Knowles J,
Mendes P, Brough D, Kell DB (2011) Efficient discovery of anti-inflammatory small molecule
combinations using evolutionary computing. Nat Chem Biol (to appear)
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Syswerda G (1989) Uniform crossover in genetic algorithms. In: Proceedings of the international
conference on genetic algorithms, pp 29
Syswerda G (1991) A study of reproduction in generational and steady state genetic algorithms.
In: Foundations of genetic algorithms, pp 94101
Thompson A (1996) Hardware evolution: automatic design of electronic circuits in reconfigurable
hardware by artificial evolution. PhD thesis, University of Sussex
Vaidyanathan S, Broadhurst DI, Kell DB, Goodacre R (2003) Explanatory optimization of protein
mass spectrometry via genetic search. Anal Chem 75(23):66796686
Vose MD, Liepins GE (1991) Punctuated equilibria in genetic search. Complex Syst 5:3144
Zhang W (2001) Phase transitions and backbones of 3-SAT and maximum 3-SAT. In: Proceedings
of the international conference on principles and practice of constraint programming, pp 153167

Chapter 5
Incremental Approximation Models for Constrained Evolutionary Optimization
Sanghoun Oh and Yaochu Jin

Abstract Many real-world scientific and engineering problems are constrained optimization problems (COPs). To solve these problems, a variety of evolutionary
algorithms have been proposed by incorporating different constraint-handling techniques. However, many of them have difficulties in achieving the global
optimum due to the presence of highly constrained feasible regions in the search
space. To effectively address the low degree of feasibility, this chapter presents an
incremental approximation strategy-assisted constraint-handling method in combination with a multi-membered evolution strategy. In the proposed approach, we
generate an approximate model for each constrained function with increasing accuracy, from a linear-type approximation to a model that has a complexity similar to
the original constraint functions, thereby manipulating the complexity of the feasible
region. Thanks to this property, our constrained evolutionary optimization algorithm
can acquire the optimal solution conceivably. Simulations are carried out to compare
the proposed algorithm with well-known references on 13 benchmark problems and
three engineering optimization problems. Our computational results demonstrate that
the proposed algorithm is comparable or superior to the state of the art on most of
the test problems used in this study and a spring design optimization problem.
Keywords Constrained optimization · Evolutionary algorithms · Approximation · Surrogate

S. Oh (B)
School of Information and Communications,
Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
e-mail: oosshoun@gist.ac.kr
Y. Jin
Department of Computing, University of Surrey,
Guildford, Surrey GU2 7XH, UK
e-mail: yaochu.jin@surrey.ac.uk
Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,
Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_5


5.1 Introduction
Evolutionary algorithms (EAs) have been widely employed to solve constrained optimization problems (COPs), which
are commonly seen in solving real-world optimization problems (Jin et al. 2010; Oh
et al. 2011). Without loss of generality, COPs can be formulated as a minimization
problem subject to one or more (in)equality constraints as follows:
minimize f(x), x = (x_1, . . . , x_n) ∈ R^n    (5.1)
subject to h_i(x) = 0, i = {1, 2, . . . , r}    (5.2)
g_j(x) ≤ 0, j = {r + 1, . . . , m},    (5.3)

where R^n is the n-dimensional search space, each design variable is positioned within the parametric constraints x̲_i ≤ x_i ≤ x̄_i, i = {1, . . . , n}, f(x) is the objective function, and h_i(x) and g_j(x) are the r equality constraints and the m − r inequality constraints, respectively.
In COPs, conventional evolutionary approaches generally struggle with highly constrained feasibility, particularly in problems with separated, small feasible regions. To cope with this limitation, a considerable number of evolutionary optimization algorithms have been suggested by incorporating various constraint-handling techniques: penalty functions, separation of objective and constraints, special operators, and hybrid techniques (Coello 2002; Michalewicz and Schoenauer 1996).
1. Penalty functions reduce a COP to an unconstrained optimization problem by penalizing the objective function with penalty factors λ_j applied to the constraint violations. The penalized objective function can be defined as follows:

F(x, λ) = f(x) + Σ_{j=1}^{r} λ_j H_j^β + Σ_{j=r+1}^{m} λ_j G_j^γ    (5.4)
        = f(x) + Σ_{j=1}^{m} λ_j Ĝ_j,    (5.5)

where H_j = |h_j(x)| and G_j = max{0, g_j(x)} are functions of the constraints h_j and g_j, and β and γ are constants which are set to 1 or 2, respectively. By virtue of introducing a small tolerance value ε, equality constraints can be converted into inequality constraints, i.e., |H_j| − ε ≤ 0 (Coello 2002). Thus, given that β = γ = 1, the original formula (5.4) can be reformulated as (5.5), where Ĝ_j indicates the inequality constraints Ĝ_j ∈ {|H_j| − ε, G_j}. The penalty function-based approaches may work well for some COPs; however, it is not straightforward to determine an optimal value for the penalty factor. In particular, a too small value of λ_j may mislead the EA because of insufficient penalty. By contrast, a too large penalty factor may prevent the EA from finding the optimal solution. To determine the penalty factor, four types of penalty handling methods such as death penalties, static penalties, dynamic penalties, and adaptive penalties have been proposed (Coello 2002).
2. Another constraint-handling approach is the separate consideration of the objective and the constraints during optimization. It is typically categorized into three major techniques. The first approach was the stochastic ranking evolution strategy (SRES) proposed by Runarsson and Yao (2000). The aim of SRES was to balance the influence of the objective function and the constraints in selection by using the dominance comparison between the fitness and constraint violations, controlled by the user-defined parameter P_f. Coello and Montes suggested a method (Coello and Montes 2002) inspired by a well-known constraint technique in the niched-Pareto genetic algorithm; it uses a new dominance-based selection scheme to integrate constraints into the fitness function used for global optimization. Montes and Coello introduced another method based on a simple diversity mechanism (Montes and Coello 2005).
3. A few ad hoc constraint-handling techniques, viz., special representations and operators, have also been suggested (Coello 2002). The fundamental idea is to simplify the shape of the feasible search space and to preserve feasible solutions found during the evolutionary process. Several examples are Davis's work (Davis and Mitchell 1991), random keys (Bean 1994), GENOCOP (Michalewicz 1996), constraint consistent GAs (Kowalczyk 1997), locating the boundary of the feasible region (Glover and Kochenberger 1996), and a homomorphous mapping (HM) to transform a COP into an unconstrained one using a high-dimensional cube and a feasible search space (Koziel and Michalewicz 1999).
4. Finally, hybrid techniques have also been proposed. They combine evolutionary search with a mathematical or heuristic approach such as Lagrangian multipliers (Adeli and Cheng 1994), fuzzy logic (Le 1995), immune systems (Smith et al. 1993), cultural algorithms (Reynolds 1994), differential evolution (Das and Suganthan 2011), and ant colony optimization (Dorigo and Gambardella 1997).
This chapter is concerned with constrained optimization problems that are affected by highly constrained feasible regions, i.e., separated and small feasible regions. To systematically alleviate the low degree of feasibility, we propose an incremental approximation model-assisted constraint-handling approach. The model starts with a rough approximation of the constraints using a linear model. As the evolution proceeds, the accuracy of the approximate constraint functions increases gradually; at the end of the search process, fully accurate approximate constraint functions are desired. In this approach, an originally stationary optimization problem is converted into a dynamic optimization problem (Paenke et al. 2006; Nguyen et al. 2012; Jin et al. 2013) to make the problem easier to solve. Here, the approximate model, also known as a surrogate (Jin 2011), plays a key role.
In this study, we adopt two representative methods, i.e., neural networks and GP, for constructing the approximate models. The proposed algorithms have been compared with a few state-of-the-art algorithms on 13 benchmark problems and a tension/compression design optimization problem.


Use of approximate models or surrogates for solving constrained optimization


problems has been reported. For example, quadratic approximation models have
been used to estimate both the objective function and constraints (Wanner et al.
2005), which has been shown to enhance the convergence performance. In addition,
surrogate models have also been used to approximate computationally expensive
constraint functions in (Goh et al. 2011; Regis 2014). However, none of the above
work intentionally controls the complexity of the approximate model to manipulate
the size of feasible region.
The rest of this chapter is organized as follows. In Sect. 5.2.1, we discuss our
hypothesis and the basic idea of the work, followed by Sect. 5.2.2 that provides a
brief description of the evolutionary algorithm used in this work, and the details of
our approach for COPs are presented in Sect. 5.2.3. Empirical studies on the test
functions and spring design optimization are presented in Sect. 5.3. This chapter is
concluded with a brief summary in Sect. 5.4.

5.2 The Proposed Constrained Evolutionary Optimization Algorithm
5.2.1 Incremental Approximation of the Constraint Functions
The highly constrained feasible regions in COPs, as illustrated in Fig. 5.1, prevent
evolutionary search algorithms from achieving the global optimum (Jin et al. 2010).
Here, the feasibility proportion1 is measured with respect to the whole search space. To
cope with this problem, we synthetically enlarge the feasible regions by means of
approximating the constraint functions.
In the first stage of evolutionary search, the proposed model approximates the original constraint functions roughly by using a small number of sampled data points for training. Step by step, we increase the accuracy of the approximate constraints by increasing the number of samples. In this manner, we are able to secure a large
feasible region in the beginning and resort to the original feasible region at the end
of evolutionary search.
We adopt the incremental approximation technique for building good approximate models of the constraints since it satisfies our assumption well; that is, the accuracy increases with the number of training data. Figure 5.2 shows the procedure of our incremental approximation of nonlinear constraints. In the beginning, a small number of training data are sampled from the constrained functions to obtain a rough approximation of the constraints, as shown in Fig. 5.2b. As the number of sampled data points increases, our approximation of the nonlinear constraints becomes more accurate, as described in Fig. 5.2c. Note however that the system should switch back to the original constraints at the end of the evolutionary optimization so that the obtained optimal solutions are always feasible.

1 It is defined as |F|/|S|, where |S| is the number of random solutions generated (|S| = 1,000,000) and |F| is the number of feasible solutions found out of the total |S| solutions randomly generated (Michalewicz and Schoenauer 1996).

Fig. 5.1 Illustrations of feasible regions and feasibility proportion in two benchmark problems. a Benchmark problem: g06. b Benchmark problem: g08

Fig. 5.2 Synthetical change of the feasible regions by incremental approximation models of two constrained functions. a The design space has small feasible regions with two nonlinear constrained functions. b With a linear approximation of both constraints, the approximated feasible regions become larger. c The approximate nonlinear constraint functions become more accurate with respect to the original constraints
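To illustrate the incremental idea, the sketch below refits a surrogate of a one-dimensional constraint with a growing sample budget and growing model capacity at each stage. It uses simple polynomial regression purely for illustration, whereas the chapter builds its approximate models with neural networks and GP; all function and parameter names here are our own.

```python
import numpy as np

def approximate_constraint(g, lower, upper, stage, max_stage, rng):
    """Fit an increasingly accurate surrogate of a one-dimensional constraint g.

    Early stages use few samples and a low-degree (near-linear) polynomial,
    which tends to relax the feasible region; later stages use more samples
    and a higher degree, approaching the original constraint.
    """
    n_samples = 5 + 10 * stage                       # sampling budget grows with the stage
    degree = 1 + round(4 * stage / max_stage)        # model capacity grows as well
    x = rng.uniform(lower, upper, n_samples)
    y = np.array([g(v) for v in x])
    coeffs = np.polyfit(x, y, min(degree, n_samples - 1))
    return np.poly1d(coeffs)                         # callable approximate constraint

rng = np.random.default_rng(0)
g = lambda v: v**2 - v + 1                           # a toy nonlinear constraint g(x) <= 0
for stage in range(4):                               # accuracy increases stage by stage
    g_hat = approximate_constraint(g, -2.0, 2.0, stage, max_stage=3, rng=rng)
    print(stage, float(g_hat(0.5)), g(0.5))
```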

5.2.2 Evolution Strategies with Stochastic Ranking


To successfully achieve the global optimum, we adopt a multi-membered evolution strategy, a (μ, λ)-ES, based on stochastic ranking (SR) selection. In our EOA, each individual is composed of a set of two real-valued vectors (x, σ) = {(x_1, . . . , x_n), (σ_1, . . . , σ_n)}, where x is the vector of design variables, σ is the vector of step sizes, and n is the dimension of the given problem. In the initialization, the two vectors are generated by a uniform distribution within the lower bound x̲_j and the upper bound x̄_j, and within (x̄_j − x̲_j)/√n, j = {1, . . . , n}, respectively, where n is the number of decision variables.
To produce high-quality offspring (λ) from the parents (μ), genetic operators such as global intermediate recombination and Gaussian mutation are applied. The former generates a new step size by taking the arithmetic average of two individuals, which are stochastically selected from the parent population. This operator is formulated as follows:
σ_{h,j}^{(g)} = (σ_{i,j}^{(g)} + σ_{k,j}^{(g)}) / 2,        (5.6)

where h = {1, ..., λ}, i = {1, ..., μ}, j = {1, ..., n}, and k is a randomly chosen parent index. This recombination operator is iterated until λ offspring are generated.
After the recombination operator, the mean step sizes are updated by means of a log-normal rule (5.7) for the mutation operator.


σ_{h,j}^{(g+1)} = σ_{h,j}^{(g)} exp(τ′ N(0, 1) + τ N_j(0, 1)),        (5.7)

where τ and τ′ are learning rates defined as τ = φ/√(2√n) and τ′ = φ/√(2n), φ is an expected rate of convergence which is set to 1, and N(0, 1) is the normal distribution with zero mean and unit variance. Then, each design variable is mutated in the following manner:
x_{h,j}^{(g+1)} = x_{h,j}^{(g)} + σ_{h,j}^{(g+1)} N_j(0, 1).        (5.8)
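The variation steps (5.6)–(5.8) can be summarized in a short sketch. The following Python snippet is only an illustration under the assumptions stated in the comments (names such as reproduce, pop_x, and pop_sigma are ours, not the authors'); it is not the chapter's implementation.

import numpy as np

def reproduce(pop_x, pop_sigma, n_offspring, phi=1.0, rng=None):
    """Sketch of (mu, lambda)-ES variation following Eqs. (5.6)-(5.8).
    pop_x, pop_sigma: arrays of shape (mu, n) with parent variables and step sizes."""
    rng = rng or np.random.default_rng()
    mu, n = pop_x.shape
    tau = phi / np.sqrt(2.0 * np.sqrt(n))   # coordinate-wise learning rate
    tau_prime = phi / np.sqrt(2.0 * n)      # global learning rate
    off_x = np.empty((n_offspring, n))
    off_sigma = np.empty((n_offspring, n))
    for h in range(n_offspring):
        i, k = rng.integers(mu), rng.integers(mu)
        # global intermediate recombination of the step sizes, Eq. (5.6)
        sigma = 0.5 * (pop_sigma[i] + pop_sigma[k])
        # log-normal self-adaptation of the step sizes, Eq. (5.7)
        sigma = sigma * np.exp(tau_prime * rng.normal() + tau * rng.normal(size=n))
        # Gaussian mutation of the design variables, Eq. (5.8);
        # starting the offspring from parent i is our assumption, since the
        # chapter leaves the recombination of x implicit
        off_x[h] = pop_x[i] + sigma * rng.normal(size=n)
        off_sigma[h] = sigma
    return off_x, off_sigma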

Next, we employ the SR selection strategy, which is a stochastic bubble-sort selection scheme, to balance the objective and the constraint violations. In this selection, a probability P_f is set with which only the objective function is used for comparisons when ranking infeasible solutions (Runarsson and Yao 2000). Note that in our work, we utilize our designated constraints for calculating the constraint violations:
G(x) = Σ_{j=1}^{m} (max{0, g̃_j(x)})^β,        (5.9)

where G(x) denotes the sum of all constraint violations and the constant β is set to 1. Our designated constraints are called the synthesized constraints g̃_j(x) ∈ {g_j(x), ĝ_j(x)}; they are assembled by comparing the degree of feasibility obtained with the original constraint g_j(x) and with the incremental approximate constraint ĝ_j(x).
Given the pair of objective value and constraint violation (f(x_j), G(x_j)), where x_j denotes the solution of the jth offspring individual, j = {1, ..., λ}, the offspring are ranked according to the stochastic ranking algorithm. The details of the stochastic ranking algorithm can be found in Runarsson and Yao (2000).
In our algorithm, all equality constraints are converted into inequalities by introducing a tolerance ε, i.e., |h_j(x)| − ε ≤ 0, where the associated constant is set to 1. The parameter ε is updated according to the generation number, as formulated below (Hamida and Schoenauer 2002).
ε(t + 1) = ε(t)/fact.        (5.10)

Here, the initial value of the tolerance, ε(0), and the reduction factor, fact, are set to 3 and 1.0168, respectively, as recommended in (Hamida and Schoenauer 2002). This approach is analogous to our proposed approximation of constraints because of the dynamic setting of the tolerance. In other words, the accuracy of the altered constraints increases gradually over the generations. Thanks to this property, we need not apply our approximation mechanism to the equality constraints.
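As a small illustration of this dynamic tolerance, the sketch below converts equality constraints into inequalities and tightens ε once per generation according to (5.10). The starting value and the reduction factor follow the numbers quoted above; the function names are our own, and the snippet is only a sketch, not the authors' code.

def equality_violations(h_values, eps):
    """Convert equality constraints h_j(x) = 0 into |h_j(x)| - eps <= 0."""
    return [abs(h) - eps for h in h_values]

def update_tolerance(eps, fact=1.0168):
    """Dynamic tolerance update of Eq. (5.10): eps(t+1) = eps(t) / fact."""
    return eps / fact

# usage: start from eps = 3 and shrink it once per generation
eps = 3.0
for generation in range(100):
    eps = update_tolerance(eps)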

5.2.3 The Proposed Constrained Evolutionary Optimization Algorithm
We propose the incremental approximation approach to handle highly constrained feasible regions by synthetically enlarging the feasible regions. The proposed constraint-handling technique is embedded in our evolution strategy using the SR selection. The main components of the proposed evolutionary algorithm are depicted in Fig. 5.3. The major feature of our algorithm is that a set of synthesized constraints is created and used in the SR selection. Figure 5.4 describes the procedure of how to create these constraints. In the initial step, we derive the approximate models of the original constraint functions by the incremental approximation technique. Based on this handling method, we are able to attain a synthesized search space larger than the original one. However, the approximate constraints can occasionally lead to premature convergence.

Fig. 5.3 Diagram of the proposed constrained evolutionary optimization algorithm

Fig. 5.4 Synthesized constraints via a competition between original and approximate constraints, where N_F is the number of feasible solutions and N_oc is the number of original constraints

To effectively deal with this problem, we establish a set of synthesized constraints by letting the approximate constraints and the given original constraints compete on the basis of the number of feasible solutions in the population. Thanks to this manipulation of both sets of constraints, we are able to navigate the evolutionary algorithm toward the global optimum. In particular, for the jth constraint, if the original constraint function g_j(x) yields more feasible solutions than the approximate constraint ĝ_j(x), the original constraint function is included in the synthesized constraint, g̃_j(x) = g_j(x). Otherwise, the approximate constraint function is included, g̃_j(x) = ĝ_j(x). Also, in the case of an equality constraint, we regard the original constraint as the synthesized constraint without comparing it with the approximate model, for the sake of simplicity, partly because the dynamically set tolerance works in a sense similar to the approximate constraints.
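The competition that assembles the synthesized constraints can be sketched as follows. This is only an illustration of the rule described above; the population pop, the callables in originals/approximations, and the helper names are assumptions made for the example.

def count_feasible(constraint, pop):
    """Number of individuals in pop that satisfy one inequality constraint g(x) <= 0."""
    return sum(1 for x in pop if constraint(x) <= 0.0)

def synthesize_constraints(originals, approximations, equality_flags, pop):
    """Pick, per constraint, whichever of g_j or its approximation yields more
    feasible solutions; equality constraints keep the original definition."""
    synthesized = []
    for g, g_hat, is_equality in zip(originals, approximations, equality_flags):
        if is_equality or count_feasible(g, pop) >= count_feasible(g_hat, pop):
            synthesized.append(g)
        else:
            synthesized.append(g_hat)
    return synthesized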
To properly update the approximate models as evolution proceeds, we specify the update generations as t_k = t_{k−1} + 10(k − 1)² = t_0 + 10 Σ_{i=1}^{k} (i − 1)², where t_k is the generation number at which the incremental approximation model is re-trained, the initial generation t_0 is set to 0, and k is the sampling time, k = {1, 2, ..., k_max}.


However, the condition t_{k_max} ≤ t_max should be satisfied, where t_max is the allowed maximum number of generations. During the remaining t_max − t_{k_max} generations, only the original constraint functions are considered, to guarantee that the obtained optimal solution is feasible and to avoid the under-fitting problem. We also need to formulate how many samples are used for training the approximation model of each constraint function. In this work, we heuristically designate the number of samples as N_k = n_j k², where n_j is the number of design variables involved in the jth constraint function and k is the number of sampling times, k = {1, 2, ..., k_max}.
For instance, in the initial generation (k = 1) of approximate constraint functions on g08, each pair of training data (2 · 1²) is sampled individually, because both constraints, g1(x) = x1² − x2 + 1 ≤ 0 and g2(x) = 1 − x1 + (x2 − 4)² ≤ 0, involve only the two variables x1 and x2. Based on the two sampled data points, we obtain two approximate models derived by GP, one of the representative symbolic regression methods, for the two constraints of g08, i.e., ĝ1(x) = 3x1 − x2 + 1 ≤ 0 and ĝ2(x) = x1 − x2 + 11 ≤ 0, as shown in Fig. 5.2b. Later, we compare the number of feasible solutions obtained with each approximate constraint ĝ_j and with the original constraint g_j, j = {1, 2}. Based on these comparisons, we create the set of synthesized constraints, i.e., g̃_j(x) = {ĝ1(x), ĝ2(x)}, since both approximate constraints result in more feasible solutions than the original ones.
Our assumption is that the initial approximate models start from a simple model, such as a linear approximation of the nonlinear constraints. Then we increase the number of samples as evolution proceeds, so that more accurate approximate models are achieved. In particular, at the sixth sampling time (k = 6) of g08, the approximate models are updated in generation 550, t_6 = t_0 + 10 Σ_{i=1}^{6} (i − 1)², and 72 samples are generated following the defined rule, N_6 = 2 · 6². Based on the sampled data, we approximate both constraints as ĝ1(x) = x1² − x2 + cos(sin(x2)) ≤ 0 and ĝ2(x) = 1 − x1 + (x2 − 4)² ≤ 0 by GP (see Fig. 5.2c). At this time, we compose the synthesized constraints g̃_j(x) = {ĝ1(x), ĝ2(x)} by comparing the approximate constraints with the original ones according to their feasibility degrees.
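The update schedule t_k and the sample-count rule N_k described above can be computed as in the following sketch (the function names are our own):

def update_generations(k_max, t0=0):
    """Generations at which the models are re-trained: t_k = t0 + 10 * sum_{i=1}^{k} (i-1)^2."""
    return [t0 + 10 * sum((i - 1) ** 2 for i in range(1, k + 1)) for k in range(1, k_max + 1)]

def num_samples(n_j, k):
    """Number of training samples for the jth constraint at sampling time k: N_k = n_j * k^2."""
    return n_j * k ** 2

# For g08 (two variables per constraint) and k_max = 7, this reproduces the
# schedule {0, 10, 50, 140, 300, 550, 910} and N_6 = 2 * 36 = 72 samples.
print(update_generations(7))   # [0, 10, 50, 140, 300, 550, 910]
print(num_samples(2, 6))       # 72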
The location of the samples is determined by Latin hypercube sampling (LHS), which generates samples in an arbitrary number of dimensions such that each sample is the only one in each axis-aligned hyperplane containing it (Jin and Branke 2005).
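A minimal Latin hypercube sampler with this one-sample-per-slice property can be written as below; the bounds handling and all names are our own assumptions, and the snippet is only a sketch.

import numpy as np

def latin_hypercube(n_samples, lower, upper, rng=None):
    """Latin hypercube sampling: each of the n_samples equal-width slices per
    dimension contains exactly one sample."""
    rng = rng or np.random.default_rng()
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    samples = np.empty((n_samples, dim))
    for d in range(dim):
        edges = np.linspace(0.0, 1.0, n_samples + 1)
        # one point inside each slice, then shuffled along this dimension
        points = edges[:-1] + rng.random(n_samples) / n_samples
        rng.shuffle(points)
        samples[:, d] = lower[d] + points * (upper[d] - lower[d])
    return samples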
Two incremental approximation models are adopted in this study: a neural network-assisted approximation model and a genetic programming (GP) guided approximation model.
Neural network-assisted approximation model for ES: NNA-ES
In this work, we adopt a multi-layer perceptron (MLP) network with one hidden layer (Reed and Marks 1998) (refer to Fig. 5.5) for approximating the nonlinear constraints. Both the hidden neurons and the output neuron use a tan-sigmoid transfer function. The number of input nodes equals the number of parameters in the constraint function plus one (a constant input as bias), the number of hidden nodes is set to three times that of the input nodes, and the number of output nodes is one.
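A rough sketch of how such a network could be sized is given below, using NumPy and tanh as a stand-in for the tan-sigmoid transfer function; the weight initialization and the training loop are omitted, and all names are our own assumptions rather than the authors' implementation.

import numpy as np

def build_mlp(n_constraint_vars, rng=None):
    """Create weights for an MLP sized by the rule in the text:
    inputs = variables + 1 bias, hidden = 3 * inputs, one output."""
    rng = rng or np.random.default_rng()
    n_in = n_constraint_vars + 1          # +1 constant bias input
    n_hidden = 3 * n_in
    w1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
    w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
    return w1, w2

def mlp_forward(x, w1, w2):
    """Forward pass with tanh in both the hidden and output layers."""
    x = np.append(x, 1.0)                 # append the constant bias input
    hidden = np.tanh(x @ w1)
    return np.tanh(hidden @ w2)[0]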

Fig. 5.5 Illustration of a multi-layer perceptron network

Fig. 5.6 Example of expanded parse tree (EPT)

Genetic programming guided approximation model for ES: GPA-ES


To obtain an adjustable approximation of the constraint functions, we adopt a new type of GP to replace the conventional GP, whose chromosomes are represented in a nonlinear style (i.e., with variable length), which causes a difficulty in applying the crossover operator (Oh et al. 2009). To tackle this problem, each chromosome of our GP, which is a candidate solution (i.e., the approximate model of a constraint), is represented as a linear string by adding introns and selectors. This representation is termed an expanded parse tree (EPT), shown in Fig. 5.6, where solid and dashed lines denote internal and external nodes, respectively, and the gray nodes indicate introns (Oh et al. 2009). The initial population is generated uniformly at random from two predefined sets, a functional set and a terminal set. The former set consists of unary and binary functions F = {+, −, ×, ÷, sin, cos, L, R}, where ÷ is a protected division operator that returns the value 1 when dividing by 0, and L and R are selector operators with L(x1, x2) = x1 and R(x1, x2) = x2, respectively. The terminal set is composed of the design variables of the given COP, {x1, ..., xn}, and a random value within the range [0, 1]. Next, we evaluate the difference between the fitness of each chromosome and the output of the constraint function for the given inputs.


Fig. 5.7 Procedure of crossover

Fig. 5.8 Procedure of mutation

On the basis of the fitness value of each individual, our GP performs pairwise tournament selection without replacement to improve the average quality of the population by passing the high-quality chromosomes to the next generation. To explore the search space, the variation operators (i.e., crossover and mutation), which are described in Figs. 5.7 and 5.8, respectively, are applied to the selected chromosome(s). The GP iterates the two procedures of evaluation and genetic operation until a stopping criterion is satisfied. At the end, the GP is able to obtain a robust approximation of the original nonlinear constraint function. Based on the discovered approximate constraints, we assemble the synthesized constraints, which are then used in the SR selection.
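The functional set used by this GP can be made concrete with a few small Python functions; the protected division and the selector operators follow the definitions given above, while everything else in the snippet is illustrative and not taken from the chapter.

import math

def protected_div(a, b):
    """Protected division: return 1 when dividing by 0, as in the functional set F."""
    return 1.0 if b == 0 else a / b

def selector_l(a, b):
    """Selector L(x1, x2) = x1."""
    return a

def selector_r(a, b):
    """Selector R(x1, x2) = x2."""
    return b

# functional set F = {+, -, x, protected /, sin, cos, L, R}; sin and cos are unary
FUNCTION_SET = {
    '+': lambda a, b: a + b,
    '-': lambda a, b: a - b,
    '*': lambda a, b: a * b,
    '/': protected_div,
    'sin': math.sin,
    'cos': math.cos,
    'L': selector_l,
    'R': selector_r,
}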


5.3 Computational Studies


In this section, we compare the proposed incremental approximation guided algorithms, namely the neural network-assisted approximation approach for evolutionary strategy (NNA-ES) and the GP-guided approximation method for evolutionary strategy (GPA-ES), with a few state-of-the-art evolutionary algorithms for constraint handling on 13 benchmark functions in Sect. 5.3.1. We also compare our approach with six recently reported evolutionary methods on a spring design optimization problem in Sect. 5.3.2.

5.3.1 Results on 13 Benchmark Problems


We carry out a statistical analysis of the results on 13 benchmark functions widely used in the literature. Table 5.1 describes the attributes of the benchmark problems, where n is the number of design variables, |F|/|S| is the proportion of the feasible region in the entire search space, LI, NI, LE, and NE are the numbers of linear inequality, nonlinear inequality, linear equality, and nonlinear equality constraints, respectively, and a is the number of active constraints at the optimum solution (Liang et al. 2006).
In the proposed algorithm, we update the approximate models of the constraints at the heuristically predefined generations t_k = t_0 + 10 Σ_{i=1}^{k} (i − 1)² = {0, 10, 50, 140, 300, 550, 910}, where k is the update index, k = {1, 2, 3, 4, 5, 6, 7}, and t_0 is the initial generation, which is set to 0. During the remaining generations, we only use the original constraints to guarantee that the obtained solutions are feasible.
Table 5.1 Summary of 13 benchmark functions

fcn   n    Type of f    |F|/|S| (%)   LI   NI   LE   NE   a
g01   13   Quadratic      0.0111       9    0    0    0   6
g02   20   Nonlinear     99.9971       0    2    0    0   1
g03   10   Polynomial     0.0000       0    0    0    1   1
g04    5   Quadratic     52.1230       0    6    0    0   2
g05    4   Cubic          0.0000       2    0    0    3   3
g06    2   Cubic          0.0066       0    2    0    0   2
g07   10   Quadratic      0.0003       3    5    0    0   6
g08    2   Nonlinear      0.8560       0    2    0    0   0
g09    7   Polynomial     0.5121       0    4    0    0   2
g10    8   Linear         0.0010       3    3    0    0   6
g11    2   Quadratic      0.0000       0    0    0    1   1
g12    3   Quadratic      4.7713       0    1    0    0   0
g13    5   Nonlinear      0.0000       0    0    0    3   3


At that time, we require the training data for updating our approximations, which are sampled according to the prefixed rule N_k = n_j k² = {n_j, 4n_j, 9n_j, 16n_j, 25n_j, 36n_j, 49n_j}, where n_j is the number of design variables involved in the jth constraint function. Note that if N_k = 1, the minimum number of samples is set to 2, and if N_k ≥ 200, the maximum number of samples is set to 200.
In NNA-ES, the MLP is trained for 150 iterations every time the network models need to be updated, where the learning rate is set to 0.1. The system parameters of GPA-ES are designated as follows: the depth of the tree is set to 4, the size of the population is equal to the number of sampled data, the maximum number of generations is three times the population size, and the probabilities of crossover and mutation are set to 1.0 and 0.5, respectively.
To study our performances, we utilize the state-of-the-art EAs, which are briefly
described below:
1. Self-adaptive fitness formulation (SAFF) employed the penalty function method for solving COPs, where infeasible solutions that have a high fitness value are also favored in selection (Farmani and Wright 2005). In the SAFF, the constraint violations of infeasible solutions were handled by a designed two-stage penalty.
2. Homomorphous mapping (HM) designed a special operator (i.e., decoders) to
discover the optimal solution in COPs. Thanks to these decoders, all solutions
were mapped into n-dimensional cube for maintaining feasible states (Koziel and
Michalewicz 1999).
3. Stochastic ranking evolutionary strategy (SRES) considered the separation between objective and constraints (Runarsson and Yao 2000). This algorithm utilized
the SR selection mechanism to balance objective and constraint violations directly
and explicitly in the optimization with the probabilistic factor to include infeasible
solutions.
4. Simple multi-membered evolutionary strategy (SMES) was also based on separately handling the objective and the constraint violations (Montes and Coello 2005). Its main feature was to devise three mechanisms: a diversity mechanism, a combined recombination, and a reduction of the initial step sizes of the ES. All designed techniques were operated on the basis of the number of infeasible solutions in the population.
5. Adaptive tradeoff model-based evolutionary strategy (ATMES) was proposed
for facilitating a more explicit tradeoff between objective and constraints (Wang
et al. 2008). It developed three different search techniques which were classified
by the feasibility ratio in the current population.
Table 5.2 presents the parameter setups of each compared algorithm. It shows the
size of population, the number of generations, and the number of fitness evaluations.

5.3.1.1 Comparison with SAFF


The proposed NNA-ES and GPA-ES discovered a better best result on six problems (g02, g04, g05, g06, g07, and g09) and a similar best result on four problems (g01, g03, g08, and g11).


Table 5.2 Parameter setups of the compared algorithms, where (μ, λ) denotes the numbers of parents and offspring

Algorithm                            Population size   Generations   Fitness evaluations
SAFF (Farmani and Wright 2005)       70                20,000        1,400,000
HM (Koziel and Michalewicz 1999)     70                20,000        1,400,000
SRES (Runarsson and Yao 2000)        (30, 200)         1,200         240,000
SMES (Montes and Coello 2005)        (100, 300)        800           240,000
ATMES (Wang et al. 2008)             (50, 300)         800           240,000
NNA-ES                               (30, 200)         1,200         240,000
GPA-ES                               (30, 200)         1,200         240,000

Our first algorithm, NNA-ES, found a better best result on g10 than the SAFF; on the other hand, GPA-ES obtained a worse best result. In addition, our algorithms reached better or similar mean results on most of the problems, except for g04 and g06 in the case of GPA-ES and NNA-ES, respectively. No comparisons were made on two functions, g12 and g13, since the results from SAFF are not available.

5.3.1.2 Comparison with HM


Both our algorithms obtained better best results on all problems. The proposed algorithms also obtained superior or comparable mean results, whereas the HM found better mean results on two problems (g02 and g04). However, we were not able to make the comparison on three problems (g05, g12, and g13), as no results on these problems are available from HM.

5.3.1.3 Comparison with SRES


Compared to SRES, GPA-ES achieved better or similar best results on all problems. In addition, it found a better mean result on six problems (g02, g06, g07, g09, g10, and g13) and a similar result on five problems (g01, g03, g08, g11, and g12). SRES only discovered two better mean results, on g04 and g05. NNA-ES obtained best results superior or comparable to SRES in all cases except four instances (i.e., g02, g05, g06, and g10). Besides, it discovered a better or similar mean result on ten problems.

5.3.1.4 Comparison with SMES


Compared with SMES, both proposed algorithms discovered superior best results on four problems (g05, g07, g09, and g13) and comparable best results on seven problems (g01, g03, g04, g06, g08, g11, and g12). Also, each of our algorithms, NNA-ES and GPA-ES, found a competitive mean result on ten problems. Meanwhile, the SMES discovered slightly better mean results on four functions, g04, g06, g09, and g10. In particular, the mean value of SMES in g09 was smaller than that of both of our algorithms.

5.3.1.5 Comparison with ATMES


Compared to ATMES, the proposed NNA-ES found the same best solution in g09 of the 13 test functions, and a better best solution in test function g10. Our algorithm also achieved better mean and worst solutions than ATMES in test function g02. GPA-ES achieved a similar best result on eleven functions (g01, g03, g04, g05, g06, g07, g08, g09, g11, g12, and g13). The ATMES found a better solution in the best result on g10; on the other hand, we achieved a better best result on g02. GPA-ES also achieved a better mean solution than ATMES on the function g02.

5.3.1.6 Comparison Between NNA-ES and GPA-ES


Compared with NNA-ES, the GPA-ES algorithm discovered four better best solutions (g02, g05, g06, and g07) and eight similar best solutions (g01, g03, g04, g08, g09, g11, g12, and g13). Only one worse solution was found, on g10. For the mean results, GPA-ES discovered four better solutions and six similar solutions; in three problems (g02, g04, and g05), NNA-ES achieved better solutions.
The best results as well as the mean results of GPA-ES and the other compared algorithms on the above 13 benchmark problems are summarized in Tables 5.3 and 5.4, respectively. From these results, we could verify the performance of the proposed approach. However, our algorithm could not find better solutions than the other compared approaches on two test functions, g04 and g10.

5.3.2 Results on Spring Design Optimization


In addition to the test problems, we compare the two proposed algorithms, NNA-ES and GPA-ES, with six novel heuristic approaches utilizing various constraint-handling techniques on a spring design optimization problem. The reference algorithms are described below:

Table 5.3 Comparison of the best results obtained by the proposed algorithms as well as five reference algorithms on the 13 benchmark functions, where N.A. = Not Available. SAFF (Farmani and Wright 2005), HM (Koziel and Michalewicz 1999), SRES (Runarsson and Yao 2000), SMES (Montes and Coello 2005), ATMES (Wang et al. 2008)

fcn  Optimum     SAFF       HM         SRES       SMES       ATMES      NNA-ES     GPA-ES
g01  15.000      15.000     14.786     15.000     15.000     15.000     15.000     15.000
g02  0.803619    0.802970   0.799530   0.803481   0.803601   0.803388   0.803185   0.803532
g03  1.0000      1.0000     0.9997     1.0000     1.0000     1.0000     1.0000     1.0000
g04  30665.539   30665.500  30664.500  30665.539  30665.539  30665.539  30665.539  30665.539
g05  5126.498    5126.989   N.A.       5126.498   5126.599   5126.498   5126.505   5126.498
g06  6961.814    6961.800   6952.100   6961.814   6961.814   6961.814   6961.807   6961.814
g07  24.306      24.480     24.620     24.314     24.327     24.306     24.309     24.306
g08  0.095825    0.095825   0.095825   0.095825   0.095825   0.095825   0.095825   0.095825
g09  680.630     680.640    680.910    680.633    680.632    680.630    680.630    680.630
g10  7049.331    7061.340   7147.900   7053.064   7051.903   7052.253   7056.710   7081.948
g11  0.75        0.75       0.75       0.75       0.75       0.75       0.75       0.75
g12  1.000       N.A.       N.A.       1.000      1.000      1.000      1.000      1.000
g13  0.053950    N.A.       N.A.       0.054008   0.053986   0.053950   0.053950   0.053950

Table 5.4 Comparison of the mean results obtained by the proposed algorithms as well as five reference algorithms on the 13 benchmark functions, where N.A. = Not Available; standard deviations are given in parentheses

fcn  Optimal     SAFF                   HM                  SRES
g01  15.000      15.000 (0.00E+00)      14.708 (N.A.)       15.000 (0.00E+00)
g02  0.803619    0.790100 (1.20E-02)    0.796710 (N.A.)     0.775346 (2.35E-02)
g03  1.0000      0.9999 (7.50E-05)      0.9989 (N.A.)       1.0000 (2.90E-04)
g04  30665.539   30665.200 (4.85E-01)   30655.300 (N.A.)    30665.525 (6.32E-02)
g05  5126.498    5432.08 (3.89E+03)     N.A. (N.A.)         5132.882 (8.61E+00)
g06  6961.814    6961.800 (0.00E+00)    6342.600 (N.A.)     6875.442 (1.53E+02)
g07  24.306      26.580 (1.14E+00)      24.826 (N.A.)       24.364 (5.59E-02)
g08  0.095825    0.095825 (0.00E+00)    0.0891568 (N.A.)    0.095825 (2.82E-17)
g09  680.630     680.720 (5.92E-02)     681.160 (N.A.)      680.658 (4.20E-02)
g10  7049.331    7627.890 (3.73E+02)    8163.600 (N.A.)     7472.902 (4.20E-02)
g11  0.75        0.75 (0.00E+00)        0.75 (N.A.)         0.75 (4.20E-02)
g12  1.000       N.A. (N.A.)            N.A. (N.A.)         1.000 (0.00E+00)
g13  0.053950    N.A. (N.A.)            N.A. (N.A.)         0.083290 (9.70E-02)

fcn  SMES                   ATMES                  NNA-ES                 GPA-ES
g01  15.000 (0.00E+00)      15.000 (1.60E-14)      15.000 (0.00E+00)      15.000 (6.29E-07)
g02  0.785238 (1.67E-02)    0.790148 (1.30E-02)    0.794128 (8.04E-03)    0.791084 (8.03E-03)
g03  1.0000 (2.09E-04)      1.0000 (5.90E-05)      1.0000 (1.90E-04)      1.0000 (1.35E-05)
g04  30665.539 (0.00E+00)   30665.539 (7.40E-12)   30665.539 (2.05E-04)   30648.853 (4.98E+01)
g05  5174.492 (5.01E+01)    5127.648 (1.80E-14)    5133.481 (9.05E+00)    5152.634 (4.14E+01)
g06  6961.284 (1.85E+00)    6961.814 (4.60E-12)    6758.018 (1.62E+02)    6961.814 (4.63E-12)
g07  24.475 (1.32E-01)      24.316 (1.10E-02)      24.327 (2.01E-02)      24.315 (1.83E-02)
g08  0.095825 (0.00E+00)    0.095825 (2.80E-17)    0.095825 (2.82E-17)    0.095825 (2.82E-17)
g09  680.643 (1.55E-02)     680.639 (1.00E-02)     680.648 (2.74E-02)     680.648 (2.23E-02)
g10  7253.047 (1.36E+02)    7250.437 (1.20E+02)    7409.876 (4.38E+02)    7342.196 (2.25E+02)
g11  0.75 (1.52E-04)        0.75 (3.40E-04)        0.75 (9.80E-04)        0.75 (1.25E-03)
g12  1.000 (0.00E+00)       1.000 (1.00E-03)       1.000 (0.00E+00)       1.000 (4.10E-05)
g13  0.166385 (1.77E-01)    0.053959 (1.30E-05)    0.091730 (9.95E-02)    0.054024 (1.40E-04)


1. GA1 utilized a co-evolutionary mechanism to automatically adjust penalty factors


of a fitness function combined with a GA to find the optimal solution (Coello
2000).
2. GA2 proposed the separate consideration between objective and constraint
violations using the pair-wise tournament selection mechanism (Coello and
Montes 2002).
3. HE-PSO suggested a new particle swarm optimization (PSO) for solving COPs by adopting the death penalty mechanism, which does not use infeasible solutions at any stage of the procedure (Hu et al. 2003).
4. CPSO proposed a co-evolution-based PSO algorithm to provide a framework for handling decision solutions and constraints (He and Wang 2007a). The aim of this algorithm was to search for the optimal solutions and penalty factors.
5. HPSO utilized a feasibility-based rule to manage constraints without additional parameters and to quickly guide the particles into the feasible region (He and Wang 2007b). In addition, simulated annealing (SA) was applied to the best solution to avoid premature convergence.
6. NM-PSO integrated the Nelder-Mead (NM) simplex search method with the PSO algorithm (Zahara and Kao 2009). This algorithm adopted special operators, i.e., the gradient repair method and the constraint fitness priority-based ranking, to convert infeasible solutions into feasible ones.
The problem, taken from Arora, is to minimize the weight of a tension/compression spring subject to constraints on the minimum deflection, shear stress, surge frequency, and limits on the outside diameter and on the design variables, which are the wire diameter 0.05 ≤ x1 ≤ 2.0, the mean coil diameter 0.25 ≤ x2 ≤ 1.3, and the number of active coils 2.0 ≤ x3 ≤ 15.0.
minimize
    f(x) = (x3 + 2) x1² x2        (5.11)

subject to
    g1(x) = 1 − (x2³ x3)/(71785 x1⁴) ≤ 0
    g2(x) = (4x2² − x1 x2)/(12566 (x1³ x2 − x1⁴)) + 1/(5108 x1²) − 1 ≤ 0
    g3(x) = 1 − (140.45 x1)/(x2² x3) ≤ 0
    g4(x) = (x1 + x2)/1.5 − 1 ≤ 0.        (5.12)

Table 5.5 summarizes the statistical results, i.e., the best, mean, worst, and standard deviation outcomes of all algorithms. It can be seen in Table 5.5 that the performance of GPA-ES is even better than those of the compared algorithms, and our worst solution is smaller than the best values obtained by the compared ones. To sum up the experimental results and comparisons on the above engineering optimization problem, we could verify the superiority of the proposed incremental approximation-assisted algorithms.


Table 5.5 The comparison of the statistics on the tension/compression spring optimization problem

Method                          Best        Mean        Worst       St. dev
GA1 (Coello 2000)               0.0127048   0.0127690   0.0128220   3.94E-05
GA2 (Coello and Montes 2002)    0.0126810   0.0127420   0.0129730   5.90E-05
HE-PSO (Hu et al. 2003)         0.0126661   0.0127190   N.A.        6.45E-05
CPSO (He and Wang 2007a)        0.0126747   0.0127300   0.0129240   5.20E-04
HPSO (He and Wang 2007b)        0.0126652   0.0127072   0.0127190   1.58E-05
NM-PSO (Zahara and Kao 2009)    0.0126302   0.0126314   0.0126330   8.74E-07
NNA-ES                          0.0098725   0.0098741   0.0098930   4.69E-06
GPA-ES                          0.0098725   0.0098725   0.0098725   9.87E-03

5.4 Conclusion
This chapter has presented a new evolutionary algorithm for solving COPs. We particularly targeted problems that are highly constrained, so that the feasible regions are small and separated. To methodically solve the problems caused by an extremely low degree of feasibility, we suggested the incremental approximation models. Thanks to the manipulated feasible region that is gradually adapted by the approximate constraints, we could handle the highly constrained problems more effectively. We have empirically compared our approach with a few state-of-the-art algorithms for handling COPs on 13 benchmark problems and one engineering optimization problem. As a whole, the proposed methods have been shown to be promising, as they produced better or comparable results on most of the test problems.
Acknowledgments The authors would like to thank Chang Wook Ahn for useful discussions.

References
Adeli H, Cheng N-T (1994) Augmented Lagrangian genetic algorithm for structural optimization. J Aerosp Eng 7:104–118
Bean J (1994) Genetic algorithms and random keys for sequencing and optimization. ORSA J Comput 6:154–160
Coello CAC (2000) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113–127
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Coello CAC, Montes EM (2002) Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Adv Eng Inform 16(3):193–203
Das S, Suganthan P (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 15(1):4–31
Davis LD, Mitchell M (eds) (1991) Handbook of genetic algorithms. Van Nostrand Reinhold, New York
Dorigo M, Gambardella LM (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1:53–66
Farmani R, Wright J (2005) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445–455
Glover F, Kochenberger G (1996) Critical event tabu search for multidimensional knapsack problems. Meta-heuristics: theory and applications. Kluwer Academic Publishers, Dordrecht
Goh C, Lim D, Ma L, Ong Y, Dutta P (2011) A surrogate-assisted memetic co-evolutionary algorithm for expensive constrained optimization problems. In: IEEE congress on evolutionary computation, pp 744–749
Hamida S, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint handling. In: Proceedings of IEEE conference on evolutionary computation 2002, Honolulu, Hawaii, pp 82–87
He Q, Wang L (2007a) An effective co-evolutionary particle swarm optimization for constrained engineering design problems. Eng Appl Artif Intell 20(1):89–99
He Q, Wang L (2007b) A hybrid particle swarm optimization with a feasibility-based rule for constrained optimization. Appl Math Comput 186(2):1407–1422
Hu X, Eberhart R, Shi Y (2003) Engineering optimization with particle swarm. In: Proceedings of the IEEE swarm intelligence symposium 2003 (SIS 2003), Indianapolis, Indiana, pp 53–57
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61–70
Jin Y, Branke J (2005) Evolutionary optimization in uncertain environments – a survey. IEEE Trans Evol Comput 9:303–317
Jin Y, Oh S, Jeon M (2010) Incremental approximation of nonlinear constraint functions for evolutionary constrained optimization. In: Proceedings of IEEE conference on evolutionary computation 2010 (CEC 2010), Barcelona, Spain, pp 1–8
Jin Y, Tang K, Yu X, Sendhoff B, Yao X (2013) A framework for finding robust optimal solutions over time. Memet Comput 5(1):3–18
Kowalczyk R (1997) Constraint consistent genetic algorithms. In: Proceedings of IEEE international conference on evolutionary computation, Indianapolis, pp 343–348
Koziel S, Michalewicz Z (1999) Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evol Comput 7(1):19–44
Le TV (1995) A fuzzy evolutionary approach to constrained optimization problems. In: Proceedings of parallel problem solving from nature, Perth, pp 274–278
Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello CAC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs. Springer, New York
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4:1–32
Montes EM, Coello CAC (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17
Nguyen T, Yang S, Branke J (2012) Evolutionary dynamic optimization: a survey of the state of the art. Swarm Evol Comput 6:1–24
Oh S, Lee S, Jeon M (2009) Evolutionary optimization programming with probabilistic models. In: International conference on bio-inspired computing, Beijing, P.R. China, pp 1–6
Oh S, Jin Y, Jeon M (2011) Approximate models for constraint functions in evolutionary constrained optimization. Int J Innov Comput Inf Control 7(11):6585–6603
Paenke I, Branke J, Jin Y (2006) Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation. IEEE Trans Evol Comput 10(4):405–420
Reed RD, Marks RJ (1998) Neural smithing: supervised learning in feedforward artificial neural networks. MIT Press, Cambridge
Regis RG (2014) Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans Evol Comput 18(3):326–347
Reynolds RG (1994) An introduction to cultural algorithms. In: Proceedings of third annual conference on evolutionary programming. World Scientific, River Edge, pp 131–139
Runarsson T, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Smith RE, Forrest S, Perelson AS (1993) Searching for diverse, cooperative populations with genetic algorithms. Evol Comput 1:127–149
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary optimization. IEEE Trans Evol Comput 12(1):80–92
Wanner E, Guimaraes F, Takahashi RSR, Fleming P (2005) Constraint quadratic approximation operator for treating equality constraints with genetic algorithms. In: IEEE congress on evolutionary computation, pp 2255–2262
Zahara E, Kao Y-T (2009) Hybrid Nelder-Mead simplex search and particle swarm optimization for constrained engineering design problems. Expert Syst Appl 36(2):3880–3886

Chapter 6

Efficient Constrained Optimization by the ε Constrained Differential Evolution with Rough Approximation
Tetsuyuki Takahama and Setsuko Sakai

Abstract It has been proposed to utilize a rough approximation model, which is an approximation model with low accuracy and without a learning process, to reduce the number of function evaluations in unconstrained optimization. Although the approximation errors between the true function values and the approximation values estimated by the rough approximation model are not small, the rough model can estimate the order relation of two points with fair accuracy. The estimated comparison, which omits the function evaluation when the result of the comparison can be judged from the approximation values, has been proposed to exploit this nature of the rough model. In this chapter, a constrained optimization method is proposed by combining the ε constrained method and the estimated comparison, where the rough approximation is used not only for the objective function but also for the constraint violation. The proposed method is an efficient constrained optimization algorithm that can find near-optimal solutions in a small number of function evaluations. The advantage of the method is shown by solving well-known nonlinear constrained problems.
Keywords Rough approximation model · Constrained optimization · ε constrained method · Estimated comparison · Differential evolution

6.1 Introduction
Constrained optimization problems, especially nonlinear optimization problems,
where objective functions are minimized under given constraints, are important
and frequently appear in the real world. There exist several studies on solving

constrained optimization problems using evolutionary algorithms (EAs) (Coello


2002; Mezura-Montes and Coello 2011; Michalewicz 1995; Takahama and Sakai
2005a). EAs basically lack the mechanism to incorporate the constraints of a given
problem in the fitness value of individuals. Thus, numerous studies have been dedicated to handling the constraints in EAs. In most successful constraint-handling techniques, the objective function value and the sum of constraint violations, or the
constraint violation, are separately handled and an optimal solution is searched by
balancing the optimization of the function value and the optimization of the constraint
violation.
The ε constrained differential evolution (εDE) has been proposed, which adopted one such technique, called the ε constrained method, and also adopted differential evolution (DE) as an optimization engine. The εDE can solve constrained problems successfully and stably (Takahama and Sakai 2006, 2009b, 2010a, b), including engineering design problems (Takahama and Sakai 2006). The ε constrained method (Takahama and Sakai 2009b) is an algorithm transformation method, which can convert algorithms for unconstrained problems into algorithms for constrained problems using the ε level comparison, which compares search points or individuals based on the pair of their objective value and constraint violation. It has been shown that the method has general-purpose properties.
Generally, a disadvantage of EAs is that they need a large number of function
evaluations before a well-acceptable solution can be found. An effective method for
reducing function evaluations is to build an approximation model for the objective
function and to solve the problem using the approximation values (Jin 2005). If an
approximation model with high accuracy can be built, it is possible to largely reduce
the function evaluations. However, building a high quality approximation model is
difficult and time-consuming. It needs to learn the model from many pairs of known
solutions and their function value. Also, a proper approximation model depends on
the problem to be optimized. It is difficult to design a general-purpose approximation
model with high accuracy.
It has been proposed to utilize an approximation model with low accuracy and without a learning process to reduce the number of function evaluations effectively. In the following, this approximation model is called a rough approximation model. Although
the approximation errors between the true function values and the approximation values estimated by the rough approximation model are not small, the approximation
model can estimate whether the function value of a point is smaller than that of the
other point or not with fair accuracy. For example, Fig. 6.1 shows a correct order relation even when the errors between the true values and the approximation values are
large. In order to use this nature of the rough approximation model, estimated comparison (Sakai and Takahama 2010; Takahama and Sakai 2008a, b, 2009a, 2010c)
for unconstrained optimization has been proposed.
In the estimated comparison, the approximation values are compared first. When
a value is worse than the other value, the estimated comparison returns an estimated
result without evaluating the true function. When it is difficult to judge the result from
the approximation values, true values are obtained by evaluating the true function
and the estimated comparison returns a true result based on the true values. Using

the estimated comparison, the evaluation of the true function is sometimes omitted and the number of function evaluations can be reduced.

Fig. 6.1 A correct order relation in a rough approximation model
In this chapter, the estimated comparison is applied to constrained optimization, and εDEpm, which is a combination of the ε constrained method and the estimated comparison using a potential model (Takahama and Sakai 2013), is defined and improved by approximating not only the objective function but also the constraint violation. The potential model, which needs no learning process, is adopted as the rough approximation model (Takahama and Sakai 2008b). εDEpm is an efficient constrained optimization algorithm that can find near-optimal solutions in a small number of function evaluations. The effectiveness of εDEpm is shown by solving the well-known 13 constrained problems mentioned in Coello (2002) and comparing the results of εDEpm with those of representative methods. It is shown that εDEpm can solve the problems with a much smaller number of function evaluations, about half, compared with the representative methods.
In Sect. 6.2, constrained optimization methods and approximation methods are reviewed. The ε constrained method and the estimated comparison using the potential model are explained in Sects. 6.3 and 6.4, respectively. The εDEpm is described in Sect. 6.5. In Sect. 6.6, experimental results on the 13 constrained problems are shown and the results of εDEpm are compared with those of other methods. Finally, conclusions are described in Sect. 6.7.

6.2 Constrained Optimization and Previous Works


6.2.1 Constrained Optimization Problems
In this study, the following optimization problem (P) with inequality constraints, equality constraints, upper bound constraints, and lower bound constraints is discussed.

160

T. Takahama and S. Sakai

(P)  minimize  f(x)
     subject to  g_j(x) ≤ 0,  j = 1, ..., q
                 h_j(x) = 0,  j = q + 1, ..., m
                 l_i ≤ x_i ≤ u_i,  i = 1, ..., n,        (6.1)

where x = (x_1, x_2, ..., x_n) is an n-dimensional vector, f(x) is an objective function, and g_j(x) ≤ 0 and h_j(x) = 0 are q inequality constraints and m − q equality constraints, respectively. The functions f, g_j, and h_j are linear or nonlinear real-valued functions. The values u_i and l_i are the upper and lower bounds of x_i, respectively. Also, let the feasible space in which every point satisfies all constraints be denoted by F, and the search space in which every point satisfies the upper and lower bound constraints be denoted by S (⊇ F).

6.2.2 Constrained Optimization Methods


EAs for constrained optimization can be classified into several categories according
to the way the constraints are treated as follows (Takahama and Sakai 2005a):
1. Constraints are only used to see whether a search point is feasible or not.
Approaches in this category are usually called death penalty methods. In this category, generating initial feasible points is difficult and computationally demanding
when the feasible region is very small.
2. The constraint violation, which is the sum of the violation of all constraint
functions, is combined with the objective function. The penalty function method
belongs to this category (Coello 2000b; Homaifar et al. 1994; Joines and Houck
1994; Michalewicz and Attia 1994). The main difficulty of the method is the selection of an appropriate value for the penalty coefficient that adjusts the strength
of the penalty. In order to solve the difficulty, some methods, where a kind of the
penalty coefficient is adaptively controlled (Tessema and Yen 2006; Wang et al.
2008), are proposed.
3. The constraint violation and the objective function are used separately. In this
category, both the constraint violation and the objective function are optimized
by a lexicographic order in which the constraint violation precedes the objective function. Deb (2000) proposed a method that adopts the extended objective
function and realizes lexicographic ordering. Takahama and Sakai proposed the α constrained method (Takahama and Sakai 2000) and the ε constrained method (Takahama and Sakai 2005b), which adopt a lexicographic ordering with relaxation
of the constraints. Runarsson and Yao (2000) proposed the stochastic ranking
method that adopts the stochastic lexicographic order which ignores the constraint
violation with some probability. Mezura-Montes and Coello (2005) proposed a
comparison mechanism that is equivalent to lexicographic ordering. Venkatraman
and Yen (2005) proposed a two-step optimization method, which first optimizes
constraint violation and then objective function. These methods were successfully
applied to various problems.


4. Every constraint and objective function are used separately. In this category,
constrained optimization problems are solved as multi-objective optimization
problems in which the objective function and the constraint functions are objectives to be optimized (Aguirre et al. 2004; Camponogara and Talukdar 1997;
Coello 2000a; Ray et al. 2002; Runarsson and Yao 2003; Surry and Radcliffe
1997; Wang et al. 2007). However, in many cases solving a constrained problem
as a multi-objective optimization problem is a more difficult and expensive task
than solving the constrained problem as essentially a single objective optimization
problem in categories 1, 2, and 3.
5. Hybridization methods. In this category, constrained problems are solved by combining some of the above-mentioned methods. Mallipeddi and Suganthan (2010)
proposed a hybridization of the methods in categories 2, 3, and 4.

6.2.3 Evolutionary Algorithms Using Approximation Models


In this section, EAs using approximation models are briefly reviewed.
Various approximation models are utilized to approximate the objective function.
In most approximation models, model parameters are learned by the least square
method, gradient method, maximum likelihood method, and so on. In general, learning model parameters is a time-consuming process, especially to obtain models
with higher accuracy and models of larger functions such as functions with large
dimensions.
EAs with approximation models can be classified as follows:
1. All individuals have only approximation values. A high quality approximation
model is built and the objective function is optimized using approximation values
only. It is possible to reduce function evaluations greatly. However, these methods
can be applied to well-informed objective function and cannot be applied to
general problems.
2. Some individuals have approximation values and others have true values. The
methods in this type are called evolution control approaches and can be classified as individual-based and generation-based control. Individual-based control
means that good individuals (or randomly selected individuals) use true values
and others use approximation values in each generation (Jin et al. 2000; Jin and
Sendhoff 2004). Generation-based control means that all individuals use true values once in a fixed number of generations and use approximation values in other
generations (Jin et al. 2000, 2002). In the approaches, the approximation model
should be accurate because the approximation values are compared with the true
values. Also, it is known that approximation models with high accuracy sometimes generate a false optimum or hide a true optimum. Individuals may converge
into the false optimum while they are optimized using the approximation models
in some generations. Thus, these approaches are much affected by the quality of
the approximation models. It is difficult to utilize rough approximation models.


3. All individuals have true values. Some methods in this type are called surrogate
approaches. In surrogate approaches, an estimated optimum is searched using an
approximation model called a surrogate model, which is usually a local model.
The estimated optimum is evaluated, the true value is obtained, and the true value
is also used to improve the approximation model (Büche et al. 2005; Guimarães et al. 2006; Ong et al. 2006). If the true value is good, the value is included
as an individual. In the approaches, rough approximation models might be used
because approximation values are compared with other approximation values.
These approaches are less affected by the quality of the approximation model than
the evolution control approaches. However, they have the process of optimization
using the approximation model only. If the process is repeated many times, they
are much affected by the quality of the approximation model.
The estimated comparison method is classified in the last category because all
individuals have true values. However, the method is different from the surrogate
approaches. It uses a global approximation model of current individuals using the
potential model. It does not search for the estimated optimum, but judges whether
a new individual is worth evaluating its true value or not. Also, it can specify the
margin of approximation error when comparison is carried out. Thus, it is not much
affected by the quality of the approximation model.

6.3 The ε Constrained Method

6.3.1 Constraint Violation and ε Level Comparisons

In the ε constrained method, the constraint violation φ(x) is defined. The constraint violation can be given by the maximum of all constraint violations or by the sum of all constraint violations:
φ(x) = max{ max_{1≤j≤q} max{0, g_j(x)},  max_{q+1≤j≤m} |h_j(x)| }        (6.2)

φ(x) = Σ_{j=1}^{q} ||max{0, g_j(x)}||^p + Σ_{j=q+1}^{m} ||h_j(x)||^p        (6.3)

where p is a positive number.
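Both forms of the constraint violation can be written down directly; in the sketch below, g_list and h_list are assumed lists of constraint functions, and the code simply follows (6.2) and (6.3). It is an illustration of ours, not the chapter's implementation.

def violation_max(x, g_list, h_list):
    """Constraint violation as the maximum of all violations, Eq. (6.2)."""
    viol_g = [max(0.0, g(x)) for g in g_list]
    viol_h = [abs(h(x)) for h in h_list]
    return max(viol_g + viol_h) if (viol_g or viol_h) else 0.0

def violation_sum(x, g_list, h_list, p=2):
    """Constraint violation as the sum of violations raised to the power p, Eq. (6.3)."""
    return (sum(max(0.0, g(x)) ** p for g in g_list)
            + sum(abs(h(x)) ** p for h in h_list))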


The ε level comparison is defined as an order relation on a pair of objective function value and constraint violation (f(x), φ(x)). If the constraint violation of a point is greater than 0, the point is not feasible and its worth is low. The ε level comparisons are defined basically as a lexicographic order in which φ(x) precedes f(x), because the feasibility of x is more important than the minimization of f(x). This precedence can be adjusted by the parameter ε.


Let f_1 (f_2) and φ_1 (φ_2) be the function values and the constraint violations at a point x_1 (x_2), respectively. Then, for any ε satisfying ε ≥ 0, the ε level comparisons <_ε and ≤_ε between (f_1, φ_1) and (f_2, φ_2) are defined as follows:

(f_1, φ_1) <_ε (f_2, φ_2)  ⇔  f_1 < f_2,  if φ_1, φ_2 ≤ ε
                               f_1 < f_2,  if φ_1 = φ_2
                               φ_1 < φ_2,  otherwise        (6.4)

(f_1, φ_1) ≤_ε (f_2, φ_2)  ⇔  f_1 ≤ f_2,  if φ_1, φ_2 ≤ ε
                               f_1 ≤ f_2,  if φ_1 = φ_2
                               φ_1 < φ_2,  otherwise        (6.5)

In the case of ε = ∞, the ε level comparisons <_∞ and ≤_∞ are equivalent to the ordinary comparisons < and ≤ between function values. Also, in the case of ε = 0, <_0 and ≤_0 are equivalent to the lexicographic orders in which the constraint violation φ(x) precedes the function value f(x).
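A direct transcription of the ε level comparison (6.4) is shown below; the tuple layout (f, phi) and the function name are our own choices for the example.

def eps_less(a, b, eps):
    """epsilon-level comparison '<_eps' between a = (f1, phi1) and b = (f2, phi2), Eq. (6.4)."""
    f1, phi1 = a
    f2, phi2 = b
    if (phi1 <= eps and phi2 <= eps) or phi1 == phi2:
        return f1 < f2          # both (nearly) feasible: compare objective values
    return phi1 < phi2          # otherwise: compare constraint violations

# eps = float('inf') reduces to the ordinary comparison of objective values;
# eps = 0 gives the lexicographic order in which the violation precedes the objective.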

6.3.2 The Properties of the ε Constrained Method

The ε constrained method converts a constrained optimization problem into an unconstrained one by replacing the order relation in direct search methods with the ε level comparison. An optimization problem solved by the ε constrained method, that is, a problem (P_ε) in which the ordinary comparison is replaced with the ε level comparison, is defined as follows:
(P_ε)  minimize_ε  f(x),        (6.6)

where minimize_ε denotes the minimization based on the ε level comparison ≤_ε. Also, a problem (P^ε) is defined such that the constraint of (P), that is, φ(x) = 0, is relaxed and replaced with φ(x) ≤ ε:

(P^ε)  minimize  f(x)
       subject to  φ(x) ≤ ε,        (6.7)

where (P^0) is equivalent to (P) because a feasible solution satisfies φ(x) = 0.
For the three types of problems, (P_ε), (P^ε), and (P), the following theorems are given for the ε constrained method (Takahama and Sakai 2005b).
Theorem 1 If an optimal solution of (P^0) exists, any optimal solution of (P_ε) is an optimal solution of (P^ε).
Theorem 2 If an optimal solution of (P) exists, any optimal solution of (P_0) is an optimal solution of (P).
Theorem 3 Let {ε_n} be a strictly decreasing nonnegative sequence that converges to 0. Let f(x) and φ(x) be continuous functions of x. Assume that an optimal solution x* of (P^0) exists and an optimal solution x̂_n of (P_{ε_n}) exists for every ε_n. Then, any accumulation point of the sequence {x̂_n} is an optimal solution of (P^0).
Theorems 1 and 2 show that a constrained optimization problem can be converted into an equivalent unconstrained optimization problem by using the ε level comparison. So, if the ε level comparison is incorporated into an existing unconstrained optimization method, constrained optimization problems can be solved. Theorem 3 shows that, in the ε constrained method, an optimal solution of (P^0) can be given by converging ε to 0, just as it can be given by increasing the penalty coefficient to infinity in the penalty method.

6.4 Estimated Comparison Method for Constrained Optimization
The potential model is explained as a rough approximation model and the estimated
comparison method is described (Sakai and Takahama 2010; Takahama and Sakai
2008a, b, 2009a, 2010c).

6.4.1 Potential Model


Potential energy is stored energy that depends on the relative position of various parts of a system. The gravitational potential energy is an example of potential energy. If there is an object whose mass is m, there exists a gravitational potential E_g around the object. If there is another object whose mass is m′ at a distance r from the object, there exists an attractive force F_g between the two objects:

E_g = G m/r,   F_g = G m m′/r²        (6.8)

where G is a proportional constant, the gravitational constant.


It is supposed that when a solution x exists, there is a potential for the objective, U_f, and a potential for congestion, U_c, at a distance r from the solution, as follows:

U_f = f(x)/r^{p_d}        (6.9)
U_c = 1/r^{p_d}        (6.10)

where p_d is a positive number and is usually 1 or 2. The proportional constant is 1 for simplicity.
When a set of solutions X = {x_1, x_2, ..., x_N} is given and the objective values f(x_i), i = 1, 2, ..., N, are known, two potential functions at a point y can be defined as follows:

U_f(y) = Σ_i f(x_i)/d(x_i, y)^{p_d}        (6.11)
U_c(y) = Σ_i 1/d(x_i, y)^{p_d}        (6.12)

where d(x, y) is a distance between the points x and y.


It is obvious that U_f(y) gives a measure of the function value at y and U_c(y) gives the congestion around the point y. If U_f is big, the function value tends to be big. If U_c is big, there are many points near the point. The approximation value f̂(y) at the point y can be defined as follows:

f̂(y) = U_f(y)/U_c(y)        (6.13)

For example, if y is x_i, f̂(y) becomes f(x_i).
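The potential-based approximation (6.11)–(6.13) can be sketched as follows; a plain Euclidean distance is used here, and the normalized distance introduced later in this section could be substituted. The function name and the tie-breaking constant are our own additions.

import numpy as np

def approximate_value(y, xs, fs, p_d=2, tiny=1e-12):
    """Potential-model estimate f_hat(y) = U_f(y) / U_c(y) from known points xs with values fs."""
    y = np.asarray(y, float)
    u_f = u_c = 0.0
    for x, f in zip(xs, fs):
        d = np.linalg.norm(y - np.asarray(x, float)) ** p_d
        if d < tiny:
            return f            # y coincides with a known point: return its true value
        u_f += f / d            # potential for the objective, Eq. (6.11)
        u_c += 1.0 / d          # potential for congestion, Eq. (6.12)
    return u_f / u_c            # Eq. (6.13)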

6.4.2 Estimated Comparison


The estimated comparison is used to compare a new point with an old point. If the
new point is better than the old according to the approximation values, the new point
is evaluated and the comparison result using true values is returned. Otherwise, the
comparison returns no and the evaluation of the new one can be omitted. This flow
can be described as follows:
EstimatedBetter(new, old) {
  if(MaybeBetter(approximated new, approximated old)) {
    Evaluate new;
    if(Better(true new, true old)) return yes;
  }
  return no;
}

When the true function values (f(x_i), φ(x_i)) of all points in P = {x_i, i = 1, 2, ..., N} are known and a new child point x'_i is generated from a parent point x_i, the approximation values at the point x'_i are given as follows:

U_f(x'_i) = Σ_{j≠i} f(x_j)/d(x_j, x'_i)        (6.14)
U_c(x'_i) = Σ_{j≠i} 1/d(x_j, x'_i)        (6.15)
f̂(x'_i) = U_f(x'_i)/U_c(x'_i)        (6.16)


Also, f̂(x_i) is given by replacing x'_i with x_i. The approximation values of the constraint violation at the points x_i and x'_i are given as follows:

U_φ(x'_i) = Σ_{j≠i} φ(x_j)/d(x_j, x'_i)        (6.17)
φ̂(x_i^{(′)}) = U_φ(x_i^{(′)})/U_c(x_i^{(′)})        (6.18)

where x_i^{(′)} stands for either x_i or x'_i.

It should be noted that the parent point x_i (j = i) is omitted from the sums. If the parent point were not omitted, the approximation value at the parent point would be almost exact; the precision of the approximation at the parent point and at the child point would then differ greatly, and it would be difficult to compare the two approximation values.
When the search points are far from the feasible region, the ε level comparison is decided mainly by the constraint violations, so in this case the constraint violation values are approximated. When the search points are near the feasible region, the ε level comparison is decided mainly by the objective values, so in this case the objective values are approximated. The far and near cases are distinguished by the number of feasible solutions: in this study, the near case is identified when the ratio of feasible solutions in the population is greater than or equal to 0.8. The estimated comparison for constrained optimization using the ε constrained method can be defined as follows:
EstimatedBetterε(x′_i, x_i, δ) {
  if(the number of feasible solutions ≥ 0.8N) {
    // approximation of objective function
    if(f̂(x′_i) < f̂(x_i) + δσ) {
      Evaluate x′_i;
      if((f(x′_i), φ(x′_i)) <ε (f(x_i), φ(x_i))) return yes;
    }
  }
  else {
    // approximation of constraint violation
    if(φ̂(x′_i) < φ̂(x_i) + 2δ|φ(x_i) − φ̂(x_i)|) {
      Evaluate x′_i;
      if((f(x′_i), φ(x′_i)) <ε (f(x_i), φ(x_i))) return yes;
    }
  }
  return no;
}
where the true values (f(x_i), φ(x_i)) at the parent point are known. In this study, the error margin for the objective value is defined based on the error level of the population, whereas the error margin for the constraint violation is defined based on the error level of each individual, because feasible and infeasible solutions are thought to have different error levels. The error margin parameter δ ≥ 0 controls the margin for the approximation error. When δ is 0, the estimated comparison can reject many children and omit a large number of function evaluations; however, the possibility of rejecting a good child becomes high, and a true optimum may sometimes be missed. When δ is large, the possibility of rejecting a good child becomes low, but the estimated comparison rejects fewer children and omits only a small number of function evaluations. Thus, δ should be given a proper value.
The estimation error σ is given as the standard deviation of the errors between the approximation values and the true values:

σ = sqrt( (1/N) Σ_i (e_i − ē)² )   (6.19)
e_i = f̂(x_i) − f(x_i),   ē = (1/N) Σ_i e_i   (6.20)
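The decision logic above can be summarized in a short Python sketch. This is not the authors' implementation: the helper names approx, evaluate, and true_parent, and the way the population error level σ and the feasibility ratio are passed in, are assumptions made for illustration; the ε level comparison follows the definition of the ε constrained method given earlier in the chapter.

    def better_eps(f1, phi1, f2, phi2, eps):
        # epsilon level comparison <_eps: compare by f when both violations are
        # within eps (or when they are equal), otherwise by the violation phi
        if (phi1 <= eps and phi2 <= eps) or phi1 == phi2:
            return f1 < f2
        return phi1 < phi2

    def estimated_better_eps(child, parent, eps, delta, sigma,
                             feasible_ratio, approx, evaluate, true_parent):
        """Sketch of the estimated comparison for constrained optimization.

        approx(x)      -> (f_hat, phi_hat)  rough approximations (potential model)
        evaluate(x)    -> (f, phi)          true, expensive evaluation
        true_parent    =  (f, phi)          known true values of the parent
        feasible_ratio =  fraction of feasible individuals in the population
        """
        fh_c, ph_c = approx(child)
        fh_p, ph_p = approx(parent)
        f_p, phi_p = true_parent
        if feasible_ratio >= 0.8:
            # near case: screen by the approximated objective, margin delta*sigma
            promising = fh_c < fh_p + delta * sigma
        else:
            # far case: screen by the approximated violation, margin based on
            # the parent's own approximation error
            promising = ph_c < ph_p + 2.0 * delta * abs(phi_p - ph_p)
        if promising:
            f_c, phi_c = evaluate(child)
            if better_eps(f_c, phi_c, f_p, phi_p, eps):
                return True, (f_c, phi_c)
        return False, None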

In the potential model, the current population P is used as the set of solutions with known objective values. As the search progresses, the region occupied by the individuals may become elliptical. In order to handle such a case, a normalized distance is introduced, in which the distance is normalized by the width of each dimension in the current population P:

d(x, y) = sqrt( Σ_j ( (x_j − y_j) / (max_{x_i∈P} x_{ij} − min_{x_i∈P} x_{ij}) )² )   (6.21)
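A minimal sketch of Eq. (6.21), assuming the population is stored as a NumPy array (variable names and the zero-width guard are illustrative, not from the original text):

    import numpy as np

    def normalized_distance(x, y, P):
        """Distance of Eq. (6.21), normalized by the per-dimension width of P."""
        width = P.max(axis=0) - P.min(axis=0)
        width[width == 0.0] = 1.0   # guard against zero width (assumption)
        return np.sqrt(np.sum(((x - y) / width) ** 2))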

6.5 The εDEpm

In this section, DE is described first, and then the ε constrained DE with estimated comparison using the potential model (εDEpm) is defined.

6.5.1 Differential Evolution

Differential evolution was proposed by Storn and Price (1997). DE is a stochastic direct search method that uses a population of multiple search points. DE has been successfully applied to optimization problems including nonlinear, non-differentiable, non-convex, and multi-modal functions, and it has been shown to be fast and robust on such functions.
Several variants of DE have been proposed; they are classified using the notation DE/base/num/cross, such as DE/rand/1/exp. Here base indicates how the parent forming the base vector is selected: for example, DE/rand selects the parent for the base vector at random from the population, whereas DE/best selects the best individual in the population. In DE/rand/1, for each individual x_i, three individuals x_{p1}, x_{p2}, and x_{p3} are chosen from the population such that they differ from x_i and from each other. A new vector, or mutant vector, x_m is generated from the base vector x_{p1} and the difference vector x_{p2} − x_{p3} as follows, where F is a scaling factor:

x_m = x_{p1} + F (x_{p2} − x_{p3})   (6.22)

num indicates the number of difference vectors used to perturb the base vector. cross indicates the crossover operation used to create a child: bin denotes binomial crossover with a constant crossover rate, and exp denotes a kind of two-point crossover in which the crossover probability decreases exponentially. A new child x′_i is generated from the parent x_i and the mutant vector x_m, where CR is the crossover rate.
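The following Python sketch illustrates the DE/rand/1/exp operation described above. It is not the authors' implementation; the default values F = 0.7 and CR = 0.9 are taken from the experimental settings quoted later in this chapter, and the random-number generator is an assumption.

    import numpy as np

    def de_rand_1_exp(pop, i, F=0.7, CR=0.9, rng=np.random.default_rng()):
        """Produce a child for parent pop[i] by DE/rand/1/exp."""
        N, n = pop.shape
        # choose three mutually distinct parents, all different from i
        p1, p2, p3 = rng.choice([j for j in range(N) if j != i],
                                size=3, replace=False)
        mutant = pop[p1] + F * (pop[p2] - pop[p3])     # Eq. (6.22)
        child = pop[i].copy()
        # exponential crossover: copy a contiguous block of mutant coordinates,
        # its length geometrically controlled by CR (at least one coordinate)
        j = rng.integers(n)
        L = 0
        while True:
            child[j] = mutant[j]
            j = (j + 1) % n
            L += 1
            if L >= n or rng.random() >= CR:
                break
        return child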

6.5.2 The Algorithm of the εDEpm

The εDEpm is the DE that adopts the ε constrained method and the estimated comparison using the potential model. The algorithm of the εDEpm is as follows:
1. Initialization of the individuals. Initial N individuals {x_i, i = 1, 2, . . . , N} are randomly generated in the search space S and form the initial population. All individuals are evaluated and their true values are obtained.
2. Initialization of the ε level. The initial ε level is given by the ε level control function ε(0).
3. Termination condition. If the number of function evaluations exceeds the maximum number of evaluations FE_max, the algorithm is terminated.
4. DE operation. Each individual x_i is selected as a parent. A trial vector, or child, x′_i is generated by the DE/rand/1/exp operation with a scaling factor F and a crossover rate CR.
5. Survivor selection. The estimated comparison is used to compare the trial vector with the parent. The child x′_i is accepted for the next generation if it is better than the parent x_i according to the estimated comparison. Until all individuals have been selected, go back to step 4 to select the next individual as a parent.
6. Control of the ε level. The ε level is updated by the ε level control function ε(t).
7. Go back to step 3.


6.5.3 Controlling the ε Level

The ε level is controlled according to Eqs. (6.23) and (6.24). The initial ε level ε(0) is the constraint violation of the top θ-th individual among the initial search points. The ε level is updated until the number of iterations t reaches the control generation Tc; after that, the ε level is set to 0 in order to obtain solutions with the minimum constraint violation.

ε(0) = φ(x_θ)   (6.23)

ε(t) = ε(0) (1 − t/Tc)^{cp},  0 < t < Tc
ε(t) = 0,                     t ≥ Tc   (6.24)

where x_θ is the top θ-th individual and cp is a control parameter of the ε level. A small ε and a large cp make the convergence to the feasible region fast, although fast convergence may result in being trapped by a local optimal solution. θ = 0.2N and cp = 5 are standard parameter values adopted in many studies (Takahama and Sakai 2006, 2010a; Takahama et al. 2006). This control is effective for solving problems with equality constraints.
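The ε level schedule of Eqs. (6.23) and (6.24) can be written compactly as follows. This is a sketch; the default values Tc = 1,000 and cp = 5 are the standard settings quoted above, and eps0 stands for ε(0) = φ(x_θ).

    def eps_level(t, eps0, Tc=1000, cp=5):
        """epsilon level control function of Eqs. (6.23)-(6.24)."""
        if t >= Tc:
            return 0.0
        return eps0 * (1.0 - t / Tc) ** cp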
Figure 6.2 shows the algorithm of the εDEpm.

εDEpm/rand/1/exp()
{
  // Initialize the individuals
  P = N individuals {x_i} randomly generated in S and evaluated;
  // Initialize the ε level
  ε = ε(0);
  for(t=1; termination condition is false; t++) {
    σ = estimation of the approximation error in P;
    for(i=1; i ≤ N; i++) {
      x′_i = generated by the DE/rand/1/exp operation;
      // estimated comparison
      if(EstimatedBetterε(x′_i, x_i, δ)) x_i = x′_i;
    }
    // Control the ε level
    ε = ε(t);
  }
}

Fig. 6.2 The algorithm of the ε constrained differential evolution with estimated comparison using the potential model, where ε(t) is the ε level control function


6.6 Solving Nonlinear Optimization Problems

Thirteen benchmark problems that have been used in several studies (Mezura-Montes and Coello 2005; Runarsson and Yao 2000; Takahama and Sakai 2005a) are optimized, and the results obtained by the εDEpm are compared with the results reported there.

6.6.1 Test Problems and Experimental Conditions

Among the 13 benchmark problems, g03, g05, g11, and g13 contain equality constraints. In problems with equality constraints, the equality constraints are relaxed and converted into inequality constraints according to |h_j(x)| ≤ 10^{-4}, the relaxation adopted by many methods. Problem g12 has disjoint feasible regions. Table 6.1 summarizes the 13 problems (Farmani and Wright 2003; Mezura-Montes and Coello 2005); it lists the number of variables n, the form of the objective function, the number of linear inequality constraints (LI), nonlinear inequality constraints (NI), linear equality constraints (LE), and nonlinear equality constraints (NE), and the number of constraints active at the optimal solution.
The parameters for the εDEpm are as follows (Takahama and Sakai 2006, 2009b, 2010a): the number of search points N = 40, the maximum number of evaluations FE_max = 100,000, the scaling factor F = 0.7, and the crossover rate CR = 0.9. The parameters for the ε constrained method are as follows: every constraint violation is defined as a simple sum of constraint violations, i.e., p = 1 in Eq. (6.3). The ε level is controlled using Eqs. (6.23) and (6.24) for the problems with equality constraints and is fixed to 0 for the other problems. The control generation is Tc = 1,000, the control parameter cp = 5, and θ = 0.2N. For the estimated comparison, the parameter for the potential model is p_d = 2 and the margin parameter δ = 0.1. Thirty independent runs are performed for each problem.
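A sketch of the constraint violation used here, with the equality constraints relaxed to |h_j(x)| ≤ 10^{-4} and p = 1, is given below. The exact form of Eq. (6.3) appears earlier in the chapter; the function names and the way constraints are passed in are illustrative assumptions.

    def violation(x, inequalities, equalities, tol=1e-4, p=1):
        """Constraint violation phi(x) as a simple sum (p = 1), with
        equality constraints h_j relaxed to |h_j(x)| <= tol.
        inequalities: callables g_j with g_j(x) <= 0 required for feasibility
        equalities:   callables h_j with h_j(x) = 0 required for feasibility
        """
        phi = sum(max(0.0, g(x)) ** p for g in inequalities)
        phi += sum(max(0.0, abs(h(x)) - tol) ** p for h in equalities)
        return phi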
Table 6.1 Summary of test problems

Problem  n   Form of f    LI  NI   LE  NE  Active
g01      13  Quadratic    9   0    0   0   6
g02      20  Nonlinear    1   1    0   0   1
g03      10  Polynomial   0   0    0   1   1
g04      5   Quadratic    0   6    0   0   2
g05      4   Cubic        2   0    0   3   3
g06      2   Cubic        0   2    0   0   2
g07      10  Quadratic    3   5    0   0   6
g08      2   Nonlinear    0   2    0   0   0
g09      7   Polynomial   0   4    0   0   2
g10      8   Linear       3   3    0   0   6
g11      2   Quadratic    0   0    0   1   1
g12      3   Quadratic    0   9^3  0   0   0
g13      5   Nonlinear    0   0    1   2   3


Table 6.2 Experimental results on the 13 benchmark problems using the standard settings; 30 independent runs were performed

f    Optimal      Best           Median         Mean           Worst          St. dev.
g01  -15.000      -15.000000     -15.000000     -15.000000     -15.000000     4.193e-12
g02  -0.803619    -0.803547      -0.803056      -0.802406      -0.790861      2.255e-03
g03  -1.000       -1.000500      -1.000500      -1.000500      -1.000499      1.134e-07
g04  -30665.539   -30665.538672  -30665.538672  -30665.538672  -30665.538672  0.000e+00
g05  5126.498     5126.496714    5126.496714    5126.496714    5126.496714    0.000e+00
g06  -6961.814    -6961.813876   -6961.813876   -6961.813876   -6961.813876   2.803e-12
g07  24.306       24.306209      24.306209      24.306210      24.306214      1.215e-06
g08  -0.095825    -0.095825      -0.095825      -0.095825      -0.095825      0.000e+00
g09  680.630      680.630057     680.630057     680.630057     680.630057     0.000e+00
g10  7049.248     7049.248021    7049.248021    7049.248021    7049.248026    1.028e-06
g11  0.750        0.749900       0.749900       0.749900       0.749900       0.000e+00
g12  -1.000000    -1.000000      -1.000000      -1.000000      -1.000000      0.000e+00
g13  0.053950     0.0539415      0.0539415      0.0539415      0.0539415      0.000e+00

6.6.2 Experimental Results

Table 6.2 summarizes the experimental results. The table shows the known optimal value for each problem and the statistics over the 30 independent runs: the best, median, mean, and worst objective values found, and their standard deviation.
For problems g01, g04, g05, g06, g08, g09, g11, g12, and g13, optimal solutions are found consistently in all 30 runs. For problems g03, g07, and g10, optimal or near-optimal solutions are found in all 30 runs. As for g02, it is a multi-modal problem with many local optima whose peaks lie near the global optimum within the feasible region; many other methods cannot reliably obtain near-optimal solutions for it, but the εDEpm attained about -0.802 on average within 100,000 FEs, so the εDEpm appears to have a high ability to solve multi-modal problems as well. These results show that the εDEpm is an efficient and stable algorithm.

6.6.3 Comparison with the εDE

In order to show the effectiveness of the εDEpm, the number of function evaluations (FEs) needed by the εDEpm to find a near-optimal solution is compared with the FEs of the original εDE, which does not use function approximation. The εDEpm is also compared with the εDEpm without the approximation of the constraint violation, denoted εDEpm-, in which x′_i is always evaluated when the number of feasible solutions is small.
The number of evaluations of the objective function and of the constraints needed to reach a near-optimal solution, defined as a solution whose objective value is within 10^{-4} of the optimal value, is shown in Table 6.3. The average numbers of evaluations of the objective function and of the constraints over the 30 runs are shown in the columns labeled #func and #const, respectively, with the corresponding standard deviations in parentheses. Under the standard deviations, the ratios of the FEs of the εDEpm and the εDEpm- to the FEs of the εDE are shown together with the statistical significance. Statistical differences between the εDEpm and the εDEpm- and between the εDEpm and the εDE according to Welch's t-test are indicated by ++/--, +/-, and ≈, meaning significantly different (smaller/greater) with p < 0.01, significantly different (smaller/greater) with p < 0.05, and not significantly different, respectively.
Apparently, the εDEpm attained the best results, followed by the εDEpm-. The εDEpm is statistically faster than the εDE in 12 problems and faster than the εDEpm- in 9 problems. The εDEpm reduces the number of constraint evaluations by about 5 to 50 % compared with the εDE, whereas the εDEpm- reduces it by 0 to about 45 %. Similarly, the εDEpm reduces the number of objective function evaluations by about 15 to 50 % compared with the εDE, whereas the εDEpm- reduces it by about 0 to 45 %.
These results show that the potential model is effective not only for the objective function but also for the constraint violation. Thus, the potential model can be regarded as a general-purpose rough approximation model.
In the ε constrained method, the objective function and the constraints are treated separately. Therefore, when the order relation between search points can be decided by the constraint violation alone, the objective function is not evaluated; that is, the evaluation of the objective function can often be omitted. Thus, the number of evaluations of the objective function is smaller than the number of evaluations of the constraints. This property of the ε constrained method contributes to the efficiency of the algorithm, especially when the objective function is computationally expensive. The number of evaluations of the constraint violations needed to find a near-optimal solution ranged from about 500 to 120,000, whereas the number of evaluations of the objective function ranged from about 200 to 50,000. For these problems, the εDEpm can omit between about 15 and 90 % of the objective function evaluations. Therefore, the εDEpm can find optimal solutions very efficiently, especially in terms of the number of evaluations of the objective function.

6.6.4 Comparison with Other Methods

Several methods have been applied to the same 13 problems. For the comparative study, we chose the simple multi-membered evolution strategy (SMES) proposed by Mezura-Montes and Coello (2005), the adaptive trade-off model (ATMES) proposed by Wang et al. (2008), the multi-objective hybrid method (HCOEA) proposed by Wang et al. (2007), ECHT-EP2 proposed by Mallipeddi and Suganthan (2010), and the εDE proposed by Takahama and Sakai (2009b), because the results of these methods are better than those of other methods and they report statistical information of good quality. In addition, A-DDE proposed by Mezura-Montes and Palomeque-Ortiz (2009), which adopts adaptive parameter control, is included in the comparison.
Table 6.4 shows the comparison of the best, median, mean, and worst values and the standard deviation for the seven methods. The maximum number of FEs for each method is also shown as FE_max.


Table 6.3 Comparison of the number of FEs needed to attain an objective value within 10^{-4} of the optimal value. Each cell shows the mean, the standard deviation in parentheses, and the ratio to the FEs of the εDE; for the εDEpm the two symbols following the ratio give the statistical significance with respect to the εDEpm- and to the εDE, respectively (++/--: p < 0.01, +/-: p < 0.05, ≈: not significant)

f    εDEpm #const        εDEpm #func        εDEpm- #const       εDEpm- #func       εDE #const          εDE #func
g01  44099.2 (1250.4)    13626.1 (344.9)    45899.6 (1411.9)    13782.8 (375.8)    58135.3 (1306.0)    16667.1 (293.6)
     0.76,++,++          0.82,≈,++          0.79                0.83               1.00                1.00
g02  123382.6 (11190.3)  51697.8 (4062.7)   123382.6 (11190.3)  51697.8 (4062.7)   148677.6 (13972.9)  59273.8 (5224.9)
     0.83,≈,++           0.87,≈,++          0.83                0.87               1.00                1.00
g03  39489.8 (9040.0)    11827.3 (483.2)    38707.7 (2530.4)    13587.7 (287.3)    40566.8 (3575.5)    13818.7 (341.5)
     0.97,≈,≈            0.86,++,++         0.95                0.98               1.00                1.00
g04  13556.8 (671.1)     5087.9 (240.9)     13589.1 (494.9)     5061.7 (169.8)     24063.7 (1124.7)    9410.9 (326.1)
     0.56,≈,++           0.54,≈,++          0.56                0.54               1.00                1.00
g05  25007.6 (1435.7)    10173.6 (537.5)    38502.9 (409.4)     13663.1 (225.8)    38502.9 (409.4)     13663.1 (225.8)
     0.65,++,++          0.74,++,++         1.00                1.00               1.00                1.00
g06  3344.7 (251.8)      1468.5 (176.4)     4110.0 (249.0)      1418.2 (118.6)     6336.6 (366.5)      3058.8 (201.8)
     0.53,++,++          0.48,≈,++          0.65                0.46               1.00                1.00
g07  54781.8 (4487.7)    15278.5 (1194.8)   56584.8 (3509.1)    15443.9 (878.9)    71619.5 (4163.2)    19851.5 (1051.2)
     0.76,≈,++           0.77,≈,++          0.79                0.78               1.00                1.00
g08  462.4 (85.9)        206.2 (67.8)       713.3 (82.6)        212.1 (54.8)       946.0 (142.5)       397.8 (108.5)
     0.49,++,++          0.52,≈,++          0.75                0.53               1.00                1.00
g09  14700.6 (873.3)     7047.1 (398.2)     15662.9 (946.7)     7225.8 (409.5)     21177.6 (959.0)     9947.2 (439.3)
     0.69,++,++          0.71,≈,++          0.74                0.73               1.00                1.00
g10  45332.1 (2872.1)    7975.0 (463.2)     48126.4 (3182.2)    8095.5 (577.7)     62695.3 (3647.7)    10466.0 (578.9)
     0.72,++,++          0.76,≈,++          0.77                0.77               1.00                1.00
g11  10302.3 (3335.6)    8681.2 (2684.1)    17105.3 (5476.2)    12380.3 (4027.3)   17105.3 (5476.2)    12380.3 (4027.3)
     0.60,++,++          0.70,++,++         1.00                1.00               1.00                1.00
g12  2127.7 (419.1)      207.4 (60.4)       2447.7 (532.9)      218.7 (55.6)       4041.9 (1122.6)     370.0 (105.8)
     0.53,+,++           0.56,≈,++          0.61                0.59               1.00                1.00
g13  22304.5 (1049.0)    7618.8 (1211.1)    33869.8 (691.6)     11662.2 (1133.7)   33869.8 (691.6)     11662.2 (1133.7)
     0.66,++,++          0.65,++,++         1.00                1.00               1.00                1.00

Table 6.4 Comparison of statistical results among the εDEpm (FE_max = 100,000), the εDE (200,000), SMES (240,000), ATMES (240,000), HCOEA (240,000), ECHT-EP2 (240,000), and A-DDE (180,000)

Method     Best            Median          Mean            Worst           St. dev.

g01 (optimal -15.000)
εDEpm      -15.000000      -15.000000      -15.000000      -15.000000      4.19e-12
εDE        -15.000000      -15.000000      -15.000000      -15.000000      0.00e+00
SMES       -15.000         -15.000         -15.000         -15.000         0.00e+00
ATMES      -15.000         -15.000         -15.000         -15.000         1.6e-14
HCOEA      -15.000000      -15.000000      -15.000000      -14.999998      4.297e-07
ECHT-EP2   -15.0000        -15.0000        -15.0000        -15.0000        0.00e+00
A-DDE      -15.000         -15.000         -15.000         -15.000         7.00e-06

g02 (optimal -0.803619)
εDEpm      -0.803547       -0.803056       -0.802406       -0.790861       2.26e-03
εDE        -0.803618       -0.803614       -0.803613       -0.803588       5.59e-06
SMES       -0.803601       -0.792549       -0.785238       -0.751322       1.67e-02
ATMES      -0.803388       -0.792420       -0.790148       -0.756986       1.3e-02
HCOEA      -0.803241       -0.802556       -0.801258       -0.792363       3.832e-03
ECHT-EP2   -0.8036191      -0.8033239      -0.7998220      -0.7851820      6.29e-03
A-DDE      -0.803605       -0.777368       -0.771090       -0.609853       3.66e-02

g03 (optimal -1.0005)
εDEpm      -1.000500       -1.000500       -1.000500       -1.000499       1.13e-07
εDE        -1.000500       -1.000500       -1.000500       -1.000500       6.46e-09
SMES       -1.000          -1.000          -1.000          -1.000          2.09e-04
ATMES      -1.000          -1.000          -1.000          -1.000          5.9e-05
HCOEA      -1.000000       -1.000000       -1.000000       -1.000000       1.304e-12
ECHT-EP2   -1.0005         -1.0005         -1.0005         -1.0005         0.0e+00
A-DDE      -1.000          -1.000          -1.000          -1.000          9.30e-12

g04 (optimal -30665.5387)
εDEpm      -30665.538672   -30665.538672   -30665.538672   -30665.538672   0.00e+00
εDE        -30665.538670   -30665.538670   -30665.538670   -30665.538670   0.00e+00
SMES       -30665.539      -30665.539      -30665.539      -30665.539      0.00e+00
ATMES      -30665.539      -30665.539      -30665.539      -30665.539      7.4e-12
HCOEA      -30665.539      -30665.539      -30665.539      -30665.539      5.404e-07
ECHT-EP2   -30665.5387     -30665.5387     -30665.5387     -30665.5387     0.0e+00
A-DDE      -30665.539      -30665.539      -30665.539      -30665.539      3.20e-13

g05 (optimal 5126.4967)
εDEpm      5126.496714     5126.496714     5126.496714     5126.496714     0.00e+00
εDE        5126.496714     5126.496714     5126.496714     5126.496714     1.82e-12
SMES       5126.599        5160.198        5174.492        5304.167        5.006e+01
ATMES      5126.498        5126.776        5127.648        5135.256        1.8e+00
HCOEA      5126.4981       5126.4981       5126.4981       5126.4984       1.727e-07
ECHT-EP2   5126.4967       5126.4967       5126.4967       5126.4967       0.0e+00
A-DDE      5126.497        5126.497        5126.497        5126.497        2.10e-11

g06 (optimal -6961.8139)
εDEpm      -6961.813876    -6961.813876    -6961.813876    -6961.813876    2.80e-12
εDE        -6961.813876    -6961.813876    -6961.813876    -6961.813876    0.00e+00
SMES       -6961.814       -6961.814       -6961.284       -6952.482       1.85e+00
ATMES      -6961.814       -6961.814       -6961.814       -6961.814       4.6e-12
HCOEA      -6961.81388     -6961.81388     -6961.81388     -6961.81388     8.507e-12
ECHT-EP2   -6961.8139      -6961.8139      -6961.8139      -6961.8139      0.00e+00
A-DDE      -6961.814       -6961.814       -6961.814       -6961.814       2.11e-12

g07 (optimal 24.3062)
εDEpm      24.306209       24.306209       24.306210       24.306214       1.22e-06
εDE        24.306209       24.306209       24.306209       24.306209       4.27e-09
SMES       24.327          24.426          24.475          24.843          1.32e-01
ATMES      24.306          24.313          24.316          24.359          1.1e-02
HCOEA      24.3064582      24.3073055      24.3073989      24.3092401      7.118e-04
ECHT-EP2   24.3062         24.3063         24.3063         24.3063         3.19e-05
A-DDE      24.306          24.306          24.306          24.306          4.20e-05

g08 (optimal -0.095825)
εDEpm      -0.095825       -0.095825       -0.095825       -0.095825       0.00e+00
εDE        -0.095825       -0.095825       -0.095825       -0.095825       0.00e+00
SMES       -0.095825       -0.095825       -0.095825       -0.095825       0.00e+00
ATMES      -0.095825       -0.095825       -0.095825       -0.095825       2.8e-17
HCOEA      -0.095825       -0.095825       -0.095825       -0.095825       2.417e-17
ECHT-EP2   -0.09582504     -0.09582504     -0.09582504     -0.09582504     0.0e+00
A-DDE      -0.095825       -0.095825       -0.095825       -0.095825       9.10e-10

g09 (optimal 680.630057)
εDEpm      680.630057      680.630057      680.630057      680.630057      0.00e+00
εDE        680.630057      680.630057      680.630057      680.630057      0.00e+00
SMES       680.632         680.642         680.643         680.719         1.55e-02
ATMES      680.630         680.633         680.639         680.673         1.0e-02
HCOEA      680.6300574     680.6300574     680.6300574     680.6300578     9.411e-08
ECHT-EP2   680.630057      680.630057      680.630057      680.630057      2.61e-08
A-DDE      680.63          680.63          680.63          680.63          1.15e-10

g10 (optimal 7049.248)
εDEpm      7049.248021     7049.248021     7049.248021     7049.248026     1.03e-06
εDE        7049.248021     7049.248021     7049.248021     7049.248021     0.00e+00
SMES       7051.903        7253.603        7253.047        7638.366        1.36e+02
ATMES      7052.253        7215.357        7250.437        7560.224        1.2e+02
HCOEA      7049.286598     7049.486145     7049.525438     7049.984208     1.502e-01
ECHT-EP2   7049.2483       7049.2488       7049.2490       7049.2501       6.60e-04
A-DDE      7049.248        7049.248        7049.248        7049.248        3.23e-04

g11 (optimal 0.749900)
εDEpm      0.749900        0.749900        0.749900        0.749900        0.00e+00
εDE        0.749900        0.749900        0.749900        0.749900        0.00e+00
SMES       0.75            0.75            0.75            0.75            1.52e-04
ATMES      0.75            0.75            0.75            0.75            3.4e-04
HCOEA      0.750000        0.750000        0.750000        0.750000        1.546e-12
ECHT-EP2   0.7499          0.7499          0.7499          0.7499          0.0e+00
A-DDE      0.75            0.75            0.75            0.75            5.35e-15

g12 (optimal -1.000)
εDEpm      -1.000000       -1.000000       -1.000000       -1.000000       0.00e+00
εDE        -1.000000       -1.000000       -1.000000       -1.000000       0.00e+00
SMES       -1.0000         -1.0000         -1.0000         -1.0000         0.00e+00
ATMES      -1.000          -1.000          -1.000          -0.994          1.0e-03
HCOEA      -1.000000       -1.000000       -1.000000       -1.000000       0.00e+00
ECHT-EP2   -1.0000         -1.0000         -1.0000         -1.0000         0.0e+00
A-DDE      -1.000          -1.000          -1.000          -1.000          4.10e-11

g13 (optimal 0.0539415)
εDEpm      0.0539415       0.0539415       0.0539415       0.0539415       0.00e+00
εDE        0.053942        0.053942        0.053942        0.053942        0.00e+00
SMES       0.053986        0.061873        0.166385        0.468294        1.77e-01
ATMES      0.053950        0.053952        0.053959        0.053999        1.3e-05
HCOEA      0.0539498       0.0539498       0.0539498       0.0539499       8.678e-08
ECHT-EP2   0.0539415       0.0539415       0.0539415       0.0539415       1.00e-12
A-DDE      0.053942        0.053942        0.079627        0.438803        9.60e-02

All methods found optimal solutions in all 30 runs for g01, g03, g04, g08, g11, and g12. For the other problems, from the viewpoint of solution quality, the εDE appears to be the best method, followed by ECHT-EP2 and the εDEpm, where the difference between ECHT-EP2 and the εDEpm is very small. However, the number of function evaluations used by the εDEpm is only about half of that used by the εDE and ECHT-EP2. Thus, from the viewpoint of efficiency, the εDEpm can be considered better than the εDE and ECHT-EP2.

6.7 Conclusions

In order to utilize a rough approximation model in constrained optimization, a new scheme combining the ε constrained method and the estimated comparison using the potential model has been proposed. The potential model is used to approximate not only the objective function but also the constraint violation. This idea is introduced into differential evolution, which is known as a simple, efficient, and robust search algorithm for unconstrained optimization problems, and the εDEpm is proposed.
It has been shown that the εDEpm could solve 13 benchmark problems more efficiently than many other methods. It has also been shown that the potential model is a general-purpose rough approximation model and that approximating both the objective function and the constraint violation improves the efficiency of the εDE.
In the future, we will apply the εDEpm to various real-world problems that have expensive objective functions.
Acknowledgments This research is supported in part by Grants-in-Aid for Scientific Research (C) (No. 24500177, 26350443) from the Japan Society for the Promotion of Science and by a Hiroshima City University Grant for Special Academic Research (General Studies).

References

Aguirre AH, Rionda SB, Coello CAC, Lizárraga GL, Montes EM (2004) Handling constraints using multiobjective optimization concepts. Int J Numer Methods Eng 59(15):1989–2017
Büche D, Schraudolph NN, Koumoutsakos P (2005) Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans Syst Man Cybern Part C: Appl Rev 35(2):183–194
Camponogara E, Talukdar SN (1997) A genetic algorithm for constrained and multiobjective optimization. In: Alander JT (ed) 3rd Nordic workshop on genetic algorithms and their applications (3NWGA), University of Vaasa, Vaasa, pp 49–62
Coello CAC (2000a) Constraint-handling using an evolutionary multiobjective optimization technique. Civ Eng Environ Syst 17:319–346
Coello CAC (2000b) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113–127
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods Appl Mech Eng 186(2/4):311–338
Farmani R, Wright JA (2003) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445–455
Guimarães FG, Wanner EF, Campelo F, Takahashi RH, Igarashi H, Lowther DA, Ramírez JA (2006) Local learning and search in memetic algorithms. In: Proceedings of the 2006 IEEE congress on evolutionary computation, Vancouver, pp 9841–9848
Homaifar A, Lai SHY, Qi X (1994) Constrained optimization via genetic algorithms. Simulation 62(4):242–254
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput 9:3–12
Jin Y, Olhofer M, Sendhoff B (2000) On evolutionary optimization with approximate fitness functions. In: Proceedings of the genetic and evolutionary computation conference. Morgan Kaufmann, pp 786–792
Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evol Comput 6(5):481–494
Jin Y, Sendhoff B (2004) Reducing fitness evaluations using clustering techniques and neural network ensembles. In: Genetic and evolutionary computation conference. LNCS, vol 3102. Springer, pp 688–699
Joines J, Houck C (1994) On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GAs. In: Fogel D (ed) Proceedings of the first IEEE conference on evolutionary computation. IEEE Press, Orlando, pp 579–584
Mallipeddi R, Suganthan PN (2010) Ensemble of constraint handling techniques. IEEE Trans Evol Comput 14(4):561–579
Mezura-Montes E, Coello CAC (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17
Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1:173–194
Mezura-Montes E, Palomeque-Ortiz AG (2009) Parameter control in differential evolution for constrained optimization. In: Proceedings of the 2009 IEEE congress on evolutionary computation, pp 1375–1382
Michalewicz Z (1995) A survey of constraint handling techniques in evolutionary computation methods. In: Proceedings of the 4th annual conference on evolutionary programming. The MIT Press, Cambridge, pp 135–155
Michalewicz Z, Attia N (1994) Evolutionary optimization of constrained problems. In: Sebald A, Fogel L (eds) Proceedings of the 3rd annual conference on evolutionary programming. World Scientific Publishing, River Edge, pp 98–108
Ong YS, Zhou Z, Lim D (2006) Curse and blessing of uncertainty in evolutionary algorithm using approximation. In: Proceedings of the 2006 IEEE congress on evolutionary computation, Vancouver, pp 9833–9840
Ray T, Liew KM, Saini P (2002) An intelligent information sharing strategy within a swarm for unconstrained and constrained optimization problems. Soft Comput 6(1):38–44
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Runarsson TP, Yao X (2003) Evolutionary search and constraint violations. In: Proceedings of the 2003 congress on evolutionary computation, vol 2. IEEE Service Center, Piscataway, pp 1414–1419
Sakai S, Takahama T (2010) A parametric study on estimated comparison in differential evolution with rough approximation model. In: Kitahara M, Morioka K (eds) Social systems solution by legal informatics, economic sciences and computer sciences. Kyushu University Press, Fukuoka, pp 112–134
Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359
Surry PD, Radcliffe NJ (1997) The COMOGA method: constrained optimisation by multiobjective genetic algorithms. Control Cybern 26(3):391–412
Takahama T, Sakai S (2000) Tuning fuzzy control rules by the α constrained method which solves constrained nonlinear optimization problems. Electron Commun Japan Part 3: Fundam Electron Sci 83(9):1–12
Takahama T, Sakai S (2005a) Constrained optimization by applying the α constrained method to the nonlinear simplex method with mutations. IEEE Trans Evol Comput 9(5):437–451
Takahama T, Sakai S (2005b) Constrained optimization by ε constrained particle swarm optimizer with ε-level control. In: Proceedings of the 4th IEEE international workshop on soft computing as transdisciplinary science and technology (WSTST'05), pp 1019–1029
Takahama T, Sakai S (2006) Constrained optimization by the ε constrained differential evolution with gradient-based mutation and feasible elites. In: Proceedings of the 2006 IEEE congress on evolutionary computation, pp 308–315
Takahama T, Sakai S (2008a) Efficient optimization by differential evolution using rough approximation model with adaptive control of error margin. In: Proceedings of the joint 4th international conference on soft computing and intelligent systems and 9th international symposium on advanced intelligent systems, pp 1412–1417
Takahama T, Sakai S (2008b) Reducing function evaluations in differential evolution using rough approximation-based comparison. In: Proceedings of the 2008 IEEE congress on evolutionary computation, pp 2307–2314
Takahama T, Sakai S (2009a) A comparative study on kernel smoothers in differential evolution with estimated comparison method for reducing function evaluations. In: Proceedings of the 2009 IEEE congress on evolutionary computation, pp 1367–1374
Takahama T, Sakai S (2009b) Fast and stable constrained optimization by the ε constrained differential evolution. Pac J Optim 5(2):261–282
Takahama T, Sakai S (2010a) Constrained optimization by the ε constrained differential evolution with an archive and gradient-based mutation. In: Proceedings of the 2010 IEEE congress on evolutionary computation, pp 1680–1688
Takahama T, Sakai S (2010b) Efficient constrained optimization by the ε constrained adaptive differential evolution. In: Proceedings of the 2010 IEEE congress on evolutionary computation, pp 2052–2059
Takahama T, Sakai S (2010c) Reducing function evaluations using adaptively controlled differential evolution with rough approximation model. In: Tenne Y, Goh C-K (eds) Computational intelligence in expensive optimization problems. Adaptation, learning, and optimization, vol 2. Springer, Berlin, pp 111–129
Takahama T, Sakai S (2013) Efficient constrained optimization by the ε constrained differential evolution with rough approximation using kernel regression. In: Proceedings of the 2013 IEEE congress on evolutionary computation, pp 62–69
Takahama T, Sakai S, Iwane N (2006) Solving nonlinear constrained optimization problems by the ε constrained differential evolution. In: Proceedings of the 2006 IEEE international conference on systems, man and cybernetics, pp 2322–2327
Tessema B, Yen G (2006) A self adaptive penalty function based algorithm for constrained optimization. In: Yen GG, Lucas SM, Fogel G, Kendall G, Salomon R, Zhang B-T, Coello CAC, Runarsson TP (eds) Proceedings of the 2006 IEEE congress on evolutionary computation. IEEE Press, Vancouver, pp 246–253
Venkatraman S, Yen GG (2005) A generic framework for constrained optimization using genetic algorithms. IEEE Trans Evol Comput 9(4):424–435
Wang Y, Cai Z, Guo G, Zhou Y (2007) Multiobjective optimization and hybrid evolutionary algorithm to solve constrained optimization problems. IEEE Trans Syst Man Cybern Part B 37(3):560–575
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary computation. IEEE Trans Evol Comput 12(1):80–92

Chapter 7
Analyzing the Behaviour of Multi-recombinative Evolution Strategies Applied to a Conically Constrained Problem

Jeremy Porter and Dirk V. Arnold

Abstract Many step size adaptation techniques for evolution strategies have been developed with unconstrained optimization problems in mind. In constrained settings, the interplay between step size adaptation and constraint handling is both of crucial importance and often not well understood. We consider a linear optimization problem with a feasible region defined by a right circular cone symmetric about the gradient direction, such that the optimal solution is located at the cone's apex. We provide a detailed analysis of the behaviour of a multi-recombinative evolution strategy that employs cumulative step size adaptation and a simple constraint handling technique. The results allow studying the influence of parameters of both the problem class at hand, such as the angle at the cone's apex, and of the strategy considered, including its population size parameters. The impact of assuming different models for the cost of objective and constraint function evaluations is discussed.

Keywords Evolution strategy · Constraint handling · Cumulative step size adaptation · Conically constrained problem

7.1 Introduction
While numerous constraint handling techniques used in connection with evolution
strategies exist and are in common use (compare Mezura-Montes and Coello Coello
(2011)), the understanding of their properties lags behind that of strategy variants for
unconstrained problems. Of particular significance for the success of the strategies is
the interaction between step size adaptation and constraint handling technique. Generally, convergence to non-stationary points is more easily avoided in unconstrained
settings than in constrained ones.

One approach to learning about the behaviour of adaptive optimization algorithms,


including evolution strategies, is the dynamical systems approach comprehensively
described by Meyer-Nieberg and Beyer (2012). In essence, the approach considers
test functions that pose interesting challenges to optimization strategies while being
simple enough to yield interpretable results. Test functions considered usually exhibit
strong symmetries, making it possible to describe the behaviour of adaptive optimization strategies applied to them in terms of dynamical systems with low-dimensional
state spaces. By choosing the state variables appropriately, the evolution equations
generate a time-invariant Markov process with a stationary limit distribution. That
limit distribution is expanded in terms of its moments, moments after an iteration are
computed as functions of those before, and stationarity is enforced by equating one
set to the other. The result is a system of as many equations as there are moments
considered in the expansion of the distribution. Solving the system for the unknowns
yields interpretable results that can be studied numerically.
As in unconstrained environments, the complexity of both the settings and the algorithms considered has been increasing gradually. Early work, such as that of Rechenberg (1973), Schwefel (1981), and Beyer (1989), analyzes (1 + 1) and (1, λ) evolution strategies¹ in connection with simple constrained problems where the normal vectors of the constraint planes are perpendicular to the gradient of the objective function. In more recent work, Arnold and Brauer (2008) and Arnold (2011b) consider the same strategies for a linear problem with a single linear constraint of general orientation. One of the simplest methods for constraint handling is used, namely to resample infeasible offspring until they are feasible. In the same environment, constraint handling through the projection of infeasible candidate solutions onto the feasible region is analyzed by Arnold (2011a), who finds fundamental differences between the two constraint handling approaches when used in connection with cumulative step size adaptation. Arnold (2013b) extends these analyses by considering the (μ/μ, λ)-ES, which selects more than a single candidate solution per iteration and employs multi-recombination.
Commonly used test problems in benchmarking studies of evolutionary algorithms, such as those considered by Michalewicz and Schoenauer (1996), have optimal solutions located on the boundary of the feasible region. Often, more than a single linear constraint is active at the location of the optimum. In an attempt to model such situations, Arnold (2013a) considers the behaviour of the (1, λ)-ES for a problem where the feasible region is bounded by a right circular cone and the optimum is located at the cone's apex. In this work, we extend that analysis to the more general case of the (μ/μ, λ)-ES with μ ≥ 1, yielding new insights with regard to the use of non-singleton populations and multi-recombination. Moreover, we consider different models for the cost of objective and constraint function evaluations, and their impact on optimal strategy behaviour.
The remainder of this chapter is organized as follows. In Sect. 7.2, we give an overview of the (μ/μ, λ)-ES algorithm with cumulative step size adaptation, as well as a description of the optimization problem we will consider. In Sect. 7.3, we describe the expected behaviour of a single step of the iterative algorithm. Section 7.4 expands on these results to model the strategy as a Markov process and describes its steady state for scale invariant step size. Section 7.5 extends that analysis to consider cumulative step size adaptation and derives update rules for the related quantities. Finally, in Sect. 7.6 we provide a summary of our results and discuss their implications. An Appendix contains details of the computations related to Sect. 7.3.

¹ See Beyer and Schwefel (2002) for an overview of evolution strategy terminology.

7.2 Algorithm and Problem Descriptions

In this section, we first give a brief description of the (μ/μ, λ)-ES with cumulative step size adaptation. We then define the constrained optimization problem considered in the remainder of the chapter.

7.2.1 Algorithm

The (μ/μ, λ)-ES with cumulative step size adaptation (CSA) is an iterative algorithm for solving N-dimensional, real-valued optimization problems. The variant considered throughout this chapter resamples infeasible offspring candidate solutions until they are feasible (compare Oyman et al. (1999)). Its state is described by the population centroid x ∈ R^N, the step size σ ∈ R, and the search path s ∈ R^N. A single iteration is described in detail in Algorithm 1.

Algorithm 1 Single iteration of the (μ/μ, λ)-ES with CSA
Input: f : R^N → R
 1: for k = 1 . . . λ do
 2:   repeat
 3:     z^(k) = N(0, I)                             ▷ sample from normal distribution
 4:     x^(k) = x + σ z^(k)
 5:   until IsFeasible(x^(k))                       ▷ resample until feasible
 6: end for
 7: sort [z^(1), . . . , z^(λ)], [f(x^(1)), . . . , f(x^(λ))]   ▷ sort the z^(k) by values of f(x^(k))
 8: ⟨z⟩ = (1/μ) Σ_{k=1}^{μ} z^(k)                   ▷ recombine the μ best mutation vectors
 9: x = x + σ ⟨z⟩
10: s = (1 − c) s + √(μ c (2 − c)) ⟨z⟩              ▷ update s
11: σ = σ exp((‖s‖² − N) / (2 D N))                 ▷ update σ
In each iteration, λ feasible offspring candidate solutions are generated by sampling normally distributed random vectors in the neighbourhood of the population centroid x ∈ R^N. If a candidate solution is infeasible, it is resampled until a feasible offspring candidate solution has been generated (Lines 1–6). Parameter σ determines the variance and thereby the step size of the strategy; the vectors z^(k) are referred to as mutation vectors. For the purpose of selection, the objective function of the problem at hand is then used to evaluate the quality of the offspring candidate solutions. Recombination averages the μ best offspring candidate solutions to form the next population centroid and is implemented by averaging the mutation vectors corresponding to the selected offspring (Lines 7–9).
The cumulative step size adaptation approach introduced by Ostermeier et al. (1994) modifies the step size parameter σ of the strategy based on past averaged mutations. It employs an exponentially fading record of recent steps referred to as the search path (Line 10), where c ∈ (0, 1) is a constant that controls the rate of exponential fading. The factor √(μc(2 − c)) in the update rule normalizes the non-unit variance of the averaged steps and ensures that, if successive steps are uncorrelated, the search path has expected squared length N. The step size of the strategy is increased if recent steps of the strategy are positively correlated (as indicated by search paths whose squared length exceeds the dimension N of the problem), and it is decreased if correlations between recent steps are negative (i.e., if search paths are short). The factor D in the update rule (Line 11) is a damping constant and controls how rapidly the step size can be adapted. The search path and step size are initialized as s = 0 and σ = 1, respectively.
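A minimal Python sketch of one iteration, following the description of Algorithm 1, is given below. It is not the authors' implementation: the objective function f, the feasibility test is_feasible, the random-number generator, and the choice of the constants c and D are left to the caller and are assumptions of this sketch.

    import numpy as np

    def es_iteration(x, sigma, s, f, is_feasible, mu, lam, c, D, rng):
        """One iteration of the (mu/mu, lambda)-ES with CSA and resampling."""
        N = x.size
        offspring = []
        for _ in range(lam):
            while True:                       # resample until the offspring is feasible
                z = rng.standard_normal(N)
                y = x + sigma * z
                if is_feasible(y):
                    break
            offspring.append((f(y), z))
        offspring.sort(key=lambda t: t[0])    # rank offspring by objective value
        z_avg = np.mean([z for _, z in offspring[:mu]], axis=0)   # recombine the mu best
        x = x + sigma * z_avg
        s = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_avg
        sigma = sigma * np.exp((np.dot(s, s) - N) / (2.0 * D * N))
        return x, sigma, s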

7.2.2 Optimization Problem

We would like to analyze the behaviour of this strategy in a constrained optimization setting where the optimal solution is located on the boundary of the feasible region. As a model for this scenario, consider minimizing the objective function

f(x) = x₁   (7.1)

subject to the inequalities

ξ = x₁² − α Σ_{i=2}^{N} x_i² ≥ 0   (7.2)
x₁ ≥ 0.   (7.3)

Here x = ⟨x₁, x₂, . . . , x_N⟩ is an N-dimensional vector, and the variable ξ ≥ 0 is the slack of the first constraint. The feasible region defined by this pair of inequalities is a cone with its apex at the origin and its axis coinciding with the positive x₁ axis. The shape of the feasible region is controlled by the parameter α > 0, with smaller values resulting in a wider cone: as α tends to zero, the feasible region approaches the half-space of non-negative x₁ coordinates, and as α approaches infinity, the feasible region contracts to the x₁ axis itself.
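Plugged into the iteration sketched in Sect. 7.2.1, the conically constrained problem of Eqs. (7.1)-(7.3) reads as follows (a sketch; alpha is the cone parameter α and the function names are illustrative):

    import numpy as np

    def f(x):
        # objective of Eq. (7.1)
        return x[0]

    def is_feasible(x, alpha):
        # feasible region of Eqs. (7.2) and (7.3): a right circular cone
        # about the positive x1 axis with apex at the origin
        return x[0] >= 0.0 and x[0] ** 2 - alpha * np.sum(x[1:] ** 2) >= 0.0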


In what follows, we will analyze the behaviour of the strategy described in Sect. 7.2.1 applied to the conically constrained problem thus defined, assuming that the dimension N of the search space is high. Formally, we will omit terms that disappear in the limit N → ∞ in order to arrive at simpler equations. While not exact, the resulting equations approximate results for large but finite N, and computer experiments will be used to verify their accuracy.
Notice that the behaviour of the evolution strategy considered is invariant with respect to translations and rotations of the coordinate system. The analysis below thus applies to the more general case where the feasible region of the problem forms a right circular cone whose axis coincides with the gradient direction of the objective function. The particular choice of coordinate system employed here has the advantage of resulting in relatively simple equations.

7.3 Analysis of a Single Step

Although the mutation vectors are drawn from a standard normal distribution, enforcing the feasibility constraints through resampling affects the distribution of feasible offspring. Averaging the μ best mutation vectors as in Line 8 of Algorithm 1 further affects the distribution of ⟨z⟩, which we now describe in the context of a single iteration of the strategy.

7.3.1 Probability of Generating Feasible Offspring

We first observe that any vector x = ⟨x₁, x₂, . . . , x_N⟩ may be written as x = x₁ + x₂…N, where x₁ = ⟨x₁, 0, . . . , 0⟩ and x₂…N = ⟨0, x₂, . . . , x_N⟩. In the context of a particular parental centroid x, any mutation vector z may be decomposed into three mutually orthogonal components: the vector z₁ that is its projection onto x₁/‖x₁‖, the vector z_R that is its projection onto x₂…N/‖x₂…N‖, and the vector z_⊥ that lies in the (N − 2)-dimensional hyperplane orthogonal to both x₁ and x₂…N. The sum of these components gives the original vector, z = z₁ + z_R + z_⊥, and we will write z₁, z_R, and z_⊥ to refer to their respective magnitudes. Note that z_R is the magnitude of the component of z that points from the axis of the cone defining the feasible region toward x, and that this axis coincides with the x₁ axis for the problem at hand. If we write

R = sqrt( Σ_{i=2}^{N} x_i² ) = ‖x₂…N‖   (7.4)

to denote the distance from x to the axis of the cone within the (N − 1)-dimensional hyperplane determined by x₁, then z_R can be written as

z_R = (1/R) Σ_{i=2}^{N} x_i z_i.

In each generation, all λ of the offspring must be feasible before recombination can occur. In other words, for any offspring both

x₁ + σ z₁ ≥ 0   (7.5)

and

(x₁ + σ z₁)² − α Σ_{i=2}^{N} (x_i + σ z_i)² ≥ 0   (7.6)

must be satisfied. Defining the normalized step size

σ* = σ N / R   (7.7)

and the normalized slack

ξ* = N ξ / R² = N (x₁² − α R²) / R²   (7.8)

and solving the latter equation for x₁, we arrive at

x₁ = R sqrt( ξ*/N + α ).   (7.9)

Substituting this into Eq. (7.5) and using Eq. (7.7) gives the equivalent condition

sqrt( ξ*/N + α ) + (σ*/N) z₁ ≥ 0

in terms of normalized quantities. Assuming that both σ* and ξ* tend to finite limit values as N increases (and it will be confirmed below that they do), then in the limit N → ∞ the left-hand side converges to sqrt(α) > 0 in distribution, and the constraint will thus be satisfied with overwhelming probability. Similarly, by using Eqs. (7.7), (7.8), and (7.9), the inequality of Eq. (7.6) becomes

ξ* + 2 σ* sqrt( ξ*/N + α ) z₁ + (σ*²/N) z₁² − 2 α σ* z_R − (α σ*²/N) Σ_{i=2}^{N} z_i² ≥ 0.

Since the z_i are all standard normally distributed, the sum of their squares is χ² distributed, and the term (1/N) Σ_{i=2}^{N} z_i² converges almost surely to E[z_i²] = 1 by the strong law of large numbers. Omitting the other terms that disappear in the limit N → ∞ and solving for z_R gives the condition

z_R ≤ ( ξ* + 2 sqrt(α) σ* z₁ − α σ*² ) / ( 2 α σ* )   (7.10)

for a mutation vector to result in a feasible offspring candidate solution. Since both the z_i and z_R are standard normally distributed, the probability of the offspring candidate solution x + σz being feasible can thus be expressed via the conditional probability of z₁ as

P_feas = (1/(2π)) ∫_{−∞}^{∞} e^{−x²/2} ∫_{−∞}^{(ξ* + 2√α σ* x − α σ*²)/(2ασ*)} e^{−y²/2} dy dx
       = (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2} Φ( (ξ* + 2√α σ* x − α σ*²)/(2ασ*) ) dx
       = Φ( (ξ* − α σ*²) / (2 σ* sqrt(α² + α)) )   (7.11)

where Φ(·) denotes the cumulative distribution function of the standard normal distribution. Equality between the second and third lines is established by use of an identity from Arnold (2002, p. 117).
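The limit expression can be checked numerically. The sketch below, which is not part of the original chapter, compares Eq. (7.11) against a direct Monte Carlo estimate of the feasibility condition of Eq. (7.10); the component names z1 and zR follow the notation used above, and the availability of SciPy is assumed.

    import numpy as np
    from scipy.stats import norm

    def p_feas_closed_form(xi_star, sigma_star, alpha):
        # Eq. (7.11)
        return norm.cdf((xi_star - alpha * sigma_star ** 2)
                        / (2.0 * sigma_star * np.sqrt(alpha ** 2 + alpha)))

    def p_feas_monte_carlo(xi_star, sigma_star, alpha,
                           samples=200000, rng=np.random.default_rng(0)):
        # sample (z1, zR) and apply the feasibility condition of Eq. (7.10)
        z1 = rng.standard_normal(samples)
        zR = rng.standard_normal(samples)
        bound = (xi_star + 2.0 * np.sqrt(alpha) * sigma_star * z1
                 - alpha * sigma_star ** 2) / (2.0 * alpha * sigma_star)
        return np.mean(zR <= bound)

For moderate α and σ* the two estimates agree up to Monte Carlo noise, which serves as a sanity check on the asymptotic derivation.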

7.3.2 Expected Step

Having computed the probability P_feas of generating feasible offspring, we can now describe the expected behaviour of an individual step of the (μ/μ, λ)-ES. Where before we considered individual offspring before selection and recombination, we now refer to the results z₁, z_R, and z_⊥ of averaging across the μ best feasible offspring in a generation of λ individuals. Using Eq. (7.10), the joint probability density of the z₁ and z_R components of a feasible offspring is

p_{1,R}(x, y) = (1/(2π P_feas)) e^{−(x² + y²)/2}   if y ≤ (ξ* + 2√α σ* x − α σ*²)/(2ασ*),
p_{1,R}(x, y) = 0                                  otherwise.   (7.12)

The marginal density of the variable z₁ is therefore

p₁(x) = ∫_{−∞}^{∞} p_{1,R}(x, y) dy = (1/(√(2π) P_feas)) e^{−x²/2} Φ( (ξ* + 2√α σ* x − α σ*²)/(2ασ*) )   (7.13)

with associated cumulative distribution function P₁. Expected values of the z₁ and z_R components of the average of the selected mutation vectors are computed in Eqs. (7.27) and (7.30) of the Appendix. Since the coefficient of variation of the χ² distribution decreases with increasing N, the components of the z_⊥ vector of feasible offspring are independently standard normally distributed in the limit. Averaging μ such vectors results in a vector of expected squared length

E[ ‖z_⊥‖² ] = N/μ   (7.14)

and of random direction.

7.4 Steady State Behaviour

To analyze the steady state behaviour of the algorithm applied to the conically constrained problem, we assume for now that the normalized step size σ* is constant; the step size is then said to be scale invariant. As a result, only ξ* remains as a parameter describing the state of the strategy. The case of a dynamically varying step size under the control of CSA will be considered in Sect. 7.5.

7.4.1 Change of Slack

The update rule

ξ^(t+1) = ( x₁^(t) + σ^(t) z₁^(t) )² − α Σ_{i=2}^{N} ( x_i^(t) + σ^(t) z_i^(t) )²

for the slack is directly implied by Eq. (7.6), where superscripts indicate the iteration number. To derive the update rule for the normalized slack ξ*, this can be combined with Eq. (7.8) to write

ξ*^(t+1) = ( R^(t)/R^(t+1) )² [ ξ*^(t) + 2 σ*^(t) sqrt(ξ*^(t)/N + α) z₁^(t) − 2 α σ*^(t) z_R^(t) + (σ*^(t)²/N) ( z₁^(t)² − α z_R^(t)² − α z_⊥^(t)² ) ]

where z 1 , z , and z refer to the respective component lengths averaged from the
best offspring. The update rule for distance R is derived from Eq. (7.4) to be

7 Analyzing the Behaviour of Multi-recombinative Evolution


2

R (t+1) =

189

N

2
(xi + z i (t) )
i=2

=R

(t) 2

2 (t) (t) (t)


1+
z +
N
N

(7.15)

Using Eq. (7.14), combining this with Eq. (7.15), and taking the limit as N ,
the update rule becomes

2
.
(t+1) = (t) + 2 z 1 2 z

(7.16)

The evolution of the (μ/μ, λ)-ES can therefore be viewed as a time-homogeneous Markov process with the single state variable ξ*. At each iteration, this state variable is influenced by the component z₁ of the step made along the gradient direction and by the component z_R in the direction pointing from the axis of the cone toward the current population centroid x.
Iterating Eq. (7.16) yields a sequence of normalized slack values. After initialization effects have faded, those values are drawn from a stationary limit distribution. In order to study this distribution, we apply the dynamical systems approach using a shifted Dirac delta function as a model for the stationary distribution of ξ*, resulting in the stationarity condition

E[ ξ*^(t+1) ] = ξ*^(t).

Using Eq. (7.16) this yields

E[ z₁ ] = sqrt(α) E[ z_R ] + sqrt(α) σ* / (2μ).   (7.17)

The expected values E[z₁] and E[z_R] are functions of ξ*, and expressions for both can be found in the Appendix.
Figure 7.1 plots the average normalized slack ξ* for the (μ/μ, λ)-ES with λ = 10 and μ ∈ {1, 3}. The curves were computed by numerically solving Eq. (7.17) with Eqs. (7.27) and (7.30), using Eqs. (7.28) and (7.31). The data points were obtained by artificially restricting the normalized step size of Algorithm 1 to a fixed value of σ* and initializing the runs with a point on the boundary of the constrained region. For each run, the first 40N iterations were discarded to allow initialization effects to subside, and the average normalized slack over the next 20,000 iterations was then recorded. An upper limit for resampling was set at 1,000, so that a run would be aborted if any offspring remained infeasible after 1,000 resampling operations; in this event, all subsequent data points were also omitted from the graph. As observed for the μ = 1 case in Arnold (2013a), the normalized slack increases with increasing σ* and increasing α. The same holds true for μ > 1.


1.0e+03

normalized slack *

1.0e+02
= 10.0

1.0e+01

= 1.0

1.0e+00
1.0e-01

= 0.1

1.0e-02
1.0e-03
1.0e-01

1.0e+00

1.0e+01

normalized step size *

Fig. 7.1 Average normalized slack plotted against the normalized step size . Solid lines
represent results for = 1, while dashed lines represent results for = 3. In both cases, = 10.
Marked points represent experimental data from runs of the strategy with scale invariant step size
and dimension N = 40

The case of μ = 3 shows larger overall values of the normalized slack than μ = 1. This can be explained by noting that averaging across multiple offspring reduces the selection pressure for remaining close to the constraint boundary, so candidate solutions tend to drift farther away from it. The data points match the predicted curves very closely throughout, which suggests that the Dirac delta model is suitable for the range of parameters considered in the plot.

7.4.2 Rate of Convergence

Assuming scale invariant step size, the (μ/μ, λ)-ES will either converge linearly to the optimal solution at the cone's apex or diverge linearly. That is, when plotting the logarithm of the objective function value of the population centroid against the iteration number, one observes a noisy, linear decrease (or increase). Following Auger and Hansen (2006), the convergence rate is defined as

Δ* = −N E[ log( f(x + σ⟨z⟩) / f(x) ) ]

and is the negative of the slope of the line observed in the graph of logarithmic objective function values, scaled with N. Positive convergence rates indicate convergence, while negative values signify divergence of the strategy. Using Eqs. (7.7) and (7.9), this may be rewritten in terms of normalized quantities as

Δ* = −N E[ log( 1 + σ* z₁ / ( N sqrt(ξ*/N + α) ) ) ].   (7.18)


Dropping quadratic and higher order terms from the Taylor series expansion of the logarithm and taking expected values, as N → ∞ this becomes

Δ* = −σ* E[ z₁ ] / sqrt(α).   (7.19)

That is, the convergence rate is affected by the normalized step size of the strategy as well as by the population size parameters μ and λ that are implicit in E[z₁].
Higher convergence rates can be achieved by using larger values of μ and λ. However, increasing the population size parameters also increases the computational cost of a single iteration of the algorithm. We consider two cost models for comparing different parameter settings. In the first model, we assume that objective function evaluations have a uniform cost that dominates the cost of all other operations involved in Algorithm 1; in particular, the cost of constraint function evaluations is assumed to be negligible in this model. In the second cost model, we assume that the cost of constraint function evaluations dominates all other costs. Optimal performance under the first cost model requires maximizing Δ*_obj = Δ*/λ, as the number of objective function evaluations per iteration equals λ. Optimal performance under the second cost model involves maximizing Δ*_feas = P_feas Δ*/λ, as λ/P_feas is the expected number of constraint function evaluations per iteration.
In Fig. 7.2, the probability P_feas of generating feasible offspring is shown for the (μ/μ, λ)-ES with scale invariant step size, for λ = 10 and μ ∈ {1, 3}. The lines have been obtained from Eq. (7.11), with the normalized slack computed using the Dirac delta model as above. The data points were calculated as averages over the same runs of 20,000 iterations used to generate Fig. 7.1. As observed for the μ = 1 case in Arnold (2013a), the probability P_feas decreases with increasing σ*, dropping below one half and appearing to approach zero for large σ*. For equal normalized step size, P_feas is larger for μ = 3 than for μ = 1, which is unsurprising, as Fig. 7.1 showed that μ = 3 results in larger normalized slack values.
Fig. 7.2 Probability P_feas of a random offspring candidate solution being feasible, plotted against the normalized step size σ*, for α = 0.1, 1.0, and 10.0. Solid lines represent results for μ = 1, dashed lines results for μ = 3; in both cases λ = 10. Marked points represent experimental data from runs of the strategy with scale invariant step size and dimension N = 40


Fig. 7.3 Convergence rate Δ* plotted against the normalized step size σ*, for α = 0.1, 1.0, and 10.0. Solid lines represent results for μ = 1, dashed lines results for μ = 3; in both cases λ = 10. Marked points represent experimental data from runs of the strategy with scale invariant step size and dimension N = 40

For equal normalized step size, P_feas is larger for μ = 3 than for μ = 1, which is unsurprising as it has been observed in Fig. 7.1 that μ = 3 results in larger normalized slack values.
Figure 7.3 shows the convergence rate Δ* of the (μ/μ, λ)-ES with scale invariant step size for λ = 10 and μ ∈ {1, 3}. The data points were calculated from averages computed over the same runs used to generate Figs. 7.1 and 7.2, and the curves were computed using Eq. (7.19) after solving Eq. (7.17) numerically for the normalized slack. As observed for the μ = 1 case in Arnold (2013a), each curve first increases with increasing step size before it starts decreasing and eventually turns negative (indicating divergence of the strategy). This overall pattern introduces the notion of an optimal normalized step size that maximizes the rate of convergence Δ*. Larger values of ξ, which correspond to more narrow cones delimiting the feasible region, appear to admit higher maximal convergence rates. In terms of the strategy's behaviour, this suggests that narrower regions of feasibility funnel the candidate solutions toward the optimal solution by inherently limiting the choice of offspring in perpendicular directions.
Figure 7.4 shows the behaviour of various quantities when the normalized step size is fixed at the optimal values σ*_obj and σ*_feas, which maximize Δ*_obj and Δ*_feas, respectively. The resulting probability of generating feasible offspring, the convergence rates relative to the number of objective and constraint function evaluations, and the optimal step size itself are all plotted for the (μ/μ, λ)-ES with λ = 10 and μ ∈ {1, 3}. The data for the curves was generated by numerically computing the optimal values σ*_obj and σ*_feas using Eqs. (7.11) and (7.19) with the Dirac delta model.
For σ*_obj (shown with solid lines), a cost model is assumed where objective function evaluations dominate overall computational costs. The case of μ = 1 corresponds to the observations made in Arnold (2013a). The probability P_feas is higher for μ = 3 than for μ = 1, and the same is true for σ*_obj for sufficiently large ξ.

Fig. 7.4 Optimal normalized step size σ*, probability P_feas of generating feasible offspring, convergence rate Δ*_obj relative to the number of objective function evaluations, and convergence rate Δ*_feas relative to the number of constraint function evaluations plotted against the constraint parameter ξ for λ = 10 and μ ∈ {1, 3}. All figures use solid lines to indicate the optimal normalized step size σ*_obj, and dotted lines to indicate the optimal normalized step size σ*_feas

For all choices of μ, P_feas appears to approach zero as ξ increases. Considering the behaviour of the normalized convergence rate relative to the assumed computational costs, the strategy with μ = 1 outperforms that with μ = 3 for small values of ξ, while the situation is reversed for larger values of the constraint parameter. Additionally, larger values of μ appear to correspond with larger optimal values σ*_obj for sufficiently large ξ. This agrees with the observations of Fig. 7.3, and suggests that the choice of larger μ encourages larger step sizes when the region is more narrow, subsequently improving the expected rate of convergence.
For σ*_feas (shown with dotted lines), a cost model is assumed where constraint function evaluations dominate overall computational costs. The behaviour differs from that of σ*_obj for larger values of ξ, yet appears almost identical for smaller values. For these narrow regions of feasibility, the optimal step size is relatively smaller, while the probability P_feas remains at or above 0.5. Over approximately the same interval of ξ, the convergence rate Δ*_obj is smaller and the convergence rate Δ*_feas is larger than when optimizing σ*_obj. Taken together, these results suggest that the second cost model is able to improve its expected rate of convergence by encouraging smaller step sizes when dealing with more narrow regions of feasibility.
In Fig. 7.5, the optimal normalized step size σ*_obj, the corresponding probability P_feas of generating feasible offspring, and the convergence rates relative to both cost models are shown for λ = 10 and varying μ. All points were generated by computing the optimal normalized step size σ*_obj using the same method as in Fig. 7.4. The values of P_feas increase monotonically with increasing truncation ratio μ/λ.


Fig. 7.5 Optimal normalized step size σ*_obj, probability P_feas of generating feasible offspring, normalized convergence rate Δ*_obj relative to the number of objective function evaluations, and normalized convergence rate Δ*_feas relative to the number of constraint function evaluations plotted against truncation ratio μ/λ for λ = 10 and constraint parameters ξ = 0.01, 0.1, 1.0, and 10.0. All figures use the optimal normalized step size σ*_obj. The data points are joined by lines for ease of visibility

The curves for the normalized convergence rates relative to the two cost models show optimal behaviour for intermediate values of μ, except for very small values of ξ where μ = 1 is optimal. For both models, the optimal value of μ appears to increase monotonically with respect to ξ.
In Fig. 7.6, the optimal normalized step size σ*_feas, the corresponding probability P_feas of generating feasible offspring, and the convergence rates relative to both cost models are shown for λ = 10 and varying μ. All points were generated by computing the optimal normalized step size σ*_feas using the same method as in Fig. 7.5, adjusted for the different cost model. Throughout, the values appear more tightly clustered than in Fig. 7.5. The optimal value of μ for both cost models still appears to increase monotonically with respect to ξ.

7.5 Step Size Adaptation


While we have assumed σ* constant in the analysis up to now, that assumption is of course unrealistic, as the distance to the cone's axis is unknown to the algorithm. In practice, the step size needs to be adapted using one of a number of control schemes. In this section, we consider the case that the step size of the algorithm is controlled by CSA as described in Sect. 7.2.1. As before, the notation

Fig. 7.6 Optimal normalized step size σ*_feas, probability P_feas of generating feasible offspring, normalized convergence rate Δ*_obj relative to the number of objective function evaluations, and normalized convergence rate Δ*_feas relative to the number of constraint function evaluations plotted against truncation ratio μ/λ for λ = 10 and constraint parameters ξ = 0.01, 0.1, 1.0, and 10.0. All figures use the optimal normalized step size σ*_feas. The data points are joined by lines for ease of visibility, and the scales are kept identical to Fig. 7.5 for straightforward comparison

\[
s_R = \frac{1}{R}\sum_{i=2}^{N} s_i x_i \qquad (7.20)
\]

refers to the magnitude of the component of vector s which points in the direction from the axis of the cone to candidate solution x. Together with the component s_1, the normalized slack δ*, the normalized step size σ*, and the deviation ‖s‖² − N, this describes the state of the strategy. This gives a five-dimensional parameter space for modelling the Markov process, compared to the one-dimensional parameter space used in Sect. 7.4. Using the consequence given in Eq. (7.17) of the existing update rule for δ*, and the expected values E[⟨z_1⟩] and E[⟨z_R⟩] as computed in the Appendix, we follow an approach similar to that of Arnold (2013a) and Arnold and Beyer (2010) to derive update rules and model the stationary distributions of s_1, s_R, and ‖s‖² in order to completely describe the expected behaviour of the system when using CSA.
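For concreteness, the following Python sketch shows one CSA iteration in the form assumed by the derivation below (search path accumulation followed by the exponential step size update). It is a minimal illustration rather than a verbatim transcription of Algorithm 1, and the default parameter choices follow the large-N settings discussed at the end of this section:

```python
import numpy as np

def csa_update(s, sigma, z_mean, mu, N, c=None, D=None):
    """One CSA step for a (mu/mu, lambda)-ES as assumed in the derivation
    below (sketch only, not verbatim from the chapter).

    s      : current search path (length-N vector)
    sigma  : current mutation strength
    z_mean : average <z> of the mu selected mutation vectors
    """
    c = c if c is not None else 1.0 / np.sqrt(N)   # cumulation parameter (see Sect. 7.5)
    D = D if D is not None else np.sqrt(N)         # damping constant, D = 1/c
    s = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_mean
    sigma = sigma * np.exp((np.dot(s, s) - N) / (2.0 * D * N))
    return s, sigma
```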
An immediate consequence of the update of the search path in Line 10 of Algorithm 1 is the update equation

\[
s_1^{(t+1)} = (1-c)\,s_1^{(t)} + \sqrt{\mu c(2-c)}\,\langle z_1\rangle^{(t)}
\]

where superscripts indicate the iteration number, for the component of s contained in the subspace spanned by the x_1 axis. Employing the Dirac delta model in the dynamical systems approach and requiring that E[s_1^{(t+1)}] = s_1^{(t)} results in

\[
s_1 = \sqrt{\frac{\mu(2-c)}{c}}\;\mathrm{E}[\langle z_1\rangle] \qquad (7.21)
\]

as an approximation to the average value of the s_1 component of the search path if the strategy operates in a stationary state.
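The stationary value in Eq. (7.21) can be checked empirically by iterating the assumed search path recursion with i.i.d. increments whose mean plays the role of E[⟨z_1⟩]. The short Python experiment below is a sketch under that assumption; the numerical values are placeholders:

```python
import numpy as np

# Empirical check of the stationary value in Eq. (7.21): iterate the s_1
# recursion with i.i.d. increments whose mean stands in for E[<z_1>].
rng = np.random.default_rng(1)
mu, N = 3, 100
c = 1.0 / np.sqrt(N)
mean_z1 = -1.0                      # placeholder for E[<z_1>] (hypothetical value)
s1, samples = 0.0, []
for t in range(200_000):
    z1 = mean_z1 + rng.standard_normal() / np.sqrt(mu)   # <z_1> has variance 1/mu
    s1 = (1.0 - c) * s1 + np.sqrt(mu * c * (2.0 - c)) * z1
    if t > 10_000:
        samples.append(s1)
print(np.mean(samples), np.sqrt(mu * (2.0 - c) / c) * mean_z1)  # should roughly agree
```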
For the component s_R, using Eq. (7.20) together with the search path update equation in Line 10 of Algorithm 1 gives

\[
s_R^{(t+1)} = \frac{R^{(t)}}{R^{(t+1)}}\left[(1-c)\left(s_R^{(t)} + \frac{\sigma^{*(t)}}{N}\, \mathbf{s}_{2\ldots N}^{(t)}\!\cdot\!\langle \mathbf{z}\rangle_{2\ldots N}^{(t)}\right) + \sqrt{\mu c(2-c)}\left(\langle z_R\rangle^{(t)} + \frac{\sigma^{*(t)}}{N}\,\big\|\langle \mathbf{z}\rangle_{2\ldots N}^{(t)}\big\|^2\right)\right].
\]

Then applying Eqs. (7.14) and (7.15) while omitting terms that disappear in the limit N → ∞ yields

\[
s_R^{(t+1)} \approx (1-c)\,s_R^{(t)} + \sqrt{\mu c(2-c)}\left(\langle z_R\rangle^{(t)} + \frac{\sigma^*}{\mu}\right).
\]

Taking expected values and imposing the condition that E[s_R^{(t+1)}] = s_R^{(t)}, we have

\[
s_R = \sqrt{\frac{\mu(2-c)}{c}}\left(\mathrm{E}[\langle z_R\rangle] + \frac{\sigma^*}{\mu}\right) \qquad (7.22)
\]

as an approximation to the average value of the s_R component of the search path if the strategy operates in a stationary state.
Considering the squared length ‖s‖² of the search path, the corresponding update rule is

\[
\big\|\mathbf{s}^{(t+1)}\big\|^2 = \sum_{i=1}^{N}\left[(1-c)s_i^{(t)} + \sqrt{\mu c(2-c)}\,\langle z_i\rangle^{(t)}\right]^2
= (1-c)^2\big\|\mathbf{s}^{(t)}\big\|^2 + 2(1-c)\sqrt{\mu c(2-c)}\left(\langle z_1\rangle^{(t)} s_1^{(t)} + \langle z_R\rangle^{(t)} s_R^{(t)}\right) + \mu c(2-c)\big\|\langle\mathbf{z}\rangle^{(t)}\big\|^2.
\]

Taking expected values, imposing the condition E[‖s^{(t+1)}‖²] = ‖s^{(t)}‖², and recalling that E[‖⟨z⟩‖²]/N = 1/μ for large N, this becomes

\[
\|\mathbf{s}\|^2 = (1 - 2c + c^2)\|\mathbf{s}\|^2 + 2(1-c)\sqrt{\mu c(2-c)}\left(\mathrm{E}[\langle z_1\rangle]\, s_1 + \mathrm{E}[\langle z_R\rangle]\, s_R\right) + c(2-c)N.
\]

Using Eqs. (7.21) and (7.22) gives

\[
\|\mathbf{s}\|^2 - N = \frac{2\mu(1-c)}{c}\left(\mathrm{E}[\langle z_1\rangle]^2 + \mathrm{E}[\langle z_R\rangle]^2 + \frac{\sigma^*}{\mu}\,\mathrm{E}[\langle z_R\rangle]\right) \qquad (7.23)
\]

as an approximation for the average deviation of the squared length of the search path from the value that would be expected in the case of uncorrelated steps.
Finally, considering the normalized step size, using Eqs. (7.7) and (7.15) with the update rule in Line 11 of Algorithm 1 results in

\[
\sigma^{*(t+1)} = \sigma^{*(t)}\,\frac{R^{(t)}}{R^{(t+1)}}\,\exp\!\left(\frac{\|\mathbf{s}^{(t+1)}\|^2 - N}{2DN}\right)
= \frac{\sigma^{*(t)}}{\sqrt{1 + 2\sigma^{*(t)}\langle z_R\rangle^{(t)}/N + \sigma^{*(t)2}/(\mu N)}}\,\exp\!\left(\frac{\|\mathbf{s}^{(t+1)}\|^2 - N}{2DN}\right).
\]

Using the Taylor expansions of 1/\sqrt{1+x} and \exp(x) and dropping all terms of quadratic and higher order we arrive at

\[
\sigma^{*(t+1)} \approx \sigma^{*(t)}\left[1 - \frac{1}{N}\left(\sigma^{*(t)}\langle z_R\rangle^{(t)} + \frac{\sigma^{*(t)2}}{2\mu}\right) + \frac{\|\mathbf{s}^{(t+1)}\|^2 - N}{2DN}\right].
\]

Taking expected values and imposing the condition E[σ^{*(t+1)}] = σ^{*(t)} leads to

\[
\sigma^*\,\mathrm{E}[\langle z_R\rangle] + \frac{\sigma^{*2}}{2\mu} = \frac{\|\mathbf{s}\|^2 - N}{2D}.
\]

Applying Eq. (7.23) to the right hand side while again taking expected values, this yields

\[
\sigma^*\,\mathrm{E}[\langle z_R\rangle] + \frac{\sigma^{*2}}{2\mu} = \frac{2\mu(1-c)}{2cD}\left(\mathrm{E}[\langle z_1\rangle]^2 + \mathrm{E}[\langle z_R\rangle]^2 + \frac{\sigma^*}{\mu}\,\mathrm{E}[\langle z_R\rangle]\right).
\]

For large N, the cumulation parameter c may be set to 1/\sqrt{N}, and the damping constant D may be set to 1/c = \sqrt{N}. Re-arranging the terms above while simplifying and omitting those that vanish as N → ∞ gives

\[
\sigma^{*2} = 2\mu^2\left(\mathrm{E}[\langle z_1\rangle]^2 + \mathrm{E}[\langle z_R\rangle]^2\right) \qquad (7.24)
\]

as an approximation to the average normalized step size that CSA will generate in the stationary state of the strategy.

Fig. 7.7 Normalized step size σ*, probability P_feas of generating feasible offspring, convergence rate Δ*_obj relative to the number of objective function evaluations, and convergence rate Δ*_feas relative to the number of constraint function evaluations plotted against constraint parameter ξ. All plots represent runs using CSA to control the step size. Values for μ = 1 and μ = 3 are compared for λ = 10. In all figures, the marked points represent experimental data from runs of the strategy using dimension N = 40 (+) and dimension N = 400 (×). The extra black dotted lines are provided for reference, and indicate the curves for normalized step size optimized for Δ*_obj as shown in Fig. 7.4
In Fig. 7.7, the average normalized step size, the probability P_feas of generating feasible offspring, and the normalized convergence rates relative to the two cost models are plotted when using CSA to control the value of σ. The curves were generated by numerically solving Eqs. (7.17) and (7.24) together with Eqs. (7.27) and (7.30). The data points were determined by averaging results from runs of 20,000 iterations of the (μ/μ, λ)-ES using CSA. As before, the first 40N iterations were discarded to avoid initialization biases, and runs in which an offspring had to be resampled more than 1,000 times contributed no further data points. Step sizes generated using CSA with μ = 3 are larger than those generated with μ = 1, and in both cases the values generated are close to the optimal ones for the Δ*_obj cost model (shown with dotted lines), except where ξ is large and CSA results in significantly smaller than optimal values. Considering P_feas, the probability of generating feasible offspring decreases with increasing constraint parameter, though not as rapidly as in Fig. 7.4, where the step size is optimized. Values of the convergence rate Δ*_obj relative to the number of objective function evaluations are close to optimal throughout, provided that N is large enough for the approximations to be sufficiently accurate. Values of the convergence rate Δ*_feas relative to the number of constraint function evaluations decrease and lose accuracy with increasing constraint parameter, mirroring the behaviour of P_feas.

Fig. 7.8 Probability P_feas and convergence rate Δ* plotted against search space dimension N, for constraint parameters ξ = 0.1, 1.0, and 10.0. The left hand graphs represent results for μ = 1 and those on the right for μ = 3. In both cases, λ = 10. The horizontal lines represent results obtained using the dynamical systems approach assuming N → ∞. The marked points represent results measured in runs of the (μ/μ, λ)-ES with cumulative step size adaptation

The relatively inaccurate predictions of the convergence rates for μ = 3 and N = 40 can be explained by the large observed values of the normalized slack, which cause a significant error when the term involving the normalized slack is dropped in the calculation going from Eq. (7.18) to Eq. (7.19). Measurements for N = 400 are noticeably more accurate in this case.
Finally, Fig. 7.8 illustrates the accuracy of the predictions made using the dynamical systems approach in the limit N → ∞ by comparing the estimates for the probability P_feas of generating feasible offspring and the convergence rate Δ* with measurements made in runs of the (μ/μ, λ)-ES with cumulative step size adaptation as described above. It can be seen that the error in the predictions decreases with increasing search space dimensionality, though not necessarily monotonically. Predictions for small values of ξ are more accurate than those for larger values of the constraint parameter, and the error in the predictions of Δ* is generally larger for μ = 3 than it is for μ = 1. While in the latter case the error is below 15 % for N as small as 20, μ = 3 requires N an order of magnitude larger in order to achieve that level of accuracy for larger values of ξ.


7.6 Conclusion
We have analyzed the behaviour of the (μ/μ, λ)-ES with cumulative step size adaptation applied to a conically constrained problem where the gradient direction coincides with the cone's axis, and the optimal solution lies at the cone's apex, on the boundary of the feasible region. Under the assumption of scale invariant step size, we used a Markov process model to estimate the evolving slack of candidate solutions and the overall operation of the strategy probabilistically. More narrow conic regions of feasibility were found to result in higher convergence rates, for appropriately chosen normalized step size. When choosing the step size to maximize the rate of convergence, the strategy performed better with larger choices of μ when the feasible region was narrow, while μ = 1 was a better choice for feasible regions approaching the half-space.
An offsetting factor for the high convergence rates in narrow regions of feasibility was that these regions also resulted in a lower probability of feasible offspring, requiring more resampling in each generation on average. Selecting more offspring for recombination with larger μ could improve the probability of offspring being feasible in these narrow regions, but would not improve the rate of convergence in broader regions of feasibility. As the region approaches the half-space, choosing μ > 1 would eventually reduce the convergence rate. The balance between the probability of generating feasible offspring and the rate of convergence was considered using two cost models: one that assumes that objective function evaluations dominate computational costs, and one that assumes that constraint function evaluations play that role.
Using cumulative step size adaptation was found to lead to convergence, usually at a rate close to the optimal one, at least for sufficiently large N. However, the predicted convergence rates were notably inaccurate when both μ and ξ were large and the feasible region was narrow. In these cases, the strategy moves farther from the constraint boundary, developing a large average value of the normalized slack. With dimension N = 40, the error term then dominates the predicted convergence rate. With larger dimensional problems, the observed values once again approached the predicted rate.
Acknowledgments This research was supported by the Natural Sciences and Engineering Research
Council of Canada (NSERC).

7.7 Appendix
The derivation of expressions for E[⟨z_1⟩] and E[⟨z_R⟩] closely follows similar calculations by Arnold (2013b), with differences due to the task here being minimization rather than maximization and to the underlying probability distributions differing from those that hold for the linearly constrained problem.


7.7.1 Deriving an Expression for E[⟨z_1⟩]

The (μ/μ, λ)-ES averages the mutation vectors corresponding to the selected offspring. Since the objective is minimization of f(x) = x_1, the vectors that are selected are those with the smallest z_1 components. If the vectors are sorted so that z^{(k;λ)} refers to the vector with the kth smallest z_1 component, then by using elementary results from the field of order statistics (see Balakrishnan and Rao (1998)), the probability density function of the z_1 component of the mutation vector with the kth smallest objective function value may be written as

\[
p_1^{(k;\lambda)}(x) = \frac{\lambda!}{(\lambda-k)!\,(k-1)!}\; p_1(x)\,[1 - P_1(x)]^{\lambda-k}\,[P_1(x)]^{k-1}. \qquad (7.25)
\]

Since the value of z 1 is the average of the best individuals, its expected value can
be expressed as


1
E z 1 =

!
=

(k;)

x p1

(x) dx

k=1



[1 P1 (x)]k [P1 (x)]k1
dx.
x p1 (x)
( k)!(k 1)!
k=1

The summation term can be converted to an integral using the identity

\[
\sum_{k=1}^{\mu}\frac{Q^{\lambda-k}\,(1-Q)^{k-1}}{(\lambda-k)!\,(k-1)!} = \frac{1}{(\lambda-\mu-1)!\,(\mu-1)!}\int_{0}^{Q} z^{\lambda-\mu-1}(1-z)^{\mu-1}\, dz \qquad (7.26)
\]

from Beyer (2001), resulting in

\[
\mathrm{E}[\langle z_1\rangle] = (\lambda-\mu)\binom{\lambda}{\mu}\int_{-\infty}^{\infty} x\, p_1(x)\int_{0}^{1-P_1(x)} z^{\lambda-\mu-1}(1-z)^{\mu-1}\, dz\, dx.
\]

By performing a change of variable with z = 1 − P_1(y) and then exchanging the order of integration, this becomes

\[
\mathrm{E}[\langle z_1\rangle] = (\lambda-\mu)\binom{\lambda}{\mu}\int_{-\infty}^{\infty}\!\int_{x}^{\infty} x\, p_1(x)\, p_1(y)\,[1-P_1(y)]^{\lambda-\mu-1}\,[P_1(y)]^{\mu-1}\, dy\, dx
= (\lambda-\mu)\binom{\lambda}{\mu}\int_{-\infty}^{\infty} p_1(y)\,[1-P_1(y)]^{\lambda-\mu-1}\,[P_1(y)]^{\mu-1}\, I_1(y)\, dy \qquad (7.27)
\]


where

\[
I_1(y) = \int_{-\infty}^{y} x\, p_1(x)\, dx.
\]

We introduce the abbreviations A_x and B. Here A_x is an affine function of x whose coefficients depend on the normalized slack δ*, the normalized step size σ*, and the constraint parameter ξ through the densities in Eqs. (7.12) and (7.13); in particular, p_1(x) = e^{-x²/2} Φ(A_x)/(√(2π) P_feas). Writing b for the constant slope ∂A_x/∂x, the abbreviation B = A_0/√(1+b²) arises from the quadratic completion performed below. The inner integral

\[
I_1(y) = \frac{1}{\sqrt{2\pi}\,P_{\mathrm{feas}}}\int_{-\infty}^{y} x\, e^{-x^2/2}\,\Phi(A_x)\, dx
\]

is solved by integration by parts with

\[
u = \Phi(A_x), \quad u' = \frac{b}{\sqrt{2\pi}}\, e^{-A_x^2/2}, \quad v' = x\, e^{-x^2/2}, \quad v = -e^{-x^2/2},
\]

yielding

\[
I_1(y) = -p_1(y) + \frac{b}{2\pi\,P_{\mathrm{feas}}}\int_{-\infty}^{y} e^{-(x^2 + A_x^2)/2}\, dx.
\]

The remaining integral can be solved by quadratic completion of the argument of the exponential function and a subsequent change of variable, resulting in

\[
I_1(y) = -p_1(y) + \frac{b\, e^{-B^2/2}}{\sqrt{2\pi(1+b^2)}\;P_{\mathrm{feas}}}\;\Phi\!\left(\sqrt{1+b^2}\,y + bB\right). \qquad (7.28)
\]

Together with Eq. (7.27), the expression in Eq. (7.28) allows numerically computing the expected value of ⟨z_1⟩.
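As a quick numerical illustration of the order statistics machinery in Eqs. (7.25)–(7.27), the following Python snippet treats the unconstrained special case in which p_1 and P_1 reduce to the standard normal density and distribution function and P_feas = 1 (so that I_1(y) = −φ(y)); it compares a Monte Carlo estimate of E[⟨z_1⟩] with the corresponding numerical integral. The function names are ours and the snippet is not part of the original derivation:

```python
import numpy as np
from math import comb
from scipy import integrate, stats

def e_avg_smallest_mc(mu, lam, samples=200_000, seed=0):
    """Monte Carlo estimate of E[<z_1>]: average of the mu smallest of
    lambda i.i.d. standard normal samples (unconstrained special case)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((samples, lam))
    z.sort(axis=1)
    return z[:, :mu].mean(axis=1).mean()

def e_avg_smallest_integral(mu, lam):
    """Numerical integration of the order-statistics expression analogous to
    Eq. (7.27), with p_1 = phi and P_1 = Phi (i.e. P_feas = 1)."""
    phi, Phi = stats.norm.pdf, stats.norm.cdf
    def i1(y):  # analogue of I_1(y) = int_{-inf}^y x phi(x) dx = -phi(y)
        return -phi(y)
    def integrand(y):
        return phi(y) * (1.0 - Phi(y))**(lam - mu - 1) * Phi(y)**(mu - 1) * i1(y)
    val, _ = integrate.quad(integrand, -10.0, 10.0)
    return (lam - mu) * comb(lam, mu) * val

print(e_avg_smallest_mc(3, 10), e_avg_smallest_integral(3, 10))  # values should agree closely
```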

7.7.2 Deriving an Expression for E[⟨z_R⟩]

Due to the resampling of infeasible candidate solutions, the z_R components of mutation vectors resulting in feasible offspring are not independent of the respective z_1 components. Their conditional probability density is

\[
p_R(y \mid z_1 = x) = \frac{p_{1,R}(x, y)}{p_1(x)},
\]

where the densities on the right hand side are given in Eqs. (7.12) and (7.13). The corresponding conditional expected value is therefore

\[
\mathrm{E}[z_R \mid z_1 = x] = \int_{-\infty}^{\infty} y\,\frac{p_{1,R}(x, y)}{p_1(x)}\, dy
= -\frac{1}{2\pi\, p_1(x)\, P_{\mathrm{feas}}}\, e^{-x^2/2}\, e^{-A_x^2/2}. \qquad (7.29)
\]

We use Eqs. (7.25) and (7.26) to express the expected value of this component for the average of the μ best individuals, and write, analogously to the calculations for E[⟨z_1⟩],

\[
\mathrm{E}[\langle z_R\rangle] = \frac{1}{\mu}\sum_{k=1}^{\mu}\int_{-\infty}^{\infty} \mathrm{E}[z_R \mid z_1 = x]\; p_1^{(k;\lambda)}(x)\, dx
= (\lambda-\mu)\binom{\lambda}{\mu}\int_{-\infty}^{\infty} p_1(y)\,[1-P_1(y)]^{\lambda-\mu-1}\,[P_1(y)]^{\mu-1}\, I_2(y)\, dy \qquad (7.30)
\]

where

\[
I_2(y) = \int_{-\infty}^{y} \mathrm{E}[z_R \mid z_1 = x]\; p_1(x)\, dx.
\]

With Eq. (7.29) this becomes

\[
I_2(y) = -\frac{1}{2\pi\, P_{\mathrm{feas}}}\int_{-\infty}^{y} e^{-(x^2 + A_x^2)/2}\, dx.
\]

Again using quadratic completion for the argument of the exponential function and performing a change of variable results in

\[
I_2(y) = -\frac{e^{-B^2/2}}{\sqrt{2\pi(1+b^2)}\;P_{\mathrm{feas}}}\;\Phi\!\left(\sqrt{1+b^2}\, y + bB\right). \qquad (7.31)
\]

Together with Eq. (7.30), the expression in Eq. (7.31) allows numerically computing the expected value of ⟨z_R⟩.


References
Arnold DV (2002) Noisy optimization with evolution strategies. Kluwer Academic Publishers, Dordrecht
Arnold DV (2011a) Analysis of a repair mechanism for the (1, λ)-ES applied to a simple constrained problem. In: Genetic and evolutionary computation conference – GECCO 2011. ACM Press, pp 853–860
Arnold DV (2011b) On the behaviour of the (1, λ)-ES for a simple constrained problem. In: Beyer H-G, Langdon WB (eds) Foundations of genetic algorithms – FOGA 2011. ACM Press, New York, pp 15–24
Arnold DV (2013a) On the behaviour of the (1, λ)-ES for a conically constrained problem. In: Genetic and evolutionary computation conference – GECCO 2013. ACM Press, pp 423–430
Arnold DV (2013b) Resampling versus repair in evolution strategies applied to a constrained linear problem. Evol Comput 21(3):389–411
Arnold DV, Beyer H-G (2010) On the behaviour of evolution strategies optimising Cigar functions. Evol Comput 18(4):661–682
Arnold DV, Brauer D (2008) On the behaviour of the (1 + 1)-ES for a simple constrained problem. In: Rudolph G et al (eds) Parallel problem solving from nature – PPSN X. Springer, Berlin, pp 1–10
Auger A, Hansen N (2006) Reconsidering the progress rate theory for evolution strategies in finite dimensions. In: Genetic and evolutionary computation conference – GECCO 2006. ACM Press, pp 445–452
Balakrishnan N, Rao CR (1998) Order statistics: an introduction. In: Balakrishnan N et al (eds) Handbook of statistics, vol 16. Elsevier, New York, pp 3–24
Beyer H-G (1989) Ein Evolutionsverfahren zur mathematischen Modellierung stationärer Zustände in dynamischen Systemen. PhD thesis, Hochschule für Architektur und Bauwesen, Weimar
Beyer H-G (2001) The theory of evolution strategies. Springer, Heidelberg
Beyer H-G, Schwefel H-P (2002) Evolution strategies – a comprehensive introduction. Nat Comput 1(1):3–52
Meyer-Nieberg S, Beyer H-G (2012) The dynamical systems approach – progress measures and convergence properties. In: Rozenberg G et al (eds) Handbook of natural computing. Springer, Berlin, pp 741–814
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present, and future. Swarm Evol Comput 1(4):173–194
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Ostermeier A, Gawelczyk A, Hansen N (1994) Step-size adaptation based on non-local use of selection information. In: Davidor Y et al (eds) Parallel problem solving from nature – PPSN III. Springer, Berlin, pp 189–198
Oyman AI, Deb K, Beyer H-G (1999) An alternative constraint handling method for evolution strategies. In: Proceedings of the 1999 IEEE congress on evolutionary computation. IEEE Press, pp 612–619
Rechenberg I (1973) Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Friedrich Frommann Verlag, Stuttgart
Schwefel H-P (1981) Numerical optimization of computer models. Wiley, Chichester

Chapter 8
Locating Potentially Disjoint Feasible Regions of a Search Space with a Particle Swarm Optimizer

Mohammad Reza Bonyadi and Zbigniew Michalewicz

Abstract In constrained optimization problems set in continuous spaces, a feasible search space may consist of many disjoint regions and the global optimal solution
might be within any of them. Thus, locating these feasible regions (as many as possible, ideally all of them) is of great importance. In this chapter, we introduce niching
techniques that have been studied in connection with multimodal optimization for
locating feasible regions, rather than for finding different local optima. One of the
successful niching techniques was based on the particle swarm optimizer (PSO) with
a specific topology, called nonoverlapping topology, where the swarm was divided
into several nonoverlapping sub-swarms. Earlier studies have shown that PSO with
such nonoverlapping topology, with a small number of particles in each sub-swarm, is
quite effective in locating different local optima if the number of dimensions is small
(up to 8). However, its performance drops rapidly when the number of dimensions
grows. First, a new PSO, called mutation linear PSO, MLPSO, is proposed. This
algorithm is effective in locating different local optima when the number of dimensions grows. MLPSO is applied to optimization problems with up to 50 dimensions,
and its results in locating different local optima are compared with earlier algorithms.
Second, we incorporate a constraint handling technique into MLPSO; this variant
is called EMLPSO. We test different topologies of EMLPSO and evaluate them in
terms of locating feasible regions when they are applied to constraint optimization
problems with up to 30 dimensions. The results of this test show that the new method
M.R. Bonyadi (B) Z. Michalewicz
Optimization and Logistics, School of Computer Science, University of Adelaide,
Adelaide, SA 5005, Australia
e-mail: mrbonyadi@cs.adelaide.edu.au; vardiar@gmail.com
Z. Michalewicz
Institute of Computer Science, Polish Academy of Sciences, ul. Ordona 21,
01-237 Warsaw, Poland
e-mail: zbyszek@cs.adelaide.edu.au
Z. Michalewicz
Polish-Japanese Institute of Information Technology, ul. Koszykowa 86,
02 008 Warsaw, Poland
Z. Michalewicz
Chief of Science at Complexica (www.complexica.com), Adelaide, Australia

with nonoverlapping topology with small swarm size in each sub-swarm performs
better in terms of locating different feasible regions in comparison to other topologies,
such as the global best topology and the ring topology.
Keywords Constrained optimization · Feasible regions · Disjoint feasible regions · Particle swarm optimization

8.1 Introduction
A constrained optimization problem (COP) is formulated as follows:

\[
\text{find } \mathbf{z} \in S \subseteq \mathbb{R}^D \text{ such that }
\begin{cases}
\forall \mathbf{y} \in S:\; f(\mathbf{z}) \le f(\mathbf{y}) & (a)\\
g_i(\mathbf{z}) \le 0, \text{ for } i = 1 \text{ to } q & (b)\\
h_j(\mathbf{z}) = 0, \text{ for } j = 1 \text{ to } m & (c)
\end{cases} \qquad (8.1)
\]

In this formulation, f, g_i, and h_j are real-valued functions on the search space S (i.e., mappings from S to R), q is the number of inequalities, and m is the number of equalities. The search space S is defined as a D-dimensional rectangle in R^D such that l_j ≤ z_j ≤ u_j, j = 1, ..., D (l_j and u_j are the lower and upper bounds of the jth variable). The set of all feasible points which satisfy constraints (b) and (c) is denoted by F (Michalewicz and Schoenauer 1996). We consider the single objective case in this chapter.
Usually in a COP, the equalities are replaced by the following inequalities (Takahama and Sakai 2010):

\[
|h_j(\mathbf{x})| - \varepsilon \le 0, \quad \text{for } j = 1 \text{ to } m \qquad (8.2)
\]

where ε is a small positive value. In all experiments in this chapter, we set ε = 10⁻⁴, the same as in other studies (Liang et al. 2010; Takahama and Sakai 2010). Accordingly, Eq. 8.1 is rewritten as

\[
\text{find } \mathbf{z} \in S \subseteq \mathbb{R}^D \text{ such that }
\begin{cases}
\forall \mathbf{y} \in S:\; f(\mathbf{z}) \le f(\mathbf{y}) & (a)\\
g_i(\mathbf{z}) \le 0, \text{ for } i = 1 \text{ to } m + q & (b)
\end{cases} \qquad (8.3)
\]

where g_{q+j}(x) = |h_j(x)| − ε for 1 ≤ j ≤ m. In this chapter, we refer to Eq. 8.3 whenever we use the term COP.
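As a small illustration of the reformulation in Eqs. (8.2) and (8.3), the following Python sketch converts a COP with equality constraints into the all-inequality form; the helper name and the toy problem are hypothetical:

```python
import numpy as np

EPSILON = 1e-4  # tolerance used to relax equality constraints (Eq. 8.2)

def make_inequalities(ineqs, eqs, eps=EPSILON):
    """Convert a COP with inequality constraints g_i(z) <= 0 and equality
    constraints h_j(z) = 0 into the all-inequality form of Eq. (8.3).
    `ineqs` and `eqs` are lists of callables; the result is a single list
    of callables g(z) <= 0."""
    relaxed = [(lambda z, h=h: abs(h(z)) - eps) for h in eqs]
    return list(ineqs) + relaxed

# hypothetical toy problem: minimize f(z) = z1 + z2 subject to
# g(z) = z1^2 + z2^2 - 1 <= 0 and h(z) = z1 - z2 = 0
g_all = make_inequalities([lambda z: z[0]**2 + z[1]**2 - 1.0],
                          [lambda z: z[0] - z[1]])
print([g(np.array([0.5, 0.5])) for g in g_all])   # both values <= 0 -> feasible
```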
Each optimization method which deals with COPs generally consists of two main
parts: an optimization algorithm and a constraint handling technique (CHT). The optimization algorithm can be any optimization algorithm such as the particle swarm optimization (PSO) (Kennedy and Eberhart 1995), the genetic algorithm (GA) (Goldberg
1989), the covariance matrix adaptation evolutionary strategy (CMA-ES) (Hansen
2006), the gradient descent algorithms (Gilbert and Nocedal 1992), the conjugate
gradient algorithms (Gilbert and Nocedal 1992), or the linear programming (Dantzig

Fig. 8.1 An example of a search space. The gray regions are the feasible regions; the labelled points a, b, c, and d are candidate solutions discussed in the text, and the marked point is the global optimum solution

1998), among others. The task of the optimization algorithm is to generate new solutions at every iteration. In each optimization algorithm, an operator is needed to
compare candidate solutions thus enabling the optimizer to select one (or more) of
the solutions.1 This comparison operator plays a key role in the performance of the
algorithm in finding better solutions. In unconstrained problems, this comparison
operator is simple, and, for a minimization problem, it is implemented as
x ∈ S is better than y ∈ S  iff  f(x) < f(y)    (8.4)

where f(·) : R^D → R is the objective function and x and y are two samples from the search space. However, in COPs, in addition to the objective function, there are constraints that need to be considered in the comparison procedure. There are three cases for comparing two solutions x and y in a COP:
1. x ∈ F and y ∈ F, i.e., both are feasible
2. x ∉ F and y ∉ F, i.e., both are infeasible
3. x ∉ F and y ∈ F, i.e., one is feasible and the other is infeasible.
If the solutions follow the case (1) then the comparison is easy because it is
made in the same way as in Eq. 8.4 (both solutions are feasible). In cases (2) and
(3) however, this comparison is more complicated. Figure 8.1 provides examples to
show the reason behind the complications within cases (2) and (3).
In Fig. 8.1, both solutions a and b are infeasible. Also, assume that all constraint
values for solution a are smaller than the constraint values for solution b (i.e. gj (a) <
gj (b) for all j). However, solution b is much closer to the optimal solution than
solution a (d is the optimal solution). Thus, if solution b is selected, there is a greater
chance for the algorithm to improve the solution thereby reaching the optimal solution
in the next steps. Clearly, choosing one of a or b is not an easy task because solution
a is better than b in terms of one aspect (the value of constraints), while solution
b is better than a in terms of another aspect (closeness to the optimal solution).
1

Note that this selection can be performed by a direct decision (the better solution is selected) or
by some analysis to find out the potential of the solutions. However, in either approach, the concept
of being better needs to be defined.


Also, choosing one of the solutions in case (3) is complicated. As an example, let
us concentrate on solutions b (an infeasible solution) and c (a feasible solution) in
Fig. 8.1. If solution c is selected, it is harder for the optimization algorithm to move
the solutions in the next steps toward the optimal solution, i.e., d. However, if solution
b is selected, although it is infeasible it is easier for the optimization algorithm to
move the solutions in the next steps toward the optimal solution. Clearly, the easiest
case is case (1) as the standard comparison between solutions can be used. However,
there are complications in regard to cases (2) and (3).
The aim of a CHT is to compare two solutions and decide which solution is the
better. Note that such a comparison needs to consider all the three aforementioned
cases. There are several categories of techniques for handling constraints that can
be incorporated in an optimization algorithm (Michalewicz and Schoenauer 1996);
these categories include penalty functions, special operators, repairs, decoders, and
hybrid techniques. In the category of penalty functions, the objective function is
combined with constraints in such a way that the problem is turned into an unconstrained problem. Thus, all solutions are feasible and, hence, comparisons follow
case (1) thereby making the comparison easy. In the category of special operators,
an operator is designed that always maps a feasible solution to a feasible solution.
Note that to use a technique in this category, the initial solutions need to be feasible. Because the solutions are always feasible all comparisons follow case (1), and
hence, comparison is done easily. In the category of repair, each infeasible solution
is repaired and a feasible solution is generated. In this case, two possibilities can be
considered: the original solution is kept in the population and is known as Baldwinian
evolution (Whitley et al. 1994), or it is replaced by the repaired solution known as
Lamarckian evolution (Whitley et al. 1994). In this category, because the solutions
are always feasible (repaired), again all comparisons follow case (1), thereby making the comparisons easier. In the category of decoder-based techniques, mapping
from genotype to phenotype is established such that any genotype is mapped into
a feasible phenotype. In this category, as with the previous categories, all solutions
are feasible, thus making it unnecessary to consider cases (2) and (3). Finally, the
last category, hybrid, includes all possible combinations of CHTs. It seems that all
CHTs try to apply some modification to the solutions (e.g., via repairing, applying
penalty) to get rid of the complications in comparison within cases (2) and (3).
There have been some attempts to design methods to explore the search space
of COPs to find a feasible solution; these methods are called constraint satisfaction methods (Tsang 1993). The acceptance criterion for a constraint satisfaction method is finding at least one feasible solution. Normally, this feasible solution, found by
the constraint satisfaction method, is fed into an optimization method as an initial
solution, and the method improves the quality of this solution in terms of objective
value while maintaining feasibility. As feasible regions in COPs might have irregular shapes (e.g., disjoint, with holes, connected with narrow passages, non-convex),
the quality of the final solution, namely the improved solution by the optimization
method, is highly dependent on the location of the initial feasible solution. Figure 8.2
shows some examples of irregular shapes of feasible regions.


Fig. 8.2 A sample search space with several irregularly shaped feasible regions. The dark grey and light grey regions are the feasible regions and the search space, respectively; a narrow feasible passage connects two of the feasible regions

If the optimization method is initialized with solutions in region A, as in Fig. 8.2,


it might be difficult for the method to explore the solutions in the feasible region
B. The reason is that regions A and B are disjoint and usually, infeasible solutions
between A and B are considered to be of lower quality than the solutions within
A or B. Hence, as optimization methods normally tend to move solutions closer
to good known solutions, i.e., they are attracted by higher quality solutions, it is
very unlikely that they are successful in moving a solution in region A to region B.
Also, note that A and B might be far from each other and B can be a very small
region, which makes it harder to move a solution in region A to region B. In addition,
even though the regions C and D are connected, if the initial feasible solution is
located in region C, it is hard for the optimization method to move that solution to
region D. The reason for this is that the feasible passage between regions C and D
is very narrow. Hence, it is hard for the optimization method to find that passage to
move the solutions through it toward D. Thus, rather than locating only one feasible
solution, it is better to generate different feasible solutions that are in potentially
disjoint feasible regions.2 From now on, we use the term feasible regions rather than
potentially disjoint feasible regions. In this case, there is an increased probability of
locating feasible regions which contain high quality solutions, as well as of locating
the feasible region that contains the optimal solution, i.e., optimal region. However,
there have not been many attempts to design algorithms capable of locating feasible
regions.
Clearly, there are similarities between locating feasible regions in a COP and the
concept of niching in multi-modal optimization, i.e., locating different, ideally all,
local optima of an objective function (Brits et al. 2002). We use these similarities
to propose a method that is able to locate feasible regions in the search space. The
particle swarm optimization algorithm (PSO) is used in this chapter for optimization
purposes. Some issues related to the niching abilities of the PSO are investigated and
a new PSO (called mutation linear PSO, MLPSO) is proposed, which addresses these
2

The term potentially disjoint feasible regions refers to disjoint feasible regions and the regions
that are connected with narrow passages. Also, note that without information about the topology of
the search space, it is not possible to claim that the found solutions are in disjoint feasible regions.


issues. Then, the MLPSO is extended in such a way that it can locate feasible regions
in a COP. To confirm that the proposed method performs effectively in locating
feasible regions, the performance of the method is tested through some test cases
where the locations of their feasible regions are known.
The rest of the chapter is organized as follows: some background on COPs and
CHTs are provided in Sect. 8.2. An overview of the PSO algorithm including variants,
issues, topologies, and niching abilities, is given in Sect. 8.3. The method for locating feasible regions is proposed and discussed in Sect. 8.4 and is tested in Sect. 8.5. Finally, we conclude the chapter and provide suggestions for future research directions in Sect. 8.6.

8.2 Background on COP


In this section, a brief background is provided on COPs, including the CHT used in
this chapter, and locating feasible regions.

8.2.1 Epsilon-Level Constraint Handling (ELCH)


In this subsection, a CHT which has been used in our proposed method is described.
It is called epsilon-level constraint handling (ELCH) (Takahama and Sakai 2010)
which belongs to the penalty functions category. In this technique, the constraint
violation value for solution x is defined as follows:

\[
G(\mathbf{x}) = \sum_{i=1}^{q} \max\{0, g_i(\mathbf{x})\} + \sum_{i=1}^{m} |h_i(\mathbf{x})|^k \qquad (8.5)
\]

where k is a constant (in all of the experiments presented in this chapter, k = 2). Each solution x is represented by the pair (f, G), where f is the objective value at x and G is its constraint violation value. If f_1 and f_2 are the objective values and G_1 and G_2 are the constraint violation values of the solution points x_1 and x_2, then the ε-level comparison operator is defined as follows:

\[
\mathbf{x}_1 \prec_{\varepsilon} \mathbf{x}_2 \iff
\begin{cases}
f_1 < f_2 & \text{if } G_1, G_2 \le \varepsilon \text{ or } G_1 = G_2\\
G_1 < G_2 & \text{otherwise}
\end{cases} \qquad (8.6)
\]

In other words, the ε-level comparison compares two solutions by their constraint violation values first. If both solutions have a violation value under a small threshold ε, or they have the same level of violation, the two solutions are ranked by the objective function value only. Otherwise, the constraint violation value is taken into account. There are some techniques to control the value of ε (Takahama and Sakai 2005).
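A minimal Python sketch of the two ELCH ingredients, the violation measure of Eq. (8.5) and the ε-level comparison of Eq. (8.6), is given below; the function names are ours and the snippet only illustrates the comparison logic:

```python
def violation(x, ineqs, eqs, k=2):
    """Constraint violation G(x) in the spirit of Eq. (8.5)."""
    g = sum(max(0.0, gi(x)) for gi in ineqs)
    h = sum(abs(hj(x)) ** k for hj in eqs)
    return g + h

def eps_level_better(sol1, sol2, eps):
    """Epsilon-level comparison of Eq. (8.6). Each solution is a pair
    (f, G); returns True if sol1 is preferred over sol2."""
    f1, G1 = sol1
    f2, G2 = sol2
    if (G1 <= eps and G2 <= eps) or G1 == G2:
        return f1 < f2          # both (almost) feasible: compare objectives
    return G1 < G2              # otherwise: compare constraint violations

# hypothetical usage: the feasible solution wins despite a worse objective value
print(eps_level_better((3.0, 0.0), (2.5, 0.3), eps=1e-4))  # True
```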


8.2.2 Locating Feasible Regions in COPs


There have not been many attempts so far to design algorithms that locate feasible
regions. However, designing algorithms for locating feasible regions (ideally all of
them) in COPs is valuable as it reduces the probability of locating feasible regions
with poor quality solutions, in terms of objective value. Several multi-start methods
(e.g. Bonyadi et al. 2013; Jabr 2012; Lasdon and Plummer 2008; Smith et al. 2013)
have been proposed to locate feasible regions in COPs. Normally, these methods start
with a set of random points and improve them to find a feasible point. As an example,
in Lasdon and Plummer (2008), a multi-start nonlinear programming (MSNLP) was
proposed. In this method a set of random points is generated within the search space.
Then, the points are filtered according to two filters, a merit filter and a distance
filter. The merit filter ensures that the quality of the points in terms of constraint and
objective values is higher than a predefined threshold. The point that does not meet
this level of quality is filtered. The distance filter ensures that the generated points
are sufficiently diverse. In fact, a hyper-sphere neighbor of the points is evaluated to
find if two points are close to each other. Accordingly, some of the points are filtered.
An algorithm based on Constraint Consensus (CC) was proposed to identify areas
that may contain a feasible region (Smith et al. 2013). In this method, a certain number
of points are generated randomly in the search space. Then for each point, by using
the gradient of the violated constraints, a vector is generated which moves that point
to a new location. It is expected that the new location is closer to one of the feasible
regions. After moving all points, a clustering method is used to group the points
based on their distances from each other. At the end, the best point in each cluster, in
terms of its objective value if the point is feasible or in terms of its constraint violation
value if the point is not feasible, is selected as the representative of a feasible region.
A multi-start genetic algorithm with a local search was proposed to locate feasible
regions in the search space (Jabr 2012). In this method, a GA was run to generate
solutions which are in a predefined threshold of constraint violation value, defined by
the weighted sum of the value of all constraint corresponding to each solution. The
results from GA were then improved by a local search method in terms of objective
value. With the aim of generating different feasible solutions, GA method was run
several times, each time with a new seed, crossover and mutation rate.
A multi-start PSO was proposed by the authors of this chapter (Bonyadi et al.
2013). In that paper, a PSO was proposed that used ELCH to handle the constraints.
Also, a method based on the covariance matrix adaptation evolutionary strategy
(CMA-ES) was proposed, which used the same technique to handle the constraints.
Experiments showed that PSO has better performance in finding feasible solutions
while CMA-ES performs better in optimizing the objective value. Thus, a hybrid
method was proposed which runs PSO to find the first feasible solution and then that
solution was improved by CMA-ES. To prevent PSO from finding a poor-quality
feasible region, a multi-start strategy was proposed in which several instances of
PSO were run to generate different feasible solutions. Then the best among those
solutions were fed into CMA-ES for further improvement.


8.3 Background on PSO


In this section, some background on PSO including variants, known issues, different
topologies, niching abilities, and abilities in dealing with COPs is given.

8.3.1 Standard Variant of Particle Swarm Optimizer


The Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995) algorithm is
a population-based algorithm, referred to as swarm, of n > 1 particles; each particle
is defined by three D-dimensional vectors:
• Position (x_t^i): the position of the ith particle in the tth iteration. This is used to evaluate the particle's quality.
• Velocity (v_t^i): the direction and length of movement of the ith particle in the tth iteration.
• Personal best (p_t^i): the best position³ that the ith particle has visited in its lifetime (up to the tth iteration). This vector serves as a memory for keeping knowledge of quality solutions (Kennedy and Eberhart 1995).
³ In general, the personal best can be a set of best positions, but all PSO types listed in this chapter use a single personal best.
All of these vectors are updated at every iteration t for each particle i:

\[
v_{t+1}^i = \Phi\big(x_t^i, v_t^i, N_t^i\big), \quad \text{for } i = 1, \ldots, n \qquad (8.7)
\]

\[
x_{t+1}^i = \Psi\big(x_t^i, v_{t+1}^i\big), \quad \text{for } i = 1, \ldots, n \qquad (8.8)
\]

\[
p_{t+1}^i =
\begin{cases}
p_t^i & \text{if } f(p_t^i) \le f(x_{t+1}^i)\\
x_{t+1}^i & \text{otherwise}
\end{cases}, \quad \text{for } i = 1, \ldots, n \qquad (8.9)
\]

In Eq. 8.7, N_t^i (known as the neighbor set of the particle i) is a subset of the personal best positions of particles which contribute to the velocity updating rule of that particle at iteration t, i.e., N_t^i = {p_t^k : k ∈ T_t^i ⊆ {1, 2, ..., n}}, where T_t^i is the set of indices of particles which contribute to the velocity update of particle i at iteration t. Clearly, the strategy for determining T_t^i might be different for various types of PSO algorithms, and it is usually referred to as the topology of the swarm. Many different topologies have been defined so far (Kennedy and Mendes 2002), e.g., the global best topology (gbest), the ring topology, the nonoverlapping topology, and the pyramid topology, which are discussed later in this chapter. The function Φ(·) calculates the new velocity vector for the particle i according to its current position, current velocity v_t^i, and neighborhood set N_t^i. In Eq. 8.8, Ψ(·) is a function that calculates


the new position of the particle i according to its previous position and its new velocity. Usually Ψ(x_t^i, v_{t+1}^i) = x_t^i + v_{t+1}^i is accepted for updating the position of particle i. In Eq. 8.9, the new personal best position for the ith particle is updated according to the objective values of its previous personal best position and the current position. In the rest of this chapter, these usual forms of the position updating rule (Eq. 8.8) and of the personal best update (Eq. 8.9) are assumed. In PSO, the three updating rules (Eqs. 8.7, 8.8, and 8.9) are applied to all particles iteratively until a predefined termination criterion, e.g., the maximum number of iterations, is met.
In the original version of PSO (Kennedy and Eberhart 1995), the function Φ(·) in Eq. 8.7 is defined as

\[
v_{t+1}^i = v_t^i + \varphi_1 R_{1t}^i\big(p_t^i - x_t^i\big) + \varphi_2 R_{2t}^i\big(g_t - x_t^i\big) \qquad (8.10)
\]

In this equation, φ_1 and φ_2 are two real numbers called acceleration coefficients,⁴ and p_t^i and g_t are the personal best (of particle i) and the global best vector, respectively, at iteration t. Also, the role of the vectors PI = p_t^i − x_t^i (Personal Influence) and SI = g_t − x_t^i (Social Influence) is to attract the particles to move toward known quality solutions, i.e., the personal and global best. Moreover, R_{1t} and R_{2t} are two D × D diagonal matrices,⁵ whose elements are random numbers distributed uniformly (U(0, 1)) in [0, 1]. Note that the matrices R_{1t} and R_{2t} are generated at each iteration for each particle separately.
In 1998, Shi and Eberhart (1998) introduced a new coefficient ω, known as the inertia weight, to control the influence of the last velocity value on the updated velocity. Indeed, Eq. 8.10 was rewritten as

\[
v_{t+1}^i = \omega v_t^i + \varphi_1 R_{1t}^i\big(p_t^i - x_t^i\big) + \varphi_2 R_{2t}^i\big(g_t - x_t^i\big) \qquad (8.11)
\]

The coefficient ω controls the influence of the previous velocity on the movement. The iterative application of Eq. 8.11 (plus position updating) causes the particles to oscillate around the personal and global best vectors (Clerc and Kennedy 2002). This oscillation is controlled by the three parameters ω, φ_1, and φ_2, so that the larger ω is, with respect to φ_1 and φ_2, the more explorative the particles are, and vice versa. In this chapter, this variant is known as the standard PSO. In the standard PSO, if the random matrices are replaced by random values, the new variant is called the linear PSO (LPSO).
⁴ These two coefficients control the effect of the personal and global best vectors on the movement of particles and they play an important role in the convergence of the algorithm. They are usually determined by a practitioner or by the dynamics of particle movement.
⁵ Alternatively, these two random matrices are often considered as two random vectors. In this case, the multiplication of these random vectors by PI and SI is element-wise.
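The following Python sketch illustrates the velocity update of Eq. (8.11) and the difference between the standard PSO (per-dimension random factors, i.e., diagonal random matrices) and LPSO (a single random value per influence term); the function name and parameter values are common defaults, not settings prescribed in this chapter:

```python
import numpy as np

def velocity_update(x, v, p_best, g_best, omega=0.7298, phi1=1.496, phi2=1.496,
                    linear=False, rng=np.random.default_rng()):
    """Velocity update of Eq. (8.11). With linear=False each dimension gets
    its own uniform random factor (standard PSO); with linear=True a single
    scalar per influence term is used (LPSO)."""
    D = x.shape[0]
    if linear:
        r1, r2 = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    else:
        r1, r2 = rng.uniform(0.0, 1.0, D), rng.uniform(0.0, 1.0, D)
    return omega * v + phi1 * r1 * (p_best - x) + phi2 * r2 * (g_best - x)

# position update of Eq. (8.8): x_{t+1} = x_t + v_{t+1}
x, v = np.zeros(3), np.zeros(3)
v = velocity_update(x, v, p_best=np.ones(3), g_best=np.full(3, 2.0))
x = x + v
```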
There are several well-studied issues in the standard PSO, such as stagnation (Bergh and Engelbrecht 2002, 2010), line search (Spears et al. 2010; Wilke et al. 2007a), swarm size (Bergh and Engelbrecht 2002, 2010), local convergence (Bergh


and Engelbrecht 2010), and rotation variance (Spears et al. 2010; Wilke et al. 2007b). Apart from these issues within PSO, there have been some attempts to extend the algorithm to work with COPs (Liang et al. 2010; Paquet and Engelbrecht 2007; Takahama and Sakai 2005), to support niching⁶ (Brits et al. 2002, 2007; Engelbrecht et al. 2005; Li 2010), to work effectively with large-scale problems (Helwig and Wanka 2007), and to work in nonstationary environments (Wang and Yang 2010).
⁶ Niching is the ability of the algorithm to locate several different optima rather than only one local optimum. The niching concept is usually used in multi-modal optimization.

8.3.2 Issues in PSO


One of the issues in the standard PSO was as follows: if the acceleration coefficients
and inertia weight in the algorithm are set to inappropriate values, the velocity vector
might grow to infinity; or, in other words, there might be a swarm explosion. A swarm
explosion results in moving particles to infinity, which is not desirable (Clerc and
Kennedy 2002). One of the early solutions for this issue was to restrict the value of
each dimension of the velocity in a particular interval [Vmax , Vmax ], where Vmax
can be considered as the maximum value of the lower bound and upper bound of
the search space (Helwig and Wanka 2007); this is known as the nearest strategy.
Also, there are some other strategies to restrict the velocity in such a way that the
swarm explosion is prevented, e.g., the nearest with turbulence, random. However,
none of these strategies is comprehensive enough to prevent the swarm explosion
effectively in all situations (see Helwig and Wanka (2007) for details). Thus, many
researchers theoretically analyzed the behavior of the particles to find the reasons
behind the swarm explosion from different points of view (Clerc and Kennedy 2002;
Trelea 2003; Bergh and Engelbrecht 2006). The aim of these analyses was to define
criteria for the acceleration coefficients such that particles converge to a point in the
search space. One of the earliest attempts of this sort was made in Clerc and Kennedy
(2002) where a constriction coefficient PSO (CCPSO) was proposed. The authors
revised the velocity updating rule to

\[
v_{t+1}^i = \chi\left[v_t^i + c_1 R_{1t}\big(p_t^i - x_t^i\big) + c_2 R_{2t}\big(g_t - x_t^i\big)\right] \qquad (8.12)
\]

\[
\chi = \frac{2k}{\left|2 - c - \sqrt{c^2 - 4c}\right|} \qquad (8.13)
\]

where χ is called the constriction factor and it is proposed to set its value by Eq. 8.13, with c = c_1 + c_2 > 4. Note that this notation is algebraically equivalent to that in Eq. 8.11. The authors proved that if these conditions hold for the constriction factor, particles converge to a stable point and the velocity vector does not grow to infinity. The values of c_1 and c_2 are often set to 2.05 and the value of k is in the interval


[0, 1] (usually set to 1). Note that with these settings, the value of χ is in the interval [0, 1]. This analysis was also carried out from other perspectives by Trelea (2003) and Bergh and Engelbrecht (2006).
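For reference, Eq. (8.13) can be evaluated directly; with the usual setting c_1 = c_2 = 2.05 and k = 1 it gives χ ≈ 0.7298. A short Python check (our own illustration):

```python
from math import sqrt

def constriction_factor(c1=2.05, c2=2.05, k=1.0):
    """Constriction factor of Eq. (8.13) (Clerc and Kennedy 2002)."""
    c = c1 + c2
    assert c > 4.0, "Eq. (8.13) requires c = c1 + c2 > 4"
    return 2.0 * k / abs(2.0 - c - sqrt(c * c - 4.0 * c))

print(constriction_factor())   # approximately 0.7298 for the usual setting
```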
Although the constriction coefficient guarantees convergence of the particles to a point (a convergent sequence), there is no guarantee that this final point is a quality point in the search space (Bergh and Engelbrecht 2006). In Bergh and Engelbrecht (2010), it has been proven that for any c_1 and c_2 that satisfy the convergence conditions, all particles collapse to the global best g_t, i.e., lim_{t→∞} x_t^i = p_t^i = g_t for all particles. Also, if g_t = p_t^i = x_t^i for all particles, the velocity vector shrinks very fast. In this situation, i.e., g_t = p_t^i = x_t^i for all particles and at the same time v_t^i = 0, all particles stop moving and no improvement can take place, as all components for moving the particles are zero. This issue is known as stagnation, and was first introduced as a defect in the standard PSO (Bergh and Engelbrecht 2002) and further investigated by Bergh and Engelbrecht (2010). This issue exists in both LPSO and CCPSO. A variant of PSO was proposed (called Guaranteed Converging PSO, GCPSO) which addressed the stagnation issue. The only difference between GCPSO and CCPSO was in updating the velocity of the global best particle (the particle whose personal best is the current global best of the swarm):

\[
v_{t+1}^i =
\begin{cases}
-x_t^i + g_t + \chi v_t^i + \rho_t & \text{if } i = \tau_t\\
\chi\left[v_t^i + c_1 R_{1t}^i\big(p_t^i - x_t^i\big) + c_2 R_{2t}^i\big(g_t - x_t^i\big)\right] & \text{otherwise}
\end{cases} \qquad (8.14)
\]

where τ_t is the index of the particle whose personal best is the global best of the swarm, i.e., g_t = p_t^{τ_t}, and ρ_t is a random perturbation generated through an adaptive approach (Bergh and Engelbrecht 2010). Note that, according to this formulation, stagnation might still happen for all particles except for the global best particle. Hence, if the global best particle is improved, g_t is improved, which causes the other particles to get out of the stagnation situation. See Bonyadi and Michalewicz (2014) for more information.

Another issue that is exclusive to LPSO is called line search (Wilke et al. 2007a): if (p_t^i − x_t^i) ∥ (g_t − x_t^i) and v_t^i ∥ (p_t^i − x_t^i), the particle i starts oscillating between its personal best and the global best (line search) forever. In this case, only the solutions that are on this line are sampled by the particle i and other locations in the search space are not examined anymore. Wilke showed that this is not the case in the standard PSO (Wilke et al. 2007a); however, there are some situations where the particles in the standard PSO start oscillating along one of the dimensions with no chance of getting out of this situation (Bonyadi 2014; Spears et al. 2010; Bergh and Engelbrecht 2010). Note that GCPSO does not have this issue.
Stagnation happens with a higher probability when the swarm size is small (Bergh
and Engelbrecht 2002); this is called the swarm size issue throughout this chapter.
In Bergh and Engelbrecht (2002), the authors argued that PSO is not effective when
its swarm size is small (2 for example), and particles stop moving in the earlier
stage of the optimization process. To address this issue, a new velocity updating rule
was proposed that was only applied to the global best particle to prevent it from
becoming zero. Consequently, the global best particle never stops moving which


solves the stagnation issue and, as a result, the swarm size issue is addressed as well.
Experiments confirmed that, especially on unimodal optimization problems, the
new algorithm is significantly better than the standard version when the swarm size
is small (with 2 particles). Note that, in LPSO, apart from the stagnation issue, the
line search issue is another reason why the algorithm becomes ineffective when the swarm
size is small.

8.3.3 Topology in PSO


There are many different topologies that have been introduced so far for PSO
(Kennedy and Mendes 2002). One of the well-known topologies is called gbest
topology. In this topology, the set Tti contains all particles in the swarm, i.e.,
T_t^i = {1, 2, ..., n}. As an example, the standard PSO uses this topology: in each
iteration, g_t is used in the velocity updating rule and g_t = p_t^{τ_t}, where
τ_t = argmin_{l ∈ T_t^i} F(p_t^l). It has been shown that when this topology is used, the
algorithm converges rapidly to a point (Kennedy and Mendes 2002). The reason
behind this rapid convergence is that all particles are connected to each other
(a particle i is connected to particle j if it is aware of the personal best location of
particle j), and hence, they all tend to converge to the best solution found so far.
Another well-known topology is called the ring topology, where the set T_t^i contains
{i, i − 1, i + 1} (it is assumed that the particles are kept in a fixed order during the run).
In fact, each particle is connected to two other particles, the previous and the
next one. Also, if i + 1 is larger than n (the swarm size), it is replaced by 1, and if
i − 1 < 1, it is replaced by n. The velocity updating rule for this topology is written
as

v_{t+1}^i = \chi \left( v_t^i + c_1 R_{1t} \left(p_t^i - x_t^i\right) + c_2 R_{2t} \left(lb_t^i - x_t^i\right) \right) \qquad (8.15)

where lb_t^i is the best solution found so far by particles i, i − 1, and i + 1, i.e.,
lb_t^i = p_t^{τ_t^i} where τ_t^i = argmin_{l ∈ T_t^i} F(p_t^l). It has been shown that if the algorithm
uses the ring topology, it requires more iterations to converge in comparison to the
gbest topology, thereby resulting in better explorative behavior.
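A short Python sketch of the ring-neighborhood best lb_t^i and the velocity rule of Eq. 8.15 follows; the function names are ours, and the constriction parameters are typical assumed values rather than settings prescribed by the chapter.

```python
import numpy as np

def ring_lbest(pbest, pbest_fit):
    """Neighborhood best for the ring topology: particle i sees i-1, i, i+1 with wraparound."""
    n = len(pbest_fit)
    lbest = np.empty_like(pbest)
    for i in range(n):
        neigh = [(i - 1) % n, i, (i + 1) % n]          # T_t^i together with i itself
        best = min(neigh, key=lambda l: pbest_fit[l])  # argmin of F(p_t^l)
        lbest[i] = pbest[best]
    return lbest

def ring_velocity(x, v, pbest, lbest, chi=0.7298, c1=2.05, c2=2.05, rng=None):
    """Velocity update of Eq. 8.15 with the ring neighborhood best in place of g_t."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (lbest - x))
```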
Another topology that is used in this chapter is called the nonoverlapping topology.
In this topology, the particles in the swarm are divided into several sets (called
sub-swarms) that are independent of each other. In fact, if we define the set
s_t^i = {i} ∪ T_t^i, then in any nonoverlapping topology there exists at least one particle i
such that, for all j ∈ {1, 2, ..., n} \ s_t^i, the intersection of s_t^i and s_t^j is empty, i.e.,
∃ i ∈ {1, 2, ..., n} : ∀ j ∈ {1, 2, ..., n} \ s_t^i, s_t^i ∩ s_t^j = ∅. Note that, in this case,
the gbest topology is a special case of the nonoverlapping topology because, for all i, the
set {1, 2, ..., n} \ s_t^i is empty and, consequently, the requirement s_t^i ∩ s_t^j = ∅ holds
for any j ∈ {1, 2, ..., n} \ s_t^i. If the size of T_t^i is the same for all i,
we denote the topology by the notation nvl, where l is the size of each sub-swarm.
Thus, the gbest topology can be indicated by nvn.
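To make the nvl notation concrete, the following sketch partitions a swarm into nonoverlapping sub-swarms of size l; the choice of consecutive index blocks is our own illustration and is not prescribed by the chapter.

```python
def nonoverlapping_subswarms(n, l):
    """Partition particle indices {0, ..., n-1} into disjoint sub-swarms of size l (nvl).

    Each particle only shares personal-best information inside its own block, so the
    sets s^i = {i} U T^i of particles in different blocks never intersect.
    """
    assert n % l == 0, "swarm size must be divisible by the sub-swarm size"
    return [list(range(start, start + l)) for start in range(0, n, l)]

# Example: 12 particles with the nv3 topology -> 4 independent sub-swarms.
print(nonoverlapping_subswarms(12, 3))   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
```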
There are other topologies (e.g., pyramid) and it is hard to review all of them.
Our review has been limited to the topologies that are used in the rest of the chapter.
For further information about topologies, the readers are referred to Kennedy and
Mendes (2002).

8.3.4 Niching in PSO


Niching is a concept that has been introduced in multi-modal optimization. Niching
in multi-modal optimization refers to locating several (ideally all) optima (including
local and global optima) of a function. An optimization algorithm is said to support
niching if it is able to locate different optima in the search space rather than finding
only one (Li 2010). There have been many attempts to adopt the PSO to support
niching (Brits et al. 2007; Engelbrecht et al. 2005; Li 2010). As an example, in
Engelbrecht et al. (2005), the authors analyzed the performance of the PSO when the
gbest or ring topology is taken into account. In the gbest topology, results showed
that only one optimum is located in each run of the algorithm. This was expected
as all particles converge to g_t (the convergent sequence, see Sect. 8.3.2), which is not
desirable for niching. In addition, the capabilities of the ring topology were investigated
experimentally so as to understand whether the ring topology can satisfy niching aims.
Experiments with some standard functions led the authors to conclude that ring
topology is not an appropriate candidate for niching as well.
A multi-swarm approach called NichePSO (Brits et al. 2007) was proposed in
which multiple sub-swarms were run to locate different local optima. Sub-swarms
could merge or exchange particles with one another. Also, in the NichePSO, whenever
the improvement in a particle's fitness over some number of iterations (a parameter)
was small, a sub-swarm was created within that particle's neighborhood to assist that
particle in improving the solution.


Ring topology in CCPSO was further investigated to find if it is effective for


niching (Li 2010). The author found that a CCPSO algorithm which uses the ring
topology can operate as a niching algorithm because of the particles' personal bests.
In fact, the personal best of each particle forms a stable network retaining the best
positions found so far, while these particles explore the search space more broadly
by changing their position. Also, it was concluded that by using a reasonably large
population, a CCPSO algorithm which uses the ring topology is able to locate dominant
niches (optima) across the search space. This means that particles locate niches that
are fairly similar in terms of their objective value. However, if the aim of the algorithm
is to locate the local optima that are less dominant, a nonoverlapping topology is a
good candidate. Results showed that a nonoverlapping topology with 2 or 3 particles
(i.e., nv2 or nv3) in each sub-swarm is significantly better than other topologies when
the number of dimensions is small (up to 8 dimensions). Although the performance
of these topologies is good with a small number of dimensions, their performance
was impaired much faster than other topologies in locating optima as the number of
dimensions grew. In fact, based on experiments, nv2 and nv3 were the worst among
the tested methods when the number of dimensions was larger than 8.

8.4 Proposed Approach


In this chapter, a PSO method is proposed, which is able to locate feasible regions
in COPs. The niching concept in multimodal optimization is adopted for locating
feasible regions in COPs. The proposed approach has two main parts:
1. The issues of PSO with nonoverlapping topology in niching are investigated in
detail. A new PSO (called mutation linear PSO, MLPSO) is proposed, which
addresses the issues of the nonoverlapping topology in niching (see Sect. 8.3).
2. A new PSO based on MLPSO (called EMLPSO) is proposed, which can locate
feasible regions.

8.4.1 Locating Different Local Optima (Niching)


As discussed earlier, CCPSO with nonoverlapping topology with a small number
of particles in each sub-swarm is highly effective for niching purposes (locating
different optima in the search space), when the number of dimensions is small.
However, it rapidly becomes ineffective as the number of dimensions grows. On the
other hand, it has been shown that most PSO algorithms, including CCPSO, with
small population size are not effective for optimization, because of stagnation and
line search issues (recall that this issue was known as swarm size issue, see Sect. 8.3).
Thus, it is natural to claim that if the swarm size issue is addressed, the nonoverlapping
topology with small sub-swarms becomes effective for niching purposes even if the


number of dimensions grows. We propose a mutation operator which is applied


to the velocity updating rule of LPSO (the new algorithm is called MLPSO) and
can address stagnation and line search issues. As these two issues are the reasons
behind the swarm size issue, we expect that MLPSO does not suffer from the swarm
size issue. The ability of MLSPO with small swarm size is examined through some
experiments. These experiments confirm that MLPSO is more effective than other
types of PSO when the swarm size is small. Then, in order to confirm that MLSPO
is effective in niching using nonoverlapping topology in higher dimensions, we test
the algorithm with this topology and compare its results with CCPSO with the same
topology defined in Li (2010).8

8.4.1.1 Vector Mutation


Consider an arbitrary vector d that connects the center of the coordinates to the point
d in the D-dimensional space. The proposed mutation operator is as follows:

d' = m(d, c, ε) \qquad (8.16)

where d' is a vector that connects the center of the coordinates to the point d', m is the
mutation operator, and c and ε are two constants. Obviously, for every vector d, there
are two elements that the operator m should mutate: direction and magnitude. One
can consider two different ideas to design m: (1) it rotates d by a random rotation
matrix to perturb its direction and multiplies that by a random number to perturb its
magnitude, and (2) it adds a normally distributed vector to d, which mutates both the
length and the direction. In the first design (rotating and then mutating the magnitude),
we can write

d' = m(d) = α Φ d \qquad (8.17)

where Φ is a rotation matrix and α is a random scalar value. There are several ways
to design Φ, such as a Euclidean rotation equation (Ricardo and Pérez-Aguila 2004)
or an exponential map (Wilke et al. 2007b). However, both methods are of O(D²)
time complexity (see also Bonyadi (2014)).
in terms of time complexity (see also Bonyadi (2014)).
The second design of the operator m can be written as

d' = m(d) = d + N(0, σ) \qquad (8.18)

where N is the multivariate normal distribution and σ is the vector of variances.
The larger σ is, the more probable it is that d' is generated farther from d (see also
Bonyadi and Michalewicz (2014)). As this calculation only needs the addition of
two D-dimensional vectors, it is done in O(D) time. It is clear that
the second approach needs considerably less computation. Thus, we use this design
(Eq. 8.18) for the mutation operator m.

8 Note that the GCPSO is another variant of PSO (introduced in Sect. 8.3) that does not have the
swarm size issue. However, it is not a good choice for niching using the nonoverlapping topology.
The reason is that, in GCPSO, the only particle which is able to move after stagnation is the global
best particle. All other particles stay unchanged until this particle is improved. As the global best
particle is only in one of the sub-swarms (the sub-swarms do not overlap with each other), this
particle cannot share its information (personal best) with particles in the other sub-swarms. Thus,
all other sub-swarms stay in the stagnation situation and only one of the sub-swarms may continue
searching. This leads to ineffective niching behavior, as only one of the sub-swarms converges to a
local optimum.
In this chapter, the value of σ is calculated using the following equation:

\sigma_j = \begin{cases} c\, \| N(0, \bar{\varepsilon}) \| & \text{if } 0 \le \|d\| < \varepsilon \\ c\, \|d\| & \text{otherwise} \end{cases} \quad \text{for all } j \in \{1, \ldots, D\} \qquad (8.19)

where ||·|| is the norm operator, c is a constant, ε is a small real number, ε̄ is a
vector in which the value of all dimensions is equal to ε, and N is the normal distribution.
If the length of the vector d is small, a random vector (N(0, ε̄)) is generated and
used for the calculations instead. The mutation operator that uses Eqs. 8.18 and 8.19
is denoted by m(d, c, ε).
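A minimal Python sketch of this mutation operator follows. The name vector_mutation is ours, and for simplicity the scale σ is used here as the standard deviation of the perturbation, which is an implementation assumption rather than a detail stated in the chapter.

```python
import numpy as np

def vector_mutation(d, c, eps, rng=None):
    """Sketch of m(d, c, eps) built from Eqs. 8.18 and 8.19."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    D = d.shape[0]
    norm_d = np.linalg.norm(d)
    if norm_d < eps:
        # Near-zero vector: use the norm of a random vector N(0, eps) instead (Eq. 8.19).
        sigma = c * np.linalg.norm(rng.normal(0.0, eps, size=D))
    else:
        sigma = c * norm_d
    # Eq. 8.18: every dimension is perturbed with the same scale sigma,
    # so both the length and the direction of d are mutated.
    return d + rng.normal(0.0, sigma, size=D)
```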

8.4.1.2 Stagnation, Line Search, and Swarm Size Issues


In this subsection, we propose a new variant of the linear PSO, which addresses
stagnation and line search issues. Also, we experimentally show that the proposed
algorithm addresses the swarm size issue as well.
As discussed earlier, the appropriate setting of constriction coefficients guarantees
convergence of the particles to a solution in the search space, but not necessarily to
a quality solution. This results in stagnation in the algorithm, i.e., all particles stop
moving while the quality of the found solution is not satisfactory. In this chapter, it
is proposed to use the introduced vector mutation to guarantee that particles do not
stop moving (this variant is called the mutation linear PSO, MLPSO). In fact, the
velocity updating rule of LPSO is revised as follows:





 
v_{t+1}^i = m\!\left( \chi \left( v_t^i + c_1 r_{1t}^i \left(p_t^i - x_t^i\right) + c_2 r_{2t}^i \left(g_t - x_t^i\right) \right),\ c_t^i,\ \varepsilon_t^i \right) \qquad (8.20)

The parameters χ, c1, and c2 are exactly the same as the ones in CCPSO, while r_{1t}^i
and r_{2t}^i are two random values (scalars) rather than random matrices. Note that in this variant
of LPSO, we have used the CCPSO model (defined in Eq. 8.12); however, any other type
of PSO can be used instead. If the values of c_t^i and ε_t^i are guaranteed to be nonzero,
v_{t+1}^i is always nonzero (these parameters are investigated later in this subsection).
Thus, the stagnation issue is addressed, i.e., there is no stagnation anymore. Also, as
the mutation m changes the direction of v_{t+1}^i, the condition v_t^i || (p_t^i − x_t^i) is violated,
which implies that the line search issue does not exist in this variant of LPSO. We
propose an adaptive approach to set the value of c_t^i, which has been inspired by Bergh
and Engelbrecht (2002, 2010) with some modifications. In this adaptive approach,
the value of c_t^i for a particle i at time t is calculated by:

c_{t+1}^i = \begin{cases} 2 c_t^i & \text{if } s_t^i > \bar{s} \text{ and } c_t^i < c_{\max} \\ 0.5\, c_t^i & \text{if } f_{\min} < f_t^i < f_{\max} \text{ and } \|v_t^i\| < \varepsilon_t^i \\ 2 c_t^i & \text{if } f_t^i > f_{\max} \text{ and } c_t^i < c_{\max} \text{ and } \bmod(t, q) = 0 \\ c_t^i & \text{otherwise} \end{cases} \qquad (8.21)

where s_t^i (f_t^i) is the number of successive iterations up to the current iteration t in which the
personal best of particle i has been (has not been) improved by at least imp_min
percent; this value was set to 10^{-5} in all experiments. At each iteration, if the personal
best of particle i was improved, s_t^i is increased by one and f_t^i is set to 0, and if
it was not improved, f_t^i is increased by one and s_t^i is set to 0. If s_t^i is larger than
the constant s̄ (set to 10 in all experiments), the value of c_t^i is multiplied by 2. This
multiplication, which grows the value of c_t^i, takes place to give the algorithm the
opportunity to sample further locations and improve faster. Also, if f_t^i is larger
than f_min and smaller than f_max, the value of c_t^i is reduced to enable the algorithm
to conduct local search around current solutions and improve them. However, if the
value of f_t^i grows even larger than f_max, the strategy of controlling c_t^i is reversed and c_t^i
starts to grow. The idea behind this is that if the current solution has not been improved for
a large number of successive iterations, the exploitation has been done and no better
solutions can be found in the current region. Thus, it is better to start jumping out
from the current local optimum to improve the probability of finding better solutions.
According to Eq. 8.21, the value of c_t^i is increased at a low rate (every q iterations)
in this situation (when f_t^i is very large) to prevent the algorithm from jumping with
big steps. The values of c_max and c_min are set to 1 and 10^{-10}, respectively. Also, the
values of f_min and s̄ are set to 10 as proposed in Bergh and Engelbrecht (2010),
f_max and q are set to 200 and 50, and c_0^i is set to 1 for all particles. We propose to
set the value of ε_t^i to 1/D^z, where z is a constant real value. Our experiments show that
z = 1.5 has acceptable performance on a wide range of optimization problems. Thus,
we use ε_t^i = 1/D^{1.5} in all experiments.
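A minimal sketch of this success/failure based control of c_t^i is given below in Python. The helper class and its names are ours; the constants reproduce the values quoted above, and the lower bound c_min is applied on the halving step as an assumption about where it takes effect.

```python
class AdaptiveScale:
    """Sketch of the per-particle control of c_t^i (Eq. 8.21)."""

    def __init__(self, c0=1.0, c_max=1.0, c_min=1e-10,
                 s_bar=10, f_min=10, f_max=200, q=50, imp_min=1e-5):
        self.c, self.c_max, self.c_min = c0, c_max, c_min
        self.s_bar, self.f_min, self.f_max, self.q = s_bar, f_min, f_max, q
        self.imp_min = imp_min
        self.successes = 0   # s_t^i: consecutive iterations with improvement >= imp_min percent
        self.failures = 0    # f_t^i: consecutive iterations without such improvement

    def update(self, t, improved, v_norm, eps):
        # Track consecutive successes/failures of the personal best.
        if improved:
            self.successes, self.failures = self.successes + 1, 0
        else:
            self.successes, self.failures = 0, self.failures + 1

        if self.successes > self.s_bar and self.c < self.c_max:
            self.c *= 2.0                                # keep improving: take larger steps
        elif self.f_min < self.failures < self.f_max and v_norm < eps:
            self.c = max(0.5 * self.c, self.c_min)       # refine locally around the current solution
        elif self.failures > self.f_max and self.c < self.c_max and t % self.q == 0:
            self.c *= 2.0                                # long stagnation: slowly grow to jump out
        return self.c
```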
As was mentioned earlier, stagnation and line search are the main reasons behind
the swarm size issue in PSO. As the stagnation and line search issues have been
solved in MLPSO, it is very likely that the swarm size issue has been addressed. To
test if the swarm size issue has been solved, we apply MLPSO, LPSO, and CCPSO to
some standard benchmark functions (taken from CEC2005 (Suganthan et al. 2005))
when all three algorithms use 2 particles (n = 2). Table 8.1 shows the results.
Each algorithm was run 20 times for 1000D function evaluations (FE) for D = 10
and D = 30. The results have been compared based on the averages over 20 runs and
the Wilcoxon test (Wilcoxon 1945) (with a significance level of p = 0.05), which
is used to measure the significance of the differences. It is obvious from the table
that the proposed MLPSO has a significantly better performance in 8 cases out of all
10 in comparison with LPSO and CCPSO when the swarm size is small (n = 2) for
the 10-dimensional cases. Also, it is worse than CCPSO in only 2 cases, although the
worst performance is not significant based on the Wilcoxon test. Also, MLPSO was
significantly better than LPSO in all cases when D = 10. When D = 30, MLPSO is


Table 8.1 Comparison results between MLPSO and LPSO with small swarm size (n = 2)

            Dimension 10                               Dimension 30
Function    MLPSO          LPSO         CCPSO          MLPSO          LPSO         CCPSO
F1          450LC          30240.78     12259.54       450LC          136525       87020.26
F2          450LC          39143.76     14065.74       445.696LC      717133.8     139933.9
F3          362588.8LC     1.54E+09     1.52E+08       4347140LC      4.2E+09      1.86E+09
F4          59091.76L      47408.93     22805.53       622474.1LC     682395.7     205955.2
F5          6682.806LC     26006.44     17362.1        21284.27LC     59100.9      39995.19
F6          1492.037LC     2.91E+10     6.7E+09        2453.11LC      1.55E+11     1.01E+11
F7          172.326LC      1525.607     369.1611       179.919LC      5673.608     4008.997
F8          119.746LC      119.301      119.553        119.756LC      118.796      118.979
F9          244.852LC      186.876      233.761        9.17035LC      249.8911     134.7559
F10         167.377L       109.014      193.809        442.103LC      720.9524     465.4687

The best results have been shown in bold

significantly better than CCPSO and LPSO in all cases. These results confirm that
the proposed method works better than LPSO and CCPSO when the swarm size is
small.

8.4.1.3 Niching Ability of MLPSO


It has been shown that the nonoverlapping topology (in CCPSO) with 2 or 3 particles
in each sub-swarm shows good potential to locate different local optima (Li 2010).
However, it becomes very ineffective when the number of dimensions grows above 8.
We claim that the issue actually stemmed from the swarm size issue. As we have
addressed the swarm size issue in MLPSO, we expect to see that the algorithm with
the nonoverlapping topology with small number of particles in each sub-swarm is
more effective in locating different local optima. In the following experiment, we test
the ability of MLPSO to locate different local optima when it uses the nonoverlapping
topology with a small number of particles in each sub-swarm. We designed a test
function for this purpose (called six circles) as follows:
f(x) = \min(C_1, C_2, C_3, C_4, C_5, C_6) \qquad (8.22)

where C_1 = \sum_{i=1}^{D}(x_i - 1.5)^2 - 1, C_2 = \sum_{i=1}^{D}(x_i + 1)^2 - 0.25,
C_3 = \sum_{i=1}^{D}(x_i + 3)^2 - 0.0625, C_4 = \sum_{i=1}^{D}(x_i + 2)^2 + 10^{-5},
C_5 = \sum_{i=1}^{D}(x_i - 3.5)^2 + 10^{-5}, and C_6 = \sum_{i=1}^{D}(2x_i)^2 + 10^{-5}.
The objective function (f(x) versus x) is shown in Fig. 8.3 for the one- and two-dimensional cases.
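As a concrete illustration, a direct Python transcription of Eq. 8.22 follows; the helper name six_circles is ours, not the chapter's.

```python
import numpy as np

def six_circles(x):
    """Six circles test function of Eq. 8.22 for a point x of dimension D."""
    x = np.asarray(x, dtype=float)
    c1 = np.sum((x - 1.5) ** 2) - 1.0
    c2 = np.sum((x + 1.0) ** 2) - 0.25
    c3 = np.sum((x + 3.0) ** 2) - 0.0625
    c4 = np.sum((x + 2.0) ** 2) + 1e-5
    c5 = np.sum((x - 3.5) ** 2) + 1e-5
    c6 = np.sum((2.0 * x) ** 2) + 1e-5
    return min(c1, c2, c3, c4, c5, c6)

# The six local optima sit at x_i = -3, -2, -1, 0, 1.5, and 3.5 in every dimension;
# only the first three terms can become negative, e.g. six_circles([1.5, 1.5]) == -1.0.
print(six_circles([1.5, 1.5]))
```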
It is clear that the function has six optima (at x = −3, x = −2, x = −1, x = 0,
x = 1.5, and x = 3.5). We apply MLPSO and CCPSO to the six circles function
with two different topologies: nv2 and nv4. In this test we set the maximum number
of FEs to 3000D and D = {2, 5, 10, 15, 20, 25, 30, 40, and 50}. After each run,

Fig. 8.3 The six circles function in a one dimensional, b two dimensional spaces


we evaluated the personal bests of all particles to find how close they are to the
different local optima of the objective function. We consider that the personal best of a
particle i (p_t^i) has located a local optimum if the mean squared error of p_t^i from that
local optimum over all dimensions is less than 0.05. We set n = 20 for this
test. Figure 8.4 shows the average results over 20 runs.
The performance of MLPSO is inferior to CCPSO in both topologies when the
number of dimensions is small (two-dimensional problems). The reason is that when
MLPSO is used, most of the sub-swarms converge to the global optimum of the six
circles function (x = 1.5 in all dimensions) and, hence, the number of located local
optima drops. However, when the number of dimensions grows, MLPSO with both
topologies outperforms CCPSO in terms of the found number of local optima. Also,
the nv2 topology performs more effectively (in terms of locating local optima) than
the nv4 topology in MLPSO. The reason behind this phenomenon is that we have


Fig. 8.4 Comparison results of applying MLPSO and CCPSO to the six circles function with nv2 and
nv4 topologies. The x axis is the number of dimensions and the y axis is the average number of found
local optima


used 20 particles in all cases. Thus, the number of sub-swarms in the nv2 is greater
than the number of sub-swarms in the nv4. Hence, the number of located local optima
is less when the nv4 is used. In addition, the performance of MLPSO does not drop
when the number of dimensions grows.
Results presented in Fig. 8.4 confirm that MLPSO performs better than CCPSO
in locating different local optima. Note that this result was expected as MLPSO
outperforms CCPSO with small swarm size, hence, MLPSO with small sub-swarms
should outperform CCPSO with small sub-swarms. Also, the performance of MLPSO
does not drop when the number of dimensions grows.

8.4.2 Locating Feasible Regions


In this section we extend MLPSO to locate disjoint feasible regions. We incorporate
a modified version of ELCH (called MELCH) technique into MLPSO to enable
the method to handle constraints (this method is called EMLPSO). This method
(EMLPSO) is used to locate feasible regions in the search space. Also, the effect of
topology in this variant for locating feasible regions is tested through some experiments.

8.4.2.1 EMLPSO
In ELCH, the equality and inequality constraints were combined into a function called
the constraint violation function. Also, a level of desired constraint violation
(called ε) was considered as the level of feasibility. The value of ε was reduced
linearly to zero during the optimization process. ELCH is modified by considering
the fact that equalities can be replaced by inequalities (Eq. 8.2). Hence, in ELCH,
we can modify the constraint violation function as follows:
G(x) = \sum_{i=1}^{m} \max\{0, g_i(x)\}^k \qquad (8.23)

where g_i(x) for i = 1, ..., q is the same as in Eq. 8.1, while g_i(x) is defined as
g_i(x) = |h_i(x)| for i = q + 1, ..., m. Note that in this case, x is a feasible
solution if G(x) = 0. The ELCH technique that uses Eq. 8.23 is called MELCH throughout this chapter.
We incorporate the MELCH technique into the MLPSO algorithm (the result is
called EMLPSO) to enable the algorithm to deal with constraints. Also, as MELCH
combines all constraints into one function, locating different local optima of this
function corresponds to locating disjoint feasible regions. Note that G(x) = 0 is
essential for counting x as a local optimum, as G(x) > 0 does not correspond to a feasible
solution, which is not desirable. We test the ability of EMLPSO with different
topologies to locate disjoint feasible solutions in the next subsection.
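A minimal Python sketch of this constraint violation function is given below, assuming the constraints are supplied as callables; the exponent k is kept as a parameter (with 1 as an assumed default) and all names are illustrative.

```python
def constraint_violation(x, inequality_constraints, equality_constraints, k=1.0):
    """Sketch of G(x) in Eq. 8.23.

    Inequalities are of the form g_i(x) <= 0; equalities h_i(x) = 0 are folded in
    as g_i(x) = |h_i(x)|.  x is feasible exactly when the returned value is 0.
    """
    total = 0.0
    for g in inequality_constraints:
        total += max(0.0, g(x)) ** k
    for h in equality_constraints:
        total += abs(h(x)) ** k
    return total
```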


8.4.2.2 Effects of Topologies in EMLPSO


In order to test the ability of EMLPSO with different topologies to locate feasible
regions in the search space, we designed a test function as follows:
f(x) = \sum_{i=1}^{D} (x_i - 1.5)^2 \quad \text{subject to} \quad g(x) = \min(C_1, C_2, C_3, C_4, C_5, C_6) \le 0 \qquad (8.24)
where the definition of C1 to C6 is the same as in Eq. 8.22. It is clear
that the function has three disjoint feasible regions (around x = 1.5, x = −1, and x = −3) in
which g(x) ≤ 0. However, there are three trap regions (around x = −2,
x = 0, and x = 3.5) where the value of g(x) drops rapidly to 10^{-5}. Because the value
of g(x) at these points is larger than 0, these solutions are not feasible (see Fig. 8.5).
We test the ability of EMLPSO with different topologies (gbest, ring, and nonoverlapping) to deal with this function. For the nonoverlapping topology, we test the algorithm with nv6, nv4, nv3, and nv2, i.e., 6, 4, 3, and 2 particles in each sub-swarm. In
this test we set the maximum number of function evaluations (FE) to 3000D/n and
D = 10 and D = 30. Also, we set n = 12 to ensure that the swarm size is divisible
by 2, 3, 4, and 6. Table 8.2 shows the average of the results over 100 runs. The row
satisfaction is the percentage of the runs where a feasible solution was found (e.g.,
EMLPSO with ring topology has found a feasible solution in 76 % of all runs). The
row No. of feasible regions (Avr) is the average number of feasible regions that was
located by the personal bests of the particles in the swarm on average over all runs
(e.g. EMLPSO with ring topology found 1.18 over all three existing feasible regions
on average). The row locating optimal region (%) indicates the percentage of the
runs where the algorithm has found a feasible solution in the optimal region (in this
example, the region around x = 1.5). Comparing the results, it is clear that EMLPSO
with nonoverlapping topology with 2 particles in each sub-swarm (nv2) has the best
performance in satisfying the constraints (100 %), locating different feasible regions

Fig. 8.5 The contour of the function introduced in Eq. 8.24, a the objective values, and b objective
values in the feasible space


Table 8.2 Comparison of different topologies in EMLPSO for solving the COP defined in Eq. 8.24
where D = 10 and D = 30

                                              Gbest   Ring   Nonoverlapping
D    Measure                                                 nv6     nv4     nv3     nv2
10   Satisfaction (%)                         58      76     78      95      96      100
10   No. of feasible regions (Avr)            1       1.17   1.27    1.4     1.65    2.06
10   Locating optimal region (%)              23      26     28      41      53      58
30   Satisfaction (%)                         61      77     77      88      98      100
30   No. of feasible regions (Avr)            1       1.18   1.26    1.48    1.6     2.14
30   Locating optimal region (%)              24      27     31      42      50      73

(2.06 feasible regions on average out of all 3 existing regions), and finding the optimal
region (58 % of runs). Note that the last two measures (average number of located feasible
regions and percentage of runs locating the optimal region) are interrelated, since the ability of
a method to find feasible regions improves its probability of finding the optimal
region. It is also clear that the results in the 30-dimensional space confirm the results
of the 10-dimensional space. Thus, there is better performance in locating different
feasible regions when there are several small sub-swarms, and better performance
in improving the final solutions when there are a few large sub-swarms.

8.5 Experimental Results


We compare EMLPSO, CCPSO, and CC methods in locating disjoint feasible
regions. The test problems that were introduced in Smith et al. (2013) are used
for this comparison. The specifications (i.e., equation, boundaries, and number of
disjoint feasible regions) of these problems are reported in Table 8.3. EMLPSO, CC,
and CCPSO were applied (CCPSO was combined with MELCH to be able to handle

Table 8.3 The test functions used for the next experiments

Functions    Equation                                                               Boundaries                           No. of disjoint feasible regions
Branin1      g1(x) = (x2 − 5.1x1²/(4π²) + 5x1/π − 6)² + 10(1 − 1/(8π)) cos(x1) + 9   −5 ≤ x1 ≤ 10, 0 ≤ x2 ≤ 15            3
             g2(x) = x2 + x1^1.2 − 12
Rastrigin1   g1(x) = x1² + x2² + 20 − 20(cos(2π x1) + cos(2π x2))                     −5 ≤ x1 ≤ 5, −5 ≤ x2 ≤ 5             36
             g2(x) = x2 − x1³
Schwefel1    g1(x) = x1 sin(√|x1|) + x2 sin(√|x2|) + 125                              −150 ≤ x1 ≤ 150, −150 ≤ x2 ≤ 150     6
             g2(x) = x2 − x1²/16 + 150


Table 8.4 Results of applying EMLPSO, CCPSO, and CC to three 2-dimensional COPs to locate
their feasible regions

Method         Branin1    Rastrigin1    Schwefel1
EMLPSO         3/50       20.4/90       5.5/50
CCPSO+MELCH    2.4/99     17.1/192      2.9/110
CC             3/50       16/50         3/50

The table reports the averages of number of found feasible disjoint regions/needed FE over 20 runs

the constraints) to these problems. The PSO methods used the nv2 topology with 50
particles, because the CC method uses 50 initial solutions. The maximum number of FE
was also set to 3000D. Table 8.4 shows the average results over 20 runs of each
method.
Figure 8.6 shows the feasible regions of all three functions and the personal bests
of the particles after finding the feasible regions.
Clearly, the Branin1 function (Fig. 8.6a) contains 3 similarly sized disjoint feasible
regions fairly scattered over the search space. This makes the problem relatively
easier to solve for the stochastic methods (such as EMLPSO). Also, reported results


Fig. 8.6 A particular run of EMLPSO to locate disjoint feasible regions of a Branin1, b Rastrigin1,
and c Schwefel1. The red areas are feasible regions, the gray areas are infeasible regions, and the white
dots are the personal bests of the particles


in Table 8.4 show that the proposed EMLPSO located all feasible regions of the
Branin1 function.
Rastrigin1 (Fig. 8.6b) contains 36 disjoint feasible regions of many different
sizes. Some of these regions are very small, which makes it harder to locate them. On
this test problem, the proposed EMLPSO located 20.4 of the 36 feasible regions on
average. Compared to the other listed methods, EMLPSO located more regions on average.
The Schwefel1 function (Fig. 8.6c) contains 6 disjoint feasible regions of different
sizes. Two of these regions are hard to locate as they are surrounded by two larger
feasible regions. In fact, the methods tend to move the solutions toward these larger
regions rather than the smaller ones in between. However, the proposed EMLPSO
could locate 5.5 of all 6 regions on average, while the other methods, CC
and CCPSO+MELCH, located 3 and 2.9 feasible regions on average.

8.6 Conclusions and Future Work


Feasible regions in a constrained optimization problem (COP) might have an irregular
shape, e.g., many disjointed regions or regions connected with narrow passages. The
quality of the solutions in each feasible region might be different and the optimal
solution might be in any of these regions. Hence, locating feasible regions, and as
many of these as possible, is of great value. In this chapter, we used the idea of
niching (locating different local optima) in a multi-modal optimization to locate
feasible regions in the COPs. One of the successful algorithms for niching is PSO
with a special type of topology called a nonoverlapping topology. However, existing
studies have shown that PSO with this topology is effective in locating local optima
when the number of dimensions is small (up to 8). We proposed a new PSO (called
mutation linear PSO, MLPSO) which is effective in locating local optima (niching) in
functions with a higher number of dimensions. The abilities of MLPSO in locating
local optima with up to 50 dimensions were tested through some experiments. In
order to locate feasible regions, a constraint handling technique was incorporated
into MLPSO and the new method was called epsilon MLPSO, EMLPSO. EMLPSO
was applied to some COPs and several different topologies of the method were
compared in terms of locating feasible regions. Results showed that EMLPSO with
the nonoverlapping topology with a small number of particles in each sub-swarm
is effective in locating feasible regions. As future work, it is worthwhile to apply
EMLPSO to more benchmark constrained optimization functions and analyze its
performance in dealing with different COPs.
Acknowledgments This work was partially funded by the ARC Discovery Grants DP0985723,
DP1096053, and DP130104395, as well as by the grant N N519 5788038 from the Polish Ministry
of Science and Higher Education (MNiSW).


References
Bonyadi MR, Michalewicz Z (2014) A locally convergent rotationally invariant particle swarm optimization algorithm. Swarm Intell 8(3):159–198
Bonyadi MR, Li X, Michalewicz Z (2013) A hybrid particle swarm with velocity mutation for constraint optimization problems. In: Genetic and evolutionary computation conference. ACM, pp 1–8
Bonyadi MR, Michalewicz Z, Li X (2014) An analysis of the velocity updating rule of the particle swarm optimization algorithm. J Heuristics 20(4):417–452
Brits R, Engelbrecht AP, Van den Bergh F (2002) A niching particle swarm optimizer. In: 4th Asia-Pacific conference on simulated evolution and learning, vol 2. Orchid Country Club, Singapore, pp 692–696
Brits R, Engelbrecht AP, Van den Bergh F (2007) Locating multiple optima using particle swarm optimization. Appl Math Comput 189(2):1859–1883
Clerc M, Kennedy J (2002) The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Trans Evol Comput 6(1):58–73
Dantzig G (1998) Linear programming and extensions. Princeton University Press, Princeton
Engelbrecht AP, Masiye BS, Pampard G (2005) Niching ability of basic particle swarm optimization algorithms. In: Swarm intelligence symposium. IEEE, pp 397–400
Gilbert JC, Nocedal J (1992) Global convergence properties of conjugate gradient methods for optimization. SIAM J Optim 2(1):21–42
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company, Reading
Hansen N (2006) The CMA evolution strategy: a comparing review. In: Towards a new evolutionary computation. Springer, Berlin, pp 75–102
Helwig S, Wanka R (2007) Particle swarm optimization in high-dimensional bounded search spaces. In: Swarm intelligence symposium. IEEE, pp 198–205
Jabr RA (2012) Solution to economic dispatching with disjoint feasible regions via semidefinite programming. IEEE Trans Power Syst 27(1):572–573
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: International conference on neural networks, vol 4. IEEE, pp 1942–1948
Kennedy J, Mendes R (2002) Population structure and particle swarm performance. In: Congress on evolutionary computation, vol 2. IEEE, pp 1671–1676
Lasdon L, Plummer JC (2008) Multistart algorithms for seeking feasibility. Comput Oper Res 35(5):1379–1393
Li XD (2010) Niching without niching parameters: particle swarm optimization using a ring topology. IEEE Trans Evol Comput 14(4):150–169
Liang JJ, Zhigang S, Zhihui L (2010) Coevolutionary comprehensive learning particle swarm optimizer. In: Congress on evolutionary computation. IEEE, pp 1–8
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Paquet U, Engelbrecht AP (2007) Particle swarms for linearly constrained optimisation. Fundam Inf 76(1):147–170
Ricardo A, Pérez-Aguila R (2004) General n-dimensional rotations
Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: World congress on computational intelligence. IEEE, pp 69–73
Smith L, Chinneck J, Aitken V (2013) Constraint consensus concentration for identifying disjoint feasible regions in nonlinear programmes. Optim Methods Softw 28(2):339–363
Spears WM, Green DT, Spears DF (2010) Biases in particle swarm optimization. Int J Swarm Intell Res 1(2):34–57
Suganthan PN, Hansen N, Liang JJ, Deb K, Chen YP, Auger A, Tiwari S (2005) Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization. KanGAL Report
Takahama T, Sakai S (2005) Constrained optimization by constrained particle swarm optimizer with ε-level control. Soft Comput Transdiscipl Sci Tech, pp 1019–1029
Takahama T, Sakai S (2010) Constrained optimization by the constrained differential evolution with an archive and gradient-based mutation. In: Congress on evolutionary computation (CEC). IEEE, pp 1–9
Trelea IC (2003) The particle swarm optimization algorithm: convergence analysis and parameter selection. Inf Process Lett 85(6):317–325
Tsang E (1993) Foundations of constraint satisfaction, vol 289. Academic Press, London
Van den Bergh F, Engelbrecht AP (2002) A new locally convergent particle swarm optimiser. In: Systems, man and cybernetics, vol 3. IEEE, pp 96–101
Van den Bergh F, Engelbrecht AP (2006) A study of particle swarm optimization particle trajectories. Inf Sci 176(8):937–971
Van den Bergh F, Engelbrecht AP (2010) A convergence proof for the particle swarm optimiser. Fundam Inf 105(4):341–374
Wang H, Yang S, Ip WH, Wang D (2010) A particle swarm optimization based memetic algorithm for dynamic optimization problems. Nat Comput 9(3):703–725
Whitley D, Gordon VS, Mathias K (1994) Lamarckian evolution, the Baldwin effect and function optimization. Springer, Heidelberg, pp 5–15
Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
Wilke DN, Kok S, Groenwold AA (2007a) Comparison of linear and classical velocity update rules in particle swarm optimization: notes on diversity. Int J Numer Methods Eng 70(8):962–984
Wilke DN, Kok S, Groenwold AA (2007b) Comparison of linear and classical velocity update rules in particle swarm optimization: notes on scale and frame invariance. Int J Numer Methods Eng 70(8):985–1008

Chapter 9
Ensemble of Constraint Handling Techniques for Single Objective Constrained Optimization

Rammohan Mallipeddi, Swagatam Das and Ponnuthurai Nagaratnam Suganthan
Abstract Many optimization problems in science and engineering involve
constraints due to which the feasible region reduces and the search process gets
complicated. In addition, when evolutionary algorithms (EAs) are employed to solve
constrained optimization problems additional mechanisms referred to as constraint
handling techniques are required as EAs generally perform unconstrained search.
Generally, the performance of a constraint handling technique depends on its effectiveness in utilizing the information present in the infeasible individuals generated
during the evolution process. In the literature, a variety of techniques are developed
to exploit the information present in infeasible individuals. However, according to
the No Free Lunch (NFL) theorem, no single state-of-the-art constraint handling
technique can outperform all others on every problem. In other words, depending on
several factors, such as the ratio between feasible search space and the whole search
space, multi-modality of the problem, the chosen EA and global exploration/local
exploitation stages of the search process, different constraint handling methods can
be effective on different problems and during different stages of the search process.
Hence, solving a particular constrained problem requires numerous trial-and-error
runs to choose a suitable constraint handling technique and to fine-tune the associated parameters. The trial-and-error approach may be unrealistic in applications
where the objective function is computationally expensive or solutions are required
in real-time. In this chapter, we present an ensemble of constraint handling techniques
(ECHT) as an efficient alternative to the trial-and-error-based search for the best constraint handling technique with its best parameters for a given problem. Ensemble
R. Mallipeddi (B)
Kyungpook National University, 1370 Sangkyuk-Dong, 702 701 Puk-gu,
Daegu, South Korea
e-mail: mallipeddi.ram@gmail.com
S. Das
Electronics and Communication Sciences Unit Indian Statistical Institute,
203 B T Road, 700108 Kolkata, India
e-mail: swagatam.das@isical.ac.in
P.N. Suganthan
EEE, SS2-B2a-21, 639798 Ntu, Singapore
e-mail: epnsugan@ntu.edu.sg


being a general concept can be realized with any EA framework. In this chapter,
ECHT is combined with an improved differential evolution (DE) algorithm referred
to as EPSDE. EPSDE is an improved DE version based on an ensemble framework.
The performance of the proposed architecture is compared with the state-of-the-art
algorithms.
Keywords Constraint handling · Ensemble method · Single objective constraint problems

9.1 Introduction
Optimization is an intrinsic part of life and of human activity. For example,
manufacturers seek maximum efficiency in the design of their production processes,
investors aim at creating portfolios that avoid high risk while yielding a good return,
traffic planners need to decide on the level and ways of routing traffic to minimize
congestion, etc.
Classical optimization techniques make use of differential calculus, where it is
assumed that the function is differentiable twice with respect to the design variables,
and that the derivatives are continuous in locating the optimum solution. Thus, classical methods have limited scope in practical real-world applications as objective
functions are characterized by chaotic disturbances, randomness, and complex nonlinear dynamics and may not always be continuous and/or differentiable. Recently,
population-based stochastic algorithms such as evolutionary algorithms (EAs) have become
well known for their ability to handle nonlinear and complex optimization problems. The primary advantage of EAs over other numerical methods is that they just
require the objective function values, while properties such as differentiability and
continuity are not necessary (Anile et al. 2005).
Many optimization problems in science and engineering involve constraints.
The presence of constraints reduces the feasible region and complicates the search
process. In addition, when solving constrained optimization problems, solution candidates that satisfy all the constraints are feasible individuals while individuals that
fail to satisfy any of the constraints are infeasible individuals. To solve constrained
optimization problems, EAs require additional mechanisms referred to as constraint
handling techniques. One of the major issues in constraint optimization using EAs
is how to deal with infeasible individuals throughout the search process. One way
to handle them is to completely disregard infeasible individuals and continue the search
process with feasible individuals only. This approach may be ineffective as EAs are
probabilistic search methods and potential information present in infeasible individuals can be wasted. If the search space is discontinuous, then the EA can also
be trapped in one of the local minima. Therefore, different techniques have been
developed to exploit the information in infeasible individuals. In the literature, several constraint handling techniques are proposed to be used with the EAs (Coello


Coello 2002). Michalewicz and Schoenauer (1996) grouped the methods for handling
constraints within EAs into four categories: preserving feasibility of solutions (Koziel
and Michalewicz 1999), penalty functions, make a separation between feasible and
infeasible solutions, and hybrid methods. A constrained optimization problem can
also be formulated as a multi-objective (Wang et al. 2007) problem, but it is computationally intensive due to non-domination sorting.
According to the No Free Lunch theorem (Wolpert and Macready 1997), no single
state-of-the-art constraint handling technique can outperform all others on every
problem. Hence, solving a particular constrained problem requires numerous trial-and-error runs to choose a suitable constraint handling technique and to fine-tune the
associated parameters. This approach clearly suffers from unrealistic computational
requirements in particular if the objective function is computationally expensive (Jin
2005) or solutions are required in real-time. Moreover, depending on several factors
such as the ratio between feasible search space and the whole search space, multimodality of the problem, the chosen EA and global exploration/local exploitation
stages of the search process, different constraint handling methods can be effective
during different stages of the search process.
In pattern recognition and machine learning (Rokach 2009; Zhang 2000), ensemble methodology has been successfully employed. Ensemble integrates different
methods available to perform the same task into a single method so that the reliability can be improved. For example, in classification, an ensemble model formed by
integrating multiple classifiers reduces the variance, or instability caused by single
methods and improves the classification efficiency or prediction accuracy.
In this chapter, an ensemble of constraint handling techniques (ECHT) with four
constraint handling techniques (Coello Coello 2002; Huang et al. 2006; Runarsson
and Yao 2000; Tessema and Yen 2006) is presented as an efficient alternative to
the trial-and-error-based search for the best constraint handling technique with its
best parameters for a given problem. In ECHT, each constraint handling technique
has its own population and each function call is efficiently utilized by each of these
populations. Ensemble being a general concept can be realized with any EA framework. In this chapter, we integrate ECHT with an improved version of DE algorithm
referred to as EPSDE. EPSDE is a version of DE algorithm which is based on the concept of ensemble (Mallipeddi et al. 2011). In EPSDE, a pool of distinct mutation and
crossover strategies, along with a pool of control parameters associated with the DE algorithm, coexist throughout the evolution process and compete to produce offspring.
Experimental results show that the performance of ECHT-EPSDE is better than each
single constraint handling method used to form the ensemble and competitive to the
state-of-the-art algorithms.


9.2 Constraint Handling Techniques - A Review


A constrained optimization problem with D parameters to be optimized is usually
written as a nonlinear programming problem of the following form (Qin et al. 2009):
\text{Minimize: } f(X), \quad X = (x_1, x_2, \ldots, x_D), \ X \in S

\text{subject to: } \quad g_i(X) \le 0, \quad i = 1, \ldots, p
\qquad\qquad\quad\ \ h_j(X) = 0, \quad j = p + 1, \ldots, m \qquad (9.1)

Here f need not be continuous but must be bounded. S is the search space, and p
and (m − p) are the numbers of inequality and equality constraints, respectively. The
inequality constraints that satisfy g_i(X) = 0 at the global optimum solution are
called active constraints. All equality constraints are active constraints. The equality
constraints can be transformed into inequality form and combined with the other
inequality constraints as

G_i(X) = \begin{cases} \max\{g_i(X), 0\} & i = 1, \ldots, p \\ \max\{|h_i(X)| - \varepsilon, 0\} & i = p + 1, \ldots, m \end{cases} \qquad (9.2)

where ε is a tolerance parameter for the equality constraints. An adaptive setting of the
tolerance parameter, which was originally proposed in Hamida and Schoenauer (2002)
and used in Mezura-Montes and Coello Coello (2003), Mezura-Montes and Coello
Coello (2005), and Wang et al. (2008), is adopted in our work with some modifications.
Therefore, the objective is to minimize the fitness function f (X ) such that the optimal
solution obtained satisfies all the inequality constraints G i (X ). The overall constraint
violation for an infeasible individual is a weighted mean of all the constraints, which
is expressed as
v(X) = \frac{\sum_{i=1}^{m} w_i\, G_i(X)}{\sum_{i=1}^{m} w_i} \qquad (9.3)
where w_i (= 1/G_{max_i}) is a weight parameter and G_{max_i} is the maximum violation of
constraint G_i(X) obtained so far. Here, w_i is set to 1/G_{max_i}, which varies during
the evolution in order to balance the contribution of every constraint in the problem
irrespective of their differing numerical ranges.
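A minimal Python sketch of this overall violation measure follows; names are ours, the inputs are assumed to be the already-converted constraint values G_i(X) of Eq. 9.2, and the running maxima G_{max_i} are tracked across all evaluations made so far.

```python
import numpy as np

class ViolationTracker:
    """Sketch of Eqs. 9.2-9.3: per-constraint violations normalised by their running
    maxima (w_i = 1 / G_max_i) and combined into a weighted mean."""

    def __init__(self, n_constraints):
        # Tiny positive floor avoids division by zero before any violation has been seen.
        self.g_max = np.full(n_constraints, 1e-30)

    def overall_violation(self, g_values):
        g = np.maximum(np.asarray(g_values, dtype=float), 0.0)  # G_i(X) >= 0
        self.g_max = np.maximum(self.g_max, g)                  # update G_max_i
        w = 1.0 / self.g_max
        return float(np.sum(w * g) / np.sum(w))                 # Eq. 9.3
```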
The search process for finding the feasible global optimum in a constrained problem can be divided into three phases (Wang et al. 2008) depending on the number
of feasible solutions present in the combined parent population and its offspring
population as (a) Phase 1: No feasible solution, (b) Phase 2: At least one feasible
solution, and (c) Phase 3: Combined offspring-parent population has more feasible
solutions than the size of next generation parent population. Different constraint
handling techniques perform differently during each of these three phases.


9.2.1 Superiority of Feasible Solutions (SF) (Deb 2000; Powell and Skolnick 1993)
In SF, when two solutions X i and X j are compared, X i is regarded superior to X j
under the following conditions:
• X_i is feasible and X_j is not.
• X_i and X_j are both feasible and X_i has a smaller objective value (in a minimization problem) than X_j.
• X_i and X_j are both infeasible, but X_i has a smaller overall constraint violation v(X_i), as computed using Eq. (9.3).
Therefore, in SF, feasible ones are always considered better than infeasible ones.
Two infeasible solutions are compared based on their overall constraint violations
only, while two feasible solutions are compared based on their objective function
values only. Comparison of infeasible solutions based on the overall constraint violation aims to push infeasible solutions to the feasible region, while comparison of
two feasible solutions on the objective value improves the overall solution. Therefore,
in Phase 1, infeasible solutions with low overall constraint violation are selected. In
Phase 2, first all the feasible ones are selected and then infeasible ones with low
overall constraint violation are selected. In Phase 3, only feasible ones with best
objective values are selected.
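The three SF rules can be written as a single comparison function, sketched below in Python; the function name is ours, and v denotes the overall constraint violation of Eq. 9.3 (zero for feasible solutions).

```python
def sf_better(f_i, v_i, f_j, v_j):
    """Superiority of feasible solutions: True if solution i is preferred over solution j.

    f_* : objective values (minimisation), v_* : overall constraint violations (Eq. 9.3).
    """
    feasible_i, feasible_j = (v_i <= 0.0), (v_j <= 0.0)
    if feasible_i and not feasible_j:
        return True                  # feasible beats infeasible
    if feasible_i and feasible_j:
        return f_i < f_j             # two feasible: compare objective values
    if not feasible_i and not feasible_j:
        return v_i < v_j             # two infeasible: compare violations
    return False
```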

9.2.2 Self-adaptive Penalty (SP) (Tessema and Yen 2006)


The simplest and the earliest method of involving infeasible individuals in the search
process, even after sufficient number of feasible solutions are obtained, is the static
penalty method. In this method, a penalty value is added to the fitness value of each
infeasible individual so that it will be penalized for violating the constraints. Static
penalty functions are popular due to their simplicity but they usually require different
parameters to be defined by the user to control the amount of penalty added when
multiple constraints are violated. The parameters are usually problem-dependent. To
overcome this difficulty, adaptive penalty functions (Farmani and Wright 2003) are
suggested where information gathered from the search process is used to control the
amount of penalty added to infeasible individuals. Adaptive penalty functions are
easy to implement and they do not require users to define parameters.
In Tessema and Yen (2006), a self-adaptive penalty function method is proposed
to solve constrained optimization problems. Two types of penalties are added to
each infeasible individual to identify the best infeasible individuals in the current
population. The amount of the added penalties is controlled by the number of feasible individuals currently present in the combined population. If there are a few
feasible individuals, a higher amount of penalty is added to infeasible individuals
with a higher amount of constraint violation. On the other hand, if there are several


feasible individuals, then infeasible individuals with high fitness values will have
small penalties added to their fitness values. These two penalties allow the algorithm
to switch between finding more feasible solutions and searching for the optimum
solution at any time during the search process. This algorithm requires no parameter
tuning. The final fitness value based on which the population members are ranked is
given as F(X ) = d(X ) + p(X ), where d(X ) is the distance value and p(X ) is the
penalty value. The distance value is computed as follows:

d(X) = \begin{cases} v(X) & \text{if } r_f = 0 \\ \sqrt{\bar{f}(X)^2 + v(X)^2} & \text{otherwise} \end{cases} \qquad (9.4)

where r_f = (number of feasible individuals)/(population size), v(X) is the overall
constraint violation as defined in Eq. (9.3), and \bar{f}(X) = (f(X) − f_min)/(f_max − f_min),
where f_max and f_min are the maximum and
minimum values of the objective function f(X) in the current combined population.
The penalty value is defined as

p(X) = (1 - r_f)\, M(X) + r_f\, N(X) \qquad (9.5)

where

M(X) = \begin{cases} 0 & \text{if } r_f = 0 \\ v(X) & \text{otherwise} \end{cases} \qquad (9.6)

N(X) = \begin{cases} 0 & \text{if } X \text{ is a feasible individual} \\ \bar{f}(X) & \text{if } X \text{ is an infeasible individual} \end{cases} \qquad (9.7)

Therefore, in Farmani and Wright (2003), Tessema and Yen (2006), the selection
of individuals in all the three phases is based on a value determined by the overall
constraint violation and objective values. Thus, there is a chance for an individual
with lower overall constraint violation and higher fitness to get selected over a feasible
individual with lower fitness even in Phase 3, where there is sufficient number of
feasible solutions to form the parent population using only feasible solutions.
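A sketch of the self-adaptive penalty fitness F(X) = d(X) + p(X) over a combined population is given below in Python; array names are ours and the normalisation follows the definitions quoted above.

```python
import numpy as np

def self_adaptive_penalty_fitness(f, v):
    """Sketch of Eqs. 9.4-9.7.

    f : objective values (minimisation), v : overall constraint violations (Eq. 9.3).
    Returns F = d + p; individuals are ranked by ascending F.
    """
    f, v = np.asarray(f, float), np.asarray(v, float)
    feasible = v <= 0.0
    r_f = np.mean(feasible)                              # feasibility ratio of the population
    f_min, f_max = f.min(), f.max()
    f_norm = (f - f_min) / (f_max - f_min + 1e-30)       # normalised objective, \bar{f}(X)

    # Distance value d(X) (Eq. 9.4)
    d = v.copy() if r_f == 0 else np.sqrt(f_norm ** 2 + v ** 2)

    # Penalty p(X) = (1 - r_f) M(X) + r_f N(X) (Eqs. 9.5-9.7)
    M = np.zeros_like(v) if r_f == 0 else v
    N = np.where(feasible, 0.0, f_norm)
    return d + (1.0 - r_f) * M + r_f * N
```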

9.2.3 ε-Constraint (EC) (Takahama and Sakai 2006)


In the ε-constraint handling method, the relaxation of the constraints is controlled by
using the ε parameter. As solving a constrained optimization problem becomes
tedious when active constraints are present, proper control of the ε parameter is
essential (Takahama and Sakai 2006) to obtain high quality solutions for problems
with equality constraints. The ε level is updated until the generation counter G reaches
the control generation Tc. After the generation counter exceeds Tc, the ε level is set
to zero to obtain solutions with no constraint violation.


\varepsilon(0) = v(X_\theta) \qquad (9.8)

\varepsilon(G) = \begin{cases} \varepsilon(0) \left(1 - \dfrac{G}{T_c}\right)^{cp} & 0 < G < T_c \\ 0 & G \ge T_c \end{cases} \qquad (9.9)

where X_θ is the top θ-th individual and θ = 0.05 × NP. The recommended
parameter ranges are (Takahama and Sakai 2006): T_c ∈ [0.1 T_max, 0.8 T_max] and cp ∈ [2, 10].
The selection of individuals in the three phases of evolution by using the
ε-constraint technique is similar to SF, but in EC, a solution is regarded
as feasible if its overall constraint violation is lower than ε(G).
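The ε-level schedule and the ε-feasibility test can be sketched as follows in Python; names are ours, NP denotes the population size, and the individuals are assumed to be ordered by their violation when picking the top θ-th one.

```python
import numpy as np

def initial_epsilon(violations, NP):
    """Eq. 9.8: eps(0) is the violation of the top theta-th individual, theta = 0.05 * NP."""
    theta = max(1, int(0.05 * NP))
    return float(np.sort(np.asarray(violations, dtype=float))[theta - 1])

def epsilon_level(eps0, G, Tc, cp=5):
    """Eq. 9.9: the eps level shrinks to zero as the generation counter G approaches Tc."""
    if G >= Tc:
        return 0.0
    return eps0 * (1.0 - G / Tc) ** cp

def eps_feasible(v, eps):
    """Under the EC rule, a solution counts as feasible if its overall violation is below eps."""
    return v <= eps
```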

9.2.4 Stochastic Ranking (SR) (Runarsson and Yao 2000)


Runarsson and Yao (2000) introduced the stochastic ranking (SR) method to achieve
a balance between the objective and the overall constraint violation stochastically. A
probability factor p_f is used to determine whether the objective function value or
the constraint violation value determines the rank of each individual. The basic form of
the SR (Runarsson and Yao 2000) can be presented as:
If (no constraint violation or rand < p f )
Rank based on the objective value only
else
Rank based on the constraint violation only
End
In Runarsson and Yao (2005), an improved version of the SR (ISR) was proposed
using evolution strategies and differential variation. In SR, comparison between two
individuals may be based on objective value alone or overall constraint violation
alone as randomly determined. Thus, infeasible solutions with better objective value
have a chance to be selected in all three phases of evolution. In our work, a modified
version of the SR presented in Runarsson and Yao (2000) is used. Here, the value of
p_f is not kept constant; instead, it is decreased linearly from p_f = 0.475 in the
initial generation to p_f = 0.025 in the final generation.
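A sketch of the stochastic comparison, applied in a bubble-sort style ranking pass as in Runarsson and Yao (2000), together with the linearly decreasing p_f described above, is given below in Python; all names are ours.

```python
import random

def stochastic_rank(order, f, v, p_f):
    """One stochastic-ranking pass over a list of indices, sketched as a bubble sort.

    order : list of population indices, reordered in place (best first)
    f, v  : objective values and overall constraint violations
    p_f   : probability of comparing by objective value when a constraint is violated
    """
    n = len(order)
    for _ in range(n):
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            if (v[a] == 0 and v[b] == 0) or random.random() < p_f:
                swap = f[a] > f[b]      # rank based on the objective value only
            else:
                swap = v[a] > v[b]      # rank based on the constraint violation only
            if swap:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break

def pf_schedule(gen, max_gen, pf_start=0.475, pf_end=0.025):
    """Linearly decreasing p_f, as adopted in this chapter."""
    return pf_start + (pf_end - pf_start) * gen / max_gen
```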
From the above discussions, we can observe that each of the constraint handling
methods used in ECHT differs in at least one of the three phases. In addition, it should
be noted that the ECHT approach is general and can be formulated with any search
method and constraint handling techniques.


9.3 Ensemble of Constraint Handling Techniques (ECHT)


Each constrained optimization problem would be unique in terms of the ratio between
feasible search space and the whole search space, multi-modality and the nature of
constraint functions. As evolutionary algorithms are stochastic in nature, the evolution paths can be different in every run even when the same problem is solved
using the same algorithm. In other words, the search process passes through different phases at different points during the search process. Therefore, depending on
several factors such as the ratio between feasible search space and the whole search
space, multi-modality of the problem, nature of equality/inequality constraints, the
chosen EA and global exploration/local exploitation stages of the search algorithm,
different constraint handling methods can be effective during different stages of the
search process. Due to the strong interactions between these diverse factors and the
stochastic nature of the evolutionary algorithms, it is not straightforward to determine which constraint handling method is the best during a particular stage of the
evolution to solve a given problem using a given EA. Motivated by these observations, we develop the ECHT to implicitly benefit from the match between constraint
handling methods, characteristics of the problem being solved, chosen EA, and the
exploration-exploitation stages of the search process.
A real-world problem can take several minutes to several hours to compute the
objective function value (Jin 2005). Therefore, finding a better constraint handling
method for such a problem by trial-and-error may become difficult. The computation
time wasted in searching for a better constraint handling method can be saved by
using the proposed ECHT.
In this section, we present ECHT with four constraint handling techniques
discussed in previous section. Each constraint handling technique has its own population and parameters. Each population corresponding to a constraint handling method
produces its offspring and evaluates them. The parent population corresponding to
a particular constraint handling method not only competes with its own offspring
population but also with offspring population of the other three constraint handling
methods. Due to this, an offspring produced by a particular constraint handling
method may be rejected by its own population, but could be accepted by the populations of other constraint handling methods. Hence, in ECHT every function call is
utilized effectively. If the evaluation of objective/constraint functions is computationally expensive, more constraint handling methods can be included in the ensemble
to benefit more from each function call. And if a particular constraint handling technique is best suited for the search method and the problem during a point in the
search process, the offspring population produced by the population of that constraint handling method will dominate the other and enter other populations too. In
the subsequent generations, these superior offspring will become parents in other
populations too. Therefore, ECHT transforms the burden of choosing the best constraint handling technique and tuning the associated parameter values for a particular
problem into an advantage. If the constraint handling methods selected to form an
ensemble are similar in nature then the populations associated with each of them may


lose diversity and the search ability of ECHT may deteriorate. Thus, the performance
of ECHT can be improved by selecting constraint handling methods with diverse and
competitive nature. The general framework of the ensemble algorithm is illustrated
in the flowchart shown in Fig. 9.1.
As ECHT employs different constraint handling methods each having its own
population, it can be compared with hybrid methods like memetic algorithms
(Ishibuchi et al. 2003; Ong and Keane 2004; Ong et al. 2006). Some methods like
island models (Skolicki and De Jong 2007) sometimes called Migration model or
Coarse Grained model, also employ subpopulations in their approach. The main
difference between the ECHT and the island model is that in island model, subpopulations in different islands evolve separately with occasional communication
between them to maintain diversity while in ECHT the communication between different populations is by sharing of all offspring and thus facilitating efficient usage
of each function call.

9.3.1 ECHT-EPSDE
In this section, an ECHT with EPSDE as the basic search algorithm (ECHT-EPSDE)
is demonstrated. ECHT-EPSDE uses the four constraint handling techniques discussed in Sects. 9.2.19.2.4. Each constraint handling technique has its own population and parameters. Each population corresponding to a constraint handling method
produces its offspring using the associated strategies and parameters of the EPSDE.
The offspring produced are evaluated. In ECHT-EPSDE, the parent population corresponding to a particular constraint handling method not only competes with its own
offspring population but also with offspring population of the other three constraint
handling methods. In DE, since mutation and crossover are employed to produce
an offspring, DE's one-to-one selection is employed between the parent and offspring populations of the same constraint handling technique. But when the parents of one constraint handling method compete with the offspring population of another constraint handling method, a parent is randomly selected for competition against every offspring. Hence, in ECHT-EPSDE every function call is utilized by
every population associated with each constraint handling technique in the ensemble. Due to this, an offspring produced by a particular constraint handling method
may be rejected by its own population, but could be accepted by the populations of
other constraint handling methods. Therefore, the ensemble transforms the burden of
choosing a particular constraint handling technique and tuning the associated parameter values for a particular problem into an advantage.
Fig. 9.1 Flowchart of ECHT (CH: constraint handling method, POP: population, PAR: parameters, OFF: offspring, Max_FEs: maximum number of function evaluations)

The ECHT-EPSDE can be summarized as follows:
STEP 1: Each of the four constraint handling techniques (SF, SP, EC, and SR in Sects. 9.2.1–9.2.4) has its own population of NP individuals, each with dimension D (POPk, k = 1, ..., 4), and parameter/strategy pools (PSk, k = 1, ..., 4)
initialized according to the EPSDE rules and the corresponding constraint handling method (CHk, k = 1, ..., 4). Set the generation counter G = 0.
STEP 2: Evaluate the objective/constraint function values and the overall constraint violation for each individual Xi^k, i ∈ {1, ..., NP}, of every population (POPk, k = 1, ..., 4) using Eqs. (9.2)–(9.3).
STEP 3: The parameter values of constraint handling methods are updated according
to Sect. 9.2.
STEP 4: Each parent population (POPk, k = 1, ..., 4) produces an offspring population (OFFSk, k = 1, ..., 4) by mutation and crossover (Takahama and Sakai 2006).
STEP 5: Compute the objective/constraint function values and the overall constraint

violation of each offspring Xi^k, i ∈ {1, ..., NP}. Each offspring retains the objective and constraint function values separately, i.e., each offspring is evaluated only
once.
STEP 6: Each parent population POPk , k = 1, . . . , 4 is combined with offspring
produced by it and the offspring produced by all other populations corresponding to
different constraint handling techniques as in STEP 6 in Fig. 9.1. The four different
groups are:
Group 1: (POP1 , OFFS k , k = 1, . . . , 4), Group 2: (POP2 ,OFFS k , k = 1, . . . , 4),
Group 3: (POP3 , OFFS k , k = 1, . . . , 4) and Group 4: (POP4 ,OFFS k , k =
1, . . . , 4).
STEP 7: In the selection step, parent populations POPk, k = 1, ..., 4, for the next generation are selected from Groups 1, 2, 3, and 4, respectively. In a group (say Group 1), since OFFS1 is produced by POP1 by mutation and crossover, DE's selection based on competition between a parent and its offspring is employed when POP1 competes with OFFS1. But when POP1 competes with OFFS2, OFFS3, or OFFS4, produced by the other populations, each member in POP1 competes with a randomly selected offspring from OFFS2, OFFS3, or OFFS4.
STEP 8: Stop if termination criterion is met. Else, G = G + 1 and go to STEP 3.
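One possible reading of STEPs 6 and 7 is sketched below; the helper better_than stands for the survivor-selection rule of the k-th constraint handling method and, like the other names, is an assumption of this sketch rather than part of the original algorithm description.

import random

def echt_selection(pops, offs, better_than):
    """One generation of ECHT survivor selection (STEPs 6-7); a minimal sketch.

    pops: list of 4 parent populations, one per constraint handling method
    offs: list of 4 offspring populations (offs[k] produced from pops[k])
    better_than(k, a, b): True if individual a is preferred over b under method k
    """
    new_pops = []
    for k, parents in enumerate(pops):
        survivors = []
        for i, parent in enumerate(parents):
            winner = parent
            for m, offspring_pop in enumerate(offs):
                if m == k:
                    challenger = offspring_pop[i]              # DE's one-to-one selection
                else:
                    challenger = random.choice(offspring_pop)  # random rival from another method
                if better_than(k, challenger, winner):
                    winner = challenger
            survivors.append(winner)
        new_pops.append(survivors)
    return new_pops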

9.3.2 Experimental Setup and Results


In Mallipeddi and Suganthan (2010b), the performance of ECHT-DE was evaluated and compared with that of DE using each of the four constituent constraint handling methods alone (SF-DE, SP-DE, SR-DE, and EC-DE). In addition, the performance of ECHT-DE was compared with some of the state-of-the-art methods on the set of 24 well-defined problems of CEC 2006 (Liang et al. 2006).
In this chapter, we evaluate the performance of ECHT-EPSDE using the 10D and 30D versions of the CEC 2010 benchmark problems. The performance of the algorithm is compared with the state-of-the-art algorithms that participated in the CEC 2010 competition. The details regarding the problems and the evaluation criteria are presented in Mallipeddi and Suganthan (2010).


In ECHT-EPSDE, the population size corresponding to each constraint handling method is set to 50. The details regarding the selection of the parameter and strategy pools of the EPSDE algorithm are discussed in Mallipeddi et al. (2011). On each problem of the problem set, every algorithm is run 25 times independently. The maximum number of function evaluations used is 2 × 10^5 and 6 × 10^5 for 10D and 30D, respectively. The parameters corresponding to the constraint handling methods are set to: Tc = 0.2 Tmax, cp = 5, and pf is linearly decreased from an initial value of 0.475 to 0.025 in the final generation. However, the performance of the ECHT can be improved by tuning the parameters of the individual constraint handling methods.
The tolerance parameter δ for the equality constraints is adapted using the following expression:

        δ(G + 1) = δ(G)/k                                                 (9.10)
Table 9.1 Function values achieved for FES = 2 × 10^5 for the 10D problems C01–C18 (best, median, worst, and mean objective values, the standard deviation, the number of violated constraints c at the median solution, and the mean constraint violation ν at the median solution)


The initial value δ(0) is selected as the median of the equality constraint violations over the entire initial population. The value of the reduction factor k is selected in such a way that it causes δ to reach a value of 1E−04 at around 600 generations, after which the value of δ is fixed at 1E−04.
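If the update in Eq. (9.10) divides the tolerance by the same constant factor k in every generation, then choosing k ≈ (δ(0)/1E−04)^(1/600) makes δ reach 1E−04 at around generation 600.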
The experimental results (best, median, mean, worst, and standard deviation values) are presented in Tables 9.1 and 9.2. c denotes the number of violated constraints at the median solution: the sequence of three numbers indicates the number of violations (including inequality and equality constraints) by more than 1.0, more than 0.01, and more than 0.0001, respectively. ν is the mean value of the violations of all constraints at the median solution. The ranking of the algorithm in comparison with the state-of-the-art algorithms is shown in Tables 9.3 and 9.4. The overall and average rankings for each of the algorithms are presented in Table 9.5.
From the results, it can be observed that the best three algorithms are εDEg, ECHT-EPSDE, and ECHT-DE, with average ranks of 3.08, 3.58, and 4.67, respectively. In other words, the performance of ECHT-EPSDE is better than that of the ECHT-DE variant.

Table 9.2 Function values achieved for FES = 6 × 10^5 for the 30D problems C01–C18 (best, median, worst, and mean objective values, the standard deviation, the number of violated constraints c at the median solution, and the mean constraint violation ν at the median solution)


Table 9.3 Ranking for 10D problems

Algorithm     C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14 C15 C16 C17 C18
jDEsoco         7  13   9   1  10   4   1   2   4   5   3   5   4   4   8   9  10  10
DE-VPS         11   7  11   9   6  10  10  11   6   6   8  10   6   5   5   1   7   1
RGA             9   9  13   8  11  11  11   5   7   7   9  11   7   7   7  10   8   8
E-ABC          10   8  12  11   8   8  12  12   9  10  12   5   8  12  11   7   9   9
εDEg            1   6   1   5   1   1   1   9   1   2   1   1   2   1   2   8   5   1
DCDE           12   5   1  10   1   1   8  10   5   4   6   9  12   2   1   6   2   6
Co-CLPSO        8   4   6   7   1   1   9   1   8   8  11   5   9   3   4   2   6   7
CDEb6e6r        5  10   7   1  12  12   1   8  12  12   7   2   2  11  13  12  12  12
sp-MODE         1  12  10  13  13  13   1   7  13  13  13  13  13  10   9  13  13  13
MTS            13  11   8  12   7   7  13  13  10   9  10  12  11  13  12  11  11  11
IEMA            6   1   5   6   9   9   5   6  11  11   2   5   5   6   6   3   1   1
ECHT-DE         1   3   1   1   5   6   6   3   3   3   5   5  10   8  10   5   4   1
ECHT-EPSDE      4   1   1   1   1   5   7   3   2   1   4   2   1   9   3   4   3   1



Table 9.4 Ranking for 30D problems

Algorithm     C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14 C15 C16 C17 C18
jDEsoco         5   9   3   4   9   3   1   7   2   2   3   1   1   4   6   8  10  10
DE-VPS         12   7   8   8   5   8   7  10   9   7   8  10  10   5   4   7   6   5
RGA             7   8  12   7   6   9  12  13   8   8   7   7   9  10   7  10   9   7
E-ABC           8  10  11  10   7   7  13   9  10  11  10   8   6   8  10   9   8   9
εDEg            2   3   2   5   1   1   4   2   3   3   2  11   4   1   2   1   7   8
DCDE           11   2   1   9  10  10   6   3  13   1   6   2   8   3   1   6   5   4
Co-CLPSO       10   1  10   6   2   4   9   8   7   9  11   3  11   5   3   1   4   6
CDEb6e6r        1  11   6   3  12  11   1   1   1  13   1   9   3  11  12  11  13  12
sp-MODE         3  12   9  13  13  13  11  11  12  12  13  13  13  13  11  13  11  11
MTS            13  13   7  11   8   6   8  12  11  10   9   4  12  12  13  12  12  13
IEMA            4   6  13  12  11  12   5   4   6   6  12  12   2   2   5   5   1   1
ECHT-DE         6   5   4   1   3   5  10   5   5   5   5   5   7   9   9   1   2   1
ECHT-EPSDE      9   4   5   1   4   2   3   6   4   4   3   6   5   7   8   1   3   1

Table 9.5 Overall ranking of the algorithms

Algorithm                                        10D  30D  Overall  Average
jDEsoco (Brest et al. 2010)                      109   88      197     5.47
DE-VPS (Tasgetiren et al. 2010)                  130  136      266     7.39
RGA (Saha et al. 2010)                           158  156      314     8.72
E-ABC (Mezura-Montes and Velez-Koeppel 2010)     173  164      337     9.36
εDEg (Takahama and Sakai 2010)                    49   62      111     3.08
DCDE (Zhihui et al. 2010)                        101  101      202     5.61
Co-CLPSO (Liang et al. 2010)                     100  110      210     5.83
CDEb6e6r (Tvrdik and Polakova 2010)              151  132      283     7.86
sp-MODE (Reynoso-Meza et al. 2010)               193  207      400    11.11
MTS (Lin-Yu and Chun 2010)                       194  186      380    10.56
IEMA (Singh et al. 2010)                          98  119      217     6.03
ECHT-DE (Mallipeddi and Suganthan 2010a)          80   88      168     4.67
ECHT-EPSDE                                        53   76      129     3.58

9.4 Conclusions
In this chapter, a novel constraint handling procedure called ECHT was presented with four different constraint handling methods, where each constraint handling method has its own population. In ECHT, every function call is effectively used by all four populations, and the offspring population produced by the constraint handling technique best suited at a particular stage of the optimization process dominates the others. Furthermore, an offspring produced by a particular constraint handling method may be rejected by its own population, but could be accepted by the populations associated with other constraint handling methods. The No Free Lunch (NFL) theorem implies that, irrespective of the exhaustiveness of parameter tuning, no single constraint handling method can be the best for every constrained optimization problem. Hence, according to the NFL theorem, the ECHT has the potential to perform better over diverse problems than any single constraint handling method. In this chapter, we evaluated the performance of ECHT using the EPSDE algorithm. Experimental results showed that ECHT-EPSDE outperforms the state-of-the-art methods on the CEC 2010 problems.


References
Anile AM, Cutello V, Nicosia G, Rascuna R, Spinella S (2005) Comparison among evolutionary algorithms and classical optimization methods for circuit design problems. Paper presented at the IEEE conference on evolutionary computation, Vancouver, Canada
Brest J, Boskovic B, Zumer V (2010) An improved self-adaptive differential evolution algorithm in single objective constrained real-parameter optimization. Paper presented at the IEEE congress on evolutionary computation
Coello Coello CA (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods Appl Mech Eng 186(2–4):311–338
Farmani R, Wright JA (2003) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445–455
Hamida SB, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint handling. Paper presented at the proceedings of the congress on evolutionary computation
Huang VL, Qin AK, Suganthan PN (2006) Self-adaptive differential evolution algorithm for constrained real-parameter optimization. Paper presented at the IEEE congress on evolutionary computation, Vancouver, Canada
Ishibuchi H, Yoshida T, Murata T (2003) Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans Evol Comput 7(2):204–223
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput 9(1):3–12
Koziel S, Michalewicz Z (1999) Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evol Comput 7(1):19–44
Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello Coello CA, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical Report, Nanyang Technological University, Singapore. Available from http://www3.ntu.edu.sg/home/EPNSugan/
Liang JJ, Shang Z, Li Z (2010) Coevolutionary comprehensive learning particle swarm optimizer. Paper presented at the IEEE congress on evolutionary computation
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Nanyang Technological University, Singapore
Lin-Yu T, Chun C (2010) Multiple trajectory search for single objective constrained real-parameter optimization problems. Paper presented at the IEEE congress on evolutionary computation
Mallipeddi R, Suganthan PN (2010a) Differential evolution with ensemble of constraint handling techniques for solving CEC 2010 benchmark problems. Paper presented at the IEEE congress on evolutionary computation
Mallipeddi R, Suganthan PN (2010b) Ensemble of constraint handling techniques. IEEE Trans Evol Comput 14(4):561–579
Mallipeddi R, Suganthan PN, Pan QK, Tasgetiren MF (2011) Differential evolution algorithm with ensemble of parameters and mutation strategies. Appl Soft Comput 11(2):1679–1696. doi: http://dx.doi.org/10.1016/j.asoc.2010.04.024
Mezura-Montes E, Coello Coello CA (2003) Adding diversity mechanism to a simple evolution strategy to solve constrained optimization problems. Paper presented at the proceedings of the congress on evolutionary computation
Mezura-Montes E, Coello Coello CA (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17
Mezura-Montes E, Velez-Koeppel RE (2010) Elitist artificial bee colony for constrained real-parameter optimization. Paper presented at the IEEE congress on evolutionary computation
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Ong YS, Keane AJ (2004) Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol Comput 8(2):99–110
Ong YS, Lim M-H, Zhu N, Wong K-W (2006) Classification of adaptive memetic algorithms: a comparative study. IEEE Trans Syst Man Cybern 36(1):141–152
Powell D, Skolnick M (1993) Using genetic algorithms in engineering design optimization with non-linear constraints. Paper presented at the proceedings of the fifth international conference on genetic algorithms, San Mateo, California
Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417
Reynoso-Meza G, Blasco X, Sanchis J, Martinez M (2010) Multiobjective optimization algorithm for solving constrained single objective problems. Paper presented at the IEEE congress on evolutionary computation
Rokach L (2009) Taxonomy for characterizing ensemble methods in classification tasks: a review and annotated bibliography. Comput Stat Data Anal 53:4046–4072
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Runarsson TP, Yao X (2005) Search biases in constrained evolutionary optimization. IEEE Trans Syst Man Cybern 35(2):233–243
Saha A, Datta R, Deb K (2010) Hybrid gradient projection based genetic algorithms for constrained optimization. Paper presented at the IEEE congress on evolutionary computation
Singh HK, Ray T, Smith W (2010) Performance of infeasibility empowered memetic algorithm for CEC 2010 constrained optimization problems. Paper presented at the IEEE congress on evolutionary computation
Skolicki Z, De Jong K (2007) The importance of a two-level perspective for island model design. Paper presented at the IEEE congress on evolutionary computation
Takahama T, Sakai S (2006) Constrained optimization by the ε constrained differential evolution with gradient-based mutation and feasible elites. Paper presented at the IEEE congress on evolutionary computation, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada
Takahama T, Sakai S (2010) Constrained optimization by the ε-constrained differential evolution with an archive and gradient-based mutation. Paper presented at the IEEE congress on evolutionary computation
Tasgetiren MF, Suganthan PN, Quan-ke P, Mallipeddi R, Sarman S (2010) An ensemble of differential evolution algorithms for constrained function optimization. Paper presented at the IEEE congress on evolutionary computation
Tessema B, Yen GG (2006) A self adaptive penalty function based algorithm for constrained optimization. Paper presented at the IEEE congress on evolutionary computation
Tvrdik J, Polakova R (2010) Competitive differential evolution for constrained problems. Paper presented at the IEEE congress on evolutionary computation
Wang Y, Cai Z, Guo G, Zhou Y (2007) Multiobjective optimization and hybrid evolutionary algorithm to solve constrained optimization problems. IEEE Trans Syst Man Cybern 37(3):560–575
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary optimization. IEEE Trans Evol Comput 12(1):80–92
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82
Zhang GP (2000) Neural networks for classification: a survey. IEEE Trans Syst Man Cybern Part C Appl Rev 30(4):451–462
Zhihui L, Liang JJ, Xi H, Zhigang S (2010) Differential evolution with dynamic constraint-handling mechanism. Paper presented at the IEEE congress on evolutionary computation

Chapter 10

Evolutionary Constrained Optimization: A Hybrid Approach
Rituparna Datta and Kalyanmoy Deb

Abstract The holy grail of constrained optimization is the development of an


efficient, scale invariant, and generic constraint-handling procedure in single- and
multi-objective constrained optimization problems. Constrained optimization is a
computationally difficult task, particularly if the constraint functions are nonlinear
and nonconvex. As a generic classical approach, the penalty function approach is a
popular methodology that degrades the objective function value by adding a penalty
proportional to the constraint violation. However, the penalty function approach
has been criticized for its sensitivity to the associated penalty parameters. Since its
inception, evolutionary algorithms (EAs) have been modified in various ways to solve
constrained optimization problems. Of them, the recent use of a bi-objective evolutionary algorithm in which the minimization of the constraint violation is included
as an additional objective, has received significant attention. In this chapter, we propose a combination of a bi-objective evolutionary approach with the penalty function
methodology in a manner complementary to each other. The bi-objective approach
provides an appropriate estimate of the penalty parameter, while the solution of the
unconstrained penalized function by a classical method induces a convergence property to the overall hybrid algorithm. We demonstrate the working of the procedure
on a number of standard numerical test problems. In most cases, the proposed hybrid methodology is observed to require one or more orders of magnitude fewer function evaluations to find the constrained minimum solution accurately than some of the best-reported existing methodologies.
Keywords Constrained optimization · Penalty function · Inequality and equality constraints · Bi-objective evolutionary algorithms · Hybrid methodology
R. Datta (B)
Department of Electrical Engineering, Korea Advanced Institute of Science and Technology,
291 Daehak-ro, Yuseong-gu, Daejeon 305-701,
Republic of Korea
e-mail: rdatta@rit.kaist.ac.kr
K. Deb
Department of Electrical and Computer Engineering, Department of Computer Science
and Engineering and Department of Mechanical Engineering, Michigan State University,
428 S. Shaw Lane, 2120 EB, East Lansing, MI 48824, USA
e-mail: kdeb@egr.msu.edu
Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,
Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_10


10.1 Introduction
Most real-world optimization problems involve constraints mainly due to physical
limitations or functional requirements. A constraint can be of equality type or of
inequality type, but all constraints must be satisfied for a solution to be called feasible. Most often in practice, constraints are of inequality type and in some cases
an equality constraint can be suitably approximated as an inequality constraint. In
some situations, a transformation of an equality constraint to a suitable inequality
constraint does not change the optimal solution. Thus, in most constrained optimization studies, researchers are interested in devising efficient algorithms for handling
inequality constraints.
Traditionally, constrained problems with inequality constraints are solved by using
a penalty function approach in which a penalty term proportional to the extent of
constraint violation is added to the objective function to define a penalized function.
Since constraint violation is included, the penalized function can be treated as an
unconstrained objective, which then can be optimized using an unconstrained optimization technique. A nice aspect of this approach is that it does not care about
the structure of the constraint functions (linear or nonlinear, convex or nonconvex).
However, as it turns out, the proportionality term (which is more commonly known
as the penalty parameter) plays a significant role in the working of the penalty
function approach. In essence, the penalty parameter acts as a balancer between
objective value and the overall constraint violation value. If too small a penalty
parameter is chosen, constraints are not emphasized enough, thereby causing the
algorithm to lead to an infeasible solution. On the other hand, if too large a penalty
parameter is chosen, the objective function is not emphasized enough and the problem behaves like a constraint satisfaction problem, thereby leading the algorithm to
find an arbitrary feasible solution. Classical optimization researchers tried to make
a good balance between the two tasks (constraint satisfaction and convergence to
the optimum) by trial-and-error means in the beginning and finally resorting to a
sequential penalty function approach. In the sequential approach, a small penalty
parameter is chosen at first and the corresponding penalized function is optimized.
Since the solution is likely to be an infeasible solution, a larger penalty parameter is
chosen next and the corresponding penalized function is optimized starting from the
obtained solution of the previous iteration. This process is continued till no further
improvement in the solution is obtained. Although this method, in principle, seems
to eliminate the difficulties of choosing an appropriate penalty parameter by trialand-error or other ad hoc schemes, the sequential penalty function method is found
to not work well on problems having (i) a large number of constraints, (ii) a number of local or global optima, and (iii) different scaling of constraint functions. Moreover,
the performance of the algorithm depends on the choice of initial penalty parameter
value and how the penalty parameter values are increased from one iteration to the
next.


Early evolutionary algorithms (EAs) have implemented the penalty function


approach in different ways. In the initial studies (Deb 1991), a fixed penalty parameter was chosen and a fitness function was derived from the corresponding penalized
function. As expected, these methods required trial-and-error simulation runs to
arrive at a suitable penalty parameter value to find a reasonable solution to a problem. Later studies (Michalewicz and Janikow 1991; Homaifar et al. 1994; Dadios
and Ashraf 2006) have used dynamically changing penalty parameter values (with
generations) and also self-adaptive penalty parameters (Coello and Carlos 2000;
Tessema and Yen 2006) based on current objective and constraint values. Although
the chronology of improvements of the penalty function approach with an EA seemed to have improved EAs' performance over early trial-and-error approaches, radically different methodologies came into existence to suit EAs' population-based and flexible structure. These methodologies made remarkable improvements and somehow
the traditional penalty function approach has remained in oblivion in the recent
past.
Among these recent EA methodologies, the penalty-parameter-less approach (Deb
2000) and its extensions (Angantyr et al. 2003) eliminated the need for any penalty
parameter due to the availability of a population of solutions at any iteration. By
comparing constraint and objective values within population members, these methodologies have redesigned selection operators that carefully emphasized feasible over
infeasible solutions, and better solutions among the feasible as well as among the infeasible ones.
Another approach gaining a lot of popularity is a bi-objective EA approach (Deb
et al. 2007; Ray et al. 2009), in which in addition to the given objective function,
an estimate of the overall constraint violation is included as a second objective.
The development of multi-objective evolutionary algorithms (EMO) (Deb 2001;
Coello et al. 2007) allowed solution of such bi-objective optimization problems
effectively. Although at first instance this may seem to have made the constrainthandling task more complex, certainly the use of two apparently conflicting objectives
(of minimizing given objective and minimizing constraint violation) brings in more
flexibility in the search space, which has the potential to overcome multimodality
and under or over-specification of the penalty parameter.
From the above-mentioned ideas, we combine the original penalty function
approach with a specific bi-objective EAthe elitist nondominated sorting genetic
algorithm (NSGA-II) (Deb et al. 2002) to form a hybrid evolutionary-cum-classical
constrained handling procedure in a complementary manner to each other. The
difficulties of choosing a suitable penalty parameter are overcome by finding the
Pareto-optimal front of the bi-objective problem and deriving an appropriate penalty
parameter from it theoretically. On the other hand, the difficulties of an EA to converge to the true optimum is overcome by solving the derived penalized function
problem using a classical optimization algorithm.


10.2 Constrained Optimization Problems and the Penalty Function Approach

A constrained optimization problem is formulated as follows:

        Minimize  f(x),
        subject to  gj(x) ≥ 0,   j = 1, ..., J,
                    hk(x) = 0,   k = 1, ..., K,                           (10.1)
                    xi^l ≤ xi ≤ xi^u,   i = 1, ..., n.

In the above nonlinear programming (NLP) problem, there are n variables, J greater-than-or-equal-to type constraints, and K equality constraints. The function f(x) is the objective function, gj(x) is the jth inequality constraint, and hk(x) is the kth equality constraint. The ith variable varies in the range [xi^l, xi^u]. The conventional way to deal with an equality constraint is to convert it into an appropriate inequality constraint: gJ+k(x) = εk − |hk(x)| ≥ 0, with a small given value of εk.
The penalty function approach is a popular approach used with classical and early
evolutionary approaches. In this approach, an amount proportional to the constraint
violation of a solution is added to the objective function value to form the penalized
function value, as follows:
        P(x, R) = f(x) + Σ_{j=1}^{J} Rj ⟨gj(x)⟩ + Σ_{k=1}^{K} Rk |hk(x)|.      (10.2)

The bracket operator ⟨gj(x)⟩ is zero if gj(x) ≥ 0 and is |gj(x)| otherwise. The parameter Rj is the penalty parameter associated with the inequality constraints and Rk is the penalty parameter associated with the equality constraints. The penalty function approach has the following features:
1. The optimum value of the penalized function P(·) largely depends on the penalty parameters Rj and Rk. Users generally experiment with different values of Rj and Rk to find what values would push the search toward the feasible region. This requires extensive experimentation to find a reasonable approximation of the solution of the problem given in Eq. (10.1).
2. The addition of the penalty term introduces a distortion of the penalized function with respect to the given objective function. For small values of the penalty parameter, the distortion is small, but the optimal solution of P(·) may happen to lie in the infeasible region. By contrast, if large Rj and Rk are used, any infeasible solution has a large penalty, thereby causing any feasible solution to be projected as an exceedingly better solution than any infeasible solution. The difference between two feasible solutions then gets overshadowed by the difference between a feasible and an infeasible solution. This often leads the algorithm to converge to an arbitrary feasible solution. Moreover, the distortion may be so severe that in


the presence of two or more constraints, P(·) may have artificial locally optimal solutions.
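As an illustration, a minimal Python sketch of the penalized function in Eq. (10.2) is given below; the function and argument names are chosen for this sketch and are not part of the original text.

def bracket(value):
    """Bracket operator: |value| if the constraint value is negative, else 0."""
    return -value if value < 0 else 0.0

def penalized(f, gs, hs, R_ineq, R_eq, x):
    """P(x, R) of Eq. (10.2) for inequality constraints gs and equality constraints hs."""
    p = f(x)
    p += sum(Rj * bracket(gj(x)) for Rj, gj in zip(R_ineq, gs))
    p += sum(Rk * abs(hk(x)) for Rk, hk in zip(R_eq, hs))
    return p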
To overcome these difficulties, the classical penalty function approach works in a sequence of solving a number of penalized functions, where in every sequence the penalty parameters are increased in steps and the current sequence of optimization begins from the optimized solution found in the previous sequence. However, the sequential penalty function approach has shown its weakness in (i) handling multimodal objective functions having a number of local optima, (ii) handling a large number of constraints, particularly due to the increased chance of having artificial local optima where the procedure can get stuck, and (iii) using numerical gradient-based approaches, due to the inherent numerical error caused by taking one feasible and one infeasible solution in the numerical gradient computation.
Let us consider a single-variable constrained problem to illustrate some of these
difficulties:

        Minimize  f(x) = x^2 sin(π(x + 0.1)) + 10(1 − x),
        subject to  g1(x) ≡ 1 − x ≥ 0,                                    (10.3)
                    g2(x) ≡ x ≥ 0.
Figure 10.1 shows the objective function f(x) for x ∈ [0, 6.5], in which all solutions satisfying x > 1 are infeasible.
The constrained minimum is the point H with x = 1. Due to multimodalities
associated with the objective function, the first iteration of the sequential penalty
function method (with R = 0) may find the global minimum (A) of the associated
penalized function P(x, 0). In the next sequence, if R is increased to one and the
resulting P(x, 1) is minimized starting from A, a solution close to A will be achieved,

Fig. 10.1 Penalized function for different values of R for the problem given in Eq. (10.3)


as is evident from the (R = 1)-line of the figure. In subsequent sequences, although


R is increased continuously, the resulting minimum solution will not change much.
Due to an insignificant change in the resulting optimal solution, the algorithm may
eventually terminate after a few iterations and a solution close to A will be declared
as the final optimized solution. Clearly, such a solution is infeasible and is far from
the true constrained minimum solution (H). The difficulty with the single-objective
optimization task is that even if a solution close to H is encountered, it will not be
judged to be better than solution A in such a problem. We shall get back to this
problem later and illustrate how a bi-objective formulation of the same problem can
allow solutions such as F or G to be present in the population and help find the
constrained minimum in such problems.
There is another point we would like to make from this example. When the penalized function with R = 20 or more is solved with a global optimizer, there is some probability that the algorithm can get out of the local optimum (A) and converge to the global minimum (H) of the corresponding penalized function, thereby correctly solving the problem. This motivates the use of a global optimizer, such as an EA, rather than a classical gradient-based local search approach, with the penalty function approach.

10.3 Past Studies on Evolutionary Constrained Optimization


Due to the importance of solving the constrained problems in practice, evolutionary algorithm researchers have been regularly devising newer constraint-handling
techniques. A standard EA is modified with a number of different principles for
this purpose. Comprehensive surveys of evolutionary constraint-handling methods can be found in Michalewicz and Schoenauer (1996), Coello and Carlos (2002), and Mezura-Montes and Coello (2011).
Michalewicz and Janikow (1991) classified different constraint-handling schemes within EAs into six different classes. Among them, a majority of the EA
approaches used two methodologies: (i) penalizing infeasible solutions and (ii) carefully delineating feasible and infeasible solutions. We mention the studies related to
each of these two methods in the following subsections.

10.3.1 Penalty-Based EA Studies


The initial constrained EA studies used static, dynamic, and self-adaptive penalty
function methods, which handled constraints by adding a penalty term proportional to
the constraint violation to the objective function (Dadios and Ashraf 2006; Homaifar
et al. 1994; Michalewicz and Janikow 1991). Richardson et al. (1989) proposed a set
of guidelines for genetic algorithms using penalty function approach. Gen and Cheng
(1996) proposed a tutorial survey of studies till 1996 on penalty techniques used in

10 Evolutionary Constrained Optimization: A Hybrid Approach

255

genetic algorithms. Coit et al. (1996) proposed a general adaptive penalty technique
which uses a feedback obtained during the search along with a dynamic distance
metric. Another study proposed adaptation of penalty parameter using co-evolution
(Coello and Carlos 2000). A stochastic approach is proposed by Runarsson and Yao
(2000) to balance the objective and penalty functions. Nanakorn and Meesomklin
(2001) proposed an adaptive penalty function that gets adjusted by itself during
the evolution in such a way that the desired degree of penalty is always obtained.
Kuri-Morales and Gutiérrez-García (2002) proposed a statistical analysis based on
the penalty function method using genetic algorithms with five different penalty
function strategies. For each of these, they have considered three particular GAs. The
behavior of each strategy and the associated GAs is then established by extensively
sampling the function suite and finding the worst-case best values.
Zhou et al. (2003) did not suggest any new penalty term, but performed a time
complexity analysis of EAs for solving constrained optimization using the penalty
function approach. It is shown that when the penalty coefficient is chosen properly,
direct comparison between pairs of solutions using penalty fitness function is equivalent to that using the criteria superiority of feasible point or superiority of objective
function value. They also analyzed the role of penalty coefficients in EAs in terms of
time complexity. In some cases, EAs benefit greatly from higher penalty parameter
values, while in other examples, EAs benefit from lower penalty parameter values.
However, the analysis procedure still cannot make any theoretical prediction on the choice of a suitable penalty parameter for an arbitrary problem.
Wang and Ma (2006) proposed an EA-based constraint-handling scheme with a continuous penalty function in which only one control parameter is used in the penalty function. Lin and Chuang (2007) proposed an adjustment of the penalty parameter with
generations by using the rough set theory. Matthew et al. (2009) suggested an adaptive
GA that incorporates population-level statistics to dynamically update penalty functions, a process analogous to strategic oscillation used in the tabu search literature.
The method of delineating feasible from infeasible solutions was proposed by
Powell and Skolnick (1993). The method was modified in devising a penaltyparameter-less approach (Deb 2000). From the objective function and constraint
function values, a fitness function is derived so that (i) every feasible solution is
better than any infeasible solution, (ii) between two feasible solutions, the one with
better objective function value is better, and (iii) between two infeasible solutions,
the one with a smaller overall constraint violation is better. Angantyr et al. (2003) is
another effort in this direction:
1. If no feasible individual exists in the current population, the search should be
directed toward the feasible region.
2. If the majority of the individuals in the current populations are feasible, the search
should be directed toward the unconstrained optimum.
3. A feasible individual closer to the optimum is always better than the feasible
individual away from the optimum.
4. An infeasible individual might be a better individual than the feasible individual
if the number of feasible individuals is high.


A constraint-handling study hybridized genetic algorithm with artificial immune


system (AIS), where the role of AIS was to help in pushing the population towards
feasible region (Bernardino et al. 2007). A recent study combined genetic algorithm
with complex search algorithm (Sha and Xu 2011) to improve the convergence
and applied to constrained trajectory optimization. Optimal solution of genetic algorithm was used as an initial parameter for the complex search method. Another recent
methodology proposed a hybrid genetic algorithm with a flexible allowance technique (GAFAT) for solving constrained engineering design optimization problems by
fusing center-based differential crossover (CBDX), Levenberg–Marquardt mutation
(LMM), and nonuniform mutation (NUM) (Zhao et al. 2011).
A recent methodology described a framework based on both genetic algorithm
and differential evolution, which consists of collective search operators in every
generation and adaptively mixes them to solve constrained optimization problems
(Elsayed et al. 2011).

10.3.2 Multi-objective-Based EA Studies


More recent studies convert the original problem into a bi-objective optimization
problem in which a measure of an overall constraint violation is used as an additional
objective (Surry et al. 1995; Zhou et al. 2003). Another study suggested the use of
violation of each constraint as a different objective, thereby making the approach a
truly multi-objective one (Coello 2000).
Let us return to the example problem in Fig. 10.1. If we consider a set of solutions (A to H) and treat them for two objectives (minimization of f(x) and of the constraint violation CV(x) = ⟨1 − x⟩ + ⟨x⟩), we obtain the plot in Fig. 10.2. It is clear that due to the consideration of the constraint violation as an objective, solutions B–H are now nondominated with solution A. Since a bi-objective EA will maintain a
diverse population of such solutions, any solution (albeit having a worse objective
value f (x)) close to the critical constraint boundary will also be emphasized and
there is a greater chance of finding the true constrained optimum by the bi-objective
optimization procedure quickly. With a single objective of minimizing f (x) (as done
in the penalty function approach), such a flexibility will be lost.
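For the example of Eq. (10.3), the two objectives plotted in Fig. 10.2 can be computed as in the short sketch below; it is a hypothetical snippet that assumes the form of the objective as reconstructed in Eq. (10.3) and the bracket operator ⟨·⟩ used in Eq. (10.2).

import math

def f(x):
    # objective of Eq. (10.3), as reconstructed above
    return x**2 * math.sin(math.pi * (x + 0.1)) + 10.0 * (1.0 - x)

def cv(x):
    # overall constraint violation for g1(x) = 1 - x >= 0 and g2(x) = x >= 0
    bracket = lambda v: -v if v < 0 else 0.0
    return bracket(1.0 - x) + bracket(x)

# objective/violation pairs for a few candidate solutions
points = [(x, f(x), cv(x)) for x in (0.5, 1.0, 2.0, 4.0, 6.0)]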
Surry et al. (1995) proposed a multi-objective-based constraint-handling strategy (Constrained Optimization by Multiobjective Optimization Genetic algorithms
(COMOGA)), where the population was first ranked based on the constraint violation
followed by the objective function value. Camponogara and Talukdar (1997) proposed solving a bi-objective problem in which EA generates the Pareto-optimal front.
Based on domination, two solutions are chosen and a search direction is generated.
Zhou et al. (2003) proposed constraint handling based on a bi-objective formulation,
where solutions are ranked based on the SPEA (Zitzler and Thiele 1999)-style Pareto
strength. In the formulation, one objective is the given objective function itself and
degree of constraint violation forms the second objective. In each generation two

10 Evolutionary Constrained Optimization: A Hybrid Approach

257

Fig. 10.2 Two-objective plot of a set of solutions for the problem given in Eq. (10.3)

offspring are selected based on the highest Pareto strength and with lower degree of
constraint violation.
Venkatraman and Yen (2005) proposed a two-phase framework. In the first phase,
the objective function is neglected and the problem is treated as a constraint satisfaction problem to find at least one feasible solution. Population is ranked based on
the sum of constraint violations. As and when at least a single feasible solution is
found, both the objective function and the constraint violation are taken into account
where two objectives are original objective and summation of normalized constraint
violation values.
Cai and Wang (2005) proposed a novel EA for constrained optimization. In the
process of population evolution, the algorithm is based on multiobjective optimization, i.e., an individual in the parent population may be replaced if it is dominated
by a nondominated individual in the offspring population. In addition, three models
of a population-based algorithm generator and an infeasible solution archiving and
replacement mechanism are introduced. Furthermore, the simplex crossover is used
as a recombination operator to enrich the exploration and exploitation abilities of the
approach proposed.
Deb et al. (2007) proposed a hybrid reference-point-based evolutionary multiobjective optimization (EMO) algorithm, where the summation of normalized constraint violation is used as the second objective. Wang et al. (2008) proposed a
multi-objective way of constraint handling with three main issues: (i) the evaluation of infeasible solutions when the population contains only infeasible individuals;
(ii) balancing feasible and infeasible solutions when the population consists of a
combination of feasible and infeasible individuals; and (iii) the selection of feasible
solutions when the population is composed of feasible individuals only.

258

R. Datta and K. Deb

Echeverri et al. (2009) presented a bi-objective-based two-phase methodology


based on the philosophy of lexicographic goal programming for solving constraint
optimization problems. In the first phase, the objective function is completely disregarded and the entire search effort is directed toward finding a single feasible solution.
In the second phase, the problem is treated as a bi-objective optimization problem,
turning the constraint optimization into a two-objective optimization problem. Ray
et al. (2009) proposed an infeasibility-driven bi-objective method that maintains a
small percentage of infeasible solutions close to the constraint boundary.

10.3.3 Hybrid Approaches


Although many other ideas are suggested, researchers realized that the task of finding
the constrained optimum by an EA can be made more efficient and accurate, if it
is hybridized with a classical local search procedure. Some such studies are Myung
and Kim (1998), Fatourechi et al. (2005). A combination of a genetic algorithm and
a local search method can speed-up the search to locate the exact global optimum.
Applying a local search to the solutions that are guided by a genetic algorithm can
help in convergence to the global optimum.
Burke and Smith (2000) proposed a hybrid EA-local search for the thermal generator maintenance scheduling problem. A heuristic is used for solution initialization. Fatourechi et al. (2005) proposed a hybrid genetic algorithm for user customization of the energy normalization parameters in brain–computer interface systems. The GA is hybridized with local search. Victoire and Jeyakumar (2005) proposed a sequential quadratic programming (SQP) method for the dynamic economic dispatch problem of generating units considering the valve-point effects. The developed method is a two-phase optimizer. In the first phase, the candidates of EP explore the solution
space freely. In the second phase, the SQP is invoked when there is an improvement
of solution (a feasible solution) during the EP run. Thus, the SQP guides EP for better
performance in the complex solution space.
Wang et al. (2006) proposed an effective hybrid genetic algorithm (HGA) for a
flow shop scheduling problem with limited buffers. In the HGA, not only multiple
genetic operators are used simultaneously in a hybrid sense, but also a neighborhood
structure based on graph theoretical approach is employed to enhance the local search,
so that the exploration and exploitation abilities can be well balanced. Moreover, a
decision probability is used to control the utilization of genetic mutation operation
and local search based on problem-specific information so as to prevent the premature convergence and concentrate the computing effort on promising neighboring
solutions.
El-Mihoub et al. (2006) proposed different forms of integration between genetic
algorithms and other search and optimization techniques and also examined several
issues that needed to be taken into consideration when designing an HGA that used
another search method as a local search tool.


Deb et al. (2007) proposed a hybrid reference-point-based evolutionary multiobjective optimization (EMO) algorithm coupled with the classical SQP procedure
for solving constrained single-objective optimization problems. The reference pointbased EMO procedure allowed the procedure to focus its search near the constraint
boundaries, while the SQP methodology acted as a local search to improve the
solutions. Deep et al. (2008) proposed a constraint-handling method based on the
features of genetic algorithm and self-organizing migrating algorithm.
Araujo et al. (2009) proposed a novel methodology to be coupled with a genetic
algorithm to solve optimization problems with inequality constraints. This methodology can be seen as a local search operator that uses quadratic and linear approximations for both objective function and constraints. In the local search phase, these
approximations define an associated problem with a quadratic objective function and
quadratic and/or linear constraints that are solved using a linear matrix inequality
(LMI) formulation. The solution of this associated problems is then reintroduced in
the GA population.
Bernardino et al. (2009) proposed a hybridized genetic algorithm (GA) with an
artificial immune system (AIS) as an alternative to tackle constrained optimization
problems in engineering. The AIS is inspired by the clonal selection principle and is
embedded into a standard GA search engine in order to help move the population into
the feasible region. The resulting GA-AIS hybrid is tested in a suite of constrained
optimization problems with continuous variables, as well as structural and mixed
integer reliability engineering optimization problems. In order to improve the diversity of the population, a variant of the algorithm is developed with the inclusion of a
clearing procedure. The performance of the GA-AIS hybrids is compared with other
alternative techniques, such as the adaptive penalty method, and the stochastic ranking technique, which represent two different types of constraint handling techniques
that have been shown to provide good results in the literature.
Yuan and Qian (2010) proposed a new HGA combined with local search to solve
twice continuously differentiable nonlinear programming (NLP) problems. The local
search eliminates the necessity of a penalization of infeasible solutions or any special
crossover and mutation operators.
Recently Mezura-Montes (2009) edited a book on constraint handling in evolutionary optimization. The most recent study in constraint-handling survey using
nature-inspired optimization can be found in Mezura-Montes and Coello (2011). The
following methodologies are briefly described in their paper:

Feasibility rules
Stochastic ranking
-constraint method
Novel penalty functions
Novel special operators
Multiobjective concepts
Ensemble of constraint-handling techniques

The authors also outlined promising future directions for researchers in constraint-handling areas. These areas will be helpful for researchers, novices, and experts alike.


Fig. 10.3 Papers published in evolutionary constrained optimization per year (1961–2013, up to September 26, 2013) (taken from Coello (2013))

The areas are as follows:

• Constraint handling for multiobjective optimization
• Constraint approximation
• Dynamic constraints
• Hyper-heuristics
• Theory

The aforesaid literature clearly indicates that different techniques have been proposed using EAs for efficient constraint handling. However, it is difficult to cover the whole literature on constraint handling. Coello (2013) maintains a constraint-handling repository which holds a broad spectrum of constraint-handling techniques. Figure 10.3 shows a histogram of the number of papers published in evolutionary constrained optimization per year. From Fig. 10.3 it is clear that researchers keep coming up with new constraint-handling mechanisms using EAs, with the number of published papers growing steadily with time. For the year 2013, we have data until September 26.

10.4 Proposed Hybrid Methodology


It is clear from the above growing list of literature that EAs are increasingly being used for constrained optimization problems. This popularity is due to their flexibility in working with any form of constraint violation information and their ability to be integrated with any other algorithm. In this section, this flexibility of EAs is demonstrated by using a bi-objective EA and integrating it with a penalty-function-based


classical approach to speed up the convergence. The main motivation of the hybridization is to take advantage of one method to overcome the difficulties of the other and, in the process, to develop an algorithm that may outperform each method individually and, preferably, most reported high-performing algorithms.

10.4.1 Advantages of Using a Bi-objective Approach


Evolutionary multi-objective optimization (EMO) algorithms have amply demonstrated their ability to find multiple trade-off solutions for problems with two, three, and four conflicting objectives. The principle of EMO has also been applied to problems other than multi-objective optimization problems, a process now largely known as multiobjectivization (Knowles et al. 2008). Although we are interested in solving a single-objective constrained optimization problem, we have mentioned earlier that the concept of multi-objective optimization has been found useful and convenient in handling single-objective constrained optimization problems. A bi-objective optimization problem has been formulated to handle single-objective constrained problems in the past (Coello 2000; Deb et al. 2007; Surry et al. 1995). Let us consider the following single-objective, two-variable minimization problem:
the following single-objective, two-variable minimization problem:

minimize f(x) = 1 + √(x1² + x2²),
subject to g(x) ≡ 1 − (x1 − 1.5)² − (x2 − 1.5)² ≥ 0.          (10.4)

The feasible region is the area inside a circle of radius one centered at (1.5, 1.5)^T. Since the objective function is one more than the distance of any point from the origin, the constrained minimum lies on the circle, at x1 = x2 = 1.5 − 1/√2 = 0.793. The corresponding function value is f = 2.121. Thus, in this problem, the minimum point makes the constraint g() active. This problem was also considered elsewhere (Deb 2001).
Let us now convert this problem into the following two-objective problem:

minimize f1(x) = CV(x) = ⟨g(x)⟩,
minimize f2(x) = f(x) = 1 + √(x1² + x2²),          (10.5)

where CV(x) is the constraint violation. For multiple inequality and equality constraints, the constraint violation function is defined in terms of normalized constraint functions, as follows:

CV(x) = Σ_{j=1}^{J} ⟨ĝj(x)⟩ + Σ_{k=1}^{K} |ĥk(x)|.          (10.6)

Fig. 10.4 The constrained minimum, feasible solutions of the original single-objective optimization problem, and the Pareto-optimal set of the bi-objective problem given in Eq. (10.5)

For the above problem, the first objective (f1) is always nonnegative. If for any solution the first objective value is exactly equal to zero, that solution is a feasible solution to the original problem given in Eq. (10.4). Figure 10.4 shows the objective space of the above bi-objective optimization problem. Since all feasible solutions lie on the CV = 0 axis, the minimum of all feasible solutions corresponds to the minimum point of the original problem. This minimum solution is shown in the figure as solution A.
The corresponding Pareto-optimal front for the two-objective optimization problem (given in Eq. (10.5)) is marked. Interestingly, the constrained minimum solution A lies on one end of the Pareto-optimal front. Such bi-objective problems are usually solved using a lexicographic method (Miettinen 1999), in which, after finding the minimum-CV solution (corresponding to CV = 0 here), the second-level optimization task would minimize f(x) subject to CV(x) ≤ 0. But this problem is identical to the original problem (Eq. (10.4)). Thus, the lexicographic method of solving the bi-objective problem is not computationally or algorithmically advantageous in solving the original constrained optimization problem. However, an EMO with a modification in its search process can be used to solve the bi-objective problem. Since we are interested in the extreme solution A, there is no need for us to find the entire Pareto-optimal front. Fortunately, a number of preference-based EMO procedures exist which can find only a part of the entire Pareto-optimal front (Branke 2008; Branke and Deb 2004). In solving constrained minimization problems, we may then employ such a technique to find the Pareto-optimal region close to the extreme left of the Pareto-optimal front (as in Fig. 10.4).
In summary, we claim here that since an EMO procedure (even a preference-based EMO approach) emphasizes multiple trade-off solutions through its niching (crowding or clustering) mechanism, an EMO population will maintain a more diverse set of solutions than a single-objective EA would. This feature of EMO should help solve complex constrained problems better. Moreover, the use of bi-objective optimization


avoids the need of any additional penalty parameter which is required in a standard
penalty function-based EA approach.

10.4.2 Hybridizing with a Penalty Approach


EAs and EMO algorithms do not use gradients or any mathematical optimality principle to terminate their runs. Thus, the nearness of a final EMO solution to the true optimum is always questionable. For this purpose, EA and EMO methodologies have recently been hybridized with a classical optimization method used as a local search operator. Since the termination of a local search procedure is usually checked based on mathematical optimality conditions (such as the Karush-Kuhn-Tucker (KKT) error norm being close to zero, as used in standard optimization software (Byrd et al. 2006; Moler 2004)), and the solution of the local search method is introduced into the EA population, the final EA solution also carries the optimality property. Usually, such local search methods are sensitive to the initial point used to start the algorithm, and the use of an EA is then justified by the supply of a good initial solution to the local search method. Some such implementations can be found in Hedar and Fukushima (2003) for single-objective optimization problems and in Sharma et al. (2007), Kumar et al. (2007), and Sindhya et al. (2008) for multiobjective optimization problems.
In this study, we are interested in using a classical penalty function approach with our proposed bi-objective approach, mainly due to the simplicity and popularity of penalty function approaches for handling constraints. Instead of using a number of penalty parameters, one for each constraint as proposed in Eq. (10.2), a normalization of each constraint may help us use only one penalty parameter. Most resource- or limitation-based constraints usually appear with a left-side term (gj(x)) restricted to have a least value bj, such that gj(x) ≥ bj. In such constraints, we suggest the following normalization process:

ĝj(x) = gj(x)/bj − 1 ≥ 0.          (10.7)

A similar normalization can be applied to equality constraints as well. We then use the following unconstrained penalized term, requiring only one penalty parameter R:

P(x, R) = f(x) + R Σ_{j=1}^{J} ⟨ĝj(x)⟩.          (10.8)

Here, the purpose of the penalty parameter is to balance the overall constraint violation against the objective function value. If an appropriate R is not chosen, the optimum solution of the above penalized function P() will not be close to the true constrained minimum solution. This fact has an intimate connection with our bi-objective problem given in Eq. (10.5), which we discuss next.
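As a purely illustrative sketch (the function names are ours, not part of the chapter's implementation, which uses Matlab), the normalized violation of Eq. (10.6) and the single-parameter penalized function can be written as follows:

```python
def constraint_violation(x, ineq_cons, eq_cons=()):
    """Normalized constraint violation CV(x) of Eq. (10.6).

    ineq_cons: callables returning normalized values g_hat_j(x), feasible when >= 0
    eq_cons:   callables returning normalized values h_hat_k(x), feasible when == 0
    """
    cv = 0.0
    for g in ineq_cons:
        cv += max(0.0, -g(x))   # bracket operator <.>: only the violated amount counts
    for h in eq_cons:
        cv += abs(h(x))         # absolute violation of each equality constraint
    return cv


def penalized(x, f, ineq_cons, eq_cons, R):
    """Single-parameter penalized function P(x, R) = f(x) + R * CV(x) of Eq. (10.8)."""
    return f(x) + R * constraint_violation(x, ineq_cons, eq_cons)
```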


The overall constraint violation arising from all inequality constraints can be written as CV(x) = Σ_{j=1}^{J} ⟨ĝj(x)⟩. Thus, the penalized term given in Eq. (10.8) can be written as follows:

P(x, R) = f(x) + R · CV(x),          (10.9)
        = f2(x) + R f1(x),          (10.10)

where f1() and f2() are described in Eq. (10.5). It is well known that one way to solve a two-objective minimization problem (minimize {f1(x), f2(x)}) is to convert the problem into a weighted-sum minimization problem (Chankong and Haimes 1983):

minimize F_{w1,w2}(x) = w1 f1(x) + w2 f2(x).          (10.11)

In the above formulation, w1 and w2 are two nonnegative numbers (not both zero). It is proven that the solution to the above problem is always a Pareto-optimal point of the two-objective optimization problem (Miettinen 1999). Moreover, the optimal point of problem (10.11) is the particular point on the Pareto-optimal front which minimizes F_{w1,w2}. For a convex Pareto-optimal front, the optimal point for the weighted-sum approach is usually the point at which the linear contour line of the weighted-sum function is tangent to the Pareto-optimal front, as depicted in Fig. 10.5. The contour line has a slope of m = −w1/w2.
Against this background, let us now compare Eqs. (10.11) and (10.10). We observe that solving the penalized function P() given in Eq. (10.10) is equivalent to solving the bi-objective optimization problem given in Eq. (10.5) with w1 = R and w2 = 1. This implies that for a chosen value of the penalty parameter R, the corresponding optimal solution will be a Pareto-optimal solution of the bi-objective problem given in Eq. (10.5), but need not be the optimal solution of the original single-objective optimization problem (solution A). This is the reason why the penalty function

Fig. 10.5 The effect of weights in the weighted-sum approach for a generic bi-objective optimization problem


approach is so sensitive to R. As a result, different R values in the penalty function approach produce different optimized solutions.
This connection makes one aspect clear. Let us say that at CV = 0, the slope of the Pareto-optimal front of the bi-objective problem is −R0, that is, m = −R0, as illustrated in Fig. 10.4. Then, for R ≥ R0, the optimal solution of the corresponding penalized function (Eq. (10.10)) is nothing but the constrained optimum solution. This reveals that for any problem there exists a critical lower bound of R which will theoretically cause the penalty function approach to find the constrained minimum. This critical value R0 is simply the magnitude of the slope of the Pareto-optimal curve at the zero-constraint-violation solution. However, this critical R is not known a priori, and here we propose our hybrid bi-objective-cum-penalty-function approach to compute R0.
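Stated compactly (our own restatement of the argument above, not an additional result of the chapter), the tangency of the weighted-sum contour P = f2 + R f1 with the front at its feasible end gives

```latex
\[
\left.\frac{\mathrm{d}f}{\mathrm{d}\,\mathrm{CV}}\right|_{\mathrm{CV}=0} = -R_0
\qquad\Longrightarrow\qquad
R_0 = -\left.\frac{\mathrm{d}f}{\mathrm{d}\,\mathrm{CV}}\right|_{\mathrm{CV}=0},
% and any R >= R_0 recovers the constrained minimum (solution A).
\]
```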
The key issue is then to identify the critical R for a particular problem, as it involves knowing the optimal solution A beforehand. However, there is another fact that we can exploit here to avoid computing R0 exactly. If R is larger than R0, the corresponding minimum of the penalized function P() is also the constrained minimum of the original problem. Extending this idea, we could use an R which is arbitrarily large (say 10^6 or more) and be done with it for every problem. Theoretically, for such a large R, the idea of solving the penalized function should work, but a practical difficulty does not allow us to use such a large value of R. With an unnecessarily large R, the objective function f() has almost no effect on P(). The problem becomes more of a constraint satisfaction problem than a constrained optimization problem. In such a case, the search is directed toward the feasible region and not specifically toward the constrained minimum solution. If the feasible solution thus found is not close to the optimum, it then becomes difficult to converge to the constrained minimum solution. A large penalty parameter also causes a scaling problem, which is critical for classical gradient-based methods. When solutions come close to the constraint boundary, any numerical gradient computation will involve evaluation of some solutions from the feasible region and some from the infeasible region to utilize the finite difference idea. Since infeasible solutions are heavily penalized, there will be a large difference in the function values, thereby causing instability in the numerical derivative computations. This is the reason why the classical penalty function approach (Reklaitis et al. 1983) considers a successive use of the penalty function method with a carefully chosen sequence of R.
In the following subsection, we present our hybrid methodology which would
find an appropriate R through a bi-objective optimization adaptively.

10.4.3 Proposed Algorithm


Based on the bi-objective principles of handling a constrained optimization problem and the use of a penalty function approach mentioned above, we now propose the following hybrid constraint-handling algorithm. First, we set the generation counter t = 0.
Step 1: Apply an EMO algorithm to the bi-objective optimization problem to find the nondominated front:

minimize f(x),
minimize CV(x),
subject to CV(x) ≤ c,          (10.12)
x^(L) ≤ x ≤ x^(U).

The constraint is added to find the nondominated solutions close to the minimum-CV(x) solution. Since CV(x) is the normalized constraint violation (Eq. (10.6)), it is suggested that c = 0.2J be chosen for problems having no equality constraints and c = 0.4(J + K) in the presence of equality constraints. To have an adequate number of feasible solutions in the population to estimate the critical penalty parameter R0, we count the number of feasible solutions (checked with CV ≤ 10^-6). If there are more than three bi-objective feasible solutions (with CV ≤ c) in the population, we proceed to Step 2, else we increment the generation counter t and repeat Step 1.

Step 2: If t > 0 and ((t mod τ) = 0), compute Rnew from the current nondominated front as follows. First, a cubic-polynomial curve is fitted to the nondominated points (f = a + b(CV) + c(CV)² + d(CV)³) and then the penalty parameter is estimated by finding the slope at CV = 0, that is, R = −b. Since this is a lower bound on R, we use R = −rb, where r is a weighting parameter greater than or equal to one. So as not to have abrupt changes in the values of R between two consecutive local searches, we set Rnew = (1 − w)Rprev + wR, where w is a weighting factor. In the very first local search, we use Rnew = R.

Step 3: Thereafter, the following penalized function is optimized with Rnew computed from above and starting with the current minimum-CV solution:

minimize P(x) = f(x) + Rnew Σ_{j=1}^{J} ⟨ĝj(x)⟩,   if K = 0,
minimize P(x) = f(x) + Rnew [Σ_{j=1}^{J} ⟨ĝj(x)⟩² + Σ_{k=1}^{K} ĥk(x)²],   otherwise,          (10.13)
subject to x^(L) ≤ x ≤ x^(U).

Say, the solution is x̄.

Step 4: If x̄ is feasible and the difference between f(x̄) and the objective value of the previous local searched solution (or a given target objective value) is smaller than a small number δf (10^-4 is used here), the algorithm is terminated and x̄ is declared as the optimized solution. Else, we increment t by one, set Rprev = R, and proceed to Step 1.
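A minimal Python sketch of the penalty-parameter update in Step 2 is given below; the helper name estimate_penalty and its arguments are ours and only illustrate the idea, not the authors' implementation:

```python
import numpy as np

def estimate_penalty(front_cv, front_f, r=2.0, w=0.5, R_prev=None):
    """Step 2 sketch: estimate the penalty parameter from the current nondominated front."""
    # Fit f = a + b*CV + c*CV^2 + d*CV^3; np.polyfit returns coefficients [d, c, b, a].
    d, c, b, a = np.polyfit(front_cv, front_f, deg=3)
    R = -r * b                      # -b is the critical lower bound; r >= 1 is a safety factor
    if R_prev is None:              # very first local search: use the new estimate directly
        return R
    return (1.0 - w) * R_prev + w * R   # blend with the previous value to avoid abrupt changes
```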


It is interesting to note that the penalty parameter is no longer a user-tunable parameter and gets adjusted from the obtained nondominated front. However, we have introduced three new parameters τ, r, and w instead. Our extensive parametric study (described in Sect. 10.7) on a number of problems shows that two of these parameters (w and r) do not have much effect on the outcome of our proposed method, unlike the penalty parameter, whose value strongly affects the performance of a penalty-based algorithm. Moreover, the parameter τ ∈ [1, 5] works well on all problems studied here. By contrast, the choice of a penalty parameter in a penalty function approach is crucial, and we attempt to overcome this aspect by making an educated guess of this parameter through a bi-objective study.
In all our studies, we use Matlab's fmincon() procedure to solve the penalized function (the local search problem of Step 3) with standard parameter settings. Function evaluations needed by the fmincon() procedure are added to those needed by the bi-objective NSGA-II procedure to count the overall function evaluations. Other local search solvers (such as Knitro (Byrd et al. 2006)) may also be used instead.
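For readers who prefer an open-source alternative to fmincon(), the local search of Step 3 (inequality-only case) could be sketched with SciPy as follows; this is an assumption-laden illustration of the same idea, not the implementation used in this chapter:

```python
from scipy.optimize import minimize

def local_search(x0, f, ineq_cons, R_new, bounds):
    """Minimize the penalized function of Eq. (10.13) for K = 0, starting from the
    current minimum-CV solution x0 (any bounded local solver could be substituted)."""
    def P(x):
        cv = sum(max(0.0, -g(x)) for g in ineq_cons)   # bracket-operator violation
        return f(x) + R_new * cv
    res = minimize(P, x0, method="L-BFGS-B", bounds=bounds)
    return res.x, res.fun
```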

10.5 Proof-of-Principle Results


To illustrate the working of our proposed hybrid approach, we consider a two-variable problem first (Problem P1):

minimize f(x) = (x1 − 3)² + (x2 − 2)²,
subject to g1(x) ≡ 4.84 − (x1 − 0.05)² − (x2 − 2.5)² ≥ 0,
           g2(x) ≡ x1² + (x2 − 2.5)² − 4.84 ≥ 0,
           0 ≤ x1 ≤ 6,  0 ≤ x2 ≤ 6.          (10.14)

For this problem, only constraint g1 is active at the minimum point. To demonstrate
the working of our proposed hybrid strategy, we use different optimization techniques
to solve the same problem.

10.5.1 Generating the Bi-objective Pareto-Optimal Front


First, we find the Pareto-optimal front for the two objectives (minimization of f(x) and minimization of the constraint violation CV(x)) near the minimum-CV solution, by solving the following ε-constraint problem and thereby generating the Pareto-optimal front theoretically (Chankong and Haimes 1983):

minimize f(x),
subject to g1(x) ≥ −ε,
           0 ≤ x1 ≤ 6,  0 ≤ x2 ≤ 6.          (10.15)


Fig. 10.6 Pareto-optimal front from KKT theory and by proposed hybrid procedure

We use different values of ε and for each case find the optimum solution by solving the mathematical KKT conditions exactly. The resulting f(x) and CV(x) values are shown in Fig. 10.6 with diamonds.
The optimum solution of the problem given in Eq. (10.14) is obtained for ε = 0 and is x* = (2.219, 2.132)^T with a function value of 0.627. The corresponding Lagrange multiplier is u1 = 1.74. Later, we shall use this theoretical result to verify the working of our hybrid procedure.
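The theoretical front of Fig. 10.6 can also be traced numerically. The sketch below (our own Python/SciPy illustration, with the normalization of g1 omitted for brevity) solves the ε-constraint problem of Eq. (10.15) for a sweep of ε values:

```python
import numpy as np
from scipy.optimize import minimize

f  = lambda x: (x[0] - 3.0)**2 + (x[1] - 2.0)**2                 # objective of Eq. (10.14)
g1 = lambda x: 4.84 - (x[0] - 0.05)**2 - (x[1] - 2.5)**2         # active constraint of Eq. (10.14)

def eps_constraint_point(eps, x0=(2.0, 2.0)):
    """One front point: minimize f(x) subject to g1(x) >= -eps, as in Eq. (10.15)."""
    cons = [{"type": "ineq", "fun": lambda x: g1(x) + eps}]      # SLSQP expects fun(x) >= 0
    res = minimize(f, x0, method="SLSQP", bounds=[(0, 6), (0, 6)], constraints=cons)
    return max(0.0, -g1(res.x)), f(res.x)                        # (CV, f) of the obtained point

front = [eps_constraint_point(e) for e in np.linspace(0.0, 1.0, 11)]
```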
When we fit a cubic polynomial passing through the (f-CV) points obtained from the above theoretical analysis, we obtain the following fitted function of the Pareto-optimal front:

f = 0.628 − 1.739(CV) + 1.643(CV)² − 0.686(CV)³.          (10.16)

The slope of this front at CV = 0 is m = b = −1.7393 (Fig. 10.6). Thus, R0 = −m = 1.739 is the critical lower bound of R. This result indicates that if we use any penalty parameter greater than or equal to R0, we can hope to find the constrained optimum solution using the penalty function method.
To investigate, we consider a number of R values and find the optimal solution
of the resulting penalty function (with g1 () alone) using KKT optimality conditions.
The solutions are tabulated in Table 10.1.
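The effect of R reported in Table 10.1 can also be checked numerically. Below is a rough sketch (our own, not the authors' code), assuming the constraint is normalized by its constant 4.84, which is consistent with the CV values reported in the table:

```python
import numpy as np
from scipy.optimize import minimize

f  = lambda x: (x[0] - 3.0)**2 + (x[1] - 2.0)**2
g1 = lambda x: (4.84 - (x[0] - 0.05)**2 - (x[1] - 2.5)**2) / 4.84   # normalization assumed

def penalized_minimum(R, x0=(3.0, 2.0)):
    """Minimize P(x) = f(x) + R <g1(x)> for a fixed R (with g1 alone, as in Table 10.1)."""
    P = lambda x: f(x) + R * max(0.0, -g1(x))
    res = minimize(P, x0, method="Nelder-Mead")
    return res.x, f(res.x), max(0.0, -g1(res.x))

for R in (0.01, 0.1, 1.0, 1.75, 10.0):
    x, fx, cv = penalized_minimum(R)
    print(f"R={R:5.2f}  x=({x[0]:.4f}, {x[1]:.4f})  f={fx:.4f}  CV={cv:.4f}")
```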
The unconstrained minimum is (3, 2)^T with a function value equal to zero. When a small R is used, the optimal solution of the penalized function is close to this unconstrained solution, as shown in the table and in Fig. 10.7. As R is increased, the optimized solution gets closer to the constrained minimum solution and the function value reaches 0.6274 at around R = 1.74. The solution remains more or less at this value for a large range of R. For a very large value of R (R > 50), the optimized solutions move away from the constrained minimum and converge to an arbitrary feasible solution. With a large R, the problem becomes a constraint satisfaction


Table 10.1 Effect of penalty parameter values for the problem given in Eq. (10.14)

Penalty parameter                          x1        x2        f         CV
0.01                                       2.9939    2.0010    0.0085    0.8421
0.1                                        2.9406    2.0096    0.0812    0.7761
1                                          2.4949    2.0856    0.5330    0.2507
1.5                                        2.3021    2.1183    0.6181    0.0780
1.75                                       2.2189    2.1303    0.6274    0.0001
10                                         2.2187    2.1302    0.6274    0
15                                         2.2191    2.1326    0.6274    0
50                                         2.2215    2.1469    0.6277    0
Theoretical optimum (using Eq. (10.18))
1.74                                       2.219     2.132     0.627

Solutions are obtained using KKT optimality conditions

Fig. 10.7 The feasible search region is within the two circular arcs for the problem given in Eq. (10.14). Results for different penalty parameter values are shown

problem. Since constraint satisfaction becomes the main aim, the algorithm converges to an arbitrary feasible solution. This example clearly shows the importance of setting an appropriate value of R. Too small or too large a value may produce an infeasible or an arbitrary feasible solution, respectively.

10.5.2 Relation Between Critical Penalty Parameter


and Lagrange Multiplier
For a single active constraint (g1(x) ≥ 0) at the optimum x*, there is an important result we would like to discuss here. The KKT equilibrium conditions for the problem given in Eq. (10.1), without equality constraints, are as follows:


∇f(x*) − u1 ∇g1(x*) = 0,
g1(x*) ≥ 0,
u1 g1(x*) = 0,
u1 ≥ 0.

Here, any variable bound that is active at the optimum must also be considered as an inequality constraint. Next, we consider the penalized function given in Eq. (10.2). The solution x̄p of the penalized function (given in Eq. (10.8)) at an Rcr ≥ R0 can be obtained by setting the first derivative of P() to zero:

∇f(x̄p) + Rcr (d⟨g1(x̄p)⟩/dg1) ∇g1(x̄p) = 0.          (10.17)

The derivative of the bracket operator at g1 = 0 does not exist: at a point for which g1 = 0+, the derivative is zero, while at a point for which g1 = 0−, the derivative is −1. But considering that an algorithm usually approaches the optimum from the infeasible region, the optimum is usually found with an arbitrarily small tolerance on the constraint violation. In such a case, the derivative at a point x̄p for which g1 = 0− is −1. A comparison of both conditions then gives Rcr = u1. Since x̄p is arbitrarily close to the optimum, the second and third KKT conditions above are also satisfied at this point within the tolerance. Since u1 = Rcr and the penalty parameter R is chosen to be positive, u1 > 0. Thus, for a solution of the penalized function formed with a single active constraint, we have an interesting and important result:

Rcr = u1.          (10.18)
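Written side by side, the two stationarity conditions being compared are (a symbolic restatement of the argument above, with the optimum approached from the infeasible side so that the bracket-operator derivative is −1):

```latex
\[
\nabla f(\mathbf{x}^{*}) - u_1 \nabla g_1(\mathbf{x}^{*}) = 0
\qquad\text{versus}\qquad
\nabla f(\bar{\mathbf{x}}^{p}) - R_{cr}\, \nabla g_1(\bar{\mathbf{x}}^{p}) = 0 ,
% hence, as the two points coincide, the conditions agree only if R_cr = u_1.
\]
```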

For the example problem of this section, we notice that the u1 = 1.74 obtained from the KKT conditions is identical to the critical lower bound R0 on R.
Having generated the bi-objective Pareto-optimal front by a generating method verified with KKT optimality theory, and having matched the derived critical penalty parameter with the theoretical Lagrange multiplier obtained through the same theory, we are now certain about two aspects:
1. The obtained bi-objective front is optimal.
2. The critical penalty parameter obtained from the front is adequate to obtain the constrained minimum.

10.5.3 Applying the Proposed Hybrid Strategy


We now apply our proposed hybrid strategy to solve the same problem.
In Step 1, we apply NSGA-II (Deb et al. 2002) to solve the bi-objective optimization problem (minimize {f(x), CV(x)}). The following parameter values are used: population of size 60, crossover probability 0.9, mutation probability 0.5, crossover


Table 10.2 Function evaluations, FE (NSGA-II and local search), needed by the hybrid algorithm in 25 runs

       Best             Median            Worst
FE     677 (600 + 77)   733 (600 + 133)   999 (900 + 99)
f      0.627380         0.627379          0.627379

index 10, and mutation index 100 (Deb 2001). Here, we use τ = 5, r = 2, and w = 0.5. The hybrid algorithm is terminated when two consecutive local searches produce feasible solutions with a difference of 10^-4 or less in their objective values.
The obtained front is shown in Fig. 10.6 with small circles; it matches the theoretical front obtained by applying the KKT optimality conditions to several ε-constraint versions (in diamonds) of the bi-objective problem.
At best, our hybrid approach finds the optimum solution in only 677 function evaluations (600 needed by the EMO and 77 by the fmincon() procedure). The corresponding solution is x = (2.219, 2.132)^T with an objective value of 0.627380. Table 10.2 shows the best, median, and worst performance of the hybrid algorithm in 25 different runs.
Figure 10.8 shows the variation of the population-best objective value with the generation number for the median-performing run (with 999 function evaluations). The figure shows that the objective value reduces with the generation number. The algorithm could not find any feasible solution in the first two generations, but from generation 3 onward, the best population member is always feasible. At generation 5, the local search method is called for the first time. The penalty parameter obtained from the NSGA-II front is R = 1.896 at generation 5, and a solution very close to the true optimum is

Fig. 10.8 Objective value reduces with generation for the problem in Eq. (10.14)


obtained. However, at generation 10, a solution with f = 0.62738 is found by the local search. Since our algorithm terminates only when two consecutive local searches obtain solutions within a function difference of 10^-4 or smaller, the algorithm continues for another round of local search at generation 15 before termination. At this generation, the penalty parameter value is found to be R = 1.722, which is close to the critical R for this problem, as shown in Table 10.1.
Thus, it is observed that the bi-objective part of the proposed hybrid approach is able to find a near-theoretical Pareto-optimal front over the generations and to derive a penalty parameter close to the critical R needed for the penalty function method to find the constrained minimum point accurately.

10.5.4 Problem P2
We consider another two-variable problem as a test case:

minimize f(x) = x1² + x2² − 10x1 + 4x2 + 2,
subject to g1(x) ≡ −x1² − x2 + 6 ≥ 0,
           g2(x) ≡ x2 − x1 ≥ 0,
           0 ≤ x1 ≤ 10,  0 ≤ x2 ≤ 10.          (10.19)

For this problem, constraint g2 is active at the minimum point. To demonstrate the working of our proposed hybrid strategy, the Pareto-optimal front is first obtained for the two objectives (minimization of f(x) and minimization of the constraint violation CV(x)) near the minimum-CV solution. We form the following ε-constraint problem to generate the Pareto-optimal front theoretically:

minimize f(x),
subject to g2(x) ≥ −ε,
           0 ≤ x1 ≤ 10,  0 ≤ x2 ≤ 10.          (10.20)
We use different values of ε and for each case find the optimum solution by solving the mathematical KKT conditions exactly. The resulting f(x) and CV(x) values are shown in Fig. 10.9 with diamonds.
The optimum solution of the problem given in Eq. (10.19) is obtained for ε = 0 and is x* = (1.5, 1.5)^T with a function value of −2.5. The corresponding Lagrange multiplier of the active constraint is 7.
Now we investigate our proposed hybrid strategy by solving the bi-objective optimization problem (minimize {f(x), CV(x)}). The parameters are: population of size 40, with the other parameters the same as for Problem P1. The obtained front is shown in Fig. 10.9 with small circles, and it matches the theoretical front. When we fit a cubic polynomial passing through the obtained (f-CV) points, we obtain the following approximate form of the Pareto-optimal front:

f = −2.499 − 6.962(CV) + 0.305(CV)² + 0.334(CV)³.          (10.21)


The slope of this front at CV = 0 is m = b = −6.962 (Fig. 10.9). Thus, R0 = −m = 6.962 is the critical lower bound of R. This study indicates that if we use any penalty parameter larger than or equal to this lower bound, we can hope to find the optimum solution using the penalty function method. Our approach is able to find this penalty parameter, and our result matches the theoretical result.

10.5.5 Problem P3
Next, we consider a 20-dimensional problem having a nonconvex feasible region to demonstrate another proof-of-principle result:

minimize f(x) = Σ_{i=1}^{20} (xi − 1)²,
subject to g1(x) ≡ Σ_{i=1}^{20} xi² ≤ 1,
           g2(x) ≡ (x1 − 0.01)² + Σ_{i=2}^{20} xi² ≤ 2,
           g3(x) ≡ (x1 − 0.02)² + Σ_{i=2}^{20} xi² ≤ 4,
           g4(x) ≡ (x1 − 0.03)² + Σ_{i=2}^{20} xi² ≤ 6,
           g5(x) ≡ (x1 − 0.04)² + Σ_{i=2}^{20} xi² ≤ 8,
           g6(x) ≡ (x1 − 0.05)² + Σ_{i=2}^{20} xi² ≤ 10,
           g7(x) ≡ (x1 − 0.06)² + Σ_{i=2}^{20} xi² ≤ 12,
           g8(x) ≡ (x1 − 0.07)² + Σ_{i=2}^{20} xi² ≤ 14,
           g9(x) ≡ (x1 − 0.08)² + Σ_{i=2}^{20} xi² ≤ 16,
           g10(x) ≡ (x1 − 0.09)² + Σ_{i=2}^{20} xi² ≤ 18,
           0 ≤ xi ≤ 10,  i = 1, ..., 20.          (10.22)

For this nonlinear problem as well, only one constraint (g1) is active at the minimum point. Since only one constraint is active at the minimum, we can verify the accuracy of the obtained R by using Eq. (10.18).
We apply our proposed hybrid strategy to solve the corresponding bi-objective optimization problem (minimize {f(x), CV(x)}). The following parameter values are

Fig. 10.9 Pareto-optimal front from KKT theory and by proposed hybrid procedure for problem P2

Table 10.3 Function evaluations, FE (NSGA-II and local search), needed by the hybrid algorithm for problem P3 in 25 runs

         Best         Median       Worst
Bi-Obj   11,200       12,800       16,000
LS       488          805          932
FE       11,688       13,605       16,932
f        12.057798    12.055730    12.055730

used: population of size 320 (16 times the number of variables). The other parameters are the same as for the above two problems.
The termination criterion is identical to that used in the previous two problems. In 25 different runs, the hybrid approach, at its best, finds the optimum solution in 11,688 function evaluations (11,200 needed by NSGA-II and 488 by the fmincon() procedure). Table 10.3 presents the best, median, and worst function evaluations needed by the hybrid approach. For the median run, the procedure terminates after 35 generations, and the front obtained at the final generation is shown in Fig. 10.10 with small circles. The variation of the best objective value with the generation number, shown in Fig. 10.11, indicates that at generation 30 (only after six penalized function optimizations at a gap of τ = 5 generations), the constrained optimum point is found.
To investigate the accuracy of the obtained nondominated front and the obtained constrained minimum point, we use the ε-constraint strategy as before, and the resulting KKT-optimal points are shown as diamonds in Fig. 10.10. The NSGA-II front is close to these theoretical results. The optimum solution of the problem given in Eq. (10.22) is obtained next by setting ε = 0 and by using the KKT optimality conditions. The optimal solution is found to be xi = 0.2236 for i = 1, ..., 20 with a function value of f = 12.0557. The corresponding Lagrange multiplier for g1 is found to be u1 = 3.4721, and for all other constraints uj = 0.

Fig. 10.10 Pareto-optimal front from KKT theory and by proposed hybrid procedure for the 20-variable problem P3, given in Eq. (10.22)

Fig. 10.11 Objective value reduces with generation for the problem P3, given in Eq. (10.22)

A cubic-polynomial curve passing through the (f-CV) points of the obtained NSGA-II front is as follows:

f = 12.088 − 3.4664(CV) + 0.8845(CV)² − 0.657(CV)³.          (10.23)

The slope of this front at CV = 0 is computed to be m = b = −3.4664 (Fig. 10.10). Using our earlier argument, R0 = −m = 3.4664 becomes the critical lower bound of R. Interestingly, this critical R value is close to the optimal Lagrange multiplier (u1) obtained from the theoretical study above, which is the result we derived in Sect. 10.5.2. This study also indicates that if we use any penalty parameter larger


Table 10.4 Effect of penalty parameter values for the problem P3, given in Eq. (10.22)

Penalty parameter                         xi        f          CV
0.005                                     0.9901    0.1880     18.6069
0.01                                      0.9805    0.3722     18.2271
0.1                                       0.8333    3.1333     12.8889
0.5                                       0.5000    9.0000     4.0000
1                                         0.3333    11.3333    1.2222
2                                         0.2238    12.0557    0
3                                         0.2237    12.0557    0
3.5                                       0.2236    12.0557    0
4                                         0.2236    12.0557    0
5                                         0.2236    12.0557    0
10                                        0.2236    12.0557    0
100                                       0.2236    12.0557    0
1000                                      0.2236    12.0557    0
Theoretical optimum (using Eq. (10.18))
3.4721                                    0.2236    12.0557    0

Solutions are obtained using KKT optimality conditions

than or equal to R0, we can hope to find the optimum solution using the penalty function method. To investigate, we consider a number of R values and find the optimal solution of the resulting penalized function (with g1() alone) using KKT optimality conditions. The solutions are tabulated in Table 10.4.
The table shows that at around R = 3.5, the optimized solution of the penalized function is close to the true constrained minimum solution, thereby supporting all our computations above.
With these three proof-of-principle results verified by theoretical analysis, we are now ready to apply our proposed hybrid methodology to a number of standard test problems borrowed from the constrained optimization literature.

10.6 Simulation Results on Standard Test Problems


The proposed strategy is now applied to some of the constrained single-objective test problems whose details can be found in Liang et al. (2006). Table 10.6 presents the best, median, and worst function evaluations needed by our approach out of 25 independent runs. NSGA-II and local search function evaluations are shown separately. Many different evolutionary optimization methodologies have been used to solve these test problems in the past (Zavala et al. 2009; Leguizamón and Coello 2009; Mezura-Montes and Palomeque-Ortiz 2009; Takahama and Sakai 2009; Brest 2009; Ray et al. 2009; Wang and Cai 2012).


The following parameter values are used for our hybrid approach: population size (inequality constraints) = 16n (where n is the number of variables, unless stated otherwise), SBX probability = 0.9, SBX index = 10, polynomial mutation probability = 1/n, and mutation index = 100. The termination criterion is described in Sect. 10.4.3. A run is called unsuccessful if a feasible solution is not found within 2,00,000 function evaluations of NSGA-II. Here, we also compare our results with a few top existing studies.

10.6.1 Problem g01


The g01 problem has n = 13 variables and nine (J = 9) inequality constraints. Figure 10.12 presents the history of the best objective value of the population and the corresponding constraint violation value with the generation counter for a typical simulation out of the 25 runs. The objective value at a generation is joined by a solid line to the next generation's value if the solution is feasible at the current generation; otherwise a dashed line is used.
The figure shows that for the first seven generations, no feasible solution (with CV ≤ 10^-6) is found by our approach. At generation 8, the first feasible solution appears. The corresponding CV value is zero, indicating that the obtained solution is feasible. Note that the credit for finding the first feasible solution goes entirely to the bi-objective approach, as the penalty function approach does not take place before generation 10. Since τ = 5 is used, the penalty function approach can only be called at generations 5, 10, and so on, but since no bi-objective feasible solution with CV ≤ 0.2J existed at generation 5, Step 3 was not executed for this run. It is interesting to note that as soon as the first feasible solution is found at generation 8, the population-best function

Fig. 10.12 Objective value reduces with generation for problem g01


value reduces thereafter. At generation 10, the first local search (penalty function approach) is performed due to the existence of more than three bi-objective feasible solutions in the population, and the constrained optimum solution is obtained by the penalized function approach, which is reflected in the statistics of generation 11. The optimized objective value is f = −15. At generation 15, the penalty parameter value is found to be R = 3.36. Table 10.6 presents the best, median, and worst function evaluations obtained using the proposed approach. The least number of function evaluations needed in any run by our approach is only 2,630, of which only 630 evaluations are needed by the local search.
Table 10.7 compares the function evaluations needed by our approach with four leading studies from the literature. Instead of comparing the objective function values obtained by these algorithms, here we compare the number of function evaluations needed by each algorithm to achieve an identical accuracy in the final solution. Zavala et al. (2009) used a particle swarm optimizer with two new perturbation operators and a ring-structured neighborhood topology to solve constrained problems. Takahama and Sakai (2009) used a differential evolution (DE)-based approach with gradient-based optimization procedures to find feasible points and also emphasized feasible solutions to make the search efficient. Brest (2009) also used a DE approach but updated its parameters self-adaptively. A recent study (Wang and Cai 2012) proposed a bi-objective differential evolution to handle the constraints. Another recent approach (Elsayed et al. 2011) suggested self-adaptive multioperator differential evolution (SAMO-DE) and SAMO-GA approaches and solved the same problems. That study reported similar results but with 2,40,000 function evaluations.
The highlight of our study is that the proposed hybrid approach requires only 2,630 function evaluations, compared to 18,594 function evaluations for the best-performing existing algorithm (Takahama and Sakai 2009), an order of magnitude fewer. In fact, the worst performance (4,857) of the proposed algorithm is much better than the best performance of any other existing evolutionary algorithm for this problem. An efficient use of the bi-objective approach and the penalty function method helps find an appropriate penalty parameter for the overall algorithm to converge to the constrained optimum quickly and accurately.

10.6.2 Problem g02


The g02 problem has 20 variables and two constraints. In the particular run described here, all solutions are found to be feasible right from the initial generation. However, due to the unavailability of an adequate number of bi-objective feasible solutions up to 30 generations, the local search could not take place, and the first local search is executed at generation 35. The adaptation of the penalty parameter R is also shown in Fig. 10.13; the variation of R is shown by a dashed line. The figure is plotted until two consecutive local searches produce solutions with less than 10^-4 difference in objective values.


Fig. 10.13 Objective value reduces with generation for problem g02

The penalty parameter gets increased adaptively from around 0.04 to a value of 0.10 at the 50th generation and thereafter reduces to a value close to 0.04 again. The obvious question to ask is why a similar penalty parameter value does not find a near-optimal solution at generation 35, whereas it seems to find one at around generation 75. The answer lies in the fact that in handling multimodal or other complex objective functions, the solution of the penalized function approach depends not only on an appropriate penalty parameter value, but also on the chosen initial solution. At generation 35, some points are found to be feasible by NSGA-II, but they are not close to the constrained optimum. Due to the multimodal nature of the objective function in this problem, although a critical R (= 2 × 0.0415) was used in the penalized function approach, the solution of the penalized function by the local search procedure is sensitive to the initial point used in the local search process, and it cannot find a solution close to the true constrained minimum. Table 10.5 shows the best (feasible) solution obtained by NSGA-II at generation 35 and the corresponding solution found by the local search.
The local search makes a significant change in variables 8, 11, and 12 to reduce the objective value from −0.691 to −0.749 (shown in bold in the table), but the true constrained minimum is at f = −0.803619. However, at generation 75, a penalty parameter value of R = 2 × 0.0377 was used (close to that at generation 35), but since a much better initial solution (found by NSGA-II) was used for the local search operation, a much better objective value (f = −0.802) is obtained, as shown in the table. As is evident from Fig. 10.13, at an intermediate generation (50), the penalty parameter is adaptively increased to almost R = 2 × 0.1 to overemphasize constraint satisfaction in an effort to search for more useful feasible solutions.
Figure 10.13 also shows the variation in the population-best objective value. The proposed hybrid approach needs far fewer function evaluations (26,156) than the four best past approaches taken from the literature (the best-reported algorithm takes 87,419 function evaluations), as shown in Table 10.7, to achieve a


Table 10.5 Two local search statistics at generations 35 and 75 for problem g02

At generation 35, before local search:
x = (3.21, 3.16, 3.00, 3.24, 2.88, 3.11, 2.78, 0.28, 0.70, 0.56, 3.05, 0.64, 0.34, 0.38, 0.55, 0.51, 0.85, 0.53, 1.01, 0.11)
(g1(x), g2(x)) = (0.22, 119.11),   f(x) = −0.691

At generation 35, after local search:
x = (3.11, 3.04, 3.11, 2.98, 3.10, 2.82, 2.91, 2.81, 0.30, 0.54, 0.78, 2.80, 0.40, 0.31, 0.35, 0.44, 0.50, 0.26, 0.58, 0.38)
(g1(x), g2(x)) = (0.44, 118.49),   f(x) = −0.749

At generation 75, before local search:
x = (3.14, 3.18, 3.07, 3.05, 3.09, 3.07, 2.93, 2.95, 0.54, 0.45, 0.50, 0.47, 0.45, 0.47, 0.49, 0.48, 0.45, 0.41, 0.46, 0.42)
(g1(x), g2(x)) = (0.01, 119.95),   f(x) = −0.801

At generation 75, after local search:
x = (3.14, 3.13, 3.08, 3.08, 3.09, 3.05, 2.93, 2.95, 0.54, 0.45, 0.49, 0.47, 0.46, 0.47, 0.49, 0.48, 0.45, 0.42, 0.46, 0.41)
(g1(x), g2(x)) = (0.00, 119.98),   f(x) = −0.802

solution with similar accuracy. Table 10.6 shows the best-known solution along with the best, median, and worst solutions found by our approach. In only one out of 25 cases is our approach unable to find a feasible solution.

10.6.3 Problem g03


The problem has 10 variables and a single equality constraint (Liang et al. 2006). The constraint is not normalized. Figure 10.14 shows the evolution of the population-best objective function value and the corresponding

Fig. 10.14 Function value reduces with generation for g03 problem


constraint violation value with the generation for a typical simulation out of 25 initial populations. The figure shows that the solutions are infeasible up to the fourth generation. At generation 5, the first local search is performed, as at least four solutions satisfying the constraint in Eq. (10.12) are found. After the 10th generation, when the second local search is done, the approach reaches near the optimum. At the 15th generation, two consecutive local searched solutions are close to each other and the algorithm is terminated. The figure also shows the change of CV from a positive value (infeasible) to zero (feasible). Table 10.6 presents the function evaluations required by our approach to terminate according to the termination condition described in the algorithm.
However, to compare our approach with existing studies, we rerun our algorithm with a different termination condition. In Step 4, when the objective function value of the feasible x^LS is close to the best-reported function value (within 10^-4), the algorithm is terminated. Table 10.7 tabulates and compares the overall function evaluations needed by our approach with four other methodologies taken from the literature.

10.6.4 Problem g04


Problem g04 has five variables and six constraints. Constraints g1 and g6 are active at the constrained minimum point. Figure 10.15 shows the variation in the population-best objective value for a particular run. All solutions are feasible right from the initial population. The adaptation of R is also shown in the figure. At generation 5, the first local search is performed and a near-optimal solution is found. However, the algorithm continues for a few more local searches because our stipulated termination criterion is not yet satisfied. Only after the third local search, at generation 15, does the solution come close to the best-known constrained minimum solution. The value of R at the end of generation 20 is found to be 7734.8 for this simulation run.
All 25 runs find a feasible solution close (within 10^-4) to the best-known optimum. The best performance of our algorithm requires only 1,210 evaluations, whereas the best-reported existing EA methodology takes at least 10 times more function evaluations to achieve similarly accurate solutions. In terms of the median and worst performance, our approach is an order of magnitude faster.

10.6.5 Problem g05


The problem has five variables with five constraints, of which three constraints are
of equality type and are multimodal in nature (Liang et al. 2006). The constraints are
normalized as follows:


Table 10.6 Comparison of obtained solutions with the best-known optimal solutions and the number of solutions found out of 25 runs, with a termination criterion of two local searched solutions having a maximum difference of δf = 10^-4

Problem                 Best-known optimum   Best              Median            Worst
g01   FE                                     2,630             3,722             4,857
      NSGA-II + Local                        2,000 + 630       3,000 + 722       4,000 + 857
      f                 −15                  −15               −15               −15
g02   FE                                     26,156            50,048            63,536
      NSGA-II + Local                        24,000 + 2,156    46,400 + 3,648    59,200 + 4,336
      f                 −0.803619            −0.803580         −0.803559         −0.803563
g03   FE                                     3,813             4,435             11,920
      NSGA-II + Local                        3,000 + 813       3,000 + 1,435     9,000 + 2,920
      f                 −1.000500            −1.000350         −1.000349         −1.001999
g04   FE                                     1,210             1,449             2,295
      NSGA-II + Local                        800 + 410         800 + 649         1,200 + 1,095
      f                 −30665.538671        −30665.538712     −30665.538747     −30665.538670
g05   FE                                     9,943             11,994            14,539
      NSGA-II + Local                        8,400 + 1,543     10,000 + 1,994    12,400 + 2,139
      f                 5126.496714          5125.931709       5126.338978       5126.336735
g06   FE                                     1,514             4,149             11,735
      NSGA-II + Local                        1,200 + 314       2,800 + 1,349     8,000 + 3,735
      f                 −6961.813875         −6961.813796      −6961.813859      −6961.813873
g07   FE                                     15,645            30,409            64,732
      NSGA-II + Local                        10,000 + 5,645    19,000 + 11,409   42,000 + 22,732
      f                 24.306209            24.305902         24.305867         24.305881
g08   FE                                     822               1,226             2,008
      NSGA-II + Local                        800 + 22          1,200 + 26        2,000 + 8
      f                 −0.095825            −0.095825         −0.095825         −0.095825
g09   FE                                     2,732             4,580             5,864
      NSGA-II + Local                        2,000 + 732       4,000 + 580       5,000 + 864
      f                 680.630057           680.630127        680.630101        680.630109
g10   FE                                     7,905             49,102            1,80,446
      NSGA-II + Local                        3,200 + 4,705     18,400 + 30,702   62,800 + 1,17,646
      f                 7049.248020          7049.248102       7049.2481469      7049.248035
g11   FE                                     1,334             1,559             1,612
      NSGA-II + Local                        1,200 + 134       1,400 + 159       1,400 + 212
      f                 0.749900             0.749534          0.749776          0.749758

Best, median, and worst function evaluations for successful runs with NSGA-II and local search are shown separately. Since the algorithm is terminated when two consecutive local searches produce similar solutions, in some cases the smallest-FE solution need not be the best in terms of function value


Table 10.7 Comparison of function evaluations needed by the proposed approach and four existing approaches

Problem          Zavala      Takahama    Brest       Wang        Proposed approach
g01   Best       80,776      18,594      51,685      1,01,908    2,630
      Median     90,343      19,502      55,211      1,22,324    3,722
      Worst      96,669      19,917      57,151      1,36,228    4,857
g02   Best       87,419      1,08,303    1,75,090    1,70,372    26,156
      Median     93,359      1,14,347    2,26,789    1,89,204    50,048
      Worst      99,654      1,29,255    2,53,197    2,22,468    63,536
g03   Best       97,892      30,733      1,84,568    63,364      4,687
      Median     1,06,180    35,470      2,15,694    75,860      5,984
      Worst      1,22,540    41,716      2,54,105    86,772      33,336
g04   Best       93,147      12,771      56,730      63,540      1,210
      Median     1,03,308    13,719      62,506      73,572      1,449
      Worst      1,10,915    14,466      67,383      79,556      2,295
g05   Best       1,49,493    15,402      49,765      26,580      10,048
      Median     1,65,915    16,522      53,773      28,692      11,101
      Worst      1,88,040    17,238      57,863      31,508      25,671
g06   Best       95,944      5,037       31,410      26,932      1,514
      Median     1,09,795    5,733       34,586      35,908      4,149
      Worst      1,30,293    6,243       37,033      41,716      11,735
g07   Best       1,14,709    60,873      1,84,927    1,42,388    15,645
      Median     1,38,767    67,946      1,97,901    1,56,644    30,409
      Worst      2,08,751    75,569      2,21,866    1,66,148    64,732
g08   Best       2,270       621         1,905       2,820       822
      Median     4,282       881         4,044       5,988       1,226
      Worst      5,433       1,173       4,777       8,276       2,008
g09   Best       94,593      19,234      79,296      63,540      2,732
      Median     1,03,857    21,080      89,372      70,404      4,850
      Worst      1,19,718    21,987      98,062      83,780      5,864
g10   Best       1,09,243    87,848      2,03,851    1,71,252    7,905
      Median     1,35,735    92,807      2,20,676    1,83,924    49,102
      Worst      1,93,426    1,07,794    2,64,575    1,92,900    1,80,446
g11   Best       89,734      4,569       52,128      3,532       1,334
      Median     1,12,467    4,569       83,442      6,164       1,559
      Worst      1,27,650    4,569       1,05,093    8,100       1,612

Here, a run is terminated when a solution within 10^-4 of the best-known function value is obtained


Fig. 10.15 Objective value reduces with generation for problem g04

ĝ1(x) = [x4 − x3 + 0.55]/0.55 ≥ 0,
ĝ2(x) = [x3 − x4 + 0.55]/0.55 ≥ 0,
ĥ3(x) = [1000 sin(−x3 − 0.25) + 1000 sin(−x4 − 0.25) + 894.8 − x1]/1000 = 0,
ĥ4(x) = [1000 sin(x3 − 0.25) + 1000 sin(x3 − x4 − 0.25) + 894.8 − x2]/1000 = 0,
ĥ5(x) = [1000 sin(x4 − 0.25) + 1000 sin(x4 − x3 − 0.25) + 1294.8]/1000 = 0.

Figure 10.16 shows that up to 90 generations, no feasible solution to the original problem is found. This is due to the existence of equality constraints in the problem. We have already discussed that equality constraints shrink the feasible region, making it very difficult to find feasible solutions. The figure also shows the variation in the

Fig. 10.16 Function value reduces with generation for g05 problem


population-best objective value for a particular run. At generation 5, at least four solutions satisfying Eq. (10.12) are found and the local search is executed. It helps reduce the objective value; however, another 17 local searches are needed to get close to the constrained minimum solution. Since the variation in function values between two consecutive local searches (generations 120 and 125) is within 10^-4, the algorithm terminates then. The objective function value at a generation is joined by a solid line to the next generation's value if the solution is feasible at the current generation; otherwise a dashed line is used. Table 10.6 shows the best, median, and worst function evaluations with the corresponding objective function values obtained using the proposed approach.
The problem is also solved using a termination criterion based on closeness to the best-reported solution: when the objective function value of the feasible local searched solution is within 10^-3 of the best-reported value, the algorithm terminates. Table 10.7 indicates that in terms of best and median function evaluations, our approach is better than all others. In terms of the worst performance, Takahama and Sakai (2009) found better solutions than the other algorithms.

10.6.6 Problem g06


Problem g06 has two variables and two constraints, but the objective function and
the feasible search space are nonconvex. The feasible space is also remarkably small
compared to the variable space for optimization. Due to these complexities, we use
a population of size 80 here.
Figure 10.17 shows the generation-wise proceedings of a typical run. For the first
nine generations, no feasible solution is found by our hybrid approach. The first

Fig. 10.17 Objective value reduces with generation for problem g06


local search takes place at generation 10, and thereafter the population-best objective value reduces monotonically. However, our strict termination criterion requires the algorithm to carry out a few more local searches before the overall algorithm terminates. The optimized function value is found to be f = −6961.813796 (slightly better than the best-known solution). At the end of generation 60, the penalty parameter value is observed to be around R = 10,903. Table 10.7 presents the best, median, and worst function evaluations obtained using the proposed approach. In terms of the best function evaluations, our approach takes less than a third of the function evaluations needed by the best-reported EA. The median performance of our hybrid approach is also better, but in terms of the worst performance, Takahama and Sakai's (2009) result is better. In Sect. 10.7, we shall revisit this problem with a parametric study. All 25 runs are found to be successful with our approach.

10.6.7 Problem g07


This g07 problem has ten variables and eight constraints. Figure 10.18 shows the performance of the hybrid procedure for a typical simulation run. Up until generation 10, no feasible solution was found; the algorithm focused on reducing the constraint violation till this generation. The first local search is applied at generation 10 (when at least four bi-objective feasible solutions were found). Thereafter, the function value continuously reduces to a value close to the optimal function value.
Table 10.7 shows that our proposed approach requires about a quarter of the minimum function evaluations of the best-reported result (15,645 compared to 60,873), and on a median scale our approach requires about half the function evaluations. Our method is also faster in terms of the worst performance.

Fig. 10.18 Objective value reduces with generation for problem g07


Fig. 10.19 Objective value reduces with generation for problem g08

10.6.8 Problem g08


This g08 problem has two variables and two constraints. The objective function is multimodal. We have used N = 48 for this problem. Table 10.6 shows the function evaluations and the obtained objective values. Figure 10.19 shows that a feasible solution is found after the first generation itself, and a solution close to the best-known optimum was found after only 10 generations, but due to the multimodal nature of the objective function, the algorithm took a few more generations to come close (within an objective value difference of 10^-4) to the best-known optimum of this problem. The penalty parameter R takes a small value (0.4618) at the end of 25 generations, as no constraint is active at the constrained minimum solution.
In this problem, Takahama and Sakai's (2009) algorithm performs slightly better than ours. We discuss a possible reason in the following paragraph.
Due to the periodic nature of the objective function, this problem has multiple optima close to the constrained minimum. As discussed earlier, the penalty function approach may face difficulties in handling such problems, particularly if there exists a much better local minimum of the objective function in the vicinity of the constrained minimum but in the infeasible region. Figure 10.20 shows the objective function landscape around the feasible region of this problem.
It is evident from the figure that the constrained minimum is surrounded by an array of local maximum and minimum points. Most of them are infeasible, and the scenario is similar to the multimodal problem shown in Fig. 10.1. If an appropriate penalty parameter is not chosen, a penalized function may have its global minimum at one of the local and infeasible minimum points, such as point A or B shown in Fig. 10.20. This is the reason for our hybrid approach taking relatively more function evaluations than Takahama and Sakai's differential evolution (DE) based nonpenalty-function approach.

Fig. 10.20 Search space near the constrained minimum reveals multiple optima for problem g08

It is worth mentioning here that the local minimum points occur at regular intervals,
as shown in the figure. An algorithm such as DE can exploit such periods of good
regions through its difference operator and may allow points to jump from one local
minimum to another. Takahama and Sakai's approach uses DE to create new solutions
and may have exploited the periodicity of the multiple local minimum points around
the constrained minimum, helping it find a point near the constrained minimum
quickly. However, as we shall show later, a parametric study on this problem enables
our proposed approach to achieve a more reliable and better performance over
multiple runs.
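The effect described here can be reproduced on a small synthetic example. The sketch below uses a made-up one-dimensional problem (not one of the test problems of this chapter) with a multimodal objective and a single feasible interval; it shows that a small penalty parameter R places the minimum of the penalized function at an infeasible point, whereas a sufficiently large R recovers a feasible solution near the constrained minimum, which is the situation depicted by points A and B in Fig. 10.20.

    import math

    def f(x):                          # multimodal objective (illustrative only)
        return math.sin(3.0 * x)

    def cv(x):                         # violation of g(x) = (x - 2)(3 - x) >= 0
        g = (x - 2.0) * (3.0 - x)
        return max(0.0, -g)

    def penalized_minimum(R, lo=0.0, hi=6.28, n=20001):
        """Grid-search minimizer of the penalized function f(x) + R * CV(x)."""
        xs = (lo + i * (hi - lo) / (n - 1) for i in range(n))
        return min(xs, key=lambda x: f(x) + R * cv(x))

    for R in (0.5, 5.0):
        x = penalized_minimum(R)
        print("R = %.1f: x = %.3f, f = %.3f, CV = %.3f" % (R, x, f(x), cv(x)))
    # A small R (0.5) yields an infeasible minimizer (CV > 0), while a larger
    # R (5.0) yields a feasible minimizer inside the interval [2, 3].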

10.6.9 Problem g09


This g09 problem has seven variables and four constraints. Constraints g1 and g4
are active at the known minimum point. Figure 10.21 shows the variation of the
population-best function value with generations.

Fig. 10.21 Objective value reduces with generation for problem g09

During the first two generations,
the population could not find any feasible point. Since an accuracy of 10^-4 is set for
termination with respect to the best-known objective value, the algorithm takes many
generations to fulfill this criterion; otherwise, a solution very close to the optimum
was already found at generation 25, as evident from the figure.
Table 10.7 shows the efficacy of the hybrid approach. Again, the best performance
of our approach (with 2,732 function evaluations) is at least seven times better
than that of the best-known approach (19,234), and our approach is also better in
terms of median and worst-case performance compared to existing algorithms.

10.6.10 Problem g10


This g10 problem has eight variables and six constraints. This problem is known to be
difficult for an algorithm to converge to the optimal solution accurately. Tables 10.6
and 10.7 show the function evaluations and obtained objective values. A much better
solution (best f = 7049.248 with 7,905 function evaluations) is obtained using
our proposed approach than with the existing parameter-less penalty-based GA approach
(best f = 7060.221 with 320,000 function evaluations) (Deb 2000). Figure 10.22
shows that the proposed procedure finds a feasible solution at generation 10, after
the local search is performed. Thereafter, the function value continuously reduces
to a value close to the best-known optimal function value. Table 10.7 shows that the
best performance of our approach is at least 10 times faster than the best-reported
EA study, and our approach requires about half the function evaluations with respect
to the median performance. However, in terms of the worst performance, Takahama
and Sakai's approach is better than ours. We shall investigate later whether another
parameter setting in our algorithm performs better on this problem.

Fig. 10.22 Objective value reduces with generation for problem g10

Fig. 10.23 Function value reduces with generation for g11 problem

10.6.11 Problem g11


This problem has a single equality constraint (Liang et al. 2006); hence, no normalization of the constraint is needed. Figure 10.23 shows the variation in the
population-best objective value for a typical run out of 25 runs. In this problem, all
solutions are feasible right from the initial population. Here, we show the adaptation
of the penalty parameter R with generations. The penalty parameter value increases with
generation, as it is adapted every time the local search fails to find a feasible solution. The local search starts at generation 5 and the algorithm takes seven local
searches to converge. After the third local search operation, at generation 15, the
solution comes close to the best-reported constrained minimum solution. When the
difference between two consecutive local-searched solutions is of the order of 10^-4,
the algorithm terminates. In this problem, the corresponding value of R is 589.95. The
best performance of our algorithm needs only 1,334 solution evaluations, whereas
the best-reported EA requires more than three times as many function evaluations to
achieve a similar solution. Tables 10.6 and 10.7 indicate the same for this problem.

10.6.12 Problem g12


Problem g12 has three variables and nine constraints. We have used a population of
size 16 × 3 = 48 for this problem. Table 10.9 shows the function evaluations and
obtained objective values. Feasible solutions are found in the first generation itself, and
Fig. 10.24 shows how the population-best objective value reduces with the generation
counter. In terms of the best performance, Brest's (Brest 2009) approach performs
better than ours, but in terms of the median and worst performance, our
approach is better than all four methods. Interestingly, in all four past approaches,
the required number of function evaluations varies significantly over 25 runs. We
shall revisit this problem during the parametric study to investigate whether the
performance can be enhanced with a better parameter setting.

Fig. 10.24 Objective value reduces with generation for problem g12

10.6.13 Problem g13


This problem has five variables and three equality constraints (Liang et al. 2006).
The objective function has an exponential term. Since all constraints are of equality
type, they are not normalized.
Figure 10.25 shows the variation in the objective function value of the population-best solution and the adaptation of the penalty parameter with the increasing number of generations.

Fig. 10.25 Function value reduces with generation for g13 problem

Table 10.8 Adaptation of penalty parameter values for problem g13

Gen    R
0      0.0000
5      2.6255
10     6.8048
15     17.6183
20     45.2747

The first local search starts at generation 5 and the algorithm takes three
more local searches to fulfill our termination criterion. Table 10.8 and Fig. 10.25 show
that, starting with a lower value of R, the hybrid approach increases it with the generation
number to a suitable value needed to find the constrained minimum. Tables 10.9 and
10.10 present the results.

10.6.14 Problem g14


This problem has a nonlinear objective function with ten variables and three equality
constraints (Liang et al. 2006). Constraints are not normalized. Table 10.9 shows
the function evaluations and the obtained best, median, and worst objective function
values using the proposed approach. For a particular simulation (out of 25 different
initial populations), no feasible solution with respect to the original problem is found
up to generation 39. However, the first local search starts after 20 generations, due
to the first-time availability of at least four solutions satisfying the constraint in
Eq. (10.12). Thereafter, the proposed procedure takes a few more local searches to
converge close to the best-reported solution. The value of the penalty parameter R at
the final generation for this problem is found to be 297,811.12. Figure 10.26 shows
the evolution of the objective function value of the population from a lower value
(with a constraint violation) to a higher value, and thereafter its convergence close to the
best-known optimum.
Table 10.9 compares our best, median, and worst solutions with the corresponding
function evaluations. Table 10.10 shows the function evaluations when a run is terminated
as soon as a solution close to the best-known solution is found. The proposed approach is much
faster than the existing methods.

10.6.15 Problem g15


This problem has a quadratic objective function with only three variables and two
nonlinear equality constraints (Liang et al. 2006). Constraints are not normalized.
Figure 10.27 shows the variation in the population-best objective value with the number
of generations. During the first 24 generations, the population cannot find any feasible
solution with respect to the original problem.


Table 10.9 Comparison of obtained solutions with the best-known optimal solutions and the number of function evaluations over 25 runs, with a termination criterion of two local-searched solutions having a maximum objective difference of 10^-4. For each problem, the total function evaluations (FE) are split into NSGA-II + local-search evaluations, and f is the obtained feasible objective value.

g12 (best-known optimum -1.0)
  Best:   FE = 496 (480 + 16),        f = -1.0
  Median: FE = 504 (480 + 24),        f = -1.0
  Worst:  FE = 504 (480 + 24),        f = -1.0
g13 (best-known optimum 0.053941)
  Best:   FE = 1,499 (1,000 + 499),   f = 0.0539169458
  Median: FE = 3,778 (3,000 + 778),   f = 0.0539162638
  Worst:  FE = 2,577 (2,000 + 577),   f = 0.0539899948
g14 (best-known optimum -47.764888)
  Best:   FE = 10,498 (9,000 + 1,498),   f = -47.762282
  Median: FE = 13,692 (12,000 + 1,692),  f = -47.761438
  Worst:  FE = 12,720 (11,000 + 1,720),  f = -47.761435
g15 (best-known optimum 961.715022)
  Best:   FE = 1,431 (1,200 + 231),    f = 961.715195
  Median: FE = 3,700 (2,100 + 1,600),  f = 961.735327
  Worst:  FE = 2,254 (1,800 + 454),    f = 961.715403
g16 (best-known optimum -1.905155)
  Best:   FE = 10,293 (7,200 + 3,093),   f = -1.905073
  Median: FE = 30,213 (24,000 + 6,213),  f = -1.905014
  Worst:  FE = 18,319 (12,800 + 5,519),  f = -1.905037
g17 (best-known optimum 8853.539674)
  Best:   FE = 2,109 (1,800 + 309),    f = 8927.602048
  Median: FE = 13,406 (7,200 + 6,206), f = 8853.748783
  Worst:  FE = 4,344 (3,000 + 1,344),  f = 8853.537314
g18 (best-known optimum -0.866025)
  Best:   FE = 4,493 (3,600 + 893),    f = -0.866012
  Median: FE = 10,219 (7,200 + 3,019), f = -0.866024
  Worst:  FE = 7,267 (5,760 + 1,507),  f = -0.866019
g19 (best-known optimum 32.655592)
  Best:   FE = 40,467 (38,000 + 2,467),     f = 32.655610
  Median: FE = 172,601 (146,000 + 26,601),  f = 32.655649
  Worst:  FE = 96,139 (84,000 + 12,139),    f = 32.655615
g21 (best-known optimum 193.724510)
  Best:   FE = 4,044 (3,500 + 544),    f = 193.775400
  Median: FE = 9,456 (8,400 + 1,056),  f = 193.781075
  Worst:  FE = 5,289 (4,200 + 1,089),  f = 193.778862
g23 (best-known optimum -400.0)
  Best:   FE = 1,032 (800 + 232),       f = -399.972900
  Median: FE = 16,848 (12,400 + 4,448), f = -400.000216
  Worst:  FE = 4,967 (3,600 + 1,367),   f = -399.998757
g24 (best-known optimum -5.508013)
  Best:   FE = 1,092 (800 + 292),    f = -5.508013
  Median: FE = 2,890 (2,000 + 890),  f = -5.508025
  Worst:  FE = 1,716 (1,200 + 516),  f = -5.508014

Best, median, and worst function evaluations for successful runs, with NSGA-II and local-search evaluations shown separately. Since the algorithm is terminated when two consecutive local searches produce similar solutions, in some cases the smallest-FE solution need not be the best in terms of function value.


Table 10.10 Comparison of function evaluations needed by the proposed approach and four existing approaches

Problem          Zavala      Takahama    Brest       Wang        Proposed approach
g12   Best          482        2,901         364       1,764         496
      Median      6,158        4,269       6,899       5,460         504
      Worst       9,928        5,620      10,424       8,100         504
g13   Best      149,727        2,707     138,630      19,484       1,499
      Median    160,964        4,918     147,330      30,980       2,577
      Worst     168,800       11,759     428,869      42,316       3,778
g14   Best      138,471       30,925     223,822      97,684       7,042
      Median    149,104       32,172     242,265     106,660       9,265
      Worst     165,292       32,938     256,523     118,452      11,449
g15   Best      127,670        4,053     153,943      10,732       1,082
      Median    135,323        6,805     157,822      12,868       2,117
      Worst     147,268       10,880     160,014      14,788      22,772
g16   Best       65,872        8,965      48,883      27,460      10,293
      Median     75,451       10,159      54,081      29,396      18,319
      Worst      83,087       11,200      57,678      32,388      30,213
g17   Best      221,036       15,913     185,888      75,460       2,728
      Median    232,612       16,511     205,132     134,644       4,638
      Worst     236,434       16,934     255,333     294,452     233,239
g18   Best       97,157       46,856     139,131      93,812       4,493
      Median    107,690       57,910     169,638     104,196       7,267
      Worst     124,217       60,108     191,345     116,340      10,219
g19   Best      109,150      147,772     322,120     241,476      40,467
      Median    122,279      162,947     363,456     251,684      96,139
      Worst     167,921      178,724     427,042     269,284     172,601
g21   Best      206,559       31,620     131,557      85,012       2,342
      Median    221,373       35,293     149,672      95,332       3,392
      Worst     233,325       35,797     158,079     224,756       7,062
g23   Best      260,154       70,349     260,180     208,036       3,517
      Median    274,395       79,059     321,118     240,772       4,008
      Worst     291,456       88,523     464,740     326,484      13,346
g24   Best       11,081        1,959       9,359      13,908       1,092
      Median     18,278        2,451      12,844      23,060       1,716
      Worst     633,378        2,739      14,827      31,684       2,890

Here, a run is terminated when a solution within 10^-4 of the best-known function value is obtained.

However, at least four feasible solutions with respect to problem (10.12) are found at generation 20, and the first local search
takes place. The local search helps to reduce the objective value. Thereafter, at the 30th
generation, since a better solution (with a difference in objective value smaller than
10^-4) is not found, the algorithm terminates.

Fig. 10.26 Function value reduces with generation for g14 problem

Fig. 10.27 Function value reduces with generation for g15 problem

Table 10.10 shows the efficacy of the hybrid approach when problem information (the best-known objective value) is used as the termination criterion. Again, the best performance of our approach (with
1,082 function evaluations) is more than 3.7 times better than that of the best-known approach (4,053), and our approach is also better in terms of median and
worst-case performance compared to the existing algorithms.

10.6.16 Problem g16


Problem g16 has 5 variables and 38 constraints. The best-known optimal objective value is
f* = -1.905155. Figure 10.28 shows the variation of the population-best objective function value and the adaptive penalty parameter R with the generation counter. For
this problem, the initial population contained a feasible solution; despite some
initial fluctuations, the proposed methodology is able to steadily reduce the objective
function value to the constrained minimum function value. Table 10.10 shows that
Takahama and Sakai's method (Takahama and Sakai 2009) requires a smaller number
of function evaluations, but our proposed methodology is the second-best
method. We shall show in the next section that, with a parametric study, our methodology performs the best.

Fig. 10.28 Objective value reduces with generation for problem g16

10.6.17 Problem g17


This problem has six variables and four equality constraints (Liang et al. 2006).
Constraints are not normalized. All the constraints are multimodal in nature, thereby
making this problem difficult to solve. Table 10.9 shows that the best run could not
reach the optimum. In terms of the median and worst performances, our proposed
approach is able to match the FEs of the existing algorithms. The algorithm is tested
with 25 initial populations, and in 18 of these runs it is not able to find the optimum
with our progressive termination criterion. When termination is instead checked by comparing
a solution's closeness to the best-known solution, our approach could not find the
optimum in 12 runs. In terms of the worst performance, the proposed
approach is slightly worse. Figure 10.29 shows the evolution of the objective function
value of the population-best solution and the corresponding constraint violation value
with generation for a typical simulation out of 25 runs. The figure shows that no
feasible solution is found up to 19 generations. However, at generation 10, the first
local search is executed. With successive local searches solving penalized functions
with an adaptive R, the obtained solutions get better and better with generations. At
generation 25, the value of the penalty parameter R is found to be 2,730,778.23.

Fig. 10.29 Function value reduces with generation for g17 problem

10.6.18 Problem g18


Problem g18 has 9 variables and 13 constraints. Table 10.9 shows the function evaluations and the obtained objective values using the proposed approach. For a particular
simulation, shown in Fig. 10.30, no feasible solution was found up to 35 generations
for this problem. Thereafter, the procedure takes a few more local searches to converge close to the best-reported solution. The value of the penalty parameter R at the final
generation for this problem is found to be 0.1428. Table 10.10 shows that the least
number of function evaluations needed by the proposed approach is 4,493, which is
an order of magnitude smaller than that of the best-reported algorithm (46,856).

Fig. 10.30 Objective value reduces with generation for problem g18

Fig. 10.31 Objective value reduces with generation for problem g19

10.6.19 Problem g19


Problem g19 has 15 variables and five inequality constraints. The optimal objective
value is f* = 32.656. Figure 10.31 shows the variation of the population-best f and the
adaptive R. The reduction in f with the generation counter is clear from the figure.
Table 10.10 also shows that the proposed methodology requires less than half the
number of function evaluations needed by the four existing evolutionary methods.

10.6.20 Problem g21


This problem has both inequality and equality constraints (Liang et al. 2006). The total
number of constraints is six, of which five are equality constraints. The constraints
are normalized as follows:

g1(x) = (x1 - 35 x2^0.6 - 35 x3^0.6)/35 >= 0,
h1(x) = [-300 x3 + 7500 x5 - 7500 x6 - 25 x4 x5 + 25 x4 x6 + x3 x4]/7500 = 0,
h2(x) = 100 x2 + 155.365 x4 + 2500 x7 - x2 x4 - 25 x4 x7 - 15536.5 = 0,
h3(x) = -x5 + ln(-x4 + 900) = 0,
h4(x) = -x6 + ln(x4 + 300) = 0,
h5(x) = -x7 + ln(-2 x4 + 700) = 0.
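For reference, the normalized constraints listed above translate directly into code. The sketch below evaluates them for a candidate x = (x1, ..., x7); the aggregation of the violations into a single number is a generic choice shown only for illustration and is not necessarily the exact constraint-violation definition used earlier in the chapter.

    import math

    def g21_constraints(x):
        x1, x2, x3, x4, x5, x6, x7 = x
        g1 = (x1 - 35.0 * x2 ** 0.6 - 35.0 * x3 ** 0.6) / 35.0           # >= 0
        h1 = (-300.0 * x3 + 7500.0 * x5 - 7500.0 * x6
              - 25.0 * x4 * x5 + 25.0 * x4 * x6 + x3 * x4) / 7500.0      # = 0
        h2 = (100.0 * x2 + 155.365 * x4 + 2500.0 * x7
              - x2 * x4 - 25.0 * x4 * x7 - 15536.5)                      # = 0
        h3 = -x5 + math.log(-x4 + 900.0)                                 # = 0
        h4 = -x6 + math.log(x4 + 300.0)                                  # = 0
        h5 = -x7 + math.log(-2.0 * x4 + 700.0)                           # = 0
        return g1, (h1, h2, h3, h4, h5)

    def constraint_violation(x, eq_tol=1e-4):
        """Generic aggregation: the inequality is violated when g1 < 0, and
        each equality is treated as satisfied when |h| <= eq_tol."""
        g1, hs = g21_constraints(x)
        return max(0.0, -g1) + sum(max(0.0, abs(h) - eq_tol) for h in hs)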
Figure 10.32 indicates that up to 19 generations, no feasible solution is found.
This is due to the existence of a large number of equality constraints, which make the problem complex.

Fig. 10.32 Function value reduces with generation for g21 problem

The first local search takes place at generation 5 and a solution
having a similar objective value to that of the optimum solution is found, but all
constraints are not yet satisfied. To satisfy all the constraints to an accuracy of the order of
10^-4, it takes another five local searches.
Table 10.9 shows the number of function evaluations required by our approach to
terminate according to the termination condition described in the algorithm. In Step
4, when the objective function value of the feasible local-searched solution x^LS is close to the best-reported
function value (within 10^-3), the algorithm is terminated. Table 10.10 also compares
the overall function evaluations needed by our approach with four other methodologies taken from the literature with a different termination criterion.

10.6.21 Problem g23


This problem has nine variables, two inequality constraints, and four equality constraints. Although the objective function and the constraint functions are linear or
quadratic, the variables at the optimum solution take differently scaled values. This
makes the problem difficult to solve using evolutionary methods
(Zavala et al. 2009; Brest 2009). To solve this problem, we have utilized the four
equality constraints to replace four of the eight variables. This resulted in a four-variable problem with 10 inequality constraints, among which two constraints
are from the original problem, and each of the four replaced variables is bounded
by its specified lower and upper bounds. The reported optimum in Liang et al.
(2006) does not satisfy four inequality constraints, and the new optimum solution is
x = (0, 100, 0, 100, 0, 0, 100, 200, 0.01) with f* = -400.0. In Table 10.10, we have
fixed a termination criterion in which the final solution must have at
most 10^-4 error from f*. Figure 10.33 shows the variation in the population-best
objective value for a particular run out of all 25 runs. Until the fourth generation, the
proposed approach could not find any feasible solution. The first local search takes
place after the 5th generation and a feasible solution is found. The algorithm is
terminated after generation 20, as two consecutive local-searched solutions match
our prefixed termination criterion.

Fig. 10.33 Function value reduces with generation for g23 problem
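The variable-elimination step used above for g23 can be sketched generically: when an equality constraint can be solved explicitly for one variable, that variable is substituted out, and its original bounds reappear as two inequality constraints on the remaining variables. The function phi below is a hypothetical explicit solution introduced only to illustrate the mechanics; it is not one of g23's actual constraints.

    def phi(x1, x2, x3):
        # Hypothetical explicit solution of an equality constraint
        # h(x) = x4 - phi(x1, x2, x3) = 0 for the eliminated variable x4.
        return 0.5 * (x1 + x2) - 0.1 * x3

    def bound_inequalities(x1, x2, x3, lower=0.0, upper=100.0):
        """The eliminated variable's bounds lower <= x4 <= upper become two
        inequality constraints, both required to be non-negative."""
        x4 = phi(x1, x2, x3)
        return x4 - lower, upper - x4

    # Example: reconstruct x4 and check its bound constraints.
    g_lo, g_hi = bound_inequalities(10.0, 30.0, 5.0)
    print(g_lo, g_hi)   # 19.5 80.5 -> both non-negative, so the bounds hold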

10.6.22 Problem g24


This g24 problem has two variables and two quartic constraints. Table 10.9 shows the
function evaluations and obtained objective values. Table 10.10 shows that the best
and median performances of our proposed approach are better than those of the best-reported algorithm. However, in terms of the worst performance, the proposed
approach is slightly worse.
The adaptation of R is shown in Fig. 10.34 for a typical run. Feasible solutions
are found right from the first generation. With successive local searches solving
penalized functions with an adaptive R, the obtained solutions get better and better
with generations. At generation 25, the value of the penalty parameter R is found to be
69.568.
Based on the performance on the above problems, it can be summarized that the
proposed hybrid approach with our initial parameter setting (w = 0.5, r = 2, and a
local search every 5 generations) performs well in comparison to the best-reported existing studies.


Fig. 10.34 Objective value reduces with generation for problem g24

10.7 Parametric Study


Next, we perform a detailed parametric study on some of the above problems to investigate whether any change in parameters would help improve the performance of the proposed approach further. Three parameters are chosen for this purpose: (i) the history
factor w for updating the penalty parameter, (ii) the multiplying factor r, used to
enhance the penalty parameter value, and (iii) the frequency of local search. In
our previous simulations, we used w = 0.5, but here w is varied as 0.25, 0.4, 0.5, 0.6, 0.75, and 1, meaning that a
100w% importance is given to the new penalty value (obtained from the current analysis
of the nondominated front) and a 100(1 - w)% importance is given to the previous
penalty parameter value. A value of w = 1 means that the previous penalty parameter is completely ignored at the current local search. The multiplying factor r is
responsible for making the penalty parameter value higher than that obtained from
the multiobjective study. In our previous simulations, we have always used r = 2,
but in this parametric study we use 1, 1.5, 2, 3, 5, and 10. The frequency of local
search is an important parameter and signifies the number of generations between
two consecutive local searches. We used a frequency of 5 generations in our previous simulations, but here
we vary it over 1, 3, 5, 7, 10, 20, and 50 generations.
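One plausible reading of how w and r act together, consistent with their roles described above, is sketched below. The exact update rule is defined earlier in the chapter, so this snippet is only an illustration of the two parameters, with hypothetical names.

    def update_penalty(R_previous, R_estimated, w=0.5, r=2.0):
        """Blend the penalty parameter estimated from the current
        non-dominated front with the previous value (weights w and 1 - w),
        then inflate the result by the multiplying factor r."""
        return r * (w * R_estimated + (1.0 - w) * R_previous)

    # Example: with w = 0.5 and r = 2, an estimate of 10.0 following a
    # previous value of 6.0 gives 2 * (0.5*10 + 0.5*6) = 16.0.
    print(update_penalty(6.0, 10.0))   # 16.0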
In the previous section, we took advantage of the known best-reported solutions
to terminate a simulation. Since earlier studies used the same termination
criterion, it helped us to compare our methodology with them. However, since we do not have this luxury in an
arbitrary problem, here we use a different termination
criterion for the parametric study. We terminate a run if two consecutive local searches
produce an objective value difference of less than 10^-4 and also find feasible solutions
having a maximum constraint violation of 10^-6. For each problem, 50 runs are
performed from different initial populations, and the median function evaluations are
plotted in the figures along with the best and worst function evaluations. For brevity, we
present results on a few selected problems.
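The termination rule of the parametric study can be written as a small predicate, shown below purely as a restatement of the criterion just described; the names are illustrative, not the authors' code.

    def parametric_run_terminated(f_prev, f_curr, cv_prev, cv_curr,
                                  f_tol=1e-4, cv_tol=1e-6):
        """Stop when two consecutive local-searched solutions differ by less
        than f_tol in objective value and both are feasible within cv_tol."""
        return abs(f_curr - f_prev) < f_tol and max(cv_prev, cv_curr) <= cv_tol

    # Example: nearly identical objective values with tiny constraint
    # violations trigger termination.
    print(parametric_run_terminated(961.71530, 961.71524, 0.0, 5e-7))   # True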

Fig. 10.35 Parametric study of w for problem g01

10.7.1 Problem g01


First, we consider problem g01. Figure 10.35 shows the parametric study of w. In
terms of the best performance, almost all the weighting factors have a similar effect,
with w = 0.4 having a slight advantage, as shown in Fig. 10.35.
A multiplying factor of r = 2 or 3 seems to be a good choice for this problem,
as shown in Fig. 10.36. Neither of these two parameters has much effect on the
median performance; however, for some parameter values, the variation over 50 runs
is wide.

Fig. 10.36 Parametric study of r for problem g01

The parametric study on the frequency of local search indicates a different scenario (Fig. 10.37). The smaller its value (meaning more frequent local searches), the better the
performance of the proposed algorithm on this problem. At its best, the algorithm requires
only 1,627 function evaluations to find a solution close to the constrained
minimum when a local search is performed at every generation.

Fig. 10.37 Parametric study of the local search frequency for problem g01

10.7.2 Problem g04


A similar outcome is obtained from the parametric study on problem g04. Figures 10.38 and 10.39 show that the parameter values w and r do not have much of
an effect on the performance. However, Fig. 10.40 indicates that a smaller local search frequency is better.
From Fig. 10.40 (and, as we shall subsequently observe, for other problems as well), we notice that
a small frequency value yields a better best performance, but with a wider variance of function
evaluations over 50 runs. Frequent local searches at every generation help some populations converge to the constrained optimum quickly, but cause a premature
convergence to a suboptimal solution for some other populations.

Fig. 10.38 Parametric study of w for problem g04

Fig. 10.39 Parametric study of r for problem g04

Fig. 10.40 Parametric study of the local search frequency for problem g04

10.7.3 Problem g08


Recall that this problem was not solved by our approach as efficiently as by Takahama
and Sakai's approach (Takahama and Sakai 2009). Here, we perform a parametric
study to investigate whether the performance of the proposed approach can be improved.
Figure 10.41 shows the effect of the frequency of local search on this
problem. The other two parameters are kept the same as before (r = 2 and w = 0.5).
It is clear that performing a local search at every generation makes the search faster
for this problem, and our proposed approach requires only 225 function evaluations
at its best. A parametric study on w and r indicates that these two parameters are not
that sensitive.

Fig. 10.41 Number of function evaluations versus the local search frequency for problem g08

10.7.4 Problem g12


Figure 10.42 shows the effect of the history parameter w of the penalty parameter.
The best performance takes 496 function evaluations, whereas the median
and worst performances require an identical 504 function evaluations.

Fig. 10.42 Parametric study of w for problem g12

The multiplying factor r has no effect on the performance of our algorithm
(Fig. 10.43). However, the performance gets better with a smaller local search frequency, as
shown in Fig. 10.44. Interestingly, all 50 runs produce an identical solution in this
problem. Based on these results, we conclude that a local search at every generation produces the best performance,
and the overall function evaluations needed by our algorithm for the best, median,
and worst performance are 168, 168, and 168, respectively.

Fig. 10.43 Parametric study of r for problem g12

Fig. 10.44 Parametric study of the local search frequency for problem g12

10.7.5 Problem g24


As in the other test problems, the major effect is found to come from the frequency of local search.
Figure 10.45 shows the variation of the number of function evaluations with this parameter. The
best, median, and worst performances occur with 503, 1,142, and 2,693 function
evaluations, respectively.

Fig. 10.45 Number of function evaluations versus the local search frequency for problem g24

10.8 Performance with Modified Parameter Values


The following observations can be made from the above parametric study:
1. The parameters w (history factor) and r (multiplying factor) do not have much of
an effect on the outcome of the algorithm as long as they are kept within certain
values. Based on the study, we recommend using w = 0.5 and r = 2.
2. The frequency of local search is an important parameter and, in
general, the smaller its value, the better the performance of the algorithm. A more
frequent local search allows faster upgrades of the solution, thereby allowing a
better performance on some problems. The study shows that a local search at every
generation is worth considering.
We use the above recommendations (a local search at every generation, w = 0.5, and r = 2) and make one
final round of simulations on all the inequality-constrained test problems considered in this chapter, and
compare the performance of our algorithm with the four best-reported existing studies
in Table 10.11. In each case, 25 simulations are performed and a run is terminated
if a solution having an objective value at most 10^-4 greater than the best-reported
objective value is obtained. Note that the termination criterion used in this section is
different from that used in the parametric study. We use this termination criterion here
to make a fair comparison with the existing studies (shown in the table), which used an
identical termination criterion. We make a number of observations from these results:


Table 10.11 Comparison of our algorithm with modified parameter values with four existing approaches on problems having inequality constraints only

Problem          Zavala      Takahama    Brest       Wang        Proposed approach
g01   Best       80,776       18,594      51,685     101,908       2,341
      Median     90,343       19,502      55,211     122,324       2,891
      Worst      96,669       19,917      57,151     136,228       4,736
g02   Best       87,419      108,303     175,090     170,372      24,312
      Median     93,359      114,347     226,789     189,204      61,526
      Worst      99,654      129,255     253,197     222,468      97,478
g04   Best       93,147       12,771      56,730      63,540         865
      Median    103,308       13,719      62,506      73,572       1,556
      Worst     110,915       14,466      67,383      79,556       2,420
g06   Best       95,944        5,037      31,410      26,932         884
      Median    109,795        5,733      34,586      35,908       2,645
      Worst     130,293        6,243      37,033      41,716       4,382
g07   Best      114,709       60,873     184,927     142,388      11,980
      Median    138,767       67,946     197,901     156,644      31,803
      Worst     208,751       75,569     221,866     166,148      70,453
g08   Best        2,270          621       1,905       2,820         304
      Median      4,282          881       4,044       5,988         506
      Worst       5,433        1,173       4,777       8,276       1,158
g09   Best       94,593       19,234      79,296      63,540       2,908
      Median    103,857       21,080      89,372      70,404       6,141
      Worst     119,718       21,987      98,062      83,780      39,659
g10   Best      109,243       87,848     203,851     171,252       6,134
      Median    135,735       92,807     220,676     183,924      21,933
      Worst     193,426      107,794     264,575     192,900      94,949
g12   Best          482        2,901         364       1,764         168
      Median      6,158        4,269       6,899       5,460         168
      Worst       9,928        5,620      10,424       8,100         168
g16   Best       65,872        8,965      48,883      27,460       6,994
      Median     75,451       10,159      54,081      29,396       9,723
      Worst      83,087       11,200      57,678      32,388      10,502
g18   Best       97,157       46,856     139,131      93,812       3,630
      Median    107,690       57,910     169,638     104,196       5,285
      Worst     124,217       60,108     191,345     116,340      16,337
g19   Best      109,150      147,772     322,120     241,476      35,294
      Median    122,279      162,947     363,456     251,684      58,813
      Worst     167,921      178,724     427,042     269,284      81,633
g24   Best       11,081        1,959       9,359      13,908         503
      Median     18,278        2,451      12,844      23,060       1,142
      Worst     633,378        2,739      14,827      31,684       2,693

A run is terminated when a solution having a function value within 10^-4 of the best-known value is obtained.


1. For all problems, the proposed hybrid evolutionary-cum-penalty-function-based
bi-objective approach is computationally faster than the best-reported EA results
in the best, median, and worst performance comparisons. In other words, the
function evaluations reported in Table 10.11 for all 13 test problems are lower
than those reported in any previous study. These results are obtained with a uniform
parameter setting of a local search at every generation, w = 0.5, and r = 2.
2. In comparison with Table 10.7, whose results were obtained using a local search every 5 generations with w = 0.5
and r = 2, the best performance with a local search at every generation is better in all problems except
problem g09. However, the range of function evaluations between the best and worst
of 25 simulations is wider. Thus, we may conclude that a local search every five generations is a more reliable
strategy, whereas the setting with frequent local searches at every generation has the ability to locate
the constrained minimum very quickly from certain populations. From some
other populations, the algorithm becomes too greedy and requires a longer time
to recover before it is finally able to converge to the correct optimum.
3. The parameters w and r do not have much effect on the performance of the proposed
method. The frequency of local search may, in general, be set between one and
five generations.
Overall, the combination of the bi-objective EA and the penalty function approach seems
to find the constrained minimum more quickly and more accurately than any of the four best-known constraint-handling EAs alone. The reason for the significant improvement
in performance is an appropriate mix of the best features of two complementary
algorithms.

10.9 Closure

In this chapter, we have suggested a hybrid procedure combining a bi-objective evolutionary method and a penalty-function-based classical optimization method, in which each component alleviates the drawbacks of
the other. The difficulty of accurate convergence to the optimum by an evolutionary multiobjective optimization (EMO) procedure is overcome by the use of a local
search involving a classical optimization procedure. On the other hand, the difficulty
of the commonly used penalty function approach is overcome by estimating
a suitable penalty parameter self-adaptively through the EMO procedure. The hybrid procedure is applied to a number of numerical optimization problems taken from the
literature. Results from 25 different initial populations indicate that the proposed
procedure is robust. In almost all cases, the required number of function evaluations
is found to be many times smaller (sometimes even one or two orders of magnitude
smaller) than that of the best-reported EAs. This is a significant result, particularly considering the long-term focus and emphasis of evolutionary computation algorithms on
solving constrained optimization problems. The reason for such a significant performance of the proposed procedure is the appropriate use of two complementary
approaches in such a way that the hybrid procedure exploits the strength of each
approach in making a fast and accurate convergence to the constrained optimum.


Furthermore, a parametric study is performed to investigate the effect of three
parameters associated with the proposed algorithm. The study indicates that two of
the three parameters do not have much effect, whereas the third parameter (the frequency
of local searches) seems to require a small value for better performance. Based
on these observations, we have rerun our algorithm with the revised setting of the
third parameter, and a better overall performance of our algorithm has been reported.
Importantly, the parametric study also helps to eliminate the need for any additional
parameter associated with the constraint-handling part of the hybrid algorithm.
The test problems used in this study have all been considered by many evolutionary computation researchers over the past two decades using different methodologies
(single or multiobjective), either monolithically with EAs alone or hybridized with classical
methods. The use of an EMO methodology and the penalty function method to complement each other's weaknesses is a novel, innovative, and theory-driven approach.
By using the strength of both methods in one procedure, we are able to develop
a constraint-handling methodology which is computationally fast and accurate in
solving the chosen set of test problems. Further simulations are now needed to test the
algorithm's performance on more complex problems. Nevertheless, the triumph of
this method remains in understanding and hybridizing two contemporary optimization fields and in developing a hybrid methodology which seems to provide
a direction of research in the area of fast and accurate optimization of constrained
problems.
Acknowledgments The original concept of this chapter was published in the following journal article: "A
bi-objective constrained optimization algorithm using a hybrid evolutionary and penalty function
approach," Kalyanmoy Deb and Rituparna Datta, Engineering Optimization, Volume 45, Issue 5, 2013
(published online: 26 Jun 2012), Taylor & Francis. It is reprinted by permission of the publisher
(Taylor & Francis Ltd, http://www.tandfonline.com) with substantial improvement. The authors
would like to thank Taylor & Francis Ltd. for their permission to use the content of the journal article.

References
Angantyr A, Andersson J, Aidanpaa J-O (2003) Constrained optimization based on a multiobjective evolutionary algorithm. In: Proceedings of congress on evolutionary computation, pp 1560–1567
Araujo MC, Wanner EF, Guimaraes FG, Takahashi RHC (2009) Constrained optimization based on quadratic approximations in genetic algorithms. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 193–218
Bernardino H, Barbosa H, Lemonge A (2007) A hybrid genetic algorithm for constrained optimization problems in mechanical engineering. In: IEEE congress on evolutionary computation, CEC 2007. IEEE, pp 646–653
Bernardino HS, Barbosa HJC, Lemonge ACC, Fonseca LG (2009) On GA-AIS hybrids for constrained optimization problems in engineering. Springer, New York
Branke J (2008) Consideration of partial user preferences in evolutionary multiobjective optimization. In: Multiobjective optimization. Springer, New York, pp 157–178
Branke J, Deb K (2004) Integrating user preferences into evolutionary multi-objective optimization. In: Jin Y (ed) Knowledge incorporation in evolutionary computation. Springer, Heidelberg, pp 461–477

Brest J (2009) Constrained real-parameter optimization with self-adaptive differential evolution. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 73–94
Burke EK, Smith AJ (2000) Hybrid evolutionary techniques for the maintenance scheduling problem. IEEE Trans Power Syst 15(1):122–128
Byrd R, Nocedal J, Waltz R (2006) Large-scale nonlinear optimization. KNITRO: an integrated package for nonlinear optimization. Springer, New York
Cai Z, Wang Y (2005) A multiobjective optimization-based evolutionary algorithm for constrained optimization. IEEE Trans Evol Comput 10(6):658–675
Camponogara E, Talukdar S (1997) A genetic algorithm for constrained and multiobjective optimization. In: 3rd Nordic workshop on genetic algorithms and their applications (3NWGA), pp 49–62
Chankong V, Haimes YY (1983) Multiobjective decision making theory and methodology. North-Holland, New York
Coello C, Carlos A (2000) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113–127
Coello C, Carlos A (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Coello C, Lamont G, Van Veldhuizen D (2007) Evolutionary algorithms for solving multi-objective problems. Springer, New York
Coello CAC (2000) Treating objectives as constraints for single objective optimization. Eng Optim 32(3):275–308
Coello CAC (2013) List of references on constraint-handling techniques used with evolutionary algorithms. http://www.cs.cinvestav.mx/~constraint/
Coit D, Smith A, Tate D (1996) Adaptive penalty methods for genetic optimization of constrained combinatorial problems. INFORMS J Comput 8:173–182
Dadios E, Ashraf J (2006) Genetic algorithm with adaptive and dynamic penalty functions for the selection of cleaner production measures: a constrained optimization problem. Clean Technol Environ Policy 8(2):85–95
Deb K (1991) Optimal design of a welded beam structure via genetic algorithms. AIAA J 29(11):2013–2015
Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods Appl Mech Eng 186(2–4):311–338
Deb K (2001) Multi-objective optimization using evolutionary algorithms. Wiley, Chichester
Deb K, Agrawal S, Pratap A, Meyarivan T (2002) A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Deb K, Lele S, Datta R (2007) A hybrid evolutionary multi-objective and SQP based procedure for constrained optimization. In: Proceedings of the 2nd international conference on advances in computation and intelligence. Springer, pp 36–45
Deep K et al (2008) A self-organizing migrating genetic algorithm for constrained optimization. Appl Math Comput 198(1):237–250
Echeverri MG, Lezama JML, Romero R (2009) An efficient constraint handling methodology for multi-objective evolutionary algorithms. Revista Facultad de Ingenieria-Universidad de Antioquia 49:141–150
El-Mihoub T, Hopgood A, Nolle L, Battersby A, Date S (2006) Hybrid genetic algorithms: a review. Eng Lett 3(2):124–137
Elsayed S, Sarker R, Essam D (2011) Multi-operator based evolutionary algorithms for solving constrained optimization problems. Comput Oper Res 38(12):1877–1896
Fatourechi M, Bashashati A, Ward R, Birch G (2005) A hybrid genetic algorithm approach for improving the performance of the LF-ASD brain computer interface. In: IEEE international conference on acoustics, speech, and signal processing. Proceedings (ICASSP'05), vol 5

Gen M, Cheng R (1996) A survey of penalty techniques in genetic algorithms. In: Proceedings of IEEE international conference on evolutionary computation. IEEE Press
Hedar A, Fukushima M (2003) Simplex coding genetic algorithm for the global optimization of nonlinear functions. In: Tanino T, Tanaka T, Inuiguchi M (eds) Multi-objective programming and goal programming. Advances in soft computing. Springer, New York, pp 135–140
Homaifar A, Lai SH-V, Qi X (1994) Constrained optimization via genetic algorithms. Simulation 62(4):242–254
Knowles J, Corne D, Deb K (2008) Multiobjective problem solving from nature: from concepts to applications. Natural computing series. Springer, New York
Kumar A, Sharma D, Deb K (2007) A hybrid multi-objective optimisation procedure using PCX based NSGA-II and sequential quadratic programming. In: Proceedings of the congress on evolutionary computation (CEC-2007), Singapore, pp 3011–3018
Kuri-Morales A, Gutiérrez-García J (2002) Penalty function methods for constrained optimization with genetic algorithms: a statistical analysis. MICAI 2002: Adv Artif Intell 34(2):187–200
Leguizamón G, Coello C (2009) Boundary search for constrained numerical optimization problems. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 25–49
Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello CAC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Lin C, Chuang C (2007) A rough set penalty function for marriage selection in multiple-evaluation genetic algorithms. Rough Sets Knowl Technol, pp 500–507
Matthew P et al (2009) Selection and penalty strategies for genetic algorithms designed to solve spatial forest planning problems. Int J For Res 2009:115
Mezura-Montes E (2009) Constraint-handling in evolutionary optimization. Springer, Berlin
Mezura-Montes E, Palomeque-Ortiz A (2009) Self-adaptive and deterministic parameter control in differential evolution for constrained optimization. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 95–120
Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Michalewicz Z, Janikow CZ (1991) Handling constraints in genetic algorithms. In: Proceedings of the fourth international conference on genetic algorithms, pp 151–157
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Miettinen K (1999) Nonlinear multiobjective optimization. Kluwer, Boston
Moler C (2004) Numerical computing with MATLAB. Society for Industrial Mathematics
Myung H, Kim J (1998) Hybrid interior-Lagrangian penalty based evolutionary optimization. In: Evolutionary programming VII. Springer, pp 85–94
Nanakorn P, Meesomklin K (2001) An adaptive penalty function in genetic algorithms for structural design optimization. Comput Struct 79(29–30):2527–2539
Powell D, Skolnick MM (1993) Using genetic algorithms in engineering design optimization with nonlinear constraints. In: Proceedings of the fifth international conference on genetic algorithms, pp 424–430
Ray T, Singh H, Isaacs A, Smith W (2009) Infeasibility driven evolutionary algorithm for constrained optimization. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 145–165
Reklaitis GV, Ravindran A, Ragsdell KM (1983) Engineering optimization methods and applications. Wiley, New York
Richardson JT, Palmer MR, Liepins GE, Hilliard MR (1989) Some guidelines for genetic algorithms with penalty functions. In: Proceedings of the 3rd international conference on genetic algorithms. Morgan Kaufmann Publishers Inc, pp 191–197
Runarsson T, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294


Sha J, Xu M (2011) Applying hybrid genetic algorithm to constrained trajectory optimization. In: 2011 international conference on electronic and mechanical engineering and information technology (EMEIT). IEEE, vol 7, pp 3792–3795
Sharma D, Kumar A, Deb K, Sindhya K (2007) Hybridization of SBX based NSGA-II and sequential quadratic programming for solving multi-objective optimization problems. In: IEEE congress on evolutionary computation, CEC 2007. IEEE, pp 3003–3010
Sindhya K, Deb K, Miettinen K (2008) A local search based evolutionary multi-objective optimization approach for fast and accurate convergence. In: Parallel problem solving from nature-PPSN X. Springer, Heidelberg
Surry PD, Radcliffe NJ, Boyd ID (1995) A multi-objective approach to constrained optimisation of gas supply networks: the COMOGA method. In: Evolutionary computing. AISB workshop. Springer, pp 166–180
Takahama T, Sakai S (2009) Solving difficult constrained optimization problems by the ε constrained differential evolution with gradient-based mutation. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 51–72
Tessema B, Yen G (2006) A self adaptive penalty function based algorithm for constrained optimization. In: IEEE congress on evolutionary computation, CEC 2006. IEEE, pp 246–253
Venkatraman S, Yen G (2005) A generic framework for constrained optimization using genetic algorithms. IEEE Trans Evol Comput 9(4):424–435
Victoire T, Jeyakumar A (2005) A modified hybrid EP-SQP approach for dynamic dispatch with valve-point effect. Int J Electr Power Energy Syst 27(8):594–601
Wang Y, Ma W (2006) A penalty-based evolutionary algorithm for constrained optimization. Adv Nat Comput 4221:740–748
Wang L, Zhang L, Zheng D (2006) An effective hybrid genetic algorithm for flow shop scheduling with limited buffers. Comput Oper Res 33(10):2960–2971
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary optimization. IEEE Trans Evol Comput 12(1):80–92
Wang Y, Cai Z (2012) Combining multiobjective optimization with differential evolution to solve constrained optimization problems. IEEE Trans Evol Comput 16(1):117–134
Yuan Q, Qian F (2010) A hybrid genetic algorithm for twice continuously differentiable NLP problems. Comput Chem Eng 34(1):36–41
Zavala A, Aguirre A, Diharce E (2009) Continuous constrained optimization with dynamic tolerance using the COPSO algorithm. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Springer, Berlin, pp 1–23
Zhao J, Wang L, Zeng P, Fan W (2011) An effective hybrid genetic algorithm with flexible allowance technique for constrained engineering design optimization. Expert Syst Appl 38(12):15103–15109
Zhou Y, Li Y, He J, Kang L (2003) Multi-objective and MGG evolutionary algorithm for constrained optimization. In: The 2003 congress on evolutionary computation, CEC'03. IEEE, vol 1, pp 1–5
Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the Strength Pareto approach. IEEE Trans Evol Comput 3(4):257–271

About the Book

All real-world optimization problems involve constraints, due to limitations
in the availability of resources. Researchers have proposed numerous constraint-handling mechanisms using evolutionary algorithms in the last two decades. The motivation
for this monograph, Evolutionary Constrained Optimization, is to make available
a self-contained collection of modern research addressing general constrained
optimization problems using evolutionary algorithms. The included chapters on different aspects of constraint handling will be helpful for researchers, novices and
experts alike.
This book will be ideal for a graduate class on optimization, but will also be useful
for interested senior students working on their research projects. Although the book
addresses constrained methods using evolutionary algorithms, classical optimization
researchers (from both mathematical and numerical fields) would also benefit
from this book.


Index

Symbols
ε constrained differential evolution, 158
ε constrained method, 158, 162
ε constraint, 236
ε level, 162, 169
ε-level comparison, 162

A
Adaptive penalty, 5
Adaptive technique, 6
Additive penalty, 3
Approximation model, 158, 161

B
Barrier functional, 3
Benchmark problems, 30
Biased walk, 31
Bi-objective and multi-objective approaches
to constraint handling, 54
Black-box constraints, 5256, 64, 78
Black-box optimization, 52, 54, 64, 68
Block coordinate search (BCS), 56, 61, 65

C
CEP-RBF algorithm, 53, 55, 65, 78
Closed-loop optimization, 96
COBRA algorithm, 55
Co-evolutionary technique, 20
Commitment composite ERC
composite-defining bits, 103
Comparative performance, 22
Comparison operator, 207
Composite, 102

Computationally expensive, 5254, 56, 68,


78
Conjugate gradient algorithms, 206
Constrained black-box optimization, 52, 53,
79
Constrained EP, 65, 78
Constrained optimization, 1, 249, 252
constraints, 250
equality, 250
inequality, 250
constraint satisfaction, 250
constraint violation, 250
normalization, 263
penalty function approach, 249, 252
penalty parameter, 250
Constrained optimization problem, 29, 99,
136, 206
Constraint handling, 54, 108, 206
online purchasing strategy, 125
repairing strategies, 116
sliding window strategy, 126
Constraints, 206
Constraint violation, 4, 12
Constraint violation function, 224
Constriction factor, 214
ConstrLMSRBF algorithm, 55, 65, 67, 79
Convergence, 216
Convergence rate, 190194
Cubic RBF, 63, 78
Cultural algorithm, 54
Cumulative step size adaptation, 183

D
Data profile, 67, 68, 78
Diagonal matrices, 213
Differential evolution, 30, 167

Disjoint feasible regions, 209
Distance requirement from previously evaluated points, 53, 56, 58, 61, 78
Dynamic constraint, 96
Dynamic optimization problem, 99
Dynamic penalty, 5
Dynamical systems approach, 182
Dynamicalization, 137

E
Entropic measure, 34
Ephemeral resource constraint, 96
activation period, 100
commitment composite ERC, 102
commitment relaxation ERC, 101
constraint schema, 100
constraint time frame, 100
ephemeral resource-constrained optimization problem, 97
epoch, 101
evaluable region, 98
high-level constraint schema, 103
non-evaluable solutions, 99
period length, 102
periodic ERC, 102
preparation period, 100
recovery period, 100
simulated time, 99
time-evolving parameters, t , 99
Epsilon constrained method, 54
ERC, see ephemeral resource constraint
ERCOP, see ephemeral resource-constrained optimization problem
Estimated comparison, 158, 165
Evaluation control approach, 161
Evolution strategy, 37, 137, 181
Evolutionary algorithms, 249
bi-objective evolutionary algorithm, 249
Evolutionary experimentation, see closed-loop optimization
Evolutionary multiobjective optimization
(EMO), 261
Experimental evolution, see closed-loop
optimization
Exterior technique, 4

F
Feasible, 32, 206
Feasible set, 3
Fitness landscape, 29
Fuzzy logic, 21

G
Gaussian process model, 54
Gaussian RBF, 63
Gbest topology, 216
Generation-based control, 161
Genetic algorithm, 206
Genetic drift, 112
ephemeral resource constraint, 112
Genetic programming, 137, 143
Gradient descent algorithms, 206
Gradient of RBF model, 64
H
High-dimensional optimization, 55, 65, 78
Highly constrained problem, 55, 65, 78
Hybrid, 251
Hybrid algorithms, 233, 239
I
Individual-based control, 161
Infeasible, 35
Interior technique, 3
J
Just-in-time scheduling, 126
K
Kriging model, 54
L
Large-scale optimization, 52, 64, 78
Latin hypercube method, 143
Linear programming, 206
Locating disjoint feasible regions, 224
M
Margin on surrogate constraints, 53, 56, 58,
60, 61, 78
Markov chain analysis, 104
Markov process, 182, 189, 195
MAX-SAT, 119, 127
Measure the ruggedness, 33
Metamodel, 54
MOPTA08 benchmark problem, 53, 55, 63
65, 73, 78
Multi-armed bandit, 122
Multi-layer perceptron, 143
Multi-modal optimization, 209
Multiobjectivization, 261


Multiplicative penalty, 3
Multiquadric RBF, 63
Multi-start methods, 211

Rugged landscape, 29
Ruggedness, 29
Ruggedness quantifying, 29

N
Nearest neighbor regression, 54
Neighbor set, 212
Neural network, 54
Neutrality, 34
Niching, 209
No Free Lunch theorem, 233, 246
Non-domination sorting, 233
Nondominated sorting genetic algorithm
(NSGA-II), 251

S
Scatter search, 65, 67, 79
Self-adaptive, 5
Self-adaptive penalty, 235
Self-adaptive technique, 19
Sequential penalty derivative-free (SDPEN)
algorithm, 53, 6567, 78, 79
Single objective, 206
Spring design optimization, 153
Stagnation, 215, 220
Standard constraints, 99
Static penalty, 5
Step size, 37
Stochastic ranking, 137, 237
Stochastic ranking evolution strategy
(SRES), 65, 67, 79
Stochastic ranking method, 38
Sub-swarm, 222
Superiority of feasible solutions, 235
Support vector machine (SVM), 54
Surrogate, 137
Surrogate approach, 162
Surrogate-assisted
evolutionary algorithm, 52, 54
evolutionary programming (EP), 52, 53,
55, 78
optimization, 52, 63, 78
particle swarm, 56
Surrogate-based optimization, 52, 63, 78
Surrogate model, 54, 78, 162
Swarm explosion, 214
Symbolic and sub-symbolic regression, 137

O
OneMax problem, 119
Online optimization, 99
Optimal region, 209

P
Parameterless technique, 11
Particle swarm optimization, 206
Penalty approach to constraint handling, 54,
65
Penalty coefficient, 38
Penalty functions, 136
Penalty method, 3, 4
Performance profile, 23, 67, 68, 78
Performance ratio, 67
Personal best, 212
Potential model, 162, 164
Problem characteristic, 31

Q
Quadratic polynomial, 54

R
R