Yaroslav D. Sergeyev
Roman G. Strongin
Daniela Lera

Introduction to Global Optimization Exploiting Space-Filling Curves
SpringerBriefs in Optimization
Series Editors
Panos M. Pardalos
János D. Pintér
Stephen Robinson
Tamás Terlaky
My T. Thai
SpringerBriefs in Optimization showcases algorithmic and theoretical techniques, case studies, and applications within the broad-based field of optimization.
Manuscripts related to the ever-growing applications of optimization in applied
mathematics, engineering, medicine, economics, and other applied sciences are
encouraged.
Yaroslav D. Sergeyev
Università della Calabria
Department of Computer Engineering,
Modeling, Electronics and Systems
Rende, Italy
Roman G. Strongin
N.I. Lobachevsky University
of Nizhni Novgorod
Software Department
Nizhni Novgorod, Russia
Daniela Lera
University of Cagliari
Department of Mathematics
and Computer Science
Cagliari, Italy
ISSN 2190-8354
ISSN 2191-575X (electronic)
ISBN 978-1-4614-8041-9
ISBN 978-1-4614-8042-6 (eBook)
DOI 10.1007/978-1-4614-8042-6
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2013943827
Mathematics Subject Classification (2010): 90C26, 14H50, 68W01, 65K05, 90C56, 90C30, 68U99,
65Y99
© Yaroslav D. Sergeyev, Roman G. Strongin, Daniela Lera 2013
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher's location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
In the literature there exist many traditional local search techniques designed for problems where the objective function F(y), y ∈ D ⊂ R^N, has only one optimum and strong a priori information about F(y) is available (for instance, it is supposed that F(y) is convex and differentiable). In such cases it is customary to speak of local optimization problems. However, in practice the objects and systems to be optimized are frequently such that the respective objective function F(y) does not satisfy these strong suppositions. In particular, F(y) can be multiextremal with an unknown number of local extrema and non-differentiable, each function evaluation can be a very time-consuming operation (from minutes to hours for just one evaluation of F(y) on the fastest existing computers), and nothing may be known about the internal structure of F(y) but its continuity. Very often it is required to find the best among all the existing locally optimal solutions; in the literature, problems of this kind are called black-box global optimization problems, and exactly this kind of problems, and methods for their solving, are considered in this book.
The absence of strong information about F(y) (i.e., convexity, differentiability, etc.) does not allow one to use traditional local search techniques that require this kind of information, and the necessity to develop algorithms of a new type arises. In addition, an obvious extra difficulty in using local search algorithms consists in the presence of several local solutions. When one needs to approximate the global solution (i.e., the best among the local ones), something more is required in comparison with local optimization procedures, which lead to a local optimum without addressing the main issue of global optimization: whether the found solution is the global one we are interested in or not.
Thus, numerical algorithms for solving multidimensional global optimization problems are the main topic of this book and an important part of the lives of the authors, who have dedicated several decades of their careers to global optimization. Results of their research in this direction have been presented as plenary lectures
Yaroslav D. Sergeyev
Roman G. Strongin
Daniela Lera
Contents
Contents

Introduction ............................................................. 1
1.1 Examples of Space-Filling Curves ...................................... 1
1.2 Statement of the Global Optimization Problem .......................... 6
Chapter 1
Introduction
that presents this mathematical masterpiece: G. Peano, Sur une courbe, qui remplit toute une aire plane, Mathematische Annalen, 36, Janvier 1890, 157–160. Further examples of Peano curves have since been proposed by Hilbert in 1891 (see [62]), Moore in 1900 (see [84]), Sierpiński in 1912 (see [125]), and others (see [99] and references given therein).
However, it should be mentioned that the story began some years before 1890, precisely in 1878, when Cantor (see [11]) proved that any two finite-dimensional smooth manifolds have the same cardinality. In particular, this result implies that the interval [0, 1] can be mapped bijectively onto the square [0, 1] × [0, 1]. A year later (and in the same journal) Netto showed that such a mapping is necessarily discontinuous (see [85]). Thus, since bijective mappings are discontinuous, the next important question regarding the existence of continuous mappings from an interval into the space concerns surjective mappings.
These results are of interest to us because a continuous mapping from an interval into the plane (or, more generally, into the space) is one of the ways used to define a curve. In the two-dimensional case the problem formulated above is the question of the existence of a curve that passes through every point of a two-dimensional region having a positive Jordan area. The answer to exactly this question was given by Peano in [88], where he constructed the first instance of such a curve. In his turn, a year later, Hilbert (see [62]) made a very important contribution by explaining how Peano curves can be constructed geometrically.
Hilbert's paper D. Hilbert, Über die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen, 38, 1891, 459–460, was published in the same journal where Peano had introduced the space-filling curves. It consists of just two pages that are shown in Fig. 1.2.
Hilbert introduced the following procedure that can be explained easily. Let us take the interval [0, 1] and divide it into four equal subintervals. Let us then take the square [0, 1] × [0, 1] and divide it into four equal subsquares. Since the interval [0, 1] can be mapped continuously onto the square [0, 1] × [0, 1], each of its four subintervals can be mapped continuously onto one of the four subsquares. Then each of the subintervals and subsquares is partitioned again and the procedure is repeated infinitely many times. Hilbert showed, first, that the subsquares can be rotated in an opportune way in order to ensure the continuity of the curve on the square and, second, that the inclusion relationships are preserved; i.e., if a square corresponds to an interval, then its subsquares correspond to the subintervals of that interval. He also showed that the curve constructed in this way is nowhere differentiable.
Figure 1.2 shows that in his paper Hilbert sketches the first steps of his iterative construction, which has his space-filling curve as its limit and consists of a sequence of piecewise linear continuous curves that approximate the space-filling curve closer and closer. It can be seen that at each iteration the current curve is substituted by four reduced copies of itself. Using modern language, we can say that the curve is constructed by applying the principle of self-similarity. Recall that a structure is said to be self-similar if it can be broken down into arbitrarily small pieces, each of which is a small replica of the entire structure.
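This self-similar substitution is easy to reproduce numerically. The sketch below is not taken from the book; it uses the widely known iterative index-to-coordinate scheme for the Hilbert curve to generate the grid points visited by the level-`order` approximation:

```python
def hilbert_point(order, d):
    """Map the index d, 0 <= d < 4**order, to the integer coordinates (x, y)
    of the d-th cell visited by the level-`order` Hilbert curve on the
    2**order x 2**order grid."""
    x = y = 0
    s, t = 1, d
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # rotate/flip the current quadrant: the self-similarity step
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

# level-2 approximation: a piecewise linear curve through all 16 cells of a 4x4 grid
points = [hilbert_point(2, d) for d in range(16)]
```

Each level multiplies the number of segments by four, mirroring the substitution of the current curve by "four reduced copies of itself" described above.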
Fig. 1.2 The paper of D. Hilbert showing in particular how Peano curves can be constructed
geometrically
The original Peano curve possesses the same property but, to be precise, it is necessary to mention that Peano's construction is slightly different. Some of its first steps are presented in Fig. 1.3. In this book, Hilbert's version of Peano curves will be mainly used; however, in order to emphasize the priority of Peano and following the tradition used in the literature, the term "Peano curve" will be used for this precise curve as well.
As has already been mentioned, several kinds of space-filling curves were proposed after the publication of the seminal articles of Peano and Hilbert. In Fig. 1.4 we show Moore's version of the Peano curve (see [84]), and Fig. 1.5 presents the procedure constructing a curve introduced by Sierpiński in 1912 (see [125]) that, in contrast to the previous ones, is closed. Notice that all the curves presented in Figs. 1.2–1.5 are in two dimensions. However, Peano curves can be generalized to n > 2 dimensions, and such generalizations will actually be used in this book. To illustrate this point, Fig. 1.6 shows the procedure of generation of the three-dimensional space-filling curve.
For a long time space-filling curves were considered by many people just as monsters or a kind of mathematical curiosity until Benoît Mandelbrot published his famous book B. Mandelbrot, Les objets fractals: forme, hasard et dimension, Flammarion, Paris, 1975, describing objects that everybody knows nowadays under the name Mandelbrot gave to them: fractals. Space-filling curves are, in fact, examples of fractals: objects that are constructed using principles of self-similarity. They have a number of amazing properties (the important and interesting notion of fractional dimension is one of them) and can frequently be met in nature (see, e.g., [20, 25, 60, 89] and references given therein). Fractal objects have been broadly studied and fractal models have been successfully applied in various fields. The reader interested in space-filling curves and fractals can continue his/her studies using, for instance, the following publications: [20, 25, 60, 66, 80, 89, 90, 95, 99, 110–112, 139, 148].
[Figure: four panels on the unit square showing the construction at "level 1" through "level 4".]
1.2 Statement of the Global Optimization Problem

This book considers the global optimization problem

F* = F(y*) = min{F(y) : y ∈ D},    (1.1)

where the search domain is the hypercube

D = {y ∈ R^N : −2^{−1} ≤ y_j ≤ 2^{−1}, 1 ≤ j ≤ N},    (1.2)

R^N is the N-dimensional Euclidean space, and the objective function F(y) satisfies the Lipschitz condition with a constant L, 0 < L < ∞, i.e., for any two points y′, y″ ∈ D it is true that

|F(y′) − F(y″)| ≤ L ‖y′ − y″‖ = L ( Σ_{j=1}^{N} (y′_j − y″_j)² )^{1/2}.    (1.3)

If the original domain of the search is a general hyperinterval

S = {w ∈ R^N : a_j ≤ w_j ≤ b_j, 1 ≤ j ≤ N},    (1.4)

then, by applying the linear transformation

y_j = (w_j − (a_j + b_j) 2^{−1}) / ρ,  1 ≤ j ≤ N,    (1.5)

where

ρ = max{b_j − a_j : 1 ≤ j ≤ N},    (1.6)

it is possible to keep the initial presentation (1.2) for the domain of the search (which is assumed to be the standard one) without altering the relations of the Lipschitzian properties in the different dimensions.
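In code, the rescaling between a general hyperinterval and the standard cube is a one-line transformation per coordinate; the sketch below (helper names are ours) divides every shifted coordinate by the same longest edge ρ:

```python
def to_standard(w, a, b):
    """Map a point w of the hyperinterval {a_j <= w_j <= b_j} into the
    standard cube {-1/2 <= y_j <= 1/2}: shift by the midpoint of each
    range and divide by the longest edge rho."""
    rho = max(bj - aj for aj, bj in zip(a, b))
    return [(wj - (aj + bj) / 2) / rho for wj, aj, bj in zip(w, a, b)]

def from_standard(y, a, b):
    """Inverse transformation: from the standard cube back to the original domain."""
    rho = max(bj - aj for aj, bj in zip(a, b))
    return [yj * rho + (aj + bj) / 2 for yj, aj, bj in zip(y, a, b)]
```

Because every coordinate is divided by the same ρ, the Lipschitz constant is simply rescaled by ρ and the relative variability across dimensions is preserved, which is exactly why the standard presentation (1.2) can be kept.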
Numerical methods for finding solutions to the Lipschitz global optimization problem (1.1) have been widely discussed in the literature (see [12, 17, 22–24, 26, 27, 37, 39, 41, 45, 49, 51, 55, 58, 64, 78, 86, 87, 91, 93, 107, 108, 113, 117, 118, 120, 123, 130, 136, 140–142, 144, 147, 151, 152], etc.). There exist a number of generalizations of the problem (1.1). Among them, global optimization problems with multiextremal, non-differentiable, partially defined constraints deserve special attention. However, due to the format of this monograph (a Springer Brief), they are not considered here. The interested reader is advised to see the monograph [139] together with the publications [109, 119, 122–124, 134, 137, 140], where an original approach that does not require the introduction of penalties to deal with constraints has been proposed and used together with the space-filling curves.
The assumption (1.3) on the function F(y) says that the relative differences of F(y) are bounded by the constant L. This assumption is very practical because it can be interpreted as a mathematical description of the limited power of change present in real systems. If we suppose that the constant L is known, then this information can be successfully used to develop global optimization algorithms (see [22–24, 37, 39, 41, 51, 55, 58, 64, 86, 87, 91, 93, 117, 139, 144, 147, 151], etc.). From the theoretical point of view this supposition is certainly very useful. In practice, there exists a variety of techniques allowing one to approximate L (see [12, 26, 27, 45, 51, 78, 93, 107, 108, 113, 118, 120, 139, 140], etc.). Some of these techniques will be introduced and discussed in the subsequent chapters.
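As a first flavor of such techniques, one simple device (a generic sketch, not the specific estimators of the cited works) is to bound L from below by the largest divided difference observed among the trials and to inflate it by a reliability parameter r > 1:

```python
def estimate_lipschitz(points, values, r=1.5, xi=1e-8):
    """Estimate of the Lipschitz constant from trial data: the largest slope
    |F(y') - F(y'')| / ||y' - y''|| over all pairs of trial points,
    multiplied by a reliability parameter r > 1 (xi guards the zero case)."""
    best = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = sum((p - q) ** 2 for p, q in zip(points[i], points[j])) ** 0.5
            if dist > 0:
                best = max(best, abs(values[i] - values[j]) / dist)
    return r * max(best, xi)
```

Any finite sample can only underestimate L, which is why the factor r (and the small guard xi when all observed values coincide) is introduced.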
It is well known that Lipschitz global optimization algorithms (see, e.g., [22, 23, 38, 58, 59, 91, 93, 141, 142]) require, in general, substantially fewer function evaluations than the plain uniform grid technique. This happens because, in order to select each subsequent trial point (hereinafter, an evaluation of the objective function at a point is called a trial), Lipschitz global optimization methods use all the previously computed values of the objective function (see [139]). The one-dimensional case, N = 1, has been deeply studied, and powerful numerical algorithms allowing us to work with it efficiently have been proposed in the literature (see, e.g., [22, 23, 58, 59, 91, 93, 117, 118, 139]). In this case, after k executed trials there are no serious problems in choosing the point x_{k+1} of the next trial. In fact, this choice is reduced to selecting the minimal value among k values, each of which is usually easy to compute (see, e.g., [58, 59, 93, 117, 139]). If the dimension N > 1, then the relations between the location of the next evaluation point and the results of the already performed evaluations become significantly more complicated. Therefore, finding the point x_{k+1} that is optimal with respect to certain criteria is usually the most time-consuming operation of a multidimensional algorithm, and its complexity increases with the growth of the problem dimension.
This happens because an optimal selection of x_{k+1} turns into solving, at each step of the search process, an auxiliary multidimensional optimization problem whose multi-extremality increases along with the accumulation of the trial points. As a result, an algorithm aiming to effectively use the acquired search information to reduce the number of trials needed to estimate the sought optimum also includes an inherent multi-extremal optimization problem (see [58, 93, 117, 134, 139], where this subject is covered in much greater detail). But, as was already mentioned, the case N = 1 is effectively solvable. Therefore, it is of great interest to reduce the multivariate optimization problem (1.1) to a one-dimensional equivalent, which could then be effectively solved by using techniques developed for dealing with one-dimensional global optimization problems.
A possible way to do so (see, e.g., [9, 131–135, 139]) is to employ a single-valued Peano curve y(x) continuously mapping the unit interval [0, 1] from the x-axis onto the hypercube (1.2) and, thus, yielding the equality

F* = F(y*) = F(y(x*)) = min{F(y(x)) : x ∈ [0, 1]}.    (1.7)

As was already said in the previous section, these curves, first introduced by Peano in [88] and Hilbert in [62], fill the cube D, i.e., they pass through every point of D, thus giving the possibility to construct numerical univariate algorithms for solving the problem (1.7) and, therefore, the original problem (1.1). Putting this possibility into practice is the main goal of this monograph, which both describes how to build approximations to Peano curves on a computer and introduces a number of efficient global optimization algorithms using these approximations.
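A crude illustration of this reduction, using a standard Hilbert index-to-point routine as a computable stand-in for y(x), and plain uniform sampling in place of the adaptive univariate methods developed later in the book:

```python
def hilbert_point(order, d):
    """Standard iterative scheme: index d -> cell (i, j) of the
    2**order x 2**order grid along the Hilbert curve."""
    x = y = 0
    s, t = 1, d
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def minimize_via_curve(F, order=6):
    """Minimize F over [-1/2, 1/2]^2 by scanning the level-`order`
    curve approximation: the index d plays the role of x in (1.7)."""
    n = 2 ** order
    best_val, best_y = float("inf"), None
    for d in range(n * n):
        i, j = hilbert_point(order, d)
        y = ((i + 0.5) / n - 0.5, (j + 0.5) / n - 0.5)  # cell center in D
        v = F(y)
        if v < best_val:
            best_val, best_y = v, y
    return best_val, best_y

# example: a smooth test function with the minimizer (0.2, -0.1) inside D
val, argmin = minimize_via_curve(lambda y: (y[0] - 0.2) ** 2 + (y[1] + 0.1) ** 2)
```

The point of the book's algorithms is, of course, to place the one-dimensional trials adaptively instead of exhaustively; this sketch only shows that a single scalar index suffices to address the whole square.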
Chapter 2
Fig. 2.1 Case N = 2. Subcubes of the first partition (left picture) and of the second partition (right picture) of the initial cube D
Fig. 2.2 Case N = 2. Subintervals d(z1) of the first partition and subintervals d(z1, z2) of the second partition of the unit interval [0, 1] on the x-axis
Partition the cube D from (1.2) into 2^N equal subcubes D(z1); then partition each of them into 2^N equal subcubes D(z1, z2), and so on. The subcubes of the Mth partition satisfy the inclusions

D ⊃ D(z1) ⊃ D(z1, z2) ⊃ … ⊃ D(z1, …, zM) ⊃ … ,    (2.1.1)

where 0 ≤ z_j ≤ 2^N − 1, 1 ≤ j ≤ M.
Next, cut the interval [0, 1] on the x-axis into 2^N equal parts; each part is designated d(z1), 0 ≤ z1 ≤ 2^N − 1, the numeration streaming from left to right along the x-axis. Then, once again, cut each of the above parts into 2^N smaller (equal) parts, etc. Designate by d(z1, …, zM), 0 ≤ z_j ≤ 2^N − 1, 1 ≤ j ≤ M, the subintervals of the Mth partition; the length of any such interval is equal to 2^{−MN}. Assume that each interval contains its left end-point but does not contain its right end-point; the only exception is the case when the right end-point is equal to unity, which corresponds to the relations z1 = z2 = … = zM = 2^N − 1. Obviously,

[0, 1] ⊃ d(z1) ⊃ d(z1, z2) ⊃ … ⊃ d(z1, …, zM) ⊃ … ;    (2.1.2)

the case N = 2 is illustrated by Fig. 2.2 (for M = 1 and M = 2).
Any subinterval of the Mth partition may be presented in the form

d(z1, …, zM) = [v, v + 2^{−MN}),    (2.1.3)

where

0 ≤ v = Σ_{i=1}^{MN} α_i 2^{−i} < 1,    (2.1.4)

and the binary digits α_i of the left end-point v are linked to the indexes z_j by the relations

z_j = Σ_{i=0}^{N−1} α_{jN−i} 2^i,    1 ≤ j ≤ M.    (2.1.5)
In the sequel, the interval (2.1.3) will also be referred to as d(M, v). The relations
(2.1.4), (2.1.5) provide a basis for computing the parameters from one side of the
identity
d(M, v) = d(z1 , . . . , zM )
(2.1.6)
(i.e., M, v or z1 , . . . , zM ) being given the parameters from the other side of this
identity (i.e., z1 , . . . , zM or M, v).
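Both directions of the identity (2.1.6) reduce to regrouping the binary digits of v in blocks of N, since each index z_j is built from N consecutive digits of v; a sketch (function names are ours):

```python
def z_from_v(v, M, N):
    """Indexes z_1, ..., z_M of the subinterval d(M, v) = d(z_1, ..., z_M):
    the j-th block of N binary digits of v forms the integer z_j."""
    z = []
    for _ in range(M):
        v *= 2 ** N        # shift the next N binary digits before the point
        z_j = int(v)       # they form z_j, 0 <= z_j <= 2**N - 1
        z.append(z_j)
        v -= z_j
    return z

def v_from_z(z, N):
    """Left end-point v of d(z_1, ..., z_M): v = sum_j z_j * 2**(-j*N)."""
    return sum(z_j * 2.0 ** (-j * N) for j, z_j in enumerate(z, start=1))
```

For dyadic values of v the conversion is exact in floating point; for general v the routine returns the digits of the enclosing subinterval.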
Now, establish a mutually single-valued correspondence between all the subintervals of any particular Mth partition and all the subcubes of the same Mth partition by accepting that d(M, v) from (2.1.6) corresponds to D(z1, …, zM) and vice versa. The above subcube will also be designated D(M, v), i.e.,

D(M, v) = D(z1, …, zM),    (2.1.7)

where the indexes z1, …, zM have the same values as in (2.1.6) and can be computed through (2.1.3)–(2.1.5).
In accordance with (2.1.1) and (2.1.2), the introduced correspondence satisfies the following

Condition 1. D(M + 1, v′) ⊂ D(M, v″) if and only if d(M + 1, v′) ⊂ d(M, v″).

We also require this correspondence to satisfy the following

Condition 2. Two subintervals d(M, v′) and d(M, v″) have a common end-point (this point may only be either v′ or v″) if and only if the corresponding subcubes D(M, v′) and D(M, v″) have a common face (i.e., these subcubes must be contiguous).
Two linked systems of partitioning (i.e., the partitioning of the cube D from (1.2) and the partitioning of the unit interval [0, 1] on the x-axis) that meet the above two conditions provide the possibility of constructing the evolvent curve which may be employed in (1.7). Note that Condition 1 is already met, but Condition 2 has to be ensured by a special choice of the numeration of the subcubes D(z1, …, zM), M ≥ 1, which actually establishes the juxtaposition of the subcubes (2.1.7) to the subintervals (2.1.6). The particular scheme of such a numeration suggested in [132, 134] will be introduced in the next section.
Theorem 2.1. Let y(x) be the correspondence defined by the assumption that for any M ≥ 1 the image y(x) ∈ D(M, v) if and only if the inverse image x ∈ d(M, v). Then:

1. y(x) is a single-valued continuous mapping of the unit interval [0, 1] onto the hypercube D from (1.2); hence, y(x) is a space-filling curve.
2. If F(y), y ∈ D, is Lipschitzian with some constant L, then the univariate function F(y(x)), x ∈ [0, 1], satisfies the Hölder condition with the exponent 1/N and the coefficient 2L√(N + 3), i.e.,

|F(y(x′)) − F(y(x″))| ≤ 2L √(N + 3) (|x′ − x″|)^{1/N},  x′, x″ ∈ [0, 1].    (2.1.8)

Proof. Given x′, x″ ∈ [0, 1], take the largest integer M ≥ 0 satisfying

|x′ − x″| ≤ 2^{−MN}.    (2.1.9)
Therefore, either there is some interval d(M, v) containing both points x′, x″, or there are two intervals d(M, v′), d(M, v″) having a common end-point and containing x′ and x″ in their union. In the first case, y(x′), y(x″) ∈ D(M, v) and

|y_j(x′) − y_j(x″)| ≤ 2^{−M},  1 ≤ j ≤ N.    (2.1.10)

In the second case, y(x′) ∈ D(M, v′), y(x″) ∈ D(M, v″), but the subcubes D(M, v′) and D(M, v″) are contiguous due to Condition 2. This means that for some particular index k, 1 ≤ k ≤ N,

|y_k(x′) − y_k(x″)| ≤ 2^{−(M−1)},    (2.1.11)

but for all integer values j ≠ k, 1 ≤ j ≤ N, the estimate (2.1.10) is still true.
From (2.1.10) and (2.1.11),

‖y(x′) − y(x″)‖ = ( Σ_{l=1}^{N} [y_l(x′) − y_l(x″)]² )^{1/2} ≤ ( (N − 1) 2^{−2M} + 2^{−2(M−1)} )^{1/2} = 2^{−M} √(N + 3),

and, in consideration of (2.1.9), we derive the estimate

‖y(x′) − y(x″)‖ ≤ 2 √(N + 3) (|x′ − x″|)^{1/N},    (2.1.12)

whence it follows that the Euclidean distance between the points y(x′) and y(x″) vanishes as |x′ − x″| → 0. Finally, employ (1.3) and (2.1.12) to substantiate the relation (2.1.8) for the function F(y(x)), x ∈ [0, 1], which is the superposition of the Lipschitzian function F(y), y ∈ D, and the introduced space-filling curve y(x).
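The estimate (2.1.12) can be probed numerically. The sketch below uses, as an assumption, a standard Hilbert index-to-cell routine in place of a level-M approximation of y(x) (with N = 2) and checks the inequality over all pairs of curve points:

```python
import itertools
import math

def hilbert_point(order, d):
    """Standard iterative scheme: index d -> cell (i, j) on the 2**order grid."""
    x = y = 0
    s, t = 1, d
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def check_hoelder(order):
    """Check ||y(x') - y(x'')|| <= 2*sqrt(N + 3) * |x' - x''|**(1/N)
    for N = 2 over all pairs of points of the level-`order` curve."""
    n = 2 ** order
    K = n * n
    pts = [hilbert_point(order, d) for d in range(K)]
    coef = 2.0 * math.sqrt(2 + 3)          # 2 * sqrt(N + 3) with N = 2
    for i, j in itertools.combinations(range(K), 2):
        dist = math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]) / n
        dx = (j - i) / K                   # |x' - x''|
        if dist > coef * dx ** 0.5:        # exponent 1/N = 1/2
            return False
    return True
```

The constant 2√(N + 3) is far from tight for the Hilbert curve, so the discretized check passes with a comfortable margin.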
Once again, we recall that the space-filling curve (or Peano curve) y(x) is defined as a limit object emerging from a sequential construction. Therefore, in practical applications some appropriate approximations to y(x) are to be used. Particular techniques for computing such approximations (with any preset accuracy) are suggested and substantiated in [46, 132, 134, 138]. Some of them are presented in the next section.
From this definition, if two subintervals d(M, v′), d(M, v″) have a common end-point, then the corresponding vectors (z′1, …, z′M), (z″1, …, z″M) from (2.1.6) have to be adjacent.
Therefore, Condition 2 from Sect. 2.1 may be interpreted as the necessity for any two adjacent subcubes D(z′1, …, z′M), D(z″1, …, z″M) to have a common face (i.e., to be contiguous).
Introduce the auxiliary hypercube

Ω = {y ∈ R^N : −2^{−1} ≤ y_j ≤ 3 · 2^{−1}, 1 ≤ j ≤ N}    (2.2.1)

and partition it into 2^N subcubes Ω(s), 0 ≤ s ≤ 2^N − 1, of unit edge length, numbered so that the subcube Ω(s) has the center u(s) = (u_1(s), …, u_N(s)) with the binary coordinates

u_j(s) = (β_{N−j} + β_{N−j−1}) mod 2,  1 ≤ j < N,    u_N(s) = β_{N−1},    (2.2.2)

where β_j, 0 ≤ j < N, are the digits in the binary presentation of the number s:

s = β_{N−1} 2^{N−1} + … + β_0 2^0.    (2.2.3)
Theorem 2.2. The numeration of the subcubes Ω(s) set by the relations (2.2.2), (2.2.3) ensures that:

1. All the centers u(s), 0 ≤ s ≤ 2^N − 1, are different.
2. Any two centers u(s), u(s + 1), 0 ≤ s < 2^N − 1, are different in just one coordinate.
3.

u(0) = (0, …, 0, 0),  u(2^N − 1) = (0, …, 0, 1).    (2.2.4)
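In modern terms, (2.2.2) is a fixed permutation of the digits of the binary-reflected Gray code of s. A sketch (assuming the digit pairing u_j(s) = (β_{N−j} + β_{N−j−1}) mod 2 for j < N and u_N(s) = β_{N−1}, which for N = 2 reproduces the column u(s) of Table 2.1) that checks the three properties of Theorem 2.2:

```python
def center(s, N):
    """Center u(s) of the subcube Omega(s): u_j = (beta_{N-j} + beta_{N-j-1}) mod 2
    for j < N and u_N = beta_{N-1}, with beta_i the binary digits of s."""
    beta = [(s >> i) & 1 for i in range(N)]   # beta[i] = beta_i
    u = [(beta[N - j] + beta[N - j - 1]) % 2 for j in range(1, N)]
    u.append(beta[N - 1])                      # u_N(s) = beta_{N-1}
    return tuple(u)

def theorem22_holds(N):
    us = [center(s, N) for s in range(2 ** N)]
    distinct = len(set(us)) == 2 ** N                            # property 1
    gray = all(sum(a != b for a, b in zip(us[s], us[s + 1])) == 1
               for s in range(2 ** N - 1))                       # property 2
    ends = us[0] == (0,) * N and us[-1] == (0,) * (N - 1) + (1,) # property 3
    return distinct and gray and ends
```

Since u(s) is a bijective recoding of the Gray code, the three properties hold for every N, not only for the small cases checked here.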
Proof. 1. Consider the first statement and assume the opposite, i.e., that the relations (2.2.2) juxtapose the same center to two different numbers s, s′:

u(s) = u(s′),  s ≠ s′,  0 ≤ s, s′ ≤ 2^N − 1,    (2.2.5)

where, similarly to (2.2.3),

s′ = β′_{N−1} 2^{N−1} + … + β′_0 2^0.    (2.2.6)

From (2.2.2), (2.2.3), (2.2.6) and the first equality in (2.2.5) it follows that

β_{N−1} = β′_{N−1}    (2.2.7)

and

(β_j + β_{j−1}) mod 2 = (β′_j + β′_{j−1}) mod 2,  1 ≤ j < N.    (2.2.8)
The relations (2.2.8) imply that for any integer j, 1 ≤ j < N, either the equalities

β_j = β′_j,  β_{j−1} = β′_{j−1},    (2.2.9)

or the equalities

β_j = ¬β′_j,  β_{j−1} = ¬β′_{j−1},    (2.2.10)

have to be true; here ¬ is the negation symbol inverting the value of the binary digit (i.e., ¬0 = 1, ¬1 = 0).

Suppose that (2.2.9) is true for any integer j in the range 1 ≤ j ≤ k < N. This implies the validity of the relations

β_k = β′_k,  β_{k−1} = β′_{k−1}.    (2.2.11)

If k + 1 < N, then the conditions (2.2.10) could not be met for j = k + 1 because the equalities

β_{k+1} = ¬β′_{k+1},  β_k = ¬β′_k
1 ≤ j < N,    (2.2.13)

follows that

u_j(s + 1) = u_j(s),  1 ≤ j ≤ N,  j ≠ N − k + 1.    (2.2.14)
If k = 1, then

u_N(s + 1) = ¬β_{N−1} = ¬u_N(s);    (2.2.15)

(2.2.16)
To number the subcubes D(z1) of the first partition of the cube D, introduce the linear mapping

g(y) = 2y + p,  y ∈ R^N,    (2.2.17)

with

p = (2^{−1}, …, 2^{−1}) ∈ R^N,    (2.2.18)

for which

g(D) = Ω,    (2.2.19)

and assume that the subcube D(z1) has the number z1 = s if and only if D(z1) is the inverse image of Ω(s), i.e.,

g(D(z1)) = Ω(s).    (2.2.20)
To employ the above approach for numbering the second-partition subcubes D(z1, z2), 0 ≤ z2 ≤ 2^N − 1, within D(z1), 0 ≤ z1 ≤ 2^N − 1, we introduce the linear mappings

g(z1; y) = 2² {y − [u(z1) − p] 2^{−1}} + p

meeting the conditions

g(z1; D(z1)) = Ω,  0 ≤ z1 ≤ 2^N − 1,    (2.2.21)

and assume that the subcube D(z1, z2) from D(z1) has the index z2 = s if and only if D(z1, z2) is the inverse image of Ω(s), i.e.,

g(z1; D(z1, z2)) = Ω(s).    (2.2.22)
Note that u(z1) from (2.2.21) is the center of the subcube Ω(z1) juxtaposed to D(z1) by the relation (2.2.20).
The suggested numeration of the second-partition subcubes D(z1, z2) with indexes z2, 0 ≤ z2 ≤ 2^N − 1, by means of the scheme (2.2.21), (2.2.22) ensures the contiguity of any two adjacent subcubes D(z1, z2), D(z1, z2 + 1), 0 ≤ z2 < 2^N − 1, from the same cube D(z1). But there is still a problem to be solved. Any two subcubes D(z1, 2^N − 1) and D(z1 + 1, 0), 0 ≤ z1 < 2^N − 1, are adjacent too and, therefore, they should also have a common face. This means that there should be some special linkage between the numerations of the elements D(z1, z2), 0 ≤ z2 ≤ 2^N − 1, from different subcubes D(z1), 0 ≤ z1 ≤ 2^N − 1.
To provide the basis for such a linkage we, first, introduce a variety of numerations for the elements Ω(s), 0 ≤ s ≤ 2^N − 1. As follows from (2.2.4), the numeration defined by the rules (2.2.2), (2.2.3) ensures that the initial center u(0) is the zero vector and that the centers u(0) and u(2^N − 1) differ only in the Nth coordinate. The permutation of u_N and u_t in u(s), resulting in the vector designated

u^t(s) = (u_1(s), …, u_{t−1}(s), u_N(s), u_{t+1}(s), …, u_{N−1}(s), u_t(s)),

1 ≤ t ≤ N, does not change the initial vector, i.e.,

u^t(0) = u(0),  1 ≤ t ≤ N,

and moves the only nonzero coordinate in u(2^N − 1) to the t-th position, which means that u^t(0) and u^t(2^N − 1) are different only in the t-th coordinate:

u^t_i(2^N − 1) = { ¬u^t_i(0), i = t;  u^t_i(0), i ≠ t. }    (2.2.23)
Furthermore, the coordinate-wise addition

u^{tq}_i(s) = (u^t_i(s) + q_i) mod 2,  1 ≤ i ≤ N,    (2.2.24)

of a binary vector q = (q_1, …, q_N) ensures that the used vector q is the center of the initial subcube Ω(0), i.e.,

u^{tq}(0) = q.

Thereby, these two operations allow us to construct a numeration which assures that the initial subcube Ω(0) has the preset center q and that the centers of the subcubes Ω(0) and Ω(2^N − 1) differ only in the t-th coordinate, i.e., they satisfy the condition (2.2.23) for any preset integer t, 1 ≤ t ≤ N.
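The two operations can be sketched directly (helper names are ours); with the centers u(s) for N = 2 taken from Table 2.1 and the choice t = 1, q = (1, 1), they reproduce the z1 = 3 row of Table 2.2:

```python
def permute(u, t):
    """u^t: exchange the t-th and the N-th coordinates of u (t is 1-based)."""
    v = list(u)
    v[t - 1], v[-1] = v[-1], v[t - 1]
    return tuple(v)

def shift(u, q):
    """u^{tq}: coordinate-wise addition of the binary vector q modulo 2."""
    return tuple((ui + qi) % 2 for ui, qi in zip(u, q))

u_table = [(0, 0), (1, 0), (1, 1), (0, 1)]       # u(s) for N = 2 (Table 2.1)
row = [shift(permute(u, 1), (1, 1)) for u in u_table]
```

Note that shift(permute(u(0), t), q) = q for any t, confirming that the transformed numeration starts at the preset center q.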
(2.2.25)
(2.2.26)
We assume that the centers of the initial subcube D(z1, 0) and of the last subcube D(z1, 2^N − 1) from the cube D(z1), 0 ≤ z1 ≤ 2^N − 1, should also differ in just one coordinate and that the function (2.2.25) determines the number of this coordinate.
We also introduce a binary vector function w defined by

w_i(s + 1) = w_i(s) = { ¬u_i(s), i = 1;  u_i(s), 2 ≤ i ≤ N, }    (2.2.27)

where s is supposed to be an odd number, and

w(0) = u(0);    (2.2.28)

as already mentioned, ¬ is a negation sign inverting the value of the binary digit. The binary vector w(z1) is to be used for determining the position of the center of the subcube Ω(0) employed in (2.2.22) with z2 = 0, i.e., it will be used to set the value of the vector q embedded into (2.2.24).
Now, we suggest carrying out the numbering of the subcubes D(z1, z2), 0 ≤ z2 ≤ 2^N − 1, in the following way. From (2.2.25)–(2.2.28), compute the values l = l(z1), w = w(z1) for the given index z1, 0 ≤ z1 ≤ 2^N − 1. Select

t = l(z1),  q = w(z1)    (2.2.29)

and, using the above-described permutations and additions, determine the vectors u^{tq}(s), 0 ≤ s ≤ 2^N − 1; note that the vector function u^{tq}(s) may be different for different values of z1, 0 ≤ z1 ≤ 2^N − 1. Finally, we employ (2.2.22) to number the subcubes D(z1, z2), 0 ≤ z2 ≤ 2^N − 1, under the condition that the index s of
the subcube Ω(s) from the right-hand side of (2.2.22) has to be identical with the number of the center u^{tq}(s) of this subcube generated for the given index z1 in the above way.

Table 2.1 Case N = 2: binary digits, centers u(s), numbers l(s), and vectors w(s)

s   (β0, β1)   u(s) = (u1, u2)   l(s)   w(s) = (w1, w2)
0   (0, 0)     (0, 0)            1      (0, 0)
1   (1, 0)     (1, 0)            2      (0, 0)
2   (0, 1)     (1, 1)            2      (0, 0)
3   (1, 1)     (0, 1)            1      (1, 1)

Table 2.2 Case N = 2: centers u^{tq}(s), 0 ≤ s ≤ 3, generated for each index z1

z1   u^{tq}(0)   u^{tq}(1)   u^{tq}(2)   u^{tq}(3)
0    (0, 0)      (0, 1)      (1, 1)      (1, 0)
1    (0, 0)      (1, 0)      (1, 1)      (0, 1)
2    (0, 0)      (1, 0)      (1, 1)      (0, 1)
3    (1, 1)      (1, 0)      (0, 0)      (0, 1)

[Fig. 2.3: the subcubes D(0), …, D(3) with the labels l(0) = 1, l(1) = 2, l(2) = 2, l(3) = 1; the centers of the first- and second-partition subcubes are linked in the order of the numeration.]
Tables 2.1, 2.2 and Fig. 2.3 illustrate the role of the numbers l from (2.2.25), (2.2.26) and of the vectors w from (2.2.27), (2.2.28) in establishing the numeration of the second-partition subcubes D(z1, z2), which were already pictured in Fig. 2.1 (case N = 2, M = 2). Table 2.1 presents the vectors (β0, β1) with β0, β1 from (2.2.3), u(s) from (2.2.2), w(s) from (2.2.27), (2.2.28), and the number l(s) from (2.2.25), (2.2.26) as functions of s, 0 ≤ s ≤ 3. Table 2.2 contains the sets of the centers u^{tq}(s), 0 ≤ s ≤ 3, juxtaposed to the subcubes Ω(s) from (2.2.22) for each given index z1, 0 ≤ z1 ≤ 3. These centers are computed from (2.2.24) under the conditions (2.2.29).
Circles in Fig. 2.3 mark the centers of the first-partition subcubes D(z1), 0 ≤ z1 ≤ 3. These centers are linked with the dotted-line arrows in the order of
numeration. The corresponding centers u(s) juxtaposed to the subcubes Ω(s) from (2.2.20) while numbering D(z1), 0 ≤ z1 ≤ 3, are given in the third column of Table 2.1.
Red dots in Fig. 2.3 mark the centers of the second-partition subcubes D(z1, z2). Centers of the adjacent subcubes are linked with solid-line arrows streaming from the initial subcube D(0, 0) to the last subcube D(3, 3). The centers u^{tq}(s) juxtaposed to the subcubes Ω(s) from (2.2.22) while numbering D(z1, z2), 0 ≤ z2 ≤ 3, are given in Table 2.2 (each row of this table corresponds to some particular value of the first index z1).
The picture allows us to clarify the role of the vector w(z1) from the last column of Table 2.1, which indicates the position of the center of the initial subcube D(z1, 0) from the next partition of the cube D(z1). As is clear from the picture, the values l(z1) and w(z1) are coherent in such a way that the centers of the subcubes D(z1, 3) and D(z1 + 1, 0), 0 ≤ z1 < 3, differ in just one coordinate, though these adjacent subcubes belong to different cubes of the foregoing partition. The centers of such cubes are linked with thick arrows in Fig. 2.3.
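The coherence just described can be verified computationally. Taking the rows of Table 2.2 as data, the sketch below (the center formulas are ours, derived from the mappings (2.2.20)–(2.2.22)) computes the centers of all sixteen subcubes D(z1, z2) and checks that consecutive subcubes in the numeration always share a face:

```python
P = (0.5, 0.5)                                   # the vector p for N = 2
U = [(0, 0), (1, 0), (1, 1), (0, 1)]             # u(z1), Table 2.1
UTQ = [[(0, 0), (0, 1), (1, 1), (1, 0)],         # u^{tq}(s) rows, Table 2.2
       [(0, 0), (1, 0), (1, 1), (0, 1)],
       [(0, 0), (1, 0), (1, 1), (0, 1)],
       [(1, 1), (1, 0), (0, 0), (0, 1)]]

def center2(z1, z2):
    """Center of D(z1, z2): the first-partition center (u(z1) - p)/2
    shifted by (u^{tq}(z2) - p)/4 inside D(z1)."""
    return tuple((U[z1][i] - P[i]) / 2 + (UTQ[z1][z2][i] - P[i]) / 4
                 for i in range(2))

chain = [center2(z1, z2) for z1 in range(4) for z2 in range(4)]
# each pair of consecutive centers differs by 1/4 in exactly one coordinate,
# so the corresponding subcubes of edge 1/4 are contiguous
```

In particular, the cross-boundary pairs D(z1, 3), D(z1 + 1, 0) pass the same face test, which is exactly the linkage provided by l(z1) and w(z1).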
Let us now consider how to link the numerations in subsequent partitions. Note that the two cases already considered (numbering in the first partition and numbering in the second partition) were treated in somewhat different ways. In the first case we used the relations (2.2.17)–(2.2.20), juxtaposing the centers u(s) from (2.2.2), (2.2.3) to the cubes Ω(s). In the second case we employed the relations (2.2.21), (2.2.22) and the cubes Ω(s) juxtaposed to the centers u^{tq}(s) from (2.2.24), linked with the corresponding centers u(z1) due to (2.2.25)–(2.2.29); in fact, each center u^{tq}(z2) depends also on the value z1 (we use the short notation u^{tq}(s) just to compact the writing; this should not cause any confusion).
It is possible to unify both considered cases by introducing the linear mappings

g(z1, …, zM; y) = 2^{M+1} { y − Σ_{j=1}^{M} [u(z_j) − p] 2^{−j} } + p    (2.2.30)
and assuming that the subcube D(z1, …, zM, z_{M+1}) is characterized by the index z_{M+1} = s if and only if

g(z1, …, zM; D(z1, …, zM, z_{M+1})) = Ω(s);    (2.2.31)

here the cube Ω(s) is the one having the center u^{tq}(s) from (2.2.24) with

t = t(z_{M+1}) = { N, M = 0;  l(z_M), M > 0, }    (2.2.32)

and

q = q(z_{M+1}) = { (0, …, 0) ∈ R^N, M = 0;  w(z_M), M > 0. }    (2.2.33)
If M = 1, then the relations (2.2.30) and (2.2.32), (2.2.33) are, respectively, identical to the relations (2.2.21) and (2.2.29). If M = 0, which corresponds to the numeration in the first partition, then (2.2.30) is identical to (2.2.17), and the application of (2.2.24) in conjunction with (2.2.32), (2.2.33) yields

u^{tq}(s) = u(s),  0 ≤ s ≤ 2^N − 1.

Thus, (2.2.30), (2.2.31) together with (2.2.24), (2.2.32), (2.2.33), and (2.2.25)–(2.2.28) combine the rules for numbering in the first and in the second partitions.
Moreover, it is possible to generalize this scheme for any M > 1. The only amendment needed is to accept that the rule (2.2.24) transforming u(s) into u^{tq}(s) has to be appended with the similar transformations for the vector w(s) and the pointer l(s):

w^{tq}_i(s) = (w^t_i(s) + q_i) mod 2,  1 ≤ i ≤ N,   (2.2.34)

l^t(s) = { N, if l(s) = t;  t, if l(s) = N;  l(s), otherwise },   (2.2.35)

where t is the pointer used in the permutations yielding u^t(s) and w^t(s).
It has to be clarified that all the values u(zM), l(zM), w(zM) embedded into the right-hand sides of the expressions (2.2.27), (2.2.32), (2.2.33) to produce the subsequent auxiliary values w, t, q for the numeration in the next partition are functions of the corresponding values u, l, w generated in the foregoing partition. Once again, we stress that u^{tq}(zM+1), w^{tq}(zM+1), and l^t(zM+1) depend on z1, . . . , zM if M ≥ 1.
Theorem 2.3. The introduced system of linked numerations ensures the contiguity of any two adjacent subcubes from any Mth (M ≥ 1) partition of the cube D from (1.2); see [132].
Proof. 1. Consider any two adjacent subcubes D(z1) and D(z1 + 1), 0 ≤ z1 < 2^N − 1, of the first partition mapped by the correspondence (2.2.17) onto the auxiliary subcubes (z1) and (z1 + 1); see (2.2.20). As already proved in Theorem 2.2, the centers u(z1), u(z1 + 1), 0 ≤ z1 < 2^N − 1, of the subcubes (z1), (z1 + 1) differ in just one coordinate if they are numbered in accordance with the rules (2.2.2), (2.2.3). That is, the subcubes (z1), (z1 + 1) have to be contiguous and, therefore, the corresponding cubes D(z1), D(z1 + 1) are contiguous too.
Suppose now that the Theorem is true for any adjacent subcubes of the kth partition of the cube D, where 1 ≤ k ≤ M. Then it is left to prove that it is also true for the adjacent subcubes of the (M + 1)st partition.
As long as, for the given z1, 0 ≤ z1 ≤ 2^N − 1, the set of all the subcubes D(z1, z2, . . . , zM+1) constitutes the Mth partition of the cube D(z1), then, due to the assumption, all the adjacent subcubes D(z1, z2, . . . , zM+1) from D(z1) are contiguous. Therefore, it is left to consider the pairs of adjacent subcubes

D(z1, 2^N − 1, . . . , 2^N − 1),  D(z1 + 1, 0, . . . , 0),  0 ≤ z1 < 2^N − 1,   (2.2.36)

belonging to different cubes of the first partition. Their centers are given by

y(z1, . . . , zM+1) = Σ_{j=1}^{M+1} [u^{tq}(z_j) − p] 2^{−j},   (2.2.37)

and we have to demonstrate that

y_i(z1, 2^N − 1, . . . , 2^N − 1) − y_i(z1 + 1, 0, . . . , 0) = { 0, i ≠ l;  ±2^{−(M+1)}, i = l };   (2.2.38)

i.e., the centers of the cubes from (2.2.36) have to differ in just one, the lth, coordinate, and the absolute difference in this coordinate has to be equal to the edge length of an (M + 1)st partition subcube. We proceed with computing the estimate for the left-hand side of (2.2.38) for the accepted system of numeration.
2. Introduce the notations u(z1, . . . , zM; zM+1), w(z1, . . . , zM; zM+1) for the vectors u^{tq}(zM+1), w^{tq}(zM+1) corresponding to the particular subcube D(z1, . . . , zM, zM+1) from the cube D(z1, . . . , zM).
Suppose that z1 = 2k − 1, 1 ≤ k ≤ 2^{N−1} − 1, i.e., z1 is an odd number and z1 < 2^N − 1, and consider the sequence of indexes z1, z2, . . . ; z_j = 2^N − 1, j ≥ 2.
First, we study the sequence of numbers t(z_j), j ≥ 1, corresponding to the introduced sequence of indexes. From (2.2.32),

t(z1) = N,   (2.2.39)

(2.2.40)

t(z_j) = { 1, j = 2κ + 1, κ ≥ 1;  l(z1), j = 2;  N, j = 1, j = 2κ, κ ≥ 2 }.   (2.2.41)
From (2.2.33), q(z1) = (0, . . . , 0) and, with account of (2.2.24), (2.2.33), (2.2.34) and (2.2.41), we derive the relations

u^{tq}(z1) = u(z1),   (2.2.42)

{ w_i(z1), i = l(z1);  1 − w_i(z1), i ≠ l(z1) },  1 ≤ i ≤ N,   (2.2.43)

{ u_i(z1), i = 1, i = l(z1);  1 − u_i(z1), otherwise },  1 ≤ i ≤ N.   (2.2.44)
In the analogous way, from (2.2.4), (2.2.27), (2.2.33), (2.2.34), (2.2.40), and (2.2.41), obtain

q_i(z3) = w_i(z1; 2^N − 1) = { u_i(z1), i = l(z1);  1 − u_i(z1), i ≠ l(z1) },  1 ≤ i ≤ N.   (2.2.45)

(2.2.46)

w_i(z1, 2^N − 1; 2^N − 1) = { u_i(z1), l(z1) = i = 1, l(z1) = i = N;  1 − u_i(z1), otherwise },   (2.2.47)
This means that each subsequent repetition of the above discourse will just add one more parameter (equal to 2^N − 1) into the left-hand side of (2.2.47). Therefore, for any M > 1,

u(z1, 2^N − 1, . . . , 2^N − 1; 2^N − 1) = u(z1; 2^N − 1),

which being substituted into (2.2.37) yields

y(z1, . . . , zM, zM+1) = y(z1, 2^N − 1, . . . , 2^N − 1) = ½ {u(z1) + (1 − 2^{−M}) u(z1; 2^N − 1) − (2 − 2^{−M}) p}.   (2.2.48)
(2.2.49)

3. Suppose now that z1 is an even number, 0 < z1 < 2^N − 1, and consider again the sequence of indexes z1, z2, . . . ; z_j = 2^N − 1, j ≥ 2. In this case,

q(z1 + 1) = (0, . . . , 0) ,   (2.2.50)

q(z2) = w(z1) ,   (2.2.51)

1 ≤ t ≤ N,

(2.2.52)

(2.2.53)

u_i(z1; 2^N − 1) = w_i(z1),  i = l(z1),  1 ≤ i ≤ N.   (2.2.54)
u_i(z1; 2^N − 1) = { 1 − u_i(z1), i = 1;  u_i(z1), i ≠ 1 },  1 ≤ i ≤ N,   (2.2.55)

{ u_i(z1), i = 1;  1 − u_i(z1), i ≠ 1 },   (2.2.56)

q(z3) = u(z1),   (2.2.57)

(2.2.58)

(2.2.59)

q(z5) = u(z1),   (2.2.60)

1 ≤ i ≤ N.   (2.2.61)

u_i(z1 + 1) = { 1 − u_i(z1), i = 1;  u_i(z1), i ≠ 1 },  1 ≤ i ≤ N.   (2.2.62)
Recall on this occasion that z1 is an even integer. Therefore, due to (2.2.61), we obtain that w(z1 + 1) = u(z1). The last equality, in conjunction with (2.2.60) and (2.2.37), implies

y(z1 + 1, z2, . . . , zM+1) = y(z1 + 1, 0, . . . , 0) = ½ {u(z1 + 1) + (1 − 2^{−M}) u(z1) − (2 − 2^{−M}) p}.   (2.2.63)

Now, from (2.2.48) and (2.2.63) follows the validity of (2.2.38) also for even indexes z1 > 0 because, due to (2.2.55), (2.2.62),

u(z1; 2^N − 1) = u(z1 + 1)

and the vectors u(z1; 2^N − 1) and u(z1) differ only in the first coordinate (i.e., l = 1); see (2.2.55).
4. Suppose that z1 = 0 and consider the sequence of indexes z1, z2, . . . ; z_j = 2^N − 1, j ≥ 2. In this case, from (2.2.26), (2.2.32) and (2.2.35) follows the relation for the parameter t in the operation of permutation

t(z_j) = { 1, j = 2κ, κ ≥ 1;  N, j = 2κ + 1, κ ≥ 0 }.   (2.2.64)
t(z4) = t(z2) = 1,

u(0, 2^N − 1, 2^N − 1; 2^N − 1) = u(0, 2^N − 1; 2^N − 1) = u(0; 2^N − 1) = u(1),

i.e., the case j = 4 is the reproduction of the state of discourse at j = 2. Therefore, for any M > 1,

u(0, 2^N − 1, . . . , 2^N − 1; 2^N − 1) = u(0; 2^N − 1) = u(1) ;   (2.2.65)

(2.2.66)

(2.2.67)

1 ≤ j ≤ N.
This allows us to outline the following scheme for computing the approximation y(z1, . . . , zM) for any point y(x), x ∈ [0, 1], with the preset accuracy ε, 0 < ε < 1:
1. Select an integer M ≥ ln(1/ε)/ln 2 + 1 (so that 2^{−M} < ε).
2. Detect the interval d(M, v) containing the inverse image x, i.e., x ∈ d(M, v) = [v, v + 2^{−MN}], and estimate the indexes z1, . . . , zM from (2.1.4), (2.1.5).
3. Compute the center y(z1, . . . , zM) from (2.2.37). This last operation is executed by sequential estimation of the centers u^{tq}(z_j), 1 ≤ j ≤ M, from (2.2.24) with t from (2.2.32), (2.2.35) and q from (2.2.33), (2.2.34).
In all the above numerical examples the curve y(x) was approximated by (2.2.37)
at N = 2, M = 10.
Remark 2.1. The centers (2.2.37) constitute a uniform orthogonal net of 2^{MN} nodes in the hypercube D with mesh width equal to 2^{−M}. Therefore, all the points x ∈ d(z1, . . . , zM) have the same image y(z1, . . . , zM). But in some applications it is preferable to use a one-to-one continuous correspondence l_M(x) approximating the Peano curve y(x) with the same accuracy as is ensured by the implementation of (2.2.37).
A piecewise-linear curve of this type is now described; it maps the interval [0, 1] into (not onto) the cube D, but it covers the net constituted by the centers (2.2.37).
Establish the numeration of all the intervals (2.1.3) constituting the Mth partition of the interval [0, 1] by subscripts in increasing order of the coordinate:

d(z1, . . . , zM) = [v_i, v_i + 2^{−MN}),  0 ≤ i ≤ 2^{MN} − 1 .

Next, assume that the center y(z1, . . . , zM) of the hypercube D(z1, . . . , zM) is assigned the same number (the superscript) as the number of the subinterval d(z1, . . . , zM) corresponding to this subcube, i.e.,

y^i = y(z1, . . . , zM),  0 ≤ i ≤ 2^{MN} − 1 .

This numeration ensures that any two centers y^i, y^{i+1}, 0 ≤ i < 2^{MN} − 1, correspond to contiguous hypercubes (see Condition 2 from Sect. 2.1), which means that they differ in just one coordinate.
Consider the following curve l(x) = l_M(x) mapping the unit interval [0, 1] into the hypercube D from (1.2):

l(x) = y^i + (y^{i+1} − y^i)[(w(x) − v_i)/(v_{i+1} − v_i)],   (2.2.68)

0 ≤ x ≤ 1 .   (2.2.69)
The image of each interval

[v_i, v_{i+1}],  0 ≤ i < 2^{MN} − 1,   (2.2.70)

generated by this curve is the linear segment connecting the nodes y^i, y^{i+1} and, thus, l(x), 0 ≤ x ≤ 1, is the piecewise-linear curve running through the centers y^i, 0 ≤ i ≤ 2^{MN} − 1, in the order of the established numeration. The curve l(x) = l_M(x) is henceforth referred to as a Peano-like piecewise-linear evolvent because it approximates the Peano curve y(x) from Theorem 2.1 with accuracy not worse than 2^{−M} in each coordinate; note that M is the parameter of the family of curves (2.2.68), as it determines the number and the positions of the nodes (2.2.37) used in the construction of l(x). For the sake of illustration, Fig. 2.4 presents the image of the interval [0, 1] generated by l(x) at N = 2, M = 3 (the corresponding centers y^i, 0 ≤ i ≤ 63, are marked by red dots).
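Formula (2.2.68) acts coordinate-wise: on each subinterval it is plain linear interpolation between the corresponding coordinates of the nodes y^i and y^{i+1}. A one-coordinate C sketch (taking w(x) simply as x; evolvent_coord is an illustrative name):

```c
#include <assert.h>

/* One coordinate of a segment of the piecewise-linear evolvent (2.2.68):
   linear interpolation between consecutive nodes y^i, y^{i+1}
   over the subinterval [v_i, v_{i+1}]. */
double evolvent_coord(double yi, double yi1, double x, double vi, double vi1) {
    return yi + (yi1 - yi) * (x - vi) / (vi1 - vi);
}
```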
Remark 2.2. The expressions (2.2.68), (2.2.69) allow us to determine the point l(x) for any given x ∈ [0, 1] by, first, estimating the difference

{ 0, k ≠ λ;  [u^{tq}_k(zM) − 2^{−1}] 2^{−(M−1)}, k = λ, zM ≠ 2^N − 1;  …, k = λ, zM = 2^N − 1 },

where u^{tq}(zM) is from (2.2.37). Now, it is left to outline the scheme for computing the number λ.
Represent the sequence z1, . . . , zM as z1, . . . , zν, zν+1, . . . , zM, where 1 ≤ ν ≤ M and zν ≠ 2^N − 1, zν+1 = . . . = zM = 2^N − 1; note that the case z1 = . . . = zM = 2^N − 1 is impossible because the center y(2^N − 1, . . . , 2^N − 1) does not coincide with the node y^q, q = 2^{MN} − 1. As it follows from the construction of y(x), the centers

y(z1, . . . , zν, 2^N − 1, . . . , 2^N − 1)  and  y(z1, . . . , zν−1, zν + 1, 0, . . . , 0)

corresponding to the adjacent subcubes differ in the same coordinate as the auxiliary centers

u(z1, . . . , zν−1; zν)  and  u(z1, . . . , zν−1, zν + 1) ;

see the notations introduced in the second clause of the proof of Theorem 2.3. Therefore, if zν is an odd number, then, in accordance with (2.2.25),

λ(z1, . . . , zM) = l(z1, . . . , zν−1; zν) .

If zν is even, then from (2.2.26), (2.2.32), (2.2.62), and the permutation rule,

λ(z1, . . . , zM) = { 1, t = N;  N, t = 1 }
(2.2.71)
which are similar to (2.1.9); then the justification of the relations (2.1.9) is just a reproduction of the corresponding discourse from the proof of Theorem 2.1.
Suppose that the conditions (2.2.71) are met at n ≤ M. If the points x′, x″ are from the same interval (2.2.70) and the corresponding images l(x′), l(x″) belong to the same linear segment connecting the nodes y^i, y^{i+1}, which differ in just one coordinate, then from (2.2.68), (2.2.69), and (2.2.71),
‖l(x′) − l(x″)‖ = 2^{MN} ‖y^i − y^{i+1}‖ |x′ − x″| ≤ 2^{M(N−1)} 2^{−nN} = 2 · 2^{−(n+1)} 2^{(M−n)(N−1)} ≤ 2 (|x′ − x″|)^{1/N}   (2.2.72)

because

‖y^i − y^{i+1}‖ = 2^{−M} .
If the points l(x′), l(x″) belong to two different linear segments linked at the common end-point y^{i+1}, then

‖l(x′) − l(x″)‖ ≤ ‖l(x′) − y^{i+1}‖ + ‖y^{i+1} − l(x″)‖ =
= ‖y^{i+1} − y^i‖ [1 − (w(x′) − v_i) 2^{MN}] + ‖y^{i+2} − y^{i+1}‖ [(w(x″) − v_{i+1}) 2^{MN}] =
= 2^{M(N−1)} (|w(x′) − v_{i+1}| + |w(x″) − v_{i+1}|) ≤
≤ 2 · 2^{M(N−1)} |x′ − x″| < 2^{M(N−1)} 2^{−nN} < 2 (|x′ − x″|)^{1/N},
which is equivalent to (2.2.72). Therefore, in consideration of the function g(y), y ∈ D, being Lipschitzian, we obtain the relation

x′, x″ ∈ [0, 1] .   (2.2.73)

(2.2.74)
Fig. 2.5 Piecewise-linear curves covering the set (2.2.75) at N = 2, M = 3: spiral (the left picture)
and TV evolvent (the right picture); nodes of the set (2.2.75) are marked by the red dots
We confine our consideration to the class of piecewise-linear curves and characterize the complexity of any particular curve from this family by the number of linear segments it is built of (each linear segment is assumed to be parallel to one of the coordinate axes). The spiral and TV evolvents (the images of the unit interval [0, 1] generated by these curves are given in Fig. 2.5; case N = 2, M = 3) clearly belong to this family, and they are much simpler than the Peano-like curve l(x); see Fig. 2.4 (in both figures the nodes of the grid (2.2.75) are marked with red dots). For example, the TV evolvent can be presented in the parametric form t(x), 0 ≤ x ≤ 1, by the following coordinate functions:

t1(x) = (−1)^{q+1} 2^{−1} {2^{−M} − 1 + |…|},
t2(x) = 2^{−1} {(1 + 2q) 2^{−M} − 1 + |…|},   (2.2.75)
whence it follows that

‖s(x′) − s(x″)‖ > 2^{M(1−1/N)−1} (|x′ − x″|)^{1/N} .

This means that there does not exist any coefficient that ensures the validity of a relation similar to (2.1.12), (2.2.72) and does not depend on M.
The Peano curve y(x) is defined as a limit object and, therefore, only approximations to this curve are applicable in actual computing. The piecewise-linear evolvent l(x) = l_M(x) suggested above covers all the nodes of the grid H(M, N) from (2.2.74) and, thus, allows us to ensure the required accuracy in analyzing multidimensional problems by solving their one-dimensional images produced by the implementation of l(x). But this evolvent has some deficiencies.
The first one is due to the fact that the grid H(M + δ, N) with mesh width equal to 2^{−(M+δ)} does not contain the nodes of the coarser grid H(M, N). Therefore, in general, the point l_M(x′) may not be covered by the curve l_{M+δ}(x), 0 ≤ x ≤ 1, and, hence, the outcomes already obtained while computing the values F(l_M(x)) will not be of any use if the demand for greater accuracy necessitates switching to the curve l_{M+δ}(x), δ ≥ 1. This difficulty can be overcome by setting the parameter M equal to a substantially larger value than seems sufficient at the beginning of the search.
Another problem arises from the fact that l(x) is a one-to-one correspondence between the unit interval [0, 1] and the set {l(x) : 0 ≤ x ≤ 1} ⊂ D, though the Peano curve y(x) has a different property: a point y ∈ D = {y(x) : 0 ≤ x ≤ 1} could have several inverse images in [0, 1] (but not more than 2^N). That is, the points y ∈ D can be characterized by their multiplicity with respect to the correspondence y(x). This is due to the fact that, though each point x ∈ [0, 1] is contained in just one subinterval of any Mth partition, some subcubes corresponding to several different subintervals of the same Mth partition (e.g., all the subcubes of the first partition) could have a common vertex. Therefore, some different inverse images x′, x″ ∈ [0, 1], x′ ≠ x″, could have the same image, i.e., y(x′) = y(x″).
This multiplicity of points y ∈ D with respect to the correspondence y(x) is a fundamental property reflecting the essence of the notion of dimensionality: the segment [0, 1] and the cube D are sets of equal cardinality, and the first one can be mapped onto the other by some single-valued mapping; but if this mapping is continuous, then it cannot be univalent (i.e., it cannot be a one-to-one correspondence), and the dimensionality N of the hypercube D determines the upper bound (2^N) for the maximal possible multiplicity of y(x).
Therefore, the global minimizer y* of the function F(y) over D could have several inverse images x*_i, 1 ≤ i ≤ m, i.e., y* = y(x*_i), 1 ≤ i ≤ m, which are the global minimizers of the function F(y(x)) over [0, 1].
To overcome the above deficiencies of l(x), we suggest one more evolvent n(x) = n_M(x) mapping some uniform grid in the interval [0, 1] onto the grid P(M, N) in the hypercube D from (1.2) having mesh width equal to 2^{−M} (in each coordinate) and meeting the condition

P(M, N) ⊂ P(M + 1, N) .   (2.2.76)
The evolvent n(x) approximates the Peano curve y(x), and its points in D possess the property of multiplicity: each node of the grid P(M, N) could have several (but not more than 2^N) inverses in the interval [0, 1].
Construction of n(x). Assume that the set of nodes of P(M, N) coincides with the set of vertices of the hypercubes D(z1, . . . , zM) of the Mth partition. Then the mesh width of such a grid is equal to 2^{−M} and the total number of nodes in P(M, N) is (2^M + 1)^N. As long as the vertices of the Mth partition hypercubes are also vertices of some hypercubes of any subsequent partition M + δ, δ ≥ 1, the inclusion (2.2.76) is valid for the suggested grid.
Note that each of the 2^N vertices of any hypercube D(z1, . . . , zM) of the Mth partition is simultaneously the vertex of just one hypercube D(z1, . . . , zM, zM+1) from D(z1, . . . , zM). Denote by n(z1, . . . , zM+1) the common vertex of the hypercubes

D(z1, . . . , zM, zM+1) ⊂ D(z1, . . . , zM) .   (2.2.77)
Due to (2.2.37), the center of the hypercube from the left-hand part of (2.2.77) and the center of the hypercube from the right-hand part of (2.2.77) are linked by the relation

y(z1, . . . , zM+1) = y(z1, . . . , zM) + (u^{tq}(zM+1) − p) 2^{−(M+1)},

whence it follows that

n(z1, . . . , zM+1) = y(z1, . . . , zM) + (u^{tq}(zM+1) − p) 2^{−M},   (2.2.78)

and varying zM+1 from 0 to 2^N − 1 results in computing from (2.2.78) all the 2^N vertices of the hypercube D(z1, . . . , zM).
Formula (2.2.78) establishes a single-valued correspondence between the 2^{(M+1)N} intervals d(z1, . . . , zM+1) of the (M + 1)st partition of [0, 1] and the (2^M + 1)^N nodes n(z1, . . . , zM+1) of the grid P(M, N); this correspondence is obviously not a univalent (not one-to-one) correspondence.
Number all the intervals d(z1, . . . , zM+1) from left to right with the subscript i, 0 ≤ i ≤ 2^{(M+1)N} − 1, and denote by v_i, v_{i+1} the end-points of the ith interval. Next, introduce the numeration of the centers y^i = y(z1, . . . , zM+1) from (2.2.37), assuming that the center corresponding to the hypercube D(z1, . . . , zM+1) is assigned the same number i as the number of the interval d(z1, . . . , zM+1) = [v_i, v_{i+1}). Thus, we have defined the one-to-one correspondence of the nodes

v_i,  0 ≤ i ≤ 2^{(M+1)N} − 1,   (2.2.79)

constituting a uniform grid in the interval [0, 1], and of the centers y^i which, in accordance with (2.2.78), generates the correspondence of the end-points v_i from (2.2.79) and of the nodes p ∈ P(M, N).
Note that if the centers y^i and y^{i+1} are from the same hypercube of the Mth partition, then these centers (and consequently the corresponding points v_i and v_{i+1})
are juxtaposed with different nodes from P(M, N). Therefore, the node p may be juxtaposed to the points v_i, v_{i+1} if and only if the corresponding centers y^i and y^{i+1} are from different (but adjacent) subcubes of the Mth partition. As long as the number of subcubes in the Mth partition of D is equal to 2^{MN}, there are exactly 2^{MN} − 1 pairs v_i, v_{i+1} juxtaposed with the same node from P(M, N) (in general, this node is different for different pairs v_i, v_{i+1} of the above type).
To ensure that any two vicinal nodes in [0, 1] are juxtaposed with different nodes from P(M, N), we substitute each of the above pairs v_i, v_{i+1} with just one node in [0, 1]. Next, we rearrange the collocation of nodes in [0, 1] to keep up the uniformity of the grid.
To do so, we construct the uniform grid in the interval [0, 1] with the nodes

h_j,  0 ≤ j ≤ 2^{(M+1)N} − 2^{MN} = q,   (2.2.80)

where h_0 = 0 and h_q = 1, and juxtapose to the node h_j of the grid (2.2.80) the node v_i of the grid (2.2.79), where

i = j + ⌊(j − 1)/(2^N − 1)⌋ .   (2.2.81)

Next, we assume that the node h_j is juxtaposed with the node of the grid P(M, N) generated by (2.2.78) for the center y^i with i from (2.2.81). This mapping of the uniform grid (2.2.80) in the interval [0, 1] onto the grid P(M, N) in the hypercube D will be referred to as the Non-Univalent Peano-like Evolvent (NUPE, for short) and designated n(x) = n_M(x).
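The integer index map (2.2.81) and the map j = i − ⌊i/2^N⌋ used below when enumerating inverse images can be checked directly; a small C sketch (function names are hypothetical; C integer division truncates toward zero, which for j = 0 yields i = 0 as needed):

```c
#include <assert.h>

/* Index maps linking the uniform grid nodes h_j in [0,1] with the interval
   end-points v_i: i = j + floor((j-1)/(2^N - 1)) per (2.2.81), and its
   left inverse j = i - floor(i/2^N). */
int i_from_j(int j, int N) { return j + (j - 1) / ((1 << N) - 1); }
int j_from_i(int i, int N) { return i - i / (1 << N); }
```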
For the sake of illustration, Fig. 2.6 presents the nodes of the grid P(2, 2) (marked by red dots); each node is assigned the numbers j of the points h_j from (2.2.80) mapped onto this node of the grid P(2, 2). These numbers are plotted around the relevant nodes.
The set Y(p), p ∈ P(M, N), is the set of all centers of the (M + 1)st partition subcubes from D generating the same given node p ∈ P(M, N). If the center y ∈ Y(p) is assigned the number i, i.e., if y = y^i, then it corresponds to the node v_i of the grid (2.2.79), and being given this number i it is possible to compute the number j = i − ⌊i/2^N⌋ of the corresponding node h_j from the grid (2.2.80). The different nodes h_j obtained as the result of these computations performed for all centers y ∈ Y(p) constitute the set of all inverse images of the node p ∈ P(M, N) with respect to n(x). From (2.1.4), (2.1.5), it follows that the point v_i corresponding to the center y^i = y(z1, . . . , zM+1) can be estimated from the expression

v_i = Σ_{j=1}^{M+1} z_j 2^{−jN} .
File x to y.c
#include <math.h>
if ( key == 2 ) {
d=d*(1.0-1.0/mne); k=0;
} else
if ( key > 2 ) {
dr=mne/nexp;
dr=dr-fmod(dr,1.0);
dd=mne-dr;
dr=d*dd;
dd=dr-fmod(dr,1.0);
dr=dd+(dd-1.0)/(nexp-1.0);
dd=dr-fmod(dr,1.0);
d=dd*(1./mne);
}
i=is;
node(i);
i=iu[0];
iu[0]=iu[it];
iu[it]=i;
i=iv[0];
iv[0]=iv[it];
iv[it]=i;
if ( l == 0 )
l=it;
else if ( l == it ) l=0;
if ( (iq>0)||((iq==0)&&(is==0)) ) k=l;
else if ( iq<0 ) k = ( it==n1 ) ? 0 : n1;
if ( key == 2 ) {
if ( is==(nexp-1) ) i=-1;
else i=1;
p=2*i*iu[k]*r*d;
p=y[k]-p;
y[k]=p;
} else if ( key == 3 ) {
for ( i=0; i<n; i++ ) {
p=r*iu[i];
p=p+y[i];
y[i]=p;
}
iv[n1]=1;
} else {
iff=nexp;
k1=-1;
for ( i=0; i<n; i++ ) {
iff=iff/2;
if ( is >= iff ) {
if ( (is==iff)&&(is != 1) ) { l=i; iq=-1; }
is=is-iff;
k2=1;
}
else {
k2=-1;
if ( (is==(iff-1))&&(is!= 0) ) { l=i; iq=1; }
}
j=-k1*k2; /* sign for the ith coordinate */
iv[i]=j;
iu[i]=j;
k1=k2;
}
iv[l]=iv[l]*iq;
iv[n1]=-iv[n1];
}
}
File y to x.c
#include <math.h>
#include "map.h"
static void xyd ( double *, int, float *, int );   /* get a preimage */
static void numbr ( int *iss );
extern int n1,nexp,l,iq,iu[10],iv[10];
double del;
void
invmad ( int m, double xp[], int kp, int *kxx, float p[], int n,
int incr ) {
/*
preimages calculation
- m - map level (number of partitioning)
- xp - preimages to be calculated
- kp - number of preimages that may be calculated (size of xp)
- kxx - number of preimages being calculated
- p - image for which preimages are calculated
- n - dimension of image (size of p)
- incr - minimum number of map nodes that must be between
preimages
*/
double mne,d1,dd,x,dr;
float r,d,u[10],y[10];
int i,k,kx,nexp;
void xyd ( double *, int, float *, int );
kx=0;
kp--;
for ( nexp=1,i=0; i<n; i++ ) { nexp*=2; u[i]=-1.0; }
dr=nexp;
for ( mne=1, r=0.5, i=0; i<m; i++ ) { mne*=dr; r*=0.5; }
dr=mne/nexp;
dr=dr-fmod(dr,1.0);
del=1./(mne-dr);
d1=del*(incr+0.5);
for ( kx=-1; kx<kp; ) {
for ( i=0; i<n; i++ ) { /* label 2 */
d=p[i];
y[i]=d-r*u[i];
}
xp[k+1]=x;
}
*kxx=++kx;
}
i=iu[0];
iu[0]=iu[it];
iu[it]=i;
numbr ( &is );
i=iv[0];
iv[0]=iv[it];
iv[it]=i;
for ( i=0; i<n; i++ )
iw[i]=-iw[i]*iv[i];
if ( l == 0 ) l=it;
else if ( l == it ) l=0;
it=l;
r1=r1/nexp;
x+=r1*is;
*xx=x;
}
if ( is == 0 ) l=n1;
else {
iv[n1]=-iv[n1];
if ( is == (nexp-1) ) l=n1;
else if ( l1 == n1 ) iv[l]=-iv[l]; else l=l1;
}
*iss=is;
}
Example 2.1. We consider in Fig. 2.7 an approximation of the Peano curve, the piecewise-linear evolvent (obtained with key=2), in dimension N = 2 and with level M = 3. The points 1, 2 and 3 have, respectively, the following coordinates in the interval [0, 1] and in the domain D:

1 : (…)       (0.3125, 0.4326)
2 : (0.0320)  (0.3125, 0.3145)
3 : (0.2060)  (0.3125, 0.1848)

Fig. 2.8 Several rotated Peano curves can help to construct one-dimensional schemes better representing the information about vicinity of the points in the multidimensional domain
It can be seen that in the domain D the point 2 is equidistant from the points 1 and 3, whereas on the interval [0, 1] the point 2 is significantly more distant from the point 3 than from the point 1.
It has been recently shown (see [6]) that the simultaneous use, in the one-dimensional reduction, of several curves rotated with respect to the search domain (see Fig. 2.8) makes it possible to obtain a better representation of the information about the vicinity of points in the multidimensional domain.
Example 2.2. The main function suggested here presents a test illustrating the way in which the functions are to be called. First, it computes the image P_M(0.5) at M = 10, N = 2 and estimates all the inverse images for the obtained image; then it generates the images for all three estimated inverse images, demonstrating that in all three cases the same image (0, 0) is obtained. Second, it computes the image P_M(0.55) at M = 14, N = 3 and regenerates eight inverse images, which is followed by the generation of eight images (all the same).
File test.c
#include <conio.h>
#include <stdio.h>
#include "map.h"
#define KEY 3
main() {
double x, xp[32], d, del;
int i,j,n,m,kp,kx;
float y[10];
/* Initialization */
clrscr();
n=2;
m=5;
y[0] = 0.0;
y[1] = 0.0;
printf("\t\tTesting the map modules\n\n");
/* parameters input */
printf("Input dimension (2-5) - ");
scanf("%d",&n);
printf("Input map level (n*m< ) - ");
scanf("%d",&m);
printf("Input a preimage value - ");
scanf("%lf",&x);
/* calculation and output */
printf("\n\t\tCalculation results\n\n");
printf("Preimage = %lf, Dimension = %d, Map level = %d\n",x,n,m);
/* image calculation */
mapd ( x, m, y, n, KEY );
printf("Image (");
for ( i=0; i<n; i++ )
printf(" %f%c",y[i],(i==n-1)?' ':',');
printf(" )\n\n");
/* back calculation preimages of image */
printf("Input number of preimages that may be calculated - ");
scanf("%d",&kp);
invmad ( m, xp, kp, &kx, y, n, 1 );
printf("\nPreimages \n\n");
for ( i=0; i<kx; i++ )
printf(" %30.25f\n",xp[i]);
printf("\n");
/* testing preimages that are calculated */
for ( d=1.0, i=0; i<n; i++, d /= 2.0 );
for ( del=1.0, i=0; i<m; i++, del *= d );
for ( i=0; i<kx; i++ ) {
mapd ( ( (xp[i]==1.0) ? xp[i] : xp[i] + 0.5 * del ), m, y, n,
KEY );
printf("Image for %d preimage = (",i);
for ( j=0; j<n; j++ )
printf(" %f%c",y[j],(j==n-1)?' ':',');
printf(" )\n\n");
}
getch();
}
Chapter 3
3.1 Introduction
In this chapter, we return to the global optimization problem of a multiextremal function satisfying the Lipschitz condition over a hyperinterval. Let us briefly recollect some of the results obtained so far. To deal with multidimensional global optimization problems, we would like to develop algorithms that use numerical approximations of space-filling curves to reduce the original Lipschitz multidimensional problem to a univariate one satisfying the Hölder condition.
In particular, we consider the following problem:

min{F(y) : y ∈ [a, b]},   (3.1.1)

|F(y′) − F(y″)| ≤ L ‖y′ − y″‖,  y′, y″ ∈ [a, b],   (3.1.2)

with a constant L, 0 < L < ∞, generally unknown; ‖ · ‖ denotes the Euclidean norm.
Due to Theorem 2.1, the multidimensional global minimization problem (3.1.1), (3.1.2) is turned into a one-dimensional problem. In particular, finding the global minimum of the Lipschitz function F(y), y ∈ R^N, over a hypercube is equivalent to determining the global minimum of the function f(x):

f(x) = F(y(x)),  x ∈ [0, 1].   (3.1.3)
holds (in accordance with (2.1.9)) for the function f(x) with the constant

H = 2L √(N + 3).   (3.1.4)

(3.1.5)

U_M ≤ f(x),  x ∈ [0, 1].   (3.1.6)

Then

U = U_M − 2^{−(M+1)} L √N

is a lower bound for F(y) over the entire region D, i.e.,

U ≤ F(y),  y ∈ D.   (3.1.7)
y = y(x_i),  1 ≤ i ≤ J.

It has been shown that the number of images, J, ranges between 1 and 2^N. For example, for N = 2 (see Fig. 3.1), the point A has four images on the curve, C has three images, B has two, and D has only one image.
Let us consider now a point y ∈ D and its approximation π(y) on the Peano curve. Since the function F(y) satisfies the Lipschitz condition, we have

|F(y) − F(π(y))| ≤ L ‖y − π(y)‖,

F(y) ≥ F(π(y)) − L ‖y − π(y)‖.
Fig. 3.1 Images of points, belonging to the 2-dimensional domain, on the Peano curve
The point π(y) belongs to the Peano curve and U_M is a lower bound for F(y) along the curve. Thus, it follows from (3.1.6) that F(π(y)) ≥ U_M and then

F(y) ≥ U_M − L ‖y − π(y)‖ ≥ U_M − L d_M,   (3.1.8)

where the distance d_M = max_{y∈D} ‖y − π(y)‖ has been used. It is easy to understand how the distance d_M can be calculated. The Peano curves establish a correspondence between subintervals of the curve and N-dimensional sub-cubes of D ⊂ R^N (these sub-cubes for N = 2 are shown in Fig. 3.1) with side equal to 2^{−M}. Thus, d_M is equal to the distance between the center of a sub-cube and one of its vertices, i.e., d_M = 2^{−(M+1)} √N. From this result and (3.1.8) we obtain the final estimate

F(y) ≥ U_M − L 2^{−(M+1)} √N = U,

which concludes the proof.
In this chapter, our goal is to introduce algorithms for solving the problem (3.1.1), (3.1.2) by using algorithms proposed for minimizing functions in one dimension. We reach our goal in three steps. First, in order to explain how Lipschitz
(3.2.1)

where the objective function f(x) can be non-differentiable, multiextremal, and given as a black box, and it satisfies the Lipschitz condition over [a, b], i.e.,

|f(x) − f(y)| ≤ L |x − y|,  x, y ∈ [a, b],   (3.2.2)

After k trials at the points x_1, . . . , x_k with outcomes z_i = f(x_i), one can construct the function

C_k(x) = c_i(x),  x ∈ [x_{i−1}, x_i],  i = 2, . . . , k,

which satisfies

C_k(x) ≤ f(x),  x ∈ [a, b],
where

c_i(x) = max{z_{i−1} − L(x − x_{i−1}), z_i + L(x − x_i)},  x ∈ [x_{i−1}, x_i].

The function C_k(x) is called either a minorant for the objective function f(x) or a lower bounding or support function. For its shape, C_k(x) is often called a saw-tooth cover of f(x) over [a, b], and the function c_i(x), i fixed, is called a tooth. It is simple to show that the minimum of the tooth c_i(x) is reached at the point

x̂_i = (x_i + x_{i−1})/2 + (z_{i−1} − z_i)/(2L)   (3.2.3)

and equals

c_i(x̂_i) = (z_i + z_{i−1})/2 − L (x_i − x_{i−1})/2.   (3.2.4)
In the Piyavskii method the next trial point x^{k+1} is chosen in such a way that

C_k(x^{k+1}) = min_{x∈[a,b]} C_k(x),

i.e., x^{k+1} is the point that corresponds to the minimum of the deepest tooth:

C_k(x^{k+1}) = min_{1≤i≤k} c_i(x̂_i).

After a finite number, K, of trials, the global minimum f* from (3.2.1) can be estimated by the value

f*_K = min{z_i : 1 ≤ i ≤ K},
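The tooth minimum (3.2.3), (3.2.4) is cheap to compute; a C sketch (function names are illustrative):

```c
#include <assert.h>

/* Minimum point (3.2.3) and minimum value (3.2.4) of the tooth c_i built
   over [x0, x1] from the outcomes z0, z1 with Lipschitz constant L. */
double tooth_argmin(double x0, double x1, double z0, double z1, double L) {
    return 0.5 * (x1 + x0) + (z0 - z1) / (2.0 * L);
}
double tooth_min(double x0, double x1, double z0, double z1, double L) {
    return 0.5 * (z1 + z0) - 0.5 * L * (x1 - x0);
}
```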
x^0 = a,  x^1 = b.   (3.2.5)

The choice of the point x^{k+1}, k > 1, of any subsequent (k + 1)th trial is determined by the following rules:
Step 1. Renumber the points x^0, . . . , x^k of the previous trials by subscripts in increasing order of the coordinate, i.e.,

a = x_0 < x_1 < . . . < x_{k−1} < x_k = b,   (3.2.6)
and juxtapose to them the values z_i = f(x_i), 0 ≤ i ≤ k, which are the outcomes z^0, . . . , z^k renumbered by subscripts.
Step 2. Compute the maximal absolute value of the divided differences:

M = max_{1≤i≤k} |z_i − z_{i−1}| / (x_i − x_{i−1}).   (3.2.7)

Step 3. Compute the value

m = { 1, M = 0;  rM, M > 0 },   (3.2.8)

where r > 1.
Step 4. For every interval (x_{i−1}, x_i), 1 ≤ i ≤ k, compute the characteristic

R_i = m(x_i − x_{i−1}) + (z_i − z_{i−1})²/(m(x_i − x_{i−1})) − 2(z_i + z_{i−1}).   (3.2.9)

Step 5. Select the interval t such that

R_t = max_{1≤i≤k} R_i.   (3.2.10)

If the condition (3.2.10) has more than one solution, i.e., if the maximal value of the characteristic is attained for several intervals, then the minimal integer satisfying (3.2.10) is accepted as t.
Step 6. If

x_t − x_{t−1} > ε,   (3.2.11)

then choose

x^{k+1} = (x_t + x_{t−1})/2 − (z_t − z_{t−1})/(2m)   (3.2.12)

as the point for the next trial and go to Step 1. Otherwise, calculate an estimate of the minimum as

f* = min{z_i : 1 ≤ i ≤ k}

and STOP.
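One pass of the decision rules (3.2.7)–(3.2.12) can be sketched in C as follows (an illustration, not the book's production code; next_trial is a hypothetical name, the points x[0..k] are assumed already sorted, and r > 1 is the reliability parameter):

```c
#include <assert.h>
#include <math.h>

/* Given the ordered trial points x[0..k] with outcomes z[0..k], compute
   m from (3.2.7)-(3.2.8), the characteristics (3.2.9), select the interval
   with the maximal characteristic (3.2.10), and return the next trial
   point (3.2.12). */
double next_trial(const double x[], const double z[], int k, double r) {
    double M = 0.0, m, Rbest = -1e300;
    int i, t = 1;
    for (i = 1; i <= k; i++) {                       /* (3.2.7) */
        double dd = fabs(z[i] - z[i-1]) / (x[i] - x[i-1]);
        if (dd > M) M = dd;
    }
    m = (M == 0.0) ? 1.0 : r * M;                    /* (3.2.8) */
    for (i = 1; i <= k; i++) {                       /* (3.2.9), (3.2.10) */
        double h = x[i] - x[i-1], dz = z[i] - z[i-1];
        double R = m * h + dz * dz / (m * h) - 2.0 * (z[i] + z[i-1]);
        if (R > Rbest) { Rbest = R; t = i; }
    }
    /* (3.2.12): new trial inside the interval with the maximal R */
    return 0.5 * (x[t] + x[t-1]) - (z[t] - z[t-1]) / (2.0 * m);
}
```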
(3.2.13)
where r > 1. Denote by t = t(k) the number of the interval [x_{t−1}, x_t] including the point x̄ at the step k (k > 0). If the trial points do not coincide with the limit point x̄, i.e., if x̄ ≠ x^k for any k > 0, then, from (3.2.13), it follows that

lim_{k→∞} (x_t − x_{t−1}) = 0.

In this case, the left-end points x_q = x_{t(q)−1} and the right-end points x_p = x_{t(p)} of the above intervals bracketing the point x̄ constitute two subsequences convergent to x̄ from the left and right, respectively.
Now consider the case when, at some step q, the trial is carried out exactly at the point x^q = x̄ and, thus, at any step k > q, there exists an integer j = j(k) such that x_j = x̄ = x^q. Suppose that in this case there is no subsequence convergent to the point x̄ from the left. Then

lim_{k→∞} (x_j − x_{j−1}) > 0,

and there exists a number p such that the trials do not hit the interval (x_{j−1}, x_j) = (x^p, x^q) = (x^p, x̄) if k > max(p, q). From (3.2.9), the characteristic R_{j(k)} of this interval is equal to

R_j = m(x̄ − x^p) + (z^p − f(x̄))²/(m(x̄ − x^p)) − 2(z^p + f(x̄)) = m(x̄ − x^p)(1 − α)² − 4 f(x̄),

where

α = (z^p − f(x̄))/(m(x̄ − x^p)).

Similarly, by introducing the notation t = j(k) + 1 for the number of the interval (x̄, x_{j+1}), we obtain

R_t = m(x_t − x̄)(1 − β)² − 4 f(x̄),

where

β = (z_t − f(x̄))/(m(x_t − x̄)).

Hence,

R_j − R_t > m (x̄ − x^p)(1 − r^{−1})² − 4m(x_t − x̄).   (3.2.14)
4. If, at some step, the value m from (3.2.8) satisfies the inequality
m > 2L,
(3.2.15)
then any global minimizer x from (3.2.1) is the limit point of the sequence {xk };
besides, any limit point x of this sequence is the global minimizer of f (x).
Proof. Due to our assumption, the function to be minimized possesses a finite
number of local minimizers. Then the function f(x) is strictly monotonous in the
intervals (x̄ − δ, x̄) and (x̄, x̄ + δ) for a sufficiently small real number δ > 0 (there
exists just one of these intervals if x̄ = b or x̄ = a). If we admit that the point x̄ is not
locally optimal, then for all points x from at least one of these intervals it is true that
f(x) < f(x̄).
Due to the existence of two subsequences convergent from the left and from the right, respectively, to x̄ (see Lemma 3.1), the validity of the last inequality contradicts the third
statement of this theorem.
The assumption of the existence of some subsequence convergent to a limit point x̃ also
contradicts the third statement of the theorem if f(x̃) ≠ f(x̄).
Let us show the validity of the third statement. Assume the opposite, i.e., that at
some step q ≥ 0 the outcome
z_q = f(x^q) < f(x̄)   (3.2.16)
is obtained. Note that, due to (3.2.7), (3.2.8),
m(x_j − x_{j−1}) ≥ |z_j − z_{j−1}|.   (3.2.17)
If t = t(k) is the number of the interval [x_{t−1}, x_t] containing the point x̄ at the step k,
then
lim_{k→∞} R_{t(k)} = −4 f(x̄)   (3.2.18)
because for the Lipschitzian function f(x) the estimate m from (3.2.7), (3.2.8)
is bounded. From (3.2.17), (3.2.18) follows the validity of (3.2.14) for steps with
sufficiently large k. Hence, the point x̄ cannot be the limit point if the assumption
(3.2.16) is true.
Let us prove the last statement. Let the condition (3.2.15) be met at some step q.
Then, from (3.2.7), (3.2.8), it will be met at any subsequent step k ≥ q. Denote by
j = j(k) the number of the interval including the point x* from (3.2.1) at the step k.
If x* is not a limit point, then there exists such a number p > 0 that for any k ≥ p
x^{k+1} ∉ [x_{j−1}, x_j].   (3.2.19)
Then the inequality
(3.2.20)
is met for the above interval, whence, due to (3.2.9) and (3.2.15), follows the
estimate for R_j:
R_{j(k)} > −4 f(x*).   (3.2.21)
This estimate is true for any k > max(p, q). But, on the other hand, (3.2.18) is valid
for any limit point x̄; besides f(x*) ≤ f(x̄). Therefore, from (3.2.18) and (3.2.21),
follows the validity of (3.2.14) if k is sufficiently large, which is a contradiction
to (3.2.19). Thus, the point x* of the absolute minimum of the function f(x) over
[a, b] is a limit point of the sequence {x^k} if the condition (3.2.15) is met. Then,
as a consequence of the second statement of the theorem, any other limit point x̄ is
necessarily a global minimizer.
Table 3.1 Number of trials performed by the methods GA, PM, and IA

Problem    GA       PM       IA
1          377      149      127
2          308      155      135
3          581      195      224
4          923      413      379
5          326      151      126
6          263      129      112
7          383      153      115
8          530      185      188
9          314      119      125
10         416      203      157
11         779      373      405
12         746      327      271
13         1,829    993      472
14         290      145      108
15         1,613    629      471
16         992      497      557
17         1,412    549      470
18         620      303      243
19         302      131      117
20         1,412    493      81
Average    720.80   314.60   244.15
Corollary 3.1. Under the condition (3.2.15) the set of all limit points of the trial
sequence generated by the IA coincides with the set of all global minimizers of
the function.
In particular, if x* is the unique global minimizer, i.e., if for any x ∈ [a, b], x ≠ x*,
it holds f(x*) < f(x), then
lim_{k→∞} x^k = x*.
practical problem and is inestimable for the algorithms PM and GA, we shall use
only the number of trials carried out before satisfying a stopping rule for comparing
the methods. As they all belong to the class of Divide the Best algorithms (see [54,
56]), we can use the condition
x_t − x_{t−1} ≤ ε,   (3.2.22)
where t is taken from (3.2.10), as a stopping rule. All experiments were carried out
with the accuracy
ε = 10⁻⁴ (b − a).   (3.2.23)
The parameters of the methods were chosen in accordance with recommendations made by the authors. For GA and PM the exact values of the Lipschitz constant
were used for all test functions (see [35, 92]). Following the suggestions of [35] for
this algorithm we chose a_i as the trial point in a current interval [a_i, b_i]. The partition
operator which at every iteration divides [a_i, b_i] into p equal subintervals was taken
with p = 4. The parameter r of the IA was chosen equal to 2. Results obtained by testing
the methods are presented in Table 3.1.
In all the experiments the a priori known locations and values of the global
minima were determined by placing an observation in the global minimizer vicinity,
and the methods were stopped because the width of the interval [x_{t−1}, x_t] (see
(3.2.22)) was less than ε from (3.2.23). We calculated the average number of trials
carried out by a method j, 1 ≤ j ≤ 3, until the stopping rule was satisfied as
N^j_aver = (1/20) Σ_{i=1}^{20} n_i^j,   (3.2.24)
where n_i^j is the number of trials carried out by the method j to solve the i-th problem.
(the words global constant mean here that the same value estimating L is used over
the whole search region). It should be noticed that the local estimates we are going to
develop in this subsection can be successfully used both in the geometric approach
and in the framework of the information algorithms.
Let us now consider the algorithm IA. Its characteristics R_i associated with each
subinterval [x_{i−1}, x_i], 0 < i ≤ k, of the search region [a, b] are written (see (3.2.9)) as
follows
R_i = rM(x_i − x_{i−1}) + (z_i − z_{i−1})² / (rM(x_i − x_{i−1})) − 2(z_i + z_{i−1}).   (3.2.25)
It has been observed in [83] and [100, 101, 103] that (3.2.25) can be rewritten in the
form
R_i = (x_i − x_{i−1})(rM + M_i²/(rM)) − 2(z_i + z_{i−1}),   (3.2.26)
where
M_i = |z_i − z_{i−1}| / (x_i − x_{i−1}).
By comparing the formula (3.2.26) with the formula (3.2.4) of the characteristic that is
used in Piyavskii's method we observe that the IA can be interpreted as a method
constructing an auxiliary piecewise-linear function with the local slopes s_i used at
each subinterval [x_{i−1}, x_i], 0 < i ≤ k, where s_i have the following form
s_i = 0.5(rM + M_i²/(rM)),   1 ≤ i ≤ k.   (3.2.27)
Consider the problem
min{ f(x) : x ∈ [a, b] },   a, b ∈ R,   (3.3.1)
where the objective function f(x) satisfies the Hölder condition
| f(x) − f(y)| ≤ H |x − y|^{1/N},   x, y ∈ [a, b],   (3.3.2)
with a constant 0 < H < ∞. It is supposed that the value N is known and the
objective function (3.3.1) can be represented by a black box procedure.
objective function (3.3.1) can be represented by a black box procedure. This
problem arises in many applications, for instance, in the plant location problem
under a uniform delivered price policy (see [57]), in infinite horizon optimization
problems (see [68]), etc.
Two cases can be examined:
i) a constant M ≥ H is given a priori;
ii) nothing is known about the Hölder constant H.
For the case i), an extension to Hölder optimization of the Piyavskii method
(see [92]) has been proposed by Gourdin, Jaumard, and Ellaia in [52] (hereinafter
this method will be called GJE) in order to solve a global maximization problem
analogous to (3.3.1), (3.3.2). They consider an iterative construction of an upper-bounding function corresponding to the lower envelope of parabolic functions.
At each iteration they must determine the maxima of this piecewise concave function
through line search techniques, i.e., by solving an equation of degree N. The
drawback of this approach is that for large N the computation of the local maxima
of the upper-bounding function can be tricky. We propose a technique that at each
iteration does not require the solution of nonlinear equations of degree N, so that
this method turns out to be very easy to apply even with N large, and it requires a
smaller execution time.
Moreover, the approach introduced here can be extended to the case ii) where
the constant H is not available a priori (in particular, we use an adaptive procedure
that estimates the global constant H during the search). Note that the algorithm GJE
cannot be extended in such a way. In fact, if an adaptive estimate H^k of H were
used in this algorithm in the course of the kth iteration, then, each time H^k is
updated, it would become necessary to solve k − 1 equations of degree N, making
the whole algorithm too expensive from the computational point of view.
3.3.1 Algorithms
Let us describe first a general algorithm in a compact form and then, by specifying
Step 2, we will give two different algorithms.
(3.3.3)
where r > 1 is a reliability parameter of the method. The way to calculate the value
mi will be specified in each concrete algorithm.
Step 3. For each interval [x_{i−1}, x_i], 2 ≤ i ≤ k, compute
y_i = 0.5(x_i + x_{i−1}) − (z_i − z_{i−1}) / (2 r m_i (x_i − x_{i−1})^{(1−N)/N}),   (3.3.4)
where z_j = f(x_j), 1 ≤ j ≤ k.
Step 4. Calculate, for each interval [x_{i−1}, x_i], 2 ≤ i ≤ k, the characteristic
R_i = min{ f(x_{i−1}) − r m_i (y_i − x_{i−1})^{1/N}, f(x_i) − r m_i (x_i − y_i)^{1/N} }.   (3.3.5)
Step 5. Select the interval [x_{t−1}, x_t] corresponding to the minimal characteristic,
i.e., such that
R_t = min{ R_i : 2 ≤ i ≤ k }.   (3.3.6)
Step 6. If
x_t − x_{t−1} > ε,   (3.3.7)
where ε > 0 is a given search accuracy, then execute the next trial at the point
x^{k+1} = y_t   (3.3.8)
Let us now introduce some observations with regard to the GA scheme described
above and compare it with the method GJE from [52]. During the course of the
(k + 1)th iteration the GA constructs an auxiliary piecewise function
C^k(x) = c_i(x),   x ∈ [x_{i−1}, x_i],   2 ≤ i ≤ k,   (3.3.9)
where
c_i(x) = max{ f(x_{i−1}) − r m_i (x − x_{i−1})^{1/N}, f(x_i) − r m_i (x_i − x)^{1/N} },
x ∈ [x_{i−1}, x_i].   (3.3.10)
Let us compare (3.3.9), (3.3.10) used in the GA with the method GJE from [52]
where the authors also work with the functions C^k(x) but, instead of the adaptive
estimate m_i used by the GA, they apply the a priori given Hölder constant H,
supposing that it has been granted. In the GJE, the new trial point x^{k+1} is chosen
as follows
x^{k+1} = arg min{ c_i(x) : x ∈ [x_{i−1}, x_i], 2 ≤ i ≤ k }.
Since the GJE uses the a priori given Hölder constant H for constructing c_i(x),
then, due to (3.3.2), the function C^k(x) from (3.3.9) used in [52] has the following
property
C^k(x) ≤ f(x),   x ∈ [a, b],
i.e., C^k(x) is a low-bounding function for f(x) over [a, b] and, respectively, the functions
c_i(x) are also minorants over the intervals [x_{i−1}, x_i], 2 ≤ i ≤ k (see Fig. 3.3), and
C^k(x^{k+1}) ≤ C^k(x) ≤ f(x),   x ∈ [x_{i−1}, x_i],   2 ≤ i ≤ k.
In order to find the point x^{k+1}, the GJE requires, for each interval [x_{i−1}, x_i], 2 ≤ i ≤ k,
to solve the following system
f(x_{i−1}) − H (p_i − x_{i−1})^{1/N} = f(x_i) − H (x_i − p_i)^{1/N} = A_i   (3.3.11)
to determine the peak value A_i and the corresponding point p_i (see Fig. 3.3) and
then to choose among them the point x^{k+1}.
In our method GA introduced above, for each interval [x_{i−1}, x_i], 2 ≤ i ≤ k, we
approximate the point p_i by the point y_i from (3.3.4) found as the intersection of the
lines r_left(x) and r_right(x) (see Fig. 3.4):
r_left(x) = f(x_{i−1}) − r m_i (x_i − x_{i−1})^{(1−N)/N} (x − x_{i−1}),
r_right(x) = f(x_i) − r m_i (x_i − x_{i−1})^{(1−N)/N} (x_i − x).
It is important to notice that, following (3.3.4), the value y_i is very easy to calculate
even with N large.
Thus, the characteristic R_i calculated in Step 4 and related to the interval [x_{i−1}, x_i]
represents the minimum value among the auxiliary functions c_i^−(x) and c_i^+(x)
evaluated at the point y_i (see Fig. 3.4):
R_i = min{ c_i^−(y_i), c_i^+(y_i) },
c_i^−(x) = f(x_{i−1}) − r m_i (x − x_{i−1})^{1/N},   c_i^+(x) = f(x_i) − r m_i (x_i − x)^{1/N}.   (3.3.12)
Let us now introduce two different choices of the value mi in Step 2 of the GA in
order to get two different algorithms. The first choice is the traditional one.
Algorithm GA1
Step 2. Set
r m_i = H,   2 ≤ i ≤ k   (r = 1).   (3.3.13)
Here the exact value of the a priori given Hölder constant H is used, as it was in the
GJE. Since it is quite difficult to know the Hölder constant a priori, the typical way
to avoid this obstacle is to look for an approximation of H during the course of the
search (see [74, 105, 123, 132, 139, 140]). Thus, in the second algorithm we consider
the following adaptive global estimate of the Hölder constant.
Algorithm GA2
Step 2. Set
m_i = max{ ξ, h^k },   (3.3.14)
where ξ > 0 is a small number (the second parameter of the method) that takes into
account our hypothesis that f(x) is not constant over the interval [a, b], and the value
h^k is calculated as follows
h^k = max{ h_i : 2 ≤ i ≤ k }   (3.3.15)
with
h_i = |z_i − z_{i−1}| / |x_i − x_{i−1}|^{1/N},   2 ≤ i ≤ k.   (3.3.16)
Note that if during the course of the k-th iteration h^k = h^{k−1}, then the auxiliary
function C^{k+1}(x) will differ from C^k(x) only in the two subintervals obtained after
splitting the interval [x_{t−1}, x_t] by the point x^{k+1}. Otherwise, if h^k > h^{k−1}, we have to
recalculate the function C^{k+1}(x) completely.
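The adaptive estimate (3.3.14)–(3.3.16) is a one-liner in code. The sketch below is our illustration; the parameter name `xi` stands for the small constant ξ:

```python
def holder_estimate(x, z, N, xi=1e-8):
    """Adaptive global estimate m = max{xi, h^k} from (3.3.14)-(3.3.16).

    x -- sorted trial points, z -- objective values, N -- Hoelder exponent.
    """
    h_k = max(abs(z[i] - z[i - 1]) / (x[i] - x[i - 1]) ** (1.0 / N)
              for i in range(1, len(x)))
    return max(xi, h_k)
```

For the data x = [0, 0.25, 1], z = [0, 0.5, 1] and N = 2 the estimate equals 1.0: the first subinterval gives 0.5/0.25^{1/2} = 1, the second 0.5/0.75^{1/2} ≈ 0.577.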
R_i < f(x),   x ∈ [x_{i−1}, x_i].   (3.3.17)
Proof. If r m_i > H_i, then, due to (3.3.2), (3.3.10), and (3.3.12), the function
c_i(x) = max{ c_i^−(x), c_i^+(x) }
is a low-bounding function for f(x) over the interval [x_{i−1}, x_i]. Moreover, since r > 1,
it follows that
c_i(x) < f(x),   x ∈ [x_{i−1}, x_i].
The function c_i^−(x) is strictly decreasing on [x_{i−1}, x_i] and c_i^+(x) is strictly increasing
on this interval. Thus, it follows that
min{ c_i^−(x), c_i^+(x) } ≤ min{ c_i(x) : x ∈ [x_{i−1}, x_i] },   x ∈ [x_{i−1}, x_i].
In particular, this is true for x = y_i, where y_i is from (3.3.4). To conclude the proof it
suffices to recall that, due to (3.3.4), (3.3.5), and (3.3.10),
R_i = min{ c_i^−(y_i), c_i^+(y_i) }
and y_i ∈ [x_{i−1}, x_i].
Let us study the convergence properties of the sequence {x^k} of trial points generated
by the GA. Any new point x^{k+1} = y_t falls strictly inside the selected interval, i.e.,
x^{k+1} ∈ (x_{t−1}, x_t), and
max{ x_t − x^{k+1}, x^{k+1} − x_{t−1} } ≤ 0.5(x_t − x_{t−1}) + |z_t − z_{t−1}| / (2 r m_t (x_t − x_{t−1})^{(1−N)/N}) ≤ 0.5(1 + 1/r)(x_t − x_{t−1}).   (3.3.18)
(3.3.19)
If x* ∉ {x^k}, the subsequences {x_{s(k)−1}} and {x_{s(k)}} are the ones we are looking
for, and the theorem has been proved. Suppose now that x* ∈ {x^k} and that the
convergence to x* is not bilateral, i.e., no sequence converging to x* from the left
exists. In this case there exist integers q, n > 0 such that x* = x^q and for any iteration
number k > max(q, n) no trials will fall into the interval [x_n, x_q] = [x_{j(k)−1}, x_{j(k)}]. For
the value R_j of this interval we have:
R_j = min{ z_{j−1} − r m_j (y_j − x_{j−1})^{1/N}, f(x*) − r m_j (x_j − y_j)^{1/N} },   (3.3.20)
that is,
R_j ≤ f(x*) − r m_j (x_j − y_j)^{1/N},   (3.3.21)
whence
f(x*) − R_j ≥ r m_j (x_j − y_j)^{1/N} > 0.   (3.3.22)
Since the intervals containing the limit point x* contract, for sufficiently large k the
inequality
R_{j(k)} < R_{t(k)}   (3.3.23)
is satisfied. This means that, by (3.3.6) and (3.3.8), a trial will fall into the interval
[x_n, x_q], which contradicts our assumption that there is no subsequence converging
to x* from the left. In the same way we can consider the case when there is no
subsequence converging to x* from the right. Hence convergence to x* is bilateral.
Corollary 3.2. For all trial points x^k it follows that f(x^k) ≥ f(x*), k ≥ 1.
Proof. Suppose that there exists a point x^q such that
z_q = f(x^q) < f(x*).   (3.3.24)
Then, for the interval [x_{j−1}, x_j] having the point x^q as an endpoint,
R_j = min{ z_{j−1} − r m_j (y_j − x_{j−1})^{1/N}, z_j − r m_j (x_j − y_j)^{1/N} },
R_j < min{ z_{j−1}, z_j } < f(x*).
Again, from (3.3.22) and (3.3.24) the inequality (3.3.23) holds. By (3.3.6) and
(3.3.8) this fact contradicts the assumption that x* is a limit point of {x^k}. Thus
f(x^q) ≥ f(x*) and the Corollary has been proved.
Corollary 3.3. If another limit point x̃ ≠ x* exists, then f(x̃) = f(x*).
Corollary 3.4. If the function f(x) has a finite number of local minima in [a, b],
then the point x* is locally optimal.
Proof. If the point x* is not a local minimizer then, taking into account the bilateral
convergence of {x^k} to x* and the fact that f(x) has a finite number of local minima
in [a, b], a point w such that f(w) < f(x*) would be found. But this is impossible by
Corollary 3.2.
Let us introduce now sufficient conditions for global convergence.
Theorem 3.4. Let x* be a global minimizer of f(x). If there exists an iteration
number k* such that for all k > k* the inequality
r m_{j(k)} > H_{j(k)}   (3.3.25)
holds, where H_{j(k)} is the Hölder constant for the interval [x_{j(k)−1}, x_{j(k)}], i.e.,
| f(x) − f(y)| ≤ H_{j(k)} |x − y|^{1/N},   x, y ∈ [x_{j(k)−1}, x_{j(k)}],   (3.3.26)
and the interval [x_{j(k)−1}, x_{j(k)}] is such that x* ∈ [x_{j(k)−1}, x_{j(k)}], then x* is a limit point
of {x^k}.
Proof. Suppose that x* is not a limit point of the sequence {x^k} and that a point x′ ≠ x*
is a limit point of {x^k}. Then there exists an iteration number n such that for all k ≥ n
x^{k+1} ∉ [x_{j−1}, x_j],   j = j(k).   (3.3.27)
On the other hand, due to (3.3.25) and (3.3.17), the inequality
R_{j(k)} < f(x*)   (3.3.28)
holds. Thus, considering (3.3.27), (3.3.22), and (3.3.28) together with the decision
rules of the algorithm, we conclude that a trial will fall into the interval [x_{j−1}, x_j].
This fact contradicts our assumption and proves that x* is a limit point of the
sequence {x^k}.
Corollary 3.5. If the conditions of Theorem 3.4 are satisfied, then all limit points
of {xk } are global minimizers of f (x).
Table 3.2 Test functions from [52] used in the first two series of experiments

F     Formula                                              Interval    Solution
F1    x⁶ − 15x⁴ + 27x² + 250                               [−4, 4]     3.0
F2    (x² − 5x + 6)/(x² + 1)                               [−5, 5]     2.414213
F3    (x − 2)²  if x ≤ 3;  2 ln(x − 2) + 1  otherwise      [0, 6]      2.0
F4    −√(2x − x²)  if x ≤ 2;  −√(−x² + 8x − 12) otherwise  [0, 6]      4.0
F5    (3x − 1.4) sin 18x                                   [0, 1]      0.966085
F6    2(x − 3)² + e^{x²/2}                                 [−3, 3]     1.590717
F7    —                                                    [−10, 10]   6.774576, 0.49139, 5.791785
F8    —                                                    [−10, 10]   7.083506, 0.8003, 5.48286
Theorem 3.5. For every function f(x) satisfying (3.3.2) with H < ∞ there exists r*
such that for all r > r* the Algorithm GA2 determines all global minimizers of the
function f(x) over the search interval [a, b].
Proof. Since H < ∞ and any value of r can be chosen in the Algorithm GA2, there
exists r* such that the condition (3.3.25) will be satisfied for all global minimizers
for r > r*. This fact, due to Theorem 3.4, proves the theorem.
We report now some numerical results showing a comparison of the algorithms
GA1 and GA2 with the method GJE from [52]. Three series of experiments have
been executed.
In the first and the second series of experiments, a set of eight functions described
in [52] has been used (see Table 3.2). Since the GJE algorithm requires, at each
iteration, the solution of the system (3.3.11) in order to find the peak point (p_i, A_i)
(see Fig. 3.2), we distinguish two cases. In the first series we use the integers N = 2, 3,
and 4 because it is possible to use explicit expressions for the coordinates of the
intersection point (p_i, A_i) (see [52]). The second series of experiments considers the
case of fractional N.
Table 3.3 contains the number of trials executed by the algorithms with the accuracy
ε = 10⁻⁴(b − a) (this accuracy is used in all series of experiments). The exact
constants H (see [52]) have been used in the GJE and the GA1. Parameters of the
GA2 were ξ = 10⁻⁸ and r = 1.1. In this case, all the global minimizers have been
found by all the methods.
In Table 3.4 we present numerical results for the problems from Table 3.2 with
fractional values of N. In this case, in the method GJE, the system (3.3.11) should
Table 3.3 Number of trials executed by the methods GJE, GA1, and GA2 with N = 2, 3, 4

Method  N   F1      F2     F3     F4     F5     F6     F7      F8      Average
GJE     2   5,569   4,517  1,683  4,077  1,160  2,879  4,273   3,336   3,436
GJE     3   11,325  5,890  2,931  4,640  3,169  5,191  8,682   8,489   6,289
GJE     4   12,673  7,027  3,867  6,286  4,777  5,370  10,304  8,724   7,378
GA1     2   5,477   5,605  1,515  4,371  1,091  2,532  4,478   3,565   3,579
GA1     3   11,075  7,908  2,521  7,605  2,823  4,200  11,942  9,516   7,198
GA1     4   15,841  8,945  3,162  9,453  4,188  5,093  15,996  15,538  9,777
GA2     2   1,477   2,270  1,249  1,568  279    1,761  580     380     1,195
GA2     3   2,368   3,801  1,574  2,023  367    3,186  710     312     1,792
GA2     4   2,615   3,486  1,697  2,451  424    4,165  756     550     2,018
Table 3.4 Number of trials for fractional values of N

Method  N      F1      F2     F3     F4      F5     F6     F7      F8      Average
GJE     4/3    —       2,341  705    1,213   397    1,160  730     557     1,127
GJE     53/2   —       6,883  6,763  5,895   7,201  —      9,833   9,617   —
GJE     100/3  —       6,609  4,127  —       —      —      10,078  9,094   —
GA1     4/3    1,923   2,329  680    1,768   381    1,108  722     549     1,182
GA1     53/2   15,899  9,243  5,921  10,057  8,056  5,050  16,169  16,083  10,809
GA1     100/3  15,757  8,671  5,399  9,458   6,783  4,699  15,982  15,617  10,295
GA2     4/3    1,053   1,484  649    1,025   278    1,664  473     378     875
GA2     53/2   2,972   4,215  2,207  3,073   725    4,491  103     94      2,235
GA2     100/3  2,108   4,090  2,023  2,828   667    4,196  154     153     2,027
be solved by using a line search technique (see [52]) at each iteration. The following
methods have been used for this goal:
(i) the routine FSOLVE from the Optimization Toolbox of MATLAB 5.3;
(ii) the routine NEWT of Numerical Recipes (see [96]) that combines Newton's
method for solving nonlinear equations with a globally convergent strategy that
guarantees progress towards the solution at each iteration even if the initial
guess is not sufficiently close to the root.
These methods have been chosen because they can be easily found by a final
user. Unfortunately, our experience with both algorithms has shown that solving the
system (3.3.11) can be a problem in itself. Particularly, we note that, when N increases,
the two curves c_i^− and c_i^+ from (3.3.12) tend to flatten (see Figs. 3.5 and 3.6), and if
the intersection point (p_i, A_i) is close to the boundaries of the subinterval [x_{i−1}, x_i],
then the system (3.3.11) can be difficult to solve. In some cases the methods looking
for the roots of the system do not converge to the solution.
For example, Fig. 3.6 presents the case when the point (denoted by *) which
approximates the root is obtained outside of the search interval [x_{i−1}, x_i]. Thus, the
system (3.3.11) is not solved and, as a consequence, the algorithm GJE does
not find the global minima of the objective function. These cases are shown in
Table 3.4 by —.
Numerical experiments described in Table 3.4 have been executed with the
following parameters. The exact constants H have been used in the methods
GJE and GA1. Parameters ξ = 10⁻⁸ and r = 1.5 have been used in the GA2.
All global minimizers have been found by the algorithms GJE and GA1. Note
that the parameter r influences the reliability of the method GA2. For example, the
algorithm GA2 has found only one global minimizer in the experiments marked by
*. The value r = 3.5 allows one to find all global minimizers.
The third series of experiments (see Table 3.5) has been executed with the
following function from [74], shown in Fig. 3.7:
F_N(x) = Σ_{k=1} … ,   x ∈ [0, 10].   (3.3.29)
Over the interval [0, 10] it satisfies the Hölder condition with a constant h, i.e.,
|F_N(x) − F_N(y)| ≤ h |x − y|^{1/N},   x, y ∈ [0, 10].
Table 3.5 Results for the function (3.3.29) with different values of N

N     Optimal point  Optimal value  h    GJE    GA0    GA1    GA2
5     2.82909266     1.15879294     77   1,886  2,530  1,995  258
10    2.83390034     1.15176044     58   208    1,761  1,295  82
20    2.83390034     1.14960372     51   —      760    518    171
40    2.83390034     1.14946908     48   —      220*   69*    949
60    2.83390034     1.14956783     47   —      53*    87*    581
80    2.83390034     1.14964447     47   —      41*    94*    241
100   2.83390034     1.14969913     47   —      34*    71*    261
In this series of experiments a new algorithm, GA0, has been used to show the
efficiency of the choice of the point y_i from (3.3.4). The method GA0 works as
the algorithm GA1 but in Step 3 and Step 4 it uses the following characteristic
R_i = min{ f(x_{i−1}) − r m_i (ȳ_i − x_{i−1})^{1/N}, f(x_i) − r m_i (x_i − ȳ_i)^{1/N} },   (3.3.30)
where ȳ_i = 0.5(x_i + x_{i−1}) is the middle point of the interval [x_{i−1}, x_i].
different values of N. In the method GA2 we have chosen r = 1.3 for the case N = 5,
r = 1.7 for N = 10, r = 2.8 for N = 20, and r = 9.3 if N = 40, 60, 80; the result for
N = 100 has been obtained with r = 15.
The algorithm GA2 has found good estimates of the global solution in all the
cases. The methods GA0 and GA1 have done the same for N = 5, 10, 20. It can be
seen that GA1 outperforms GA0. For N = 40, 60, 80, 100, these methods stop after
a few iterations in neighborhoods of local minimizers because the used accuracy
was not small enough to find the global solution. Increasing the accuracy
allows one to locate the global minimizer. These cases are shown in Table 3.5 by *.
The symbol — has the same meaning as in Table 3.4.
min{ F(y(x)) : x ∈ [0, 1] },   (3.4.1)
where y(x) is the space-filling curve from Theorem 2.1 and the lengths of the
subintervals are measured as
Δ_i = (x_i − x_{i−1})^{1/N}.   (3.4.2)
The first two trials are executed at the points
x⁰ = 0,   x¹ = 1.   (3.4.3)
The choice of the point y^{k+1}, k > 1, of any subsequent (k + 1)st trial is done as
follows.
Step 1. Renumber the inverse images x⁰, …, x^k of all the points
y⁰ = y(x⁰), …, y^k = y(x^k)   (3.4.4)
by subscripts in increasing order,
0 = x₀ < x₁ < ⋯ < x_k = 1,   (3.4.5)
and juxtapose to them the values z_j = F(y(x_j)), 1 ≤ j ≤ k, which are the outcomes
z⁰ = F(y(x⁰)), …, z^k = F(y(x^k))   (3.4.6)
of the already performed trials.
Step 2. Compute the value
M = max{ |z_i − z_{i−1}| / Δ_i : 1 ≤ i ≤ k },   (3.4.7)
where Δ_i is from (3.4.2).   (3.4.8)
Step 3. For each interval [x_{i−1}, x_i], 1 ≤ i ≤ k, calculate the characteristic
R_i = r M Δ_i + (z_i − z_{i−1})² / (r M Δ_i) − 2(z_i + z_{i−1}),   r > 1.   (3.4.9)
Step 4. Select the interval [x_{t−1}, x_t] corresponding to the maximal characteristic
R_t = max{ R_i : 1 ≤ i ≤ k }.   (3.4.10)
Step 5. If
Δ_t ≤ ε,   (3.4.11)
where ε > 0 is a given search accuracy, then calculate an estimate of the minimum
as
F*_k = min{ z_i : 1 ≤ i ≤ k }   (3.4.12)
and STOP. Otherwise, execute the next trial at the point y^{k+1} = y(x^{k+1}) from [a, b],
where
x^{k+1} = 0.5(x_t + x_{t−1}) − sgn(z_t − z_{t−1}) (1/(2r)) ( |z_t − z_{t−1}| / M )^N,   (3.4.13)
and go to Step 1; sgn(z_t − z_{t−1}) denotes the sign of (z_t − z_{t−1}).
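The decision rule (3.4.13) in isolation can be written as the following sketch (our illustration; `math.copysign` implements the sgn(·) factor):

```python
import math

def mia_next_point(x_left, x_right, z_left, z_right, M, r, N):
    """New trial point (3.4.13) inside the selected interval [x_left, x_right].

    M is the estimate (3.4.7), r > 1 the reliability parameter, N the dimension.
    """
    shift = (abs(z_right - z_left) / M) ** N / (2.0 * r)
    return 0.5 * (x_right + x_left) - math.copysign(shift, z_right - z_left)
```

When z_t = z_{t−1} the point is simply the midpoint of the interval; otherwise it is shifted towards the endpoint with the smaller outcome and, by (3.4.16) below, always remains inside the interval.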
Step 0–Step 5 of the scheme MIA describe the sequence of decision functions
x^{k+1} = G^k_r(x⁰, …, x^k; z⁰, …, z^k)
generating the sequence of inverse images {x^k} ⊂ [0, 1] and also the sequence
{y^k} ⊂ [a, b] ⊂ R^N of trial points; see (3.4.4). These decision functions obviously
depend on the particular mapping y(x) used in (3.4.6). By analogy with (3.2.11),
the search sequence {y^k} may be truncated by meeting the stopping condition
(3.4.11). But it should be stressed that in solving applied multidimensional problems
the actual termination of the search process is very often caused by exhaustion of the
available computing resources or by assuming the available running estimate from
(3.4.12) as already satisfactory and, thus, economizing on the computing effort.
Now, we proceed to the study of convergence properties of MIA, but before
embarking on the detailed treatment of this subject we single out one more feature
of the space-filling curve y(x) introduced in Theorem 2.1.
Lemma 3.3. Let {y^k} = {y(x^k)} be the sequence of points in [a, b] induced by
the sequence {x^k} ⊂ [0, 1]; here y(x) is the space-filling curve from Theorem 2.1.
Then:
1. If x̄ is a limit point of the sequence {x^k}, then the image ȳ = y(x̄) is a limit point
of the sequence {y^k}.
2. If ȳ is a limit point of the sequence {y^k}, then there exists some inverse image x̄
of this point, i.e., ȳ = y(x̄), which is a limit point of the sequence {x^k}.
Proof. 1. If x̄ is a limit point of the sequence {x^k}, then there exists some
subsequence {x^{k_q}}, k₁ < k₂ < …, converging to x̄, i.e.,
lim_{q→∞} |x^{k_q} − x̄| = 0.   (3.4.14)
If the corresponding sequence {ξ_q} = {x^{k_q}} has two different limit points x′ and
x″, i.e., there are two subsequences {ξ_{q_i}} and {ξ_{q_j}} satisfying the conditions
lim ξ_{q_i} = x′,   lim ξ_{q_j} = x″,
and x′ ≠ x″, then ȳ = y(x′) = y(x″) because, due to (3.4.14), the subsequence
{y^{k_q}} has just one limit point. Therefore, any limit point x̄ of the sequence {ξ_q}
is an inverse image of ȳ.
Theorem 3.6 (Sufficient convergence conditions). Let the point ȳ be a limit point of
the sequence {y^k} generated by the rules of MIA while minimizing the function F(y),
y ∈ [a, b] ⊂ R^N, Lipschitzian with the constant L. Then:
1) If side by side with ȳ there exists another limit point ȳ′ of the sequence {y^k}, then
F(ȳ) = F(ȳ′).
2) For any k, z^k = F(y^k) ≥ F(ȳ).
3) If at some step of the search process the value M from (3.4.7) satisfies the
condition
r M > 2^{3−1/N} L √(N + 3),   (3.4.15)
then ȳ is a global minimizer of the function F(y) over [a, b] and any global
minimizer y* of F(y(x)), x ∈ [0, 1], is also a limit point of the sequence {y^k}.
Proof. The assumption that F(ȳ) ≠ F(ȳ′), where ȳ and ȳ′ are some limit points of
the sequence {y^k}, obviously contradicts the second statement of the theorem, and
we proceed to proving this second statement.
Any point x^{k+1} from (3.4.13) partitions the interval [x_{t−1}, x_t] into two subintervals
[x_{t−1}, x^{k+1}], [x^{k+1}, x_t]. Due to (3.4.7) and (3.4.13), these subintervals meet the
inequality
max{ x_t − x^{k+1}, x^{k+1} − x_{t−1} } ≤ 0.5(1 + r^{−1})(x_t − x_{t−1}),   (3.4.16)
whence it follows that the intervals containing a limit point x̄ of {x^k} contract, i.e.,
lim_{k→∞} (x_{j(k)} − x_{j(k)−1}) = 0,   (3.4.17)
where j = j(k) is the number of an interval containing the point x̄ at the step k.
In consideration of the function F(y), y ∈ [a, b], being Lipschitzian and with
account of (3.4.2), (3.4.6)–(3.4.8), we derive that the value M (which is obviously
a function of the index k) is positive and bounded from above. Then, from (3.4.6)–
(3.4.9) and (3.4.17), it follows that
lim_{k→∞} R_{j(k)} = −4F(y(x̄)) = −4F(ȳ).   (3.4.18)
Assume that the second statement is not true, i.e., that at some step q ≥ 0 the
outcome
z^q = F(y^q) < F(ȳ)   (3.4.19)
is obtained. Then, introducing the notation
β = r M Δ_l / |z_l − z_{l−1}|,
where l is the number of an interval with the endpoint x^q and β > 1, due to (3.4.7),
we derive the relations
R_l = |z_l − z_{l−1}|(β + β^{−1}) − 2(z_l + z_{l−1}) >
> 2{ max(z_l, z_{l−1}) − min(z_l, z_{l−1}) } − 2(z_l + z_{l−1}) =
= −4 min(z_l, z_{l−1}).
The last inequality, which also holds true for the case z_l = z_{l−1} (this can easily be
checked directly from (3.4.9)), and the assumption (3.4.19) result in the estimate
R_{l(k)} > −4z^q = −4F(ȳ) + 4[F(ȳ) − F(y^q)].   (3.4.20)
z_t ≤ F* + 2L √(N + 3) (x_t − x*)^{1/N},   (3.4.21)
z_{t−1} ≤ F* + 2L √(N + 3) (x* − x_{t−1})^{1/N},   (3.4.22)
where F* = F(y*). By summing (3.4.21) and (3.4.22), we obtain (see [138]) the
estimate
z_t + z_{t−1} ≤ 2F* + 2^{2−1/N} L Δ_t √(N + 3).   (3.4.23)
Now, from (3.4.9), (3.4.15), and (3.4.23) follows the validity of the relation
R_{t(k)} > −4F*
for sufficiently large values of the index k. This last estimate together with (3.4.18)
and the rule (3.4.10) leads to the conclusion that the interval [x_{t−1}, x_t], t = t(k), is to
be hit by some subsequent trials generating, thereafter, a nested sequence of intervals
each containing x* and, thus, due to (3.4.16), contracting to this inverse image of the
global minimizer y*. Hence, y* is to be a limit point of the sequence {y^k}. Then, due
to the first statement of the theorem, any limit point ȳ of the sequence {y^k} has to be
a global minimizer of F(y) over [a, b].
Therefore, under the condition (3.4.15), the set of all limit points of the sequence
{y^k} generated by MIA is identical to the set of all global minimizers of the
Lipschitzian function F(y) over [a, b].
Let us make some considerations with regard to the effective use of approximations to the Peano curve.
Corollary 3.6. From Theorem 2.4 it follows that it is possible to solve the one-dimensional problem
min{ F(l(x)) : x ∈ [0, 1] }   (3.4.24)
for a function F(y), y ∈ [a, b], Lipschitzian with the constant L, where l(x) =
l_M(x) is the Peano-like piecewise-linear evolvent (2.2.68), (2.2.69), by employing
the decision rules of the algorithm for global search in many dimensions MIA. The
obvious amendment to be done is to substitute y(x) in the relations (3.4.3), (3.4.4),
and (3.4.6) with l(x).
But the problem (3.4.24) is not exactly equivalent to the problem of minimizing
F(y) over [a, b]. For l(x) = l_M(x), where M is from (2.2.69), the relation between
these two problems is defined by the inequality
(3.4.25)
(3.4.26)
having, as already mentioned in Chap. 2, a mesh width equal to 2^{−M}, but not the
entire cube [a, b] ⊂ R^N.
Therefore, the accuracy of solving the one-dimensional problem (3.4.24) should
not essentially exceed the precision assured by (3.4.25), i.e., these two sources of
inexactness have to be taken into account simultaneously.
Remark 3.2. The curve l(x) = l_M(x) is built of linear segments, and for any pair of
points l(x′), l(x″) from the same segment it is true that
||l(x′) − l(x″)|| ≤ 2^{M(N−1)} |x′ − x″|;
see (2.2.72). Therefore, the function F(l(x)), x ∈ [0, 1], from (3.4.24) is Lipschitzian
with the constant L_M = L·2^{M(N−1)} increasing with the rise of M (i.e., along with
an increase of the required accuracy in solving the problem of minimizing the
function F(y), Lipschitzian with the constant L, over [a, b]). Due to this reason, the
one-dimensional algorithms based on the classical Lipschitz condition are not
effective in solving problems similar to (3.4.24).
Search algorithm employing NUPE. Suppose that the function F(y), y ∈ [a, b],
is Lipschitzian with the constant L. Then the function F(n_M(h_j)), in which n_M is
the Non-Univalent Peano-like Evolvent, NUPE (see Chap. 2), defined on the grid
(2.2.80), satisfies the condition
(3.4.27)
Actually, as already mentioned, the whole idea of introducing n_M(x) was to use
the property (3.4.27) as a kind of compensation for losing some information
about nearness of the performed trials in the hypercube [a, b] when reducing
the dimensionality with evolvents. Therefore, the algorithm of global search in
many dimensions MIA has to be modified for employing n_M(x) due to the above
circumstances. Now, we outline the skeleton of the new algorithm.
The first two trials have to be selected as prescribed by (3.4.3), i.e.,
p⁰ = n_M(0),   p¹ = n_M(1),
because h₀ = 0 and h_q = 1 are from (2.2.80) and, as it follows from the construction
of n_M(x), the above points p⁰, p¹ are characterized by unit multiplicity. Suppose
that l > 1 trials have already been performed at the points p⁰, …, p^l from [a, b]
and x⁰, …, x^k (k ≥ l) are the inverse images of these points with respect to n_M(x).
We assume that these inverse images are renumbered with subscripts as prescribed
by (3.4.5) and juxtaposed to the values z_i = F(n_M(x_i)), 1 ≤ i ≤ k; note that this
juxtaposition is based on the outcomes of just l trials where, in general, due to
(3.4.27), l < k. Further selection of trial points follows the scheme:
Step 1. Employing (3.4.7)–(3.4.13), execute the rules 2–5 of MIA and detect the
interval [h_j, h_{j+1}) from (2.2.80) containing the point x^{k+1} from (3.4.13), i.e., h_j ≤
x^{k+1} < h_{j+1}.
Step 2. Determine the node p^{l+1} = n_M(h_j) ∈ P(M, N) and compute the outcome
z^{l+1} = F(p^{l+1}).
Step 3. Compute all the inverse images h_{j_1}, …, h_{j_ν} of the point p^{l+1} with respect
to n_M(x).
Step 4. Introduce the new points
x^{k+1} = h_{j_1}, …, x^{k+ν} = h_{j_ν},
characterized by the same outcome z^{l+1}, increment k by ν, and pass over to the first
clause.
Termination could be forced by the condition t , where t ,t are, respectively,
from (3.4.2), (3.4.10), and > 0 is the preset accuracy of search which is supposed
to be greater than the mesh width of the grid (2.2.80). This rule may also be set
defined by another option: the search is to be terminated if the node h j generated
in accordance with the first clause of the above scheme coincides with one of the
already selected points from the series (3.4.5). Note that if there is a need to continue
the search after the coincidence, then it is sufficient just to augment the parameter
M (i.e., to use the evolvent corresponding to a finer grid; recall that the points of all
the already accumulated trials are the nodes of this finer grid).
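The bookkeeping behind Steps 2-4 (one expensive trial serving several one-dimensional points at once, so the series (3.4.5) grows faster than the trial count) can be sketched as follows. Everything here is a toy illustration: the dictionary `inverse_images` is a hypothetical stand-in for the preimages of nM(x), not the actual evolvent.

```python
# Toy bookkeeping for a non-univalent map: one trial at a node p serves
# every inverse image x of p, so the one-dimensional series grows faster
# than the number of expensive evaluations of F.

def run(F, nodes, inverse_images, steps):
    xs = []          # one-dimensional points (the series (3.4.5))
    z = {}           # x -> outcome, shared by all preimages of a node
    evaluations = 0  # number of actual trials (l in the text)
    for p in nodes[:steps]:
        value = F(p)
        evaluations += 1
        for x in inverse_images[p]:   # multiplicity of the node p
            xs.append(x)
            z[x] = value
    return sorted(xs), z, evaluations

# Hypothetical 1D illustration: two nodes, the second with multiplicity 2,
# so two trials yield three points in the one-dimensional series.
inv = {0.25: [0.25], 0.75: [0.5, 0.75]}
xs, z, l = run(lambda p: p * p, [0.25, 0.75], inv, 2)
```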
We present now some numerical examples in order to test the behavior of the
algorithm MIA. First, we consider the following test function
F(y1, y2) = y1² + y2² − cos(18 y1) − cos(18 y2),   −1 ≤ y1, y2 ≤ 1,
from [97]. Minimization of this function carried out by applying the above algorithm
with r = 2, ε = 0.01 with the non-univalent evolvent nM(x) covering the grid P(9, 2)
required 63 trials and at the moment of termination there were 199 points in the
series (3.4.5). Solving the same problem by the MIA employing the piecewise-linear
approximation l(x) to the Peano curve with the grid H(10, 2) required 176
trials with r = 2, ε = 0.01.
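The reconstructed form of this test function is easy to check numerically; the coarse grid scan below is only an illustration, not the algorithm of the book:

```python
import math

def F(y1, y2):
    # Two-dimensional test function from [97]: a paraboloid plus
    # oscillatory cosine terms producing many local minima in [-1, 1]^2.
    return y1**2 + y2**2 - math.cos(18.0 * y1) - math.cos(18.0 * y2)

# Coarse grid scan over [-1, 1]^2 to locate the best node; since
# F >= y1^2 + y2^2 - 2, the global minimum is F(0, 0) = -2.
best = min(
    (F(-1 + 2 * i / 200, -1 + 2 * j / 200),
     -1 + 2 * i / 200, -1 + 2 * j / 200)
    for i in range(201) for j in range(201))
```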
Table 3.6 Results of experiments with six problems produced by the GKLS-generator
from [40]. In the table, N is the dimension of the problem; rg is the radius of the attraction
region of the global minimizer; n is the number of the function taken from the considered
class of test functions; δ is the radius of the ball used at the stopping rule; r is the
reliability parameter of the MIA.

N   rg     n    δ         r     Direct    LBDirect   MIA
2   0.20   70   0.01√N    2.1   733       1015       169
3   0.20   16   0.01√N    3.4   6641      10465      809
4   0.20   21   0.01√N    3.3   24569     70825      2947
5   0.20   85   0.02√N    4.2   127037    224125     33489
6   0.20   14   0.03√N    4.2   232593    400000     84053
7   0.20   25   0.05√N    4.2   400000    400000     80223
(3.1.4) in one dimension. Naturally, in order to realize the passage from the
multidimensional problem to the one-dimensional one, computable approximations
to the Peano curve should be employed in the numerical algorithms. Hereinafter
we use the designation pM (x) for an M-level piecewise-linear approximation of the
Peano curve.
By Theorem 3.1, one-dimensional methods from Sect. 3.3, constructing at each
iteration auxiliary functions that provide a lower bound of the univariate objective
function, can be used as a basis for developing new methods for solving the
multidimensional problem. The general scheme for solving problem (3.1.1), (3.1.2),
which we name MGA (Multidimensional Geometric Algorithm), is obtained by using
the scheme GA from Sect. 3.3 as follows.
Step 2. Set
mi = r max{ξ, hk},   2 ≤ i ≤ k,   (3.5.1)
where ξ > 0 is a small number that takes into account our hypothesis that f(x) is
not constant over the interval [0, 1] and the value hk is calculated as follows
hk = max{hi : 2 ≤ i ≤ k}   (3.5.2)
with
hi = |zi − zi−1| / |xi − xi−1|^(1/N),   2 ≤ i ≤ k.   (3.5.3)
Step 3. For each interval [xi−1, xi], 2 ≤ i ≤ k, compute the point yi and the
characteristic Ri, according to (3.3.4) and (3.3.5), replacing the values zi = f(xi)
by F(pM(xi)).
Step 4. Select the interval [xt−1, xt] according to (3.3.6) of the GA.
Step 5. If
|xt − xt−1|^(1/N) ≤ ε,   (3.5.4)
where ε > 0 is a given search accuracy, then calculate an estimate of the global
minimum as
Fk = min{zi : 1 ≤ i ≤ k}
and STOP. Otherwise, execute the next trial at the point
xk+1 = yt   (3.5.5)
for x ∈ [xi−1, xi],   2 ≤ i ≤ k,   (3.5.6)
x ∈ [0, 1].   (3.5.7)
evaluated at the point yi from (3.3.4). By making use of the Peano curves we have a
correspondence between a cube in dimension N and an interval in one dimension.
In the MGA we suppose that the Hölder constant is unknown and in Step 2
we compute the value mi being an estimate of the Hölder constant for f(x) over
the interval [xi−1, xi], 2 ≤ i ≤ k. In this case the same estimates mi are used over the
whole search region for f(x). However, as it was already mentioned above, global
estimates of the constant can provide very poor information about the behavior
of the objective function over every small subinterval [xi−1, xi] ⊂ [0, 1]. In the next
chapter we shall describe the local tuning technique that adaptively estimates the
local Hölder constants over different subintervals of the search region, thus allowing
us to accelerate the process of optimization.
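A minimal sketch of the global Hölder-constant estimate (3.5.1)-(3.5.3) may help fix the idea; the function name and the sample data are our own choices, not from the book:

```python
def holder_estimate(x, z, N, r=2.0, xi=1e-8):
    # Global estimate of the Hoelder constant along the curve, as in
    # (3.5.1)-(3.5.3): h_i = |z_i - z_{i-1}| / |x_i - x_{i-1}|^(1/N),
    # h_k = max_i h_i, and m = r * max(xi, h_k).  Points x must be sorted.
    h = [abs(z[i] - z[i - 1]) / abs(x[i] - x[i - 1]) ** (1.0 / N)
         for i in range(1, len(x))]
    hk = max(h)
    return r * max(xi, hk)

# Illustrative data: three points on [0, 1] with N = 2.
m = holder_estimate([0.0, 0.25, 1.0], [1.0, 0.0, 3.0], N=2)
```

Here the second interval dominates (h2 = 3/0.75^(1/2) = 2√3), so m = r·2√3.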
Let us study now convergence properties of the MGA algorithm. Theorem 3.1,
linking the multidimensional global optimization problem (3.1.1), (3.1.2) to the
one-dimensional problem (3.1.3), (3.1.4), allows us to concentrate our attention on the
one-dimensional case using the curve. We shall study properties of an infinite (i.e.,
ε = 0 in (3.5.4)) sequence {xk}, xk ∈ [0, 1], k ≥ 1, of trial points generated by the
algorithm MGA.
Theorem 3.7. Assume that the objective function f(x) satisfies the condition
(3.1.4), and let x* be any limit point of {xk} generated by the MGA. Then the
following assertions hold:
1. Convergence to x* is bilateral, if x* ∈ (0, 1);
2. f(xk) ≥ f(x*), for any k ≥ 1;
3. If there exists another limit point x′ ≠ x*, then f(x′) = f(x*);
4. If the function f(x) has a finite number of local minima in [0, 1], then the point x*
is locally optimal;
5. (Sufficient conditions for convergence to a global minimizer). Let x* be a global
minimizer of f(x). If there exists an iteration number k* such that for all k > k*
the inequality
mj(k) ≥ Hj(k)   (3.5.8)
holds, where Hj(k) is the Hölder constant for the interval [xj(k)−1, xj(k)] containing
x*, and mj(k) is its estimate, then the set of limit points of the sequence {xk}
coincides with the set of global minimizers of the function f(x).
Proof. It follows from (3.5.1) and the finiteness of ξ > 0 that approximations of
the Hölder constant mi in the method are always greater than zero. Since H < ∞ in
(3.1.4) and any positive value of the parameter r can be chosen in the scheme MGA,
it follows that there exists an r* such that condition (3.5.8) will be satisfied for all
global minimizers for r > r*. This fact, due to Theorem 3.7, proves the Theorem.
We present now numerical results of experiments executed for testing the performance
of the algorithm MGA. In all the experiments we have considered the
FORTRAN implementation of the methods tested. Since in many real-life problems
each evaluation of the objective function is usually a very time-consuming operation
[63, 93, 98, 117, 139, 156], the number of function evaluations executed by the
methods until the satisfaction of a stopping rule has been chosen as the main
criterion of the comparison.
Classes of test functions. In the field of global optimization there exists an old
set of standard test functions (see [21]). However, recently it has been discovered
by several authors (see [1, 79, 145]) that these problems are not suitable for testing
global optimization methods since the functions belonging to the set are too simple
and methods can hardly miss the region of attraction of the global minimizer.
As a consequence, the number of trials executed by methods is usually very small
and, therefore, non-representative. These functions are especially inappropriate
for testing algorithms proposed to work with the global optimization of real
multiextremal black-box functions where it is necessary to execute many trials in
order to better explore the search region and to reduce the risk of missing the global
solution. The algorithms proposed in this book are oriented exactly at this type
of hard global optimization problems. Hence, more sophisticated and systematic
tests are required to verify their performance. In our numerical experiments several
classes of N-dimensional test functions generated by the GKLS-generator (see [40])
have been used; an example of a function generated by the GKLS can be seen in
Fig. 3.8. This generator has several advantages that allow one to use it as a good
tool for the numerical comparison of algorithms (in fact, it is used to test numerical
methods in more than 40 countries in the world).
It generates classes of 100 test functions (see [40] for a detailed explanation,
examples of its usage, etc.) with the same number of local minima and supplies
a complete information about each of the functions: its dimension, the values of
all local minimizers, their coordinates, regions of attraction, etc. It is possible to
generate harder or simpler test classes easily. Only five parameters (see Table 3.7)
should be defined by the user and the other parameters are generated randomly.
An important feature of the generator consists of the complete repeatability of the
experiments: if you use the same five parameters, then each run of the generator will
produce the same class of functions.
The GKLS-generator works by constructing test functions F(y) in R^N using a
convex quadratic function g(y), i.e., a paraboloid g(y) = ‖y − T‖² + t, that is then
distorted over the sets
Fig. 3.8 A function produced by the GKLS generator shown together with a piecewise-linear
approximation to the Peano curve used for optimization
Ωk = {y ∈ R^N : ‖y − Pk‖ ≤ rk},   1 ≤ k ≤ m,

F(y) = Ck(y)           if y ∈ Ωk, 1 ≤ k ≤ m,
F(y) = ‖y − T‖² + t    if y ∉ Ω1 ∪ ... ∪ Ωm,   (3.5.9)

where
Ck(y) = (2⟨y − Pk, T − Pk⟩ − A)(2‖y − Pk‖³/rk³ − 3‖y − Pk‖²/rk²) + ‖y − Pk‖² + fk   (3.5.10)
with A = ‖T − Pk‖² + t − fk.
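The pasting of the cubics Ck into the paraboloid can be checked numerically. The sketch below uses our reading of (3.5.10), namely the unique cubic that matches g in value and gradient on the ball boundary and has the local minimum value fk at Pk; all concrete numbers are purely illustrative, not parameters from [40]:

```python
import math

# Hypothetical parameters in N = 2: paraboloid vertex T with value t,
# one distortion ball with center P, radius rk, local minimum value fk.
T, t = (0.4, 0.3), 0.5
P, rk, fk = (-0.2, 0.1), 0.35, -1.0
A = (T[0] - P[0])**2 + (T[1] - P[1])**2 + t - fk

def g(y):
    # Undistorted paraboloid g(y) = ||y - T||^2 + t.
    return (y[0] - T[0])**2 + (y[1] - T[1])**2 + t

def C(y):
    # Cubic distortion inside the ball around P (our reading of (3.5.10)).
    u = (y[0] - P[0], y[1] - P[1])
    rho = math.hypot(u[0], u[1])
    s = u[0] * (T[0] - P[0]) + u[1] * (T[1] - P[1])
    return ((2.0 * s - A) * (2.0 * rho**3 / rk**3 - 3.0 * rho**2 / rk**2)
            + rho**2 + fk)

# On the boundary the two pieces agree in value (and, by construction,
# in gradient); at the center C equals the local minimum value fk.
boundary = (P[0] + rk, P[1])
gap = abs(C(boundary) - g(boundary))
center_val = C(P)
```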
The generator gives the possibility to use several types of functions. In the
described experiments, cubic continuous multiextremal functions have been used.
In all series of experiments we have considered classes of 100 N-dimensional
functions with 10 local minima over the domain [−1, 1]^N ⊂ R^N. For each dimension
N = 2, 3, 4 two test classes were considered: a simple class and a difficult one. Note
(see Table 3.7) that a more difficult test class can be created either by decreasing
the radius, rg, of the approximate attraction region of the global minimizer, or by
increasing the distance, d, from the global minimizer to the paraboloid vertex.
Experiments have been carried out by using the following stopping criteria:
Table 3.7 Description of the GKLS test classes used in the experiments: the number of
local minima, m; the value of the global minimum, f*; the distance from the global
minimizer to the vertex of the paraboloid, d; the radius of the attraction region of the
global minimizer, rg

Difficulty   N   m    f*     d      rg
Simple       2   10   −1.0   0.66   0.33
Hard         2   10   −1.0   0.90   0.20
Simple       3   10   −1.0   0.66   0.33
Hard         3   10   −1.0   0.90   0.20
Simple       4   10   −1.0   0.66   0.33
Hard         4   10   −1.0   0.90   0.20
Stopping criteria. The value ε = 0 is fixed in the stopping rule (3.5.4) and the
search terminates when a trial point falls in a ball Bi having a radius δ and the
center at the global minimizer of the considered function, i.e.,
Bi = {y ∈ R^N : ‖y − y*i‖ ≤ δ},   (3.5.11)
where y*i denotes the global minimizer of the i-th function of the test class,
1 ≤ i ≤ 100.
Comparison MGA-Direct-LBDirect. In this series of experiments we compare
the algorithm MGA with the original Direct algorithm proposed in [67] and its
recent locally biased modification LBDirect introduced in [31, 34]. These methods
have been chosen for comparison because they, just as the MGA method, do not
require the knowledge of the Lipschitz constant of the objective function and the
knowledge of the objective function gradient. The FORTRAN implementations of
these two methods described in [31,33] and downloadable from [32] have been used
in all the experiments. Parameters recommended by the authors have been used in
both methods.
Table 3.8 Results of experiments with six classes of test functions generated by the GKLS

                 Max trials                       Average
Class   N   Direct      LBDirect    MGA        Direct      LBDirect    MGA
1       2   127         165         239        68.14       70.74       90.06
2       2   1159        2665        938        208.54      304.28      333.14
3       3   1179        1717        3945       238.06      355.30      817.74
4       3   77951       85931       26964      5857.16     9990.54     3541.82
5       4   90000(1)    90000(15)   27682      >12206.49   >23452.25   3950.36
6       4   90000(43)   90000(65)   90000(1)   >57333.89   >65236.00   >22315.59
Fig. 3.9 Methods MGA, Direct and LBDirect, N = 2. Class no.1, left; Class no.2, right
Results of numerical experiments with the six GKLS test classes from Table 3.7
are shown in Table 3.8. The columns Max trials report the maximal number of
trials required for satisfying the stopping rule (a) for all 100 functions of the class.
The notation 90,000 (j) means that after 90,000 function evaluations the method
under consideration was not able to solve j problems. The Average columns in
Table 3.8 report the average number of trials performed during minimization of
the 100 functions from each GKLS class. The symbol > reflects the situations
when not all functions of a class were successfully minimized by the method under
consideration: that is, the method stopped when 90,000 trials had been executed
during minimizations of several functions of this particular test class. In these cases,
the value 90,000 was used in calculations of the average value, providing in such a
way a lower estimate of the average.
Figure 3.9 shows the behavior of the three methods for N = 2 on classes 1 and
2 from Table 3.7, respectively (for example, it can be seen in Fig. 3.9-left that after
100 function evaluations the LBDirect has found the solution of 79 problems, Direct
of 91 problems, and the MGA of 63 problems). Figure 3.10 illustrates the results of
the experiment for N = 3 on classes 3 and 4 from Table 3.7, respectively.
Figure 3.11 shows the behavior of the three methods for N = 4 on classes 5 and
6 from Table 3.7, respectively (it can be seen in Fig. 3.11-left that after 10,000
function evaluations the LBDirect has found the solution of 58 problems, Direct
of 73 problems, and the MGA of 93 problems). It can be seen from Fig. 3.11-left
Fig. 3.10 Methods MGA, Direct and LBDirect, N = 3. Class no.3, left; Class no.4, right
Fig. 3.11 Methods MGA, Direct and LBDirect, N = 4. Class no.5, left; Class no.6, right
that after 90,000 evaluations of the objective function the Direct method has not
found the solution for 1 function, and the LBDirect has not found the solution for
15 functions of the class 5. Figure 3.11-right shows that the Direct and LBDirect
methods were not able to locate, after executing the maximal possible number of
function evaluations, 90,000, the global minimum of 43 and 62 functions of the
class 6, respectively. The MGA was able to solve all the problems in the classes
1-5; the MGA has not found the solution for only 1 function in the class 6.
As it can be seen from Table 3.7 and Figs. 3.9-3.11, for simple problems Direct
and LBDirect are better than the MGA and for harder problems the MGA is better
than its competitors. The advantage of the MGA becomes more pronounced both
when classes of test functions become harder and when the dimension of problems
increases. It can be noticed also that on the taken test classes the performance of
the LBDirect is worse than that of the Direct (note that these results are in good
agreement with experiments executed in [116, 117]). A possible reason for this
behavior can be the following. Since the considered test functions have many local
minima and due to its locally biased character, LBDirect spends too much time
exploring various local minimizers which are not global.
Chapter 4
4.1 Introduction
Let us return to the Lipschitz-continuous multidimensional function F(y), y ∈ D,
from (3.1.2) and the corresponding global minimization problem (3.1.1). In the
previous chapter, algorithms that use dynamic estimates of the Lipschitz information
for the entire hyperinterval D have been presented. These dynamic estimating
procedures have been introduced since the precise information about the value of
the constant L required by Piyavskii's method for its correct work is often hard to get
in practice. Thus, we have used the procedure (3.2.7), (3.2.8) to obtain an estimate
of the global Lipschitz constant L during the search (the word global means that
the same value is used over the whole region D). In this chapter, we introduce ideas
that can accelerate the global search significantly by using local information about
F(y). In order to have a warm start let us first introduce these ideas informally for
the one-dimensional case.
Notice that both the a priori given exact constant L and its global overestimates
can provide poor information about the behavior of the objective function f(x)
over a small subinterval [xi−1, xi] ⊂ [a, b]. In fact, over such an interval, the
corresponding local Lipschitz constant L[xi−1,xi] can be significantly less than the
global constant L. This fact would significantly slow down methods using the global
value L or its estimate over [xi−1, xi].
In order to overcome this difficulty and to accelerate the search, a new approach
called the local tuning technique has been introduced in [100, 101]. The new approach
allows one to construct global optimization algorithms that tune their behavior to
the shape of the objective function at different sectors of the search region by
using adaptive estimates of the local Lipschitz constants in different subintervals
of the search domain during the course of the optimization process. It has been
successfully applied to a number of global optimization methods providing a
high level of speed up both for problems with Lipschitz objective functions
and for problems with objective functions having Lipschitz first derivatives (see
[74, 77, 102, 103, 105, 117-119, 123, 124, 139], etc.).
The main idea lies in the adaptive automatic concordance of the local and global
information obtained during the search for every subinterval [xi−1, xi] of [a, b]. When
an interval [xi−1, xi] is narrow, only the local information obtained within the near
vicinity of the trial points xi−1, xi has a decisive influence on the method. In this
case, the results of trials executed at points lying far away from the interval [xi−1, xi]
are less significant for the method. In contrast, when the method works with a wide
subinterval, it takes into consideration data obtained from the whole search region
because the local information represented by the values f(xi−1), f(xi) becomes
less reliable due to the width of the interval [xi−1, xi]. Thus, for every subinterval
both the comparison and the balancing of global and local information are automatically
effected by the method. Such a balancing is very important because the usage of
local information only can lead to the loss of the global solution (see [126]). It is
important to mention that the local tuning works during the global search over the
whole search region and does not require stopping the global procedure, as is usually
done by traditional global optimization methods when it is required to switch on a
local procedure.
Furthermore, the second accelerating technique, called local improvement (see
[75-77]), that can be used together with the local tuning technique, is presented
in this chapter. This approach forces the global optimization method to make a
local improvement of the best approximation of the global minimum immediately
after a new approximation better than the current one is found. The proposed local
improvement technique is of particular interest due to the following reasons. First,
usually in the global optimization methods the local search phases are separated
from the global ones. This means that it is necessary to introduce a rule that stops
the global phase and starts the local one, and then stops the local phase and starts the
global one. It happens very often (see, e.g., [63, 65, 93, 117, 139]) that the global
search and the local one are realized by different algorithms and the global search
is not able to use all evaluations of f(x) made during the local search, thus losing
important information about the objective function that has been already obtained.
The local improvement technique does not have this defect and allows the global
search to use all the information obtained during the local phases.
Second, the local improvement technique can work without any usage of
the derivatives. This is a valuable asset because many traditional local methods
require the derivatives and therefore, when one needs to solve the problem (3.1.1),
(3.1.2), they cannot be applied because, clearly, Lipschitz functions can be
non-differentiable.
min f(x),   x ∈ [a, b],   (4.2.1)
where
|f(x) − f(y)| ≤ L|x − y|,   x, y ∈ [a, b],   (4.2.2)
(4.2.3)
Step 2. Compute in a certain way the values mi being estimates of the local
Lipschitz constants of f(x) over the intervals [xi−1, xi], 2 ≤ i ≤ k. The way to
calculate the values mi will be specified in each concrete algorithm described
hereinafter.
Step 3. Calculate for each interval [xi−1, xi], 2 ≤ i ≤ k, its characteristic
Ri = (zi + zi−1)/2 − mi (xi − xi−1)/2.   (4.2.4)
Step 5. If
|xt − xt−1| > ε,   (4.2.5)
where ε > 0 is a given search accuracy, then execute the next trial at the point
xk+1 = (xt + xt−1)/2 + (zt−1 − zt)/(2mt)   (4.2.6)
The method is based on the piecewise-linear auxiliary function
Ck(x) = ci(x),   x ∈ [xi−1, xi],   i = 2, ..., k,
where
ci(x) = max{zi−1 − mi(x − xi−1), zi + mi(x − xi)},   x ∈ [xi−1, xi],
and the characteristic Ri from (4.2.4) represents the minimum of the auxiliary
function ci(x) over the interval [xi−1, xi].
If the constants mi are equal to or larger than the local Lipschitz constant Li
corresponding to the interval [xi−1, xi], for all i, 2 ≤ i ≤ k, then the function Ck(x)
is a lower-bounding function for f(x) over the interval [a, b], i.e., for every interval
[xi−1, xi], 2 ≤ i ≤ k, we have
f(x) ≥ ci(x),   x ∈ [xi−1, xi],   2 ≤ i ≤ k.
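The fact that the characteristic equals the minimum of the auxiliary function over its interval can be checked by brute force; the sketch below uses our reading of (4.2.4) and purely hypothetical data:

```python
def aux_min(x0, x1, z0, z1, m):
    # Characteristic (4.2.4): R = (z1 + z0)/2 - m*(x1 - x0)/2, which is
    # the minimum over [x0, x1] of the two support lines below.
    return 0.5 * (z1 + z0) - 0.5 * m * (x1 - x0)

def c(x, x0, x1, z0, z1, m):
    # Auxiliary function c(x) = max(z0 - m(x - x0), z1 + m(x - x1)).
    return max(z0 - m * (x - x0), z1 + m * (x - x1))

# Brute-force check that R equals the minimum of c over the interval.
x0, x1, z0, z1, m = 0.0, 1.0, 2.0, 1.0, 3.0
R = aux_min(x0, x1, z0, z1, m)
grid_min = min(c(x0 + (x1 - x0) * i / 10000, x0, x1, z0, z1, m)
               for i in range(10001))
```

With these numbers the two lines cross at x = 2/3 with value 0, so R = 0 and the grid minimum agrees up to the grid spacing.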
section proposes four specific algorithms executing this operation in different ways.
In Step 2, we can make two different choices of computing the constant mi that lead
to two different procedures that are called Step 2.1 and Step 2.2, respectively. In the
first procedure we use an adaptive estimate of the global Lipschitz constant (see
[117, 139]), for each iteration k. More precisely we have:
Step 2.1. Set
mi = r max{ξ, hk},   2 ≤ i ≤ k,   (4.2.7)
where ξ > 0 is a small number that takes into account our hypothesis that f(x) is
not constant over the interval [a, b] and r > 1 is a reliability parameter.
The value hk is calculated as follows
hk = max{hi : 2 ≤ i ≤ k}   (4.2.8)
with
hi = |zi − zi−1| / (xi − xi−1),   2 ≤ i ≤ k,   (4.2.9)
where the values zi = f(xi), 1 ≤ i ≤ k. In Step 2.1, at each iteration k all quantities
mi assume the same value over the whole search region [a, b]. However, as it was
already mentioned, this global estimate (4.2.7) of the Lipschitz constant can provide
poor information about the behavior of the objective function f(x) over every
small subinterval [xi−1, xi] ⊂ [a, b]. In fact, when the local Lipschitz constant related
to the interval [xi−1, xi] is significantly smaller than the global constant L, then the
methods using only this global constant or its estimate can work slowly over such
an interval (see [100, 117, 139]).
In order to overcome this difficulty, we consider the local tuning approach (see
[74, 100, 117]) that adaptively estimates the values of the local Lipschitz constants
Li corresponding to the intervals [xi−1, xi], 2 ≤ i ≤ k. The auxiliary function Ck(x) is
then constructed by using these local estimates for each interval [xi−1, xi], 2 ≤ i ≤ k.
This technique is described below as the rule Step 2.2.
Step 2.2 (local tuning technique). Set
mi = r max{λi, γi, ξ}   (4.2.10)
with
λi = max{hi−1, hi, hi+1},   3 ≤ i ≤ k − 1,   (4.2.11)
where hi is from (4.2.9), and when i = 2 and i = k only h2, h3, and hk−1, hk, should
be considered, respectively. The value
γi = hk (xi − xi−1) / Xmax,   (4.2.12)
where Xmax = max{xi − xi−1 : 2 ≤ i ≤ k}.
Step 4.1. Select the interval [xt−1, xt] such that
t = argmin{Ri : 2 ≤ i ≤ k}.   (4.2.13)
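A sketch of the local tuning rule (4.2.9)-(4.2.12); the interval indexing (from 1 rather than 2), the names `lam`/`gam`, and the sample data are our own choices:

```python
def local_tuning_m(x, z, r=1.1, xi=1e-8):
    # Sketch of Step 2.2: for each interval combine the local rate lam_i
    # (max of h over the interval and its neighbors, (4.2.11)) with the
    # globally scaled gam_i (4.2.12).  Points x must be sorted.
    k = len(x) - 1
    h = [None] + [abs(z[i] - z[i - 1]) / (x[i] - x[i - 1])
                  for i in range(1, k + 1)]            # h[1..k], (4.2.9)
    hk = max(h[1:])
    xmax = max(x[i] - x[i - 1] for i in range(1, k + 1))
    m = []
    for i in range(1, k + 1):
        lam = max(h[max(1, i - 1):min(k, i + 1) + 1])  # (4.2.11)
        gam = hk * (x[i] - x[i - 1]) / xmax            # (4.2.12)
        m.append(r * max(lam, gam, xi))                # (4.2.10)
    return m

# Illustrative data: the last, steep interval raises the estimate of its
# neighbor through lam, while the first interval keeps a small local m.
m = local_tuning_m([0.0, 0.1, 0.9, 1.0], [0.0, 0.2, 0.3, 1.0])
```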
This rule, used together with the exact Lipschitz constant in Step 2, gives us
Piyavskii's algorithm. In this case, the new trial point xk+1 ∈ (xt−1, xt) is chosen
in such a way that
Rt = min{Ri : 2 ≤ i ≤ k} = ct(xk+1) = min{Ck(x) : x ∈ [a, b]}.
The new way to fix Step 4 is introduced below.
Step 4.2 (local improvement technique).
flag is a parameter initially equal to zero. imin is the index corresponding to the
current estimate of the minimal value of the function, that is: zimin = f(ximin) ≤
f(xi), 1 ≤ i ≤ k. zk is the result of the last trial corresponding to a point xj in the line
(4.2.3), i.e., xk = xj.
IF (flag = 1) THEN
    IF zk < zimin THEN imin = j.
    Local improvement: Alternate the choice of the interval [xt−1, xt]
    among t = imin + 1 and t = imin, if 2 ≤ imin ≤ k − 1 (if imin = 1
    or imin = k take t = 2 or t = k, respectively), in such a way that for
    δ > 0 it follows
    |xt − xt−1| > δ.   (4.2.14)
ELSE (flag = 0)
    t = argmin{Ri : 2 ≤ i ≤ k}
ENDIF
flag = NOT(flag)
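The flag alternation of Step 4.2 can be sketched as follows; this is a simplified toy without the trial-execution and record-update parts, and all names are our own:

```python
# Sketch of the alternation in Step 4.2: when flag = 1 work locally
# around the current best point, when flag = 0 fall back to the global
# rule t = argmin R_i (4.2.13).  `R` maps interval indices 2..k to their
# characteristics; `imin` is the index of the current record point.

def select_interval(R, imin, k, flag):
    if flag == 1:                      # local improvement move
        if imin == 1:
            t = 2
        elif imin == k:
            t = k
        else:                          # alternate right / left of imin
            t = imin + 1 if select_interval.side else imin
            select_interval.side = not select_interval.side
    else:                              # global move, rule (4.2.13)
        t = min(range(2, k + 1), key=lambda i: R[i])
    return t, 1 - flag                 # flip the flag for the next call

select_interval.side = True
R = {2: 0.5, 3: -1.0, 4: 0.2}
t1, flag = select_interval(R, 3, 4, 0)     # global: argmin R is i = 3
t2, flag = select_interval(R, 3, 4, flag)  # local: interval right of imin
```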
The motivation of the introduction of Step 4.2 presented above is the following.
In Step 4.1, at each iteration, we continue the search at an interval corresponding to
the minimal value of the characteristic Ri, 2 ≤ i ≤ k, see (4.2.13). This choice admits
the occurrence of a situation where the search goes on for a certain finite (but
possibly high) number of iterations at subregions of the domain that are distant
from the best found approximation to the global solution and only subsequently
concentrates trials at the interval containing a global minimizer. However, very
often it is of crucial importance to be able to find a good approximation of
the global minimum in the lowest number of iterations. Due to this reason, in
Step 4.2 we take into account the rule (4.2.13) used in Step 4.1 and related to the
minimal characteristic, but we alternate it with a new selection method that forces
the algorithm to continue the search in the part of the domain close to the best value
of the objective function found up to now. The parameter flag, assuming values 0
or 1, allows us to alternate the two methods of the selection.
More precisely, in Step 4.2 we start by identifying the index imin corresponding
to the current minimum among the found values of the objective function f(x), and
then we select the interval (ximin, ximin+1) located on the right of the best current
point, ximin, or the interval on the left of ximin, i.e., (ximin−1, ximin). Step 4.2 keeps
working alternately on the right and on the left of the current best point ximin until a
new trial point with value less than zimin is found. The search moves from the right to
the left of the best found approximation trying to improve it. However, since we are
not sure that the found best approximation ximin is really located in the neighborhood
of a global minimizer x*, the local improvement is alternated in Step 4.2 with
the usual rule (4.2.13), thus providing the global search of new subregions possibly
containing the global solution x*. The parameter δ defines the width of the intervals
that can be subdivided during the phase of the local improvement. Note that the
trial points produced during the phases of the local improvement (obviously, there
can be more than one phase in the course of the search) are used during the further
iterations of the global search in the same way as the points produced during the
global phases.
Let us consider now possible combinations of the different choices of Step 2 and
Step 4 allowing us to construct the following four algorithms.
GE: GS with Step 2.1 and Step 4.1 (the method using the Global Estimate of the
Lipschitz constant L).
LT: GS with Step 2.2 and Step 4.1 (the method executing the Local Tuning on
the local Lipschitz constants).
GE-LI: GS with Step 2.1 and Step 4.2 (the method using the Global Estimate of
L enriched by the Local Improvement technique).
LT-LI: GS with Step 2.2 and Step 4.2 (the method executing the Local Tuning
on the local Lipschitz constants enriched by the Local Improvement technique).
Let us consider convergence properties of the introduced algorithms by studying
an infinite trial sequence {xk} generated by an algorithm belonging to the general
scheme GS for solving problem (4.2.1), (4.2.2).
Theorem 4.1. Assume that the objective function f(x) satisfies the condition
(4.2.2), and let x* be any limit point of {xk} generated by the GE or by the LT
algorithm. Then the following assertions hold:
1. Convergence to x* is bilateral, if x* ∈ (a, b) (see Definition 3.1);
2. f(xk) ≥ f(x*), for all trial points xk, k ≥ 1;
3. If there exists another limit point x′ ≠ x*, then f(x′) = f(x*);
4. If the function f(x) has a finite number of local minima in [a, b], then the point x*
is locally optimal;
5. (Sufficient conditions for convergence to a global minimizer). Let x* be a global
minimizer of f(x). If there exists an iteration number k* such that for all k > k*
the inequality
mj(k) ≥ Lj(k)   (4.2.15)
holds, where Lj(k) is the Lipschitz constant for the interval [xj(k)−1, xj(k)]
containing x*, and mj(k) is its estimate (see (4.2.7) and (4.2.10)), then the set
of limit points of the sequence {xk} coincides with the set of global minimizers of
the function f(x).
Proof. The proofs of assertions 1-5 are analogous to the proofs of Theorems 4.1-4.2
and Corollaries 4.1-4.4 from [139].
Theorem 4.2. Assertions 1-5 of Theorem 4.1 hold for the algorithms GE-LI and
LT-LI for a fixed finite δ > 0 and ε = 0, where δ is the accuracy of the local
improvement from (4.2.14) and ε is from (4.2.5).
Proof. Since δ > 0 and ε = 0, the algorithms GE-LI and LT-LI use the local
improvement only at the initial stage of the search until the selected interval [xt−1, xt]
is greater than δ. When |xt − xt−1| ≤ δ the interval cannot be divided by the local
improvement technique and the selection criterion (4.2.13) is used. Thus, since the
one-dimensional search region has a finite length and δ is a fixed finite number,
there exists a finite iteration number j* such that at all iterations k > j* only the
selection criterion (4.2.13) will be used. As a result, at the remaining part of the search,
the methods GE-LI and LT-LI behave as the algorithms GE and LT,
respectively. This consideration concludes the proof.
The next theorem ensures the existence of the values of the parameter r such
that the global minimizers of f(x) will be located by the four proposed methods that
do not use the a priori known Lipschitz constant.
Theorem 4.3. For any function f(x) satisfying (4.2.2) with L < ∞ there exists a
value r* such that for all r > r* condition (4.2.15) holds for the four algorithms GE,
LT, GE-LI, and LT-LI.
Proof. It follows from (4.2.7), (4.2.10), and the finiteness of ξ > 0 that approximations
of the Lipschitz constant mi in the four methods are always greater than zero.
Since L < ∞ in (4.2.2) and any positive value of the parameter r can be chosen in
the scheme GS, it follows that there exists an r* such that condition (4.2.15) will be
satisfied for all global minimizers for r > r*. This fact, due to Theorems 4.1 and 4.2,
proves the Theorem.
Let us present now results of numerical experiments executed on 120 functions
taken from the literature to compare the performance of the four algorithms
described in this section. In order to test the effectiveness of the acceleration
techniques we have carried out the numerical tests considering also the Piyavskii
method (denoted by PM) and an algorithm obtained by modifying that of Piyavskii
with the addition of the local improvement procedure (denoted by PM-LI). We note
that these two methods belong to the general scheme GS; in particular, in Step 2 they
use the exact value of the Lipschitz constant.
Two series of experiments have been done. In the first series, a set of 20 functions
described in [59] has been considered. In Tables 4.1 and 4.2, we present numerical
results for the six methods proposed to work with the problem (4.2.1), (4.2.2).
In particular, Table 4.1 contains the numbers of trials executed by the algorithms
with the accuracy ε = 10⁻⁴(b − a), where ε is from (4.2.5). Table 4.2 presents the
results for ε = 10⁻⁶(b − a). The parameters of the methods have been chosen as
follows: ξ = 10⁻⁸ for all the methods, r = 1.1 for the algorithms GE, LT, and
GE-LI, LT-LI. The exact values of the Lipschitz constant of the functions f(x)
Table 4.1 Results of numerical experiments executed on 20 test problems from [59] by the six
methods belonging to the scheme GS; the accuracy ε = 10⁻⁴(b − a), r = 1.1

Problem   PM       GE       LT      PM-LI   GE-LI   LT-LI
1         149      158      37      37      35      35
2         155      127      36      33      35      35
3         195      203      145     67      25      41
4         413      322      45      39      39      37
5         151      142      46      151     145     53
6         129      90       84      39      41      41
7         153      140      41      41      33      35
8         185      184      126     55      41      29
9         119      132      44      37      37      35
10        203      180      43      43      37      39
11        373      428      74      47      43      37
12        327      99       71      45      33      35
13        993      536      73      993     536     75
14        145      108      43      39      25      27
15        629      550      62      41      37      37
16        497      588      79      41      43      41
17        549      422      100     43      79      81
18        303      257      44      41      39      37
19        131      117      39      39      31      33
20        493      70       70      41      37      33
Average   314.60   242.40   65.10   95.60   68.55   40.80
have been used in the methods PM and PM_LI. For all the algorithms using the
local improvement technique the accuracy δ from (4.2.14) has been fixed as δ = ε.
All the global minima have been found by all the methods in all the experiments
presented in Tables 4.1 and 4.2. In the last rows of these tables, the average numbers
of trial points generated by the algorithms are given. It can be seen
from Tables 4.1 and 4.2 that both accelerating techniques, the local tuning and the
local improvement, allow us to speed up the search significantly when we work with
the methods belonging to the scheme GS. With respect to the local tuning we can
see that the method LT is faster than the algorithms PM and GE. Analogously, the
LT_LI is faster than the methods PM_LI and GE_LI. The introduction of the local
improvement was also very successful. In fact, the algorithms PM_LI, GE_LI, and
LT_LI work significantly faster than the methods PM, GE, and LT, respectively.
Finally, it can be clearly seen from Tables 4.1 and 4.2 that the acceleration effects
produced by both techniques are more pronounced when the accuracy of the search
increases.
In the second series of experiments, a class of 100 one-dimensional randomized
test functions from [94] has been taken. Each function f_j(x), 1 ≤ j ≤ 100, of this
class is defined over the interval [−5, 5] and has the following form:

f_j(x) = 0.025(x − x_j^*)^2 + sin^2((x − x_j^*) + (x − x_j^*)^2) + sin^2(x − x_j^*),   (4.2.16)
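As a concrete illustration, the test class (4.2.16) can be generated as follows; this is a minimal sketch, not the original generator from [94] (the seed and the helper names are ours):

```python
import math
import random

def make_test_function(x_star):
    """One member of the randomized test class (4.2.16):
    f_j(x) = 0.025(x - x_j*)^2 + sin^2((x - x_j*) + (x - x_j*)^2)
             + sin^2(x - x_j*),
    with global minimum f* = 0 attained at x = x_j*."""
    def f(x):
        d = x - x_star
        return 0.025 * d ** 2 + math.sin(d + d ** 2) ** 2 + math.sin(d) ** 2
    return f

random.seed(0)  # arbitrary seed; [94] fixes its own random minimizers
minimizers = [random.uniform(-5.0, 5.0) for _ in range(100)]  # x_j*, 1 <= j <= 100
test_class = [make_test_function(m) for m in minimizers]
```

Each f_j is a sum of non-negative terms that vanish simultaneously only at x = x_j^*, so f_j(x_j^*) = 0 is indeed the unique global minimum.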
Table 4.2 Results of numerical experiments executed on 20 test problems from [59] by the six
methods belonging to the scheme GS; the accuracy ε = 10^{-6}(b − a), r = 1.1

Problem   PM       GE       LT     PM_LI   GE_LI   LT_LI
1         1,681    1,242    60     55      55      57
2         1,285    1,439    58     53      61      57
3         1,515    1,496    213    89      51      61
4         4,711    3,708    66     63      63      59
5         1,065    1,028    67     59      65      74
6         1,129    761      81     63      65      61
7         1,599    1,362    64     65      55      59
8         1,641    1,444    194    81      67      49
9         1,315    1,386    64     61      59      57
10        1,625    1,384    65     59      63      57
11        4,105    3,438    122    71      63      61
12        3,351    1,167    114    67      57      55
13        8,057    6,146    116    8,057   6,146   119
14        1,023    1,045    66     57      49      49
15        7,115    4,961    103    65      61      59
16        4,003    6,894    129    63      65      63
17        5,877    4,466    143    69      103     103
18        3,389    2,085    67     65      61      57
19        1,417    1,329    60     61      57      53
20        2,483    654      66     61      61      53
Average   2919.30  2371.75  95.90  464.20  366.35  63.15
Table 4.3 Average numbers of trials performed by the six methods belonging to the scheme GS
on the class of 100 test functions (4.2.16)

Method   r       ε = 10^{-4}(b − a)   r       ε = 10^{-6}(b − a)
PM       –       400.54               –       2928.48
GE       1.1     167.63               1.1     1562.27
LT       1.1     47.28                1.1     70.21
PM_LI    –       44.82                –       65.70
GE_LI    1.1     40.22                1.2     62.96
LT_LI    1.3*    38.88                1.2     60.04
where the global minimizer x_j^*, 1 ≤ j ≤ 100, is chosen randomly from the interval
[−5, 5] and differently for the 100 functions of the class. Figure 4.2 shows the graph
of the function no. 38 from the set of test functions (4.2.16) and the trial points
generated by the six methods during minimization of this function with the accuracy
ε = 10^{-4}(b − a). The global minimum of the function, f* = 0, is attained at the point
x* = 3.3611804993. In Fig. 4.2 the effects of the acceleration techniques, the local
tuning and the local improvement, can be clearly seen.
Table 4.3 shows the average numbers of trial points generated by the six methods
belonging to the scheme GS. In columns 2 and 4, the values of the reliability
parameter r are given. The parameter ξ was again taken equal to 10^{-8} and δ = ε.
In Table 4.3, the asterisk denotes that in the algorithm LT_LI (for ε = 10^{-4}(b − a))
the value r = 1.3 has been used for 99 functions, and for the function no. 32 the value
[Fig. 4.2 panel labels: 421, GE: 163, LT: 43, PM_LI: 41, GE_LI: 37, LT_LI: 33]
Fig. 4.2 Graph of the function number 38 from (4.2.16) and trial points generated by the six
methods during their work with this function
r=1.4 has been applied. Table 4.3 confirms for the second series of experiments the
same conclusions that have been made with respect to the effects of the introduction
of the acceleration techniques for the first series of numerical tests.
(4.3.1)

(4.3.2)

2 ≤ i ≤ k,   (4.3.3)

where ξ > 0 is a small number that takes into account our hypothesis that
f(x) is not constant over the interval [0, 1], and the value h^k is calculated
as follows:

h^k = max{h_i : 2 ≤ i ≤ k}   (4.3.4)

with

h_i = |z_i − z_{i−1}| / |x_i − x_{i−1}|^{1/N},   2 ≤ i ≤ k.   (4.3.5)
HOLDER-ESTIMATE(2)
Set

m_i = max{λ_i, γ_i, ξ},   2 ≤ i ≤ k,   (4.3.6)

with

λ_i = max{h_{i−1}, h_i, h_{i+1}},   3 ≤ i ≤ k − 1,   (4.3.7)

γ_i = h^k |x_i − x_{i−1}|^{1/N} / X^{max},   (4.3.8)
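A sketch of the estimates (4.3.4)–(4.3.8) in Python, under the reconstruction above (zero-based indexing, our boundary handling for λ_i at the first and last intervals, and the exponent 1/N in (4.3.8) are assumptions):

```python
def holder_estimates(x, z, N, xi):
    """Local tuning estimates m_i of (4.3.6), built from the Hoelder
    divided differences h_i of (4.3.5); x[0] < ... < x[k-1] are the
    trial points, z[i] = f(x[i]), xi > 0 is the technical parameter."""
    k = len(x)
    # (4.3.5): h_i = |z_i - z_{i-1}| / |x_i - x_{i-1}|^{1/N}
    h = [abs(z[i] - z[i - 1]) / abs(x[i] - x[i - 1]) ** (1.0 / N)
         for i in range(1, k)]
    h_k = max(h)                                           # (4.3.4)
    x_max = max(abs(x[i] - x[i - 1]) ** (1.0 / N) for i in range(1, k))
    m = []
    for j in range(len(h)):
        # lambda_i, (4.3.7): local information from neighbouring intervals
        lam = max(h[max(j - 1, 0):j + 2])
        # gamma_i, (4.3.8): global estimate weighted by the interval length
        gam = h_k * abs(x[j + 1] - x[j]) ** (1.0 / N) / x_max
        m.append(max(lam, gam, xi))                        # (4.3.6)
    return m
```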
B_i = {y : ‖y − y_i‖ ≤ ρ},   1 ≤ i ≤ 100,   (4.3.9)

where y_i denotes the global minimizer of the i-th function of the test class,
1 ≤ i ≤ 100.
(b) A value ε > 0 is fixed and the search terminates when the rule (4.3.1) is satisfied;
then the number of functions of the class for which the method
under consideration was able to put a point in the ball B_i, 1 ≤ i ≤ 100, is counted.
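The stopping strategy (a) amounts to checking, after each trial, whether the new point has fallen into the ball B_i of (4.3.9); a minimal sketch (function name ours):

```python
import math

def hit_ball(trials, y_star, rho):
    """Stopping strategy (a): the search stops as soon as a trial point
    falls into the ball B_i = {y : ||y - y_i*|| <= rho} centred at the
    known global minimizer y_i* of the i-th test function, cf. (4.3.9).
    Returns the number of trials spent, or None if the ball is never hit."""
    for n, y in enumerate(trials, start=1):
        if math.dist(y, y_star) <= rho:
            return n
    return None
```

Under strategy (b) one would instead run until rule (4.3.1) holds and then count the functions of the class for which `hit_ball` returns a number.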
Comparison AG–AGI and AL–ALI. In the first series of experiments, the
efficiency of the local improvement technique was studied. For this purpose,
the algorithms AG and AL were compared with the algorithms AGI and ALI,
respectively, on the class 1 from Table 3.7 (see Fig. 4.3). All experiments were
performed with ξ = 10^{-8}, δ = 10^{-6}, δ from (4.2.14), and ε = 0 using the strategy
(a) with the radius ρ = 0.01√N, where ρ is from (4.3.9). The choices of the
reliability parameter r are given below in subsection The choice of parameters in
the experiments.
In order to illustrate the different behavior of the methods using the local
improvement technique, Fig. 4.4 shows the behavior of the AG and the AGI on problem
Fig. 4.3 Methods AGI and AG using the global estimate, left. Methods ALI and AL using local
estimates, right
Fig. 4.4 Function no.55, class 1. Trial points produced by the AG, left. Trial points produced by
the AGI, right. Trial points chosen by the local improvement strategy are shown by the symbol *
no.55 from class 1. Figure 4.4-left shows 337 points of trials executed by the AG
to find the global minimum of the problem and Fig. 4.4-right presents 107 points
of trials executed by the AGI to solve the same problem. Recall that the search has
been stopped using the rule (a), i.e., as soon as a point within the ball B55 has been
placed.
Comparison AGI–ALI. In the second series of experiments (see Fig. 4.5), the
algorithms AGI and ALI are compared in order to study the influence of the local
tuning technique in the situation when the local improvement is applied too. The
choices of the reliability parameter r are given below in subsection The choice of
parameters in the experiments and the other parameters have been chosen as in the
first series of experiments. In Fig. 4.5-left, the rule (a) is used. It can be noticed that
the method ALI is faster in finding the global solution: the maximum number of
iterations executed by ALI is 241 against 1,054 carried out by the algorithm AGI.
In Fig. 4.5-right, the strategy (b) is used; the algorithms stop when the rule (4.3.1) is
[Fig. 4.5 legend: AGI (average no. of trials 164, max 1054); ALI (average no. of trials 76, max 241)]
Fig. 4.5 ALI and AGI: ε = 0, left. ALI and AGI: ε = 0.001, right
[Fig. 4.6 legend: AG (av. trials 1023, max 1910); ALI (av. trials 437, max 586)]
Fig. 4.6 N = 2, class 1, methods AG and ALI, left. N = 3, class 3, methods AG and ALI, right
satisfied, with ε = 0.001. This criterion is very important because in solving real-life
problems we do not know a priori the global solution of the problem. Thus, it is very
important to study how many trials the methods should execute to find the solution
and to stop by using the practical criterion (b). It can be seen that the ALI stops very
quickly, whereas the method AGI executes a global analysis of the whole domain
of each objective function, so that the stopping rule (4.3.1) is verified after a higher
number of trials.
Comparison AG–ALI. In the third series of experiments, we compare the basic
algorithm AG with the algorithm ALI, using both the local tuning and the local
improvement, on classes 1, 3, and 5 from Table 3.7. The practical rule (b) was used
in these experiments. The choices of the reliability parameter r are given below.
In dimension 2, the values of ξ, δ, and ρ were the same as in the experiments
above; ε was fixed equal to 0.001. In Fig. 4.6-left the behavior of the two methods
can be seen. Note that after 500 iterations the stopping rule in the ALI was verified
for 84 functions and all the minima have been found, whereas the algorithm AG
stopped only at 2 functions.
Fig. 4.7 N = 4, class 5, methods AG and ALI
For N = 3, the radius ρ = 0.01√N has been used. The parameters of the methods
have been chosen as follows: the search accuracy ε = 0.0022, δ = 10^{-6}, and ξ =
10^{-8}. In Fig. 4.6-right the behavior of the methods can be seen. All global minima
have been found.
In the last experiment of this series, the class of functions with N = 4 has been
used. The methods AG and ALI worked with the following parameters: ε = 0.005,
ρ = 0.04√N, δ = 10^{-8}, ξ = 10^{-8}. The algorithm AG was not able to stop within
the maximal number of trials, 90,000, for 11 functions; however, the a posteriori
analysis has shown that the global minima have been found for these functions, too.
Figure 4.7 illustrates the results of the experiment.
The choice of parameters in the experiments. In this subsection we specify the
values of the reliability parameter r used in all the experiments. As has already been
discussed above (see also Theorem 3.3 in [76]), every function optimized by the AG,
AGI, AL, and ALI algorithms has a crucial value r* of this parameter. Therefore,
when one executes tests with a class of 100 different functions it becomes difficult
to use specific values of r for each function; hence, in our experiments at most two
or three values of this parameter have been fixed for the entire class. Clearly, such
a choice does not allow the algorithms to show their complete potential, because
both the local tuning and local improvement techniques have been introduced to
capture the peculiarities of each concrete objective function. However, even under
these unfavorable conditions, the four algorithms proposed here have shown
a good performance. Note that the meaning of r and other parameters of this kind
in Lipschitz global optimization is discussed in detail in a number of fundamental
monographs (see, e.g., [93, 117, 132, 139, 156]).
The following values of the reliability parameter r were used in the first series
of experiments: in the methods AG and AGI the reliability parameter r = 1.3; in the
ALI the value r = 2.8 was used for all 100 functions of the class, and in the method
AL the same value r = 2.8 was used for 98 functions and r = 2.9 for the remaining
two functions.
In the second series of experiments the same value of the parameter r = 2.8 has
been used in both methods (AGI and ALI).
In the third series of experiments the following values of the parameter r have
been used: in dimension N = 2, in the AG the value r = 1.3 and in the ALI the
value r = 2.8. In dimension N = 3, the value r = 1.1 has been applied in the method
AG for all 100 functions of the class; in the method ALI, r = 3.1 has been used for
73 functions of the class, r = 3.4 for 20 functions, and r = 3.9 for the remaining 7
functions. In dimension N = 4, r = 1.1 in the method AG; r = 6.5 in the ALI for 77
functions of the class, r = 6.9 for 17 functions, r = 7.7 for the remaining 6 functions.
(4.4.1)

Step 2. Compute the value m_i being an estimate of the Lipschitz constant of f(x)
over the interval [x_{i−1}, x_i], 2 ≤ i ≤ k, according to Step 2.2 of Sect. 4.2 (the
local tuning).
Step 3. For each interval [x_{i−1}, x_i], 2 ≤ i ≤ k, compute the characteristic

R_i = m_i(x_i − x_{i−1}) + (z_i − z_{i−1})^2 / (m_i(x_i − x_{i−1})) − 2(z_i + z_{i−1}),   (4.4.2)

where z_i = f(x_i).
Step 4. Select the interval [x_{t−1}, x_t] for the next possible trial according to Step 4.2
of Sect. 4.2 (the local improvement).
Step 5. If

|x_t − x_{t−1}| > ε,   (4.4.3)

where ε > 0 is a given search accuracy, then execute the next trial at the
point

x^{k+1} = 0.5(x_t + x_{t−1}) − (z_t − z_{t−1}) / (2 m_t).   (4.4.4)
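For illustration, the characteristic (4.4.2) and the new-point rule (4.4.4) can be coded directly; the first term of (4.4.2) follows the classical information-algorithm form, which is our reconstruction of the garbled formula:

```python
def characteristic(x_prev, x_cur, z_prev, z_cur, m):
    """Characteristic R_i of the interval [x_{i-1}, x_i], cf. (4.4.2);
    m is the local Lipschitz estimate m_i, z the function values."""
    d = x_cur - x_prev
    return m * d + (z_cur - z_prev) ** 2 / (m * d) - 2.0 * (z_cur + z_prev)

def next_trial(x_prev, x_cur, z_prev, z_cur, m):
    """New trial point inside the selected interval, rule (4.4.4);
    it falls strictly inside (x_{t-1}, x_t) whenever m overestimates
    the local Lipschitz constant of f on the interval."""
    return 0.5 * (x_cur + x_prev) - (z_cur - z_prev) / (2.0 * m)
```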
Table 4.4 Numbers of trials performed by the methods GA, PM, IA, OIL, and OILI on the 20 test
problems from [59]

Function  GA      PM      IA      OIL    OILI
1         377     149     127     35     34
2         308     155     135     36     36
3         581     195     224     136    40
4         923     413     379     41     40
5         326     151     126     45     42
6         263     129     112     54     40
7         383     153     115     39     40
8         530     185     188     132    34
9         314     119     125     42     38
10        416     203     157     40     38
11        779     373     405     71     36
12        746     327     271     68     40
13        1,829   993     472     45     32
14        290     145     108     46     36
15        1,613   629     471     63     38
16        992     497     557     53     38
17        1,412   549     470     101    48
18        620     303     243     41     40
19        302     131     117     34     32
20        1,412   493     81      42     38
Average   720.80  314.60  244.15  58.20  38.00
Algorithm MILI
Step 0. Starting points x^1, x^2, . . . , x^m, m > 2, are fixed in such a way that x^1 = 0,
x^m = 1, and the other m − 2 points are chosen arbitrarily. Values z^j = f(x^j) =
F(p_M(x^j)), 1 ≤ j ≤ m, are calculated, where p_M(x) is the M-approximation
of the Peano curve. After executing k trials the choice of new trial points is
done as follows.
Step 1. Execute Step 1 of OILI.
Step 2. (Local tuning.) Evaluate the values m_i according to (4.2.10) of Sect. 4.2,
replacing (x_i − x_{i−1}) by (x_i − x_{i−1})^{1/N} in (4.2.11), (4.2.12) and X^{max} by
(X^{max})^{1/N} in (4.2.12). The values f(x^j) are replaced by F(p_M(x^j)).
Step 3. For each interval [x_{i−1}, x_i], 2 ≤ i ≤ k, calculate the characteristics R_i according
to (4.4.2) of algorithm OILI, replacing (x_i − x_{i−1}) by (x_i − x_{i−1})^{1/N}.
Step 4. Execute Step 4 of OILI to select the index t.
Step 5. If

(x_t − x_{t−1})^{1/N} > ε,   (4.4.5)

then execute the next trial at the point

x^{k+1} = 0.5(x_t + x_{t−1}) − sgn(z_t − z_{t−1}) [ |z_t − z_{t−1}| / m_t ]^N / (2r)   (4.4.6)

and go to Step 1.
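A sketch of the rule (4.4.6) as reconstructed above (the factor 1/(2r) and the power N come from measuring interval lengths in the Hölder metric; the function name is ours):

```python
import math

def mili_next_trial(x_prev, x_cur, z_prev, z_cur, m, r, N):
    """New trial point of MILI, rule (4.4.6): the one-dimensional rule
    (4.4.4) with the step measured in the Hoelder metric, so the ratio
    |z_t - z_{t-1}|/m_t enters with the power N and is damped by 1/(2r)."""
    step = (abs(z_cur - z_prev) / m) ** N / (2.0 * r)
    return 0.5 * (x_cur + x_prev) - math.copysign(step, z_cur - z_prev)
```

When z_t = z_{t−1} the step vanishes and the new trial lands at the midpoint of the interval, exactly as in the symmetric case of (4.4.4).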
If in Step 4 of the scheme MILI we consider the traditional selection rule
according to (3.2.10) from Sect. 3.2, i.e., we select the interval [x_{t−1}, x_t] for the
next possible trial corresponding to the maximal characteristic, then we obtain the
Multidimensional Information algorithm with Local tuning that we will denote as
MIL hereinafter (see [101]).
Theorem 4.4. Let ξ > 0 be fixed and finite, ε = 0, let x* be a global minimizer of
f(x) = F(y(x)), and let {k} be the sequence of all iteration numbers {k} = {1, 2, 3, . . .}
corresponding to trials generated by the MILI or the MIL. If there exists an infinite
subsequence {h} of iteration numbers {h} ⊆ {k} such that for an interval

[x_{i−1}, x_i],   i = i(p),   p ∈ {h},

containing the point x* at the p-th iteration, the inequality

m_i ≥ 2^{1−1/N} K_i + ( 2^{2−2/N} K_i^2 − M_i^2 )^{1/2}   (4.4.7)

holds for the estimate m_i of the local Lipschitz constant corresponding to the interval
[x_{i−1}, x_i], then the set of limit points of the sequence {x^k} of trials generated by the
MILI or the MIL coincides with the set of global minimizers of the function f(x).
In (4.4.7), the values K_i and M_i are the following:

K_i = max{ (z_{i−1} − f(x*))/(x* − x_{i−1})^{1/N}, (z_i − f(x*))/(x_i − x*)^{1/N} },
M_i = |z_{i−1} − z_i| / (x_i − x_{i−1})^{1/N}.

Proof. The convergence properties of the method are similar to those described in
the previous section (see [75] for a detailed discussion).
Let us consider now two series of numerical experiments that involve a total
of 600 test functions in dimensions N = 2, 3, 4. More precisely, six classes of 100

s* = arg max{T_s : 1 ≤ s ≤ 100}.   (4.4.8)

Criterion C2. Average number of trials T_avg performed by the method during
minimization of all 100 functions from a particular test class, i.e.,

T_avg = (1/100) Σ_{s=1}^{100} T_s.   (4.4.9)
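The criteria (4.4.8) and (4.4.9) are simple aggregations over the per-function trial counts T_s; for example:

```python
def criterion_c1(trials):
    """C1, cf. (4.4.8): trials spent on the hardest function of the class,
    T_{s*} with s* = arg max T_s."""
    return max(trials)

def criterion_c2(trials):
    """C2, cf. (4.4.9): average number of trials over the class."""
    return sum(trials) / len(trials)
```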
Table 4.5 Results of the first series of experiments: criteria C1 and C2 for the methods MIA,
MIL, and MILI on the six test classes

Class  ρ        C1: MIA      C1: MIL      C1: MILI    C2: MIA    C2: MIL    C2: MILI
1      0.01√N   1219         1065         724         710.39     332.48     354.82
2      0.01√N   4678         2699         2372        1705.89    956.67     953.58
3      0.01√N   22608        10800        8459        8242.19    2218.82    2312.93
4      0.01√N   70492        47456        37688       20257.50   12758.14   11505.38
5      0.02√N   100000(53)   100000(2)    100000(1)   83362.00   23577.91   23337.03
6      0.02√N   100000(96)   100000(24)   100000(27)  99610.97   61174.69   61900.93
The algorithms MILI and MIL have been constructed in order to be tuned on each
concrete function. Therefore, when one executes tests with a class of 100 different
functions it becomes difficult to use specific values of r for each function, and in our
experiments only one or two values of this parameter have been fixed for the entire
class. Clearly, such a choice does not allow the algorithms MILI and MIL to show
their complete potential in the comparison with the MIA. However, as can be seen
from Table 4.5, even under these unfavorable conditions, the algorithms show a very
good performance. In the results described in Table 4.5, all the algorithms were able
to find the solution for all 100 functions of each class. It can be seen that the MIL
and the MILI stopped very quickly, whereas the MIA executed a deeper global
analysis of the whole domain of each objective function, so that the stopping rule
(4.4.5) was verified after a higher number of trials. In all the cases, the maximal
number of function evaluations has been taken equal to 100,000 and in Table 4.5, in
the C1 columns, the numbers in brackets give the number of functions for which
the algorithm has reached this number.
In the second series of experiments, the efficiency of the local improvement
technique was studied. For this purpose, the algorithm MILI has been compared
with the algorithm MIL by using the stopping strategy (a), i.e., the search went
on until a point within the ball B_i from (4.3.9) has been placed. In solving many
concrete problems it is often crucial to find a good approximation of the global
minimum in the lowest number of iterations. The most important aim of the local
improvement is to quicken the search; thus, we use the stopping criterion (a)
that allows us to see which of the two methods approaches the global solution faster.
In these experiments, we considered the criteria C1 and C2 previously described,
and a new criterion defined as follows.
Criterion C3. Number p (number q) of functions from a class for which the MIL
algorithm executed fewer (more) function evaluations than the algorithm MILI. If T_s
is the number of trials performed by the MILI and T'_s is the corresponding number
of trials performed by the MIL method, p and q are evaluated as follows:

p = Σ_{s=1}^{100} σ_s,   σ_s = 1 if T'_s < T_s, and σ_s = 0 otherwise;   (4.4.10)

q = Σ_{s=1}^{100} σ'_s,   σ'_s = 1 if T_s < T'_s, and σ'_s = 0 otherwise.   (4.4.11)

Table 4.6 Results of the second series of experiments: criteria C1, C2, and C3 for the methods
MIL and MILI on the six test classes

Class  C1: MIL  C1: MILI  C2: MIL    C2: MILI   C3: MIL (p)  C3: MILI (q)  Ratio C1 MIL/MILI  Ratio C2 MIL/MILI
1      668      434       153.72     90.73      20           79            1.5391             1.6942
2      1517     1104      423.39     198.82     22           77            1.3741             2.1295
3      7018     5345      1427.06    838.96     25           75            1.3130             1.7008
4      40074    15355     6162.02    2875.06    25           75            2.6098             2.1432
5      67017    36097     10297.14   6784.37    36           64            1.8565             1.5177
6      76561    73421     21961.91   16327.21   40           60            1.0427             1.3451

If p + q < 100, then both methods solve the remaining 100 − (p + q) problems
with the same number of function evaluations.
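The counters (4.4.10)–(4.4.11) can be computed in one pass over the per-function trial counts; a minimal sketch (names ours):

```python
def criterion_c3(t_mili, t_mil):
    """C3, cf. (4.4.10)-(4.4.11): p counts the functions on which MIL
    spent strictly fewer trials than MILI, q the converse; ties are
    counted by neither, so p + q <= 100 for a class of 100 functions."""
    p = sum(1 for a, b in zip(t_mili, t_mil) if b < a)  # MIL better
    q = sum(1 for a, b in zip(t_mili, t_mil) if a < b)  # MILI better
    return p, q
```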
Table 4.6 presents results of numerical experiments in the second series. The
C1 and C2 columns have the same meaning as before. The C3 column
presents results of the comparison between the two methods in terms of this
criterion: the MIL sub-column presents the number of functions, p, of a particular
test class, for which MIL spent fewer trials than the MILI method. Analogously, the
MILI sub-column shows the number of functions, q, for which the MILI executed
fewer function evaluations than the MIL (p and q are from (4.4.10) and
(4.4.11), respectively). For example, in the line corresponding to the test class 1,
for N = 2, we can see that the method MILI was better (was worse) than the MIL
on q = 79 (p = 20) functions, and for one function of this class the two methods
generated the same number of function trials.
In all the cases, the maximal number of function evaluations has been taken equal
to 100,000. The parameters d, δ, and ε and the values of the reliability parameter r
used in these experiments for the MIL and MILI methods are the same as in the first
series of experiments. It can be seen from Table 4.6 that on these test classes the
method MILI worked better than the information algorithm MIL. In particular, the
columns Ratio C1 and Ratio C2 of Table 4.6 show the improvement obtained
by the MILI with respect to Criteria C1 and C2. They represent the ratio between
the maximal (and the average) number of trials performed by the MIL and the
corresponding number of trials performed by the MILI algorithm.
The choice of parameters in the experiments. The following values of the
reliability parameter r have been used for the methods in the first series of
experiments: for the test class 1 the value r = 4.9 in the MIL and MILI algorithms
and the value r = 3.1 in the MIA algorithm. For the class 2 the value r = 5.4 was
used in the MIL and the MILI for 97 functions, and r = 5.5 for the remaining 3
functions of this class; in the MIA the values r = 4.1 and r = 4.3 were used for 97
and 3 functions of the same class, respectively.
In dimension N = 3, the values r = 5.5 and r = 5.7 were applied in the MIL
and the MILI methods for 97 and 3 functions of the class 3, respectively; the values
r = 3.2 and r = 3.4 for 97 and 3 functions of this class when the MIA algorithm has
been used. By considering the test class 4 the following values of the parameter r
have been used: r = 6.5 and r = 6.6 in the MIL and the MILI methods for 99 and
1 function, respectively; r = 3.8 for 99 functions in the MIA and r = 4.1 for the
remaining function.
In dimension N = 4, the value r = 6.2 was used for all 100 functions of test class 5
in the MIL and the MILI, and r = 3.3, r = 3.5 in the MIA, for 96 and 4 functions,
respectively. The value r = 6.2 was applied for 92 functions of test class 6 in the
MIL and the MILI, and the values r = 6.6 and 6.8 were used for 5 and 3 functions,
respectively; in the MIA algorithm the value r = 3.8 has been used for 98 functions
of the class 6 and r = 4.1 for the remaining 2 functions.
Finally, the parameter δ from Step 4 of the MILI algorithm has been fixed equal
to 10^{-6} for N = 2, and equal to 10^{-8} for N = 3, 4.
Chapter 5
A Brief Conclusion
What we call the beginning is often the end. And to make an end
is to make a beginning. The end is where we start from.
T. S. Eliot
We conclude this brief book by emphasizing once again that it is just an introduction
to the subject. We have considered the basic Lipschitz global optimization problem,
i.e., global minimization of a multiextremal, non-differentiable Lipschitz function
over a hyperinterval, with a special emphasis on Peano curves, strategies for adaptive
estimation of Lipschitz information, and acceleration of the search. There already
exist many generalizations of the ideas presented here in several directions.
For the reader interested in a deeper immersion in the subject we list some
of them below:
Algorithms working with discontinuous functions and functions having Lipschitz
first derivatives (see [43, 70, 72, 77, 102, 103, 106, 117, 118, 121, 139] and references
given therein).
Algorithms working with diagonal partitions and adaptive diagonal curves
for solving multidimensional problems with Lipschitz objective functions and
Lipschitz first derivatives (see [72, 107, 108, 116–118] and references given
therein).
Algorithms for multicriteria problems and problems with multiextremal nondifferentiable
partially defined constraints (see [109, 119, 122–124, 134, 137, 139,
140] and references given therein).
Algorithms combining the ideas of Lipschitz global optimization with the
Interval Analysis framework (see [14–16, 81], etc.).
Parallel non-redundant algorithms for Lipschitz global optimization problems
and problems with Lipschitz first derivatives (see [44, 105, 113–115, 120, 138–140],
etc.).
Algorithms for finding the minimal root of equations (and sets of equations)
having a multiextremal (and possibly non-differentiable) left-hand part over an
interval (see [15, 16, 18, 19, 83, 121], etc.).
Thus, this book is a demonstration that the demand from the world of applications
entails a continuous intensive activity in the development of new global optimization
approaches. The authors hope that what is written here may serve not only as
a tool for people from different applied areas but also as the source of many
other successful developments (especially by young researchers just coming to the
scene of global optimization). Therefore, we expect this book to be a valuable
introduction to the subject for faculty, students, and engineers working in local and
global optimization, applied mathematics, computer sciences, and in related areas.
References
1. Addis, B., Locatelli, M.: A new class of test functions for global optimization. J. Global Optim. 38, 479–501 (2007)
2. Addis, B., Locatelli, M., Schoen, F.: Local optima smoothing for global optimization. Optim. Meth. Software 20, 417–437 (2005)
3. Aguiar e Oliveira, H., Jr., Ingber, L., Petraglia, A., Rembold Petraglia, M., Augusta Soares Machado, M.: Stochastic Global Optimization and Its Applications with Fuzzy Adaptive Simulated Annealing. Springer, Berlin (2012)
4. Baritompa, W.P.: Customized method for global optimization: a geometric viewpoint. J. Global Optim. 3, 193–212 (1993)
5. Baritompa, W.P.: Accelerations for a variety of global optimization methods. J. Global Optim. 4, 37–45 (1994)
6. Barkalov, K., Ryabov, V., Sidorov, S.: Parallel scalable algorithms with mixed local-global strategy for global optimization problems. In: Hsu, C.H., Malyshkin, V. (eds.) MTPP 2010. LNCS 6083, pp. 232–240. Springer, Berlin (2010)
7. Bomze, I.M., Csendes, T., Horst, R., Pardalos, P.M.: Developments in Global Optimization. Kluwer, Dordrecht (1997)
8. Breiman, L., Cutler, A.: A deterministic algorithm for global optimization. Math. Program. 58, 179–199 (1993)
9. Butz, A.R.: Space filling curves and mathematical programming. Inform. Contr. 12, 313–330 (1968)
16. Casado, L.G., García, I., Sergeyev, Ya.D.: Interval algorithms for finding the minimal root in a set of multiextremal non-differentiable one-dimensional functions. SIAM J. Sci. Comput. 24, 359–376 (2002)
41. Gaviano, M., Lera, D., Steri, A.M.: A local search method for continuous global optimization. J. Global Optim. 48, 73–85 (2010)
42. Gelfand, I., Raikov, D., Shilov, G.: Commutative Normed Rings. AMS Chelsea Publishing, New York (1991)
43. Gergel, V.P.: A global search algorithm using derivatives. In: Systems Dynamics and Optimization, pp. 161–178. N. Novgorod University Press, N. Novgorod (1992) (In Russian)
44. Gergel, V.P., Sergeyev, Ya.D.: Sequential and parallel global optimization algorithms using derivatives. Comput. Math. Appl. 37, 163–180 (1999)
45. Gergel, V.P., Strongin, R.G.: Multiple Peano curves in recognition problems. Pattern Recogn. Image Anal. 2, 161–164 (1992)
46. Gergel, V.P., Strongin, L.G., Strongin, R.G.: Neighbourhood method in recognition problems. Soviet J. Comput. Syst. Sci. 26, 46–54 (1988)
47. Glover, F., Kochenberger, G.A.: Handbook on Metaheuristics. Kluwer, Dordrecht (2003)
48. Gornov, A.Yu., Zarodnyuk, T.S.: A method of stochastic coverings for optimal control problems. Comput. Technol. 17, 31–42 (2012) (In Russian)
49. Gorodetsky, S.Yu.: Multiextremal optimization based on domain triangulation. The Bulletin of Nizhni Novgorod Lobachevsky University: Math. Model. Optim. Contr. 21, 249–268 (1999) (In Russian)
50. Gorodetsky, S.Yu.: Paraboloid triangulation methods in solving multiextremal optimization problems with constraints for a class of functions with Lipschitz directional derivatives. The Bulletin of Nizhni Novgorod Lobachevsky University: Math. Model. Optim. Contr. 1, 144–155 (2012) (In Russian)
51. Gorodetsky, S.Yu., Grishagin, V.A.: Nonlinear Programming and Multiextremal Optimization. NNGU Press, Nizhni Novgorod (2007) (In Russian)
52. Gourdin, E., Jaumard, B., Ellaia, R.: Global optimization of Hölder functions. J. Global Optim. 8, 323–348 (1996)
53. Grishagin, V.A.: Operation characteristics of some global optimization algorithms. Prob. Stoch. Search 7, 198–206 (1978) (In Russian)
54. Grishagin, V.A.: On convergence conditions for a class of global search algorithms. In: Proceedings of the 3rd All-Union Seminar "Numerical Methods of Nonlinear Programming", Kharkov, pp. 82–84 (1979)
55. Grishagin, V.A.: On properties of a class of optimization algorithms. Transactions of the 3rd Conference of Young Scientists of Applied Mathematics and Cybernetics Research Institute of Gorky University, Gorky, pp. 50–58. Deposited with VINITI, Aug. 14, 1984, No. 5836-84 Dep. (1983)
56. Grishagin, V.A., Sergeyev, Ya.D., Strongin, R.G.: Parallel characteristical global optimization algorithms. J. Global Optim. 10, 185–206 (1997)
57. Hanjoul, P., Hansen, P., Peeters, D., Thisse, J.F.: Uncapacitated plant location under alternative space price policies. Manag. Sci. 36, 41–47 (1990)
58. Hansen, P., Jaumard, B.: Lipschitz optimization. In: Horst, R., Pardalos, P.M. (eds.) Handbook of Global Optimization, pp. 407–493. Kluwer, Dordrecht (1995)
59. Hansen, P., Jaumard, B., Lu, S.H.: Global optimization of univariate Lipschitz functions: 2. New algorithms and computational comparison. Math. Program. 55, 273–293 (1992)
60. Hastings, H.M., Sugihara, G.: Fractals: A User's Guide for the Natural Sciences. Oxford University Press, Oxford (1994)
61. Hendrix, E.M.T., G.-Tóth, B.: Introduction to Nonlinear and Global Optimization. Springer, New York (2010)
66. Iudin, D.I., Sergeyev, Ya.D., Hayakawa, M.: Interpretation of percolation in terms of infinity computations. Appl. Math. Comput. 218, 8099–8111 (2012)
67. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theor. Appl. 79, 157–181 (1993)
68. Kiatsupaibul, S., Smith, R.L.: On the solution of infinite horizon optimization problems through global optimization algorithms. Tech. Report 9819, DIOE, University of Michigan, Ann Arbor (1998)
69. Kushner, H.: A new method for locating the maximum point of an arbitrary multipeak curve in presence of noise. J. Basic Eng. 86, 97–106 (1964)
70. Kvasov, D.E., Sergeyev, Ya.D.: A univariate global search working with a set of Lipschitz constants for the first derivative. Optim. Lett. 3, 303–318 (2009)
71. Kvasov, D.E., Sergeyev, Ya.D.: Univariate geometric Lipschitz global optimization algorithms. Numer. Algebra Contr. Optim. 2, 69–90 (2012)
72. Kvasov, D.E., Sergeyev, Ya.D.: Lipschitz gradients for global optimization in a one-point-based partitioning scheme. J. Comput. Appl. Math. 236, 4042–4054 (2012)
73. Kvasov, D.E., Menniti, D., Pinnarelli, A., Sergeyev, Ya.D., Sorrentino, N.: Tuning fuzzy power-system stabilizers in multi-machine systems by global optimization algorithms based on efficient domain partitions. Elec. Power Syst. Res. 78, 1217–1229 (2008)
74. Lera, D., Sergeyev, Ya.D.: Global minimization algorithms for Hölder functions. BIT 42, 119–133 (2002)
75. Lera, D., Sergeyev, Ya.D.: An information global minimization algorithm using the local improvement technique. J. Global Optim. 48, 99–112 (2010)
76. Lera, D., Sergeyev, Ya.D.: Lipschitz and Hölder global optimization using space-filling curves. Appl. Numer. Math. 60, 115–129 (2010)
77. Lera, D., Sergeyev, Ya.D.: Acceleration of univariate global optimization algorithms working with Lipschitz functions and Lipschitz first derivatives. SIAM J. Optim. 23(1), 508–529 (2013)
78. Liuzzi, G., Lucidi, S., Piccialli, V.: A partition-based global optimization algorithm. J. Global Optim. 48, 113–128 (2010)
79. Locatelli, M.: On the multilevel structure of global optimization problems. Comput. Optim. Appl. 30, 5–22 (2005)
80. Mandelbrot, B.: Les objets fractals: forme, hasard et dimension. Flammarion, Paris (1975)
81. Martínez, J.A., Casado, L.G., García, I., Sergeyev, Ya.D., G.-Tóth, B.: On an efficient use of gradient information for accelerating interval global optimization algorithms. Numer. Algorithm 37, 61–69 (2004)
82. Mockus, J.: Bayesian Approach to Global Optimization. Kluwer, Dordrecht (1988)
83. Molinaro, A., Pizzuti, C., Sergeyev, Ya.D.: Acceleration tools for diagonal information global optimization algorithms. Comput. Optim. Appl. 18, 5–26 (2001)
84. Moore, E.H.: On certain crinkly curves. Trans. Am. Math. Soc. 1, 72–90 (1900)
85. Netto, E.: Beitrag zur Mannigfaltigkeitslehre. Journal für die reine und angewandte Mathematik (Crelle's Journal) 86, 263–268 (1879)
References
123
92. Pijavskii, S.A.: An algorithm for finding the absolute extremum of a function. USSR Comput.
Math. Math. Phys. 12, 5767 (1972)
93. Pintér, J.: Global Optimization in Action (Continuous and Lipschitz Optimization:
Algorithms, Implementations and Applications). Kluwer, Dordrecht (1996)
94. Pintér, J.: Global optimization: software, test problems, and applications. In: Pardalos, P.M.,
Romeijn, H.E. (eds.) Handbook of Global Optimization, vol. 2, pp. 515–569. Kluwer,
Dordrecht (2002)
95. Platzman, L.K., Bartholdi, J.J. III: Spacefilling curves and the planar travelling salesman
problem. J. ACM 36, 719–737 (1989)
96. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in Fortran:
The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)
97. Rastrigin, L.A.: Random Search in Optimization Problems for Multiparameter Systems. Air
Force System Command, Foreign Technical Division, FTD-HT-67-363 (1965)
98. Ratschek, H., Rokne, J.: New Computer Methods for Global Optimization. Ellis Horwood,
Chichester (1988)
99. Sagan, H.: Space-Filling Curves. Springer, New York (1994)
100. Sergeyev, Ya.D.: A one-dimensional deterministic global minimization algorithm. Comput.
Math. Math. Phys. 35, 705–717 (1995)
101. Sergeyev, Ya.D.: An information global optimization algorithm with local tuning. SIAM
J. Optim. 5, 858–870 (1995)
102. Sergeyev, Ya.D.: A method using local tuning for minimizing functions with Lipschitz
derivatives. In: Bomze, I.M., Csendes, T., Horst, R., Pardalos, P.M. (eds.) Developments in
Global Optimization, pp. 199–215. Kluwer, Dordrecht (1997)
103. Sergeyev, Ya.D.: Global one-dimensional optimization using smooth auxiliary functions.
Math. Program. 81, 127–146 (1998)
104. Sergeyev, Ya.D.: On convergence of "Divide the Best" global optimization algorithms.
Optimization 44, 303–325 (1998)
105. Sergeyev, Ya.D.: Parallel information algorithm with local tuning for solving multidimensional GO problems. J. Global Optim. 15, 157–167 (1999)
106. Sergeyev, Ya.D.: Multidimensional global optimization using the first derivatives. Comput.
Math. Math. Phys. 39, 743–752 (1999)
107. Sergeyev, Ya.D.: An efficient strategy for adaptive partition of N-dimensional intervals in the
framework of diagonal algorithms. J. Optim. Theor. Appl. 107, 145–168 (2000)
108. Sergeyev, Ya.D.: Efficient partition of N-dimensional intervals in the framework of one-point-based algorithms. J. Optim. Theor. Appl. 124, 503–510 (2005)
109. Sergeyev, Ya.D.: Univariate global optimization with multiextremal nondifferentiable
constraints without penalty functions. Comput. Optim. Appl. 34, 229–248 (2006)
110. Sergeyev, Ya.D.: Blinking fractals and their quantitative analysis using infinite and
infinitesimal numbers. Chaos Solitons Fract. 33, 50–75 (2007)
111. Sergeyev, Ya.D.: Evaluating the exact infinitesimal values of area of Sierpinski's carpet and
volume of Menger's sponge. Chaos Solitons Fract. 42, 3042–3046 (2009)
112. Sergeyev, Ya.D.: Using blinking fractals for mathematical modelling of processes of growth
in biological systems. Informatica 22, 559–576 (2011)
113. Sergeyev, Ya.D., Grishagin, V.A.: Sequential and parallel algorithms for global optimization.
Optim. Meth. Software 3, 111–124 (1994)
114. Sergeyev, Ya.D., Grishagin, V.A.: A parallel method for finding the global minimum of
univariate functions. J. Optim. Theor. Appl. 80, 513–536 (1994)
115. Sergeyev, Ya.D., Grishagin, V.A.: Parallel asynchronous global search and the nested
optimization scheme. J. Comput. Anal. Appl. 3, 123–145 (2001)
116. Sergeyev, Ya.D., Kvasov, D.E.: Global search based on efficient diagonal partitions and a set
of Lipschitz constants. SIAM J. Optim. 16, 910–937 (2006)
117. Sergeyev, Ya.D., Kvasov, D.E.: Diagonal Global Optimization Methods. FizMatLit, Moscow
(2008) (In Russian)
118. Sergeyev, Ya.D., Kvasov, D.E.: Lipschitz global optimization. In: Cochran, J.J., Cox, L.A.,
Keskinocak, P., Kharoufeh, J.P., Smith, J.C. (eds.) Wiley Encyclopedia of Operations
Research and Management Science, vol. 4, pp. 2812–2828. Wiley, New York (2011)
119. Sergeyev, Ya.D., Markin, D.L.: An algorithm for solving global optimization problems with
nonlinear constraints. J. Global Optim. 7, 407–419 (1995)
120. Sergeyev, Ya.D., Strongin, R.G.: A global minimization algorithm with parallel iterations.
Comput. Math. Math. Phys. 29, 7–15 (1990)
121. Sergeyev, Ya.D., Daponte, P., Grimaldi, D., Molinaro, A.: Two methods for solving optimization problems arising in electronic measurements and electrical engineering. SIAM J. Optim.
10, 1–21 (1999)
122. Sergeyev, Ya.D., Famularo, D., Pugliese, P.: Index Branch-and-Bound Algorithm for Lipschitz univariate global optimization with multiextremal constraints. J. Global Optim. 21,
317–341 (2001)
123. Sergeyev, Ya.D., Pugliese, P., Famularo, D.: Index information algorithm with local tuning
for solving multidimensional global optimization problems with multiextremal constraints.
Math. Program. 96, 489–512 (2003)
124. Sergeyev, Ya.D., Kvasov, D.E., Khalaf, F.M.H.: A one-dimensional local tuning algorithm for
solving GO problems with partially defined constraints. Optim. Lett. 1, 85–99 (2007)
125. Sierpiński, W.: O krzywych, wypełniających kwadrat. Prace Mat.-Fiz. 23, 193–219 (1912)
126. Stephens, C.P., Baritompa, W.P.: Global optimization requires global information. J. Optim.
Theor. Appl. 96, 575–588 (1998)
127. Strekalovsky, A.S.: Global optimality conditions for nonconvex optimization. J. Global
Optim. 4, 415–434 (1998)
128. Strekalovsky, A.S.: Elements of Nonconvex Optimization. Nauka, Novosibirsk (2003)
(In Russian)
129. Strekalovsky, A.S., Orlov, A.V., Malyshev, A.V.: On computational search for optimistic
solutions in bilevel problems. J. Global Optim. 48, 159–172 (2010)
130. Strigul, O.I.: Search for a global extremum in a certain subclass of functions with the Lipschitz
condition. Cybernetics 6, 72–76 (1985)
131. Strongin, R.G.: On the convergence of an algorithm for finding a global extremum. Eng.
Cybern. 11, 549–555 (1973)
132. Strongin, R.G.: Numerical Methods in Multiextremal Problems. Nauka, Moscow (1978)
(In Russian)
133. Strongin, R.G.: The information approach to multiextremal optimization problems. Stoch.
Stoch. Rep. 27, 65–82 (1989)
134. Strongin, R.G.: Search for Global Optimum. Series of Mathematics and Cybernetics 2.
Znanie, Moscow (1990) (In Russian)
135. Strongin, R.G.: Algorithms for multi-extremal mathematical programming problems employing the set of joint space-filling curves. J. Global Optim. 2, 357–378 (1992)
136. Strongin, R.G., Gergel, V.P.: On realization of the generalized multidimensional global
search algorithm on a computer. Problems of Cybernetics. Stochastic Search in Optimization
Problems. Scientific Council of Academy of Sciences of USSR for Cybernetics, Moscow
(1978) (In Russian)
137. Strongin, R.G., Markin, D.L.: Minimization of multiextremal functions with nonconvex
constraints. Cybernetics 22, 486–493 (1986)
138. Strongin, R.G., Sergeyev, Ya.D.: Global multidimensional optimization on parallel computer.
Parallel Comput. 18, 1259–1273 (1992)
139. Strongin, R.G., Sergeyev, Ya.D.: Global Optimization with Non-convex Constraints: Sequential and Parallel Algorithms. Kluwer, Dordrecht (2000)
140. Strongin, R.G., Sergeyev, Ya.D.: Global optimization: fractal approach and non-redundant
parallelism. J. Global Optim. 27, 25–50 (2003)
141. Sukharev, A.G.: Global extrema and methods of its search. In: Moiseev, N.N., Krasnoshchekov, P.S. (eds.) Mathematical Methods in Operations Research, pp. 4–37. Moscow
University, Moscow (1981) (In Russian)
142. Sukharev, A.G.: Minimax Algorithms in Problems of Numerical Analysis. Nauka, Moscow
(1989) (In Russian)
143. Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization in Continuous
and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer, Dordrecht (2002)
144. Timonov, L.N.: An algorithm for search of a global extremum. Eng. Cybern. 15, 38–44 (1977)
145. Törn, A., Ali, M.M., Viitanen, S.: Stochastic global optimization: problem classes and
solution techniques. J. Global Optim. 14, 437–447 (1999)
157. Žilinskas, A.: One-step Bayesian method for the search of the optimum of one-variable
functions. Cybernetics 1, 139–144 (1975)
158. Žilinskas, A., Mockus, J.: On one Bayesian method of search of the minimum. Avtomatika i
Vychislitel'naya Tekhnika 4, 42–44 (1972) (In Russian)