National Research University ITMO (ITMO University)

As a manuscript

Specialty 1.2.1 "Artificial Intelligence and Machine Learning (Physics and Mathematics)"

Scientific supervisor: Candidate of Physical and Mathematical Sciences Hvatov Alexander Alexandrovich

Saint Petersburg, 2023

The dissertation was prepared at the Federal State Autonomous Educational Institution of Higher Education "National Research University ITMO".
As a manuscript

Supervisor: PhD Hvatov Alexander A.

Saint Petersburg, 2023

The research was carried out at ITMO University.
Supervisor: PhD Hvatov Alexander A.
Official opponents: Derkach Denis A., PhD, HSE University, Director of the Institute for Applied Research and Development, Institute of Artificial Intelligence and Digital Sciences.
The defense will be held on 25.12.2023 at 11:15 at the meeting of the ITMO University Dissertation Council 02.22.15, https://youtube.com/live/8ionXKx8Zcg?feature=share.
The thesis is available in the Library of ITMO University, Lomonosova St. 9, Saint Petersburg, Russia, and on the https://dissovet.itmo.ru website.
Scientific Secretary of the ITMO University Dissertation Council 02.22.00, PhD in Engineering, Mouromtsev Dmitry I.
Table of Contents

Synopsis (in Russian) .......................... 7
Synopsis ....................................... 37
Introduction ................................... 65
Conclusion ..................................... 143
Synopsis (in Russian)

Content of the work
K_\sigma(s' - s) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{2} (s - s')_i^2\right) \quad (4)
…of the neural network, z^{(i)} denotes the input variables, which for the first layer correspond to the independent variables (x, t), and b^{(i)} is the bias vector. \sigma(u) = (\exp[2u] - 1)/(\exp[2u] + 1) is the activation function (hyperbolic tangent).
\tilde{u}(x, t) = f^{(m)} \circ f^{(m-1)} \circ \ldots \circ f^{(1)}(x_1, \ldots, x_n, t) \quad (5)
Noise level (%) in the original function and its derivatives on noisy and smoothed data:

                           | Time, s | u    | ∂u/∂t | ∂²u/∂t²
Input data                 | —       | 0.12 | 60.55 | 1158.97
Smoothed (polynomials)     | 0.13    | 5.56 | 36.1  | 154.0
Smoothed (ANN, 10⁴ epochs) | 54.5    | 4.48 | 24.04 | 94.6
Lu = \sum_i a_i(t, x) b_i c_i \to \min_{a_i^{*}; c_i} \quad (8)

|\tilde{u}(t, x) - u(t, x)| \to \min_{a_i^{*}; c_i} : L'\tilde{u} = 0 \quad (9)

f_{fitness} = (\|L\|_2)^{-1} = \left(\left\|\sum_{i \neq i_{lhs}} a_i^{*}(t, x) b_i c_i - a_{i_{lhs}}^{*}(t, x) c_{i_{lhs}}\right\|_2\right)^{-1} \quad (10)
Q(L_j) = \sum_{i=1}^{M} \|L_j(\tilde{u}_i)\| \quad (14)
(\lambda_1^1, \lambda_2^1, \ldots, \lambda_{n\_eq}^1) \to (\lambda_1^{'1}, \lambda_2^{'1}, \ldots, \lambda_{n\_eq}^{'1}), \quad (\lambda_1^2, \lambda_2^2, \ldots, \lambda_{n\_eq}^2) \to (\lambda_1^{'2}, \lambda_2^{'2}, \ldots, \lambda_{n\_eq}^{'2})

p_i \sim U(0, 1) \quad (17)

\text{if } p_i < p_{xover}: \; \lambda_i^{'1} = \alpha \cdot \lambda_i^1 + (1 - \alpha) \cdot \lambda_i^2, \; \lambda_i^{'2} = \alpha \cdot \lambda_i^2 + (1 - \alpha) \cdot \lambda_i^1; \quad \text{else: } \lambda_i^{'1} = \lambda_i^1, \; \lambda_i^{'2} = \lambda_i^2
\alpha \frac{1}{r}\frac{\partial u}{\partial r} + \alpha \frac{\partial^2 u}{\partial r^2} = \frac{\partial u}{\partial t} \quad (18)
The heat conduction equations without convection obtained from the experimental data (10 independent experiments) can be described by relation (19), which agrees with the expected parameter values.
Table 3 — Obtained coefficients of the terms of the heat conduction equation in cylindrical coordinates. The coefficients are normalized so that the coefficient of the term with ∂u/∂t equals one; C corresponds to the free term.

Noise level | (1/r)·∂u/∂r       | ∂²u/∂r²           | ∂u/∂t | C
0           | (1.5 ± ε)·10⁻⁷    | (1.54 ± ε)·10⁻⁷   | 1     | ε
0.1         | (1.51 ± ε)·10⁻⁷   | (1.53 ± ε)·10⁻⁷   | 1     | ε
0.3         | (1.4 ± 0.3)·10⁻⁷  | (1.5 ± 0.21)·10⁻⁷ | 1     | 0.0023 ± 0.005
0.5         | (1.45 ± 0.5)·10⁻⁷ | (1.5 ± 0.21)·10⁻⁷ | 1     | 0.05 ± 0.026
0.7         | (1.4 ± 0.7)·10⁻⁷  | (1.3 ± 0.4)·10⁻⁷  | 1     | 0.1 ± 0.053
1           | (1.3 ± 0.3)·10⁻⁷  | (1.1 ± 0.7)·10⁻⁷  | 1     | 0.3 ± 0.1
(9.4 \pm 0.11) \cdot 10^{-8} \, \frac{1}{r}\frac{\partial u}{\partial r} + (9.423 \pm 0.04) \cdot 10^{-8} \, \frac{\partial^2 u}{\partial r^2} + (\epsilon \pm 0.01) \cdot 10^{-8} = \frac{\partial u}{\partial t} \quad (19)
Synopsis
among other things, by the number of known variational principles and dynamic analogies, and therefore the range of models obtainable by these methods is limited. Modern methods for determining the structure of a model in the form of differential equations from data are characterized by increased flexibility and a wider range of considered model structures (both ordinary and partial differential equations and their systems); however, the need to specify a significant number of parameters, together with a set of restrictions on the structure of the desired differential equation, often leads to a search in a high-dimensional space containing all possible structures of the differential-equation model. Differential equation discovery tools (SINDy, PDE-Net, etc.) are largely limited in applied use by the lack of universality of the corresponding instruments.
The high computational complexity of the naive approach, in which all possible structures of differential equations composed of a limited set of elementary functions are considered to describe the process under study, makes search-space reduction methods necessary. In symbolic regression, which also involves the construction of symbolic models of processes, albeit in the form of algebraic expressions, a common approach for reducing the search space is genetic algorithms that optimize the expression as a computation graph. When genetic algorithms are applied to the problem of finding the structure of a computation graph for differential equations, optimizing the graph in the absence of structural restrictions leads to overfitting, which in the model space corresponds to a cumbersome equation that cannot be interpreted by an expert. The task of obtaining a computation graph and its parameters (for example, it is sometimes convenient to consider the nodes of the graph as parameterized functions to reduce the dimension of the problem, and it is also convenient to treat the numerical coefficients in front of the terms as node parameters) we will call the problem of learning a model in the form of differential equations. This study is devoted to learning models in the form of differential equations with unknown structure and uncertain coefficients from data, where the desired equation structure is learned using an evolutionary optimization algorithm in the space of elementary operations (for example, differentiation with respect to a given variable), which has a smaller dimension than the classical space of all possible terms, while the developed optimization algorithm avoids overfitting the equation structure, allowing expert interpretation of the process.
The object of the research is machine learning models in the form of differential equations with unknown structure and coefficients.
The subject of the research is methods for training models in the form of differential equations with unknown structure and coefficients on data.
The research objective is to improve the quality² of equation structure and coefficient detection using machine learning models in the form of differential equations by conducting the optimization in a wide search space (consisting of combinations of tokens, i.e., elementary operations such as differentiation up to a given order or functions defined on the grid), by developing an evolutionary algorithm for efficient search of the model structure in the selected high-dimensional token space, and by using physics-informed neural networks (PINNs) to compute the model's fitness function.
To achieve this goal, the following research tasks had to be solved:
1. Justify the requirements and direction of research based on an analytical review of modern methods for learning the structure and coefficients of models in the form of differential equations.
2. Develop a method and an algorithm for training a model in the form of a differential equation (both ordinary and partial) corresponding to the observed state of a dynamical system.
3. Develop a method and an algorithm for training a model in the form of a system of differential equations (both ordinary and partial), based on multiobjective optimization.
4. Validate the developed algorithms in experimental studies of their quality on benchmarks recognized by the international community and on observed states of dynamical systems describing real data, as well as by comparison with the closest analogues.
1. The method, and the algorithm that implements it, for training a model in the form of differential equations with unknown structure and coefficients based on evolutionary optimization algorithms and the method

² The quality of detection is assessed with metrics of the accuracy of the obtained structures and coefficients, and the robustness of obtaining the structures and coefficients on noisy data. The quality of the DE structure detection is assessed with the structural Hamming distance (SHD) between string representations.
In [6], the author analyzed the applicability of the fitness function based on methods for automatically solving differential equations. The author prepared the algorithmic and software interface for using a differential equation solver based on optimization methods. For work [7], the author developed a method for training models in the form of single differential equations based on multicriteria evolutionary optimization, and conducted experiments validating the effectiveness of the method. In preparing work [9], the author participated in the development of the method, and the algorithm that implements it, for training models in the form of systems of differential equations from data based on multicriteria optimization. In [10], the author prepared a block of methods for training models in the form of differential equations within the framework of computationally efficient construction of machine learning models based on evolutionary algorithms. For the research in article [12], the author prepared a software package and performed a series of data processing experiments for use in an algorithm for training models in the form of differential equations.
Publications. The main results on the topic of the dissertation are presented in 11 publications, of which 7 are in conference abstracts [1-7] and 4 are in journals indexed in Scopus [8-11].
CONTENT
The second mode of operation involves obtaining new equations from data or determining the parameters of already known physical laws (the inverse problem), including unknown real-valued coefficients.
The second chapter is devoted to the developed method for describing dynamical systems using data-driven models in the form of differential equations: ordinary differential equations in the case of one-dimensional data (time series), or partial differential equations when operating on multidimensional space-time data. For the system under study, it is assumed that the unknown dynamics in the region Ω is determined in analytical form by a certain relation (22) containing derivatives of the modeled dependent variable u with respect to the time variable t and with respect to the spatial variables x₁, …, x_k that make up the vector x. We can generalize the statement by assuming that the order of the partial derivatives is unknown. In what follows, we consider that derivatives of high orders are determined with a significant error, so the developed approach is best applicable to differential equations of order no higher than 3. By Gu we mean the operator of initial/boundary conditions defined on the boundary ∂Ω of the simulated region Ω over the time interval [0, T].
\begin{cases} Lu = F\left(u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \ldots, \frac{\partial u}{\partial t}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_2^2}, \ldots, \frac{\partial^2 u}{\partial t^2}, \ldots\right) = 0; \\ Gu = 0, \; u \in \partial(\Omega \times [0, T]); \end{cases} \quad (22)
According to this approach, the measurements of system states obtained as input data at points (x, y) ∈ Ω represent a particular solution of an unknown differential equation. Thus, the task is not to build a model of a separate phenomenon reflecting a specific manifestation of the system, but to construct an interpretable generalizing model.
Another requirement for the developed approach is flexibility with respect to the types of equations that can be determined: approaches based on the LASSO operator are designed to obtain an approximation of the time dynamics using combinations of specified functions and partial derivatives of the modeled variable. Such conditions significantly limit the class of equations that can be determined: among second-order equations, for example, they can only determine parabolic equations of the form u'_t = F(x, t, u, u'_x, …). Neither elliptic nor hyperbolic equations can be expressed as such dependencies, which significantly limits the applicability of the approach for describing systems in a steady state, when u(x, t) = u(x) and, accordingly, 0 = u'_t = u''_tt = ….
To construct differential equations within the proposed approach, an evolutionary optimization algorithm is used to build the structure of the desired equation. The generative algorithm represents terms as products of elements, the tokens a_ij, whose types are determined by a priori assumptions about the data. Tokens that describe arbitrary functions of the independent variables may carry parameters: a_ij(t, x) = a_ij(p_1, …, p_N, t, x). For tokens belonging to the set of derivatives of the modeled function {u, ∂u/∂x₁, ∂u/∂x₂, …, ∂u/∂t, ∂²u/∂x₁², …}, the notation t_j is introduced. To avoid physically unfounded structures among candidate solutions of the optimization problem, the admissible forms of equations considered within the algorithm are limited to linear combinations of terms composed of such elementary functions, as presented in formula (23). If the equation contains parametric tokens, for example a polynomial function of the coordinates, the problem is extended to the search for optimal parameter values. To specify an inhomogeneity of the form f(x, t) ≡ const, the free term b_bias ∈ R is used.
Obviously, with such a formulation of the problem of constructing differential equations, the researcher must determine in advance the set of tokens from which the differential equations will be composed. The problem of extending the method to cases where not all elementary functions can be determined a priori can be solved either using a neural network approximation or using closed-form algebraic expressions.
L'u = \sum_i a_i(t, x) c_i + b_{bias} = 0, \quad c_i = \prod_j t_j, \quad a_i(t, x) = a_i'(t, x) \cdot b_i, \; b_i \in \mathbb{R} \quad (23)
Before executing the main part of the evolutionary algorithm, a data preparation procedure is launched. The main task of this initial stage of the equation definition process is to compute the derivative tensors that represent the corresponding tokens in the structure of the differential equation. In some cases, not only the value of the modeled quantity but also its derivatives can be measured in the system, which makes such a data preparation procedure redundant. In the general case, however, we assume that only measurements of the state of the system are available, and the derivatives must be calculated from them.
Since the calculation of derivatives from function values on a grid is a noise-unstable operation, while measurements of the state of real physical systems are often characterized by significant errors, a necessary component of the procedure is a smoothing operation and differentiation methods adapted to the conditions of the problem. If the values of the derivatives are not determined accurately, the algorithm may fail to find the correct equation governing the process: at low noise levels this effect manifests itself as deviations of the coefficient values from the expected ones, while under strong noise the evolutionary algorithm converges to equations with incorrect sets of terms.
Four alternative approaches were considered as tools for the numerical differentiation of the input data (a minimal comparison sketch is given after the list):
– finite-difference schemes;
– approximation of the function on an interval containing the point of differentiation by a polynomial (Chebyshev polynomials) and analytical calculation of the derivative (Savitzky-Golay filtering);
– spectral methods for calculating derivatives;
– automatic differentiation of a neural network used to approximate the data.
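The differences between these alternatives can be illustrated on a noisy one-dimensional signal. The sketch below is illustrative only: the signal, window length, and filter settings are assumptions, not the configuration used in the dissertation.

```python
# Compare three numerical differentiation approaches on a noisy 1D signal.
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
dx = x[1] - x[0]
u = np.sin(x) + np.random.normal(scale=0.01, size=x.size)  # noisy samples of sin(x)

# 1) Finite differences (second-order central scheme).
du_fd = np.gradient(u, dx)

# 2) Savitzky-Golay: local polynomial fit, differentiated analytically.
du_sg = savgol_filter(u, window_length=31, polyorder=4, deriv=1, delta=dx)

# 3) Spectral derivative: multiply Fourier coefficients by i*omega.
omega = 2 * np.pi * np.fft.fftfreq(x.size, d=dx)
du_sp = np.real(np.fft.ifft(1j * omega * np.fft.fft(u)))

for name, du in [("finite diff", du_fd), ("savgol", du_sg), ("spectral", du_sp)]:
    print(name, np.max(np.abs(du - np.cos(x))))  # error vs. the exact derivative
```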
Noise reduction in the input data is performed using kernel smoothing. Although applying such methods violates the structure of the data, the smoothness assumption does not contradict the expected structure of data obtained from measurements of physical (primarily hydrometeorological) systems. The smoothing operation is applied to each time slice t as the convolution (24) with the Gaussian kernel (25). In these relations, s is the point for which smoothing is carried out, s' is a point used for smoothing, and σ is the parameter of the Gaussian kernel.
\tilde{u}(s, t) = \int K_\sigma(s - s') u(s') \, ds' \quad (24)

K_\sigma(s' - s) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{2} (s - s')_i^2\right) \quad (25)
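A minimal sketch of this per-slice smoothing, assuming SciPy's separable Gaussian convolution as the kernel implementation and an arbitrary value of σ:

```python
# Per-time-slice Gaussian kernel smoothing in the spirit of Eqs. (24)-(25).
import numpy as np
from scipy.ndimage import gaussian_filter

u = np.random.rand(100, 64, 64)   # hypothetical field, shape (t, x, y)
sigma = 2.0                        # kernel parameter sigma from Eq. (25)

u_smoothed = np.empty_like(u)
for t in range(u.shape[0]):
    # smooth every spatial slice independently, leaving the time axis untouched
    u_smoothed[t] = gaussian_filter(u[t], sigma=sigma)
```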
\tilde{u}(x, t) = f^{(m)} \circ f^{(m-1)} \circ \ldots \circ f^{(1)}(x_1, \ldots, x_n, t) \quad (26)
Table 6 — Noise level (%) in the original function and its derivatives on noisy and smoothed data.

                           | Time, s | u    | ∂u/∂t | ∂²u/∂t²
Input data                 | —       | 0.12 | 60.55 | 1158.97
Smoothed (polynomials)     | 0.13    | 5.56 | 36.1  | 154.0
Smoothed (ANN, 10⁴ epochs) | 54.5    | 4.48 | 24.04 | 94.6
The results of the comparative experiment show that preprocessing based on neural networks yields derivative fields closer to the expected ones, but requires significantly greater computational resources.
For practical applications of the algorithm, it is recommended to use a differentiation procedure that represents the data with a neural network and then applies finite-difference schemes to calculate the derivatives. The intermediate neural network converts the grid function corresponding to the data into a continuous and smooth function on the search domain, essentially representing, in parametric form, a particular solution of the desired differential equation. This allows the use of arbitrarily small steps between points in the finite-difference schemes. Since representing data with neural networks is computationally expensive, in cases where a significant noise level in the input data is not expected, it is recommended to use analytical differentiation of the Chebyshev polynomials representing the data.
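The recommended pipeline can be sketched as follows: fit a neural network surrogate to the grid data, then differentiate it with a central finite-difference scheme using a step much smaller than the original grid step. The network size, training settings, and step size below are assumptions for illustration.

```python
# ANN surrogate + finite differences on the smooth representation of the data.
import numpy as np
from sklearn.neural_network import MLPRegressor

# hypothetical data: u sampled on a coarse (x, t) grid
x, t = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
u = np.sin(np.pi * x) * np.exp(-t)
X = np.column_stack([x.ravel(), t.ravel()])

net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="tanh", max_iter=5000)
net.fit(X, u.ravel())

def du_dx(points, h=1e-3):
    """Central difference on the smooth surrogate, with step h << grid step."""
    plus, minus = points.copy(), points.copy()
    plus[:, 0] += h
    minus[:, 0] -= h
    return (net.predict(plus) - net.predict(minus)) / (2 * h)

print(du_dx(X[:5]))
```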
Further, the dissertation discusses the details of the implemented evolutionary algorithm. The first aspect is the selection of an optimal encoding for candidate differential equations. In [2], a graph representation of equations was proposed that respects the assumptions about the structure of the equations. This approach allows the use of graph optimization operators: to match the equation structure proposed in formula (23), a tree graph is used as the encoding. In it, leaf nodes contain individual tokens, intermediate nodes contain the multiplication operator that combines tokens into terms, and the root node contains the operator that sums the resulting terms.
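This encoding can be illustrated with a small sketch; the class names below are illustrative stand-ins, not the actual classes of the EPDE framework.

```python
# Tree encoding of a candidate equation: leaves are tokens, the intermediate
# level multiplies tokens into terms, the root sums the weighted terms.
from dataclasses import dataclass, field

@dataclass
class TokenNode:          # leaf: an elementary factor, e.g. "du/dx"
    name: str

@dataclass
class TermNode:           # product of tokens, with a real coefficient
    tokens: list
    coeff: float = 1.0

@dataclass
class EquationNode:       # root: sum of terms = 0
    terms: list = field(default_factory=list)

    def __str__(self):
        return " + ".join(
            f"{t.coeff:g}*" + "*".join(tok.name for tok in t.tokens)
            for t in self.terms
        ) + " = 0"

# candidate encoding of u_t - a * u_xx = 0
eq = EquationNode([
    TermNode([TokenNode("du/dt")]),
    TermNode([TokenNode("d2u/dx2")], coeff=-1.0),
])
print(eq)
```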
Lu = \sum_i a_i(t, x) b_i c_i \to \min_{a_i^{*}; c_i} \quad (29)

|\tilde{u}(t, x) - u(t, x)| \to \min_{a_i^{*}; c_i} : L'\tilde{u} = 0 \quad (30)
The fitness function when solving the problem of minimizing the discrepancy
of the differential operator is introduced by the relation (31).
f_{fitness} = (\|L\|_2)^{-1} = \left(\left\|\sum_{i \neq i_{lhs}} a_i^{*}(t, x) b_i c_i - a_{i_{lhs}}^{*}(t, x) c_{i_{lhs}}\right\|_2\right)^{-1} \quad (31)
The coefficients are computed with the Least Absolute Shrinkage and Selection Operator (LASSO). Unlike other types of regression, LASSO can reduce the number of non-zero elements of the coefficient vector by assigning zero values to the predictors that are not significant in approximating the target variable.
The functional minimized by the LASSO operator (33) is composed of two terms. The first corresponds to the squared error between the target variable, denoted F_target (the values of the term chosen as the right-hand side of the equation), and the vector of predictions obtained as the product of the feature matrix F and the weight vector α; the second is the L₁-norm of the weight vector, taken with the sparsity constant λ:

\|F\alpha - F_{target}\|_2^2 + \lambda \|\alpha\|_1 \to \min_{\alpha} \quad (33)
The main disadvantage of the LASSO operator is its inability to produce correct coefficient values, since its application requires standardized data. An additional linear regression on the detected active terms (those with non-zero intermediate weights) is therefore performed to obtain the final coefficients of the equation. The real-valued inhomogeneity b_bias introduced in relation (23) is obtained as the bias of this linear regression.
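This two-stage coefficient recovery can be sketched as follows; the sparsity value and the synthetic term values are assumptions for illustration.

```python
# LASSO on standardized features selects the active terms; an ordinary linear
# regression on those terms restores the actual coefficients and the bias b_bias.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

F = np.random.rand(1000, 5)                        # hypothetical term values
F_target = 0.1 * F[:, 2] - 1.0 * F[:, 4] + 0.05    # "right-hand side" term + bias

selector = Lasso(alpha=1e-2)     # alpha plays the role of the sparsity constant
selector.fit(StandardScaler().fit_transform(F), F_target)
active = np.flatnonzero(selector.coef_)            # indices of active terms

refit = LinearRegression().fit(F[:, active], F_target)
print("active terms:", active)
print("coefficients:", refit.coef_, "b_bias:", refit.intercept_)
```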
By the convergence of the evolutionary algorithm we will understand, in contrast to the general case, not the filling of the population with identical (possibly non-optimal) candidate equations, as is often assumed in the theory of evolutionary optimization, but the appearance in the population of an individual corresponding to the differential equation with minimal error in modeling the system. The question of convergence has been studied in detail for genetic algorithms with chromosomes of finite length; we can generalize the encoding of an individual to a sequence of binary values of length N equal to the number of all possible terms in the equation, where 1 corresponds to the presence of a term in the equation and 0 to its absence. As t → ∞ the probability of obtaining the optimal equation tends to 1, but this does not give accurate estimates of the algorithm's running time; we can only speak about the probabilities of obtaining the required candidates. Further analysis is carried out in terms of a Markov chain whose states are the populations created by the algorithm: the probability of obtaining a population containing the optimal equation is considered. Once obtained, the optimal equation cannot be lost during evolution: due to the elitism operator such a candidate cannot be changed, and the minimal value of the error functional guarantees that it will not be removed when the population size is limited.
The possibility of obtaining the optimal differential equation is ensured by the combination of the exploration and exploitation abilities of the algorithm (the exploration/exploitation trade-off): the exploration ability is provided by the main mutation operator, while the exploitation ability (refinement of the structure of an individual with a low value of the minimized functional) is provided by the mutation operator that replaces inactive terms of the equation, and by the crossover operator.
The third chapter is devoted to the details of using multiobjective
optimization for problems of constructing models of dynamic systems in the form of
systems of differential equations.
Existing solutions usually treat systems of ordinary or partial differential equations in vector form, that is, methods for finding a single equation are applied to vector variables, or they construct each equation of the system independently, sequentially approximating the time dynamics of each dependent variable. This approach limits the type and shape of the resulting systems and cannot match many real-world systems.
Let us consider a system of k dependent variables u = (u₁, …, u_k), for which we obtain a system of k equations of the form:

S(\bar{u}) = \begin{cases} L_1(u) = 0 \\ \ldots \\ L_k(u) = 0 \end{cases} \quad (34)
In equation (34), each operator Lᵢ ∈ Eq represents a differential equation of the system reducible to the structure of relation (23), where Eq is the set of all possible equations obtainable by the algorithm. Since the relations in (34) form a system, all equations are assumed to hold simultaneously. As in the evolutionary definition of a single equation, the optimization problem is posed as minimizing either the operator discrepancy or the discrepancy between the solution of the proposed equation and the input data.
The multiobjective formulation of the problem makes it possible to tailor the discovered system to the preferences of the researcher. For example, in some applications the accuracy of data reproduction is less important than the complexity of the equation, or a high degree of data noise is expected and it is necessary to isolate the part of the structure describing the main dynamics of the system. For other processes the emphasis is on the quality of the prediction based on the solution of the differential equation, and the interpretability of the model is less important. We define the first group of criteria as "quality metrics". For a given equation L, such a quality metric is the data reproduction norm, represented in the form (35).
Q(L_j) = \sum_{i=1}^{M} \|L_j(\tilde{u}_i)\| \quad (35)
The possible stochastic nature of the processes or the noise present in the measurements limits the achievable quality. Therefore, a test run of the equation search algorithm can be conducted to obtain an approximately best-quality solution. Next, to begin the evolutionary search, a population of solutions is generated by finding systems with random sparsity constants, and the search space is divided into sections based on weight vectors. Using the weight mechanism, the algorithm preserves diversity in the population and distributes candidate solutions across the Pareto set.
In the evolutionary multiobjective optimization algorithm, traditional variation operators are used for the values of the metaparameters that determine the structure of the equations: mutation and recombination (crossover). The mutation operator changes a gene containing an equation construction parameter by an increment drawn from the normal distribution N(0, σ) with a predetermined probability p_mut ∈ (0, 1), as in Eq. (37).
(\lambda_1^1, \lambda_2^1, \ldots, \lambda_{n\_eq}^1) \to (\lambda_1^{'1}, \lambda_2^{'1}, \ldots, \lambda_{n\_eq}^{'1}), \quad (\lambda_1^2, \lambda_2^2, \ldots, \lambda_{n\_eq}^2) \to (\lambda_1^{'2}, \lambda_2^{'2}, \ldots, \lambda_{n\_eq}^{'2})

p_i \sim U(0, 1) \quad (38)

\text{if } p_i < p_{xover}: \; \lambda_i^{'1} = \alpha \cdot \lambda_i^1 + (1 - \alpha) \cdot \lambda_i^2, \; \lambda_i^{'2} = \alpha \cdot \lambda_i^2 + (1 - \alpha) \cdot \lambda_i^1; \quad \text{else: } \lambda_i^{'1} = \lambda_i^1, \; \lambda_i^{'2} = \lambda_i^2
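The blend crossover of Eq. (38) can be sketched as follows; the values of α and p_xover are assumptions, not the algorithm's fixed settings.

```python
# Per-gene arithmetic blend of two parents' sparsity-constant vectors, Eq. (38).
import numpy as np

def crossover(lam1, lam2, p_xover=0.5, alpha=0.3, rng=np.random.default_rng()):
    """Blend crossover applied independently to each gene (equation index)."""
    child1, child2 = lam1.copy(), lam2.copy()
    for i in range(lam1.size):
        if rng.uniform() < p_xover:          # p_i ~ U(0, 1)
            child1[i] = alpha * lam1[i] + (1 - alpha) * lam2[i]
            child2[i] = alpha * lam2[i] + (1 - alpha) * lam1[i]
    return child1, child2

c1, c2 = crossover(np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6]))
print(c1, c2)
```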
The selection of parents for crossover is carried out for each region of the objective function space defined by the weight vectors. With a given probability, to maintain diversity of parental choice, an individual can be selected outside the considered region.
It can be noted that when processing noisy data, the single-criterion algorithm has problems determining correct equations of complex structure. Improving the quality of differential equation discovery using the multiobjective evolutionary algorithm is discussed in the next chapter.
Particular attention in the study was given to testing the approach on equations that describe real systems. An experiment was carried out to determine the equation describing the temperature dynamics in the medium around a heater:
\alpha \frac{1}{r}\frac{\partial u}{\partial r} + \alpha \frac{\partial^2 u}{\partial r^2} = \frac{\partial u}{\partial t} \quad (39)
Table 8 — Obtained coefficients of the terms of the heat conduction equation in cylindrical coordinates. The coefficients are normalized so that the coefficient of the term with ∂u/∂t equals one; C corresponds to the free term.

Noise level | (1/r)·∂u/∂r       | ∂²u/∂r²           | ∂u/∂t | C
0           | (1.5 ± ε)·10⁻⁷    | (1.54 ± ε)·10⁻⁷   | 1     | ε
0.1         | (1.51 ± ε)·10⁻⁷   | (1.53 ± ε)·10⁻⁷   | 1     | ε
0.3         | (1.4 ± 0.3)·10⁻⁷  | (1.5 ± 0.21)·10⁻⁷ | 1     | 0.0023 ± 0.005
0.5         | (1.45 ± 0.5)·10⁻⁷ | (1.5 ± 0.21)·10⁻⁷ | 1     | 0.05 ± 0.026
0.7         | (1.4 ± 0.7)·10⁻⁷  | (1.3 ± 0.4)·10⁻⁷  | 1     | 0.1 ± 0.053
1           | (1.3 ± 0.3)·10⁻⁷  | (1.1 ± 0.7)·10⁻⁷  | 1     | 0.3 ± 0.1
(9.4 \pm 0.11) \cdot 10^{-8} \, \frac{1}{r}\frac{\partial u}{\partial r} + (9.423 \pm 0.04) \cdot 10^{-8} \, \frac{\partial^2 u}{\partial r^2} + (\epsilon \pm 0.01) \cdot 10^{-8} = \frac{\partial u}{\partial t} \quad (40)
Compared with the encoding of an individual for the search of a single differential equation, the chromosome for the system search problem, in addition to the structures of the individual equations, includes parameters that determine the behavior of the equation graph generation algorithm. The ability of the algorithm to evaluate candidate equations not only by the quality of reproduction of the physical process but also by the complexity of their structure makes it possible to increase the diversity of the population during evolution. This can be illustrated by the fact that simple equations with a complexity of "2 active tokens", which do not fully describe the dynamics of the process but represent only part of it, remain in the population and can participate in the search as the simplest meaningful models. Thus, the algorithm is able to determine relatively simple equations that do not fit the noise component of the data.
The results of experiments comparing the efficiency of single-objective and multi-objective searches for partial differential equations under the same computational resources are shown in Figure 6. Based on the data obtained, even in the problem of searching for a single equation, the multiobjective formulation of the optimization problem has a number of advantages, providing faster and more reliable convergence; however, it requires an expert decision to select the desired equation among the set of Pareto-optimal candidates.
Figure 6 — Comparison of the single-objective and multi-objective equation searches (panels а, б, в; fitness-related metrics on logarithmic scales for the Single Objective and Multi-Objective variants).
As part of the research, the dependence of the correctness of single differential equation discovery by the multiobjective evolutionary optimization algorithm on the noise level in the data was analyzed in comparison with the SINDy approach based on sparse regression. Synthetic data were selected representing the Van der Pol oscillator, the Lotka-Volterra system, and solutions of partial differential equations: the Burgers and Korteweg-de Vries equations. An example of experimental results for the Burgers equation is presented in Table 10.
Table 10 — Correct term inclusion statistics for the Burgers equation and the corresponding coefficients obtained by EPDE, paired with the equations obtained by SINDy at the specified noise levels (NL). P is the share of runs including the term; b is the coefficient (μ ± 1.98σ). The abbreviation g.t. denotes the ground truth: u'_t = 0.1u''_xx − uu'_x.

NL, % | u'_t: P, %; b     | u''_xx: P, %; b  | uu'_x: P, %; b | SINDy result
0     | 100; 1.001 ± 0    | 100; 0.106 ± 0.0 | 100; 0.997 ± 0.0 | u'_t = 0.1u''_xx − 1.001uu'_x
1     | 90; 0.830 ± 0.218 | 60; 0.053 ± 0.002 | 10; 0.980 ± 0.0 | u'_t = 0.248u'_x − 0.292uu'_x
2.5   | 80; 0.599 ± 0.158 | 50; 0.018 ± 0.0  | 0; —            | u'_t = 0.265u'_x − 0.229uu'_x
5     | 100; 0.674 ± 0.139 | 20; 0.012 ± 0.0 | 0; —            | u'_t = 0.001uu'''_xxx − 0.825uu'_x
10    | 100; 0.674 ± 0.103 | 40; 0.004 ± 0.0 | 0; —            | u'_t = 0.133uu''_xx
When the structure is incorrectly defined, the term uu'_x is replaced with constructions from other tokens, which leads to erroneous coefficient values.
Even at such a fairly low noise threshold, the algorithm performs better than wSINDy in terms of convergence: EPDE made it possible to obtain correct equation structures in a number of runs, while the sparse-regression method loses the ability to determine the equation in experiments at noise levels of about 1%. However, the price of this improvement in the quality of equation recovery is increased computational complexity. During the evolutionary search, the time spent grows by a factor of more than 10² (on the order of minutes for the data set used) when Savitzky-Golay filtering based on Chebyshev polynomials is used for data preparation, and by approximately 10³ (up to tens of minutes) when an ANN is used.
The conclusion contains the main results of the work, which are as follows. The dissertation research proposed a solution to existing problems and contradictions in the field of training models in the form of differential equations. The method, based on evolutionary optimization, does not impose strict restrictions on the structures of the equations being determined and, accordingly, can be applied to a wider class of problems.
As a result of the dissertation research:
1. The current state of methods for obtaining the structure and coefficients of models in the form of differential equations was studied, and a hypothesis was put forward that symbolic regression with an extended library of terms can be replaced by a more flexible evolutionary algorithm;
2. A method, and an algorithm that implements it, for training a model in the form of differential equations with unknown structure and coefficients was developed based on evolutionary optimization algorithms and a method for numerically solving initial-boundary value problems using physics-informed neural networks (PINNs) to calculate the fitness function.
3. A method, and an algorithm that implements it, was developed for training models in the form of systems of ordinary and partial differential equations based on a multi-criteria evolutionary optimization algorithm with independent learning of the structure and coefficients of the model for each equation of the system, taking into account the possibility of specifying accuracy criteria relative to the complexity of the obtained structures.
Introduction
\frac{\partial^n u}{\partial t^n} = \hat{N}\left(u, \frac{\partial u}{\partial x}, \frac{\partial^2 u}{\partial x^2}, \ldots\right) \quad (1.1)
\begin{cases} u^{pred}(t_0) = u(t_0) \\ u^{pred}(t_{j+1}) = u^{pred}(t_j) + \Delta t_j \, N_\theta(u^{pred}(t_j), t_j), \quad j = 0, 1, \ldots \end{cases} \quad (1.2)
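The explicit time-stepping of Eq. (1.2) can be illustrated with a toy rollout; the right-hand side below is a hypothetical stand-in for a trained model N_θ.

```python
# Roll the learned dynamics forward from the initial state, as in Eq. (1.2).
import numpy as np

def n_theta(u, t):
    return -0.5 * u          # placeholder for the learned dynamics N_theta

t = np.linspace(0.0, 5.0, 101)
u_pred = np.empty_like(t)
u_pred[0] = 1.0              # u_pred(t_0) = u(t_0)
for j in range(t.size - 1):
    dt = t[j + 1] - t[j]
    u_pred[j + 1] = u_pred[j] + dt * n_theta(u_pred[j], t[j])
print(u_pred[-1])            # approximately exp(-0.5 * 5)
```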
F\left(u, \frac{\partial u}{\partial t}, \frac{\partial u}{\partial x_1}, \ldots\right) = 0 \quad (2.1)
For the system under study, it is assumed that the unknown dynamics in the region Ω is determined in analytical form by some relation (2.2). In addition to the differential equation itself, the necessary initial/boundary conditions are specified in accordance with the order of the equation in the dependent variables.
\begin{cases} Lu = F\left(u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \ldots, \frac{\partial u}{\partial t}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_2^2}, \ldots, \frac{\partial^2 u}{\partial t^2}, \ldots\right) = 0; \\ Gu = 0, \; u \in \partial(\Omega) \times [0, T]; \end{cases} \quad (2.2)
According to this approach, the observations of system states (x, y) ∈ Ω obtained as input data represent a particular solution of an unknown differential equation. Thus, the task is posed not as building a forecasting tool for a separate phenomenon reflecting a specific manifestation of the system, but as building an interpretable generalizing model. Another requirement for the developed approach is flexibility with respect to the types of equations that can be derived: existing approaches are designed to obtain an approximation of the time dynamics (the first time derivative) using various combinations of specified functions and partial derivatives of the modeled variable. Such conditions significantly limit the class of equations that can be determined.
L'u = \sum_{i=0}^{n\_terms} a_i(x) c_i(x) + b_{bias} = 0, \quad c_i(x) = \prod_j t_{ij}(x), \; t_{ij} \in F_{indep};
a_i(t, x) = a_i'(x) \cdot b_i, \; b_i \in \mathbb{R}, \quad a_i'(x) = \prod_j a_{ij}'(x), \; a_{ij}' \in F \quad (2.3)
\frac{dv}{dt} = \frac{v_{max} v}{v + K_m} \quad (2.4)
v\frac{dv}{dt} + K_m \frac{dv}{dt} - V_{max} v = 0 \quad (2.5)
At the initialization of the algorithm, the graphs of the initial population of candidate differential equations are generated randomly according to the following logic: each equation must contain at least one term containing a derivative, and the generation of repeated terms within an equation is avoided. For tokens containing optimizable parameters, a random set of initial values is composed within predefined intervals. For each created differential equation, one term is designated as the "left-hand side" of the equation, so that the structure takes the form \sum_{i, i \neq i_{rhs}} a_i(t, x) c_i = a_{i_{rhs}}(t, x) c_{i_{rhs}}. The term associated with the left-hand side of the differential equation must contain at least one derivative in order to avoid obtaining algebraic equations.
A separate study reflected in the dissertation was devoted to the selection of the fitness function. Training of a model in the form of a differential equation in the evolutionary algorithm is performed with the help of feature vectors composed of the token values evaluated on the grid:
f_1 = \begin{bmatrix} 1 \\ \vdots \\ 1 \\ \vdots \\ 1 \end{bmatrix}; \quad f_2 = \begin{bmatrix} u(t_0, x_0) \\ \vdots \\ u(t_i, x_j) \\ \vdots \\ u(t_m, x_n) \end{bmatrix}; \quad f_3 = \begin{bmatrix} u_x(t_0, x_0) \\ \vdots \\ u_x(t_i, x_j) \\ \vdots \\ u_x(t_m, x_n) \end{bmatrix}; \; \ldots \quad (2.6)

F_k = \begin{bmatrix} u(t_0, x_0) \cdot u_x(t_0, x_0) \\ \vdots \\ u(t_i, x_j) \cdot u_x(t_i, x_j) \\ \vdots \\ u(t_m, x_n) \cdot u_x(t_m, x_n) \end{bmatrix} = f_2 \odot f_3 \quad (2.7)
The functional minimized in the LASSO regression (2.8) takes the form of a sum of two terms. The first corresponds to the squared error between the target variable vector, denoted F_target, and the prediction vector obtained as the product of the feature matrix F and the weight vector α; the second is the L₁-norm of the weight vector, taken with a positive sparsity constant λ, which regularizes the system:

\|F\alpha - F_{target}\|_2^2 + \lambda \|\alpha\|_1 \to \min_{\alpha} \quad (2.8)
L = \sum_i a_i^{*}(t, x) b_i c_i \to \min_{a_i^{*}; c_i} \quad (2.9)

|\tilde{u}(t, x) - u(t, x)| \to \min_{a_i^{*}; c_i} : L'\tilde{u} = 0 \quad (2.10)
f_{fitness} = (\|L\|_2)^{-1} = \left(\left\|\sum_{i \neq i_{rhs}} (a_i^{*}(t, x) b_i c_i) - a_{i_{rhs}}^{*}(t, x) c_{i_{rhs}}\right\|_2\right)^{-1} \quad (2.11)
(\|L\tilde{u}(t, x) - f\|_i + \|b\tilde{u}(t, x) - g\|_j) \to \min_{\tilde{u}} \quad (2.13)
\frac{\partial \tilde{u}(t, x)}{\partial t} = \frac{\tilde{u}(t + \Delta t, x) - \tilde{u}(t - \Delta t, x)}{2\Delta t} \quad (2.14)
In the problem posed, we need to find the function ũ(t, x) corresponding to the minimal value of the functional in relation (2.13). For this purpose, a parameterized function ũ(t, x, Θ): R^{k+1} → R is used, where Θ = (θ₁, …, θ_{n_params}) is the vector of parameters, for which a specific parameterization is chosen.
(\|L\tilde{u}(t, x, \Theta) - f\|_i + \|b\tilde{u}(t, x, \Theta) - g\|) \to \min_{\Theta} \quad (2.15)
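A compact PINN-style sketch of minimizing the functional (2.15): a small network ũ(t, x; Θ) is trained so that both the operator residual Lũ − f and the boundary residual bũ − g vanish. The architecture, loss weights, and the heat-equation operator below are assumptions for illustration, not the dissertation's exact configuration.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def residual(tx):
    # L*u - f for an assumed operator u_t - 0.1*u_xx with f = 0
    tx = tx.clone().requires_grad_(True)
    u = net(tx)
    grads = torch.autograd.grad(u.sum(), tx, create_graph=True)[0]
    u_t, u_x = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), tx, create_graph=True)[0][:, 1:2]
    return u_t - 0.1 * u_xx

tx_interior = torch.rand(256, 2)                                      # collocation points
tx_boundary = torch.cat([torch.rand(64, 1), torch.zeros(64, 1)], dim=1)  # (t, x=0)
g = torch.zeros(64, 1)                                                # boundary values

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = residual(tx_interior).pow(2).mean() + (net(tx_boundary) - g).pow(2).mean()
    loss.backward()
    opt.step()
```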
\left\| u'_{x_i} - \frac{\Delta_{\sigma,i} u}{2\sigma_i} \right\|_p \leq \left\| \frac{\Delta_{\sigma,i}(u - \bar{u})}{2\sigma_i} \right\|_p + \left\| u'_{x_i} - \frac{\Delta_{\sigma,i}\bar{u}}{2\sigma_i} \right\|_p + \frac{hC}{2} \quad (2.18)
K_\sigma(s' - s) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{2} (s - s')_i^2\right) \quad (2.20)
Savitzky-Golay filter
\tilde{u}(x) = \sum_{i=0}^{M} b_i(x) P_i(x) \quad (2.21)

b = \arg\min_{b'} \sum_{j=1}^{N} \left( u(x_j) - \sum_{i=0}^{M} b_i' P_i(x_j) \right)^2 \quad (2.22)
T_m(x) = \sum_{k=0}^{\lfloor m/2 \rfloor} C_m^{2k} (x^2 - 1)^k x^{m-2k} \quad (2.23)

U_m(x) = \sum_{k=0}^{\lfloor m/2 \rfloor} C_{m+1}^{2k+1} (x^2 - 1)^k x^{m-2k} \quad (2.24)
\tilde{u}(x, t) = f^{(m)} \circ f^{(m-1)} \circ \ldots \circ f^{(1)}(x_1, \ldots, x_n, t) \quad (2.25)
                           | Time, s | u    | ∂u/∂t | ∂²u/∂t²
Input data                 | —       | 0.12 | 60.55 | 1158.97
Smoothed (polynomials)     | 0.13    | 5.56 | 36.1  | 154.0
Smoothed (ANN, 10⁴ epochs) | 54.5    | 4.48 | 24.04 | 94.6
Spectral derivative
\hat{u}_k = \frac{1}{N} \sum_{n=0}^{N-1} u_n \exp\left(-2\pi i \frac{nk}{N}\right) \quad (2.27)

u_n = \sum_{k=0}^{N-1} \hat{u}_k \exp\left(2\pi i \frac{nk}{N}\right) \quad (2.28)
u'(t_n) = \frac{2\pi i}{T} \sum_{0 < k < \frac{N}{2}} k \left( \hat{u}_k \exp\left(2\pi i \frac{nk}{N}\right) - \hat{u}_{N-k} \exp\left(-2\pi i \frac{nk}{N}\right) \right) \quad (2.29)
G(\omega) = \frac{1}{1 + (\omega / \omega_{cutoff})^{2s}} \quad (2.30)

where ω is the frequency, ω_cutoff is the cutoff frequency at which the attenuation begins, and s is the filter steepness parameter. The final expression is obtained by introducing the penalty factors G(ω) = G(k/N) into the series representing the derivative:
u'(t_n) = \frac{2\pi i}{T} \sum_{0 < k < \frac{N}{2}} G(k/N) \, k \left( \hat{u}_k \exp\left(2\pi i \frac{nk}{N}\right) - \hat{u}_{N-k} \exp\left(-2\pi i \frac{nk}{N}\right) \right) \quad (2.31)
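The filtered spectral derivative of Eqs. (2.27)-(2.31) can be sketched as follows; the cutoff frequency and steepness values are assumptions.

```python
# Differentiate in Fourier space and damp high frequencies with the penalty G.
import numpy as np

def spectral_derivative(u, dt, omega_cutoff=20.0, s=4):
    n = u.size
    u_hat = np.fft.fft(u)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)           # angular frequencies
    g = 1.0 / (1.0 + (omega / omega_cutoff) ** (2 * s))   # low-pass penalty G(omega)
    return np.real(np.fft.ifft(1j * omega * g * u_hat))

t = np.linspace(0, 2 * np.pi, 512, endpoint=False)
u = np.sin(3 * t) + 0.01 * np.random.randn(t.size)
print(np.max(np.abs(spectral_derivative(u, t[1] - t[0]) - 3 * np.cos(3 * t))))
```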
Token products that differ only in the order of their factors, for example s₁: |∂u/∂t · ∂²u/∂x²| and s₂: |∂²u/∂x² · ∂u/∂t|, are equivalent: the regularization operator converts them into the sought canonical form.
Let us define the population of the evolutionary algorithm as x, consisting of individuals x = {x₁, x₂, …, x_{n_pop} | xᵢ ∈ S}, x ∈ X, where X is the set of all possible populations.
P(x, y) = \sum_{u \in X} P_c(x, u) P_m(u, y) \quad (2.33)
Figure — fitness function value over the epochs of the evolutionary search (epochs 0-120).
\bar{S} = \arg\min_{S \in Eq^k} \sum_{i=1}^{M} \|S(\tilde{u}_i)\| \quad (3.4)
C(L'u) = \sum_j \sum_i compl(t_{ij}); \quad compl(t_{ij}) = \begin{cases} n, & \text{if } t_{ij} = \frac{\partial^n u}{\partial^{n_1} x_1 \ldots \partial^{n_k} x_{dim}}, \; n \geq 1 \\ 0.5, & \text{otherwise} \end{cases} \quad (3.5)
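The two criteria can be sketched directly from their definitions: the complexity C of Eq. (3.5), where each derivative token contributes its order n and any other token contributes 0.5, and the quality Q of Eq. (3.4) as a norm of the residuals over the samples. The token and equation structures below are illustrative stand-ins.

```python
import numpy as np

def token_complexity(token):
    # token is a dict like {"deriv_order": 2}; order 0 means a non-derivative token
    n = token.get("deriv_order", 0)
    return n if n >= 1 else 0.5

def system_complexity(system):
    # system: list of equations; equation: list of terms; term: list of tokens
    return sum(
        token_complexity(tok)
        for equation in system
        for term in equation
        for tok in term
    )

def quality(residuals):
    # residuals: array of L_j(u_i) values over the M data samples
    return np.linalg.norm(residuals, ord=1)

burgers = [[[{"deriv_order": 1}],                       # u_t
            [{"deriv_order": 0}, {"deriv_order": 1}],   # u * u_x
            [{"deriv_order": 2}]]]                      # u_xx
print(system_complexity(burgers))   # 1 + (0.5 + 1) + 2 = 4.5
```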
The introduced criteria C(L'u) and Q(L'u) for each of the k equations of the system form the optimization space. Since the application assumes an evolutionary multiobjective optimization algorithm, a dominance relation between candidate solutions of the optimization problem is introduced. Let us introduce a binary relation on the set of systems created by the algorithm that makes it possible to determine the preferability of one solution over another. We say that a candidate solution, a system of differential equations S₁(u), Pareto-dominates a solution S₂(u) (denoted S₁(u) ≺ S₂(u)) if for every equation index i of the system Qᵢ(S₁(u)) ≤ Qᵢ(S₂(u)) and Cᵢ(S₁(u)) ≤ Cᵢ(S₂(u)), and there exists an index j for which Qⱼ(S₁(u)) < Qⱼ(S₂(u)) and/or Cⱼ(S₁(u)) < Cⱼ(S₂(u)). Such a dominance relation can be interpreted as the fact that every equation of the system S₁(u) simultaneously describes the dynamical system no worse and is represented by a simpler structure.
Obviously, such a relation introduces only a partial order: for S₁(u) and S₂(u) such that there exists a set of criteria indices I₁ on which system S₁ is preferable and a set I₂ on which preference is given to system S₂, one cannot say that one candidate solution dominates the other. A set of candidate solutions is called non-dominated if for any two systems of differential equations from this set neither is dominated by the other.
A candidate solution S₀(u) is called (Pareto-)optimal if there exist no other solutions S'(u) for which S'(u) ≺ S₀(u). The goal of the algorithm is to determine the Pareto-optimal non-dominated set of candidate systems of differential equations: ∀S'(u) ∃Sᵢ(u): Sᵢ(u) ⪯ S'(u).
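The dominance check described above can be sketched as a simple predicate over the criteria vectors; the criteria tuples below are illustrative.

```python
# Pareto dominance: no worse in every criterion, strictly better in at least one.
def dominates(criteria1, criteria2):
    """criteria*: flat tuples (Q_1, C_1, ..., Q_k, C_k) of a candidate system."""
    no_worse = all(a <= b for a, b in zip(criteria1, criteria2))
    strictly_better = any(a < b for a, b in zip(criteria1, criteria2))
    return no_worse and strictly_better

def non_dominated(candidates):
    """Keep candidates not dominated by any other (the Pareto set estimate)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]

front = non_dominated([(1.0, 4.5), (0.8, 6.0), (1.2, 7.0)])
print(front)   # -> [(1.0, 4.5), (0.8, 6.0)]; (1.2, 7.0) is dominated by (1.0, 4.5)
```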
(\lambda_1^1, \lambda_2^1, \ldots, \lambda_{n\_eq}^1) \to (\lambda_1^{'1}, \lambda_2^{'1}, \ldots, \lambda_{n\_eq}^{'1}), \quad (\lambda_1^2, \lambda_2^2, \ldots, \lambda_{n\_eq}^2) \to (\lambda_1^{'2}, \lambda_2^{'2}, \ldots, \lambda_{n\_eq}^{'2})

p_i \sim U(0, 1) \quad (3.8)

\text{if } p_i < p_{xover}: \; \lambda_i^{'1} = \alpha \cdot \lambda_i^1 + (1 - \alpha) \cdot \lambda_i^2, \; \lambda_i^{'2} = \alpha \cdot \lambda_i^2 + (1 - \alpha) \cdot \lambda_i^1; \quad \text{else: } \lambda_i^{'1} = \lambda_i^1, \; \lambda_i^{'2} = \lambda_i^2
x \sin t + \frac{dx}{dt} \cos t = 1 \quad (4.1)
Figure 4.2 — Prediction of the system state based on the equations obtained from data: left plot, the correct equation; right plot, the "overfitted" equation.
u = 10000 \sin^2\left(\frac{1}{100} x y \left(1 - \frac{x}{10}\right)\left(1 - \frac{y}{10}\right)\right) \quad (4.8)
\frac{\partial u}{\partial t} = 1000 \sin^2\left(\frac{1}{100} x y \left(1 - \frac{x}{10}\right)\left(1 - \frac{y}{10}\right)\right) \quad (4.9)
\frac{\partial^2 u}{\partial t^2} = \alpha_1 \frac{\partial^2 u}{\partial x^2} + \alpha_2 \frac{\partial^2 u}{\partial y^2} \quad (4.10)
The equation discovery algorithm was configured as follows: the data were preprocessed by approximation with a fully connected neural network (4 hidden layers with 256, 64, 64, and 1024 neurons, hyperbolic tangent activation, trained for 10000 epochs). The subsequent computation of the derivative tensors was performed by the finite-difference method (central scheme with a step of 0.01·Δ along the differentiation axis, where Δ is the step of the grid on which the input data were provided). The derivative orders were limited to 3 in all coordinates. The experiment used an evolutionary algorithm with the following parameters: number of equation search epochs n_epochs = 25, candidate population size n_pop = 10, probability of an equation undergoing mutation p_mut = 0.2, fraction of the population subjected to crossover n_parent = 0.4, and the probabilities of the equation terms mutating within mutating candidates and of a pair of terms being exchanged during crossover p_term_mut = p_term_crossover = 0.3. The fitness was estimated using the approach based on the discrepancy of the differential operator.
The results of the experiment are as follows: the method successfully discovers the equation structure for noise levels up to 7.5%, corresponding to the standard deviation of Gaussian noise in the interval [0, 0.2] multiplied by the norm of the field over the time interval. The weight errors in this interval are insignificant, as shown in Tab. 13. At higher noise levels (between 7.5% and 10%), the algorithm discovers extra terms absent from the original equation, which distorts both the equation structure and the computed weights. Finally, at high noise levels (above 10%), the proposed algorithm loses the ability to determine even elements of the desired equation structure, converging to structures that describe the noise in the data.
\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} = 0 \quad (4.11)
\frac{\partial u}{\partial t} + 6u \frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0 \quad (4.12)
u = \frac{c}{2} \, \mathrm{sech}^2\left(\frac{\sqrt{c}}{2}(x - ct - x_0)\right) \quad (4.13)
Applying the approach to the solution of equation (4.13), evaluated on a regular grid, did not allow recovering the original equation from the data. The incorrectly discovered model arises from simpler incidental forms of the data, such as u_t = c u_x, which also have low values of the equation error. With the single-criterion approach, the lower complexity of this equation makes its discovery more likely than that of the full KdV equation. In addition, the absence of higher-order derivatives in the structure, which are computed with larger numerical error than first- and second-order derivatives, can lead to lower values of the error functional than for the correct equation. This experiment shows that the single-criterion algorithm has a tendency to converge to such oversimplified structures.
\alpha \frac{1}{r}\frac{\partial u}{\partial r} + \alpha \frac{\partial^2 u}{\partial r^2} = \frac{\partial u}{\partial t} \quad (4.14)
Table 15 — Obtained coefficients of the terms of the heat conduction equation in cylindrical coordinates. The coefficients are normalized so that the coefficient of the term with ∂u/∂t equals one; C corresponds to the free term.

Noise level | (1/r)·∂u/∂r       | ∂²u/∂r²           | ∂u/∂t | C
0           | (1.5 ± ε)·10⁻⁷    | (1.54 ± ε)·10⁻⁷   | 1     | ε
0.1         | (1.51 ± ε)·10⁻⁷   | (1.53 ± ε)·10⁻⁷   | 1     | ε
0.3         | (1.4 ± 0.3)·10⁻⁷  | (1.5 ± 0.21)·10⁻⁷ | 1     | 0.0023 ± 0.005
0.5         | (1.45 ± 0.5)·10⁻⁷ | (1.5 ± 0.21)·10⁻⁷ | 1     | 0.05 ± 0.026
0.7         | (1.4 ± 0.7)·10⁻⁷  | (1.3 ± 0.4)·10⁻⁷  | 1     | 0.1 ± 0.053
1           | (1.3 ± 0.3)·10⁻⁷  | (1.1 ± 0.7)·10⁻⁷  | 1     | 0.3 ± 0.1
(9.4 \pm 0.11) \cdot 10^{-8} \, \frac{1}{r}\frac{\partial u}{\partial r} + (9.423 \pm 0.04) \cdot 10^{-8} \, \frac{\partial^2 u}{\partial r^2} + (\epsilon \pm 0.01) \cdot 10^{-8} = \frac{\partial u}{\partial t} \quad (4.15)
4.1 \cdot 10^{-8} \cdot \frac{1}{r}\frac{\partial u}{\partial r} + 5.8 \cdot 10^{-9} \cdot \frac{\partial^2 u}{\partial r^2} + v_2 \frac{\partial u}{\partial r} = \frac{\partial u}{\partial t} \quad (4.16)
\begin{cases} \frac{dx}{dt} = \sigma \cdot (y - x); \\ \frac{dy}{dt} = x \cdot (\rho - z) - y; \\ \frac{dz}{dt} = xy - \beta z; \end{cases} \quad (4.18)
Figure — Comparison of the single-objective and multi-objective equation searches (panels а, б, в; fitness-related metrics on logarithmic scales for the Single Objective and Multi-Objective variants).
Correct term inclusion statistics for the Korteweg-de Vries equation (EPDE), paired with the equations obtained by SINDy at the specified noise levels (NL). P is the share of runs including the term; b is the coefficient (μ ± 1.98σ); g.t. denotes the ground truth: u'_t + u'''_xxx + 6uu'_x = 0.

NL, % | u'_t: P, %; b        | uu'_x: P, %; b   | u'''_xxx: P, %; b | SINDy result
0     | 100; 1.001 ± 0.0     | 100; 6.002 ± 0.0 | 100; 1.06 ± 0.0   | u'_t + 0.992u'''_xxx + 5.967uu'_x = 0
0.5   | 80; 0.913 ± 0.032    | 60; 5.914 ± 2.59 | 70; 1.31 ± 0.57   | u'_t − 0.906u'_x = 0
1     | 40; 0.437 ± 0.156    | 0; —             | 0; —              | u'_t − 0.816u'_x = 0
2.5   | 100; 0.36 ± 0.0      | 20; 1.0 ± 0.0    | 20; 0.01 ± 0.0    | u'_t − 0.004u'''_xxx − 0.844u'_x = 0
5     | 60; 0.01 ± 2.13·10⁻⁵ | 80; 1.0 ± 0.0    | 0; —              | u'_t − 0.003u'''_xxx − 1.859uu'_x + N[u] = 0
It was shown that during training the method makes it possible to obtain the correct differential equation on input data with noise levels up to 5-10%, while competing methods lose the ability to recover the equation on data with noise levels of about 1-2%, since the accuracy of obtaining the structure falls below 50%.
Conclusion

List of abbreviations:
ДУ — differential equation
УрЧП — partial differential equation
ОДУ — ordinary differential equation
ИНС — artificial neural network
LASSO — least absolute shrinkage and selection operator
MAPE — mean absolute percentage error
MOEA/DD — evolutionary many-objective optimization algorithm based on dominance and decomposition
KdV equation — Korteweg-de Vries equation
References

20. Brunton S. L., Proctor J. L., Kutz J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems // Proceedings of the National Academy of Sciences. — 2016.
21. Loiseau J. C., Brunton S. L. Constrained sparse Galerkin regression // Journal of Fluid Mechanics. — 2018. — Vol. 838. — P. 42-67.
22. Kaiser E., Kutz J. N., Brunton S. L. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. — 2017. — URL: https://arxiv.org/abs/1711.05501.
23. Quade M. et al. Sparse Identification of Nonlinear Dynamics for Rapid Model Recovery. — 2018. — URL: https://arxiv.org/abs/1803.00894v2.
24. Hirsh S. M., Barajas-Solano D. A., Kutz J. N. Sparsifying priors for Bayesian uncertainty quantification in model discovery // Royal Society Open Science. — 2022. — Vol. 9, No. 2. — P. 211823.
25. Park J.-H., Dunson D. B. Bayesian generalized product partition model // Statistica Sinica. — 2010. — P. 1203-1226.
26. Tran G., Ward R. Exact recovery of chaotic systems from highly corrupted data // Multiscale Modeling and Simulation. — 2017. — Vol. 15. — P. 1108-1129.
27. Raissi M. Deep hidden physics models: Deep learning of nonlinear partial differential equations. — 2018. — URL: https://arxiv.org/abs/1801.06637.
28. Berg J., Nystrom K. Data-driven discovery of PDEs in complex datasets. — 2018. — URL: https://arxiv.org/abs/1808.10788.
29. Berg J., Nystrom K. Neural network augmented inverse problems for PDEs. — 2017. — URL: https://arxiv.org/abs/1712.09685.
30. Long Z. et al. PDE-Net: Learning PDEs from data // International Conference on Machine Learning. — PMLR, 2018. — P. 3208-3216.
31. Long Z., Lu Y., Dong B. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network // Journal of Computational Physics. — 2019. — Vol. 399. — P. 108925.
53. Alla A., Kutz J. N. Nonlinear model order reduction via dynamic mode decomposition. — 2016. — URL: https://arxiv.org/abs/1602.05080.
54. Schmid P. J. Dynamic mode decomposition and its variants // Annual Review of Fluid Mechanics. — 2022. — Vol. 54. — P. 225-254.
55. Zhang Z. J., Duraisamy K. Machine learning methods for data-driven turbulence modeling // 22nd AIAA Computational Fluid Dynamics Conference. — 2015. — P. 2460.
56. Zhang Z., Singh A. New Approaches in Turbulence and Transition Modeling Using Data-driven Techniques // AIAA Modeling and Simulation Technologies Conference. — 2015.
57. Tracey B., Duraisamy K., Alonso J. Machine Learning Strategy to Assist Turbulence Model Development // Proc. AIAA SciTech Conference. — 2015.
58. Parish E., Duraisamy K. Quantification of Turbulence Modeling Uncertainties Using Full Field Inversion // 15th AIAA Aviation Technology, Integration, and Operations Conference. — 2015.
59. Hvatov A. Automated differential equation solver based on the parametric approximation optimization // Mathematics. — 2023. — Vol. 11, No. 8. — P. 1787.
60. Ramm A., Smirnova A. On stable numerical differentiation // Mathematics of Computation. — 2001. — Vol. 70, No. 235. — P. 1131-1153.
61. Savitzky A., Golay M. J. Smoothing and differentiation of data by simplified least squares procedures // Analytical Chemistry. — 1964. — Vol. 36, No. 8. — P. 1627-1639.
62. Schmid M., Rath D., Diebold U. Why and how Savitzky-Golay filters should be replaced // ACS Measurement Science Au. — 2022. — Vol. 2, No. 2. — P. 185-196.
63. Johnson S. G. Notes on FFT-based differentiation // MIT Applied Mathematics, Tech. Rep. — 2011.
64. Nix A. E., Vose M. D. Modeling genetic algorithms with Markov chains // Annals of Mathematics and Artificial Intelligence. — 1992. — Vol. 5, No. 1. — P. 79-88.
Abstract
Data-driven methods provide model creation tools for systems where the application of conventional analytical methods is restrained. The proposed method involves the data-driven derivation of a partial differential equation (PDE) for process dynamics, helping process simulation and study. The paper describes the methods that are used within the EPDE (Evolutionary Partial Differential Equations) partial differential equation discovery framework [1]. The framework involves a combination of evolutionary algorithms and sparse regression. Such an approach is versatile compared to other commonly used data-driven partial differential equation derivation methods by making fewer assumptions about the resulting equation. This paper highlights the algorithm features that allow data processing with noise, which is similar to the algorithm's real-world applications. This paper is an extended version of the ICCS-2020 conference paper [2].
Keywords: data-driven modelling, PDE discovery, evolutionary algorithms, sparse regression, spatial fields, physical measurement data
1. Introduction

2. Related work
equations.
Artificial neural networks provide a more versatile tool. This method is based on approximating the time derivative with combinations of spatial derivatives and other functions. Examples of ANN applications to the problem of partial differential equation discovery were presented in [8, 21, 22, 7, 6]. While artificial neural networks can discover non-linear equations, they still rely on approximating a predetermined term (the first-order time derivative), which limits their flexibility.
3. Problem statement
The class of problems which the described EPDE algorithm can solve can be summarized as follows: a process involving a scalar field u occurs in the area Ω and is governed by the partial differential equation Eq. 1. However, there is no a priori information about the dynamics of the process except that some form of PDE can describe it (for simplicity, we consider the temporally varying 2D field case, even though the problem could be formulated for an arbitrary field). In recent developments, we have abandoned the assumption of constant weights in the partial differential equations, allowing them to be arbitrary functions (logarithmic, trigonometric) and thus expanding the class of possible systems to study.
\begin{cases} F\left(u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \ldots, \frac{\partial u}{\partial t}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_2^2}, \ldots, \frac{\partial^2 u}{\partial t^2}, \ldots, x\right) = 0; \\ G(x) = 0, \; x \in \partial(\Omega) \times [0, T]; \end{cases} \quad (1)
Figure 10. The solution of the ODE from Equation (20), its approximation by a neural network, and the derivatives calculated by analytic, polynomial, and automatic differentiation.
\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}. \quad (22)
However, that division has its downsides: smaller domains have less data; therefore, disturbances (noise) at an individual point have a higher impact on the results. Furthermore, in realistic scenarios, the risk of deriving an equation that describes only a local process increases as the domain size decreases. The Pareto front, indicating the trade-off between the equation discrepancy and the time efficiency, could be utilized to find the parsimonious setup of the experiment. On noiseless data (we assume that the derivatives are calculated without numerical error), even the data from a single point will correctly represent the equation. Therefore, the experiments must be held on data with low but significant noise levels.
We have conducted the experiments with the partition of data (Figure 11), containing 80 × 80 × 80 values, divided along the spatial axes in fractions from the set {1, ..., 10}. The experiments were held with 10 independent runs on each setup (size of the input data, i.e., the number of subdomains into which the domain was divided, and the sparsity constant, which affects the number of terms of the equation).
Figure 11. The results of the experiments on the divided domains. (a) evaluations of discovered equation quality for different division fractions along each axis (a 2× division represents division of the domain into 4 square parts); (b) domain processing time (relative to the processing of the entire domain) versus the number of subdomains.
The results of the test, presented in Figure 11, give insight into the consequences of processing the domain by parts. It can be noticed that with the split of the data into smaller portions, the quality of the equations decreases due to "overfitting" to the local noise. However, in this case, due to higher numerical errors near the boundaries of the studied domain, the base equation derived from the full data has its own errors. By dividing the area into smaller subdomains, we allow some of the equations to be trained on data with lower numerical errors and, therefore, to have higher quality. The results presented in Figure 11b are obtained only for the iterations of the evolutionary algorithm of equation discovery and do not represent the differences in time for other stages, such as preprocessing or further modeling of the process.
We can conclude that the technique of separating the domain into smaller parts and processing them individually can be beneficial both for achieving speedup via parallelization of the calculations and for avoiding equations derived from high-error zones. In this case, such errors were primarily numerical, but in realistic applications they can be attributed to faulty measurements or the prevalence of a different process in a local area.
datasets were used as benchmarks that allow analyzing the efficiency of the generative design in various situations.
To improve the performance of model building (this issue was noted in Issue 2), different approaches can be applied. First of all, caching techniques can be used. The cache can be represented as a dictionary with the topological description of the model position in the graph as a key and a fitted model as a value. Moreover, the fitted data preprocessor can be saved in the cache together with the model. The common structure of the cache is represented in Figure 12.
(The figure shows a shared cache dictionary that maps structural IDs (SID 1 … SID N) of the computationally expensive fitted models to their cached instances; the dictionary supports the methods append, clear, and get, while prediction with a cached model is fast.)
Figure 12. The structure of the multi-chain shared cache for the fitted composite models.
The results of the experiments with a different implementation of cache are described
in Figure 13.
2000
1000
0
0 1 2 3 4 5 6 7 8 9 10
Generations
Figure 13. The total number model fit requests and the actually executed fits (cache misses) for the shared and local cache.
Local cache allows reducing the number of models fits up to five times against the
non-cached variant. The effectiveness of the shared cache implementation is twice as high
as that for the local cache.
The parallelization of the composite models building, fitting, and application also
makes it possible to decrease the time devoted to the design stage. It can be achieved
in different ways. First of all, the fitting and application of the atomic ML models can
be parallelized using the features of the underlying framework (e.g., Scikit-learn, Keras,
TensorFlow, etc [43]), since the atomic models can be very complex. However, this approach
is more effective in the shared memory systems and it is hard to scale it to the distributed
environments. Moreover, not all models can be efficiently parallelized in this way.
Then, the evolutionary algorithm that builds the composite model can be paralleled
itself, since the fitness function for each individual can be calculated independently. To con-
duct the experiment, the classification benchmark based at the credit scoring problem (https:
//github.com/nccr-itmo/FEDOT/blob/master/cases/credit_scoring_problem.py) was
224
used. The parameters of the evolutionary algorithm are the same as described at the
beginning of the section.
The obtained values of the fitness function for the classification problem are presented
in Figure 14.
(a) (b)
Figure 14. (a) The best achieved fitness value for the different computational configurations (represented as different
number of parallel threads) used to evaluate the evolutionary algorithm on classification benchmark. The boxplots are build
for the 10 independent runs. (b) Pareto frontier (blue) obtained for the classification benchmark in “execution time-model
quality” subspace. The red points represent dominated individuals.
(a) (b)
Figure 15. (a) The comparison of different scenarios of evolutionary optimization: best (ideal), realistic and worst cases
(b) The conceptual dependence of the parallelization efficiency from the variance of the execution time in population for
the different types of selection.
225
The same logic can be applied for the parallel fitting of the part of composite model
graphs. It raises the problem of the importance of assessment for the structural subgraphs
and the prediction of most promising candidate models before the final evaluation of the
fitness function will be done.
Figure 16. The comparison of different approaches to the evolutionary optimization of the composite models. The min-
max intervals are built for the 10 independent runs. The green line represents the static optimization algorithm with
20 individuals in the population; the blue line represented the dynamic optimization algorithm with 10 individuals in the
population. T0 , T1 and T2 are different real-time constraints, F0 , F1 and F2 are the values of fitness functions obtained with
the corresponding constraints.
226
Algorithm 1: The simplified pseudocode of the composite models tuning algorithm illustrated in Figure 6b.
Data: maxTuningTime, tuneData, paramsRanges
Result: tunedCompositeModel
fitData, validationData = Split(tuneData)
for atomicModel in compositeModel do
candidateCompositeModel = compositeModel
while tuningTime < maxTuningTime do
bestQuality = 0
candidateAtomicModel OptFunction(atomicModel, paramsRanges) // OptFunction can be
implemented as random search, Bayesian optimization, etc.
candidateCompositeModel Update(candidateCompositeModel, candidateAtomicModel)
Fit(candidateCompositeModel, fitData)
quality = EvaluateQuality (candidateCompositeModel, validationData)
if quality > bestQuality then
bestQuality = quality
bestAtomicModel = candidateAtomicModel
end
compositeModel Update(compositeModel, bestAtomicModel)
end
end
tunedCompositeModel = compositeModel
The results of the model-supported tuning of the composite models for the different
regression problems obtained from PMLB benchmark suite (Available in the https://
github.com/EpistasisLab/pmlb) are presented in Table 1. The self-developed toolbox
that was used to run the experiments with PMLB and FEDOT is available in the open
repository (https://github.com/ITMO-NSS-team/AutoML-benchmark). The applied
tuning algorithm is based on a random search in a pre-defined range.
Table 1. The quality measures for the composite models after and before random search-based tuning of hyperparameters. The
regression problems from PMLB suite [45] are used as benchmarks.
Benchmark Name MSE without Tuning MSE with Tuning R2 without Tuning R2 with Tuning
1203_BNG_pwLinear 8.213 0.102 0.592 0.935
197_cpu_act 5.928 7.457 0.98 0.975
215_2dplanes 1.007 0.001 0.947 1
228_elusage 126.755 0.862 0.524 0.996
294_satellite_image 0.464 0.591 0.905 0.953
4544_GeographicalOriginalofMusic 0.194 2.113 0.768 0.792
523_analcatdata_neavote 0.593 0.025 0.953 0.999
560_bodyfat 0.07 0.088 0.998 0.894
561_cpu 3412.46 0.083 0.937 0.91
564_fried 1.368 0.073 0.944 0.934
227
It can be seen that the hyperparameter optimization allow increasing the quality of
the models in most cases.
where T EPM —model fitting time estimation (represented in ms according to the scale of
coefficients from Table 2), Nobs —number of observations in the sample, N f eat —number of
features in the sample. The characteristics of the computational resources and hyperparam-
eters of the model are considered as static in this case.
We applied the least squared errors (LSE) algorithm to (23) and obtained the Q
coefficients for the set of models that presented Table 2. The coefficient of determination R2
is used to evaluate the quality of obtained performance models.
The application of the evolutionary optimization to the benchmark allows finding the
optimal structure of the composite model for the specific problem. We demonstrate EPM
constructing for the composite model which consists of logistic regression and random
forest as a primary nodes and logistic regression as a secondary node. On the basis of (11),
EPM for this composite model can be represented as follows:
2 N
Nobs
N f eat
EPM
TAdd = max (Q1,1 Nobs N f eat + Q2,1 Nobs , Q11,2 Nobs N f eat + Q2,2 Nobs ) + obs + , (24)
Q21,3 Q22,3
228
where TAddEPM —composite model fitting time estimated by the additive EMP, Q , j-i coeffi-
i
cient of j model type for EPM according to the Table 2.
The performance model for the composite model with three nodes (LR + RF = LR) is
shown in Figure 17. The visualizations for the atomic models are available in Appendix A.
Figure 17. Predictions of the performance model that uses an additive approach for local empirical performance models
(EPMs) of atomic models. The red points represent the real evaluations of the composite model as a part of validation.
where t is a variable of real time and rc is a critical threshold for values of error function
E. Such a problem is typical for models that are connected with a lifecycle of their pro-
totype, e.g., models inside digital shadow for industrial system [47], weather forecasting
models [48], etc.
Additional fitting of co-designed system may appear also on the level of model
execution where classic scheduling approach may be blended with model tuning. Classic
formulation of scheduling for resource intensive applications Tex min ( L⇤ ) = min G 0 ( L| M, I )
A
is based on idea of optimization search for such algorithm L⇤ that helps to provide minimal
computation time Tex min for model execution process through balanced schedules of
workload on computation nodes. However, such approach is restricted by assumption
of uniform performance models for all parts of application. In real cases performance of
application may change dynamically in time and among functional parts. Thus, to reach
more effective execution it is desirable to formulate optimization problem with respect to
possibility of tuning model characteristics that influence on model performance:
229
⇣n o⇤ ⌘ ⇣ ⇣n o⌘ ⌘ n o
Tex max a1:|S| , L⇤ = max G M a1:|S| , L| I , M = S⇤ , E⇤ , a1:|S| , L = { L m }, (26)
a,L
where G is objective function that characterize expected time of model execution with
respect to used scheduling algorithm L and model M. In the context of generative modeling
problem on the stage of execution model M can be fully described as a set of model
properties that consists of optimal model structure: optimal functions ⇤
n S o(from previous
stage) and additional set of performance influential parameters a1:|S| . Reminiscent
approaches can be seen in several publications, e.g., [49].
7. Conclusions
In this paper, we aimed to highlight the different aspects of the creation of mathe-
matical models using automated evolutionary learning approach. Such approach may be
represented from the perspective of generative design and co-design for mathematical
models. First of all, we formalize several actual and unsolved issues that exist in the
field of generative design of mathematical models. They are devoted to different aspects:
computational complexity, performance modeling, parallelization, interaction with the
infrastructure, etc. The set of experiments was conducted as proof-of-concept solutions
for every announced issue and obstacle. The composite ML models obtained by the FE-
DOT framework and differential equation-based models obtained by the EPDE framework
were used as case studies. Finally, the common concepts of the co-design implementation
were discussed.
Author Contributions: Conceptualization, A.V.K. and A.B.; Investigation, N.O.N., A.H., M.M. and
M.Y.; Methodology, A.V.K.; Project administration, A.B.; Software, N.O.N., A.H., and M.M.; Supervi-
sion, A.B.; Validation, M.M.; Visualization, M.Y.; Writing–original draft, A.V.K., N.O.N. and A.H. All
authors have read and agreed to the final publication of the manuscript.
Funding: This research is financially supported by the Ministry of Science and Higher Education,
Agreement #075-15-2020-808.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AI Artificial intelligence
ANN Artificial neural network
AutoML Automated machine learning
DAG Directed acyclic graph
EPM Empirical performance model
GPU Graphics processing unit
ML Machine learning
MSE Mean squared error
NAS Neural architecture search
ODE Ordinary differential equation
PDE Partial differential equation
PM Performance model
R2 Coefficient of determination
RMSE Root mean square error
ROC AUC Area under receiver operating characteristic curve
230
Table A1. Approximation errors for the different empirical performance models’ structures obtained
for the atomic ML models. The best suitable structure is highlighted with bold.
The visualization of the performance models predictions for the different cases is
presented in Figure A1. It confirms that the selected EPMs allow estimating the fitting time
quite reliably.
References
1. Packard, N.; Bedau, M.A.; Channon, A.; Ikegami, T.; Rasmussen, S.; Stanley, K.; Taylor, T. Open-Ended Evolution and Open-Endedness:
Editorial Introduction to the Open-Ended Evolution I Special Issue; MIT Press: Cambridge, MA, USA, 2019.
2. Krish, S. A practical generative design method. Comput.-Aided Des. 2011, 43, 88–100. [CrossRef]
3. Ferreira, C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence; Springer: Berlin/Heidelberg, Germany,
2006; Volume 21.
4. Pavlyshenko, B. Using stacking approaches for machine learning models. In Proceedings of the 2018 IEEE Second International
Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2018; pp. 255–258.
5. Kovalchuk, S.V.; Metsker, O.G.; Funkner, A.A.; Kisliakovskii, I.O.; Nikitin, N.O.; Kalyuzhnaya, A.V.; Vaganov, D.A.; Bochenina,
K.O. A conceptual approach to complex model management with generalized modelling patterns and evolutionary identification.
Complexity 2018, 2018, 5870987. [CrossRef]
6. Kalyuzhnaya, A.V.; Nikitin, N.O.; Vychuzhanin, P.; Hvatov, A.; Boukhanovsky, A. Automatic evolutionary learning of composite
models with knowledge enrichment. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion,
Cancun, Mexico, 8–12 July 2020; pp. 43–44.
7. Lecomte, S.; Guillouard, S.; Moy, C.; Leray, P.; Soulard, P. A co-design methodology based on model driven architecture for real
time embedded systems. Math. Comput. Model. 2011, 53, 471–484. [CrossRef]
8. He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. arXiv 2019, arXiv:1908.00709.
9. Caldwell, J.; Ram, Y.M. Mathematical Modelling: Concepts and Case Studies; Springer Science & Business Media: Berlin/Heidelberg,
Germany, 2013; Volume 6.
10. Banwarth-Kuhn, M.; Sindi, S. How and why to build a mathematical model: A case study using prion aggregation. J. Biol. Chem.
2020, 295, 5022–5035. [CrossRef] [PubMed]
11. Castillo, O.; Melin, P. Automated mathematical modelling for financial time series prediction using fuzzy logic, dynamical
systems and fractal theory. In Proceedings of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial
Engineering (CIFEr), New York City, NY, USA, 24–26 March 1996; pp. 120–126.
12. Kevrekidis, I.G.; Gear, C.W.; Hyman, J.M.; Kevrekidid, P.G.; Runborg, O.; Theodoropoulos, C. Equation-free, coarse-grained
multiscale computation: Enabling mocroscopic simulators to perform system-level analysis. Commun. Math. Sci. 2003, 1, 715–762.
13. Schmidt, M.; Lipson, H. Distilling free-form natural laws from experimental data. Science 2009, 324, 81–85. [CrossRef]
14. Kondrashov, D.; Chekroun, M.D.; Ghil, M. Data-driven non-Markovian closure models. Phys. D Nonlinear Phenom. 2015,
297, 33–55. [CrossRef]
15. Maslyaev, M.; Hvatov, A.; Kalyuzhnaya, A. Data-Driven Partial Derivative Equations Discovery with Evolutionary Approach. In
International Conference on Computational Science; Springer: Berlin/Heidelberg, Germany, 2019; pp. 635–641.
16. Qi, F.; Xia, Z.; Tang, G.; Yang, H.; Song, Y.; Qian, G.; An, X.; Lin, C.; Shi, G. A Graph-based Evolutionary Algorithm for Automated
Machine Learning. Softw. Eng. Rev. 2020, 1, 10–37686.
17. Olson, R.S.; Bartley, N.; Urbanowicz, R.J.; Moore, J.H. Evaluation of a tree-based pipeline optimization tool for automating
data science. In Proceedings of the Genetic and Evolutionary Computation Conference, New York, NY, USA, 20–24 July 2016;
pp. 485–492.
18. Zhao, H. High Performance Machine Learning through Codesign and Rooflining. Ph.D. Thesis, UC Berkeley, Berkeley, CA,
USA, 2014.
19. Amid, A.; Kwon, K.; Gholami, A.; Wu, B.; Asanović, K.; Keutzer, K. Co-design of deep neural nets and neural net accelerators for
embedded vision applications. IBM J. Res. Dev. 2019, 63, 6:1–6:14. [CrossRef]
20. Li, Y.; Park, J.; Alian, M.; Yuan, Y.; Qu, Z.; Pan, P.; Wang, R.; Schwing, A.; Esmaeilzadeh, H.; Kim, N.S. A network-centric
hardware/algorithm co-design to accelerate distributed training of deep neural networks. In Proceedings of the 2018 51st Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 20–24 October 2018; pp. 175–188.
232
21. Bertels, K. Hardware/Software Co-Design for Heterogeneous Multi-Core Platforms; Springer: Berlin/Heidelberg, Germany, 2012.
22. Wang, K.; Liu, Z.; Lin, Y.; Lin, J.; Han, S. HAQ: Hardware-Aware Automated Quantization With Mixed Precision. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
23. Cai, H.; Zhu, L.; Han, S. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv 2018, arXiv:1812.00332.
24. Dosanjh, S.S.; Barrett, R.F.; Doerfler, D.; Hammond, S.D.; Hemmert, K.S.; Heroux, M.A.; Lin, P.T.; Pedretti, K.T.; Rodrigues, A.F.;
Trucano, T. Exascale design space exploration and co-design. Future Gener. Comput. Syst. 2014, 30, 46–58. [CrossRef]
25. Gramacy, R.B.; Lee, H.K. Adaptive Design of Supercomputer Experiments. 2018. Available online: http://citeseerx.ist.psu.edu/
viewdoc/download?doi=10.1.1.312.3750&rep=rep1&type=pdf (accessed on 26 December 2020).
26. Glinskiy, B.; Kulikov, I.; Snytnikov, A.V.; Chernykh, I.; Weins, D.V. A multilevel approach to algorithm and software design for
exaflops supercomputers. Numer. Methods Program. 2015, 16, 543–556.
27. Kaltenecker, C. Comparison of Analytical and Empirical Performance Models: A Case Study on Multigrid Systems. Master’s The-
sis, University of Passau, Passau, Germany, 2016.
28. Calotoiu, A. Automatic Empirical Performance Modeling of Parallel Programs. Ph.D. Thesis, Technische Universität, Berlin,
Germany, 2018.
29. Eggensperger, K.; Lindauer, M.; Hoos, H.H.; Hutter, F.; Leyton-Brown, K. Efficient benchmarking of algorithm configurators via
model-based surrogates. Mach. Learn. 2018, 107, 15–41. [CrossRef]
30. Chirkin, A.M.; Belloum, A.S.; Kovalchuk, S.V.; Makkes, M.X.; Melnik, M.A.; Visheratin, A.A.; Nasonov, D.A. Execution time
estimation for workflow scheduling. Future Gener. Comput. Syst. 2017, 75, 376–387. [CrossRef]
31. Gamatié, A.; An, X.; Zhang, Y.; Kang, A.; Sassatelli, G. Empirical model-based performance prediction for application mapping
on multicore architectures. J. Syst. Archit. 2019, 98, 1–16. [CrossRef]
32. Shi, Z.; Dongarra, J.J. Scheduling workflow applications on processors with different capabilities. Future Gener. Comput. Syst.
2006, 22, 665–675. [CrossRef]
33. Visheratin, A.A.; Melnik, M.; Nasonov, D.; Butakov, N.; Boukhanovsky, A.V. Hybrid scheduling algorithm in early warning
systems. Future Gener. Comput. Syst. 2018, 79, 630–642. [CrossRef]
34. Melnik, M.; Nasonov, D. Workflow scheduling using Neural Networks and Reinforcement Learning. Procedia Comput. Sci. 2019,
156, 29–36. [CrossRef]
35. Olson, R.S.; Moore, J.H. TPOT: A tree-based pipeline optimization tool for automating machine learning. Proc. Mach. Learn. Res.
2016, 64, 66–74.
36. Evans, L.; Society, A.M. Partial Differential Equations; Graduate Studies in Mathematics; American Mathematical Society:
Providence, RI, USA, 1998.
37. Czarnecki, W.M.; Osindero, S.; Jaderberg, M.; Swirszcz, G.; Pascanu, R. Sobolev training for neural networks. In Proceedings of
the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4278–4287.
38. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward
and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [CrossRef]
39. Epicoco, I.; Mocavero, S.; Porter, A.R.; Pickles, S.M.; Ashworth, M.; Aloisio, G. Hybridisation strategies and data structures for
the NEMO ocean model. Int. J. High Perform. Comput. Appl. 2018, 32, 864–881. [CrossRef]
40. Nikitin, N.O.; Polonskaia, I.S.; Vychuzhanin, P.; Barabanova, I.V.; Kalyuzhnaya, A.V. Structural Evolutionary Learning for
Composite Classification Models. Procedia Comput. Sci. 2020, 178, 414–423. [CrossRef]
41. Full Script That Allows Reproducing the Results Is Available in the GitHub Repository. Available online: https://github.
com/ITMO-NSS-team/FEDOT.Algs/blob/master/estar/examples/ann_approximation_experiments.ipynb (accessed on
26 December 2020).
42. Full Script That Allows Reproducing the Results Is Available in the GitHub Repository. Available online: https://github.com/
ITMO-NSS-team/FEDOT.Algs/blob/master/estar/examples/Pareto_division.py (accessed on 26 December 2020).
43. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent
Systems; O’Reilly Media: Sebastopol, CA, USA, 2019.
44. Nikitin, N.O.; Vychuzhanin, P.; Hvatov, A.; Deeva, I.; Kalyuzhnaya, A.V.; Kovalchuk, S.V. Deadline-driven approach for multi-
fidelity surrogate-assisted environmental model calibration: SWAN wind wave model case study. In Proceedings of the Genetic
and Evolutionary Computation Conference Companion, Prague, Czech Republic, 13–17 July 2019; pp. 1583–1591.
45. Olson, R.S.; La Cava, W.; Orzechowski, P.; Urbanowicz, R.J.; Moore, J.H. PMLB: A large benchmark suite for machine learning
evaluation and comparison. BioData Min. 2017, 10, 1–13. [CrossRef]
46. Li, K.; Xiang, Z.; Tan, K.C. Which surrogate works for empirical performance modelling? A case study with differential evolution.
In Proceedings of the 2019 IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand, 10–13 June 2019;
pp. 1988–1995.
47. Bauernhansl, T.; Hartleif, S.; Felix, T. The Digital Shadow of production–A concept for the effective and efficient information
supply in dynamic industrial environments. Procedia CIRP 2018, 72, 69–74. [CrossRef]
48. Cha, D.H.; Wang, Y. A dynamical initialization scheme for real-time forecasts of tropical cyclones using the WRF model. Mon.
Weather Rev. 2013, 141, 964–986. [CrossRef]
49. Melnik, M.; Nasonov, D.A.; Liniov, A. Intellectual Execution Scheme of Iterative Computational Models based on Symbiotic
Interaction with Application for Urban Mobility Modelling. IJCCI 2019, 1, 245–251.
233
ABSTRACT some information about its operation. In the case of modeling phys-
Evolutionary di�erential equation discovery proved to be a tool to ical processes, commonly, the most suitable models have forms of
obtain equations with less a priori assumptions than conventional partial di�erential equations. Thus many recent studies aimed to
approaches, such as sparse symbolic regression over the complete develop the concept of data-driven di�erential equations discovery.
possible terms library. The equation discovery �eld contains two In the paper, data-driven discovery implies obtaining a di�eren-
independent directions. The �rst one is purely mathematical and tial equation from a set of empirical measurements, describing the
concerns di�erentiation, the object of optimization and its relation dynamics of a dependent variable in some domain. Furthermore,
to the functional spaces and others. The second one is dedicated equation-based models can be incorporated into pipelines of au-
purely to the optimizatioal problem statement. Both topics are tomated machine learning, that can include arbitrary submodels,
worth investigating to improve the algorithm’s ability to handle with approach, discussed in paper [14].
experimental data a more arti�cial intelligence way, without signif- Initial advances in di�erential equations discovery were made
icant pre-processing and a priori knowledge of their nature. In the with symbolic regression algorithm, as in [1]. The algorithm em-
paper, we consider the prevalence of either single-objective opti- ploys genetic programming to detect the graph, that represents
mization, which considers only the discrepancy between selected di�erential equation. One of the groups of the most simple yet
terms in the equation, or multi-objective optimization, which addi- practical techniques of equation construction is based on the sparse
tionally takes into account the complexity of the obtained equation. linear regression (least absolute shrinkage and selection operator),
The proposed comparison approach is shown on classical model introduced in works [11], [15], [16], and other similar projects. This
examples – Burgers equation, wave equation, and Korteweg - de approach has limited �exibility, having applicability restrictions
Vries equation. in cases of the equation with low magnitude coe�cients, being
discovered on noisy data. This issue is addressed by employing
CCS CONCEPTS Bayesian interference as in [12] to estimate the coe�cients of the
equation, as in work [4]. To account for the uncertainty in the
• Applied computing ! Mathematics and statistics; • Computing
resulting model, the approximating term library can be biased sta-
methodologies ! Heuristic function construction.
tistically [2]. Physics-informed neural networks (PINN) form the
next class of data-driven equation discovery tools, representing
KEYWORDS
the process dynamics with arti�cial neural networks. The primary
symbolic regression, dynamic system modeling, interpretable learn- research on this topic is done in work [13], while recent advances
ing, di�erential equations, sparse regression have been made in incorporating more complex types of neural
ACM Reference Format: networks in the PINNs [3, 17].
Mikhail Maslyaev and Alexander Hvatov. 2023. Comparison of Single- and In recent studies [7, 10], evolutionary algorithms have proved
Multi- Objective Optimization Quality for Evolutionary Equation Discovery. to be a rather �exible tool for di�erential equation discovery, de-
In Genetic and Evolutionary Computation Conference Companion (GECCO manding only a few assumptions about the process properties. The
’23 Companion), July 15–19, 2023, Lisbon, Portugal. ACM, New York, NY, problem is stated as the process representation error minimization.
USA, 4 pages. https://doi.org/10.1145/3583133.3590601 Implementing multi-objective evolutionary optimization, �rst in-
troduced for DE systems, as in [8], seems to be a feasible way to
1 INTRODUCTION improve the quality of the equation search, operating on fewer
The recent development of arti�cial intelligence has given high initial assumptions and providing higher diversity among the pro-
importance to problems of interpretable machine learning. In many cessed candidates. Additional criteria can represent other valuable
cases, users value models not only for their quality of predicting properties of the constructed models, namely conciseness.
the state of the studied system but also for the ability to provide This study compares the performance of single- and multi- objec-
tive optimization. Namely, the hypothesis that the multi-objective
Permission to make digital or hard copies of part or all of this work for personal or optimization creates and preserves diversity in the population and
classroom use is granted without fee provided that copies are not made or distributed
for pro�t or commercial advantage and that copies bear this notice and the full citation thus may achieve a better �tness function values, than that of a
on the �rst page. Copyrights for third-party components of this work must be honored. single-objective approach.The theoretical comparison shows that
For all other uses, contact the owner/author(s). multi-objective algorithms allow escaping local minima as soon as
GECCO ’23 Companion, July 15–19, 2023, Lisbon, Portugal
© 2023 Copyright held by the owner/author(s). the number of objectives is reasonably small [5]. For equation dis-
ACM ISBN 979-8-4007-0120-7/23/07. covery applications, the function landscapes have a more complex
https://doi.org/10.1145/3583133.3590601
234
GECCO ’23 Companion, July 15–19, 2023, Lisbon, Portugal M. Maslyaev, and A. Hvatov
structure, so increased diversity of the population can bene�t the 2.2 Mechanics of implemented evolutionary
resulting quality. operators
To direct the search for the optimal equations, standard evolution-
2 ALGORITHM DESCRIPTION ary operators of mutation and cross-over have been implemented.
The data-driven di�erential equation identi�cation operates on While the mechanics of single- and multi-objective optimization
problems of selecting a model for dynamics of the variable D = in the algorithm di�er, they work similarly on the stage of apply-
D (C, x) in a spatio-temporal domain (0,) ) ⌦, that is implicitly
>
ing equation structure-changing operators. With the graph-like
described by di�erential equation Eq. 1 with corresponding initial encoding of candidate equations, the operators can be represented
and boundary conditions. It can be assumed, that the order of the as changes, introduced into its subgraphs.
unknown equation can be arbitrary, but rather low (usually of The algorithm properties to explore structures are provided by
second or third order). mutation operators, which operate by random token and term ex-
mD mD mD changes. The number of terms to change has no strict limits. For
(C, x, D,
, , ... )=0 (1) tokens with parameters (?:+1, ... ?= ) 2 R= : , such as a para-
mC mG 1 mG=
Both multi-objective and single-objective approaches have the metric representation of an unknown external dependent variable,
same core of "graph-like" representation of a di�erential equation parameters are also optimized: the mutation is done with a random
(encoding) and similar evolutionary operators that will be described Gaussian increment.
further. In order to combine structural elements of better equations,
the cross-over operator is implemented. The interactions between
2.1 Di�erential equation representation parent equations are held on a term-level basis. The sets of terms
pairs from the parent equation are divided into three groups: terms
To represent the candidate di�erential equation the computational
identical in both equations, terms that are present in both equations
graph structure is employed. A �xed three-layer graph structure is
but have di�erent parameters or only a few tokens inside of them
employed to avoid the infeasible structures, linked to unconstrained
are di�erent, and the unique ones. The cross-over occurs for the two
graph construction and overtraining issues, present in symbolic
latter groups. For the second group it manifests as the parameter
regression. The lowest level nodes contain tokens, middle nodes
exchange between parents: the new parameters are selected from
and the root are multiplication and summation operations. The
the interval between the parents’ values.
data-driven equations take the form of a linear combination of
Cross-over between unique terms works as the complete ex-
product terms, represented by the multiplication of derivatives,
change between them. The construction of exchange pairs between
other functions and a real-valued coe�cient Eq. 2.
these tokens works entirely randomly.
(
0 (C, x, D, mD , mD , ... mD ) = Õ U Œ 5 = 0
mC mG 1 mG= 8 8 9 89
(2) 2.3 Optimization of equation quality metric
⌧ 0 (D)| = 0
The selection of the optimized functional distinguishes multiple
Here, the factors 58 9 are selected from the user-de�ned set of approaches to the di�erential equation search. First of all, a more
elementary functions, named tokens. The problem of an equation trivial optimization problem can be stated as in Eq. 4, where we
search transforms into the task of detecting an optimal set of tokens assume the identity of the equation operator 0 (D) = 0 to zero as
to represent the dynamics of the variable D (C, x), and forming the in Eq. 2.
equation by evaluating the coe�cients U = (U 1, ... U< ).
During the equation search, we operate with tensors of token ’ ÷
values, evaluated on grids DW = D (CW , xW ) in the processed domain &>? ( 0 (D)) = || 0 (D)||= = || U8 58 9 ||= ! min (4)
> U 8 C8 9
(0,) ) ⌦. 8 9
Sparsity promotion in the equation operates by �ltering out
An example of a more complex optimized functional is the norm
nominal terms with low predicting power and is implemented with
of a discrepancy between the input values of the modelled variable
LASSO regression. For each individual, a term (without loss of
and the solution proposed by the algorithm di�erential equation,
generality, we can assume that it is the <-th term) is marked to be a
estimated on the same grid. Classical solution techniques can not
"right-hand side of the equation" for the purposes of term �ltering
Œ be applied here due to the inability of a user to introduce the par-
and coe�cient calculation. The terms )8 = 9 58 9 are paired with
titioning of the processed domain, form �nite-di�erence schema
real-value coe�cients obtained from the optimization subproblem
without a priori knowledge of an equation, proposed by evolution-
of Eq. 3. Finally, the equation coe�cients are detected by linear
ary algorithm. An automatic solving method for candidate equation
regression.
(viewed as in Eq. 6) quality evaluation is introduced in [9] to work
’ ÷ ÷ around this issue.
U 0 = arg min (|| U80 58 9 5< 9 || 2 + _||U 0 || 1 ) (3)
U
8, 8<< 9 9
&B>; ( 0 (D)) = ||D D ||= ! min (5)
In the initialization of the algorithm equation graphs are ran- U 8 C8 9
domly constructed for each individual from the sets of user-de�ned ’ ÷
tokens with a number of assumptions about the structures of the 0
(D) = 0 : 0 (D) = U8 58 9 = 0 (6)
“plausible equations”. 8 9
235
Comparison of Single- and Multi- Objective Optimization �ality for Evolutionary Equation Discovery GECCO ’23 Companion, July 15–19, 2023, Lisbon, Portugal
While both quality metrics Eq. 4 and Eq. 5 in ideal conditions consumption.10 independent runs are conducted with each setup.
provide decent convergence of the algorithm, in the case of the The main equation quality indicator in our study is the statistical
noisy data, the errors in derivative estimations can make di�erential analysis of the objective function mean (` = ` (& ( 0 ))) and variance
operator discrepancy from the identity (as in problem in Eq. 4) an f 2 = (f (& ( 0 ))) 2 among the di�erent launches.
unreliable metric. Applying the automatic solving algorithm has The �rst equation was the wave equation as on Eq. 8 with the
high computational cost due to training a neural network to satisfy necessary boundary and initial conditions. The equation is solved
the discretized equation and boundary operators. with the Wolfram Mathematica software in the domain of (G, C) 2
As the single-objective optimization method for the study, we [0, 1] [0, 1] on a grid of 101 101. Here, we have employed
> >
have employed a simple evolutionary algorithm with a strategy that numerical di�erentiation procedures.
minimizes one of the aforementioned quality objective functions.
Due to the purposes of experiments on synthetic noiseless data, the m 2D m 2D
= 0.04 2 (8)
discrepancy-based approach has been adopted. mC 2 mG
The algorithm’s convergence due to the relatively simple struc-
2.4 Multi-objective optimization application ture was ensured in the case of both algorithms: the algorithm
As we stated earlier, in addition to process representation, the proposes the correct structure during the initialization or in the
conciseness is also a valuable for regulating the interpretability initial epochs of the optimization. However, such a trivial case can
of the model. Thus the metric of this property can be naturally be a decent indicator of the “ideal” algorithm behaviour. The values
introduced as Eq. 7, with an adjustment of counting not the total of examined metrics for this experiment and for the next ones are
number of active terms but the total number of tokens (:8 for 8 C⌘ presented on Tab. 1.
term).
Table 1: Results of the equation discovery
’
⇠( 0
(D)) = #( ) =
0
:8 ⇤ 1U8 <0 (7)
8 metric method wave Burgers KdV
In addition to evaluating the quality of the proposed solution ` single-objective 5.72 2246.38 0.162
from the point of the equation simplicity, multi-objective enables multi-objective 2.03 1.515 16.128
the detection of systems of di�erential equations, optimizing quali- f2 single-objective 18.57 4.41 ⇤ 107 8.9 ⇤ 10 3
ties of modeling of each variable. multi-objective 0 20.66 ⇡ 10 13
While there are many evolutionary multi-objective optimiza-
tion algorithms, MOEADD (Multi-objective evolutionary algorithm
based on dominance and decomposition) [6] algorithm has proven The statistical analysis of the algorithm performance on each
to be an e�ective tool in applications of data-driven di�erential equation is provided in Fig. 1.
equations construction. We employ baseline version of the MOEADD Another examination was performed on the solution of Burgers’
from the aforementioned paper with the following parameters: PBI equation, which has a more complex, non-linear structure. The
penalty factor \ = 1.0, probability of parent selection inside the problem was set as in Eq. 9, for a case of a process without viscosity,
2
sector neighbourhood X = 0.9 (4 nearest sector are considered as thus omitting term a mmCD2 . As in the previous example, the equation
“neighbouring”) with 40% of individuals selected as parents. Evo- was solved with the Wolfram Mathematica toolkit.
lutionary operator parameters are: crossover rate (probability of
a�ecting individual terms): 0.3 and mutation rate of 0.6.The result mD mD
+D =0 (9)
of the algorithm is the set of equations, ranging from the most sim- mC mG
m= D = 0) to the highly
plistic constructions (typically in forms of mG Derivatives used during the equation search were computed
analytically due to the function not being constant only on small
=
:
complex equations, where extra terms probably represents the noise domain.
components of the dynamics. The presence of other structures that have relatively low opti-
mized function values, such as DG0 DC0 = DCC
00 , makes this case of data
3 EXPERIMENTAL STUDY rather informative. Thus, the algorithm has a local optimum that is
This section of the paper is dedicated to studying equation dis- far from the correct structure from the point of error metric.
covery framework properties. As the main object of interest, we The �nal set-up for an experiment was de�ned with a non-
designate the di�erence of derived equations between single- and homogeneous Korteweg-de Vries equation, presented in Eq. 10.
multi-objective optimization launches. The validation was held The presence of external tokens in separate terms in the equation
on the synthetic datasets, where modelled dependent variable is makes the search more di�cult.
obtained from solving an already known and studied equation.
The tests were held on three cases: wave, Burgers and Korteweg- mD mD m 3D
+ 6D + = cos C sin C (10)
de Vries equations due to unique properties of each equation. The mC mG mG 3
algorithms were tested in the following pattern: 64 evolutionary The experiment results indicate that the algorithm may detect
iterations for the single-objective optimization algorithm and 8 the same equation in multiple forms. Each term of the equation
iterations of multi-objective optimization for the populations of 8 may be chosen as the “right-hand side” one, and the numerical error
candidate equations, which resulted in roughly similar resource with di�erent coe�cient sets can also vary.
236
GECCO ’23 Companion, July 15–19, 2023, Lisbon, Portugal M. Maslyaev, and A. Hvatov
104
102
101
100 10 1
2
10
6 × 100
4
10
0 6
4 × 10 10
8
3 × 10 0 10
10
10
2
2 × 100 10
Single Objective Multi-Objective Single Objective Multi-Objective Single Objective Multi-Objective
Figure 1: Resulting quality objective function value, introduced as Eq. 6, for single- and multi-objective approaches for (a) wave
equation, (b) Burgers equation, and (c) Korteweg-de Vries equation
4 CONCLUSION limit, with active learning and control. Proceedings of the Royal Society A 478,
2260 (2022), 20210904.
This paper examines the prospects of using multi-objective opti- [3] Han Gao, Matthew J Zahr, and Jian-Xun Wang. 2022. Physics-informed graph
mization for the data-driven discovery of partial di�erential equa- neural galerkin networks: A uni�ed framework for solving pde-governed forward
and inverse problems. Computer Methods in Applied Mechanics and Engineering
tions. While initially introduced for handling problems of deriving 390 (2022), 114502.
systems of partial di�erential equations, the multi-objective view [4] L Gao, Urban Fasel, Steven L Brunton, and J Nathan Kutz. 2023. Convergence of
of the problem improves the overall quality of the algorithm. The uncertainty estimates in Ensemble and Bayesian sparse model discovery. arXiv
preprint arXiv:2301.12649 (2023).
improved convergence, provided by higher candidate individual [5] Hisao Ishibuchi, Yusuke Nojima, and Tsutomu Doi. 2006. Comparison between
diversity, makes the process more reliable in cases of equations single-objective and multi-objective genetic algorithms: Performance comparison
with complex structures, as was shown in the examples of Burgers’ and performance measures. In 2006 IEEE International Conference on Evolutionary
Computation. IEEE, 1143–1150.
and Korteweg-de Vries equations. [6] Q. Zhang K. Li, K. Deb and S. Kwong. 2015. An Evolutionary Many-Objective
The previous studies have indicated the algorithm’s reliability, Optimization Algorithm Based on Dominance and Decomposition. in IEEE
Transactions on Evolutionary Computation) 19, 5 (2015), 694–716. https://doi.org/
converging to the correct equation, while this research has proposed 10.1109/TEVC.2014.2373386
a method of improving the rate at which the correct structures are [7] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. 2021. DeepXDE:
identi�ed. This property is valuable for real-world applications A deep learning library for solving di�erential equations. SIAM Rev. 63, 1 (2021),
208–228.
because incorporating large and complete datasets improves the [8] Mikhail Maslyaev and Alexander Hvatov. 2021. Multi-Objective Discovery of PDE
noise resistance of the approach. Systems Using Evolutionary Approach. In 2021 IEEE Congress on Evolutionary
The further development of the proposed method involves intro- Computation (CEC). 596–603. https://doi.org/10.1109/CEC45853.2021.9504712
[9] Mikhail Maslyaev and Alexander Hvatov. 2022. Solver-Based Fitness Function
ducing techniques for incorporating expert knowledge into the for the Data-Driven Evolutionary Discovery of Partial Di�erential Equations. In
search process. This concept can help generate preferable can- 2022 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
[10] Mikhail Maslyaev, Alexander Hvatov, and Anna V Kalyuzhnaya. 2021. Partial
didates or exclude infeasible ones even before costly coe�cient di�erential equations discovery with EPDE framework: application for real and
calculation and �tness evaluation procedures. synthetic data. Journal of Computational Science (2021), 101345.
[11] Daniel A Messenger and David M Bortz. 2021. Weak SINDy for partial di�erential
equations. J. Comput. Phys. 443 (2021), 110525.
5 CODE AND DATA AVAILABILITY [12] Lizhen Nie and Veronika Ročková. 2022. Bayesian Bootstrap Spike-and-Slab
The numerical solution data and the Python scripts, that reproduce LASSO. J. Amer. Statist. Assoc. 0, 0 (2022), 1–16. https://doi.org/10.1080/01621459.
2022.2025815
the experiments, are available at the GitHub repository 1 . [13] M Raissi, P Perdikaris, and GE Karniadakis. 2017. Physics informed deep learning
(Part II): Data-driven discovery of nonlinear partial di�erential equations. arXiv
ACKNOWLEDGEMENTS preprint arXiv:1711.10566 (2017). https://arxiv.org/abs/1711.10566
[14] Mikhail Sarafanov, Valerii Pokrovskii, and Nikolay O Nikitin. 2022. Evolutionary
This research is �nancially supported by the Ministry of Science Automated Machine Learning for Multi-Scale Decomposition and Forecasting of
and Higher Education, agreement FSER-2021-0012. Sensor Time Series. In 2022 IEEE Congress on Evolutionary Computation (CEC).
IEEE, 01–08.
[15] Hayden Schae�er. 2017. Learning partial di�erential equations via data discovery
REFERENCES and sparse optimization. Proc. R. Soc. A 473, 2197 (2017), 20160446.
[1] H. Cao, L. Kang, Y. Chen, et al. 2000. Evolutionary Modeling of Systems of [16] H. Schae�er, R. Ca�isch, C. D. Hauck, and S. Osher. 2017. Learning partial
Ordinary Di�erential Equations with Genetic Programming. Genetic Program- di�erential equations via data discovery and sparse optimization. Proceedings
ming and Evolvable Machines 1 (2000), 309–337. https://doi.org/doi:10.1023/A: of the Royal Society A: Mathematical, Physical and Engineering Science (2017).
1010013106294 https://doi.org/473(2197):20160446
[2] Urban Fasel, J Nathan Kutz, Bingni W Brunton, and Steven L Brunton. 2022. [17] Pongpisit Thanasutives, Takashi Morita, Masayuki Numao, and Ken ichi Fukui.
Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise 2023. Noise-aware physics-informed machine learning for robust PDE discovery.
Machine Learning: Science and Technology 4, 1 (feb 2023), 015009. https://doi.org/
1 https://github.com/ITMO-NSS-team/EPDE_GECCO_experiments 10.1088/2632-2153/acb1f0