
ITMO National Research University

(ITMO University)

Manuscript copyright

Maslyaev Mikhail Aleksandrovich

Methods and algorithms for data-driven identification of physically based models in the form of differential equations

Dissertation for the degree of Candidate of Physical and Mathematical Sciences

Saint Petersburg, 2023

In equation (3.2), $Eq^k = Eq \times \ldots \times Eq$ denotes the Cartesian product of the sets of candidate equations. We emphasize that, since only a discrete set of points is given, the operators are replaced by discrete analogues, such as finite differences, and the minimization problem is reformulated as:

$$\bar{S} = \arg\min_{S \in Eq^k} S(\vec{u}_i), \quad \vec{u}_i \in D, \; \forall i \qquad (3.3)$$

In practice, the formulation of relation (3.3) is difficult to apply directly, so it can be rewritten as relation (3.4), where the norm $\|\cdot\|$ is chosen according to the specifics of the problem.

$$\bar{S} = \arg\min_{S \in Eq^k} \sum_{i=1}^{M} \|S(\vec{u}_i)\| \qquad (3.4)$$

The multi-objective problem statement makes it possible to tune the discovered system in various ways. For example, in some problems the accuracy of reproducing the data is less important than the complexity of the equation. For other processes the emphasis is placed on the quality of prediction based on the solution of the differential equation, while the interpretability of the model matters less. As the first group of criteria we use the quality metrics of the obtained differential equations introduced in the section devoted to single-objective optimization.
The desire to control the complexity of the resulting models, in addition to minimizing the error, manifests itself in preferring equations with a simple structure, which can be linked to the number of active tokens (i.e., those corresponding to terms with nonzero coefficients) in the equation structure and to the orders of the derivatives inside them. Thus, the equation discovery criteria used in the algorithm include the complexity metric given in expression (3.5), where $\operatorname{ord}(t_{ij})$ is the order of the partial derivative. To avoid "overfitting" the equation (for example, identifying an unjustifiably complex inhomogeneity) with elementary functions that do not correspond to derivatives, a base complexity of 0.5 is introduced for such tokens.

$$C(L'u) = \sum_j \sum_i \operatorname{compl}(t_{ij}); \qquad \operatorname{compl}(t_{ij}) = \begin{cases} n, & \text{if } t_{ij} = \dfrac{\partial^n u}{\partial^{n_1} x_1 \ldots \partial^{n_k} x_d}, \; n \geq 1 \\ 0.5, & \text{otherwise} \end{cases} \qquad (3.5)$$
The introduced criteria $C(L'u)$ and $Q(L'u)$ for each of the $k$ equations of the system form the optimization space. Since the application assumes the use of a multi-objective evolutionary optimization algorithm, a dominance relation between candidate solutions of the optimization problem is introduced. Let us define a binary relation on the set of systems created by the algorithm that determines the preference of one solution over another. We say that a candidate solution, a system of differential equations $S_1(u)$, Pareto-dominates a solution $S_2(u)$ (denoted $S_1(u) \prec S_2(u)$) if for every equation index $i$ of the system we have $Q_i(S_1(u)) \leq Q_i(S_2(u))$ and $C_i(S_1(u)) \leq C_i(S_2(u))$, and there exists an index $j$ for which $Q_j(S_1(u)) < Q_j(S_2(u))$ and/or $C_j(S_1(u)) < C_j(S_2(u))$. Such a dominance relation can be interpreted as the fact that every equation of the system $S_1(u)$ simultaneously describes the dynamical system no worse and is represented by a simpler structure.
Obviously, this relation introduces only a partial order: for $S_1(u)$ and $S_2(u)$ such that there exists a set of criteria indices $I_1$ on which system $S_1$ is preferable and a set $I_2$ on which preference is given to system $S_2$, one cannot say that either candidate solution dominates the other. A set of candidate solutions is called nondominated if for no two systems of differential equations from this set can one be said to be dominated by the other.
A candidate solution $S_0(u)$ is called optimal (Pareto-optimal) if there exists no other solution $S'(u)$ for which $S'(u) \prec S_0(u)$. The goal of the algorithm is to identify the Pareto-optimal nondominated set of candidate systems of differential equations: $\forall S'(u) \; \exists S_i(u): S_i(u) \preceq S'(u)$.
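To make the dominance relation concrete, below is a minimal Python sketch of the dominance check and of extracting a nondominated subset. The function names and the representation of a candidate system as a vector of stacked criteria values $(Q_i, C_i)$ are illustrative assumptions, not the EPDE implementation.

```python
import numpy as np

def dominates(obj_a: np.ndarray, obj_b: np.ndarray) -> bool:
    """True if A Pareto-dominates B (minimization): A is no worse in every
    criterion and strictly better in at least one."""
    return bool(np.all(obj_a <= obj_b) and np.any(obj_a < obj_b))

def nondominated(objectives: np.ndarray) -> np.ndarray:
    """Indices of the nondominated candidates among the rows of `objectives`,
    where each row holds the stacked (Q_i, C_i) values of one candidate system."""
    keep = []
    for i, obj_i in enumerate(objectives):
        if not any(dominates(obj_j, obj_i)
                   for j, obj_j in enumerate(objectives) if j != i):
            keep.append(i)
    return np.array(keep)
```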

3.2 Multi-objective evolutionary algorithm for learning a model in the form of a system of differential equations

The optimization problem in the space defined by the equation quality and complexity metrics is proposed to be solved with an evolutionary algorithm based on Pareto dominance and decomposition of the criteria space (MOEA/DD) [66]. It was shown earlier in this work that by changing the parameters of the differential equation generation algorithm (the sparsity constant of the LASSO operator), one can shift the trade-off between quality and complexity in the generated equations.
At the initial stage of optimization, following the approach of MOEA/DD, we must estimate the best achievable value of each optimized functional to determine the ideal point. For the complexity metric it is reasonable to set the value 0, corresponding to equations with a structure similar to $\frac{\partial u}{\partial t} = C, \; C \in \mathbb{R}$. For the process representation quality criterion, the same assumption can only be made under strong reservations: the possibly stochastic nature of the processes or the noise present in the measurements limits the achievable quality. To estimate the ideal value of the error metric, a test run of the equation search algorithm can be performed to obtain an approximately best solution quality. Then, to start the evolutionary search, we generate a population of solutions by finding systems with random sparsity constants.
The approach is based on representing the general multi-objective optimization problem as a set of subproblems corresponding to optimization along selected directions in the criteria space, defined by weight vectors $W = \{w_1, w_2, \ldots, w_{n\_pop}\}$. The weight vectors are chosen according to the approach proposed in [67], which assumes their uniform placement on the unit simplex. With each vector, a sector of the criteria values space $\Omega_1, \Omega_2, \ldots, \Omega_{n\_pop}$ is associated. For each weight vector (and, accordingly, each sector), neighboring vectors/sectors are selected based on the angle between them according to relation (3.6).

Similarly, using the angular measure, for an arbitrary point in the criteria space one can determine which sector the point belongs to.

$$\alpha(a, b) = \arccos\left(\frac{(a, b)}{\sqrt{(a, a)} \cdot \sqrt{(b, b)}}\right) \qquad (3.6)$$
The partition of the criteria space is not rigid and does not split the population of solutions; it serves to distribute candidate solutions uniformly over the space. The number of individuals in the population must therefore match the number of weight vectors. The original study recommends setting the population size and the number of weight vectors as $n\_pop = \binom{H + m - 1}{m - 1}$, where $m$ is the number of criteria and $H$ is the number of divisions along the axes of their values. It is also recommended to use $H \geq m$. However, such a population size requires significant computational resources during the search, even when the norm of the differential operator error is used as the equation quality metric, so in practical problems the population size usually does not exceed 8-16 systems.
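For illustration, the following Python sketch generates uniformly spaced weight vectors on the unit simplex (the Das-Dennis construction referenced above, giving $\binom{H+m-1}{m-1}$ vectors) and assigns a point in the criteria space to the nearest sector by the angle of relation (3.6). This is a simplified reading of the scheme, not the dissertation's code.

```python
import itertools
import numpy as np

def simplex_weights(m: int, H: int) -> np.ndarray:
    """All m-tuples of non-negative integers summing to H, divided by H:
    C(H + m - 1, m - 1) uniformly spaced vectors on the unit simplex."""
    weights = []
    for c in itertools.combinations(range(H + m - 1), m - 1):
        parts = np.diff((-1,) + c + (H + m - 1,)) - 1  # stars-and-bars gaps
        weights.append(parts / H)
    return np.array(weights)

def angle(a: np.ndarray, b: np.ndarray) -> float:
    """Angle between two vectors, as in relation (3.6)."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def sector_of(point: np.ndarray, weights: np.ndarray) -> int:
    """Index of the weight vector (sector) closest in angle to a criteria point."""
    return int(np.argmin([angle(point, w) for w in weights]))
```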
To evaluate the quality of a selected candidate system as a solution of optimization subproblem $i$, a penalty function $g^{pbi}(S(u)|w_i)$ (penalty-based intersection, PBI) is introduced, defined by relation (3.7). The term $d_1$ corresponds to the proximity of the candidate solution to the ideal point, while adding the quantity $d_2$ with the corresponding weight $\theta$ makes it possible to prefer candidate solutions aligned with the weight vector.

$$g^{pbi}(S(u)|w_i) = d_1 + \theta d_2$$
$$d_1 = \frac{\|(F(S(u)))^T w_i\|}{\|w_i\|} \qquad (3.7)$$
$$d_2 = \left\| F(S(u)) - d_1 \frac{w_i}{\|w_i\|} \right\|$$
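A direct transcription of relation (3.7) into Python might look as follows; the shift by the ideal point and the default value of $\theta$ are assumptions made for the sketch.

```python
import numpy as np

def pbi(F: np.ndarray, w: np.ndarray, ideal: np.ndarray, theta: float = 5.0) -> float:
    """Penalty-based intersection value of an objective vector F for the
    subproblem with weight vector w (Eq. 3.7); `ideal` is the ideal point.
    theta is the d2 penalty weight (5.0 is an assumed default)."""
    f = F - ideal                          # shift so the ideal point is the origin
    w_unit = w / np.linalg.norm(w)
    d1 = abs(np.dot(f, w_unit))            # distance along the weight direction
    d2 = np.linalg.norm(f - d1 * w_unit)   # deviation from the weight direction
    return d1 + theta * d2
```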

When extending the differential equation model learning algorithm to the multi-objective optimization problem, an individual of the EA is encoded as follows: in addition to the previously considered graphs corresponding to the individual equations of the system, the chromosome includes the metaparameters of the equation construction algorithm, the sparsity constants for each equation, combined into a vector $(\lambda_1, \lambda_2, \ldots, \lambda_{n\_eq}), \; \lambda_i \in \mathbb{R}^{+}$.

The motivation for this representation is based on the observation that, for given hyperparameters, the equation search algorithm converges to a solution (or solutions, if one particular solution corresponds to several equations) determined by the input data.
When constructing the initial population, graph structures are composed that correspond to individual genes and encode single equations, each associated with one dependent variable. This technique is used so that at least that equation describes its dynamics: a term on its right-hand side must contain a derivative of the dependent variable, and the set of terms that may be selected for the right-hand side is restricted accordingly. This logic limits the set of possible systems of equations and does not allow identifying systems in which some equations are algebraic. For each equation, its own values of the metaparameters ($\lambda_i$) are determined. At the end of the initialization procedure, each constructed equation is randomly associated with a subregion defined by a weight vector representing an optimization subproblem solution.
After constructing the set of weight vectors, determining the neighboring sectors for each sector, and generating the initial population, the main loop of the evolutionary algorithm is launched. During the loop, which runs until a stopping criterion is met, an evolutionary optimization step is performed for each subproblem $i$: parent candidate solutions are selected, the crossover operator is applied to obtain offspring solutions, the mutation operator is applied to them, and the population update takes place, which includes adding the offspring to the population and removing the least preferable candidate solutions.
A selection operator determines the systems of equations used as parent individuals. With a fixed probability $m_s$, a given number of systems is chosen from the sectors neighboring the one being processed; otherwise (with probability $(1 - m_s)$), the choice is made from the entire population, regardless of sector membership. The first type of selection is aimed at "exploitation" of the solutions already obtained for the subproblem at hand, while the second is aimed at "exploration", an attempt to obtain new solutions for the subproblem unlike those already considered. Pairs are formed from the selected individuals, for which recombination (crossover) is initiated.
For the genes corresponding to the computational graphs of the system's differential equations, the recombination and mutation operators use the approaches considered in the section devoted to single-objective optimization, without modifications. Since the recombination operator assumes creating new individuals based on the selected parents, with characteristics close to both of them, for the genes corresponding to the parameters of the equation construction algorithm this operator is implemented by choosing offspring values in the range between those of the parents. The new parameter values for each gene in the offspring chromosomes are chosen as a weighted sum of the parental ones, with coefficient $\alpha \sim U(0, 1)$. The recombination scheme for systems is shown in equation (3.8).

( 11 , 1
2, ... , ! ( 01
1
n_eq )
01 01
1 , 2 , ... , n_eq )
( 21 , 2
2, ... , ! ( 02
2
n_eq )
02 02
1 , 2 , ... , n_eq )
pi ⇠ U (0,1) (3.8)
if pi < pxover then 01 1
i = ↵ ⇤ i + (1 ↵) ⇤ 2i
else 01 1 02
i = i, i = i
2

For the genes containing the equation construction parameters, the mutation operator is implemented as changing the value by an increment drawn from the normal distribution $N(0, \sigma)$ with a predefined probability $p_{mut} \in (0, 1)$, as in equation (3.9).

$$(\lambda_1, \lambda_2, \ldots, \lambda_{n\_eq}) \rightarrow (\lambda_1', \lambda_2', \ldots, \lambda_{n\_eq}')$$
$$p_i \sim U(0, 1) \qquad (3.9)$$
$$\text{if } p_i < p_{mut}: \quad \lambda_i' = \lambda_i + \Delta, \; \Delta \sim N(0, \sigma)$$
$$\text{else}: \quad \lambda_i' = \lambda_i$$
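The two operators for the sparsity-constant genes from equations (3.8) and (3.9) can be sketched in Python as follows; the default probabilities and the mutation scale $\sigma$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def crossover_sparsity(l1: np.ndarray, l2: np.ndarray, p_xover: float = 0.5):
    """Gene-wise blend crossover of two parents' sparsity-constant vectors
    (Eq. 3.8): with probability p_xover a gene pair is replaced by
    weighted sums with alpha ~ U(0, 1)."""
    c1, c2 = l1.copy(), l2.copy()
    for i in range(len(l1)):
        if rng.uniform() < p_xover:
            alpha = rng.uniform()
            c1[i] = alpha * l1[i] + (1 - alpha) * l2[i]
            c2[i] = alpha * l2[i] + (1 - alpha) * l1[i]
    return c1, c2

def mutate_sparsity(lam: np.ndarray, p_mut: float = 0.8, sigma: float = 0.1):
    """Gaussian-increment mutation of the sparsity constants (Eq. 3.9);
    sigma is an assumed scale, values are kept positive since lambda_i in R+."""
    out = lam.copy()
    for i in range(len(out)):
        if rng.uniform() < p_mut:
            out[i] = max(out[i] + rng.normal(0.0, sigma), 1e-12)
    return out
```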
The procedure of identifying new candidate solutions from the offspring subpopulation $P^{offspr} = \{S_1^{offspr}(u), \ldots, S_{n\_offspr}^{offspr}(u)\}$ and removing the least preferable solutions (taking the processed sectors into account) to keep the population size constant is called the population "update". The course of the update procedure is governed by the state of the population, namely by how many nondominated sets can be distinguished among the proposed systems of differential equations.

If the entire population belongs to a single nondominated set, the goal is to achieve the greatest diversity among the presented solutions and a more uniform coverage of the Pareto-optimal set. The sector containing the largest number of solutions is identified, and the individual with the highest penalty function value $g^{pbi}(S(u)|w_i)$ is removed from it. If several sectors contain the same number of candidate solutions, the algorithm takes into account the sum of the penalty function values of the systems belonging to each sector, and then likewise removes the individual with the highest $g^{pbi}$.
Next, consider the scenario in which the candidate solutions form more than one nondominated set. The algorithm examines the last nondominated set. If only one solution $S_k(u)$ belongs to it, then this $S_k(u)$ is examined for membership in region $\Omega_k$. If only one solution corresponds to that region, we must keep $S_k(u)$ as a system important for maintaining diversity, and the solution with the highest penalty function value $g^{pbi}$ is chosen for removal instead. If the population contains several solutions associated with $\Omega_k$, the solution $S_k(u)$ is removed. If there are several solutions on the last nondomination level, the most crowded subregion $\Omega_k$ is thinned out: the solution with the highest penalty function value is removed from the population.

3.3 Conclusions to Chapter 3

This chapter considered the specifics of formulating the problem of learning a model in the form of a system of differential equations using a multi-objective optimization algorithm. An algorithm based on Pareto dominance and decomposition of the criteria values space, adapted to the learning tasks, was proposed. To assess the complexity of a model in the form of differential equations, a corresponding optimization criterion was introduced: such a metric makes it possible to restrict overfitted structures in which extra terms describe the noise components of the data.

Figure 3.1 — Scheme of the multi-objective evolutionary algorithm for learning a model in the form of a system of differential equations.

It was also shown that using the multi-objective problem statement for learning a model in the form of a single differential equation improves the convergence of the algorithm and yields a set of differential equations that are Pareto-optimal with respect to the complexity and quality criteria.

4. Validation of the developed methods

4.1 Validation of the single-objective method for learning a model in the form of a differential equation

This chapter presents experimental studies reflecting the properties of the evolutionary method for learning a model in the form of a differential equation from data. The validation of the method, which learns a differential equation model of a dynamical system, was aimed at experimentally confirming the main properties of the method: convergence, robustness to noise in the input data, and robustness with respect to partitions of the modeled domain corresponding to a limited sample. Such questions are most clearly studied on synthetic data, for which particular solutions of differential equations with known properties are used.

4.1.1 Experimental study of the method for learning a model in the form of a differential equation

The first type of problem to which the developed approach is applicable is learning an equation model describing one-dimensional data (a time series). In the absence of explicitly specified partial derivatives with respect to other independent variables, an ordinary differential equation of order $n$ of the form $F(t, u, u', \ldots, u^{(n)}) = 0$ is constructed to describe the dynamics of $u = u(t)$ in such problems. The developed algorithm can identify differential equations with arbitrary structures, provided the nonlinearity and inhomogeneity can be expressed through the given elementary functions.

Synthetic data: first-order ODE. Problems of learning a model in the form of an ordinary differential equation illustrate most clearly the cases where, in addition to identifying the correct set of elementary functions, the algorithm must determine the parameters of some tokens in the structure. The data used were the analytical solution (4.2) of the first-order ODE (4.1). To avoid obtaining a simplified structure of the differential equation ($u'' = -u$), the order of the sought DE was limited to the first: the first derivative was computed from the data numerically (based on Chebyshev polynomials). During equation identification, the tokens chosen were trigonometric functions, the original function and its derivatives, and the independent variables (coordinate grids). For the trigonometric functions, besides the power parameter common to all elementary functions, the frequency also had to be optimized.
The quality of the result in these experiments was assessed differently from the usual machine learning approach: instead of evaluating a metric comparing the time series/fields of predicted values with validation data, we compare symbolic expressions. The direct approach involves checking whether the overall structure of the equation is preserved after rediscovery: it must have the same number of (significant) terms containing the same functions. The next quality indicator is the similarity of the equation coefficients and function parameters: due to the inaccuracies of the machine computations used when calculating derivatives and subsequently training the model, the parameter values may differ from the expected ones.

$$x \sin t + \frac{dx}{dt} \cos t = 1 \qquad (4.1)$$

$$x = \sin t + C \cos t \qquad (4.2)$$
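The experiment's input can be reproduced with a short script: the sketch below evaluates the particular solution (4.2) for an assumed constant $C = 1$, differentiates it through a Chebyshev polynomial fit, as in the experiment, and checks the residual of equation (4.1). The polynomial degree is an assumption of the sketch.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Particular solution of Eq. (4.2) with an assumed constant C = 1.
t = np.linspace(0, 4 * np.pi, 1000)
x = np.sin(t) + np.cos(t)

# Numerical derivative via a Chebyshev fit (the degree 50 is an assumption,
# not the dissertation's setting).
cheb = C.Chebyshev.fit(t, x, deg=50)
dx = cheb.deriv()(t)

# Residual of the target equation (4.1): x*sin(t) + x'*cos(t) - 1 should be ~0.
residual = x * np.sin(t) + dx * np.cos(t) - 1.0
print(np.max(np.abs(residual)))  # small, up to fitting error near interval ends
```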

The verification data were obtained by applying the solution function of the equation on the interval $[0, 4\pi]$ with a grid of 1000 nodes. The parameters of the evolutionary algorithm in the experiments were as follows: the population size was set to 10 individuals, the share of the population selected for the crossover operator was 20%, and the mutation and crossover coefficients were 0.4 and 0.5, respectively. The mutation probability was set to 0.8, while the individual with the best fitness value was treated as the "elite" and was not subjected to the mutation operator. The stopping criterion of the algorithm was a limit on the number of iterations of model learning: 108 epochs. To evaluate the performance of the evolutionary algorithm, a statistical analysis of the fitness function values $\|Lu\|_2$ of the best candidate individual in the population per EA epoch was carried out, based on data from 10 independent experiments.

Figure 4.1 — Distributions of fitness function values (on a logarithmic scale) over the epochs of the evolutionary algorithm, based on 10 independent runs.
The results of the experiment are presented in Figure 4.1, which shows the dynamics of the fitness function (in the minimization setting) during evolution. The initial improvement (epochs 15-40) comes from optimizing the structure of the DE, while at the subsequent stage the token parameters (frequencies and the corresponding amplitudes) of the trigonometric functions are searched for. The minor (within $10^{-2}$) differences between the resulting equations can be attributed precisely to these differences in parameters: in all 10 runs, an equation with the sought structure was identified, as in expression (4.3), where $\epsilon$ denotes a small value.

$$(1 \pm \epsilon) \cdot x \cdot \sin((1 \pm \epsilon) \cdot t) + (1 \pm \epsilon) \cdot x' \cdot \cos((1 \pm \epsilon) \cdot t) = (1 \pm \epsilon) \qquad (4.3)$$



Synthetic data: second-order ODE. Next, consider the problem of learning a model in the form of a higher-order ordinary differential equation. Equation (4.4) was chosen to illustrate the algorithm's operation. In this experiment, system state predictions were compared using the approach that combines learning a differential equation model from data with its subsequent solution on the modeled domain. The solution of the DE has both a periodic component and a trend, which corresponds to many real processes that the approach was developed to model.

$$x'' + \sin(2t)\,x' + 4x = 1.5t \qquad (4.4)$$

This example illustrates the method of forecasting with ODEs learned from data. The training sample consisted of data from the time half-interval $[0, 8)$, while the data from the half-interval $[8, 16)$ were taken as the validation sample. The algorithm was initialized with the following parameters:
The equation obtained from the data was $1.486t - 0.991x'' - 3.945x - 0.0286 = \sin(1.999t)\,x'$. To predict the state of the system based on the equation, an initial value problem was solved, shown in Figure 4.2, which demonstrates that the deviation of the obtained equation's solution from the validation sample is not significant: $MAPE = 0.0098$.
At too low values of the regularization constant $\lambda$, the algorithm identified the equation structure as $0.63x - 0.40t - 0.52t \cdot \sin(2t) + 0.26t \cdot \cos(2t) + 0.92 = x'$. It admits a lower value of the differential operator error on the data but, as can be seen in Figure 4.2, generalizes the system state incorrectly.
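The validation scheme of this experiment can be sketched as follows: both the true equation (4.4) and the recovered equation are rewritten as first-order systems in $(x, x')$ and integrated over the validation half-interval. The initial state at $t = 0$ is an assumption of the sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def true_rhs(t, y):
    """True Eq. (4.4), x'' + sin(2t) x' + 4x = 1.5t, as a first-order system."""
    x, dx = y
    return [dx, 1.5 * t - np.sin(2 * t) * dx - 4 * x]

def recovered_rhs(t, y):
    """Recovered equation: 1.486t - 0.991x'' - 3.945x - 0.0286 = sin(1.999t) x'."""
    x, dx = y
    return [dx, (1.486 * t - 3.945 * x - 0.0286 - np.sin(1.999 * t) * dx) / 0.991]

t_val = np.linspace(8, 16, 200)
# Assumed initial condition x = x' = 0 at t = 0; integrate to t = 8 for the IVP state.
y8 = solve_ivp(true_rhs, (0, 8), [0.0, 0.0], dense_output=True).sol(8.0)
ref = solve_ivp(true_rhs, (8, 16), y8, t_eval=t_val).y[0]
pred = solve_ivp(recovered_rhs, (8, 16), y8, t_eval=t_val).y[0]
print(np.mean(np.abs((pred - ref) / ref)))  # MAPE-style deviation on validation
```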

Synthetic data: the Van der Pol oscillator. The ability of the developed method to recover ordinary differential equations with complex structure can be demonstrated on the example of the Van der Pol oscillator. Originally developed to describe the relaxation-oscillation cycle created by an electromagnetic field [68], the model has found applications in other fields of science, such as biology and seismology. The system is described by a nonlinear second-order ODE:

Figure 4.2 — Prediction of the system state based on the equations learned from data: left plot, the correct equation; right plot, the "overfitted" equation.
$$u'' + E(u^2 - 1)u' + u = 0, \qquad (4.5)$$

where $E$ is a positive constant, taken in the experiments as $E = 0.2$.
A separate goal of this study is a comparison with the SINDy library ([15]). Although this tool cannot recover the structure of differential equations of order higher than one, the Van der Pol equation can be reduced to the first-order ODE system (4.6), which, in turn, can be identified by sparse regression.

$$\begin{cases} u' = v; \\ v' = -E(u^2 - 1)v - u \end{cases} \qquad (4.6)$$
The dataset was the solution of equation (4.5) with initial conditions $u = \sqrt{3}/2$, $u' = 1/2$ on a domain of 320 points with step 0.05, starting from the nominal point $t = 0$. The numerical solution was obtained with the fourth-order Runge-Kutta method.
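The dataset described above can be reproduced with a few lines of Python; the sketch below integrates system (4.6) with the classic fourth-order Runge-Kutta scheme on the stated grid.

```python
import numpy as np

def vdp_rhs(state, E=0.2):
    """Van der Pol equation (4.5) written as the first-order system (4.6)."""
    u, v = state
    return np.array([v, -E * (u**2 - 1.0) * v - u])

def rk4(f, y0, dt, n_steps):
    """Classic fourth-order Runge-Kutta integration, as used for the dataset."""
    y = np.empty((n_steps, len(y0)))
    y[0] = y0
    for k in range(n_steps - 1):
        k1 = f(y[k]); k2 = f(y[k] + 0.5 * dt * k1)
        k3 = f(y[k] + 0.5 * dt * k2); k4 = f(y[k] + dt * k3)
        y[k + 1] = y[k] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# 320 points with step 0.05, starting from u = sqrt(3)/2, u' = 1/2 at t = 0.
data = rk4(vdp_rhs, np.array([np.sqrt(3) / 2, 0.5]), dt=0.05, n_steps=320)
u, du = data[:, 0], data[:, 1]
```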
Although a multi-objective problem statement corresponding to the search for a system of equations can be formulated in this case, solving it runs into convergence difficulties related to the identity between tokens $u' = v$. Here, the absence of constraints on the form leads to an incorrect result: the evolutionary algorithm proposes candidate equations such as $v \cdot u' \cdot v' = v'(u')^2$, which are trivial identities with respect to the correctly found equation $u' = v$.
Despite the inability to obtain the ODE as a system of first-order equations, the proposed algorithm correctly identifies the model in single-equation search mode. Analysis of the predictions based on the obtained equations can give insight into how errors in equation discovery affect the predictive ability of the model.
An illustration of the algorithm's performance check on the time interval adjacent to the training one is presented in Fig. 4.3. Case (a) is an example of a forecast with a completely wrong equation structure, aggravated by an insufficiently trained predictive neural network. In particular, this case shows that the validation dataset must be long enough to represent the properties of the equations describing the process. Both when evaluating the fitness of an equation during optimization and during validation involving the solution of the initial value problem, the error is significantly lower in the region immediately following the specified conditions. Here, on the segment $[0, \approx 1.7]$, the solution of the proposed equation does not deviate substantially from the correct values.
The next case (b) represents forecasts using data-driven differential equations with the correct structure, while the coefficients are estimated with a marginal error. If the properties of the discovered dynamical system associated with the equation with coefficient vector $\alpha'$ do not lead to a bifurcation away from the solution of the sought system with parameters $\alpha$, the candidate tends to give workable forecasts. The solution of data-driven equations with minimal deviations from the actual coefficients is presented in part (c) of Fig. 4.3 and follows the expected dynamics with minimal inaccuracies.

Figure 4.3 — Examples of modeling the Van der Pol oscillator with equations learned from data.

Table 12 — Statistics of identifying the correct terms of the equation describing the Van der Pol oscillator dynamics. The true equation (the ideal identification result) has the structure $u'' = -0.2(u^2 - 1)u' - u$. For each term, P is the share of runs (%) in which it was found, and b is the coefficient estimate $\mu \pm 1.98\sigma$.

EPDE:
NL, %   u²u't: P / b                 u't: P / b                u: P / b                   u''tt: P / b
0       100 / 0.199 ± 0.0            100 / 0.199 ± 0.0         100 / 1.00 ± 0.0           100 / 1.00 ± 0.0
0.5     100 / 0.181 ± 33.6·10⁻⁵      100 / 0.187 ± 11.9·10⁻⁵   100 / 1.00 ± 5.9·10⁻⁵      100 / 1.0 ± 0.0
1       60 / 0.085 ± 3.96·10⁻⁵       0 / —                     80 / 0.065 ± 1.98·10⁻⁵     0 / —
2.5     100 / 0.36 ± 0.0             20 / 1.0 ± 0.0            20 / 0.01 ± 0.0            0 / —
5       60 / 0.013 ± 1.63·10⁻⁵       80 / 1.0 ± 0.0            0 / —                      0 / —

4.1.2 Synthetic datasets defined by partial differential equations

When studying real dynamical systems, a widespread class of models is partial differential equations, which describe the dynamics or spatial structure of the data via relations containing various partial derivatives of the dependent variable. Ensuring the method's ability to identify the correct PDE is therefore one of the priorities of the conducted research.
As with the ODEs, this validation work involved experiments on synthetic data.
Below are the results of validating the algorithm on synthetic data: solutions of known differential equations were used as the algorithm's input. The particular solution of a differential equation obtained in this way imitates the manifestation of the modeled system available to an observer. Validation was carried out on the heat equation, the wave equation, the Burgers equation, the Korteweg-de Vries equation, and stationary cases (the Poisson equation). To illustrate the algorithm's operation on input data with different levels of injected noise, Table 13 presents the results of equation identification (the wave equation, the Burgers equation, and the soliton solution of the Korteweg-de Vries equation), evaluated by the share of successful runs out of 20 independent runs. The runs were performed with a sufficient number of iterations to ensure convergence of the algorithm.

Table 13 — Share of successful equation searches (%) by the single-objective algorithm depending on the noise level in the input synthetic data

Noise level   Wave eq.   Burgers eq.   Korteweg-de Vries eq.
0             100        100           100
1.0           100        90            65
2.5           100        75            5
5.0           85         20            0
7.5           35         0             0
10.0          0          0             0
15.0          0          0             0

Several noise-injection experiments were carried out on the synthetic data: first of all, noise of various magnitudes $(\mu = 0; \; \sigma = \sigma_n \cdot \|u(t)\|)$ was added to a fraction of the points (40% of the total number). The noise level in the data was determined from relation (4.7), where $u_{noised}$ is the data with injected noise and $u_{original}$ is the original data. The resulting data were then used as input for the algorithm.

$$NL = \frac{\|u_{noised} - u_{original}\|_2}{\|u_{original}\|_2} \cdot 100\% \qquad (4.7)$$
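A minimal Python sketch of the noise-injection procedure and of the noise level (4.7), assuming the fraction of perturbed points and the noise magnitude described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(u: np.ndarray, sigma_factor: float, fraction: float = 0.4):
    """Perturb a random `fraction` of the points with Gaussian noise
    N(0, sigma_factor * ||u||), following the description in the text."""
    noised = u.copy()
    idx = rng.choice(u.size, size=int(fraction * u.size), replace=False)
    flat = noised.reshape(-1)  # view into the copied array
    flat[idx] += rng.normal(0.0, sigma_factor * np.linalg.norm(u), size=idx.size)
    return noised

def noise_level(u_noised: np.ndarray, u_original: np.ndarray) -> float:
    """Noise level NL from Eq. (4.7), in percent."""
    return 100.0 * np.linalg.norm(u_noised - u_original) / np.linalg.norm(u_original)
```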

a) Wave equation. The first problem setting involved applying the algorithm to the solution of the wave equation with two spatial variables, relation (4.10), where $t$ is time, $x, y$ are the spatial coordinates, $u$ is the studied function (for example, a small out-of-plane displacement of a membrane), and $\alpha_1 = \alpha_2 = 1$. The equation was solved with the finite difference method on a domain of $201 \times 201 \times 201$ points in two spatial dimensions and time. The grid covering the domain had uniformly spaced nodes with coordinates from 0 to 10. The initial conditions for the equation were relations (4.8) and (4.9), and $u = 0$ was the boundary condition of the problem.

$$u = 10000 \sin^2\left(\frac{1}{100}\, xy \left(1 - \frac{1}{10}x\right)\left(1 - \frac{1}{10}y\right)\right) \qquad (4.8)$$

$$\frac{\partial u}{\partial t} = 1000 \sin^2\left(\frac{1}{100}\, xy \left(1 - \frac{1}{10}x\right)\left(1 - \frac{1}{10}y\right)\right) \qquad (4.9)$$

$$\frac{\partial^2 u}{\partial t^2} = \alpha_1 \frac{\partial^2 u}{\partial x^2} + \alpha_2 \frac{\partial^2 u}{\partial y^2} \qquad (4.10)$$
The equation identification algorithm was configured as follows: data preprocessing was performed by approximating the data with a fully connected neural network (4 hidden layers of 256, 64, 64, and 1024 neurons with the hyperbolic tangent activation function, trained for 10000 epochs). The subsequent computation of the derivative tensors was done with the finite difference method (central scheme, with a step of $0.01 \cdot h$, where $h$ is the step, along the differentiation axis, of the grid on which the input data were provided). The orders of the derivatives were limited to the 3rd in all coordinates. The experiment used an evolutionary algorithm with the following parameters: number of equation search epochs $n_{epochs} = 25$, candidate population size $n_{pop} = 10$, probability of an equation undergoing mutation $p_{mut} = 0.2$, share of the population subjected to crossover $n_{parent} = 0.4$, and probabilities of individual equation terms mutating within mutating candidate solutions and of a pair of terms being exchanged during crossover of $p_{term\_mut} = p_{term\_crossover} = 0.3$. Fitness was evaluated using the approach based on the norm of the differential operator error.
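A condensed sketch of this preprocessing stage is given below: a fully connected network with the stated layer sizes is fitted to the noisy field, and the time derivative of the smooth surrogate is taken with a central difference. The optimizer and learning rate are assumptions, and the code is illustrative rather than the dissertation's implementation.

```python
import torch

# Fully connected approximator u_theta(t, x, y) with the layer sizes from the text.
net = torch.nn.Sequential(
    torch.nn.Linear(3, 256), torch.nn.Tanh(),
    torch.nn.Linear(256, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1024), torch.nn.Tanh(),
    torch.nn.Linear(1024, 1),
)

def train(net, coords, values, epochs=10000):
    """coords: (N, 3) tensor of (t, x, y); values: (N, 1) tensor of noisy u."""
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)  # assumed optimizer settings
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((net(coords) - values) ** 2)
        loss.backward()
        opt.step()

def du_dt(net, coords, h):
    """Central difference along t with step 0.01 * h, h being the data grid step."""
    step = 0.01 * h
    shift = torch.zeros_like(coords)
    shift[:, 0] = step
    return (net(coords + shift) - net(coords - shift)) / (2 * step)
```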
The results of the experiment are as follows: the method successfully discovers the structure of the equation for noise levels up to 7.5%, which corresponds to the standard deviation of Gaussian noise in the interval $[0, 0.2]$ multiplied by the norm of the field over the time interval. The coefficient errors in this range are insignificant, as shown in Table 13. At higher noise levels (between 7.5% and 10%), the algorithm discovers extra terms absent from the original equation, which distorts both the structure of the equation and the computed coefficients. Finally, at high noise levels (above 10%), the proposed algorithm loses the ability to identify even elements of the desired equation structure, converging to structures that describe the noise in the data.

b) Burgers equation. To study the behavior of the algorithm on data from nonlinear partial differential equations, an experiment was conducted on input data obtained from the solution of the Burgers equation. The interest of this example lies in its practical significance: it corresponds to the momentum equation of the one-dimensional case of the Navier-Stokes system, if $u$ is taken as the modeled variable (flow velocity) and $\nu$ as the viscosity of the medium.

$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} = 0 \qquad (4.11)$$

c) Korteweg-de Vries equation. To create a more complex problem setting, the soliton solution (expression (4.13)) of the Korteweg-de Vries equation (equation (4.12)) was used. This solution represents the transport of a single wave propagating with speed $c$ from an initial position given by the location of the wave crest at point $x_0$. The test data are generated from the solution function (4.13). The solution is evaluated on a uniform grid of 101 spatial points in the interval $x \in [0, 10]$ and 151 time points in the interval $t \in [0, 15]$.

$$\frac{\partial u}{\partial t} + 6u \frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0 \qquad (4.12)$$

$$u = \frac{c}{2} \operatorname{sech}^2\left(\frac{\sqrt{c}}{2}(x - ct - x_0)\right) \qquad (4.13)$$
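The test dataset follows directly from expression (4.13); in the sketch below the wave speed $c$ and the initial crest position $x_0$ are assumed values for illustration.

```python
import numpy as np

def kdv_soliton(x, t, c=2.0, x0=3.0):
    """Soliton solution (4.13) of the KdV equation (4.12);
    c and x0 are assumed values for illustration."""
    arg = np.sqrt(c) / 2.0 * (x - c * t - x0)
    return c / 2.0 / np.cosh(arg) ** 2   # sech^2 computed as 1/cosh^2

# Grid from the experiment: 101 spatial points on [0, 10], 151 time points on [0, 15].
x = np.linspace(0, 10, 101)
t = np.linspace(0, 15, 151)
u = kdv_soliton(x[None, :], t[:, None])   # field of shape (151, 101)
```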
Applying the framework to the solution (4.13), evaluated on a regular grid, did not make it possible to identify the original equation from the data. The incorrectly discovered model arises from simpler incidental patterns in the data, such as $u_t = -c u_x$, which also have low equation error values. With the single-objective approach, the lower complexity of this equation makes its discovery more likely than that of the full KdV equation. In addition, the absence in the structure of high-order derivatives, which are computed with a larger numerical error than first- and second-order derivatives, can lead to lower values of the error functional than for the correct equation. This experiment shows that in a number of cases the single-objective algorithm tends not to converge to the full model but to discover "simplified equations", which usually represent an equality between functions (usually different derivatives) present in the set of admissible elementary functions.
The results of the experiments on recovering partial differential equations are summarized in Table 14. As expected, for equations with a simpler structure, the correct result is retained up to higher noise levels.

Table 14 — Share of successful equation searches (%) by the single-objective algorithm depending on the noise level in the input synthetic data

Noise level   Wave eq.   Burgers eq.   Korteweg-de Vries eq.
0             100        100           100
≈ 1.0         100        90            65
≈ 2.5         100        75            5
≈ 5.0         85         20            0
≈ 7.5         35         0             0
≈ 10.0        0          0             0
≈ 15.0        0          0             0

4.1.3 Real data: recovering the heat equation

Particular attention in the study was paid to testing the approach on equations describing real systems. An experiment was set up to identify the equation describing the temperature dynamics in the medium around a heater wire. In theory, the heat equation in polar coordinates is applicable to the process. Two cases of different media were investigated: in the first, heat propagation is diffusive in nature, while in the second, convection is present. The equations describing diffusive heat propagation have structure (4.14), in which $\alpha \in \mathbb{R}$ is a constant. A preliminary experiment was performed on synthetic data; the results are given in Table 15, where $\epsilon$ denotes a negligibly small value on the order of the machine precision of the computations.

$$\alpha \frac{1}{r}\frac{\partial u}{\partial r} + \alpha \frac{\partial^2 u}{\partial r^2} = \frac{\partial u}{\partial t} \qquad (4.14)$$
Table 15 — Obtained coefficients of the terms of the heat equation in cylindrical coordinates. The coefficients are normalized so that the coefficient of the $\frac{\partial u}{\partial t}$ term equals one; $C$ denotes the free term.

Noise level   (1/r)·∂u/∂r           ∂²u/∂r²              ∂u/∂t   C
0             (1.5 ± ε)·10⁻⁷        (1.54 ± ε)·10⁻⁷      1       ε
0.1           (1.51 ± ε)·10⁻⁷       (1.53 ± ε)·10⁻⁷      1       ε
0.3           (1.4 ± 0.3)·10⁻⁷      (1.5 ± 0.21)·10⁻⁷    1       0.0023 ± 0.005
0.5           (1.45 ± 0.5)·10⁻⁷     (1.5 ± 0.21)·10⁻⁷    1       0.05 ± 0.026
0.7           (1.4 ± 0.7)·10⁻⁷      (1.3 ± 0.4)·10⁻⁷     1       0.1 ± 0.053
1             (1.3 ± 0.3)·10⁻⁷      (1.1 ± 0.7)·10⁻⁷     1       0.3 ± 0.1

The heat equations without convection obtained from the experimental data (10 independent experiments) can be described by relation (4.15), which agrees with the expected parameters. The inaccuracy in the coefficient values stems from the different approximations of the input data by the neural networks.

$$(9.4 \pm 0.11) \cdot 10^{-8}\, \frac{1}{r}\frac{\partial u}{\partial r} + (9.423 \pm 0.04) \cdot 10^{-8}\, \frac{\partial^2 u}{\partial r^2} + (\epsilon \pm 0.01 \cdot 10^{-8}) = \frac{\partial u}{\partial t} \qquad (4.15)$$

The convection equation contains in its structure an unmeasured (in the general case, we assume, also unmeasurable) velocity field. Classical methods of solving inverse problems do not allow obtaining it in exact (analytical) form, so a parametric function, a product of polynomials depending on the radius from the heater and on time, was used to represent it. The algorithm identified the equation structure as in relation (4.16), where $v_2$ is the parameterized velocity field of the medium, which matches the expected one.

$$4.1 \cdot 10^{-8} \cdot \frac{1}{r}\frac{\partial u}{\partial r} + 5.8 \cdot 10^{-9} \cdot \frac{\partial^2 u}{\partial r^2} + v_2 \frac{\partial u}{\partial r} = \frac{\partial u}{\partial t} \qquad (4.16)$$

It was shown that a weak point of the proposed approach is the requirement for a parametric representation of the elementary functions and the corresponding optimization. Although approximating tokens with neural networks could automate the process, it also has a number of drawbacks. First, the algorithm may converge to local optima of candidate equation quality, obtained by overfitting the coefficients to a suboptimal structure of derivatives. In addition, the structures of deep fully connected neural networks with a large number of parameters are hard to analyze, so obtaining such equations would contradict the idea of interpretable machine learning.

4.2 Validation of the multi-objective method for learning a model in the form of a system of differential equations

The method for learning a model in the form of a system of differential equations was validated on ordinary differential equations and on partial differential equations. The first experiment was devoted to recovering the system of equations describing the Lotka-Volterra model (the predator-prey system), Eq. (4.17), and the system of equations describing the Lorenz oscillator, Eq. (4.18). Below are the results reflecting the share of successful runs in which the desired equation was found on the Pareto set of candidate equations. Then, to assess the suitability of the obtained systems of differential equations for predicting the state of the process, the automatic equation solving algorithm was used and the MAPE metric (4.19) was evaluated on a held-out sample, the time interval following the training period. In each case an initial value problem was posed, with the values of the modeled variables at the initial time $t_0$, taken from the training sample, serving as the given values.

$$\begin{cases} \dfrac{du}{dt} = \alpha u - \beta uv; \\ \dfrac{dv}{dt} = \delta uv - \gamma v; \end{cases} \qquad (4.17)$$

$$\begin{cases} \dfrac{dx}{dt} = \sigma \cdot (y - x); \\ \dfrac{dy}{dt} = x \cdot (\rho - z) - y; \\ \dfrac{dz}{dt} = xy - \beta z; \end{cases} \qquad (4.18)$$

Table 16 — Share of successful system searches R (%) and modeling error (MAPE) depending on the noise level in the input synthetic data

                    Lotka-Volterra model    Lorenz model
Noise level, %      R, %      MAPE          R, %      MAPE
0                   100       0.42          100       1.5
0.5                 100       2.7           100       4.1
1.0                 90        17            70        37
2.5                 70        38            30        42
5.0                 15        88            5         93

The results of the experiments on recovering systems of differential equations from data are given in Table 16; the algorithm runs were evaluated with the MAPE metric (4.19) on the test interval, which measures the relative deviation of the prediction $f_i^{pred}$ from the actual value $f_i^{fact}$. It should be noted that even in the case of incorrectly identified equations, the MAPE value does not exceed 100%, since the algorithm produces equations whose solutions converge to zero. An example of such a reproduced equation is shown in Figure 4.4.

$$MAPE = \frac{100}{n} \sum_{i=0}^{n} \left| \frac{f_i^{pred} - f_i^{fact}}{f_i^{fact}} \right| \qquad (4.19)$$
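The evaluation procedure can be summarized in a short sketch: the metric (4.19) and an initial value problem for a recovered Lotka-Volterra system, here with the ground-truth coefficients used in Tables 19-20; the initial state and time span are assumptions of the sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def mape(pred: np.ndarray, fact: np.ndarray) -> float:
    """MAPE from Eq. (4.19), in percent."""
    return 100.0 / len(fact) * np.sum(np.abs((pred - fact) / fact))

def lv_rhs(t, y):
    """Lotka-Volterra system with the ground-truth coefficients of Tables 19-20:
    u' = 20u - 20uv, v' = -20v + 20uv."""
    u, v = y
    return [20 * u - 20 * u * v, -20 * v + 20 * u * v]

# Assumed initial state and time span for illustration.
sol = solve_ivp(lv_rhs, (0.0, 1.0), [0.5, 0.5],
                t_eval=np.linspace(0, 1, 200), rtol=1e-8, atol=1e-8)
u_pred, v_pred = sol.y
```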
The dissertation further considers the specifics of applying the multi-objective approach to the problem of learning a model in the form of a single differential equation. By analogy with the individual encoding for the system-of-DE search problem, the chromosome includes, in addition to the structures of individual equations, the parameters governing the behavior of the equation graph generation algorithm. The algorithm's ability to evaluate candidate equations not only by the quality of reproducing the states of the dynamical system but also by the complexity of their structure makes it possible to increase population diversity during evolution. This idea can be illustrated by the fact that simple equations with a complexity of "2 active tokens", which do not fully describe the dynamics of the process but represent only part of it, remain in the population and can participate in the equation search as the simplest meaningful models. The algorithm thus gains the ability to identify relatively simple equations that do not capture the noise component of the data.

Figure 4.4 — Example of predicting the state of the Lotka-Volterra system based on equations learned from data.
The results of experiments comparing the effectiveness of single-objective and multi-objective search for partial differential equations under equal computational resources are shown in Figure 4.5. From the obtained data one can conclude that even in the single-equation search problem the multi-objective formulation of the optimization problem has a number of advantages, providing earlier and more reliable convergence, although it requires an expert verdict to select the desired equation from the set of Pareto-optimal candidates.
When the multi-objective optimization algorithm is used, its output is a Pareto set containing a collection of solutions to the problem, which confronts the researcher with the problem of choosing the most suitable equation. An example of the algorithm's output on input data obtained from the solution of the Van der Pol equation is shown in Figure 4.6. A general procedure for selecting the preferred solution remains outside the scope of this dissertation.

Figure 4.5 — MAE values on the training data for the single-objective and multi-objective approaches on the examples of the wave equation (a), the Burgers equation (b), and the Korteweg-de Vries equation (c).

Figure 4.6 — Example of a Pareto set of solutions of the optimization problem of finding the Van der Pol equation, shown in the space of optimization criteria: the process reproduction error ("Objective 1") and the equation complexity criterion ("Objective 2").

A separate stage of validating the algorithm is examining its effectiveness in comparison with its closest analogues. The comparison used the SINDy library, based on the LASSO operator and described in Chapter 1 of this work. The input data were composed so as to cover typical problems of finding a dynamical system model. To compare the algorithms' ability to recover known partial differential equations, the Burgers equation (4.11) and the Korteweg-de Vries equation (4.12) were used. The capability of constructing single ODEs was tested on the example of the Van der Pol equation (4.5), and that of systems, on the Lotka-Volterra system of equations (4.17).

Table 17 — Statistics of including the correct terms of the Burgers equation and the values of the corresponding coefficients, together with the equations obtained by SINDy for the given noise levels. The abbreviation g.t. denotes the true structure of the equation: $u'_t = 0.1u''_{xx} - uu'_x$.

EPDE (P — share of runs, %; b — coefficient, µ ± 1.98σ) and SINDy:
NL, %   u't: P / b              u''xx: P / b           uu'x: P / b          SINDy
0       100 / 1.001 ± 0         100 / 0.106 ± 0.0      100 / 0.997 ± 0.0    u't = 0.1u''xx − 1.001uu'x
1       90 / 0.830 ± 0.218      60 / 0.053 ± 0.002     10 / 0.980 ± 0.0     u't = 0.248u'x − 0.292uu'x
2.5     80 / 0.599 ± 0.158      50 / 0.018 ± 0.0       0 / —                u't = 0.265u'x − 0.229uu'x
5       100 / 0.674 ± 0.139     20 / 0.012 ± 0.0       0 / —                u't = 0.001uu'''xxx − 0.825uu'x
10      100 / 0.674 ± 0.103     40 / 0.004 ± 0.0       0 / —                u't = 0.133uu''xx

Table 18 — Statistics of including the correct terms of the Korteweg-de Vries equation and the corresponding coefficients, together with the SINDy output for the specified noise levels. The abbreviation g.t. denotes the true structure of the sought equation, and $N[u] = 0.515u'_x + 3.813u^2 u'_x - 0.013uu'''_{xxx} + 0.025u^2 u'''_{xxx}$.

EPDE (g.t.: u't + u'''xxx + 6uu'x = 0) and SINDy:
NL, %   u't: P / b               uu'x: P / b           u'''xxx: P / b       SINDy
0       100 / 1.001 ± 0.0        100 / 6.002 ± 0.0     100 / 1.06 ± 0.0     u't + 0.992u'''xxx + 5.967uu'x = 0
0.5     80 / 0.913 ± 0.032       60 / 5.914 ± 2.59     70 / 1.31 ± 0.57     u't − 0.906u'x = 0
1       40 / 0.437 ± 0.156       0 / —                 0 / —                u't − 0.816u'x = 0
2.5     100 / 0.36 ± 0.0         20 / 1.0 ± 0.0        20 / 0.01 ± 0.0      u't − 0.004u'''xxx − 0.844u'x = 0
5       60 / 0.01 ± 2.13·10⁻⁵    80 / 1.0 ± 0.0        0 / —                u't − 0.003u'''xxx − 1.859uu'x + N[u] = 0

The execution time for the EPDE framework is 91 seconds on average, while a SINDy search takes 0.032 seconds. This time discrepancy can be explained by the algorithmic simplicity of execution and the smaller search space of the sparse-regression-based approach. The verification results are summarized in Table 17. As noise appears, both approaches quickly lose the ability to derive equations with the correct structure. The algorithm can reliably converge to the correct equation with correct coefficients only at noise levels of 1% or below.

Table 19 — Statistics of including the correct terms of the equation describing the prey dynamics, with the corresponding coefficients and the equation obtained by SINDy. The ground-truth equation is denoted by the abbreviation g.t., and $N_{NL}[u, v]$ denotes the additional, less significant terms found by SINDy in the equation structure.

EPDE (g.t.: u' = 20u − 20uv) and SINDy:
NL, %   u: P / b                u': P / b            uv: P / b             SINDy
0       100 / 19.83 ± 0.24      100 / 1.0 ± 0.0      90 / 20.06 ± 0.008    u' = 20.096u − 19.842uv + N_0[u, v]
0.5     100 / 19.969 ± 0.0      100 / 1.0 ± 0.0      100 / 20.214 ± 0.0    u' = 20.194u − 19.87uv + N_0.5[u, v]
1       90 / 19.070 ± 0.263     100 / 1.0 ± 0.0      40 / 19.361 ± 0.0     u' = 20.726u − 19.904uv + N_1.0[u, v]
2.5     50 / 6.964 ± 175.4      60 / 0.38 ± 0.368    10 / 1.4 ± 0.0        u' = 19.311u − 19.67uv + N_2.5[u, v]
5       30 / 2.77 ± 39.3        50 / 0.1 ± 0.011     10 / 1.4 ± 0.0        Convergence failure

Table 20 — Statistics of including the correct terms of the equation describing the predator dynamics, with the corresponding coefficients and the equation obtained by SINDy. The ground-truth equation is denoted by the abbreviation g.t., and $N_{NL}[u, v]$ denotes the additional, less significant terms found by SINDy in the equation structure.

EPDE (g.t.: v' = −20v + 20uv) and SINDy:
NL, %   v: P / b                    v': P / b             uv: P / b               SINDy
0       90 / 20.018 ± 0.0           90 / 1.0 ± 0.0        90 / 24.741 ± 384.9     v' = −19.97v + 19.85uv + N_0[u, v]
0.5     100 / 19.822 ± 0.0          100 / 1.0 ± 0.0       100 / 20.098 ± 0.0      v' = −20.99v + 19.86uv + N_0.5[u, v]
1       100 / 19.922 ± 33.6·10⁻⁴    100 / 1.0 ± 0.0       100 / 20.011 ± 0.021    v' = −19.73v + 19.63uv − N_1.0[u, v]
2.5     90 / 18.987 ± 1.09          90 / 1.0 ± 0.0        40 / 31.26 ± 816.2      v' = −20.65v − 20.12uv + N_2.5[u, v]
5       40 / 8.97 ± 65.0            50 / 0.525 ± 0.28     70 / 72.86 ± 25.7       Convergence failure

4.3 Conclusions to Chapter 4

This chapter presented the results of the experimental study and validation of the proposed methods, which involved learning models in the form of differential equations describing synthetic and real datasets. The study covered the main classes of differential equations and systems of differential equations that can be obtained by learning with the developed method. It was shown that the method can learn the correct differential equation from input data with noise levels up to 5-10%, while the competing methods lose the ability to obtain the equation on data with noise levels of about 1-2%, as their structure recovery accuracy falls below 50%.

Conclusion

In the course of the dissertation research, a solution was proposed to the existing problems and contradictions in the field of learning models in the form of differential equations. The method based on evolutionary optimization does not impose rigid restrictions on the structures of the identified equations and, accordingly, can be applied to a wider class of problems.
As a result of the dissertation research:
1. The current state of the field of methods for obtaining the structure and coefficients of models in the form of differential equations was surveyed, and the hypothesis was put forward that the problem of symbolic regression over an extended library of terms can be replaced by a more flexible evolutionary algorithm;
2. A method, and an algorithm implementing it, was developed for learning a model in the form of differential equations with unknown structure and coefficients, based on evolutionary optimization algorithms and on the numerical solution of initial-boundary value problems with physics-informed neural networks (PINN) for computing the fitness function.
3. A method, and an algorithm implementing it, was developed for learning models in the form of systems of ordinary differential equations and partial differential equations, based on a multi-objective evolutionary optimization algorithm with independent learning of the structure and coefficients of the model for each equation of the system, allowing accuracy criteria to be specified with respect to the observed parameters of the dynamical systems and the structural complexity of the model; it does not restrict systems to the form of vector equations and can be extended to problems of learning a model in the form of a single differential equation, unifying the method and improving the convergence of the evolutionary algorithm.
4. The developed methods were validated on synthetic and real data reflecting a wide class of differential equations, in particular on benchmarks accepted in the community: ordinary differential equations (nonlinear, inhomogeneous), second-order partial differential equations (hyperbolic, parabolic, elliptic) and third-order ones (the soliton and inhomogeneous cases of the Korteweg-de Vries equation). Systems of ODEs (the Lotka-Volterra and Lorenz systems) and a system of partial differential equations, the Navier-Stokes equations, were also considered. In addition, the components of the differential equation model learning method were studied: the fitness function of candidate differential equations used in the evolutionary optimization, and the methods of robust differentiation.
The method for learning a model in the form of differential equations (both ordinary and partial) improves the accuracy of structure identification (SHD) on benchmark equations from 20% (the Burgers equation) up to 5 times (400%) (the Korteweg-de Vries equation), with an average accuracy gain over all benchmarks of 2 times (100%), and increases the robustness of learning, measured as the maximum noise variance at which the equation structure can be recovered with at least 50% accuracy, from 0.5% (for the closest competitor) to 10% (for the developed method).
The method for learning a model in the form of a system of differential equations improves the accuracy of structure identification (SHD) on the considered benchmark systems by up to 70%, depending on the noise level, and increases the robustness of learning, as the maximum noise variance at which the equation structure can be recovered with at least 50% accuracy, from 2.5% (for the closest competitor) to 8% (for the developed method) for first-order systems, and from 0% (for the closest competitor) to 5% (for the developed method) for systems of second and higher orders.
Further development of the area of this dissertation research may involve learning models in the form of stochastic differential equations, as well as adding an integration operator that would make it possible to identify integro-differential equations from data. A separate question is further improvement of the noise robustness of the algorithm, both through differentiation tools and through adapted operations of equation structure sparsification, coefficient computation, and fitness evaluation.

List of abbreviations and symbols

DE (ДУ) — Differential equation
PDE (УрЧП) — Partial differential equation
ODE (ОДУ) — Ordinary differential equation
ANN (ИНС) — Artificial neural network
LASSO — Least absolute shrinkage and selection operator
MAPE — Mean Absolute Percentage Error
MOEA/DD — Evolutionary Many-Objective Optimization Algorithm Based on Dominance and Decomposition
KdV equation — Korteweg-de Vries equation

Список литературы

1. Hvatov A., Maslyaev M. The data-driven physical-based equations discovery


using evolutionary approach // GECCO 2020 - Proceedings of the Genetic
and Evolutionary Computation Conference Companion. — 2020. — P. 129–
130.
2. Grigoriev V., Maslyaev M., Hvatov A. String-based and graph-based geno-
type representations for evolutionary di↵erential equations discovery on an
example of the heat equation // Proceedings of the 13th Majorov Interna-
tional Conference on Software Engineering and Computer Systems, — 2021.
3. Maslyaev M., Hvatov A., Kalyuzhnaya A. Data-Driven Partial Di↵erential
Equations Discovery Approach for the Noised Multi-dimensional Data //
Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics). 12138 LNCS. —
2020. — P. 86–100.
4. Maslyaev M., Hvatov A., Kalyuzhnaya A. Discovery of the data-driven mod-
els of continuous metocean process in form of nonlinear ordinary di↵erential
equations // Procedia Computer Science. Vol. 178. — 2020. — P. 18–26.
5. Model-Agnostic Multi-objective Approach for the Evolutionary Discovery of
Mathematical Models / A. Hvatov [et al.] // Communications in Computer
and Information Science. Vol. 1488. — 2021. — P. 72–85.
6. Maslyaev M., Hvatov A. Solver-Based Fitness Function for the Data-Driven
Evolutionary Discovery of Partial Di↵erential Equations // IEEE Congress
on Evolutionary Computation CEC. — 2022. — P. 1–8.
7. Maslyaev M., Hvatov A. Comparison of Single- and Multi- Objective
Optimization Quality for Evolutionary Equation Discovery // Genetic and
Evolutionary Computation Conference Companion (GECCO). � 2023.
8. Maslyaev M., Hvatov A. Partial di↵erential equations discovery with EPDE
framework: application for real and synthetic data // Journal of Computa-
tional Science. — 2021. — P. 101345.
148

9. Maslyaev M., Hvatov A. Multiobjective evolutionary discovery of equation-


based analytical models for dynamical systems // Scientific and Technical
Journal of Information Technologies, Mechanics and Optics. — 2023. —
Vol. 23, no. 1. — P. 97–104.
10. Towards generative design of computationally efficient mathematical models
with evolutionary learning / A. Kalyuzhnaya [et al.] // Entropy. — 2021. —
Vol. 23, no. 1. — P. 28.
11. Hybrid modeling of gas-dynamic processes in AC plasma torches. / N. Bykov
[и др.] // Materials Physics & Mechanics. � 2022. � Т. 50, № 2.
12. Hochreiter S., Schmidhuber J. Long Short-Term Memory // Neural
Computation. � 1997. � Нояб. � Т. 9, № 8. � С. 1735�1780. � DOI:
10.1162/neco.1997.9.8.1735.
13. Elsworth S., Güttel S. Time Series Forecasting Using LSTM Networks: A
Symbolic Approach. � 2020. � arXiv: 2003.05672 [cs.LG].
14. Machine learning assisted prediction of exhaust gas temperature of a heavy�
duty natural gas spark ignition engine / J. Liu [и др.] // Applied Energy. �
2021. � Т. 300. � С. 117413.
15. PySINDy: A comprehensive Python package for robust sparse system
identification / A. A. Kaptanoglu [и др.] // Journal of Open Source Software. �
2022. � Т. 7, № 69. � С. 3994. � DOI: 10.21105/joss.03994. � URL: https:
//doi.org/10.21105/joss.03994.
16. Data-driven discovery of partial differential equations / S. H. Rudy [и др.] //
Science Advances. � 2017. � Т. 3, № 4. � e1602614.
17. Data-Driven Identification of Parametric Partial Differential Equations / S. H.
Rudy [и др.] // SIAM Journal on Applied Dynamical Systems. � 2019. �
Т. 18, № 2. � С. 643�660.
18. Learning partial differential equations via data discovery and sparse
optimization / H. Schaeffer [и др.] // Proceedings of the Royal Society A:
Mathematical, Physical and Engineering Science, publisher: Royal Society. �
2017. � DOI: 473(2197):20160446.
19. Schaeffer H., McCalla S. G. Sparse model selection via integral terms //
Physical Review E. � 2017. � Т. 96, № 2.
149

20. Brunton S. L., Proctor J. L., Kutz J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems // Proceedings of the National Academy of Sciences. — 2016.
21. Loiseau J. C., Brunton S. L. Constrained sparse Galerkin regression // Journal of Fluid Mechanics. — 2018. — Vol. 838. — P. 42–67.
22. Kaiser E., Kutz J. N., Brunton S. L. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. — 2017. — URL: https://arxiv.org/abs/1711.05501.
23. Sparse Identification of Nonlinear Dynamics for Rapid Model Recovery / M. Quade [et al.]. — 2018. — URL: https://arxiv.org/abs/1803.00894v2.
24. Hirsh S. M., Barajas-Solano D. A., Kutz J. N. Sparsifying priors for Bayesian uncertainty quantification in model discovery // Royal Society Open Science. — 2022. — Vol. 9, no. 2. — P. 211823.
25. Park J.-H., Dunson D. B. Bayesian generalized product partition model // Statistica Sinica. — 2010. — P. 1203–1226.
26. Tran G., Ward R. Exact recovery of chaotic systems from highly corrupted data // Multiscale Modeling and Simulation. — 2017. — Vol. 15. — P. 1108–1129.
27. Raissi M. Deep hidden physics models: Deep learning of nonlinear partial differential equations. — 2018. — URL: https://arxiv.org/abs/1801.06637.
28. Berg J., Nyström K. Data-driven discovery of PDEs in complex datasets. — 2018. — URL: https://arxiv.org/abs/1808.10788.
29. Berg J., Nyström K. Neural network augmented inverse problems for PDEs. — 2017. — URL: https://arxiv.org/abs/1712.09685.
30. PDE-Net: Learning PDEs from data / Z. Long [et al.] // International Conference on Machine Learning. — PMLR, 2018. — P. 3208–3216.
31. Long Z., Lu Y., Dong B. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network // Journal of Computational Physics. — 2019. — Vol. 399. — P. 108925.
32. Stephany R., Earls C. PDE-READ: Human-readable partial differential equation discovery using deep learning // Neural Networks. — 2022. — Vol. 154. — P. 360–382.
33. Chen J., Wu K. Deep-OSG: Deep Learning of Operators in Semigroup // Journal of Computational Physics. — 2023. — P. 112498.
34. Qin T., Wu K., Xiu D. Data driven governing equations approximation using deep neural networks // Journal of Computational Physics. — 2019. — Vol. 395. — P. 620–635.
35. Wu K., Xiu D. Data-driven deep learning of partial differential equations in modal space // Journal of Computational Physics. — 2020. — Vol. 408. — P. 109307.
36. Raissi M., Perdikaris P., Karniadakis G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations // Journal of Computational Physics. — 2019. — Vol. 378. — P. 686–707. — DOI: 10.1016/j.jcp.2018.10.045.
37. Physics-informed neural networks for solving forward and inverse problems in complex beam systems / T. Kapoor [et al.]. — 2023. — arXiv: 2303.01055 [cs.LG].
38. Physics-informed neural networks for inverse problems in supersonic flows / A. D. Jagtap [et al.] // Journal of Computational Physics. — 2022. — Vol. 466. — P. 111402.
39. Data-Driven Discovery of Fokker-Planck Equation for the Earth's Radiation Belts Electrons Using Physics-Informed Neural Networks / E. Camporeale [et al.] // Journal of Geophysical Research: Space Physics. — 2022. — Vol. 127, no. 7. — e2022JA030377. — DOI: 10.1029/2022JA030377.
40. Discovering a universal variable-order fractional model for turbulent Couette flow using a physics-informed neural network / P. P. Mehta [et al.] // Fractional Calculus and Applied Analysis. — 2019. — Vol. 22, no. 6. — P. 1675–1688.
41. Abdellaoui I. A., Mehrkanoon S. Symbolic regression for scientific discovery: an application to wind speed forecasting // 2021 IEEE Symposium Series on Computational Intelligence (SSCI). — 2021. — P. 01–08. — DOI: 10.1109/SSCI50451.2021.9659860.
42. Data-driven discovery of free-form governing differential equations / S. Atkinson [et al.]. — 2019. — URL: https://arxiv.org/abs/1910.05117.
43. Vaddireddy H., San O. Equation Discovery Using Fast Function Extraction: a Deterministic Symbolic Regression Approach // Fluids. — 2019. — Vol. 4, no. 2. — DOI: 10.3390/fluids4020111.
44. Hoffman M. D., Johnson M. J. Elbo surgery: yet another way to carve up the variational evidence lower bound // Workshop in Advances in Approximate Bayesian Inference, NIPS. — 2016. — Vol. 1, no. 2.
45. Data-based Discovery of Governing Equations / W. Subber [et al.]. — 2020. — arXiv: 2012.06036 [cs.LG].
46. Xu H., Chang H., Zhang D. DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm // Journal of Computational Physics. — 2020. — Vol. 418. — P. 109584.
47. Symbolic genetic algorithm for discovering open-form partial differential equations (SGA-PDE) / Y. Chen [et al.] // Physical Review Research. — 2022. — Vol. 4, no. 2. — P. 023174.
48. Deep learning and symbolic regression for discovering parametric equations / M. Zhang [et al.] // IEEE Transactions on Neural Networks and Learning Systems. — 2023.
49. Kondrashov D., Chekroun M. D., Ghil M. Data-driven non-Markovian closure models // Physica D: Nonlinear Phenomena. — 2015. — Vol. 297. — P. 33–55.
50. Kondrashov D. [et al.] Data-adaptive harmonic decomposition and stochastic modeling of Arctic sea ice // Advances in Nonlinear Geosciences. — 2018. — P. 179–205.
51. Chekroun M. D., Kondrashov D. Data-adaptive harmonic spectra and multilayer Stuart-Landau models. — 2017. — hal-01537797v2.
52. Schmid P. J. Dynamic mode decomposition of numerical and experimental data // Journal of Fluid Mechanics. — 2010. — Vol. 656. — P. 5–28.
53. Alla A., Kutz J. N. Nonlinear model order reduction via dynamic mode decomposition. — 2016. — URL: https://arxiv.org/abs/1602.05080.
54. Schmid P. J. Dynamic mode decomposition and its variants // Annual Review of Fluid Mechanics. — 2022. — Vol. 54. — P. 225–254.
55. Zhang Z. J., Duraisamy K. Machine learning methods for data-driven turbulence modeling // 22nd AIAA Computational Fluid Dynamics Conference. — 2015. — P. 2460.
56. Zhang Z., Singh A. New Approaches in Turbulence and Transition Modeling Using Data-driven Techniques // AIAA Modeling and Simulation Technologies Conference. — 2015.
57. Tracey B., Duraisamy K., Alonso J. Machine Learning Strategy to Assist Turbulence Model Development // Proc. AIAA Scitech Conference. — 2015.
58. Parish E., Duraisamy K. Quantification of Turbulence Modeling Uncertainties Using Full Field Inversion // 15th AIAA Aviation Technology, Integration, and Operations Conference. — 2015.
59. Hvatov A. Automated differential equation solver based on the parametric approximation optimization // Mathematics. — 2023. — Vol. 11, no. 8. — P. 1787.
60. Ramm A., Smirnova A. On stable numerical differentiation // Mathematics of Computation. — 2001. — Vol. 70, no. 235. — P. 1131–1153.
61. Savitzky A., Golay M. J. Smoothing and differentiation of data by simplified least squares procedures // Analytical Chemistry. — 1964. — Vol. 36, no. 8. — P. 1627–1639.
62. Schmid M., Rath D., Diebold U. Why and how Savitzky–Golay filters should be replaced // ACS Measurement Science Au. — 2022. — Vol. 2, no. 2. — P. 185–196.
63. Johnson S. G. Notes on FFT-based differentiation // MIT Applied Mathematics, Tech. Rep. — 2011.
64. Nix A. E., Vose M. D. Modeling genetic algorithms with Markov chains // Annals of Mathematics and Artificial Intelligence. — 1992. — Vol. 5, no. 1. — P. 79–88.
65. He J., Yu X. Conditions for the convergence of evolutionary algorithms // Journal of Systems Architecture. — 2001. — Vol. 47, no. 7. — P. 601–612.
66. An evolutionary many-objective optimization algorithm based on dominance and decomposition / K. Li [et al.] // IEEE Transactions on Evolutionary Computation. — 2014. — Vol. 19, no. 5. — P. 694–716.
67. Das I., Dennis J. E. Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems // SIAM Journal on Optimization. — 1998. — Vol. 8, no. 3. — P. 631–657.
68. Van der Pol B. A theory of the amplitude of free and forced triode vibrations, Radio Rev. 1 (1920) 701–710, 754–762 // Selected Scientific Papers. — 1960. — Vol. 1.
Publications of the author on the topic of the dissertation

Author's certificates of state registration of computer programs:

1. Certificate of registration No. 2020660871 of 15.09.2020, "Software package for data-driven derivation of differential equations EPDE" // Maslyaev M.A., Kalyuzhnaya A.V., Hvatov A.A.
2. Certificate of registration No. 2021666447 of 21.02.2022, "Software package for multi-objective identification of systems of differential equations EPDE.Sys" // Maslyaev M.A., Kalyuzhnaya A.V., Hvatov A.A.

Publications in journals indexed in Scopus and Web of Science, as well as included in the lists recommended by the Higher Attestation Commission (VAK):
1. Hvatov A., Maslyaev M. The data-driven physical-based equations discovery using evolutionary approach // GECCO 2020 - Proceedings of the Genetic and Evolutionary Computation Conference Companion. — 2020. — P. 129–130.
2. Grigoriev V., Maslyaev M., Hvatov A. String-based and graph-based genotype representations for evolutionary differential equations discovery on an example of the heat equation // Proceedings of the 13th Majorov International Conference on Software Engineering and Computer Systems. — 2021.
3. Maslyaev M., Hvatov A., Kalyuzhnaya A. Data-Driven Partial Differential Equations Discovery Approach for the Noised Multi-dimensional Data // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 12138 LNCS. — 2020. — P. 86–100.
4. Maslyaev M., Hvatov A., Kalyuzhnaya A. Discovery of the data-driven models of continuous metocean process in form of nonlinear ordinary differential equations // Procedia Computer Science. Vol. 178. — 2020. — P. 18–26.
5. Maslyaev M. [et al.] Model-Agnostic Multi-objective Approach for the Evolutionary Discovery of Mathematical Models // Communications in Computer and Information Science. Vol. 1488. — 2021. — P. 72–85.
6. Maslyaev M., Hvatov A. Solver-Based Fitness Function for the Data-Driven Evolutionary Discovery of Partial Differential Equations // IEEE Congress on Evolutionary Computation CEC. — 2022. — P. 1–8.
7. Maslyaev M., Hvatov A. Comparison of Single- and Multi-Objective Optimization Quality for Evolutionary Equation Discovery // Genetic and Evolutionary Computation Conference Companion (GECCO). — 2023.
8. Maslyaev M., Hvatov A. Partial differential equations discovery with EPDE framework: application for real and synthetic data // Journal of Computational Science. — 2021. — P. 101345.
9. Maslyaev M., Hvatov A. Multiobjective evolutionary discovery of equation-based analytical models for dynamical systems // Scientific and Technical Journal of Information Technologies, Mechanics and Optics. — 2023. — Vol. 23, no. 1. — P. 97–104.
10. Maslyaev M. [et al.] Towards generative design of computationally efficient mathematical models with evolutionary learning // Entropy. — 2021. — Vol. 23, no. 1. — P. 28.
11. Maslyaev M. [et al.] Hybrid modeling of gas-dynamic processes in AC plasma torches // Materials Physics & Mechanics. — 2022. — Vol. 50, no. 2.
A. APPENDIX A. CERTIFICATES OF STATE REGISTRATION OF COMPUTER PROGRAMS
B. APPENDIX B. TEXTS OF PUBLICATIONS
Partial differential equations discovery with EPDE framework: application for real and synthetic data

Mikhail Maslyaev, Alexander Hvatov*, Anna V. Kalyuzhnaya

ITMO University, 49 Kronverksky Pr. St. Petersburg, 197101, Russian Federation

Abstract

Data-driven methods provide model creation tools for systems where the application of conventional analytical methods is restrained. The proposed method involves the data-driven derivation of a partial differential equation (PDE) for process dynamics, helping process simulation and study. The paper describes the methods that are used within the EPDE (Evolutionary Partial Differential Equations) partial differential equation discovery framework [1]. The framework involves a combination of evolutionary algorithms and sparse regression. Such an approach is versatile compared to other commonly used data-driven partial differential derivation methods by making fewer assumptions about the resulting equation. This paper highlights the algorithm features that allow data processing with noise, which is similar to the algorithm's real-world applications. This paper is an extended version of the ICCS-2020 conference paper [2].

Keywords: data-driven modelling, PDE discovery, evolutionary algorithms, sparse regression, spatial fields, physical measurement data

1. Introduction

The ability to simulate complex processes, neglecting a lack of knowledge about the system's underlying structure, can be vital for developing models in such spheres of science as biology, medicine, materials technology, and metocean studies. In contrast to the deterministic physics-based models, developed by application of conservation laws to the studied process, data-driven modeling (DDM) involves developing complete models from various fields of measurements describing the process, using means of statistics and machine learning algorithms. Moreover, on some occasions, DDM can enhance the existing physics-based models with supplementary expressions or refined weight values [3]. In fluid dynamics science and hydrometeorology, surrogate model development is the most common application of data-driven algorithms.

* Corresponding author: alex_hvatov@itmo.ru

Preprint submitted to Journal of Computational Science, October 15, 2023

In the current paper's scope are the methods of data-driven differential equation discovery. Differential equations, in some cases, are interpretable by an expert either in the application field or in the theory of differential equations. Moreover, the well-developed methods of mathematical physics for differential equation analysis may be used to interpret the obtained equations. In most cases, existing algorithms utilize sparse regression over a prescribed library of differential terms [4, 5]. The second popular direction of study is neural-network algorithms for differential equation discovery [6, 7, 8].
We consider discovered models as surrogate models that could be applied to hydrometeorological examples. Various approaches to surrogate modeling are described below, including differential equation discovery.

The modern surrogate models tend to belong to one of three major groups [9]:

• Data-driven empirical approximations of the deterministic model outputs. These models use conclusions obtained with statistical or machine learning tools (response surfaces, kriging) applied to the data.

• Reduced-order models, based on the projection of the model's main equations to a subspace with reduced dimensionality, using various orthogonal decompositions.

• Multifidelity models: simplifications representing the complex physics of the modeled process by omitting the less significant subprocesses or increasing the model's scale. In some cases, the experimental setup requires applying models with different fidelity levels to evaluate multiple scales of processes or a modeling ensemble [10, 11].
In this research, we are interested in developing a new approach that belongs to the first class of models. However, natural science applications require robustness of the model, and the method should work in high-dimensional spaces to handle spatio-temporal and other types of variability. Transferring from the one spatial dimension usually considered in the references to higher spatial dimensions requires the algorithm to handle exponentially growing noise levels.

In the previous works [12] we have described the EPDE (Evolutionary Partial Differential Equations)¹ approach, which can provide a flexible, yet efficient tool for data-driven equation derivation. This work increases the problem's difficulty by introducing higher-dimensional cases and high-magnitude noise in the data.

This version extends the conference paper [2] and introduces a series of experiments that allow comparing the EPDE framework with its analogs in a better way. The module system of the PDE algorithm, briefly described in Sec. 6, allows one, as an example, to use a differentiation scheme different from finite differences. We show this using neural networks and automatic derivatives in Sec. 7.

This paper is organized as follows: Sec. 2 briefly introduces the existing surrogate modeling approaches. Sec. 3 describes the problem of data-driven PDE discovery, and Sec. 4 describes the practical realization. In Sec. 5, numerical examples on synthetic and real data are shown. Sec. 6 presents the additions to the method described in the previous article [12], which allow dealing with higher-dimensional data-driven PDE discovery. Sec. 7 is dedicated to illustrating the module structure and experiments with replacement of the differentiation module with a neural network approximation. Sec. 8 concludes the paper.

2. Related work

The first examples of data-driven surrogate modeling in hydrometeorology appeared in its earliest stages with the understanding that the contemporary full-scale models required computational power inaccessible to many research teams. The original approaches were based on pattern scaling - the extension of the present trend, obtained from an ensemble of full-scale models [13, 14]. Statistical emulation on the base of an ensemble of pre-computed deterministic models has been developed in [15]. The recent advancements have been achieved in the area of deep learning methods [16]. While being relatively successful in their forecasting abilities, the models above do not consider any knowledge about the processes' physics, and due to a large number of assumptions, this may lead to substantial errors.

¹ The approach described in the article is available as a stand-alone EPDE framework on GitHub [1].
Furthermore, the proposed method could be applied to unstudied systems as a way to model them. Many systems across all spheres of science lack the study needed to be adequately described by analytical models. The proposed equation-based method may provide a surrogate model to simulate the system and an insight into its dynamics.

This article describes the first step of the creation of the differential-equation-based surrogate modeling method. Here we propose only the element of equation derivation, avoiding the problem of forecasting.

The problem of data-driven discovery of partial differential equations, which plays a significant role in our modeling scheme, has seen increasing relevance and research interest in recent years. Sparse regression presents the first class of the developed algorithms of data-driven partial differential equation derivation. It is applied to libraries of possible equation terms to approximate the time derivative with the selected terms required to describe the examined process, and to calculate real-valued coefficients for them. Notable examples of this approach are presented in [17, 18]. In [19], the same idea was extended to the discovery of an equation with non-constant (time-dependent) coefficients.

The concept of numerical Gaussian processes, developed in [20], views the discretized equation as a Gaussian process and obtains the equation's unknown coefficients with maximum likelihood estimation. However, the class of the equations explored in the research is limited to linear partial differential equations.

Artificial neural networks provide a more versatile tool. This method is based on the approximation of the time derivative with combinations of spatial derivatives and other functions. Examples of ANN applications to the problem of partial differential equation discovery were presented in [8, 21, 22, 7, 6]. While artificial neural networks can discover non-linear equations, they still rely on approximating a predetermined term (the first-order time derivative), limiting their flexibility.

3. Problem statement

The class of problems, which the described EPDE algorithm can solve, can be summarized as follows: the process, which involves a scalar field u, is occurring in the area Ω and is governed by the partial differential equation Eq. 1. However, there is no a priori information about the dynamics of the process except that some form of PDE can describe it (for simplicity, we consider the temporally varying 2D field case, even though the problem could be formulated for an arbitrary field). In recent developments, we have abandoned the assumption of constant weights in the partial differential equations, allowing them to be an arbitrary function (logarithmic, trigonometric) and thus expanding the class of possible systems to study.

\begin{cases}
F\left(u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \dots, \frac{\partial u}{\partial t}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_2^2}, \dots, \frac{\partial^2 u}{\partial t^2}, \dots, x\right) = 0; \\
G(x) = 0, \; x \in \partial\Omega \times [0, T];
\end{cases} \quad (1)

From the area Ω × [0, T] a set of samples U = {u_1, u_2, ..., u_n}, where u_i = u(x_1^{(i)}, x_2^{(i)}, t_i) is the function value at the arbitrary point (x_1^{(i)}, x_2^{(i)}, t_i) ∈ Ω × [0, T], is collected. There are no strict limitations on distributing the sample collection points in the area, but the further requirements of the derivative calculations make the case of stationary points located on a grid the most preferable. The main task of the algorithm is the derivation of Eq. 1, using measurements from the set of discrete measurements U with some externally defined limitations, including the range of derivative orders, the number of terms in the equation, and the number of factors in a term.
The resulting model Eq. 2 takes the form of a linear combination of terms, where each term $t(x) = \prod_{j=1}^{N_{tokens}} f_j$ (with $N_{tokens}$ being a pre-defined algorithm hyperparameter) is constructed as the product of pre-computed elementary operators $f_j$, selected from different groups of elementary operators of the same nature. In more detail, an elementary operator $f_i \in F = \bigcup_j F_j$, where $F_j$ is a group of elementary operators of the same nature (for example, the trigonometric group $F_j = \{\sin(x_1), \cos(x_1), \sin(x_2), \dots\}$, or the differential operators group $F_j = \{u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \dots\}$).

F(x) = \sum_i c_i t_i(x) = 0 \quad (2)

The addition of different groups $F_j$ of terms allows switching from differential equations with constant coefficients to differential equations with variable coefficients.
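The construction of candidate terms can be illustrated with a short sketch. The snippet below is a minimal illustration in numpy, not the EPDE implementation: the token names and the toy data are assumptions made for the example, and each token is simply an array of its values on the grid.

```python
import numpy as np

# Toy 1D grid and a stand-in measured field.
t = np.linspace(0, 10, 101)
u = np.sin(t)

# Token pool: every elementary operator is pre-evaluated on the grid.
tokens = {
    "u": u,
    "du/dt": np.gradient(u, t),   # derivative token, computed numerically
    "cos(t)": np.cos(t),          # elementary function token
}

def build_term(factor_names, tokens):
    """A term t(x) is the elementwise product of its factor tokens."""
    term = np.ones_like(u)
    for name in factor_names:
        term = term * tokens[name]
    return term

# Candidate equation c1 * u * du/dt + c2 * cos(t) = 0 is encoded by its terms.
terms = [build_term(["u", "du/dt"], tokens), build_term(["cos(t)"], tokens)]
```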
The noise in this paper is assumed to be directional. The noise $E_{x_j}$ in the direction $x_j$ can be described as Eq. 3.

E_{x_j}(u(x_1, \dots, x_n; t)) = u(\bar{x}_1, \dots, x_j, \dots, \bar{x}_n; t) + \epsilon(x_j) \quad (3)

With $\bar{x}_i$ the "fixed" variables are denoted, and $\epsilon(x_j)$ is the noise, which in the paper is assumed to be distributed normally, $N(0; \sigma)$, with the expected value of 0 and the variance of $\sigma$. It should be emphasized that poly-directional noise forms as the superposition of the unidirectional noise operators, i.e. $E_{x_j, x_k} = E_{x_j} E_{x_k}$. In what follows, $\bar{\sigma} = \sigma \cdot \max(u)$ is chosen, i.e. the maximal magnitude of the measured value in this direction is used as the multiplier. The noise level is defined in the same way. In the text below, the bar over the variance is omitted.
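A sketch of the directional noise operator of Eq. 3 may clarify the convention; the helper below is a hypothetical illustration (the function name and arguments are not part of the framework), assuming the variance is scaled by the maximal magnitude of the field.

```python
import numpy as np

rng = np.random.default_rng(0)

def directional_noise(u, axis, sigma):
    """E_{x_j}: add a 1D normal noise profile along the chosen axis,
    with the standard deviation scaled by max |u| (the sigma-bar convention)."""
    profile = rng.normal(0.0, sigma * max(np.abs(u).max(), 1.0), size=u.shape[axis])
    shape = [1] * u.ndim
    shape[axis] = -1
    return u + profile.reshape(shape)

# Poly-directional noise as a superposition: E_{x_0, x_1} = E_{x_0} E_{x_1}.
u = np.zeros((50, 50))
u_noised = directional_noise(directional_noise(u, axis=0, sigma=0.1),
                             axis=1, sigma=0.1)
```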

4. Method description

In this section, the details of the evolutionary method of partial differential equation derivation are described. The proposed method involves a combination of evolutionary algorithms and sparse regression to detect the equation structure. The sparse regression aims to construct the equation terms set, while the evolutionary algorithm is focused on selecting significant terms from the created set and calculating the weights that will be present in the resulting equation. At first, we introduce the preprocessing pipeline, and later we describe the algorithm workflow.

4.1. Data preprocessing

To initialize the algorithm, the time and spatial derivatives, which will later form the desired equation, must be calculated. In specific situations, the derivatives themselves can be measured, and, therefore, this step can be skipped, but often only the raw value of the studied function is available in the research. It can be assumed, without loss of generality, that the measurements are held on a rectangular (but not necessarily uniform) grid for more straightforward further computations. The multi-dimensional case requires more nuanced methods of obtaining derivatives, unlike the instances of a single dimension. In most of the one-dimensional experiments, even at moderate noise levels, which can be measured as in Eq. 4, the finite-difference method of derivative calculation can lead to satisfactory results. It is important to note that the quality of the taken derivatives is crucial for acquiring the equation's correct structure.

Q_{noise} = \frac{\|u_0 - \tilde{u}\|_2}{\|u_0\|_2} \cdot 100\% \quad (4)

In general, the calculation of derivatives is an operation that is vulnerable to noise in the data. Also, the convergence of the algorithm and the resulting equation depend on the quality of the input derivatives. If they are computed with high errors, the alterations of the resulting equation can vary from incorrect coefficients to an entirely wrong structure. For these reasons, several noise-resistant methods of partial derivative calculation have been introduced. Notably, they include such commonly used methods as kernel smoothing [23], derivation of polynomials fitted over sets of points, and more uncommon ones like Kalman filtering [24].

Therefore, data clearance and noise-resistant derivative calculation have been combined to achieve decent smoothness in the framework. First of all, Gaussian smoothing kernels are applied to the data field on each time frame. This approach can reduce the significant outliers in the data and corresponds to the nature of the studied metocean processes, where the fields tend to be smooth. In the time-dependent multi-dimensional field, the smoothing is applied to each of the time frames. Two-dimensional Gaussian smoothing with a selected bandwidth $\sigma$ has the structure of Eq. 5 with the kernel of Eq. 6, where $s$ is the point for which the smoothing is done, and $s'$ is the point whose value is utilized in the smoothing.

\tilde{u}(s, t) = \int K_\sigma(s - s') u(s') ds' \quad (5)

K_\sigma(s - s') = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{2} (s - s')_i^2\right) \quad (6)

In addition to smoothing, a noise-stable numerical differentiation scheme is applied. The derivative is taken by differentiation of polynomials constructed over the set of points in a selected window. The coefficients of the polynomials utilized in this step are obtained by linear regression. Despite all these measures, as presented in Tab. 1, derivatives of higher orders tend to have significant errors even after smoothing and polynomial derivation.
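A minimal sketch of this preprocessing stage is given below; it relies on scipy and numpy rather than on the framework's internal routines, and the window and polynomial degree are assumptions chosen for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_level(u_clean, u_est):
    """Q_noise from Eq. 4: relative L2 deviation, in percent."""
    return np.linalg.norm(u_clean - u_est) / np.linalg.norm(u_clean) * 100

def smooth_frames(u, sigma=2.0):
    """Gaussian kernel smoothing (Eqs. 5-6) of every time frame of a
    (time, x, y) array; only the spatial axes are filtered."""
    return np.stack([gaussian_filter(frame, sigma) for frame in u])

def poly_derivative(u_1d, grid, order=1, window=9, degree=4):
    """Differentiate by fitting a polynomial over a sliding window of
    points and evaluating its analytical derivative at the centre."""
    half = window // 2
    du = np.zeros_like(u_1d)
    for i in range(len(u_1d)):
        lo, hi = max(0, i - half), min(len(u_1d), i + half + 1)
        coeffs = np.polyfit(grid[lo:hi], u_1d[lo:hi], degree)
        du[i] = np.polyval(np.polyder(coeffs, order), grid[i])
    return du
```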

Table 1: Noise levels (%) for the raw noised data and for the smoothed data

                      $u$      $\partial u/\partial t$   $\partial^2 u/\partial t^2$
Noised function       15.5     260.1                     12973.8
Smoothed function     12.3     10.78                     458.2

A particular example of a noised function field is shown in Fig. 1. For clarity purposes, only the center slice of the spatial domain is provided. However, it should be emphasized that the entire spatial field is smoothed out to obtain the spatial derivative field, with the other spatial dimension processed by the kernel.

Figure 1: Graph of a section over one spatial dimension for the synthetic input function (solution of the wave equation with 2 spatial dimensions) in the original state, with Gaussian noise added to a fraction (40%) of points of the domain, and after the noise was smoothed by a Gaussian kernel

Differentiation of the three different fields shown in Fig. 1 gives derivative fields whose values differ by orders of magnitude. Thus, they are shown in separate graphs in Fig. 2.

In most equations governing real-world processes, the derivative orders are limited to the first or second order. Derivatives of the slices shown in Fig. 1 are represented in Fig. 2 and indicate that the proposed algorithm of noise reduction in derivatives not only achieves values close to the values of the derivatives on noiseless data but also preserves the structure of the fields, which is vital for the main evolutionary algorithm due to the normalization of values on each time frame.

Figure 2: Graph of the first (left column) and second (right column) order time derivative, calculated on the input function (a), the noisy function (b), and the function with noise smoothed by a Gaussian kernel (c)

4.2. Evolutionary algorithm

After the preprocessing, which involves differentiation of the initial field, the evolutionary algorithm is initiated. Here, we split the task of equation discovery into two subtasks, performed in turns: the detection of the structure (terms) of the equation, and the calculation of the real-valued coefficients that correspond to these terms, with the detection of the valuable ones. The search for the optimal set of terms is performed with the evolutionary algorithm, while the calculation of intermediate coefficients is done with regularized regression. During the search process, we use the values of factors belonging to the task-specific types, evaluated on the nodes of the studied domain; they form vectors as in Eq. 7, combinations of which form the terms of the searched equation.

a) Chromosome form and the fitness function. An example of genes in the chromosome is presented in Eq. 8. Here, the vector composed as the elementwise product (denoted with the $\odot$ symbol) of the vectors containing the original function and its derivative along the x-axis is used as the regression feature set.

f_1 = \begin{bmatrix} 1 \\ \vdots \\ 1 \\ \vdots \\ 1 \end{bmatrix}; \quad
f_2 = \begin{bmatrix} u(t_0, x_0) \\ \vdots \\ u(t_i, x_j) \\ \vdots \\ u(t_m, x_n) \end{bmatrix}; \quad
f_3 = \begin{bmatrix} u_x(t_0, x_0) \\ \vdots \\ u_x(t_i, x_j) \\ \vdots \\ u_x(t_m, x_n) \end{bmatrix}; \; \dots \quad (7)

F'_k = \begin{bmatrix} u(t_0, x_0) \cdot u_x(t_0, x_0) \\ \vdots \\ u(t_i, x_j) \cdot u_x(t_i, x_j) \\ \vdots \\ u(t_m, x_n) \cdot u_x(t_m, x_n) \end{bmatrix} = f_2 \odot f_3 \quad (8)
The normalization of term values is held for each time frame passed into the algorithm for the correct operation of regularized regression during the further weight calculation phases. In this step, we can use an arbitrary norm, but most commonly the $L_2$ or $L_1$ norm is applied, as represented in Eq. 9.

F_k = \begin{bmatrix} \frac{u(t_0, x_0) \cdot u_x(t_0, x_0)}{\|f_2(t_0) \odot f_3(t_0)\|} \\ \vdots \\ \frac{u(t_i, x_j) \cdot u_x(t_i, x_j)}{\|f_2(t_i) \odot f_3(t_i)\|} \\ \vdots \\ \frac{u(t_m, x_n) \cdot u_x(t_m, x_n)}{\|f_2(t_m) \odot f_3(t_m)\|} \end{bmatrix} \quad (9)

The evolutionary part of the algorithm aims to select a set of terms to form the equation. In the set, one of the terms is randomly selected as the target. The target term is approximated with the weighted combination of the other terms in the list. In the beginning, a randomized collection of possible equations, which is called the population, is declared. Every individual contains a set of terms with the selected target, which can be interpreted as the right part of the equation, and features, a linear combination of which composes the left part. In order to perform selection, the fitness function is introduced in Eq. 10 as the inverse value of the $L_2$-norm of the residual. The target term in the "right side" of the equation contains only one randomly selected term. The norm is taken of the difference between the target $F_{target}$ and the selected combination of features $F$ with weights $\alpha$, obtained by the sparse regression (left side of the equation). Therefore, the evolutionary algorithm's task can be reduced to obtaining the equation structure with the highest fitness function value.

f_{fitness} = \frac{1}{\|F \cdot \alpha - F_{target}\|_2} \to \max \quad (10)

The composition of the encoded terms represents the genotype of the individual. These encodings contain the parameters of each token in the term. The evolution of individuals is performed both by mutation and by crossover at every iteration step. The mutation of an individual is introduced as a random change (addition, deletion, or alteration of factors) in its terms. For example, this can result in a shift of the equation term $u_t$ to $u_{xx} \cdot u_t$, or of $u_x \cdot u_{tt}$ to $u_x \cdot u_t$. Elitism, introduced as the exclusion of the individual with the highest fitness value from mutation, helps preserve the best-discovered candidate (the one with the highest fitness function value) during the mutation step.

b) Evolutionary operators. Crossover is the part of the evolutionary mechanism which manifests as the gene exchange between two individuals to produce offspring with higher fitness values. In the task of data-driven equation derivation, it can be introduced as the exchange of terms between equations. In order to produce units with higher fitness values, the crossover should be held between selected individuals. Several tests have proved that the fastest convergence to the desired solution can be achieved with tournament selection. In this policy, several tournaments, where the unit with the highest fitness value is selected for a further crossover, are held between individuals of the population. After that, parents for the offspring are randomly chosen among the tournament winners. In contrast to the simple selection of several individuals with the highest fitness function values for reproduction, this approach lets the offspring take good qualities from the less-valuable individuals of the population.
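Tournament selection itself is compact; the sketch below is an illustrative implementation under the stated policy, with the tournament size chosen arbitrarily for the example.

```python
import random

def tournament_selection(population, fitness, n_parents, tour_size=4):
    """Each tournament returns its fittest contender; parents for the
    offspring are then drawn from the pool of tournament winners."""
    winners = []
    for _ in range(n_parents):
        contenders = random.sample(population, tour_size)
        winners.append(max(contenders, key=fitness))
    return winners
```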

The next essential element of the proposed data-driven algorithm is sparse regression. Its main application is the detection of the equation structure among the set of possible terms. With no original information about the equation structure and the correct number of terms, it is better to introduce the equation with a higher number of possible term candidates; therefore, some form of filtration has to take place. The main instrument in this phase is the Least Absolute Shrinkage and Selection Operator (LASSO). In contrast to other types of regression, LASSO can reduce the number of non-zero elements of the weights vector, giving zero coefficients to the features that are not significant to the target.

The minimized functional of the LASSO regression, Eq. 11, takes the form of a sum of two terms. The first is the squared error between the vector of the target, denoted as $F_{target}$, and the vector of predictions, obtained as the inner product of the matrix of features $F$ and the vector of weights $\alpha$, while the second is the $L_1$-norm of the weights vector, taken with the sparsity constant $\lambda$:

\|F\alpha - F_{target}\|_2^2 + \lambda \|\alpha\|_1 \to \min \quad (11)

The main drawback of the LASSO regression is its inability to acquire the correct values of the coefficients. A final linear regression over the discovered effective terms is performed to obtain the actual coefficients of the resulting PDE. In this final step, the non-zero weights from the LASSO are rescaled with the original unnormalized data as features and the target.
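The two-stage weight computation can be sketched with scikit-learn; this is an illustration of Eqs. 10-11 under the assumption of pre-normalized feature columns, not the framework's internal code.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def evaluate_candidate(F, F_target, sparsity=1e-3):
    """LASSO (Eq. 11) filters the terms; ordinary least squares on the
    surviving terms gives the final weights; the fitness (Eq. 10) is the
    inverse norm of the residual."""
    lasso = Lasso(alpha=sparsity, fit_intercept=False).fit(F, F_target)
    active = np.flatnonzero(lasso.coef_)
    if active.size == 0:
        return None, 1.0 / np.linalg.norm(F_target)
    ols = LinearRegression(fit_intercept=False).fit(F[:, active], F_target)
    residual = np.linalg.norm(F[:, active] @ ols.coef_ - F_target)
    return (active, ols.coef_), 1.0 / residual
```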
The pseudo-code for the resulting algorithm is provided in Appendix A.

5. Numerical experiments

5.1. Synthetic data

a) Wave equation. The analysis of the algorithm performance is held on synthetic data. This simplification can show the result's response to various types and magnitudes of noise, which are generally unknown for measurement data. As in the previous studies, the solution of the wave equation with two spatial variables, Eq. 14, where $t$ is time, $x, y$ are spatial coordinates, $u$ is the studied function (for example, a small out-of-plane membrane displacement), and $\alpha_1 = \alpha_2 = 1$, was taken as the synthetic data. The equation was solved using the finite-difference technique for the domain comprised of $201 \times 201 \times 201$ points in 2 spatial dimensions and time, and the proposed method was applied to the solution dataset. The grid, which covered the domain, had uniformly distributed nodes with coordinates between 0 and 10. The initial conditions for the equation were Eq. 12 and Eq. 13, and $u = 0$ was the boundary condition for the problem.

u = 10000 \sin^2\left(\frac{1}{100} xy \left(1 - \frac{1}{10}x\right)\left(1 - \frac{1}{10}y\right)\right) \quad (12)

\frac{\partial u}{\partial t} = 1000 \sin^2\left(\frac{1}{100} xy \left(1 - \frac{1}{10}x\right)\left(1 - \frac{1}{10}y\right)\right) \quad (13)

The algorithm has proved to detect the correct structure of the equation with the clean data, while on the noisy data, additional terms or completely wrong structures have been detected.

\frac{\partial^2 u}{\partial t^2} = \alpha_1 \frac{\partial^2 u}{\partial x^2} + \alpha_2 \frac{\partial^2 u}{\partial y^2} \quad (14)
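For reproducibility, a compact explicit finite-difference stepper for Eq. 14 with the initial conditions of Eqs. 12-13 is sketched below; the grid is reduced from the paper's 201 × 201 × 201 points to keep the example fast, and the time step is an assumption chosen to satisfy the CFL stability condition.

```python
import numpy as np

n, steps = 101, 200
h = 10.0 / (n - 1)
dt = 0.4 * h                      # stable for the 2D wave equation (CFL)
x, y = np.meshgrid(np.linspace(0, 10, n), np.linspace(0, 10, n))

shape_fn = np.sin(0.01 * x * y * (1 - 0.1 * x) * (1 - 0.1 * y)) ** 2
u_prev = 10000 * shape_fn                  # Eq. 12
u_curr = u_prev + dt * 1000 * shape_fn     # first step from Eq. 13

def laplacian(u, h):
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:]
                       + u[1:-1, :-2] - 4 * u[1:-1, 1:-1]) / h ** 2
    return lap

history = [u_prev.copy(), u_curr.copy()]
for _ in range(steps):
    u_next = 2 * u_curr - u_prev + dt ** 2 * laplacian(u_curr, h)
    # u = 0 on the boundary of the domain
    u_next[0, :] = u_next[-1, :] = u_next[:, 0] = u_next[:, -1] = 0.0
    u_prev, u_curr = u_curr, u_next
    history.append(u_curr.copy())
```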
Several noise addition experiments were held on the synthetic data. First of all, in a fraction of points (40% of the total number), noise of various magnitudes has been added: $\mu = 0$; $\sigma = n \cdot \|u(t)\|$, $n = 0.1, 0.2, \dots, 0.8$. After that, the algorithm has been applied to this data. The results of the experiment are as follows: the method is successfully able to detect the structure of the equation for the interval of noise levels up to 14.9%, which corresponds to the standard deviation of Gaussian noise in the interval $[0, 0.35]$, multiplied by the norm of the field in the time frame. The weight errors in this interval are minor, as shown in Tab. 2. With higher noise levels (in the interval between 14.9% and 15.67%), the algorithm detects additional terms that are not present in the original equation, resulting both in distortion of the equation structure and in incorrect weight calculation. Finally, at high noise levels, the proposed algorithm can lose grasp of the equation's correct structure.

Table 2: Discovered structures of the equations for the specific noise levels

Noise level of input data (%)   Equation
0       $\frac{\partial^2 u}{\partial t^2} = 1.00 \frac{\partial^2 u}{\partial x^2} + 1.00 \frac{\partial^2 u}{\partial y^2}$
8.3     $\frac{\partial^2 u}{\partial t^2} = 1.02 \frac{\partial^2 u}{\partial x^2} + 1.01 \frac{\partial^2 u}{\partial y^2}$
10.9    $\frac{\partial^2 u}{\partial t^2} = 1.04 \frac{\partial^2 u}{\partial x^2} + 0.99 \frac{\partial^2 u}{\partial y^2}$
13.1    $\frac{\partial^2 u}{\partial t^2} = 0.96 \frac{\partial^2 u}{\partial x^2} + 0.99 \frac{\partial^2 u}{\partial y^2}$
14.9    $\frac{\partial^2 u}{\partial t^2} = 0.95 \frac{\partial^2 u}{\partial x^2} + 1.2 \frac{\partial^2 u}{\partial y^2}$
15.67   $\frac{\partial^2 u}{\partial t^2} = 0.84 \frac{\partial^2 u}{\partial x^2} + 0.63 \frac{\partial^2 u}{\partial y^2} + 0.12 \frac{\partial u}{\partial y}$
16.45   $\frac{\partial^2 u}{\partial x^2} \frac{\partial^2 u}{\partial y^2} = 0$
17.88   $\frac{\partial^2 u}{\partial x^2} \frac{\partial^2 u}{\partial y^2} = 0$

In Fig. 3, the influence of the noise level added to the measured field on the derivative fields is shown.

Figure 3: Noise levels of the calculated first and second (dashed line) time derivatives, related to the noise levels of the input data

In the other experiment with the same data set, noise of relatively high magnitude was added to a minor fraction of points (5% of the total number). In this case, the framework has shown results similar to the previous experiment: up to a noise level of approximately 15%, the discovered structure was correct. On data with higher noise magnitudes, errors in the structure of the equation occurred. This experiment has shown that, in the studied cases, the main limiting factor for the algorithm's performance with the implemented noise-reduction preprocessing is the noise level and not the distribution of the noise across the studied field.

b) Korteweg-de Vries equation's solitary solution. To further analyze the algorithm's performance on synthetic data, we have conducted additional experiments on the Korteweg-de Vries equation and the heat transfer equation.

To create a more specific situation for the Korteweg-de Vries equation, Eq. 15, we have studied the solitary wave solution of the equation, Eq. 16. This solution represents the transfer of a single wave, propagating with speed $c$ from the initial position, specified by the wave crest's location at $x_0$. The data for the test is obtained from the solution function Eq. 16. The solution is evaluated on the uniform grid of 101 spatial points in the interval $x \in [0, 10]$ and 151 time points in the interval $t \in [0, 15]$.

\frac{\partial u}{\partial t} + 6u \frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0 \quad (15)

u = \frac{c}{2} \,\mathrm{sech}^2\left[\frac{\sqrt{c}}{2}(x - ct - x_0)\right] \quad (16)

The application of the framework to the solution Eq. 16, evaluated on a regular grid, failed to rediscover the initial equation. An improperly discovered model results from the simpler incidental forms in the data, such as $u_t = -c u_x$. This equation's simplicity results in a higher probability of its discovery than for the full KdV equation. Additionally, the absence of high-order derivatives in the structure, which are inevitably calculated with numerical error, may lead to higher fitness function values than in the correct equation. This experiment illustrates that the algorithm is susceptible to discovering "shortcut equations", which commonly represent the equality between functions (usually, different derivatives) present in the input function pool. Similar cases have been studied to analyze the discovery process for ordinary differential equations in previous works.
340 ied to analyze the discovery process for ordinary di↵erential equations in the
previous works.

c) Heat equation with convection. To provide a more sophisticated test case for the framework, we have utilized the convection-diffusion equation. The equation belongs to the class of parabolic equations and has the structure of Eq. 17, where $\nabla$ represents the gradient operator, and $v$ is the velocity vector (field). We have studied an example of the equation with transfer in only one direction (meaning $v = [v_1, 0]$; $v_1 = 1$, representing a constant velocity field along the x-axis), and $\alpha = 1$ - the thermal diffusivity of the medium; $\nabla = \sum_i e_i \frac{\partial}{\partial x_i}$ - the gradient operator; $e_i$ - the basis vector of the i-th axis. The equation was solved on a grid with $100 \times 100 \times 100$ nodes in a domain between 0 and 10 along each axis. The initial and boundary conditions are presented in Eq. 18 and Eq. 19, correspondingly.

\frac{\partial u}{\partial t} = \nabla \cdot (\alpha \nabla u) - v \cdot (\nabla u) \quad (17)

u = 10 \sin\left(\frac{\pi}{100} x(10 - x)\right) \sin\left(\frac{\pi}{100} x(10 - x)\right) \quad (18)

u = 10 \sin\left(\frac{2\pi}{3} t\right) \sin\left(\frac{\pi}{100} y(10 - y)\right) + 0.05t \quad (19)

The evolutionary algorithm detects the correct structure of the equation in the majority of independent runs (out of 15 runs), as shown in Tab. 3. In Tab. 3, $c_i \approx 1$ for $i = 1, \dots, 3$; $c_4 \approx 1.59$; and $c_5 \approx 2.98$. This experiment indicates that the algorithm is able to detect complex structures of the equation if there are no "shortcut" solutions of the problem.


Table 3: Equation structures detected in the experiment

Equation                                                                                                     Number of experiments yielding the structure
$\frac{\partial u}{\partial t} = c_1 \frac{\partial^2 u}{\partial x^2} + c_2 \frac{\partial^2 u}{\partial y^2} + c_3 \frac{\partial u}{\partial x}$    9
$11.06 \frac{d^3 u}{dx^3} = \frac{\partial u}{\partial x}$                                                   1
$\frac{\partial u}{\partial t} = c_1 \frac{\partial^2 u}{\partial x^2} + c_2 \frac{\partial^2 u}{\partial y^2}$                                        3
$\frac{\partial^2 u}{\partial x^2} = c_4 \frac{\partial u}{\partial y} + c_5 \frac{\partial u}{\partial x}$                                            2

5.2. Real data example

For the validation of the model, the dynamics of the two-dimensional field of sea surface height (SSH) from the NEMO ocean model for the Arctic region (center of the Barents Sea), for a modeling month at a resolution of one hour, has been used. The area is known to have strong tides, leading to the discovery of a time-dependent equation. It is necessary to emphasize that, despite the existing tidal equations, there is no single analytical equation for the specific case of the SSH dynamics in this region due to the overlapping of processes of different natures. The studied domain was divided into daily intervals to reduce the risk of deriving an equation that describes multiple processes following each other in the domain. The spatial properties of the data are as follows: the intervals between nodes are approximately 5 km, while the domain contained 50 × 50 nodes.

After the application of the framework to the data, we obtain the equation in the form of Eq. 20.

\frac{\partial u}{\partial x} = 0.0506 \frac{\partial u}{\partial t} - 0.0053 \frac{\partial^2 u}{\partial t^2} \quad (20)
Eq. 20 was solved, and the calculated field was compared with the initial one to validate the result of the algorithm. Since there is a second-order time derivative and a first spatial derivative, the initial conditions (two initial time steps to represent the field and its first time derivative at the beginning of the studied period) and the boundary condition on one edge of the studied area are set. The graphs of daily sea surface height dynamics from the reanalysis and from the equation solution are presented in Fig. 4 and Fig. 5. The quality metrics show that the discovered equation can describe the data well: $RMSE = 0.0434$, $MAE = 0.0446$ for a field with values in the interval between approximately 0.5 and 0.9.

Figure 4: Example of the SSH field obtained from the reanalysis (upper row) and the same field from Eq. 20 (lower row) for 3 time frames: t = 15.0, t = 16.0, t = 17.0

5.3. Comparison with other methods

Experiments similar to the ones in [7] have been performed to compare the proposed algorithm with existing state-of-the-art methods. Due to the framework's limitations, a single equation, Eq. 22, is used instead of the system Eq. 21 that is utilized in [7]. Additional difficulties for the comparison were contributed by the unknown initial and boundary conditions in the referenced experiment.
\begin{cases}
\frac{\partial U}{\partial t} = -U \cdot \nabla U + \nu \Delta U, \quad U = (u, v)^T \\
U|_{t=0} = U_0(x, y)
\end{cases} \quad (21)

\frac{\partial u}{\partial t} = \nu \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) + u \left(\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y}\right) \quad (22)

Figure 5: Dynamics of sea surface height for September 18, 2013: reanalysis (denoted as data) and the solution of the equation obtained from the framework (denoted as model), for the center of the studied area

Eq. 22 was solved using finite differences, and noise from a normal distribution was added to simulate the previous experiment. The preprocessing phase, described in the previous sections and involving smoothing and derivative calculation, was performed to reduce the noise's influence on the resulting equation.

The added noise was created as $k \times \max_{x,y,t} u(x, y, t) \times N(0, 1)$. The experiments resulting in the correct structure have been conducted with the value $k = 0.001$, as in the compared study [7]. As the framework output, we consider the structure closest to the correct structure of the equation, obtained on a grid of sparsity constant values.

These tests show that the noise resistance corresponds with other framework applications and is somewhat better than in the compared experiment. Despite the insignificant difference in the coefficient $k$ (0.001 versus 0.00015), the noise level difference is significant (1% versus 6.3%). For noise levels approximately below 5%, the correct equations were detected. At higher noise levels, the equation structure deteriorates, which manifests in wrong weights and additional terms or lack of mandatory ones.

Table 4: Discovered structures of the equations for the specific noise levels

k         Noise level of input data (%)   Equation
0.0005    0.25    $\frac{\partial u}{\partial t} = (0.999 \frac{\partial^2 u}{\partial x^2} + 1.000 \frac{\partial^2 u}{\partial y^2}) + u(1.001 \frac{\partial u}{\partial x} + 1.000 \frac{\partial u}{\partial y})$
0.00075   0.49    $\frac{\partial u}{\partial t} = (1.001 \frac{\partial^2 u}{\partial x^2} + 1.001 \frac{\partial^2 u}{\partial y^2}) + u(0.999 \frac{\partial u}{\partial x} + 0.999 \frac{\partial u}{\partial y})$
0.001     0.97    $\frac{\partial u}{\partial t} = (1.000 \frac{\partial^2 u}{\partial x^2} + 0.999 \frac{\partial^2 u}{\partial y^2}) + u(1.000 \frac{\partial u}{\partial x} + 1.000 \frac{\partial u}{\partial y})$
0.00125   4.2     $\frac{\partial u}{\partial t} = (1.001 \frac{\partial^2 u}{\partial x^2} + 1.002 \frac{\partial^2 u}{\partial y^2}) + u(0.998 \frac{\partial u}{\partial x} + 0.999 \frac{\partial u}{\partial y})$
0.0015    6.3     $\frac{\partial u}{\partial t} = (0.996 \frac{\partial^2 u}{\partial x^2} + 0.998 \frac{\partial^2 u}{\partial y^2}) + u(1.000 \frac{\partial u}{\partial x} + 1.003 \frac{\partial u}{\partial y}) - 0.0034 \frac{\partial^2 u}{\partial y^2} \frac{\partial u}{\partial x}$

6. EPDE framework description

The framework, encompassing the described method, is designed to allow the user to customize the algorithm's significant elements while providing the default pipeline and the necessary tools for differential equation discovery. The setup of an equation discovery experiment requires the selection of functions (tokens) that form the pool from which the algorithm creates the candidate equations. The main element that has to be defined is the means of obtaining the function values on the set of processed points, needed to evaluate the fitness function of Eq. 10. For example, the derivatives in the framework's current development are stored as pre-computed matrices of their values on the grid. In contrast, trigonometric function values are calculated during the fitness function evaluation due to a frequency parameter. The selection of the correct token evaluation method can be viewed as a trade-off between memory storage utilization and the computational power involved in calculating functions during the algorithm run.

The token families, sets of elementary functions, are created with the definition of the tool mentioned above. For every token, the range of function parameters (such as power or frequency) and markers are specified, setting the behavior of the functions in the equation structure (i.e., whether a function can be present in the terms of the left side of the equation, whether it can be in the right part, or whether multiple tokens from a token family can appear in one term).

The workflow of the main evolutionary algorithm, which forms the algorithm's cornerstone, is mutable via modifications of the evolutionary operator. To guide the operator's construction, we have introduced a builder class, representing the eponymous pattern; a schematic sketch is given below. The default form of the operators, presented in Sec. 4, is provided by the director class. However, the user can modify all of the significant elements if the specific equation discovery task requires it. The selection of the parameters for the evolutionary operators is made in their definition in the builder class.
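The following schematic shows the builder/director arrangement in miniature; the class and function names are hypothetical illustrations and do not reproduce the actual EPDE API.

```python
def term_exchange_crossover(parent_a, parent_b, rate):
    """Stub: exchange equation terms between two parent individuals."""
    return parent_a, parent_b

def token_mutation(individual, rate):
    """Stub: random addition, deletion, or alteration of term factors."""
    return individual

class EvolutionaryOperatorBuilder:
    def __init__(self):
        self._parts = {}

    def set_crossover(self, fn, **params):
        self._parts["crossover"] = (fn, params)
        return self

    def set_mutation(self, fn, **params):
        self._parts["mutation"] = (fn, params)
        return self

    def build(self):
        return dict(self._parts)

class Director:
    """Supplies the default operator set; any part can be overridden."""
    @staticmethod
    def default_operator():
        return (EvolutionaryOperatorBuilder()
                .set_crossover(term_exchange_crossover, rate=0.5)
                .set_mutation(token_mutation, rate=0.3)
                .build())
```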

7. Neural networks approximation with automatic differentiation

This section is dedicated to changing the differentiation method. The proposed algorithm has a modular structure; thus, we may replace the differentiation algorithm from finite differences or analytical differentiation of polynomials to a neural network approximation with further automatic differentiation.

7.1. Application of artificial neural networks to the data preprocessing

Automatic differentiation is a standard tool in deep learning frameworks. The process of neural network training utilizes the backpropagation technique, based on calculating the gradient of the loss function with respect to the weight values, which is often done via automatic differentiation. With this approach to the derivative calculation problem, the preprocessing involves two stages: fitting the artificial neural network to the input data, and the automatic differentiation with respect to the spatial coordinates and time.
The training process of a multi-layered feed-forward artificial neural network with sigmoid activation functions was implemented using the tensorflow framework. In the experiments, we use a network that contains three fully-connected layers (generally, with 256, 512, and 64 neurons). The selection of the architecture was driven by the propositions in [25, 26]. We utilize the mean squared error as the loss function during the artificial neural network training process, performed with the Adam stochastic optimization algorithm. A random sample of the studied data points (the function we want to describe with the differential equation, which we will obtain later) is used for each epoch's training process.

After the ANN training process with the studied data, automatic differentiation is used to obtain the derivatives. In contrast to the previously mentioned method of calculating derivatives of polynomials fitted only along an axis, the automatic differentiation technique can provide mixed partial derivatives. The gradients, Hessian, and further derivatives are collected from the built-in methods of automatic differentiation.
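A minimal sketch of the two stages with tensorflow is given below, assuming the layer sizes mentioned above and a synthetic stand-in field; nested tapes would give second or mixed derivatives in the same manner.

```python
import numpy as np
import tensorflow as tf

# Stage 1: fit a small fully-connected network (sigmoid activations).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="sigmoid"),
    tf.keras.layers.Dense(512, activation="sigmoid"),
    tf.keras.layers.Dense(64, activation="sigmoid"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

coords = np.random.rand(4096, 2).astype("float32")   # (t, x) samples
values = np.sin(coords[:, :1] + coords[:, 1:])       # stand-in field u(t, x)
model.fit(coords, values, epochs=50, batch_size=256, verbose=0)

# Stage 2: automatic differentiation with respect to the inputs.
pts = tf.convert_to_tensor(coords[:16])
with tf.GradientTape() as tape:
    tape.watch(pts)
    u = model(pts)
grads = tape.gradient(u, pts)   # columns hold du/dt and du/dx at the points
```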

7.2. The analysis of ANN preprocessing properties

The wave equation's test case with one spatial dimension was examined to compare the noise-reducing properties of the proposed derivative evaluation methods: kernel smoothing with analytical differentiation of fitted polynomials against automatic differentiation of artificial neural networks. The noise is added in the way shown previously; however, the data is not separated into subdomains. Here, we add Gaussian noise with $\mu = 0$ and $\sigma = 0.03 \cdot \max(u(t))$ to each of the time frames.

The lower quality of the artificial neural network reconstruction of the noisy field, presented in Fig. 6, cannot be attributed to overfitting or underfitting of the network: the same structures are obtained in the initial stages of loss function stabilization and in the later epochs. Therefore, we can attribute the method's downside to the particular architecture: 3 fully connected layers of the ANN with the particular numbers of neurons. Nevertheless, the comparison of the fields obtained after the differentiation (see Fig. 7) shows that the default kernel smoothing and the polynomial fitting algorithm can calculate the fields better.

Figure 6: The comparison between the solution of the wave equation (x and y axes are coordinates, color denotes the value of the function u(x, y)) (a, original noised field) and the ANN fit to the solution (b, the noised field reconstruction)

In the discussed experiments, the noise level, introduced in Eq. 4, in the initial fields is approximately 2.49%. The noise levels in the first time derivatives (Fig. 8) are 31.6% and 1116.89% for the default method and the ANN accordingly, and in the first spatial derivatives (Fig. 9) they are 29.7% and 927.14% correspondingly.

8. Conclusion

The proposed method has proven to be suitable for the data-driven derivation of equations that can model various physical processes. The robustness of the algorithm to noise in the input data, provided by improved data preprocessing, makes the framework applicable to real-world problems. Even in the cases of substantial noise in the input data, the resulting equations had the correct structures and, therefore, can correctly describe the studied system. Other notable points about the algorithm operation can be stated:

• To achieve a good quality of the resulting models, the areas localizing different processes should be separated and studied independently. This is presented in the case of real-world data processing, when the area already known to have strong time dependencies of sea surface height was separated, and the equation for it was derived;

Figure 7: The fields of the time and spatial derivatives, calculated from the noiseless data: (a) field of the correct first time derivative; (b) field of the correct first spatial derivative

Figure 8: The comparison between the differentiation methods for the first time derivative of the wave equation solution: (a) kernel filtering & polynomial differentiation; (b) automatic differentiation result

Figure 9: The comparison between the differentiation methods for the first spatial derivative of the wave equation solution: (a) kernel filtering & polynomial differentiation; (b) automatic differentiation result

• The meta-parameters of the algorithm have a strong influence on the final result. For example, low values of the sparsity constant can lead to additional terms in the equation, while values higher than optimal can completely distort the equation structure. Therefore, mechanisms of meta-parameter selection should be implemented in the further development of the method;

• The proposed preprocessing technique, which combines kernel smoothing and fitting of Chebyshev polynomials to the data in a specified window, has proven to be an efficient derivative calculation tool. It was shown to perform better than the automatic differentiation of an artificial neural network.

Areas of further development of the framework can include deriving a more generalized class of equations using similar techniques, not limiting the results to the class of partial differential equations. Additionally, equations for vector variables or even systems of equations can be the next targets for the work.

Source code is publicly available on GitHub [1].

Acknowledgements

This research is financially supported by The Russian Scientific Foundation, Agreement #19-71-00150.

References

[1] NSS Team, FEDOT E* algorithms, https://github.com/ITMO-NSS-team/FEDOT.Algs (2020).
[2] M. Maslyaev, A. Hvatov, A. Kalyuzhnaya, Data-driven partial differential equations discovery approach for the noised multi-dimensional data, in: International Conference on Computational Science, Springer, 2020, pp. 86–100.
[3] J. Berg, K. Nyström, Neural network augmented inverse problems for PDEs, arXiv preprint arXiv:1712.09685. URL https://arxiv.org/abs/1712.09685
[4] H. Schaeffer, R. Caflisch, C. D. Hauck, S. Osher, Learning partial differential equations via data discovery and sparse optimization, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science 473 (2197) (2017) 20160446.
[5] S. H. Kang, W. Liao, Y. Liu, Ident: Identifying differential equations with numerical time evolution, arXiv preprint arXiv:1904.03538.
[6] T. Qin, K. Wu, D. Xiu, Data driven governing equations approximation using deep neural networks, Journal of Computational Physics 395 (2019) 620–635.
[7] Z. Long, Y. Lu, X. Ma, B. Dong, PDE-Net: Learning PDEs from data, in: International Conference on Machine Learning, 2018, pp. 3208–3216.
[8] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, The Journal of Machine Learning Research 19 (1) (2018) 932–955.
[9] M. J. Asher, B. F. W. Croke, A. J. Jakeman, L. J. M. Peeters, A review of surrogate models and their application to groundwater modeling, Water Resour. Res. 51 (2015) 5957–5973. doi:10.1002/2015WR016967.
[10] M. P. Rumpfkeil, P. Beran, Multi-fidelity surrogate models for flutter database generation, Computers & Fluids 197.
[11] N. O. Nikitin, P. Vychuzhanin, A. Hvatov, I. Deeva, A. V. Kalyuzhnaya, S. V. Kovalchuk, Deadline-driven approach for multi-fidelity surrogate-assisted environmental model calibration: SWAN wind wave model case study, in: Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2019, pp. 1583–1591.
[12] M. Maslyaev, A. Hvatov, A. Kalyuzhnaya, Data-driven partial derivative equations discovery with evolutionary approach, in: International Conference on Computational Science, Springer, 2019, pp. 635–641.
[13] B. D. Santer, T. M. Wigley, M. E. Schlesinger, J. F. Mitchell, Developing climate scenarios from equilibrium GCM results.
[14] M. Cabré, S. Solman, M. Nuñez, Creating regional climate change scenarios over southern South America for the 2020's and 2050's using the pattern scaling technique: validity and limitations, Climatic Change 98 (2010) 449–469. doi:10.1007/s10584-009-9737-5.
[15] S. Castruccio, D. J. McInerney, M. L. Stein, F. Liu Crouch, R. L. Jacob, E. J. Moyer, Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs, Journal of Climate 27 (5) (2014) 1829–1844. doi:10.1175/JCLI-D-13-00099.1.
[16] T. Weber, A. Corotan, B. Hutchinson, B. Kravitz, R. Link, Technical note: Deep learning for creating surrogate models of precipitation in earth system models, Atmospheric Chemistry and Physics 20 (2020) 2303–2317. doi:10.5194/acp-20-2303-2020.
[17] K. Kaheman, S. L. Brunton, J. N. Kutz, Automatic differentiation to simultaneously identify nonlinear dynamics and extract noise probability distributions from data, arXiv preprint arXiv:2009.08810. URL https://arxiv.org/abs/2009.08810
[18] L. Zhang, H. Schaeffer, On the convergence of the SINDy algorithm, Multiscale Model. Simul. 17 (3) (2019) 948–972. URL https://arxiv.org/abs/1805.06445
[19] S. H. Rudy, A. Alla, S. L. Brunton, J. N. Kutz, Data-driven identification of parametric partial differential equations, SIAM Journal on Applied Dynamical Systems 18 (2) (2019) 643–660.
[20] M. Raissi, P. Perdikaris, G. E. Karniadakis, Numerical Gaussian processes for time-dependent and nonlinear partial differential equations, SIAM Journal on Scientific Computing 40 (1) (2018) A172–A198.
[21] J. Berg, K. Nyström, Data-driven discovery of PDEs in complex datasets, Journal of Computational Physics 384 (2019) 239–252.
[22] M. Raissi, P. Perdikaris, G. Karniadakis, Physics informed deep learning (part ii): Data-driven discovery of nonlinear partial differential equations, arXiv preprint arXiv:1711.10566 (2017). URL https://arxiv.org/abs/1711.10566
[23] I. Knowles, T. Le, A. Yan, On the recovery of multiple flow parameters from transient head data, Journal of Computational and Applied Mathematics 169 (1) (2004) 1–15.
[24] R. Piche, Automatic numerical differentiation by maximum likelihood estimation of state-space model, arXiv preprint arXiv:1610.04397. URL https://arxiv.org/abs/1610.04397v1
[25] Z. Zainuddin, P. Ong, Function approximation using artificial neural networks, International Journal Of Systems Applications, Engineering & Development 1 (4) (2007) 173–178.
[26] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989) 359–366.

Appendix A. Pseudo-code of the algorithm


Input: set of elementary tokens T, symbolically representing a constant, the initial function, and its various derivatives; set of function measurements from the studied field
Parameters: M - number of token combinations in a single individual; k - number of elementary tokens in a combination; n_pop - number of candidate solutions in the population; evolutionary algorithm parameters: number of epochs n_epochs, mutation rate r_mutation & crossover rate r_crossover, part of the population allowed for procreation a_proc, number of individuals refrained from mutation (elitism) a_elite; sparse regression parameter - sparsity constant
Result: the structure of the partial differential equation with the corresponding weights, best fitting the input field

Smooth the measurements & calculate the derivatives;
Generate population P of individuals representing equations, of size n_pop, with M random permutations of k tokens to form sets C_j;
for epoch = 1 to n_epochs do
    for individual in population do
        Apply sparse regression to the individual to calculate weights;
        Calculate fitness function for the individual;
    end
    Hold tournament selection and crossover;
    for individual in population except n_pop × a_elite "elite" ones do
        Mutate individual;
    end
end
Select the individual with the highest fitness function value as the final structure of the solution to the problem;
Calculate the correct weights of the equation using linear regression.
Algorithm 1: The pseudo-code of the algorithm operation
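For readers who prefer code to pseudo-code, the following deliberately simplified Python toy mirrors the loop above. Individuals are random subsets of a precomputed term library, the fitness is the inverse LASSO/linear-regression residual, and crossover is omitted for brevity; none of the names below belong to the actual framework.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

def fitness(term_matrix, individual, target_idx, sparsity=1e-4):
    """Fitness of one candidate equation: LASSO selects active terms among
    `individual` (column indices of the term library), linear regression
    refines them, and the inverse residual norm is returned."""
    features = term_matrix[:, [i for i in individual if i != target_idx]]
    target = term_matrix[:, target_idx]
    beta = Lasso(alpha=sparsity, fit_intercept=False).fit(features, target).coef_
    active = np.nonzero(beta)[0]
    if active.size == 0:
        return 0.0
    lr = LinearRegression(fit_intercept=False).fit(features[:, active], target)
    residual = target - lr.predict(features[:, active])
    return 1.0 / (np.linalg.norm(residual) + 1e-12)

def evolve(term_matrix, n_terms=4, n_pop=16, n_epochs=30, a_elite=0.25):
    """Toy evolutionary search: mutation swaps one library term for another;
    the best n_pop * a_elite individuals are kept unchanged (elitism)."""
    n_lib = term_matrix.shape[1]
    pop = [rng.choice(n_lib, size=n_terms, replace=False) for _ in range(n_pop)]
    score = lambda ind: max(fitness(term_matrix, ind, t) for t in ind)
    for _ in range(n_epochs):
        pop.sort(key=score, reverse=True)
        for i in range(int(n_pop * a_elite), n_pop):  # mutate the non-elite part
            mutant = pop[i].copy()
            mutant[rng.integers(n_terms)] = rng.integers(n_lib)
            pop[i] = mutant
    return max(pop, key=score)
```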


Multiobjective Evolutionary Discovery of Equation-Based


Analytical Models for Dynamical Systems
Mikhail Maslyaev, Alexander Hvatov
October 15, 2023

Abstract
While there are multiple approaches to simulating a dynamical system that represents an unknown physical process, the majority of these methods cannot be connected to analytical equation-based models. In this article, we suggest an approach for describing a process with a system of data-driven ordinary differential equations that employs multiobjective optimization to obtain the set of candidate systems. The main objective of the research is the development of a robust tool that is able to construct an analytical model for an arbitrary dynamical system, using as few assumptions as possible. To operate, the algorithm demands data describing the state of the system, evolving in time. The optimization is held in the criteria space of complexities and qualities of the obtained differential equations, allowing the selection of the parsimonious model that fits the needs of the process description. To improve the flexibility of the algorithm, while sustaining the robustness to the noise in input data, linked to real-world applications, the algorithm performs a simultaneous search of the systems' equations with minimum a priori assumptions about their structure. Thus, the model search algorithm obtains a Pareto-optimal set of systems. To validate the approach, experiments with the simple "hunter-prey" model were held. In this case the equation search algorithm performed well, being able to correctly derive the system of equations with insignificant errors in coefficient estimations, linked to numerical errors.

1 Introduction
Systems of ordinary or partial differential equations (ODEs and PDEs) are powerful tools that are able to describe the complex dynamics of structures involving multiple variables. While there are many tools that can be used for creating mathematical models of processes, such as classical machine learning models, or unconventional ones, like Bayesian networks, they tend to have strict limitations on their applications. In cases of many real-world systems, in addition to the aforementioned issues, these models are often abstracted from the intrinsic physical principles guiding the system. The classical approach to deriving systems of differential equations necessitates the use of mathematical analysis in combination with an in-depth understanding of the process. The data-driven approach to system discovery involves the creation of an individual differential equation for each dependent variable that can be measured from the system.
The forms of the discovered models, which contain systems of differential equations, are selected due to the prevalence of differential equations in physical systems. For example, the flow of viscous fluid is governed by the Navier-Stokes equations, which are a system of partial differential equations. The dynamics and interactions between the electric and magnetic components of the electromagnetic field are described with Maxwell's equations, which are a system of PDEs as well. Many simpler systems, such as the rotation of a spherical pendulum, can be defined with a system of ordinary differential equations.
Apart from the descriptive possibilities provided by models in the form of systems of differential equations, the obtained systems can be solved to predict further states of the process. While the toolkit for the automatic solution of systems of ordinary/partial differential equations is out of the scope of this study, a number of studies have been conducted towards implementing an equation-solving module into the frameworks of differential equations discovery, as in work [1]. With this ability to solve model equations, the dynamics of the system can be propagated into the future.
The problem of creating models for dynamical systems governed by differential equations has seen surging interest in recent years. The first perspective on the task involves developing substitutes for the equations in the form of propagation operators that map the state of the system forward in time, as in [2] or [3]. Dynamic mode decomposition (DMD) involves the approximation of the system dynamics with a finite-dimensional linear operator. While that can be useful for multiple real-world applications, where the propagator is linear, many other cases involve non-linear dynamics that can not be fully explained with the DMD approach.
A number of data-driven solutions to the problem of explaining a dynamical system with explicitly derived governing equations have been developed. Here, we will inspect methods that are applicable not only to problems of discovering ordinary differential equations (ODE) and systems of ODEs, but also to tasks of partial differential equations discovery. The first problem has a sufficient solution in the form of multilayer stochastic models (MSMs), proposed by Kondrashov, Chekroun and Ghil in [4]. However, due to its non-Markovian nature, the approach is not extendable to the problems of partial differential equations.
The earliest advances were made with symbolic regression [5]. Governing equations are viewed as computational tree graphs, where the leaves are inputs, and various operators are located on the other levels. The search for the equation can be done with the typical graph-targeted evolutionary optimization algorithms. More contemporary approaches are represented by sparse regression-based models, developed in many works, including Kaheman et al. in [6] and Berg & Nyström in [7], and by artificial neural network (ANN) representations of the dynamical system. While there are multiple approaches to discovering differential equations with artificial neural networks, notable ones include PINN [8], PDE-Net, developed by Long et al. [9], and physics-informed neural networks by Raissi et al. [10].
Partial differential equation search with sparse regression uses the LASSO operator, which is applied to approximate the time derivative with a library of candidate terms. That library has to contain all of the possible equation terms, and the usage of the sparsity operator allows the selection of only a few active feature terms. The main issues of this approach can be linked with its rigidity: the term library has to be extensive enough to contain all possible terms, including all of the non-linear functions that can be present in the equations. While many of the presented approaches can be applied to systems of differential equations, their possibilities are limited to the description of the time dynamics of a vector variable, as in the paper [11].
The algorithm described in this article is based on the multiobjective evolutionary optimization approach, where the obtained model is evaluated by a number of metrics describing the quality and complexity of the equations of the system. Thus, the algorithm is able to provide the parsimonious model that is not overly complex, but can sufficiently simulate the dynamics of the process. However, the problem of selecting that parsimonious model from the discovered Pareto frontier is a problem for another study. This paper is dedicated to the problem of discovering the optimal set of candidate equations for further expert conclusions and applications.
The paper is divided into the following sections: in Sec. 2 we formulate the equation discovery problem; Sec. 3 provides a generalized description of the developed algorithm; in Sec. 4 we evaluate the performance of the algorithm on validation data. Sec. 5 concludes the paper and discusses further development of the approach. All data and code required to reproduce the results are available from the URL placed in Sec. 6.

2 Equation discovery problem

To describe some unstudied process, which involves multiple (n) dependent variables, we desire to derive a system of differential equations. Let us denote these variables in the general problem statement as $\mathbf{u} = (u_1(t, \mathbf{x}), u_2(t, \mathbf{x}), \dots, u_n(t, \mathbf{x}))$. They are defined in the spatial domain $\Omega$, represented by coordinates $\mathbf{x}$, and depend on time $t$. In the case of a system of ordinary differential equations, the variables can be assumed to be only time-dependent (i.e. $u_1(t), u_2(t), \dots, u_n(t)$).
For the equation search process, the algorithm requires sets of observations arranged on a rectangular grid, as well as arrays of calculated derivatives. While in some cases these derivatives can be obtained directly, using measurement techniques, in others they necessitate a preprocessing phase, where the derivatives are calculated numerically from the input data variables. While the numerical techniques of derivative estimation are numerous [12], the most efficient approaches are finite-difference differentiation and analytical differentiation of variable-approximating polynomials. In many cases, additional smoothing is required to reduce the magnitude of noise in the data. Here, the algorithm employs Gaussian smoothing in the spatial domain, or replacement of the initial data fields with their artificial neural network approximation.


$$S(\mathbf{u}) = \begin{cases} L_1(\mathbf{u}) = 0 \\ \dots \\ L_k(\mathbf{u}) = 0 \end{cases}, \qquad \begin{cases} \sum_i \alpha_{1i} \prod_j t_{1ij} = 0 \\ \dots \\ \sum_i \alpha_{ki} \prod_j t_{kij} = 0 \end{cases} \qquad (1)$$
The search for the optimal structures of equations in the system is done with multi-objective optimization, implemented with the Many-Objective Optimization Evolutionary Algorithm Based on Dominance and Decomposition (MOEA/DD), introduced in [13].
The search is performed in the criteria space of complexities $C(L'_j u)$ and modelling errors $Q(L'_j u)$ for each individual equation in the system. Therefore, the problem can be reformulated as Eq. 2. Here the constraints are introduced in the equation construction logic rather than explicitly specified during the optimization problem statement.

$$\text{minimize } F(S(\mathbf{u})) = (f_1(S(\mathbf{u})), \dots, f_m(S(\mathbf{u}))) = (C(L'_1 u), Q(L'_1 u), \dots, C(L'_n u), Q(L'_n u)) \qquad (2)$$

The complexity metric $C(L'_j u)$ is defined as the number of "active" tokens in the equation, i.e. the ones that are present in terms with non-zero coefficients.
The problem of selecting the most appropriate metric for evaluating the properties of process representation for the equation has been studied in work [1]. The best metrics for modelling quality are the $L_2$ norm of the matrices of differential operator residuals, as presented in Eq. 3, or the norm of the matrices of differences between the input variable fields and the solutions of the corresponding equations, Eq. 4.

$$Q(L'_j u) = \|L'_j u\|_2 \qquad (3)$$

$$Q(L'_j u) = \|u_j - \tilde{u}_j\|_2 \qquad (4)$$


Due to the necessity of conducting the optimization with a limited number of candidate solutions, the implemented approach uses the concept of domination for the proposed solutions of the systems of equations search problem. It is said that a candidate system $S_1(\mathbf{u})$ dominates a candidate system $S_2(\mathbf{u})$ if for all optimized criteria $f_i$: $f_i(S_1(\mathbf{u})) \leq f_i(S_2(\mathbf{u}))$, and for at least one criterion $f_j$: $f_j(S_1(\mathbf{u})) < f_j(S_2(\mathbf{u}))$. A solution is called Pareto-optimal if no other solution dominates it. The objective of the implemented algorithm is to obtain a set of candidate solutions, where each solution is Pareto-optimal. In addition to the Pareto-optimal set, other non-dominated sets can be introduced by induction: the n-th non-dominated level is comprised of the solutions that are not dominated by any solutions, except the ones on the (n-1)-th or lower levels.
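As an illustration, this dominance check over the criteria vectors of two candidate systems can be sketched in a few lines of plain Python (a generic implementation, not the framework's own):

```python
def dominates(f1, f2):
    """Check whether criteria vector f1 Pareto-dominates f2: f1 is no worse
    in every criterion and strictly better in at least one. f1 and f2 are
    sequences of minimized objective values (C and Q of every equation)."""
    no_worse = all(a <= b for a, b in zip(f1, f2))
    strictly_better = any(a < b for a, b in zip(f1, f2))
    return no_worse and strictly_better

# Example: the first system is better in quality at equal complexity.
assert dominates([2.0, 0.1, 3.0, 0.2], [2.0, 0.3, 3.0, 0.2])
```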

3 Approach description
In this section we will briefly describe the main deviations of our approach from the original algorithm, and the case-specific solutions employed during the derivation of systems of differential equations, such as the evolutionary operators. In accordance with the optimization objectives stated in the previous section, the algorithm performs a simultaneous search for the system equations and the parameters which define the equation structures. The structure of an equation can be decomposed into a set of equation terms and a set of their real-valued coefficients $\alpha$.

$$L'_j u = \sum_i \alpha_i \prod_j t_{ij} \qquad (5)$$

The terms of the constructed equations are represented with a product of tokens $\prod_j t_{ij}$, $t_{ij} \in T$: elementary building blocks containing arbitrary user-defined functions. This approach enables the discovery of non-linear equations with compound structures, as in Eq. 5. During the search for differential equations, various derivatives (e.g. $\frac{\partial^n u_i}{\partial x_j^n}$, $n \in \mathbb{N}$) are included into the pool $T$. Other case-specific functions or external variables can be included as tokens into the token pool to be available for the algorithm during the equation search. For example, if the objective of a study is the discovery of the equation for the temperature dynamics in a medium, then the velocity field of the medium can be considered as such an external variable.
To create a system that is able to fully model the studied process, it is possible to make an assumption that each equation of the system must represent the spatial or temporal dynamics of at least one variable. By the property of describing the dynamics of a variable we understand that the equation contains the corresponding derivatives of that variable. During the evolutionary search, the evolutionary operators affecting the structures of the equations have to preserve the descriptive properties of such terms.


3.1 Evolutionary algorithm details

To start the evolutionary optimization, the algorithm has to construct the initial population $P = \{S_1(\mathbf{u}), \dots, S_{n_{pop}}(\mathbf{u})\}$ of randomly generated candidate systems of differential equations. As was mentioned above, an equation of a system has to represent the dynamics of a corresponding variable. Therefore, during the initialization a variable is assigned to each equation as its "main" one. Without loss of generality, we can assume that the i-th equation describes the i-th variable.
To respect the dual approach to system discovery, the encoding of an individual has to represent both the equations and the meta-parameters of the equations. The chromosome of an individual contains computational graphs of the equations as "equation genes" and the values of the parameters that define the creation of the equation. Equation graphs take the form of trees, where the leaves are elementary functions stored in tokens, and the intermediate nodes are product operators that form equation terms from factor tokens. The root of the graph is comprised of the summation operator, which combines separate terms into the equation. The scheme of the equation system encoding is presented in Fig. 1.

Figure 1: Scheme of the encoding for an ODE with sparsity constants as metaparameters.

To regulate the complexity of the equations proposed by the algorithm, a regularization tool has to be created. Its main objective is the exclusion of terms with low significance and low descriptive power from the resulting model. The selection of the terms can be done with sparse regression, operating with the LASSO operator presented in Eq. 6. As the predicted value of the operator, a random equation term which represents an "equation variable", i.e. contains its derivative, is selected. The LASSO operator is able to obtain a vector of term weights $\beta$ from the values of the terms in the left-hand side of the equation, evaluated on the space-time grid, normalized and combined into the matrix $F_k$, and the vector of right-hand side values $F_{target,k}$. In the operator statement, $\|\cdot\|_i$ denotes the $i$-th norm of a matrix.

$$\|F_k \beta - F_{target,k}\|_2^2 + \lambda \|\beta\|_1 \to \min_{\beta} \qquad (6)$$

The sparsity constant $\lambda$ determines the penalty of the optimized functional with respect to the values of the weights in $\beta$, prioritizing setting zero coefficients for the less significant predictors. By regulating the value of the sparsity constant, the algorithm is able to control the complexity of the equation. Higher values of $\lambda$ promote equations with fewer terms, while lower values tend to lead to more complex equations. Due to the significance of the sparsity parameter for the equation definition, it is included into the encoding of the individual for each equation in the system.
The coefficients of the equation are computed with linear regression, where the active terms from the left-hand part are combined into a matrix of predictors, and the values of the term on the right-hand part are used as the predicted value.
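A minimal sketch of this two-stage procedure, using scikit-learn's Lasso for term selection and ordinary least squares for coefficient refinement (the matrix names follow Eq. 6; the function itself is illustrative, not part of the framework):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def fit_equation(F, F_target, sparsity=1e-4):
    """F: (n_points, n_terms) matrix of normalized left-hand-side term values;
    F_target: (n_points,) values of the term chosen as the right-hand side.
    Returns the final coefficients, with zeros for the discarded terms."""
    beta = Lasso(alpha=sparsity, fit_intercept=False).fit(F, F_target).coef_
    active = np.nonzero(beta)[0]             # terms that survived sparsification
    coeffs = np.zeros(F.shape[1])
    if active.size > 0:
        lr = LinearRegression(fit_intercept=False).fit(F[:, active], F_target)
        coeffs[active] = lr.coef_            # refined real-valued coefficients
    return coeffs
```

Raising the `sparsity` argument reproduces the behaviour described above: more coefficients are zeroed out and the candidate equation becomes simpler.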


3.2 Evolutionary operators

The general idea of the evolutionary operators affecting the population to obtain the set of optimal systems of equations is borrowed from the single-objective algorithm of equation discovery proposed in [14]. The alterations of an individual equation can be done with the operators of mutation and crossover. The manner in which the operators are applied to the individuals of the population follows the guidelines presented in the paper describing the base algorithm.
The process of the evolution is held iteratively, for a specified number of iterations, over sectors defined by the weight vectors introduced into the space of optimization criteria to decompose the problem into smaller sub-problems. According to [15], the algorithm constructs a set of weight vectors $W = \{w_1, \dots, w_N\}$ from a unit simplex, one for each candidate solution in P.
With the previously stated introduction of the weight vectors in mind, each individual of the population P is assigned to a random sector of the criteria space. That enables a more even coverage of the search space due to the property that the individuals converge in the directions of the weight vectors.
The selection of the individuals for the crossover operators is held in a manner that respects the problem decomposition. In the base scenario, the parents are selected from the sectors neighbouring the one associated with the processed weight vector. However, to increase the exploratory properties of the algorithm, which are vital in the problem of equation construction, with a relatively small probability the parents are selected from other, non-adjacent sectors. The selected candidates are added into the parent pool, and the crossover is held among them.
The crossover operator affects both the systems of equations and the corresponding vectors of meta-parameters. The interactions between the equations of the systems comply with the requirements of variable description. For each of the modelled variables, the corresponding equations of the parent systems are affected by crossover. Two main types of the operator are used here: term-wise exchange and complete equation swapping.
The first type of the equation-level crossover operators involves the exchange of terms between parent equations. All terms of the parent equations are divided into three groups: first, the terms present in the same form in both parents; second, the terms present in both parents, but with different token parameters; and finally, the terms unique to one of the parents. The first group is not affected by crossover at all. The crossover between the parents in the second group is parametric-only: the same tokens exchange their parameter values in a specified proportion.
After the creation of offspring individuals, they are affected by mutation operators. Their purpose is two-fold: not only do they increase the exploratory properties of the algorithm, but they also prevent the generation of repeating individuals, which is mandatory for the implemented multi-objective optimization approach. The main idea of the mutation operator is the random change of a term into a new, unique one. The first type of the operator changes a factor, representing a token, into a new, randomly generated one, or changes the token parameters (e.g. the frequency of a sine token) with an increment taken from $N(0, \sigma)$. The second type involves a replacement of a term with a newly generated one. When the offspring creation procedures are conducted, the Pareto levels are updated with respect to the newly created solutions. The population update algorithm takes into account the decomposition of the problem with the set of weight vectors and the domination relation.
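Schematically, the two mutation types can be expressed as follows; the term representation (lists of named factors with parameter dictionaries) is an assumed placeholder, not the framework's actual classes:

```python
import random

def mutate_term(term, token_pool, sigma=0.1, p_param=0.5):
    """One mutation step for a single equation term, represented here as a
    list of (token_name, params) factors. Either a parameter of a random
    parametric factor is perturbed with a N(0, sigma) increment, or the
    whole term is replaced by a freshly sampled one."""
    if random.random() < p_param and any(params for _, params in term):
        idx = random.choice([i for i, (_, p) in enumerate(term) if p])
        _, params = term[idx]
        key = random.choice(list(params))      # e.g. 'freq' of a sine token
        params[key] += random.gauss(0.0, sigma)
        return term
    # full replacement: sample new factors from the token pool
    return [(name, dict(defaults))
            for name, defaults in random.sample(token_pool, k=len(term))]

# Example term: sin(freq * t) * du/dx, with a parametric sine token.
term = [("sin", {"freq": 2.0}), ("du/dx", {})]
pool = [("sin", {"freq": 1.0}), ("cos", {"freq": 1.0}), ("du/dx", {}), ("u", {})]
mutated = mutate_term(term, pool)
```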

4 Validation
To assess the performance of the proposed approach on tasks of discovering systems of equations that govern a dynamical system, a number of validation experiments has been conducted. The most demonstrative approach to checking the behaviour of the algorithm employs synthetic data, obtained from the solution of known equations.
A hunter-prey model, described by the Lotka-Volterra equations (7), was selected as the dynamical system to be described. The model represents the simplified dynamics of two species: u = u(t) depicts the "prey", while v = v(t) represents the "hunter" species. The constants $\alpha$, $\beta$, $\gamma$ and $\delta$ determine the dynamics of the system. The solutions of the equations were obtained numerically, using Runge-Kutta methods. The solutions for u(t) and v(t) are demonstrated in Fig. 3. The process was modelled for 1000 time steps.
$$S(u) = \begin{cases} u'_t = \alpha u - \beta uv \\ v'_t = \delta uv - \gamma v \end{cases} \qquad (7)$$
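Synthetic data of this kind can be generated with a standard Runge-Kutta integrator, as in the following SciPy sketch; the coefficient values and initial conditions are illustrative, since the exact constants used in the experiment are not listed in the text:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative Lotka-Volterra constants; the paper does not list the exact values.
alpha, beta, delta, gamma = 1.0, 0.2, 0.1, 0.5

def lotka_volterra(t, y):
    u, v = y
    return [alpha * u - beta * u * v, delta * u * v - gamma * v]

t = np.linspace(0.0, 50.0, 1000)  # 1000 time steps, as in the experiment
sol = solve_ivp(lotka_volterra, (t[0], t[-1]), y0=[10.0, 5.0],
                t_eval=t, method='RK45')  # explicit Runge-Kutta scheme
u_data, v_data = sol.y  # input fields for the equation discovery algorithm
```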


Figure 2: Generalized scheme of the algorithm main search sequence.

Figure 3: Visualization of the solution of Lotka-Volterra equations.

The Pareto-optimal sets of equations obtained from the algorithm typically have forms similar to the one presented in Fig. 4. Here the algorithm output is reformulated with the combination of optimized metrics: instead of evaluating the complexity or approximation errors of the individual equations, they are viewed for the system integrally. The allowable interval for the complexity-controlling parameters $\lambda$, used in the sparsity operators, is located between $10^{-8}$ and $10^{-2}$. Next, an additional family of trigonometric tokens was introduced into the pool to create diversity of the created terms.
While this test can not be considered as a comprehensive study of the algorithm properties, it can be viewed as a proof of concept that the algorithm is able to operate and discover the equations. 10 independent runs with 10 multiobjective optimization evolutionary algorithm iterations, and 10 more with 25 iterations, were performed, and the obtained Pareto-optimal sets were compared. Due to the relatively simple structure of the initial system of equations, a successful convergence to similar sets (in terms of obtaining sets with similar equations) was achieved in every case.


Figure 4: Pareto-frontier of systems of equations, obtained by the algorithm.

5 Conclusion
In this article, we proposed a robust extension of the single differential equation discovery approach to the problems of creating models in the form of systems of differential equations. The multi-objective approach enables the creation of a diverse set of models, and with the analysis of the complexity-quality tradeoff, an expert should be able to select the parsimonious model for the process description. The main drawback of the developed approach is its high computational cost, which can be especially noticeable in cases of multidimensional data (i.e. systems of partial differential equations) or data with high noise levels, where high numbers of iterations are required for the algorithm to converge. Therefore, improvement of the computational performance of the algorithm can be stated as the priority for further development. Also, the development of sufficient tools for using the derived equations for process state prediction is another goal of future research. We hope that the research in this paper will motivate further studies of physics-based and interpretable methods of machine learning and data-driven simulation.

6 Code and Data availability

The numerical solution data and the Python code that partially reproduce the experiments are available at the GitHub repository: https://github.com/ITMO-NSS-team/EPDE.

Acknowledgements
This work was supported by the Analytical Center for the Government of the Russian Federation (IGK 000000D730321P5Q0002), agreement No. 70-2021-00141.


References
[1] M. Maslyaev and A. Hvatov, "Solver-Based Fitness Function for the Data-Driven Evolutionary Discovery of Partial Differential Equations," 2022 IEEE Congress on Evolutionary Computation (CEC), 2022.
[2] S. L. Brunton, B. W. Brunton, J. L. Proctor et al., "Chaos as an intermittently forced linear system," Nature Communications, vol. 8, 19, 2017.
[3] P. J. Schmid and J. Sesterhenn, "Dynamic mode decomposition of numerical and experimental data," in 61st Annual Meeting of the APS Division of Fluid Dynamics, American Physical Society, November 2008.
[4] D. Kondrashov, M. D. Chekroun and M. Ghil, "Data-driven non-Markovian closure models," Physica D: Nonlinear Phenomena, vol. 297, pp. 33-55, 2015.
[5] M. Schmidt and H. Lipson, "Distilling free-form natural laws from experimental data," Science, vol. 324, no. 5923, pp. 81–85, 2009.
[6] K. Kaheman, J. N. Kutz and S. L. Brunton, "SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics," Proceedings of the Royal Society A, vol. 476(2242), p. 20200279, 2020.
[7] J. Berg and K. Nyström, "Data-driven discovery of PDEs in complex datasets," Journal of Computational Physics, vol. 384, pp. 239-252, 2019.
[8] H. Gao, M. J. Zahr and J.-X. Wang, "Physics-informed graph neural Galerkin networks: A unified framework for solving PDE-governed forward and inverse problems," Computer Methods in Applied Mechanics and Engineering, vol. 390, 114502, 2022, ISSN 0045-7825.
[9] Z. Long et al., "PDE-Net: Learning PDEs from Data," in Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, 80:3208-3216, 2018.
[10] M. Raissi, "Deep hidden physics models: Deep learning of nonlinear partial differential equations," The Journal of Machine Learning Research, vol. 19(1), pp. 932-955, 2018.
[11] J. Zhang and W. Ma, "Data-driven discovery of governing equations for fluid dynamics based on molecular simulation," J. Fluid Mech., vol. 892, p. A5, 2020.
[12] F. Van Breugel, J. N. Kutz and B. W. Brunton, "Numerical Differentiation of Noisy Data: A Unifying Multi-Objective Optimization Framework," IEEE Access, vol. 8, pp. 196865-196877, 2020.
[13] K. Li, K. Deb, Q. Zhang and S. Kwong, "An Evolutionary Many-Objective Optimization Algorithm Based on Dominance and Decomposition," IEEE Transactions on Evolutionary Computation, vol. 19, no. 5, pp. 694-716, Oct. 2015, doi: 10.1109/TEVC.2014.2373386.
[14] M. Maslyaev, A. Hvatov and A. V. Kalyuzhnaya, "Partial differential equations discovery with EPDE framework: Application for real and synthetic data," Journal of Computational Science, vol. 53, 101345, 2021, ISSN 1877-7503.
[15] I. Das and J. E. Dennis, "Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems," SIAM J. Optim., vol. 8, no. 3, pp. 631–657, 1998.


Solver-Based Fitness Function for the Data-Driven


Evolutionary Discovery of Partial Differential
Equations
Mikhail Maslyaev, Nature Systems Simulation Laboratory, ITMO University, Saint-Petersburg, 197101, Russia, mikemaslyaev@itmo.ru
Alexander Hvatov, Nature Systems Simulation Laboratory, ITMO University, Saint-Petersburg, 197101, Russia, alex_hvatov@itmo.ru

Abstract—Partial differential equations provide accurate models for many physical processes, although their derivation can be challenging, requiring a fundamental understanding of the modeled system. This challenge can be circumvented with the data-driven algorithms that obtain the governing equation only using observational data. One of the tools commonly used in the search of the differential equation is the evolutionary optimization algorithm. In this paper, we seek to improve the existing evolutionary approach to data-driven partial differential equation discovery by introducing a more reliable method of evaluating the quality of proposed structures, based on the inclusion of an automated algorithm of partial differential equation solving. In terms of evolutionary algorithms, we want to check whether the more computationally challenging fitness function, represented by the equation solver, gives a sufficient increase in the resulting solution quality with respect to the simpler one. The approach, which includes a computationally expensive equation solver, is compared with the baseline method, which utilizes the equation discrepancy to define the fitness function for a candidate structure, in terms of algorithm convergence and required computational resources, on the synthetic data obtained from the solution of the Korteweg-de Vries equation.

Index Terms—equation discovery, partial differential equation, fitness function selection, data-driven modelling

I. INTRODUCTION

Differential equations are commonly used as mathematical models for continuous physical processes. They describe the interdependence of various derivatives of a studied multivariate function. In addition, differential equations can be used for system state predictions. By integrating the governing partial differential equation with specific initial and correctly stated boundary conditions, the state of the modeled process in the future can be obtained.

The task of partial differential equation derivation for a specific problem in the past involved examining conservation laws that can be applied to the system and using analysis and variational principles to extract the equation from the known properties of the system. While these methods are widely used to derive the equations, they demand significant preliminary study of the process, which in some cases can not be held due to low comprehension of the system. Occasionally, regardless of the dynamical system understanding, the researchers must collect measurements as a preliminary part of the analysis. Thus, the data-driven equation derivation can be introduced as an alternative. Data-driven algorithms can develop a model for a process using measurement data and a few assumptions about the constructed model structure (i.e., that the model will take the form of a partial differential equation).

One of the contemporary approaches to data-driven PDE discovery is based on evolutionary algorithms (EA). Here, the objective of the EA is the construction of a model in the form of a partial differential equation from a selected set of elementary constituents and with specific conditions, that has the best performance on the input data. One of the issues linked with applying evolutionary operators to such ambiguously stated tasks is selecting an objective (fitness) function that shall be optimized during the problem solution.

The computational complexity of the fitness function is a classical problem that is considered for classical optimization benchmark problems [1]. However, in the equation discovery case, the function landscape cannot be known a priori. Thus, only empirical conclusions on the influence of the computational complexity of a fitness function may be obtained. In this paper, two possible approaches to computing fitness functions for the equation discovery evolutionary algorithm are considered. The first is the computationally hard complete equation solution. The equation solution should be a more noise-resistant approach to evaluating the quality of evolutionary algorithm candidates due to its independence of the noisy data differentiation error. The ability to handle noisy data is crucial for modeling real-world physical systems, since instrumental observational data gathering is usually imposed by various disturbances. Moreover, the solver approach allows considering boundary values, which are not in the scope of the previously used, computationally more straightforward fitness calculation technique. The latter involved evaluating the differential operator discrepancy from zero and performed poorly on noisy data.

The paper is structured as follows: Section II is dedicated to the analysis of existing approaches to the data-driven discovery of partial differential equations for purposes of modeling dynamical systems; Section III gives a brief formal overview of the solved problem and states the tasks for the research; Section IV provides a description of the evolutionary algorithm of partial differential equation discovery and compares the possible ways of the case-specific fitness function definition; Section V is dedicated to the analysis of algorithm performance on synthetic data, and Section VI outlines the paper and also describes the possible future work directions.

II. RELATED WORK

The earliest advances in the data-driven algorithmic derivation of partial differential equations were made with symbolic regression [2]. The main idea of symbolic regression is constructing an expression that describes dependencies in data using a computational graph. The leaves of the computational graph are used for the modeled function and its derivatives, while internal nodes are reserved for algebraic operations. Then, the graph is optimized to represent the input data. While symbolic regression can construct an arbitrary form of expression for the final model, it has some disadvantages, such as its tendency to overtrain and a vast search space, leading to poor performance of the optimization algorithm. The symbolic regression-based approach has proved to be rather successful in problems of ordinary differential equation discovery. The examples of such algorithm applications are numerous and include [3] and [4]. The former article involves the application of numerical tools of differential equation solution to evaluate the quality of a candidate, while the latter is notable for the use of multiobjective optimization, considering not only the solution's fit to the problem, but also its complexity.

The next class of data-driven methods of PDE discovery is based on sparse regression, trained over a library of candidate terms defined beforehand. Examples of works involving sparse regression are numerous and include [5]–[8]. The main advantage of this approach is its low computational demands, but it achieves such good performance at the cost of a limited expanse of possible equations discoverable by the framework: the researcher has to manually select all reasonable terms for the equation.

One of the most actively developing methods involves applications of artificial neural networks (ANN) to derive the partial differential equation in the form of first-order time derivative approximations. The most notable examples include physics-inspired artificial neural networks, developed by [9], aimed at the representation of both the input data and the structure of the PDE with ANNs. Another approach, proposed by [10], views differential operators through the lens of convolution kernels, thus approximating the system's dynamics. The former approach has the downside of requiring knowledge about the general structure of the equation. The latter approach can detect an arbitrary operator for the dynamics, but the classes of learned equations are still limited by necessarily having first-order time derivatives.

Evolutionary algorithms of PDE discovery proved to be rather effective, being able to derive equations with complex structures and systems of equations. Conceptually similar approaches were introduced in [11] and [12]. However, they tend to have stability and convergence issues while operating on noisy data. The inaccurate estimations of derivatives, leading to errors in calculating fitness functions and incorrect operation of the selection operator, make their output inconsistent and the constructed models unreliable.

During the equation discovery process, different forms of operators appear. In classical mathematical physics analysis, many methods allow solving an equation of a given type with boundary conditions of a given type. Classical solution approaches require the expert to choose the proper method for the given equation. Such an approach is improper for data-driven discovery. We require a more automated, yet maybe less precise, equation solver. Such approaches usually involve neural networks [13]. In the paper, we use an alternative realization of a neural network-based solver.

III. PROBLEM STATEMENT

This work's primary goal is to check the influence of the more advanced fitness function on the data-driven derivation of ordinary and partial differential equations. As in the majority of the data-driven methods, the model is constructed based on sets of measurements. It has to be expected that a single dynamical process prevails in the entire area, so that the derived model can be applied to the complete dataset. The versatility of the approach makes possible the creation of models in the form of non-linear ordinary and partial differential equations with the defined order n in the form of the expression Eq. 1, where t is time, x is the spatial coordinates vector, and u is the function to be modeled. The subscript denotes the derivative along that axis, e.g., $u_t$ means the first-order time derivative.

$$F(t, x_1, \dots, x_k, u_t, u_{x_1}, \dots, u_{tt}, u_{tx_1}, \dots) \sim Lu = 0 \qquad (1)$$

In contrast to the symbolic regression, which does not impose any limitations on the models' structures, the proposed algorithm makes assumptions about the equation form to reduce the search space and avoid unlikely candidates. The candidate model structure is granted the form of Eq. 2, where the factors $f_j$ belong to the set of derivatives $\{u_t, u_{x_1}, \dots, u_{tt}, \dots\}$. The highest order of the derivative along an axis for the set may vary. In some cases, the dynamical system's details may indicate that the derivatives along a specific axis can be only up to a certain order, or a similar notion can be obtained from an algorithm's launch results. Also, it is possible to assume that the total number of significant (i.e., with nonzero coefficient) terms in the candidate equation and factors in a term will be relatively low, due to the majority of existing equations for dynamical systems having this property.

$$L'u = \sum_i a_i(t, x) c_i = 0, \quad c_i = \prod_j f_j, \quad a_i(t, x) = a^0_i(t, x) \cdot b_i, \quad b_i \in \mathbb{R} \qquad (2)$$

While the process modeling shall be based on the solution of a single equation, the study [14] shows the benefits of the multi-objective optimization approach. It allows the researcher to obtain the set of Pareto-optimal solutions in terms of multiple objective criteria, such as the quality of the model (zero discrepancies or data reproduction precision computed with the solver), its complexity, etc. For the latter criterion, the approach adopts the proposed quantity of the significant terms in the equation, containing derivatives or other variables that are informative for the research, as shown in Eq. 3. The selection of the former objective function is the main object of this research.

$$C(L'_j u) = \#(L'_j u) \qquad (3)$$

The multi-objective optimization operates by a separate EA based on the MOEA/DD algorithm, proposed by [15]. In study [14], the optimization problem was stated for the tasks of differential equation systems, but its approach is applicable to the case of a single equation discovery.

IV. ALGORITHM DESCRIPTION

This section describes the general details of the proposed partial differential equations discovery approach. The generalized scheme of the developed evolutionary algorithm is presented in Fig. 1. For the purposes of the fitness evaluation, both the optimization-based partial differential equation solver and the equation discrepancy evaluation procedure can be used.

Fig. 1. Scheme of the evolutionary algorithm of partial differential equation derivation.

A. Evolutionary algorithm

The proposed algorithm's main objective is to discover a Pareto-optimal set of equations regarding objective functions describing various aspects of equation quality. The algorithm uses hyperparameter vectors of the single equation discovery algorithm for multi-objective optimization. The previous works have shown that the evolutionary algorithm, performing the search of an equation, can converge to a single solution or several equivalent equations if provided with enough iterations and correctly selected evolutionary operators. These discovered equations are defined only by data and by the hyperparameters of the algorithm, mainly sparsity parameters, which limit the number of terms in the equation.

Before the fitness evaluation discussion, the general details of the evolutionary algorithm shall be introduced without delving into the details of candidate equation quality evaluations. The search process for the best equation form is divided into several steps. At first, the evolutionary operators are utilized to suggest a set of terms $\{a^0_1 c_1, \dots, a^0_k c_k\}$, where $c_i$ is the product of selected derivatives or the modeled function, while the products $a^0_i$ are constructed from an allowed set of case-specific functions. Next, in a loop over the terms of the equation in a set, the following procedure is performed: the term is selected as the "right part of the equation", while the other ones are used to approximate it: the sparsity is applied, and then the coefficients are refined. The overview of the EA is presented in the form of Alg. 1.

In each algorithm launch, a total number of terms in the set $\{a^0_1 c_1, \dots, a^0_k c_k\}$ is fixed for the mutation and crossover evolutionary operators. The sparsity operator (Eq. 4) has to be applied to set zero coefficients for the additional terms in the set that are absent in the equation describing the process. Here, the sparsity-promoting technique based on LASSO regression is employed, although other works indicate that the correlation can be used as an indicator of term significance. In the operator description, $\beta^*$ and $\beta$ refer to the sparse vector of real-valued intermediate equation weights. The intermediate status is guided by the nature of sparse regression, which operates on normalized features and predicted values. $\lambda$ is the sparsity constant, which in the case of equation discovery regulates the number of terms with non-zero coefficients. $F$ is the matrix of left part terms, evaluated on the domain grid, while $F_{target}$ is the vector of right part term values.

$$\beta^* = \arg\min_{\beta} \left( \|F_k \beta - F_{target,k}\|_2^2 + \lambda \|\beta\|_1 \right) \qquad (4)$$

The operation of sparse regression requires calculated libraries of the predictors. Thus the algorithm must have access to the values of all possible equation factors on the grid, placed in the studied domain. While the values of the functions $a^0_i(t, x)$ can be calculated from the coordinates of the grid points, the evaluation of derivatives has to be performed in another way to increase the overall speed of the algorithm.

The differentiation preprocessing phase includes the stage of the input data variable being approximated by a fully-connected artificial neural network. Then the outputs of the ANN for specific points are used in the finite-difference scheme to evaluate the derivatives. The main advantage of using an intermediate artificial neural network between data assimilation and derivative calculation is its ability to filter out the input noise. Finite-difference differentiation has the property of increasing the magnitudes of noise; thus, to obtain reasonable values of the derivatives, the data has to be smoothed beforehand. Other preprocessing approaches include kernel (Gaussian) filtering and fitting polynomials for the analytical differentiation, as presented in Fig. 2.
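The following sketch illustrates the ANN-based variant of this preprocessing for a one-dimensional signal: a small fully-connected network (illustrative architecture and training settings, not the actual implementation) is fitted to the noisy measurements, and finite differences are then taken on its smooth output.

```python
import numpy as np
import torch

def ann_smoothed_derivative(t, u, epochs=2000, lr=1e-3):
    """Fit a small fully-connected ANN u(t) to noisy data, then estimate
    du/dt with central finite differences on the network's output."""
    t_tensor = torch.tensor(t, dtype=torch.float32).reshape(-1, 1)
    u_tensor = torch.tensor(u, dtype=torch.float32).reshape(-1, 1)
    net = torch.nn.Sequential(
        torch.nn.Linear(1, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):                  # a plain MSE fit acts as a denoiser
        opt.zero_grad()
        loss = torch.mean((net(t_tensor) - u_tensor) ** 2)
        loss.backward()
        opt.step()
    u_smooth = net(t_tensor).detach().numpy().ravel()
    return np.gradient(u_smooth, t)          # central differences on the grid
```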
Fig. 2. Scheme of the possible preprocessing phases of the evolutionary algorithm of PDE discovery. The alternative ways indicate alternative steps of the algorithm. The specific choice depends on the prevalence of noise in the data, time restraints, etc.

Data: set of tokens T = {T_0, T_1, ..., T_n}, where T_0 are derivatives and T_1, ..., T_n are user-specified functions; sparsity constant λ
Result: partial differential equation
Randomly generate initial population of candidate equations from tokens from set T;
for individual in population do
    right_part_idx = 0;
    max_fitness_val = -inf;
    for target_idx = 0 to terms_number do
        Apply sparse (LASSO) regression to find intermediate coefficients β;
        Apply linear regression to find correct coefficients of the equation & get fitness value fit_val;
        if fit_val > max_fitness_val then
            max_fitness_val = fit_val;
            right_part_idx = target_idx;
        else
            pass
for epoch = 1 to epoch_number do
    population.sort();
    Remove worst solutions to maintain population size;
    Tournament selection of parents for recombination;
    Apply recombination and mutation operators;
    for new individual in offsprings do
        Use LASSO operator and linear regression to find the best partition into left & right parts and calculate fitness (as above);
    Add new solutions into population;
Algorithm 1: The pseudo-code of a single equation discovery process

The final element of the proposed algorithm of equation discovery is the calculation of the real-valued coefficients of the terms selected by the sparsity operator: $\{a^0_i c_i \ | \ \beta^*_i \neq 0\}$. For these purposes, linear regression is employed, and so the coefficients are obtained.

During the initialization of the evolutionary algorithm, a random initial population of equations is constructed with derivatives of the modeled function and a set of additional case-specific functions selected by the researcher. While operating with the EA, the "phenotype" of an individual stands for the equation it represents, while the "genotype" under the current encoding paradigm is defined with the string of objects representing equation terms. These terms by themselves consist of a string of factors. Some additional restrictions are imposed on the constructed terms to avoid some undesired behavior of the algorithm. For example, all terms must contain the modeled function, or another function specified as meaningful for the model. In other cases, function type-specific restrictions are necessary: e.g., the higher powers of trigonometric functions must be avoided to prevent trigonometric identity "discovery", which would trivially have lower discrepancies than the other candidates and would be preferred by the algorithm.

During the search for the optimal structure of the PDE, the evolutionary operators of mutation and crossover are applied to the population, as shown on the scheme of the algorithm. Elitism is introduced to preserve the quality of the best candidate solution. The scheme of the evolutionary operators is presented in Fig. 3.

Fig. 3. Scheme of the implemented evolutionary algorithm operators for tasks of partial differential equation discovery. As presented on the left, the mutation operator works by altering the existing solutions, while the crossover operator, portrayed on the right, combines terms from selected equations.

B. Equation discrepancy evaluation

The specifics of the task pose an unusual problem: the question of selecting an objective function which allows faster and more reliable convergence of the evolutionary algorithm. Previously, the only viable approach was the evaluation of the quality of the equation term approximation, i.e., solving the optimization problem stated in Eq. 5.

$$L = \sum_i a^*_i(t, x) b_i c_i \to \min_{a^*_i, c_i} \qquad (5)$$

The objective function, representing the quality of the proposed model, is evaluated as the L2-norm of the discrepancy between the approximation of the right part term of the candidate equation and the left part terms on the domain grid points. The overview of the function is portrayed in Eq. 6.

$$f_{fitness} = (\|L\|_2)^{-1} = \Big( \big\| \sum_{i \neq i_{rhs}} a^*_i(t, x) b_i c_i - a^*_{i_{rhs}}(t, x) c_{i_{rhs}} \big\|_2 \Big)^{-1} \qquad (6)$$

While the equation discovery process, guided by this objective function, can converge to the desired candidate solution on noiseless data, several issues may arise in other cases. First of all, the correct structure of the equation may not be optimal in the sense of the introduced optimization metrics. An example of such an occasion can be found during the process of heat equation discovery, Eq. 7.

$$u_t = \nabla(\alpha \nabla u) \qquad (7)$$

Due to the properties of the equation, the second time derivative of its solution is close to zero across the domain. On the other hand, the second derivative in the extended heat equation has a physical meaning in some physics applications. The multi-objective optimization algorithm tends to converge to the candidate $u_{tt} = 0$, which has a low modeling error on the input data and a complexity of 1 term, thus alone forming the Pareto non-dominated set. However, the predictive qualities of this model are low, and integration of the equation will not achieve the result. Such cases should be handled manually.

C. Solver-based approach

Another approach to evaluating the fitness function of a partial differential equation is defined by using an automated technique of PDE solution. In contrast to the conventional algorithms of numerical differential equation solution (finite-difference or spectral methods), we should process an arbitrary candidate proposed by the evolutionary algorithm.

The quality of the evolutionary algorithm individual is defined with the fitness function stated in Eq. 8, where $u$ represents the data, and $\tilde{u}$ denotes the ANN approximation of the equation solution.

$$f_{fitness} = (\|\tilde{u} - u\|_2)^{-1} \qquad (8)$$

In line with the objective for the fitness function evaluation, the equation solution process is performed on the fixed mesh $(t_i, x_i)$ in the domain $\Omega$, corresponding with the data points. In most problems, the mesh is uniform, but an arbitrary discretization can also be selected.

The PDE-solving algorithm requires a correctly (at least "domain-averaged" correctly) posed differential operator and initial/boundary conditions, matching the type of the passed boundary problem, expressed in Eq. 9, where the modelled function $u(t, x)$ is defined in the domain $(t, x) \in \Omega \subset \mathbb{R}^{k+1}$, with $k$ being the number of spatial dimensions. In the studied examples, the cases of two dimensions (time and space) are analyzed, but there are no strict limitations on the dimensionality. $L$ and $b$ are correspondingly arbitrary (possibly non-linear) differential and boundary operators, with the latter defined on the boundary $\Gamma$.

$$\begin{cases} L u(t, x) = f; \\ b u(t, x) = g, \quad (t, x) \in \Gamma \end{cases} \qquad (9)$$

To simplify the discovery process, we state only Dirichlet-type boundary conditions. While the choice of boundary conditions is not a severe issue for the equations of up to second order, where the problem statement does not diverge from the conventional ones, this approach may face difficulties on equations of higher orders. As a temporary way of providing the PDE-solving algorithm with sufficient conditions, the field values in additional areas inside the domain are utilized. For example, if the equation has third order, the algorithm is provided with values on the boundaries and in the center of the domain.

The equation solution task is reduced to the optimization problem stated in Eq. 10, where $\|\cdot\|_i$ and $\|\cdot\|_j$ are norms of arbitrary and not necessarily the same orders $i$ and $j$, and $\lambda$ is a constant.

$$\left( \|L\tilde{u}(t, x) - f\|_i + \lambda \|b\tilde{u}(t, x) - g\|_j \right) \to \min_{\tilde{u}} \qquad (10)$$

Due to numerical limitations, the differential operator $L$ is substituted by the approximate one $\bar{L}$. In a generalized approach to the PDE solution, the boundary operator $b$ would be substituted by the approximate operator $\bar{b}$. However, as it was stated earlier, the equation discovery process requires Dirichlet boundary conditions, the definition of which does not cause approximation errors. Calculations of partial derivatives for the operators are done with the finite-difference scheme of Eq. 11, where, for example, the first time derivative is considered. $\Delta t$ denotes the time step of the domain grid.

$$\frac{\partial \tilde{u}(t, x)}{\partial t} = \frac{\tilde{u}(t + \Delta t, x) - \tilde{u}(t - \Delta t, x)}{2 \Delta t} \qquad (11)$$

In the stated problem, the task is the selection of the function $\tilde{u}(t, x)$ matching the minimum value of the functional in Eq. 10. For these purposes, the parameterized function $\tilde{u}(t, x, \Theta): \mathbb{R}^{k+1} \to \mathbb{R}$, where $\Theta = (\theta_1, \dots, \theta_{n_{params}})$ is the parameter vector for this specific function type, is chosen. Generally, the class of parameterized functions can be arbitrary. The optimization problem can be posed as stated in Eq. 12.

$$\left( \|L\tilde{u}(t, x, \Theta) - f\|_i + \lambda \|b\tilde{u}(t, x, \Theta) - g\|_j \right) \to \min_{\Theta} \qquad (12)$$

In this research, a fully-connected artificial neural network is selected to represent the function $\tilde{u}(t, x, \Theta)$. The question of selecting the best architecture of the ANN to represent the PDE solution is not of interest in this work. The search for the parameter vector $\Theta$ is held by the training process of the neural network. The details of the equation solver algorithm are presented in Alg. 2.
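As a code-level complement to the pseudo-code below, the following PyTorch sketch trains a network to minimize the discretized operator residual plus a Dirichlet boundary penalty, in the spirit of Eq. 12. The architecture, penalty weight, and the `residual` callable are illustrative assumptions, not the actual solver implementation.

```python
import torch

def solve_candidate(residual, bc_points, bc_values, grid, lam=10.0,
                    steps=5000, lr=1e-3):
    """Train an ANN approximation of the solution by minimizing
    ||L u - f|| on the grid plus a penalty on the Dirichlet conditions.
    `residual` maps (net, grid) to the discretized operator values;
    bc_points: (m, d) boundary coordinates, bc_values: (m, 1) values g."""
    net = torch.nn.Sequential(
        torch.nn.Linear(grid.shape[1], 256), torch.nn.Tanh(),
        torch.nn.Linear(256, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_op = torch.mean(residual(net, grid) ** 2)            # ||L u - f||
        loss_bc = torch.mean((net(bc_points) - bc_values) ** 2)   # ||u - g|| on the boundary
        loss = loss_op + lam * loss_bc
        loss.backward()
        opt.step()
    return net  # u(t, x; Θ), compared with the data to obtain the fitness
```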
Data: encoded equation and boundary conditions, initial NN model
Result: trained neural network model that approximates the solution
Compute model Sobolev space norm min_norm;
for NN in cache do
  Train model to reproduce the NN output;
  Apply differential operator to the trained model;
  Compute Sobolev space norm norm_curr;
  if norm_curr < min_norm then
    model = trained model; min_norm = norm_curr;
  else
    pass;
while patience < threshold do
  Apply differential operator to the trained model;
  Compute Sobolev space norm norm;
  if norm oscillates near the same value then
    patience = patience + 1;
  if norm is not improved in improving_patience steps then
    patience = patience + 1;
  Gradient descent step for the model with respect to norm;
Algorithm 2: The pseudo-code of the equation solver algorithm.
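To make the solver-based evaluation concrete, the following minimal sketch (assuming PyTorch; the toy candidate operator u_t + u_x = 0, the grid sizes, and all names are illustrative, and the cache selection and patience logic of Alg. 2 are omitted) trains a fully-connected network ũ(t, x; Θ) by minimizing the discretized functional of Eq. 12 with the central-difference derivative of Eq. 11:

import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1))

t = torch.linspace(0.0, 1.0, 31)
x = torch.linspace(0.0, 1.0, 31)
grid = torch.cartesian_prod(t, x)      # the fixed mesh (t_i, x_i) in the domain
h_t = (t[1] - t[0]).item()
h_x = (x[1] - x[0]).item()

def u(pts):                            # the network as a function of (t, x)
    return net(pts).squeeze(-1)

def d(pts, axis, h):                   # central finite difference, cf. Eq. 11
    e = torch.zeros_like(pts)
    e[:, axis] = h
    return (u(pts + e) - u(pts - e)) / (2.0 * h)

# Toy candidate operator L u = u_t + u_x with forcing f = 0 and a Dirichlet
# condition u(0, x) = sin(pi * x); a real candidate comes from an EA individual.
bnd = torch.stack([torch.zeros_like(x), x], dim=1)
g = torch.sin(torch.pi * x)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):                  # the patience-based stopping is omitted
    opt.zero_grad()
    loss = ((d(grid, 0, h_t) + d(grid, 1, h_x)) ** 2).mean() \
        + ((u(bnd) - g) ** 2).mean()   # ||L u - f|| + ||b u - g||, cf. Eq. 12
    loss.backward()
    opt.step()

fitness_term = loss.item()             # enters the fitness of the EA candidate

Here the mean squared residual stands in for the operator norm; the full algorithm uses a Sobolev-space norm and early-stopping heuristics instead of the fixed epoch count.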
V. VALIDATION

The algorithm was tested on synthetic data to evaluate the effects of the newly introduced PDE solver-based fitness function evaluation technique. The necessity of using synthetic data in the experiment stems from the fact that for validation purposes we need to know the true structure of the equation. With this approach, the performance of the algorithm on the experimental data can be evaluated with the success rate: the fraction of independent launches that result in the discovery of correct equations. By correct results we denote the equations with the correct structure (i.e., set of terms) and with coefficients that deviate insignificantly (up to 5%) from the ground truth. For these purposes, the solution of the Korteweg-de Vries equation with forcing (Eq. 13) was utilized. The equation was solved on a grid of 31 × 31 points, with the dimension t representing time and x representing space.

u_t + 6uu_x + u_xxx = f(x, t)   (13)

The following forcing term, initial condition, and boundary conditions, shown in Eq. 14, were applied:

f(x, t) = cos t · sin x
u(x, 0) = 0
[u_xx + 2u_x + u]|_{x=0} = 0   (14)
[2u_xx + u_x + 3u]|_{x=1} = 0
[5u_x + 5u]|_{x=1} = 0

The data for the numerical solution was obtained using the Wolfram Mathematica 12.3 software to avoid any numerical errors; the solution is shown in Fig. 4.

Fig. 4. Visualization of the Korteweg-de Vries equation solution used during the experiments.

While, as was stated earlier, the convergence on noiseless data proved to be rather robust, additional Gaussian noise (ε ∼ N(μ = 0; σ = n · ||u(t)||), n = 0.1, 0.2, ..., 0.9) was added into the data for a more detailed analysis of the algorithm performance. To assess the changes in algorithm behavior, the success rate, i.e., the fraction of algorithm launches in which the algorithm converges to the correct equation (for one point on the Pareto frontier), was utilized. This metric was selected because it remains consistent even for launches with different fitness function evaluation approaches. The improvements were made only in the selection operator, so no significant changes in the quality of the coefficient calculation were introduced. For each of the noise levels, 20 independent algorithm launches were commenced. The resulting equations took the form of Pareto frontiers, as presented in Fig. 6.
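The noise model and the metric used above can be restated compactly; in the sketch below (assuming NumPy) the analytic field is a stand-in for the actual KdV solution, and the per-launch discovery outcome is left abstract:

import numpy as np

rng = np.random.default_rng(0)
# stand-in for the 31 x 31 KdV solution field used in the experiments
tt, xx = np.meshgrid(np.linspace(0, 1, 31), np.linspace(0, 1, 31), indexing="ij")
u = np.sin(np.pi * xx) * np.cos(tt)

def add_noise(field, n):
    # Gaussian noise with sigma = n * ||u||, as described above
    return field + rng.normal(0.0, n * np.linalg.norm(field), field.shape)

def success_rate(outcomes):
    # outcomes: one boolean per launch, True when the discovered equation has
    # the correct set of terms and coefficients within the 5% tolerance
    return sum(outcomes) / len(outcomes)

noisy = {round(n, 1): add_noise(u, n) for n in np.arange(0.1, 1.0, 0.1)}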
From the success rate results presented in Fig. 5, it can be concluded that on high-quality data the EA performance with both approaches tends to be similar.
When the algorithm is applied to lesser-quality data containing significant noise, the discrepancy-based method tends to converge to incorrect structures, driven by the approximation errors of high-order derivatives. The robustness of the solver-based approach gives it a considerably higher operational threshold. This quality is vital for applications of the algorithm to real-world problems, where significant noise related to the measurement process is unavoidable.

Fig. 5. Dependency of the evolutionary algorithm success rate on the noise in the data. In the experiments, for each magnitude factor n, Gaussian noise with a standard deviation of n · ||u(t)|| was added to the solution of the Korteweg-de Vries equation.

One of the issues linked with the use of the algorithm, which solves partial differential equations to evaluate the quality of a proposed candidate, is the significant increase in the required computational operations. Previously, to obtain the fitness function value, a single L2 norm of a matrix had to be calculated, while with the introduction of the solver the algorithm has to perform additional ANN training procedures during every EA epoch.

Fig. 6. An example of a Pareto frontier obtained from the multiobjective optimization.
VI. CONCLUSION

This paper proposes a novel candidate solution quality evaluation method for the evolutionary algorithm of partial differential equation discovery. The improvements were aimed at increasing the algorithm's applicability to creating data-driven models of dynamical systems. The objective function is evaluated as the norm of the deviations of the PDE solution from the expected input values on a grid in the modeled domain. An optimization-based method is employed to solve the differential equation proposed by the EA. In accordance with the specified differential and boundary operators, which define the boundary problem representing the dynamical system, it can detect the parameters of the function that solves the equation.

The novel approach achieved better performance on artificially noised synthetic data, indicating that the evolutionary algorithm can model natural processes and dynamical systems based on measurement datasets.

The direction of future work on the related topic is guided by the high computational costs of the proposed approach. Its performance may be refined, and the program realization may be optimized, to increase the viability of applying the evolutionary algorithm with the PDE solver as the fitness function evaluation tool to practical tasks. Another promising direction is the application of techniques that reduce the total number of required PDE solving launches by estimating the quality of new candidates from comparison with the previously processed ones.

DATA AND CODE AVAILABILITY

The numerical solution data and the Python code that partially reproduce the experiments are available at the GitHub repository https://github.com/ITMO-NSS-team/EPDE.

ACKNOWLEDGEMENTS

This work was supported by the Analytical Center for the Government of the Russian Federation (IGK 000000D730321P5Q0002), agreement No. 70-2021-00141.
entropy
Article
Towards Generative Design of Computationally Efficient
Mathematical Models with Evolutionary Learning
Anna V. Kalyuzhnaya *, Nikolay O. Nikitin, Alexander Hvatov, Mikhail Maslyaev, Mikhail Yachmenkov and
Alexander Boukhanovsky

Nature Systems Simulation Lab, National Center for Cognitive Research, ITMO University, 49 Kronverksky Pr.,
197101 St. Petersburg, Russia; nnikitin@itmo.ru (N.O.N.); alex_hvatov@itmo.ru (A.H.);
mikemaslyaev@itmo.ru (M.M.); mmiachmenkov@itmo.ru (M.Y.); boukhanovsky@mail.ifmo.ru (A.B.)
* Correspondence: anna.kalyuzhnaya@itmo.ru

Abstract: In this paper, we describe the concept of generative design approach applied to the
automated evolutionary learning of mathematical models in a computationally efficient way. To
formalize the problems of models’ design and co-design, the generalized formulation of the modeling
workflow is proposed. A parallelized evolutionary learning approach for the identification of
model structure is described for the equation-based model and composite machine learning models.
Moreover, the involvement of the performance models in the design process is analyzed. A set
of experiments with various models and computational resources is conducted to verify different
aspects of the proposed approach.

Keywords: generative design; automated learning; evolutionary learning; co-design; genetic programming


Citation: Kalyuzhnaya, A.V.; Nikitin, N.O.; Hvatov, A.; Maslyaev, M.; Yachmenkov, M.; Boukhanovsky, A. Towards Generative Design of Computationally Efficient Mathematical Models with Evolutionary Learning. Entropy 2021, 23, 28. https://dx.doi.org/10.3390/e23010028
Received: 9 November 2020; Accepted: 24 December 2020; Published: 27 December 2020
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Nowadays, data-driven modeling is a very popular concept, first of all because of the many examples of its successful application for a wide range of tasks where the available data samples are sufficient for model training. However, originally the term "modeling" assumes a wider meaning than just identifying numerical coefficients in equations. One may say that modeling is the art of creating mathematical models that describe processes, events, and systems with mathematical notation. Current successes of artificial intelligence (AI) give the opportunity to come closer to solving the task of mathematical modeling in this original formulation.

For this purpose we may use the approach of generative design, which assumes the open-ended automatic synthesis of new digital objects, or digital reflections of material objects, which have desired properties and are aligned with possible restrictions. Open-ended evolution is a term that assumes the ongoing generation of novelty: new adaptations of specimens, new entities, and the evolution of evolvability itself [1]. We assume that new objects are objects with essentially new features that appeared during the adaptation process and that cannot be obtained with simple tuning or recombination of initially known parameters. In other words, it is an approach that aims at the algorithmic "growing" of a population of new objects, each of which is aligned with the restrictions and, to some extent, has the desired properties. However, only the objects which maximize the measure of fitness will be used for their intended purpose. Generative design is a well-known concept for the creation of digital twins of material objects [2]. The same idea can be applied to mathematical models [3]. Indeed, it is known that we may grow mathematical expressions that approximate some initial data with a symbolic (usually polynomial) regression approach. However, if we look at mathematical expressions from a wider perspective, we may admit that expressions could be different and even much more complicated. For example, we may try to apply this approach to the problem of searching for an equation of mathematical physics that is able to describe observed phenomena. Or, we may want to create in an automated



way a complicated data-driven model that consists of many single models and feature
processing stages. Tasks in both examples can be formalized as the generative design of
computer models.
Both cases (a model as a mathematical equation and complicated data-driven models) have their own spheres of application, but they can also be joined as composite models. In machine learning, the composite model case is often described in terms of multi-model data-driven pipelines. If a single data-driven model cannot provide appropriate results, various ensembling techniques like stacking or blending are applied [4]. To achieve better quality, complex modeling pipelines can be used that include different pre-processing stages and can contain several types of models. A generalization of ensembling approaches is the composite model concept [5]. A composite model has a heterogeneous structure, so it can include models of different nature: machine learning (ML), equation-based, etc. [6].
The design of a composite model can be represented from an automated ML (AutoML) perspective that may use a genetic algorithm for learning the structure. The evolutionary learning approach seems to be a natural and justified choice for several reasons. First of all, the idea of generative design refers to the possibility of controlled open-ended evolution under a set of restrictions. Secondly, genetic algorithms give flexible opportunities for treating mixed problems with combinatorial and real parts of a chromosome.
However, the design of the composite model may depend on different factors: the desired modeling quality, computational constraints, time limits, interpretability requirements, etc. This raises the problem of co-design [7] of the automatically generated composite models with the specific environment. Generative co-design is an approach which allows jointly synthesizing a set (mostly a pair) of objects that will be compatible with each other. In the context of this article, these are mathematical models and computational infrastructure. The conceptual difference between the generative design (that builds the model on the basis of the dataset only) and the generative co-design (that takes into account both data and infrastructure) is illustrated in Figure 1. The structure of composite models can be very complex, so it is complicated to construct the models in an expert way. For this reason, different optimization techniques are used for the structural learning of the model. Usually, the objective function for optimization aims to minimize the error of the predictions obtained from the candidate model [8].

Figure 1. The description of the generative co-design concept: the different aspects of the model design (genotype,
phenotype, and the identification methods); the pipeline of the data-driven modeling; the difference between classical
design approach and co-design approach.

The paper is organized as follows. Section 2 describes the existing approaches to the
design of models. Section 3 provides the mathematical formulation for the model’s design
and co-design tasks and associated optimization problems. Section 4 describes the actual
issues of generative co-design for the mathematical models. Section 5 provides the results
of experimental studies for different applications of generative design (composite models,
equation-based models, etc). The unsolved problems of co-design and potential future
works are discussed in Section 6. Section 7 provides the main conclusions.

2. Related Work
An extensive literature review shows many attempts at mathematical model design in different fields [9,10]. In particular, methods of automated model design are a highly valuable part of various studies [11]. As an example, the equation-free methods allow building models that represent multi-scale processes [12]. Another example is the building of physical laws from data in the form of a function [13], a system of ordinary differential equations [14], or partial differential equations (PDE) [15]. The application of the automated design of ML models or pipelines (which are algorithmically close notions) is commonly named AutoML [8]; although most AutoML tools work with models of fixed structure, some give the opportunity to automatically construct relatively simple ML structures. A convenient notation for this purpose is the representation of a model as a directed acyclic graph (DAG) [16]. Another example of a popular AutoML tool for pipeline structure optimization is TPOT [17].
To build the ML model effectively in a complicated high-performance environment [18], the properties of both the algorithms and the infrastructure should be taken into account. It is especially important for non-standard infrastructural setups: embedded [19], distributed [20], and heterogeneous [21] systems. Moreover, the adaptation of the model design to specific hardware is an actual problem for deep learning models [22,23].
However, the application of co-design approaches [24] for the generative model
identification in the distributed or supercomputer environment [25,26] is still facing a
lot of issues. For example, the temporal characteristics of the designed models should
be known. The estimations of fitting and simulation time of the data-driven models can
be obtained in several ways. The first is the application of the analytical performance
models of the algorithm [27]. The identification of the analytical performance models can
be achieved using domain knowledge [28]. However, it can be impossible to build this
kind of model for the non-static heterogeneous environment. For this reason, the empirical
performance models (EPMs) are highly applicable to the different aspects of the generative
model design [29]. Moreover, the effective estimation of execution time is an important
problem for the generation of optimal computational schedule [30] or the mapping of
applications to the specific resources [31].
The execution of the complex resource-consuming algorithms in the specific infras-
tructure with limited resources raises the workflow scheduling problem [32]. It can be
solved using an evolutionary algorithm [33] or neural approaches [34].
It can be noted that the existing design and co-design approaches are mostly focused
on the specific application and do not consider the design for the different types of mathe-
matical models. In the paper, we propose the modified formulation of this problem that
allows applying the generative design and co-design approaches to the different tasks
and models.

3. Problem Statement
A problem of the generative design of mathematical models requires a model repre-
sentation as a flexible structure and appropriate optimization methods for maximizing a
measure of the quality of the designed model. To solve this optimization problem, different
approaches can be applied. The widely used approach is based on evolutionary algorithms
(e.g., genetic optimization implemented in TPOT [35] and DarwinML [16] frameworks) be-
cause it allows solving both exploration and exploitation tasks in a space of model structure

variants. Other optimization approaches, like random search or Bayesian optimization, can also be used, but the populational character of evolutionary methods makes it possible to solve generative problems in a multiobjective way and produce several candidate models. Such a formulation can also be successfully treated with evolutionary algorithms or hybrid ones that combine the use of evolutionary operators with additional optimization procedures for increased robustness and accelerated convergence. In this section, we describe the problem of generative co-design of mathematical models and computational resources in terms of the genetic programming approach.
A general statement of the numerical simulation problem can be formulated as follows:

Y = H( M| Z ), (1)

where H is an operator of simulation with model M on data Z.


In the context of the problem of computer model generative design, the model M should have a flexible structure that can evolve by changing (or adding/eliminating) the properties of a set of atomic parts ("building blocks"). For such a task, the model M can be described as a graph (or, more precisely, as a DAG):

M = ⟨S, E, {a_{1:|A|}}⟩,   (2)

with edges E that denote the relations between nodes, functional properties S of the atomic blocks, and the set of their parameters {a_{1:|A|}}.

In terms of evolutionary algorithms, each specimen d_p in a population D of computer models can be represented as a tuple that consists of a phenotype Y_p, a genotype M_p, and a fitness function φ(M_p):

d_p = ⟨Y_p, M_p, φ(M_p)⟩,   D = {d_p}, p ∈ [1:|D|].   (3)

The genotype M should be mapped onto a plain vector as a multi-chromosome that consists of three logical parts: functional properties, the sets of their parameters, and the relations between blocks:

M_p = ⟨S_p, E_p, {A_k}_p⟩ = ⟨{s_{1:|S_p|}}_p, {e_{1:|E_p|}}_p, {a_{1:|A_k|}}_{k, p}⟩,   A_k = {a_{1:|A_k|}}, k ∈ [1:|S_p|].   (4)

The genotype is also illustrated in Figure 2.

Figure 2. The structure of the genotype during evolutionary optimization: functional properties, set of parameters and
relations between atomic blocks.
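For illustration, the genotype of Equation (4) can be mirrored directly in a small data structure. The sketch below (plain Python; all names are illustrative and not tied to any of the frameworks mentioned later) encodes atomic blocks with their parameter sets and the DAG edges between them:

from dataclasses import dataclass, field

@dataclass
class AtomicBlock:
    function: str                                # functional property s_k
    params: dict = field(default_factory=dict)   # parameter set A_k

@dataclass
class CompositeModel:
    blocks: list        # S_p with the attached parameter sets {A_k}_p
    edges: list         # E_p as (parent, child) index pairs of the DAG

model = CompositeModel(
    blocks=[AtomicBlock("scaling"),
            AtomicBlock("ridge", {"alpha": 1.0}),
            AtomicBlock("xgboost", {"max_depth": 3})],
    edges=[(0, 1), (1, 2)])    # a simple three-node chain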

An important property is that ⟨|S_p|, |A_p|, |E_p|⟩ ≠ const, which means a varying overall size of the chromosome (and of its structure). Such a property makes this approach really open-ended and consistent with the idea of model evolution, because it gives an opportunity to synthesize models with truly new features instead of a simple recombination and optimization of existing ones. Technically, open-endedness here refers to the ability of generative design algorithms to expand or narrow a combinatorial search space in the process of optimization with evolutionary operators. This leads to the need for special realizations of the crossover and mutation operators. As the chromosome M_p is an ordered set with the structure fixed in a tuple ⟨S, {a_{1:|A|}}, E⟩, it is necessary to preserve this structure after crossover and mutation. That is why these operators are written relative to the graph structure and act on the parts of the chromosome that describe a node or a set of nodes with associated edges (sub-graphs). We may say that each single node can be described as some function with parameters y_k = f_k(x, {a_{1:|A|}}_k), and a mutation of the function f_k performs symbolic changes in the mathematical expression that result in an extension of the range of limits of the initial genes.
So, the task of mathematical model generative design can be formulated as an optimization task:

p_{Q max}(M*) = max_M f_Q(M | I+, T_gen ≤ t_g),   M ∈ {M_p},   (5)

where f_Q is a fitness function that characterizes the quality of the generated mathematical model, p_{Q max} is the maximal value of the fitness function, {M_p} is the space of possible model structures, I+ denotes the actual computational resources, and T_gen is the time for model structure optimization with critical threshold t_g. In such a formulation we try to design the model with the highest quality, but the optimization is tied to a single configuration of computational resources. This factor is a strong limitation for the idea of generative design, because this idea assumes flexibility of the searched solution, including the possibility to find the combination of model structure and computational resources that is most appropriate for the applied task. The concept was illustrated in Figure 1.
Model and computational infrastructure co-design may be formulated as follows:

p_max(M*, I*) = max_{M,I} F(M, I | T_gen ≤ t_g),   I ∈ {I_q}, M ∈ {M_p},   (6)

where I is a set of infrastructure features and F is a vector fitness function that characterizes the trade-off between the goodness of fit and the computational intensity of the model structure. The vector function F consists of the quality function f_Q and the time function f_T, the latter taken with a negative sign for correct maximization:

F(M, I) = (f_Q(M, I), −f_T(M, I)).   (7)

The time function f_T shows the expected execution time of the model that is being synthesized with the generative design approach. As the model M is still in the process of creation at the moment we want to estimate F, the values of f_T may be defined by performance models (e.g., Equation (9)). An example of model selection from the Pareto frontier on the basis of the p_max and t_c constraints is presented in Figure 3. It can be seen that model M4 has better quality, but it does not satisfy the execution time constraint t_c.
However, in most cases the correct reflection of infrastructure properties onto model performance is quite a complicated task. In the described case, when we need to generate a model with appropriate quality under vital limitations on computation time, we face several issues: (1) we may not be able to estimate model performance with respect to a certain infrastructure in a straightforward way, and as a consequence we need performance models; (2) the estimation of the dependency between model structure and computational resources reflects only a mean tendency, due to the number of simplifications in performance models, and a search for minima on such uncertain estimations leads to unstable convergence to local minima. Due to these issues, the formulation of the optimization for co-design at the model building stage may be simplified to the single-criterion problem F(M, I | T_gen ≤ t_g) ≈ F'(M | T_M ≤ t, T_gen ≤ t_g)

with a change from the direct usage of infrastructure features to the estimated time of model execution via performance models, T_M ≈ T = f_T(M, I):

p̂_max(M*) = max_M f̂_Q(M | T_M ≤ t_c, T_gen ≤ t_g),   (8)

where f̂_Q is a single-criterion fitness function that characterizes the goodness of fit of the model, with additional limitations on the expected model execution time T_M and the estimated time T_gen for structural optimization.

Figure 3. Pareto frontier obtained after the evolutionary learning of the composite model in the "quality-execution time" subspace. The points referred to as M1-M4 represent the different solutions obtained during optimization; pmax and tc represent the quality and time constraints.

In the context of automated models building and their co-design with computational
resources, performance models (PM) should be formulated as a prediction of expected
execution time with the explicit approximation
n o of a number of operations as a function
of computer model properties S, a1:|S| and infrastructure I parameters. However, for
different computer models classes, there are different properties of performance models. In
the frame of this paper, we address the following classes of models: ML models, numerical
models (based on the numerical solution of symbolic equations), and composite models
(that may include both ML and numerical models).
For ML models PM can be formulated as follows:
" #
PM OMLi,it
TML ( Z, M) = maxi  + O ( I ), (9)
it
Vi ( I ) + Consti ( I )

where OML = OML( Z, M) is an approximate number of operations for data-driven model


with data volume Z and parametric model M, it—iterator for learning epoch, Vi ( I ) is for
performance of i0 th computational node in flops, Consti ( I ) is for constant overheads for
i0 th node in flops, O( I ) is for sequential part of model code.

According to the structure M = ⟨S, E, {a_{1:|S|}}⟩, for the data-driven model case the duple ⟨S, E⟩ characterizes the structural features of the models (e.g., activation functions and layers in neural networks), and {a_{1:|S|}} characterizes the hyper-parameters.

For numerical models, the PM can be formulated as follows:

T^{PM}_{Num}(R, M) = max_i [ ON_i / (V_i(I) + Const_i(I)) ] + O(I),   (10)

where ON = ON(R, M) is the approximate number of operations for the numerical model. In distinction from ML models, numerical models do not require learning epochs and do not have a strong dependency on the volume of input data. Instead, there are internal features of the model M, and it is worth separately denoting the computational grid parameters R. They include the parameters of the grid type and the spatial and temporal resolution. Among the most influential model parameters M are the type and order of the equations and the features of the numerical solution (e.g., numerical scheme, integration step, etc.).

For composite models, the total expected time of the PM is a sum of the expected times for the sequential parts of the model chain:

T^{PM}_{Comp}(R, Z, M) = Σ_j max_i [ OC_{i,j} / (V_i(I) + Const_i(I)) ] + O(I),   (11)

where the expected time prediction for each sequential part is based on the properties of the appropriate model class:

OC = { OML, if the model is ML;  ON, if the model is numerical }.   (12)
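As a simple illustration, the stage-wise maxima of Eqs. (10)-(11) can be evaluated directly. In the sketch below (plain Python; all operation counts, node speeds, and overheads are made-up numbers, not measurements), a composite model is a chain of stages, each estimated as in Eq. (10) and summed as in Eq. (11):

def t_stage(ops_per_node, v, const, o_seq=0.0):
    # Eq. (10): max over nodes of ON_i / (V_i(I) + Const_i(I)), plus O(I)
    return max(o / (vi + ci) for o, vi, ci in zip(ops_per_node, v, const)) + o_seq

def t_composite(stages, v, const, o_seq=0.0):
    # Eq. (11): per-stage estimates of the sequential parts are summed
    return sum(t_stage(ops, v, const) for ops in stages) + o_seq

v = [2.0e9, 1.8e9]           # node performance V_i(I), flop/s (made up)
const = [1.0e8, 1.0e8]       # constant overheads Const_i(I) (made up)
stages = [[4.0e9, 3.9e9],    # per-node operation counts of stage 1
          [1.2e10, 1.1e10]]  # per-node operation counts of stage 2
print(t_composite(stages, v, const, o_seq=0.05))   # expected time in seconds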

4. Important Obstacles on the Way of Generative Co-Design Implementation


It may seem that the problem statement described above gives us a clear vision of an
evolutionary learning approach for generative design and co-design. However, several
subtle points should be highlighted. This section is devoted to a discussion of the most
interesting and challenging points (in the authors' opinion) that affect the efficiency or even the possibility of implementing the generative design (and co-design) approach for growing new mathematical models.

Issue 1. Numerical Methods for Computation of Designed Arbitrary Function


The open-ended realization of automated symbolic model creation with a generative design approach leads to the possibility of getting an unknown function as the resulting model. On the one hand, this gives interesting perspectives for creating new approximations of unknown laws. On the other hand, this possibility leads to the first conceptual problem of the generative design of mathematical models and a serious stumbling block on the way to implementing this idea: the need to calculate an arbitrary function or to get the numerical solution of an arbitrary equation.

The choice of the numerical method for a given problem (discovered algebraic, ordinary differential, or partial differential equations) is the crucial point. In most cases, a numerical method is designed to solve only several types of equations. When the numerical method is applied to a problem type for which a convergence theorem is not proved, the result may not be considered a solution.
As an example, consider the solution of partial differential equations using finite-difference schemes. For brevity, we omit the details and particular equations; the reader is referred to [36]. The classical one-dimensional diffusion equation has different schemes, in particular the explicit, implicit, and Crank-Nicolson schemes. Every scheme has a different approximation order and may lead to different solutions depending on the time-spatial grid taken. If the Crank-Nicolson spatial derivative scheme is taken to solve another equation, for example the one-dimensional string equation, then the solution will also

depend on the time-spatial grid taken, however in another manner. This leads to the general problem that particular finite schemes cannot be used for a general equation solution.

The second approach is to approximate the solution with a neural network, which somewhat mimics the finite element method. Neural networks are known as universal approximators. However, their utility for the solution of differential equations is still arguable. The main problem is that a good approximation of the field does not necessarily lead to a good approximation of the derivatives [37]. There are a lot of workarounds to approximate the derivatives together with the initial field; however, this is done with a loss of generality.

A possible promising solution is to combine optimization methods, local neural network approximation, and the classical approach [38]. However, there are still a lot of "white spots", since an arbitrary equation means a strongly non-linear equation with arbitrary boundary conditions. Such generality cannot be achieved at the current time and requires significant development of differentiation, approximation, and numerical evaluation methods. Illustrative examples of the inverse problem solution are shown in Section 5.1.

Issue 2. Effective Parallelization of Evolutionary Learning Algorithm


The procedure of generative design has a high computational cost; thus, an effective algorithm realization is highly demanded. Efficiency can be achieved primarily by parallelizing the algorithm. As discussed, the generative algorithm is implemented on the basis of the evolutionary approach, so the first way is the computation of each specimen d_p of a population in a separate thread. Strictly speaking, these may be not only threads but also separate computational nodes of a cluster; however, in order not to confuse computer nodes with the nodes of a model graph M_p, here and further we use the term "thread" in a wide sense. This way is the easiest to implement but will be effective only in the case of cheap computations of the objective function φ(M).

The second way is the acceleration of each model M_p at the level of its nodes ⟨S, {a_{1:|A|}}⟩ with the possibility of logical parallelization. However, this way seems to be the most effective if we have uniform (from the performance point of view) nodes of the models M_p and a computational intensity appropriate for the used infrastructure (in other words, each node should be computed in a separate thread in acceptable time). For composite models and numerical models this condition is often violated. Usually, a numerical model consists of differential equations that should be solved on large computational grids, and composite models may include nodes that are significantly more computationally expensive than others. All these features lead us to take into account the possibility of parallelization of the generative algorithm on several levels: (1) the population level, (2) the model M_p level, and (3) the level of each node ⟨S, {a_{1:|A|}}⟩, and to adapt the algorithm with respect to the certain task.
Moreover, for the effective acceleration of the generative algorithm, we may take into account that most of the new composite models are based on nodes that are repeated numerously in the whole population. For such a case, we may provide storage for computed nodes and use them as results of pre-built models (see the sketch after this paragraph). An illustration of the ineffective and effective parallelization setups described above is shown in Figure 4. The set of experiments that illustrates the problem raised in this issue and proposes possible solutions is presented in Section 5.2.2.
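A minimal sketch of such node-level caching (plain Python; the fingerprint scheme and the placeholder computation are illustrative assumptions, not the actual framework mechanism):

cache = {}

def expensive_fit(function, params):
    # placeholder for the real node computation (model fitting, simulation)
    return 0.0

def evaluate_node(function, params):
    key = (function, tuple(sorted(params.items())))   # block fingerprint
    if key not in cache:
        cache[key] = expensive_fit(function, params)  # computed only once
    return cache[key]

evaluate_node("ridge", {"alpha": 1.0})   # computed
evaluate_node("ridge", {"alpha": 1.0})   # served from the cache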

Issue 3. Co-Design of an Evolutionary Learning Algorithm and Computational Infrastructure


In the frame of this research, the problem of co-design appears not only for the question of the automatic creation of the computer model but also for the generative optimization algorithm itself. In Equation (8) we described the co-design of the generated model with regard to the computational resources, using the estimation of the model execution time T_M. A separate problem is the adaptation of the generative evolutionary algorithm to the computational resources and the specific traits of the certain task. In the formulation of Equation (8) it was accounted for only by the restriction on the overall time T_gen for model generation. However, the task

can be formulated as a search for the generative evolutionary algorithm that is able to find the best model structure M in the limited time T_gen. This task can be solved by the optimization of hyper-parameters and evolutionary operators (and the strategy of their usage) for the generative optimization algorithm, and formulated as a meta-optimization problem over a set of possible algorithms U that are defined by a set of strategies B:

U = {u(B)},   B = {b_{1:|B|}},   b = ⟨H, R⟩,   (13)

u* = u(b*) = arg max_b F̃(u(b) | T_gen ≤ t_g),   (14)

where F̃ is a meta-fitness function and each strategy b is defined by evolutionary operators R and hyper-parameters H. Evolutionary operators may also be described as hyper-parameters, but here we subdivide them into the separate entity R.

Figure 4. Setup that illustrates inefficiency of the parallel evolution implementation due to fitness function computation
complexity.

For the model’s generative design task, the most expensive step usually refers to the
evaluation of the fitness function value [6]. The calculation of the fitness function for the
individuals of the evolutionary algorithm can be parallelized in different ways that are
presented in Figure 5.

Figure 5. Approaches to the parallel calculation of fitness function with the evolutionary learning algorithm: (a) syn-
chronously, each element of the population is processed at one node until all is processed (b) asynchronously, one of the
nodes controls the calculations in other nodes.

The described approaches can be used for the different variants of the computational environment used for the generation of the models. The practical application of the generated models with a complex structure is almost always difficult because of the high computational complexity of the numerical model-based simulations.

There are several groups of models that can be separated by the simulation pipeline structure. For the data-driven model, the computational cost of the fitting (identification) stage is higher than for the simulation stage. For the equation-based numerical models with a rigid structure, there is no implicit fitting stage, but the simulation can be very expensive. In practice, different parallelization strategies can be applied to improve simulation performance [39].

The set of experiments that provides the examples for the problem raised in this issue can be seen in Section 5.3.

Issue 4. Computational Strategies for Identification of Graph M


The problem of the identification of the DAG M = ⟨S, E, {a_{1:|S|}}⟩ has two sides. First of all, the task of structural and parametric optimization of the model M has exponential computational complexity with the growth of the number of nodes. Even if the functional structure ⟨S, E⟩ of the composite model is already identified, there remains the computationally expensive problem of tuning the parameters {a_{1:|S|}} (or hyperparameters in ML terms).

However, besides the computational intensity, there is the problem of searching for the optimal set of values ⟨S*, E*, {a*_{1:|S|}}⟩ in a space of high dimension (when the chromosome has a great length, from tens to hundreds of values). This leads to unstable results of the optimization algorithm because of the exponential growth of the number of possible solutions in a combinatorial space (some parameters could be continuous, but they are discretized, and generally the problem may be treated as combinatorial). One of the obvious ways of dealing with such a problem is local dimensionality reduction (or segmentation of the whole search space). This can be done with the application of various strategies. For example, we may simplify the task and search with the generative algorithm only for the functional parts, while the parameters (hyperparameters) are optimized at the model execution stage (as discussed in Section 6). Such a way is economically profitable, but we will get a result with lower fitness. An alternative variant is to introduce an approach with iterative segmentation of the domain space and a greedy-like search on each batch (Section 5.4).
Another point should be taken into account: the structure of a DAG with directed edges and ordered nodes (a composite model with primary and secondary nodes) leads to the necessity of introducing sequential strategies for parameter tuning. Although the tuning can be performed simultaneously with the structural learning, a common approach is to apply it to the best candidates only [16]. Unlike the tuning of individual models, the tuning of composite models with a graph-based structure can be performed with different strategies, represented in Figure 6.
[Figure 6 diagram; legend: T - tuning quality evaluation, M - modeling result.]


Figure 6. The different strategies of hyper-parameters tuning for the composite models: (a) individual tuning for each
atomic model (b) the tuning of the composite model that uses secondary models to evaluate the tuning quality for the
primary models.

The experiment that demonstrates the reduction of the search space for the composite
model design by the application of the modified hyperparameters tuning strategy for the
best individual is described in Section 5.4.

Issue 5. Estimation of PM for Designed Models


Analytic formulations of PM show the expected execution time based on the relation between the approximate number of operations and the computational performance of a certain infrastructure configuration. The problem is to estimate this relation for all pairs of model structures M = ⟨S, E, {a_{1:|A|}}⟩ and computational resources I ∈ {I_q} with respect to the input data Z, because we need to make estimations of OML, ON, and OC (depending on the class of models). Generally, there are two ways: (1) estimation of the computational complexity (in O-notation) for each model M; (2) empirical performance model (EPM) estimation of the execution time for every specimen ⟨M, I, Z⟩. The first option gives theoretically proved results, but it can hardly be implemented in the case of models' generative design, when we have too many specimens ⟨M, I, Z⟩. The second option is to make a set of experimental studies measuring the execution time of specimens ⟨M, I, Z⟩. However, in this case we would need to make a huge number of experiments before starting the algorithm of generative co-design, and the problem statement would become meaningless. To avoid numerous experiments, we may introduce an estimation of the EPM that consists of two steps. The first one is to estimate the relation between the time T_M and the volume of data Z: T^{PM}_{Num}(M, Z, I) ≈ T^{EPM}_{Num}(Z | M, I). To simplify the identification of T^{EPM}_{Num}(Z | M, I), we would like to approximate it with a linear function with non-linear kernels ψ(Z):

T^{EPM}_{Num}(Z | M, I) = Σ_{w=1}^{W} ω_w ψ_w(Z),   (15)

where W is the number of components of the linear function. The second step is to use the value of T^{EPM}_{Num}(Z | M, I) to estimate the relation between the execution time and the infrastructure I: T^{EPM}_{Num}(Z | M, I) → T^{EPM}_{Num}(I | M, Z). For this purpose we should make even a raw estimation of the number of operations OML, ON, and OC.
On the example of the EPM for the numerical model (Equation (10)), we can make the following assumptions:

O(I) ≈ 0,   Const_i(I) ≈ 0,   V = mean_i(V_i(I)),   ON = mean_i(ON_i),   (16)

max_i [ ON_i / (V_i(I) + Const_i(I)) ] = mean_i [ ON_i / (V_i(I) + Const_i(I)) ],   (17)

and get the following transformations for a raw estimation of the overall number of operations nON with respect to n computational nodes:

nON(M, Z) = n T^{PM}_{Num}(M, Z, I) V(I),   i ∈ [1:n].   (18)

It is worth noting that the obvious way to improve the accuracy of the estimation of nON is to use for the experimental setup resources with characteristics of computational performance close to V = mean_i(V_i(I)) and task partitioning close to ON = mean_i(ON_i). Having the estimation of nON and the infrastructure parameters V_i(I), Const_i(I), O(I), we may come to a raw estimation:

T^{EPM}_{Num}(M, Z, I) = max_i [ α_i nON(M, Z) / (V_i(I) + Const_i(I)) ] + O(I),   (19)

where α_i is the coefficient of model partitioning. Similar transformations could be made for the other models.
The experiments devoted to the identification of the empirical performance models
for both atomic and composite models are provided in Section 5.5.
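As an illustration of the first step of this estimation, the weights ω_w of Eq. (15) can be fitted by ordinary least squares over measured (Z, T) pairs. In the sketch below (assuming NumPy), both the kernel set and the measurements are made-up examples, not data from the experiments:

import numpy as np

Z = np.array([1e4, 5e4, 1e5, 5e5, 1e6])     # data volumes (made-up sample)
T = np.array([0.8, 3.9, 8.1, 44.0, 95.0])   # measured times, s (made-up)

def psi(z):
    # a sample kernel set psi_w(Z); the choice of kernels is an assumption
    z = np.atleast_1d(z)
    return np.column_stack([z, z * np.log(z), np.ones_like(z)])

w, *_ = np.linalg.lstsq(psi(Z), T, rcond=None)   # weights omega_w of Eq. (15)

def t_epm(z):
    return psi(z) @ w          # predicted execution time T_EPM(Z | M, I)

print(t_epm(2e5))              # prediction for an unseen data volume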

5. Experimental Studies
The proposed approaches to the co-design of generative models cannot be recognized as effective without experimental evaluation. To conduct the experiments, we constructed a computational environment that includes a hybrid cluster and several multiprocessor nodes that can be used to evaluate different benchmarks.

A set of experiments was held with the algorithm of data-driven partial differential equation discovery to analyze its performance with different task setups. All experiments were conducted using the EPDE framework described in detail in [15].
The other set of experiments, devoted to the automated design of ML models, was conducted using the open-source Fedot framework (https://github.com/nccr-itmo/FEDOT). The framework allows generating composite models using evolutionary approaches. A composite model generated by the framework can include different types of models [6]. The following parameters of the genetic algorithm were used during the experiments (restated as a configuration sketch below): the maximum number of generations is 20, the number of individuals in each population is 32, the probability of mutation is 0.8, the probability of crossover is 0.8, the maximum arity of the composite model is 4, and the maximum depth of the composite model is 3. A more detailed setup is described in [40].
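The setup quoted above, written down as a plain configuration dictionary (illustrative key names, not Fedot's actual parameter names):

ga_params = {
    "max_generations": 20,
    "population_size": 32,
    "mutation_probability": 0.8,
    "crossover_probability": 0.8,
    "max_arity": 4,    # maximum arity of the composite model
    "max_depth": 3,    # maximum depth of the composite model
}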

5.1. Choice of the Model Evaluation Algorithm


The first group of experiments is connected with Issue 1, which describes the different aspects of the numerical computation of designed models.

For example, the problem of data preprocessing for partial differential equation models, represented by the calculation of the derivatives of the input field, is of top priority for the correct operation of the algorithm: an incorrect selection of tools can amplify the noise present in the input data or produce high numerical errors. An imprecise evaluation of the equation factors can lead to cases when a wrong structure has a lower equation discrepancy (the difference between the selected right-part term and the constructed left part) and, consequently, higher fitness values than the correct governing equation.

However, the versatility of numerical differentiation brings a second criterion on board. Finite differences require a lot of expertise to choose, and thus their automatic use is restricted, since the choice of the finite-difference scheme is not a trivial task: it requires either a fine grid to reduce the error or the choice of a particular scheme for the given problem. Both ways require extended time.
Artificial neural networks (ANN), used to approximate the initial data field, are an alternative to this approach and can have a number of significant advantages. To get the fields of derivatives, we utilize automatic differentiation, which is based on an approach similar to the chain rule from elementary calculus and is able to combine the evaluated derivatives of the functions comprising the neural network to get the "correct" values of the derivatives. In contrast to the previously used method of analytical differentiation of polynomials, automatic differentiation is able to get mixed derivatives. From the performance point of view, the advantages of artificial neural networks lie in the ease of parallelization of tensor calculations and in the use of graphical processing units (GPU) for computation.

However, the task setup poses a number of challenges for the ANN training. First of all, the analyzed function is observed on a grid; therefore, we can have a rather limited set of training data. Interpolation approaches can alter the function defining the field, and the derivatives in that case will represent the structure of the interpolating function. Next, the issue of the approximation quality remains unsolved. While an ANN can decently approximate a function of one variable (which is useful for tasks of ordinary differential equation discovery), in the multivariable problem statement the quality of the approximation is relatively low. An example of the approximation is presented in Figure 7. In the conducted experiments [41] we have used an artificial neural network with the following architecture: the ANN was comprised of 5 fully connected layers of 256, 512,

256, 128, and 64 neurons with sigmoid activation functions. As the input data, the values of the solution function of a wave equation (u_tt = a u_xx), solved with the implicit finite-difference method, were utilized. Due to the nature of the implemented solution method, the function values were obtained on a uniform grid. The training of the ANN was done for a specified number of epochs (500 in the conducted experiments); at each epoch the training batch was randomly selected as a proportion of all points (0.8 of the total number of points). To obtain the derivatives, the automatic differentiation methods implemented in the Tensorflow package are applied to the trained neural network.
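A minimal sketch of this setup (assuming TensorFlow/Keras; the training field below is an analytic stand-in for the wave-equation solution, and the per-epoch random-subset batching of the experiments is replaced by standard mini-batching):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(n, activation="sigmoid") for n in (256, 512, 256, 128, 64)]
    + [tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

pts = np.random.rand(2000, 2).astype("float32")    # stand-in for the (t, x) grid nodes
u_vals = np.sin(pts[:, :1]) * np.cos(pts[:, 1:])   # stand-in for the field values
model.fit(pts, u_vals, epochs=500, batch_size=256, verbose=0)

xt = tf.constant(pts)
with tf.GradientTape() as tape:    # automatic differentiation of the trained net
    tape.watch(xt)
    u = model(xt)
du = tape.gradient(u, xt)          # du[:, 0] ~ u_t, du[:, 1] ~ u_x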

Figure 7. Comparison of the equation solution and its approximation by artificial neural networks (ANNs) for a time slice (a) and a heatmap of the approximation error (u_approx − u_true) (b).

Even with the presented very good approximation of the original field, the first
derivatives (Figure 8) are obtained with decent quality and may serve as the building
blocks. However, it is seen that the derivative field is significantly biased.

Figure 8. Comparison of derivatives obtained by polynomial differentiation and by automatic differentiation of the ANN: first time derivative (a) and first spatial derivative (b) for a time slice (t = 50).

Further differentiation amplifies the error. The higher-order derivatives shown in Figure 9 cannot be used as the building blocks of the model and do not represent the derivatives of the initial data field.



Figure 9. Comparison of derivatives obtained by polynomial differentiation and by automatic differentiation of the ANN: second time derivative (a) and second spatial derivative (b) for a time slice (t = 50).

Both of the implemented differentiation techniques are affected by the numerical errors inevitable in machine calculations and contain errors linked to the limitations of the method (for example, approximation errors). To evaluate the influence of the errors on the discovered equation structure, the experiments were conducted on the simple ordinary differential equation (ODE) (20) with the solution function (21).

L(t) = x(t) sin t + (dx/dt) cos t = 1,   (20)

x(t) = sin t + C cos t.   (21)
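A quick symbolic check (assuming SymPy) confirms that (21) indeed solves (20) for any value of the constant C:

import sympy as sp

t, C = sp.symbols("t C")
x = sp.sin(t) + C * sp.cos(t)                             # Eq. (21)
residual = x * sp.sin(t) + sp.diff(x, t) * sp.cos(t) - 1  # lhs - rhs of Eq. (20)
print(sp.simplify(residual))                              # -> 0 for any C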
We have tried to rediscover the equation based on data obtained via analytical differentiation of function (21), via the application of polynomial differentiation, and with the derivative calculated by automatic differentiation of the fitted neural network. The series of function values and the derivatives are presented in Figure 10. Here we can see that the proposed ANN can decently approximate the data; the analytical and polynomial differentiation obtain similar fields, while the automatic differentiation algorithm may introduce insignificant errors. 10 independent runs of the equation discovery algorithm have been performed for each derivative calculation method, and the results with the lowest errors have been compared. For the quality metric, the mean squared error of the vector representing the discrepancy of the function x̄(t), which is the solution of the equation M(t) = 0 discovered from the data with the aim of |M(t)| → min, evaluated at the nodes of the grid, was used.
While all of the runs resulted in the successful discovery of the governing equations, the issues with such equations lie in the area of function parameter detection and calculating the correct coefficients of the equation. The best result was achieved on the data from analytical differentiation: MSE = 1.452 · 10^-4. The polynomial differentiation got similar quality, MSE = 1.549 · 10^-4, while the automatic differentiation achieved MSE = 3.236 · 10^-4. It can be concluded that in the case of first-order equations the error of the differentiation is of lower order than all other errors, and thus the fastest method for the given problem may be used. However, in the PDE case it is complicated to use only first-order derivatives, whereas an arbitrary ordinary differential equation may be represented as a system of first-order equations.

Figure 10. The solution of ODE from Equation (20), its approximation by neural network, and derivatives calculated by
analytic, polynomial and automatic differentiation.

5.2. Computationally Intensive Function Parallelization


5.2.1. Parallelization of Generative Algorithm for PDE Discovery
The first experiment is devoted to the parallelization of the atomic models' computation, using the partial differential equation discovery case as an example. As shown in Figure 4, the parallelization of the evolutionary algorithm in some cases does not give a significant speed improvement. In cases where the atomic models are computationally expensive, it is expedient to try to reduce every node computation as much as possible.

The experiment [42] was dedicated to the selection of an optimal method of handling the computational grid domain. It had been previously proven that the conventional approach, when we process the entire domain at once, is able to correctly discover the governing equation. However, with the increasing size of the domain, the calculations may take longer. In this case the parallelization of the evolutionary algorithm does not give a speed-up on a given configuration of computational resources, since the computation of the fitness function of a single gene takes the whole computational capacity.
To solve this issue, we have proposed a method of dividing the domain into a set of spatial subdomains to reduce the computational complexity of a single gene. For each of these subdomains, the structure of the model in the form of a differential equation is discovered, and the results are compared and combined if the equation structures are similar: with insignificant differences in the coefficients or the presence of terms with higher orders of smallness. The main algorithm for the subdomains is processed in a parallel manner due to the isolated method of domain processing: we do not examine any connections between the subdomains until their final model structures are obtained (a sketch of this strategy is given below).
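A minimal sketch of the subdomain strategy (assuming NumPy and the standard multiprocessing module; discover_structure is a hypothetical stand-in for one per-subdomain discovery run, here replaced by a toy statistic so the sketch stays runnable, and the random field replaces the actual wave-equation data):

import numpy as np
from multiprocessing import Pool

def split(field, parts):
    # split a (t, x, y) data cube into parts * parts spatial subdomains
    return [b for strip in np.array_split(field, parts, axis=1)
            for b in np.array_split(strip, parts, axis=2)]

def discover_structure(subfield):
    # hypothetical stand-in for one equation-discovery launch on a subdomain
    return float(subfield.var())

if __name__ == "__main__":
    field = np.random.rand(80, 80, 80)   # stand-in for the wave-equation data
    with Pool() as pool:
        results = pool.map(discover_structure, split(field, 4))
    # models with similar term structures would then be compared and combined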
The experiments to analyze the algorithm performance were conducted on synthetic
data: by defining the presence of a single governing equation, we exclude the issue of the
existence of multiple underlying processes, described by different equations, in different
parts of the studied domain. So, we have selected a solution of the wave equation with
two spatial dimensions, Equation (22), for a square area, which was processed first as one
domain and after that divided into small fractions of subdomains.

∂²u/∂t² = ∂²u/∂x² + ∂²u/∂y².   (22)
However, the division has its downsides: smaller domains contain less data; therefore,
disturbances (noise) in individual points have a higher impact on the results. Furthermore,
in realistic scenarios, the risk of deriving an equation that describes only a local process
increases with the decrease in domain size. The Pareto front, indicating the trade-off
between the equation discrepancy and the time efficiency, can be utilized to find a
parsimonious setup of the experiment. On noiseless data (we assume that the derivatives
are calculated without numerical error) even the data from a single point will correctly
represent the equation. Therefore, the experiments must be held on data with low but
significant noise levels.
We have conducted the experiments with the partition of data (Figure 11) containing
80 × 80 × 80 values, divided along the spatial axes into fractions from the set {1, ..., 10}. The
experiments were held with 10 independent runs for each setup: the size of the input data
(the number of subdomains into which the domain was divided) and the sparsity constant
(which affects the number of terms of the equation).

Figure 11. The results of the experiments on the divided domains: (a) evaluations of discovered
equation quality for different division fractions along each axis (2× division represents division of
the domain into 4 square parts); (b) domain processing time (relative to the processing of the entire
domain) versus the number of subdomains.

The results of the test, presented in Figure 11, give insight into the consequences of
processing the domain by parts. It can be noticed that, with the split of the data into smaller
portions, the quality of the equations decreases due to “overfitting” to the local noise.
However, in this case, due to higher numerical errors near the boundaries of the studied
domain, the base equation, derived from the full data, has its own errors. By dividing
the area into smaller subdomains, we allow some of the equations to be trained on data
with lower numerical errors and, therefore, to have higher quality. The results presented
in Figure 11b are obtained only for the iterations of the evolutionary algorithm of the
equation discovery and do not represent the differences in time for other stages, such as
preprocessing or further modeling of the process.
We can conclude that the technique of separating the domain into smaller parts and
processing them individually can be beneficial both for achieving a speedup via parallelization
of the calculations and for avoiding equations derived from high-error zones. In this case,
such errors were primarily numerical, but in realistic applications they can be attributed to
faulty measurements or the prevalence of a different process in a local area.

5.2.2. Reducing the Computational Complexity of Composite Models


To execute the next set of experiments, we used the FEDOT framework to build composite
ML models for classification and regression problems. Different open datasets were used
as benchmarks that allow analyzing the efficiency of the generative design in various
situations.
To improve the performance of the model building (this issue was noted in Issue 2),
different approaches can be applied. First of all, caching techniques can be used. The cache
can be represented as a dictionary with the topological description of the model position in
the graph as a key and a fitted model as a value. Moreover, the fitted data preprocessor
can be saved in the cache together with the model. The common structure of the cache is
represented in Figure 12, and a minimal sketch follows the figure.

Figure 12. The structure of the multi-chain shared cache for the fitted composite models. Each fitted
model is described by a structural ID (SID) that depends on the underlying chain structure; the cache
dictionary maps SID keys to cached models and supports the append, clear, and get methods.
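A minimal sketch of such a cache, assuming that every fitted model can be addressed by a structural ID (SID) derived from its position in the chain; the class and method names mirror Figure 12 but are illustrative, not the FEDOT API.

class FittedModelCache:
    def __init__(self):
        self._storage = {}  # SID -> (fitted model, fitted preprocessor)

    def append(self, sid, model, preprocessor):
        self._storage[sid] = (model, preprocessor)

    def get(self, sid):
        # Returns None on a cache miss, in which case the model must be fitted.
        return self._storage.get(sid)

    def clear(self):
        self._storage.clear()

# Usage: before fitting a node, look up its SID; a shared cache would place
# _storage in memory common to all chains (e.g., a multiprocessing.Manager dict).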

The results of the experiments with different implementations of the cache are presented
in Figure 13.

Figure 13. The total number of model fit requests and the actually executed fits (cache misses) for the
shared and local cache.

The local cache allows reducing the number of model fits up to five times compared with the
non-cached variant. The effectiveness of the shared cache implementation is twice as high
as that of the local cache.
The parallelization of the composite models’ building, fitting, and application also
makes it possible to decrease the time devoted to the design stage. It can be achieved
in different ways. First of all, the fitting and application of the atomic ML models can
be parallelized using the features of the underlying framework (e.g., Scikit-learn, Keras,
TensorFlow, etc. [43]), since the atomic models can be very complex. However, this approach
is more effective in shared-memory systems, and it is hard to scale it to distributed
environments. Moreover, not all models can be efficiently parallelized in this way.
Then, the evolutionary algorithm that builds the composite model can be parallelized
itself, since the fitness function for each individual can be calculated independently; a
sketch of this population-level parallelization is given below. To conduct the experiment,
the classification benchmark based on the credit scoring problem (https:
//github.com/nccr-itmo/FEDOT/blob/master/cases/credit_scoring_problem.py) was
used. The parameters of the evolutionary algorithm are the same as described at the
beginning of the section.
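A minimal sketch of this parallelization, assuming a hypothetical evaluate_fitness objective (e.g., the ROC AUC of a fitted composite model on validation data); since the individuals are independent, the evaluations can simply be mapped onto a process pool.

from concurrent.futures import ProcessPoolExecutor

def evaluate_fitness(individual):
    # Placeholder: fit the composite model encoded by the individual and score it.
    raise NotImplementedError

def evaluate_population(population, n_workers=4):
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        fitness_values = list(executor.map(evaluate_fitness, population))
    return fitness_values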
The obtained values of the fitness function for the classification problem are presented
in Figure 14.

Figure 14. (a) The best achieved fitness value for the different computational configurations (represented
as different numbers of parallel threads) used to evaluate the evolutionary algorithm on the classification
benchmark. The boxplots are built for the 10 independent runs. (b) Pareto frontier (blue) obtained for
the classification benchmark in the “execution time-model quality” subspace. The red points represent
dominated individuals.

The effectiveness of the evolutionary algorithm parallelization depends on the variance
of the composite models’ fitting time in the population. It matters because the new
population cannot be formed until all individuals from the previous one are assessed. This
problem is illustrated in Figure 15 for cases (a) and (b), which were evaluated with the
classification dataset and the parameters of the evolutionary algorithm described above. It
can be noted that the modified selection scheme shown in (b) can be used to increase
parallelization efficiency. The early selection, mutation, and crossover of the already
processed individuals allow starting the processing of the next population before the
previous population’s assessment is finished; a sketch of this scheme follows.
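A sketch of this early selection scheme, under the assumption that evaluate scores one individual and breed produces offspring from (individual, fitness) pairs; both are hypothetical stand-ins, not FEDOT functions.

from concurrent.futures import ProcessPoolExecutor, as_completed

def evolve_with_early_selection(population, evaluate, breed, share=0.5):
    with ProcessPoolExecutor() as executor:
        futures = {executor.submit(evaluate, ind): ind for ind in population}
        assessed, next_gen_futures = [], []
        for future in as_completed(futures):
            assessed.append((futures[future], future.result()))
            if not next_gen_futures and len(assessed) >= share * len(population):
                # Early selection, mutation and crossover on the already
                # assessed individuals; their offspring start evaluating
                # before the previous population's assessment is finished.
                next_gen_futures = [executor.submit(evaluate, child)
                                    for child in breed(assessed)]
        next_gen_fitness = [f.result() for f in next_gen_futures]
    return assessed, next_gen_fitness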

Figure 15. (a) The comparison of different scenarios of evolutionary optimization: best (ideal), realistic,
and worst cases. (b) The conceptual dependence of the parallelization efficiency on the variance of the
execution time in the population for the different types of selection.

The same logic can be applied to the parallel fitting of parts of composite model
graphs. It raises the problem of assessing the importance of structural subgraphs and of
predicting the most promising candidate models before the final evaluation of the fitness
function is done.

5.3. Co-Design Strategies for the Evolutionary Learning Algorithm


The co-design of the generative algorithm and the available infrastructure is an important
issue (described in detail in Issue 3) in the task of composite model optimization.
An interesting case here is optimization under pre-defined time constraints [44]. The
experimental results obtained for two different optimization strategies are presented
in Figure 16. The classification problem was solved using the credit scoring benchmark
described above. The parameters of the evolutionary algorithm are the same as described
at the beginning of the section. The fitness function value is based on the ROC AUC
measure and is maximized during optimization.
The static strategy S1 represents the evolutionary optimization with fixed hyperparameters
of the algorithm. The computational infrastructure used in the experiment makes it possible
to evaluate 20 generations with 20 individuals in the population within a time limit of T0.
This strategy allows finding a solution with the fitness function value F0. However, if the
time limit T1 < T0 is taken into account, the static strategy allows finding only a solution
with the fitness function value F1, where F1 < F0.
Otherwise, the adaptive optimization strategy S2, which takes the characteristics of
the infrastructure into account to self-tune the parameters, can be used. It allows evaluating
20 generations with 10 individuals within the time limit T1 and reaching the fitness function
value F2. As can be seen, F1 < F2 < F0, so a better solution is found under the given
time constraint.

Figure 16. The comparison of different approaches to the evolutionary optimization of the composite
models. The min-max intervals are built for the 10 independent runs. The green line represents the
static optimization algorithm with 20 individuals in the population; the blue line represents the dynamic
optimization algorithm with 10 individuals in the population. T0, T1, and T2 are different real-time
constraints; F0, F1, and F2 are the values of the fitness functions obtained with the corresponding
constraints.

5.4. Strategies for Optimization of Hyperparameters in Evolutionary Learning Algorithm


As noted in Issue 4, the very large search space is a major problem in generative
design. To prove that it can be solved with the application of specialized hyperparameter
tuning strategies, a set of experiments was conducted.
As can be seen from Figure 6, the direct tuning strategy means that each atomic model
is considered an autonomous model during tuning. The computational cost of the tuning
is low in this case (since it is not necessary to fit all the models in a chain to estimate
the quality metric), but the found set of parameters can be non-optimal. The composite
model tuning allows taking into account the influence of the chain beyond the scope of
an individual atomic model, but the cost is the additional computations needed to tune all
models. A pseudocode of an algorithm for composite model tuning is represented in
Algorithm 1.

Algorithm 1: The simplified pseudocode of the composite model tuning algorithm illustrated in Figure 6b.
Data: maxTuningTime, tuneData, paramsRanges
Result: tunedCompositeModel
fitData, validationData = Split(tuneData)
for atomicModel in compositeModel do
    candidateCompositeModel = compositeModel
    bestQuality = 0
    while tuningTime < maxTuningTime do
        // OptFunction can be implemented as random search, Bayesian optimization, etc.
        candidateAtomicModel = OptFunction(atomicModel, paramsRanges)
        candidateCompositeModel = Update(candidateCompositeModel, candidateAtomicModel)
        Fit(candidateCompositeModel, fitData)
        quality = EvaluateQuality(candidateCompositeModel, validationData)
        if quality > bestQuality then
            bestQuality = quality
            bestAtomicModel = candidateAtomicModel
        end
    end
    compositeModel = Update(compositeModel, bestAtomicModel)
end
tunedCompositeModel = compositeModel
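The following Python sketch mirrors Algorithm 1 under the assumption of a simple random-search proposal; fit and evaluate_quality are hypothetical stand-ins passed in by the caller, and the data structures do not follow the FEDOT API.

import random
import time

def tune_composite_model(composite_model, params_ranges, fit_data,
                         validation_data, max_tuning_time, fit, evaluate_quality):
    for atomic_model in composite_model:
        best_quality, best_params = 0.0, atomic_model.params
        start = time.time()
        while time.time() - start < max_tuning_time:
            # Random-search proposal; Bayesian optimization could be used instead.
            candidate = {name: random.uniform(*bounds)
                         for name, bounds in params_ranges[atomic_model.name].items()}
            atomic_model.params = candidate
            fit(composite_model, fit_data)
            quality = evaluate_quality(composite_model, validation_data)
            if quality > best_quality:
                best_quality, best_params = quality, candidate
        atomic_model.params = best_params  # keep the best found configuration
    return composite_model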

The results of the model-supported tuning of the composite models for the different
regression problems obtained from the PMLB benchmark suite (available at https://
github.com/EpistasisLab/pmlb) are presented in Table 1. The self-developed toolbox
that was used to run the experiments with PMLB and FEDOT is available in the open
repository (https://github.com/ITMO-NSS-team/AutoML-benchmark). The applied
tuning algorithm is based on a random search in a pre-defined range.

Table 1. The quality measures for the composite models before and after random search-based tuning
of hyperparameters. The regression problems from the PMLB suite [45] are used as benchmarks.

Benchmark Name MSE without Tuning MSE with Tuning R2 without Tuning R2 with Tuning
1203_BNG_pwLinear 8.213 0.102 0.592 0.935
197_cpu_act 5.928 7.457 0.98 0.975
215_2dplanes 1.007 0.001 0.947 1
228_elusage 126.755 0.862 0.524 0.996
294_satellite_image 0.464 0.591 0.905 0.953
4544_GeographicalOriginalofMusic 0.194 2.113 0.768 0.792
523_analcatdata_neavote 0.593 0.025 0.953 0.999
560_bodyfat 0.07 0.088 0.998 0.894
561_cpu 3412.46 0.083 0.937 0.91
564_fried 1.368 0.073 0.944 0.934
It can be seen that the hyperparameter optimization allows increasing the quality of
the models in most cases.

5.5. Estimation of the Empirical Performance Models


The experiments for the performance model identification (this problem was raised
in Issue 5) were performed using a benchmark with a large number of features and
observations in the sample. The benchmark is based on a classification task from the
robotics field. It is quite a suitable example, since there is a large number of tasks in this
domain that can be performed on different computational resources, from embedded
systems to supercomputers. The analyzed task is devoted to manipulator grasp stability
prediction and was obtained from a Kaggle competition (https://www.kaggle.com/
ugocupcic/grasping-dataset).
An experiment consists of grasping a ball and shaking it for a while, while computing
grasp robustness. Multiple measurements are taken during a given experiment, though
only one robustness value is associated with it. The obtained dataset is balanced and has
50/50 stable and unstable grasps, respectively.
The approximation of the EPM with simple regression models is a common way to
analyze the performance of algorithms [46]. After the set of experiments, it was confirmed
for the majority of the considered models that the common regression surface of a single-model
EPM can be represented as a linear model. However, some of the considered models can
be described better by another regression surface (see the quality measures for the different
structures of EPM in Appendix A). One of them is the random forest model EPM. According
to the structure of Equation (9), these structures of EPM can be represented as follows:
T_EPM = Q₁ · N_obs · N_feat + Q₂ · N_obs   (for the common case),
T_EPM = N²_obs / Q₁² + (N_obs · N_feat) / Q₂²   (specific case for the random forest),   (23)

where T_EPM is the model fitting time estimation (represented in ms according to the scale
of the coefficients from Table 2), N_obs is the number of observations in the sample, and
N_feat is the number of features in the sample. The characteristics of the computational
resources and the hyperparameters of the model are considered static in this case.
We applied the least squares estimation (LSE) algorithm to (23) and obtained the Q
coefficients for the set of models presented in Table 2. The coefficient of determination R²
is used to evaluate the quality of the obtained performance models. A sketch of such a fit
is given below.
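A sketch of this estimation for the common (linear) case of Equation (23); the timing samples (n_obs, n_feat, fit_time) are assumed to be measured beforehand for a single ML model, and the helper names are illustrative.

import numpy as np

def fit_linear_epm(n_obs, n_feat, fit_time):
    # Least squares fit of T ≈ Q1 * N_obs * N_feat + Q2 * N_obs.
    n_obs, n_feat = np.asarray(n_obs), np.asarray(n_feat)
    design = np.column_stack([n_obs * n_feat, n_obs])
    (q1, q2), *_ = np.linalg.lstsq(design, np.asarray(fit_time), rcond=None)
    return q1, q2

def predict_fit_time(q1, q2, n_obs, n_feat):
    return q1 * n_obs * n_feat + q2 * n_obs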

Table 2. The examples of coefficients for the different performance models.

ML Model | Q₁ · 10⁴ | Q₂ · 10³ | R²
LDA | 2.9790 | 3.1590 | 0.9983
QDA | 1.9208 | 3.1012 | 0.9989
Naive Bayes for Bernoulli models | 1.3440 | 3.3120 | 0.9986
Decision tree | 31.110 | 4.1250 | 0.9846
PCA | 3.1291 | 2.4174 | 0.9992
Logistic regression | 9.3590 | 2.3900 | 0.9789
Random forest | 94.42 · 10⁴ | 2.507 · 10⁸ | 0.9279

The application of the evolutionary optimization to the benchmark allows finding the
optimal structure of the composite model for the specific problem. We demonstrate EPM
construction for the composite model that consists of logistic regression and random
forest as primary nodes and logistic regression as a secondary node. On the basis of (11),
the EPM for this composite model can be represented as follows:
T_Add^EPM = max(Q₁,₁ · N_obs · N_feat + Q₂,₁ · N_obs, Q₁,₂ · N_obs · N_feat + Q₂,₂ · N_obs) + N²_obs / Q²₁,₃ + (N_obs · N_feat) / Q²₂,₃,   (24)
where T_Add^EPM is the composite model fitting time estimated by the additive EPM, and
Q_{i,j} is the i-th coefficient of the j-th model type for the EPM according to Table 2. A sketch
implementing this estimate is given below.
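A small sketch implementing the estimate of Equation (24); since the equation was reconstructed from a damaged source, the exact placement of the random-forest-like term is our assumption, and all names are illustrative.

def composite_fit_time(n_obs, n_feat, primary_coeffs, secondary_coeffs):
    # primary_coeffs: list of (q1, q2) pairs for linear per-node EPMs;
    # the primary nodes are fitted in parallel, hence the max of their times.
    primary_time = max(q1 * n_obs * n_feat + q2 * n_obs
                       for q1, q2 in primary_coeffs)
    # secondary_coeffs: (q1, q2) for the specific (random-forest-like) EPM form.
    q1_s, q2_s = secondary_coeffs
    secondary_time = n_obs ** 2 / q1_s ** 2 + n_obs * n_feat / q2_s ** 2
    return primary_time + secondary_time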
The performance model for the composite model with three nodes (LR + RF → LR) is
shown in Figure 17. The visualizations for the atomic models are available in Appendix A.

Figure 17. Predictions of the performance model that uses an additive approach for local empirical performance models
(EPMs) of atomic models. The red points represent the real evaluations of the composite model as a part of validation.

The RMSE (root mean squared error) measure is used to evaluate the quality of the chain
EPM estimates against real measurements. In this case, the obtained RMSE = 21.3 s
confirms the good quality of the obtained estimation in the observed 0–400 s range.

6. Discussion and Future Works


In a wider sense, the co-design problem may be solved as an iterative procedure that
includes additional tuning during the model execution stage and a cyclic closure (or re-building
stage) with respect to time evolution. The re-building stage may be initiated by two
types of events: (1) the model error overcomes an acceptable threshold e_c; (2) the execution
time overcomes an acceptable threshold t_c. In this case a solution is to build a new model
with respect to the corrected set of structures S̃ and performance model T̃_M:

p₀(M*, t) > r_c, T_ex^min > t_c ⟹ p̃^min(M**, t) = max F'(M̃, t | T̃_M ≤ t_c, T_gen ≤ t_g),   (25)

where t is a real-time variable and r_c is a critical threshold for the values of the error
function E. Such a problem is typical for models that are connected with the lifecycle of their
prototype, e.g., models inside a digital shadow of an industrial system [47], weather
forecasting models [48], etc.
Additional fitting of the co-designed system may also appear at the level of model
execution, where the classic scheduling approach may be blended with model tuning. The
classic formulation of scheduling for resource-intensive applications, T_ex^min(L*) = min_{L∈A} G'(L | M, I),
is based on the idea of an optimization search for such a scheduling algorithm L* that
provides minimal computation time T_ex^min for the model execution process through
balanced schedules of workload on computation nodes. However, such an approach is
restricted by the assumption of uniform performance models for all parts of the application.
In real cases, the performance of an application may change dynamically in time and among
functional parts. Thus, to reach more effective execution, it is desirable to formulate the
optimization problem with respect to the possibility of tuning the model characteristics
that influence model performance:
T_ex^max({a_{1:|S|}}*, L*) = max_{a,L} G(M({a_{1:|S|}}), L | I),   M = {S*, E*, {a_{1:|S|}}},   L = {L_m},   (26)

where G is the objective function that characterizes the expected time of model execution
with respect to the used scheduling algorithm L and model M. In the context of the
generative modeling problem, at the execution stage the model M can be fully described as
a set of model properties that consists of the optimal model structure: optimal functions S*
(from the previous stage) and an additional set of performance-influencing parameters
{a_{1:|S|}}. Reminiscent approaches can be seen in several publications, e.g., [49].

7. Conclusions
In this paper, we aimed to highlight the different aspects of the creation of mathematical
models using an automated evolutionary learning approach. Such an approach may be
represented from the perspective of generative design and co-design of mathematical
models. First of all, we formalized several actual and unsolved issues that exist in the
field of the generative design of mathematical models. They are devoted to different aspects:
computational complexity, performance modeling, parallelization, interaction with the
infrastructure, etc. A set of experiments was conducted as proof-of-concept solutions
for every announced issue and obstacle. The composite ML models obtained by the FEDOT
framework and the differential equation-based models obtained by the EPDE framework
were used as case studies. Finally, the common concepts of the co-design implementation
were discussed.

Author Contributions: Conceptualization, A.V.K. and A.B.; Investigation, N.O.N., A.H., M.M. and
M.Y.; Methodology, A.V.K.; Project administration, A.B.; Software, N.O.N., A.H. and M.M.;
Supervision, A.B.; Validation, M.M.; Visualization, M.Y.; Writing–original draft, A.V.K., N.O.N. and
A.H. All authors have read and agreed to the final publication of the manuscript.
Funding: This research is financially supported by the Ministry of Science and Higher Education,
Agreement #075-15-2020-808.
Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

AI Artificial intelligence
ANN Artificial neural network
AutoML Automated machine learning
DAG Directed acyclic graph
EPM Empirical performance model
GPU Graphics processing unit
ML Machine learning
MSE Mean squared error
NAS Neural architecture search
ODE Ordinary differential equation
PDE Partial differential equation
PM Performance model
R2 Coefficient of determination
RMSE Root mean square error
ROC AUC Area under receiver operating characteristic curve
Appendix A. Additional Details on the Empirical Performance Models Validation


The validation of the different EPMs for the set of atomic models (noted in Table 2) is
presented in Table A1. The R² and RMSE metrics are used to compare the predictions of the
EPM with real measurements of the fitting time. The obtained results confirm that the
linear EPM with two terms is the most suitable for most of the ML models used in the
experiments. However, the fitting time for some models (e.g., random forest) is represented
better by the more specific EPM. The one-term EPM provides a lower quality than the more
complex analogs.

Table A1. Approximation errors for the different empirical performance models’ structures obtained
for the atomic ML models. The best suitable structure is highlighted with bold.

Model | Q₁·N_obs·N_feat (RMSE, s; R²) | Q₁·N_obs·N_feat + Q₂·N_obs (RMSE, s; R²) | N²_obs/Q₁² + (N_obs·N_feat)/Q₂² (RMSE, s; R²)
LDA | 0.35; 0.92 | 0.11; 0.99 | 0.66; 0.74
QDA | 0.75; 0.57 | 0.03; 0.99 | 0.93; 0.36
Naive Bayes | 0.82; 0.42 | 0.04; 0.99 | 0.961; 0.21
Decision tree | 1.48; 0.98 | 1.34; 0.98 | 3.49; 0.89
PCA | 0.28; 0.78 | 0.04; 0.99 | 0.28; 0.95
Logit | 0.54; 0.91 | 0.37; 0.96 | 0.95; 0.75
Random forest | 96.81; 0.60 | 26.50; 0.71 | 21.36; 0.92

The visualization of the performance model predictions for the different cases is
presented in Figure A1. It confirms that the selected EPMs allow estimating the fitting time
quite reliably.

Figure A1. The empirical performance models for the different atomic models: LDA, QDA, decision
tree (DT), PCA dimensionality reduction model, Bernoulli naïve Bayes model, and logistic regression.
The heatmaps represent the predictions of the EPM, and the black points are real measurements.

References
1. Packard, N.; Bedau, M.A.; Channon, A.; Ikegami, T.; Rasmussen, S.; Stanley, K.; Taylor, T. Open-Ended Evolution and Open-Endedness:
Editorial Introduction to the Open-Ended Evolution I Special Issue; MIT Press: Cambridge, MA, USA, 2019.
2. Krish, S. A practical generative design method. Comput.-Aided Des. 2011, 43, 88–100. [CrossRef]
3. Ferreira, C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence; Springer: Berlin/Heidelberg, Germany,
2006; Volume 21.
4. Pavlyshenko, B. Using stacking approaches for machine learning models. In Proceedings of the 2018 IEEE Second International
Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2018; pp. 255–258.
5. Kovalchuk, S.V.; Metsker, O.G.; Funkner, A.A.; Kisliakovskii, I.O.; Nikitin, N.O.; Kalyuzhnaya, A.V.; Vaganov, D.A.; Bochenina,
K.O. A conceptual approach to complex model management with generalized modelling patterns and evolutionary identification.
Complexity 2018, 2018, 5870987. [CrossRef]
6. Kalyuzhnaya, A.V.; Nikitin, N.O.; Vychuzhanin, P.; Hvatov, A.; Boukhanovsky, A. Automatic evolutionary learning of composite
models with knowledge enrichment. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion,
Cancun, Mexico, 8–12 July 2020; pp. 43–44.
7. Lecomte, S.; Guillouard, S.; Moy, C.; Leray, P.; Soulard, P. A co-design methodology based on model driven architecture for real
time embedded systems. Math. Comput. Model. 2011, 53, 471–484. [CrossRef]
8. He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. arXiv 2019, arXiv:1908.00709.
9. Caldwell, J.; Ram, Y.M. Mathematical Modelling: Concepts and Case Studies; Springer Science & Business Media: Berlin/Heidelberg,
Germany, 2013; Volume 6.
10. Banwarth-Kuhn, M.; Sindi, S. How and why to build a mathematical model: A case study using prion aggregation. J. Biol. Chem.
2020, 295, 5022–5035. [CrossRef] [PubMed]
11. Castillo, O.; Melin, P. Automated mathematical modelling for financial time series prediction using fuzzy logic, dynamical
systems and fractal theory. In Proceedings of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial
Engineering (CIFEr), New York City, NY, USA, 24–26 March 1996; pp. 120–126.
12. Kevrekidis, I.G.; Gear, C.W.; Hyman, J.M.; Kevrekidis, P.G.; Runborg, O.; Theodoropoulos, C. Equation-free, coarse-grained
multiscale computation: Enabling microscopic simulators to perform system-level analysis. Commun. Math. Sci. 2003, 1, 715–762.
13. Schmidt, M.; Lipson, H. Distilling free-form natural laws from experimental data. Science 2009, 324, 81–85. [CrossRef]
14. Kondrashov, D.; Chekroun, M.D.; Ghil, M. Data-driven non-Markovian closure models. Phys. D Nonlinear Phenom. 2015,
297, 33–55. [CrossRef]
15. Maslyaev, M.; Hvatov, A.; Kalyuzhnaya, A. Data-Driven Partial Derivative Equations Discovery with Evolutionary Approach. In
International Conference on Computational Science; Springer: Berlin/Heidelberg, Germany, 2019; pp. 635–641.
16. Qi, F.; Xia, Z.; Tang, G.; Yang, H.; Song, Y.; Qian, G.; An, X.; Lin, C.; Shi, G. A Graph-based Evolutionary Algorithm for Automated
Machine Learning. Softw. Eng. Rev. 2020, 1, 10–37686.
17. Olson, R.S.; Bartley, N.; Urbanowicz, R.J.; Moore, J.H. Evaluation of a tree-based pipeline optimization tool for automating
data science. In Proceedings of the Genetic and Evolutionary Computation Conference, New York, NY, USA, 20–24 July 2016;
pp. 485–492.
18. Zhao, H. High Performance Machine Learning through Codesign and Rooflining. Ph.D. Thesis, UC Berkeley, Berkeley, CA,
USA, 2014.
19. Amid, A.; Kwon, K.; Gholami, A.; Wu, B.; Asanović, K.; Keutzer, K. Co-design of deep neural nets and neural net accelerators for
embedded vision applications. IBM J. Res. Dev. 2019, 63, 6:1–6:14. [CrossRef]
20. Li, Y.; Park, J.; Alian, M.; Yuan, Y.; Qu, Z.; Pan, P.; Wang, R.; Schwing, A.; Esmaeilzadeh, H.; Kim, N.S. A network-centric
hardware/algorithm co-design to accelerate distributed training of deep neural networks. In Proceedings of the 2018 51st Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 20–24 October 2018; pp. 175–188.
21. Bertels, K. Hardware/Software Co-Design for Heterogeneous Multi-Core Platforms; Springer: Berlin/Heidelberg, Germany, 2012.
22. Wang, K.; Liu, Z.; Lin, Y.; Lin, J.; Han, S. HAQ: Hardware-Aware Automated Quantization With Mixed Precision. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
23. Cai, H.; Zhu, L.; Han, S. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv 2018, arXiv:1812.00332.
24. Dosanjh, S.S.; Barrett, R.F.; Doerfler, D.; Hammond, S.D.; Hemmert, K.S.; Heroux, M.A.; Lin, P.T.; Pedretti, K.T.; Rodrigues, A.F.;
Trucano, T. Exascale design space exploration and co-design. Future Gener. Comput. Syst. 2014, 30, 46–58. [CrossRef]
25. Gramacy, R.B.; Lee, H.K. Adaptive Design of Supercomputer Experiments. 2018. Available online: http://citeseerx.ist.psu.edu/
viewdoc/download?doi=10.1.1.312.3750&rep=rep1&type=pdf (accessed on 26 December 2020).
26. Glinskiy, B.; Kulikov, I.; Snytnikov, A.V.; Chernykh, I.; Weins, D.V. A multilevel approach to algorithm and software design for
exaflops supercomputers. Numer. Methods Program. 2015, 16, 543–556.
27. Kaltenecker, C. Comparison of Analytical and Empirical Performance Models: A Case Study on Multigrid Systems. Master’s The-
sis, University of Passau, Passau, Germany, 2016.
28. Calotoiu, A. Automatic Empirical Performance Modeling of Parallel Programs. Ph.D. Thesis, Technische Universität, Berlin,
Germany, 2018.
29. Eggensperger, K.; Lindauer, M.; Hoos, H.H.; Hutter, F.; Leyton-Brown, K. Efficient benchmarking of algorithm configurators via
model-based surrogates. Mach. Learn. 2018, 107, 15–41. [CrossRef]
30. Chirkin, A.M.; Belloum, A.S.; Kovalchuk, S.V.; Makkes, M.X.; Melnik, M.A.; Visheratin, A.A.; Nasonov, D.A. Execution time
estimation for workflow scheduling. Future Gener. Comput. Syst. 2017, 75, 376–387. [CrossRef]
31. Gamatié, A.; An, X.; Zhang, Y.; Kang, A.; Sassatelli, G. Empirical model-based performance prediction for application mapping
on multicore architectures. J. Syst. Archit. 2019, 98, 1–16. [CrossRef]
32. Shi, Z.; Dongarra, J.J. Scheduling workflow applications on processors with different capabilities. Future Gener. Comput. Syst.
2006, 22, 665–675. [CrossRef]
33. Visheratin, A.A.; Melnik, M.; Nasonov, D.; Butakov, N.; Boukhanovsky, A.V. Hybrid scheduling algorithm in early warning
systems. Future Gener. Comput. Syst. 2018, 79, 630–642. [CrossRef]
34. Melnik, M.; Nasonov, D. Workflow scheduling using Neural Networks and Reinforcement Learning. Procedia Comput. Sci. 2019,
156, 29–36. [CrossRef]
35. Olson, R.S.; Moore, J.H. TPOT: A tree-based pipeline optimization tool for automating machine learning. Proc. Mach. Learn. Res.
2016, 64, 66–74.
36. Evans, L.; Society, A.M. Partial Differential Equations; Graduate Studies in Mathematics; American Mathematical Society:
Providence, RI, USA, 1998.
37. Czarnecki, W.M.; Osindero, S.; Jaderberg, M.; Swirszcz, G.; Pascanu, R. Sobolev training for neural networks. In Proceedings of
the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4278–4287.
38. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward
and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [CrossRef]
39. Epicoco, I.; Mocavero, S.; Porter, A.R.; Pickles, S.M.; Ashworth, M.; Aloisio, G. Hybridisation strategies and data structures for
the NEMO ocean model. Int. J. High Perform. Comput. Appl. 2018, 32, 864–881. [CrossRef]
40. Nikitin, N.O.; Polonskaia, I.S.; Vychuzhanin, P.; Barabanova, I.V.; Kalyuzhnaya, A.V. Structural Evolutionary Learning for
Composite Classification Models. Procedia Comput. Sci. 2020, 178, 414–423. [CrossRef]
41. Full Script That Allows Reproducing the Results Is Available in the GitHub Repository. Available online: https://github.
com/ITMO-NSS-team/FEDOT.Algs/blob/master/estar/examples/ann_approximation_experiments.ipynb (accessed on
26 December 2020).
42. Full Script That Allows Reproducing the Results Is Available in the GitHub Repository. Available online: https://github.com/
ITMO-NSS-team/FEDOT.Algs/blob/master/estar/examples/Pareto_division.py (accessed on 26 December 2020).
43. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent
Systems; O’Reilly Media: Sebastopol, CA, USA, 2019.
44. Nikitin, N.O.; Vychuzhanin, P.; Hvatov, A.; Deeva, I.; Kalyuzhnaya, A.V.; Kovalchuk, S.V. Deadline-driven approach for multi-
fidelity surrogate-assisted environmental model calibration: SWAN wind wave model case study. In Proceedings of the Genetic
and Evolutionary Computation Conference Companion, Prague, Czech Republic, 13–17 July 2019; pp. 1583–1591.
45. Olson, R.S.; La Cava, W.; Orzechowski, P.; Urbanowicz, R.J.; Moore, J.H. PMLB: A large benchmark suite for machine learning
evaluation and comparison. BioData Min. 2017, 10, 1–13. [CrossRef]
46. Li, K.; Xiang, Z.; Tan, K.C. Which surrogate works for empirical performance modelling? A case study with differential evolution.
In Proceedings of the 2019 IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand, 10–13 June 2019;
pp. 1988–1995.
47. Bauernhansl, T.; Hartleif, S.; Felix, T. The Digital Shadow of production–A concept for the effective and efficient information
supply in dynamic industrial environments. Procedia CIRP 2018, 72, 69–74. [CrossRef]
48. Cha, D.H.; Wang, Y. A dynamical initialization scheme for real-time forecasts of tropical cyclones using the WRF model. Mon.
Weather Rev. 2013, 141, 964–986. [CrossRef]
49. Melnik, M.; Nasonov, D.A.; Liniov, A. Intellectual Execution Scheme of Iterative Computational Models based on Symbiotic
Interaction with Application for Urban Mobility Modelling. IJCCI 2019, 1, 245–251.
Comparison of Single- and Multi-Objective Optimization Quality
for Evolutionary Equation Discovery

Mikhail Maslyaev (maslyaitis@gmail.com, ITMO University, St Petersburg, Russia)
Alexander Hvatov (alex_hvatov@itmo.ru, ITMO University, St Petersburg, Russia)

ABSTRACT
Evolutionary differential equation discovery proved to be a tool to obtain equations with less a priori assumptions than conventional approaches, such as sparse symbolic regression over the complete library of possible terms. The equation discovery field contains two independent directions. The first one is purely mathematical and concerns differentiation, the object of optimization and its relation to functional spaces, and others. The second one is dedicated purely to the optimization problem statement. Both topics are worth investigating to improve the algorithm's ability to handle experimental data in a more artificial-intelligence way, without significant pre-processing and a priori knowledge of their nature. In the paper, we consider the prevalence of either single-objective optimization, which considers only the discrepancy between selected terms in the equation, or multi-objective optimization, which additionally takes into account the complexity of the obtained equation. The proposed comparison approach is shown on classical model examples: the Burgers equation, the wave equation, and the Korteweg-de Vries equation.

CCS CONCEPTS
• Applied computing → Mathematics and statistics; • Computing methodologies → Heuristic function construction.

KEYWORDS
symbolic regression, dynamic system modeling, interpretable learning, differential equations, sparse regression

ACM Reference Format:
Mikhail Maslyaev and Alexander Hvatov. 2023. Comparison of Single- and Multi-Objective Optimization Quality for Evolutionary Equation Discovery. In Genetic and Evolutionary Computation Conference Companion (GECCO '23 Companion), July 15–19, 2023, Lisbon, Portugal. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3583133.3590601

1 INTRODUCTION
The recent development of artificial intelligence has given high importance to problems of interpretable machine learning. In many cases, users value models not only for their quality of predicting the state of the studied system but also for the ability to provide some information about its operation. In the case of modeling physical processes, commonly, the most suitable models have forms of partial differential equations. Thus, many recent studies aimed to develop the concept of data-driven differential equation discovery. In this paper, data-driven discovery implies obtaining a differential equation from a set of empirical measurements describing the dynamics of a dependent variable in some domain. Furthermore, equation-based models can be incorporated into pipelines of automated machine learning that can include arbitrary submodels, with the approach discussed in paper [14].

Initial advances in differential equation discovery were made with the symbolic regression algorithm, as in [1]. The algorithm employs genetic programming to detect the graph that represents a differential equation. One group of the most simple yet practical techniques of equation construction is based on sparse linear regression (least absolute shrinkage and selection operator), introduced in works [11], [15], [16], and other similar projects. This approach has limited flexibility, having applicability restrictions in cases of equations with low-magnitude coefficients being discovered on noisy data. This issue is addressed by employing Bayesian inference as in [12] to estimate the coefficients of the equation, as in work [4]. To account for the uncertainty in the resulting model, the approximating term library can be biased statistically [2]. Physics-informed neural networks (PINN) form the next class of data-driven equation discovery tools, representing the process dynamics with artificial neural networks. The primary research on this topic is done in work [13], while recent advances have been made in incorporating more complex types of neural networks into the PINNs [3, 17].

In recent studies [7, 10], evolutionary algorithms have proved to be a rather flexible tool for differential equation discovery, demanding only a few assumptions about the process properties. The problem is stated as the minimization of the process representation error. Implementing multi-objective evolutionary optimization, first introduced for DE systems as in [8], seems to be a feasible way to improve the quality of the equation search, operating on fewer initial assumptions and providing higher diversity among the processed candidates. Additional criteria can represent other valuable properties of the constructed models, namely conciseness.

This study compares the performance of single- and multi-objective optimization, namely, the hypothesis that multi-objective optimization creates and preserves diversity in the population and thus may achieve better fitness function values than a single-objective approach. The theoretical comparison shows that multi-objective algorithms allow escaping local minima as long as the number of objectives is reasonably small [5]. For equation discovery applications, the function landscapes have a more complex
structure, so increased diversity of the population can benefit the resulting quality.

2 ALGORITHM DESCRIPTION
The data-driven differential equation identification operates on problems of selecting a model for the dynamics of the variable u = u(t, x) in a spatio-temporal domain (0, T) × Ω that is implicitly described by the differential equation Eq. 1 with corresponding initial and boundary conditions. It can be assumed that the order of the unknown equation can be arbitrary, but rather low (usually of second or third order).

F(t, x, u, ∂u/∂t, ∂u/∂x₁, ..., ∂u/∂xₙ) = 0   (1)

Both the multi-objective and single-objective approaches have the same core of "graph-like" representation of a differential equation (encoding) and similar evolutionary operators that will be described further.

2.1 Differential equation representation
To represent the candidate differential equation, a computational graph structure is employed. A fixed three-layer graph structure is used to avoid the infeasible structures linked to unconstrained graph construction and the overtraining issues present in symbolic regression. The lowest-level nodes contain tokens, while the middle nodes and the root are multiplication and summation operations. The data-driven equations take the form of a linear combination of product terms, each represented by the multiplication of derivatives, other functions, and a real-valued coefficient, Eq. 2.

F'(t, x, u, ∂u/∂t, ∂u/∂x₁, ..., ∂u/∂xₙ) = Σᵢ αᵢ Πⱼ f_ij = 0,
G'(u)|_Γ = 0   (2)

Here, the factors f_ij are selected from the user-defined set of elementary functions, named tokens. The problem of an equation search transforms into the task of detecting an optimal set of tokens to represent the dynamics of the variable u(t, x) and forming the equation by evaluating the coefficients α = (α₁, ..., α_m).

During the equation search, we operate with tensors of token values, evaluated on grids u_γ = u(t_γ, x_γ) in the processed domain (0, T) × Ω.

Sparsity promotion in the equation operates by filtering out nominal terms with low predicting power and is implemented with LASSO regression. For each individual, a term (without loss of generality, we can assume that it is the m-th term) is marked to be the "right-hand side of the equation" for the purposes of term filtering and coefficient calculation. The terms Tᵢ = Πⱼ f_ij are paired with real-valued coefficients obtained from the optimization subproblem of Eq. 3; a sketch of this step is given below. Finally, the equation coefficients are detected by linear regression.

α' = arg min_α' (|| Σ_{i, i≠m} α'ᵢ Πⱼ f_ij − Πⱼ f_mj ||₂ + λ ||α'||₁)   (3)

In the initialization of the algorithm, equation graphs are randomly constructed for each individual from the sets of user-defined tokens with a number of assumptions about the structures of "plausible equations".
to represent the dynamics of the variable D (C, x), and forming the in Eq. 2.
equation by evaluating the coe�cients U = (U 1, ... U< ).
During the equation search, we operate with tensors of token ’ ÷
values, evaluated on grids DW = D (CW , xW ) in the processed domain &>? ( 0 (D)) = || 0 (D)||= = || U8 58 9 ||= ! min (4)
> U 8 C8 9
(0,) ) ⌦. 8 9
Sparsity promotion in the equation operates by �ltering out
An example of a more complex optimized functional is the norm
nominal terms with low predicting power and is implemented with
of a discrepancy between the input values of the modelled variable
LASSO regression. For each individual, a term (without loss of
and the solution proposed by the algorithm di�erential equation,
generality, we can assume that it is the <-th term) is marked to be a
estimated on the same grid. Classical solution techniques can not
"right-hand side of the equation" for the purposes of term �ltering
Πbe applied here due to the inability of a user to introduce the par-
and coe�cient calculation. The terms )8 = 9 58 9 are paired with
titioning of the processed domain, form �nite-di�erence schema
real-value coe�cients obtained from the optimization subproblem
without a priori knowledge of an equation, proposed by evolution-
of Eq. 3. Finally, the equation coe�cients are detected by linear
ary algorithm. An automatic solving method for candidate equation
regression.
(viewed as in Eq. 6) quality evaluation is introduced in [9] to work
’ ÷ ÷ around this issue.
U 0 = arg min (|| U80 58 9 5< 9 || 2 + _||U 0 || 1 ) (3)
U
8, 8<< 9 9
&B>; ( 0 (D)) = ||D D ||= ! min (5)
In the initialization of the algorithm equation graphs are ran- U 8 C8 9
domly constructed for each individual from the sets of user-de�ned ’ ÷
tokens with a number of assumptions about the structures of the 0
(D) = 0 : 0 (D) = U8 58 9 = 0 (6)
“plausible equations”. 8 9
While both quality metrics, Eq. 4 and Eq. 5, provide decent convergence of the algorithm in ideal conditions, in the case of noisy data the errors in derivative estimations can make the differential operator discrepancy from identity (as in the problem in Eq. 4) an unreliable metric. Applying the automatic solving algorithm has a high computational cost due to training a neural network to satisfy the discretized equation and boundary operators.

As the single-objective optimization method for the study, we have employed a simple evolutionary algorithm with a strategy that minimizes one of the aforementioned quality objective functions. For the purposes of experiments on synthetic noiseless data, the discrepancy-based approach has been adopted.

2.4 Multi-objective optimization application
As we stated earlier, in addition to process representation, conciseness is also valuable for regulating the interpretability of the model. Thus, the metric of this property can be naturally introduced as Eq. 7, with the adjustment of counting not the total number of active terms but the total number of tokens (kᵢ for the i-th term); a counting sketch is given below.

C(F'(u)) = #(F') = Σᵢ kᵢ · 1_{αᵢ ≠ 0}   (7)
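A small sketch of this counting with hypothetical names only:

def equation_complexity(coefficients, tokens_per_term, tol=1e-9):
    # Sum of token counts k_i over the terms whose coefficient is non-zero.
    return sum(k for alpha, k in zip(coefficients, tokens_per_term)
               if abs(alpha) > tol)

# Example: 3 terms with 2, 1 and 3 tokens; the second term was zeroed out,
# so equation_complexity([0.5, 0.0, -1.2], [2, 1, 3]) == 5.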
In addition to evaluating the quality of the proposed solution from the point of view of equation simplicity, the multi-objective approach enables the detection of systems of differential equations, optimizing the qualities of modeling of each variable.

While there are many evolutionary multi-objective optimization algorithms, the MOEADD (multi-objective evolutionary algorithm based on dominance and decomposition) [6] algorithm has proven to be an effective tool in applications of data-driven differential equation construction. We employ the baseline version of MOEADD from the aforementioned paper with the following parameters: PBI penalty factor θ = 1.0, probability of parent selection inside the sector neighbourhood δ = 0.9 (the 4 nearest sectors are considered as "neighbouring"), with 40% of individuals selected as parents. The evolutionary operator parameters are: a crossover rate (probability of affecting individual terms) of 0.3 and a mutation rate of 0.6. The result of the algorithm is a set of equations ranging from the most simplistic constructions (typically in the form ∂ⁿu/∂xₖⁿ = 0) to highly complex equations, where the extra terms probably represent the noise components of the dynamics.

3 EXPERIMENTAL STUDY
This section of the paper is dedicated to studying the properties of the equation discovery framework. As the main object of interest, we designate the difference in derived equations between single- and multi-objective optimization launches. The validation was held on synthetic datasets, where the modelled dependent variable is obtained by solving an already known and studied equation.

The tests were held on three cases: the wave, Burgers, and Korteweg-de Vries equations, due to the unique properties of each equation. The algorithms were tested in the following pattern: 64 evolutionary iterations for the single-objective optimization algorithm and 8 iterations of multi-objective optimization for populations of 8 candidate equations, which resulted in roughly similar resource consumption. 10 independent runs are conducted with each setup. The main equation quality indicator in our study is the statistical analysis of the objective function mean (μ = μ(Q(F'))) and variance (σ² = (σ(Q(F')))²) among the different launches.

The first equation was the wave equation, as in Eq. 8, with the necessary boundary and initial conditions. The equation is solved with the Wolfram Mathematica software in the domain (x, t) ∈ [0, 1] × [0, 1] on a grid of 101 × 101. Here, we have employed numerical differentiation procedures.

∂²u/∂t² = 0.04 ∂²u/∂x²   (8)

The algorithm's convergence was ensured in the case of both algorithms due to the relatively simple structure: the algorithm proposes the correct structure during the initialization or in the initial epochs of the optimization. However, such a trivial case can be a decent indicator of the "ideal" algorithm behaviour. The values of the examined metrics for this experiment and for the next ones are presented in Tab. 1.

Table 1: Results of the equation discovery

metric | method | wave | Burgers | KdV
μ | single-objective | 5.72 | 2246.38 | 0.162
μ | multi-objective | 2.03 | 1.515 | 16.128
σ² | single-objective | 18.57 | 4.41 · 10⁷ | 8.9 · 10⁻³
σ² | multi-objective | 0 | 20.66 | ≈ 10⁻¹³

The statistical analysis of the algorithm performance on each equation is provided in Fig. 1.

Another examination was performed on the solution of Burgers' equation, which has a more complex, non-linear structure. The problem was set as in Eq. 9 for the case of a process without viscosity, thus omitting the term ν ∂²u/∂x². As in the previous example, the equation was solved with the Wolfram Mathematica toolkit.

∂u/∂t + u ∂u/∂x = 0   (9)

The derivatives used during the equation search were computed analytically, due to the function not being constant only on a small domain.

The presence of other structures that have relatively low optimized function values, such as u'ₓ u'ₜ = u''ₜₜ, makes this case of data rather informative. Thus, the algorithm has a local optimum that is far from the correct structure from the point of view of the error metric.

The final set-up for an experiment was defined with a non-homogeneous Korteweg-de Vries equation, presented in Eq. 10. The presence of external tokens in separate terms of the equation makes the search more difficult.

∂u/∂t + 6u ∂u/∂x + ∂³u/∂x³ = cos t · sin t   (10)

The experiment results indicate that the algorithm may detect the same equation in multiple forms. Each term of the equation may be chosen as the "right-hand side" one, and the numerical error with different coefficient sets can also vary.
Figure 1: Resulting quality objective function value, introduced as Eq. 6, for single- and multi-objective approaches for (a) the wave equation, (b) the Burgers equation, and (c) the Korteweg-de Vries equation.

4 CONCLUSION
This paper examines the prospects of using multi-objective optimization for the data-driven discovery of partial differential equations. While initially introduced for handling problems of deriving systems of partial differential equations, the multi-objective view of the problem improves the overall quality of the algorithm. The improved convergence, provided by the higher diversity of candidate individuals, makes the process more reliable in cases of equations with complex structures, as was shown in the examples of the Burgers' and Korteweg-de Vries equations.

The previous studies have indicated the algorithm's reliability in converging to the correct equation, while this research has proposed a method of improving the rate at which the correct structures are identified. This property is valuable for real-world applications, because incorporating large and complete datasets improves the noise resistance of the approach.

The further development of the proposed method involves introducing techniques for incorporating expert knowledge into the search process. This concept can help generate preferable candidates or exclude infeasible ones even before the costly coefficient calculation and fitness evaluation procedures.

5 CODE AND DATA AVAILABILITY
The numerical solution data and the Python scripts that reproduce the experiments are available at the GitHub repository: https://github.com/ITMO-NSS-team/EPDE_GECCO_experiments.

ACKNOWLEDGEMENTS
This research is financially supported by the Ministry of Science and Higher Education, agreement FSER-2021-0012.

REFERENCES
[1] H. Cao, L. Kang, Y. Chen, et al. 2000. Evolutionary Modeling of Systems of Ordinary Differential Equations with Genetic Programming. Genetic Programming and Evolvable Machines 1 (2000), 309–337. https://doi.org/10.1023/A:1010013106294
[2] Urban Fasel, J. Nathan Kutz, Bingni W. Brunton, and Steven L. Brunton. 2022. Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proceedings of the Royal Society A 478, 2260 (2022), 20210904.
[3] Han Gao, Matthew J. Zahr, and Jian-Xun Wang. 2022. Physics-informed graph neural Galerkin networks: A unified framework for solving PDE-governed forward and inverse problems. Computer Methods in Applied Mechanics and Engineering 390 (2022), 114502.
[4] L. Gao, Urban Fasel, Steven L. Brunton, and J. Nathan Kutz. 2023. Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery. arXiv preprint arXiv:2301.12649 (2023).
[5] Hisao Ishibuchi, Yusuke Nojima, and Tsutomu Doi. 2006. Comparison between single-objective and multi-objective genetic algorithms: Performance comparison and performance measures. In 2006 IEEE International Conference on Evolutionary Computation. IEEE, 1143–1150.
[6] K. Li, K. Deb, Q. Zhang, and S. Kwong. 2015. An Evolutionary Many-Objective Optimization Algorithm Based on Dominance and Decomposition. IEEE Transactions on Evolutionary Computation 19, 5 (2015), 694–716. https://doi.org/10.1109/TEVC.2014.2373386
[7] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. 2021. DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 63, 1 (2021), 208–228.
[8] Mikhail Maslyaev and Alexander Hvatov. 2021. Multi-Objective Discovery of PDE Systems Using Evolutionary Approach. In 2021 IEEE Congress on Evolutionary Computation (CEC). 596–603. https://doi.org/10.1109/CEC45853.2021.9504712
[9] Mikhail Maslyaev and Alexander Hvatov. 2022. Solver-Based Fitness Function for the Data-Driven Evolutionary Discovery of Partial Differential Equations. In 2022 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
[10] Mikhail Maslyaev, Alexander Hvatov, and Anna V. Kalyuzhnaya. 2021. Partial differential equations discovery with EPDE framework: application for real and synthetic data. Journal of Computational Science (2021), 101345.
[11] Daniel A. Messenger and David M. Bortz. 2021. Weak SINDy for partial differential equations. J. Comput. Phys. 443 (2021), 110525.
[12] Lizhen Nie and Veronika Ročková. 2022. Bayesian Bootstrap Spike-and-Slab LASSO. J. Amer. Statist. Assoc. 0, 0 (2022), 1–16. https://doi.org/10.1080/01621459.2022.2025815
[13] M. Raissi, P. Perdikaris, and G.E. Karniadakis. 2017. Physics informed deep learning (Part II): Data-driven discovery of nonlinear partial differential equations. arXiv preprint arXiv:1711.10566 (2017). https://arxiv.org/abs/1711.10566
[14] Mikhail Sarafanov, Valerii Pokrovskii, and Nikolay O. Nikitin. 2022. Evolutionary Automated Machine Learning for Multi-Scale Decomposition and Forecasting of Sensor Time Series. In 2022 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
[15] Hayden Schaeffer. 2017. Learning partial differential equations via data discovery and sparse optimization. Proc. R. Soc. A 473, 2197 (2017), 20160446.
[16] H. Schaeffer, R. Caflisch, C. D. Hauck, and S. Osher. 2017. Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science 473, 2197 (2017), 20160446.
[17] Pongpisit Thanasutives, Takashi Morita, Masayuki Numao, and Ken-ichi Fukui. 2023. Noise-aware physics-informed machine learning for robust PDE discovery. Machine Learning: Science and Technology 4, 1 (2023), 015009. https://doi.org/10.1088/2632-2153/acb1f0
