
An introduction to

PDE-constrained optimization

Wolfgang Bangerth

Department of Mathematics
Texas A&M University

   

1
Overview
Why partial differential equations?
Why optimization?
Examples of PDE optimization
Why is this hard?

Formulation of PDE-constrained problems


Discretization
Solvers
Summary and outlook

   

2
Why partial differential equations?
Partial differential equations describe almost every aspect of
physics, chemistry, and engineering that can be put into a continuum
framework:

Elastic and inelastic deformation of bodies, for example bridges
under load or cars upon impact during accidents
Flow around a car, an air foil, or a space ship
Reactive turbulent flow inside a combustion engine
Reactive slow flow inside chemical reactors
Electromagnetic waves

Quantum mechanics and quantum field theory


Light or X-ray intensities in biomedical imaging
Behavior of bacteria in response to chemical substances
(chemotaxis)
  ...  

3
Why partial differential equations?
PDEs are typically solved by one of the three established methods:
Finite element method (FEM)
Finite difference method (FDM)
Finite volume method (FVM)

Applying these methods to an equation leads to:

A large linear or nonlinear system of equations: realistic,
three-dimensional problems often have hundreds of thousands or
millions of equations
A huge body of work exists on how to solve these resulting
systems efficiently (e.g. by iterative linear solvers, multigrid, ...)
An equally large body of work exists on the analysis of such
methods (e.g. for convergence, stability, ...)
A major development of the last 15 years is error estimation

In other words, the numerical solution of PDEs is a mature field.
4
Why optimization?
Models (e.g. PDEs) describe how a system behaves if external
forcing factors are known and if the characteristics (e.g. material
makeup, material properties) are known.
In other words: by solving a known model we can reproduce how a
system would react.

On the other hand, this is rarely what we are interested in:


We may wish to optimize certain parameters of a model to obtain a
more desirable outcome (e.g. “shape optimization”, “optimal
control”, ...)
We may wish to determine unknown parameters in a model by
comparing the predicted reaction with actual measurements
(“parameter estimation”, “inverse problems”)

   

5
Why optimization?
Optimization is also a mature field:

Many methods are available to deal with the many, often very
complicated, aspects of real problems (strong nonlinearities,
equality and inequality constraints, ...)

A large body of work exists on the analysis of these methods
(convergence, stability)

Many methods are tailored to the efficient solution of problems
with many unknowns and many constraints

A huge amount of experience exists on applying these methods to
realistic problems in science and engineering

   

6
PDE-constrained optimization: Examples
Elastic/inelastic deformation:
Models the behavior of bodies subject to forces
Goals of optimization: minimize production cost, maximize
strength, minimize maximal stress under load
Goals of inverse problem: find material parameters

The forward model for 3d elasticity can easily have 1-10M unknowns
but is a simple elliptic (possibly degenerate) problem.

   

7
PDE-constrained optimization: Examples
Flow simulations:
Models fluid or air flow around a body
Optimization: maximize lift-to-drag ratio, maximize fuel efficiency,
minimize production cost, optimize flight trajectory, optimize safety,
extend operable flight regime
Inverse problem: identify effective parameters in reduced models

Credit: Charbel Farhat

The nonlinear forward model in 3d can easily have 10-100M unknowns
and has nasty properties. It is also time dependent.
8
PDE-constrained optimization: Examples
Reactive flow simulations:
Models flow of liquids or gases that react with each other
Optimization: maximize throughput of chemical reactor, minimize
harmful byproducts, optimize yield, ...
Inverse problem: identify reaction parameters that can't be
determined directly

The nonlinear forward model in 3d can easily have 100M unknowns
and has nasty properties. It is also time dependent.
9
PDE-constrained optimization: Examples
Biomedical imaging:
Model describes propagation of radiation in bodies
Inverse problem: to learn about the interior of the body, i.e. to find
internal material parameters that hopefully represent pathologic
structure

[Figure: example images from X-ray, ultrasound, MRI, and PET]


Linear forward models in 3d often have 100k to 1M unknowns.
The forward problem is very stable (often of diffusive type), but
this very stability makes the inverse problem ill-posed.
   

10
PDE-constrained optimization

So what's the problem – both PDE solution and optimization are
mature fields (or so you say)!

From the PDE side:


Many of the PDE solvers use special features of the equations
under consideration, but optimization problems don't have them
Optimization problems are almost always indefinite and sometimes
ill-conditioned, making the analysis much more complicated
Approaches to error estimation and multigrid are not available for
these non-standard problems
There is very little experience (in theory and practice) with
inequalities in PDEs
In other words, for PDE guys pretty much everything is new!
11
PDE-constrained optimization

So what's the problem – both PDE solution and optimization are
mature fields (or so you say)!

From the optimization side:


Discretized PDEs are huge problems: 100,000s or millions of
unknowns
Linear systems are typically very badly conditioned
Model can rarely be solved to high accuracy and doesn't allow for
internal differentiation
Maybe most importantly, unknowns are “artificial” since they result
from somewhat arbitrary discretization

 
In other words, for optimization guys pretty much everything is
new as well!
12
Formulation of PDE-constrained problems
In the language of PDEs, let
u be the state variable(s)
q be the controls (either a set of numbers or functions themselves)
f be given external forces
Then PDEs are typically of the form
A u = f + B q   or   A(q) u = f

where A, B are in general nonlinear partial differential operators.

For example, using the Laplace equation:
−Δu = f + q   or   −∇⋅(q ∇u) = f
This equation has to hold everywhere, i.e. at all of the infinitely
many points in the domain!

   

13
Formulation of PDE-constrained problems
Instead of requiring a PDE to hold at every point, it is typically
posed in weak form. For example, instead of
−Δu = f + q
we would require that
∫ ∇u(x)⋅∇v(x) dx = ∫ [f(x) + q(x)] v(x) dx
for every test function v, or shorter

(∇u, ∇v) = (f + q, v)

The general problem can then be written with semilinear forms as
either
A(u, v) = (f, v) + B(q, v)   ∀v
or
A(q; u, v) = (f, v)   ∀v
   

14
Formulation of PDE-constrained problems
Objective functionals often have the form (linear objective
functional)
J(u, q) = ∫ j(x) u(x) dx + (α/2) ∥q∥²
or (quadratic objective functional)
J(u, q) = 1/2 ∫ [j(x) u(x) − z(x)]² dx + (α/2) ∥q∥²

For example:
Euler flow of fluids: calculation of the lift force as a function
of shape parameters q:
J(u) = ∫_∂Ω [n(x)⋅e_z] u_p(x) dx      (u_p: pressure, n: normal vector)

Parameter estimation: minimization of the misfit between predicted
and actual measurements:
J(u, q) = ∫_∂Ω [u(x) − z(x)]² dx + (α/2) ∥q∥²      (u: light intensity)

As a rule of thumb, objective functionals for PDEs are fairly simple.
15
Formulation of PDE-constrained problems
The optimization problem is then written as
min_{u,q} J(u, q)
such that A(q; u, v) = (f, v)   ∀v
Sometimes, bound constraints on q are added.

A Lagrangian based on functions (not vectors) then reads

L(u, q, λ) = J(u, q) + A(q; u, λ) − (f, λ)

and the optimality conditions are then a set of three coupled
partial differential equations:
L_u(u, q, λ)(v) = J_u(u, q)(v) + A_u(q; u, λ)(v) = 0   ∀v,
L_q(u, q, λ)(χ) = J_q(u, q)(χ) + A_q(q; u, λ)(χ) = 0   ∀χ,
L_λ(u, q, λ)(v) = A(q; u, v) − (f, v) = 0   ∀v.

   

16
Formulation of PDE-constrained problems
Example (a QP of optimal control type):
min_{u,q} 1/2 ∫ (u − z)² dx + α/2 ∫ q² dx
such that −Δu = f + q
Then the Lagrangian is
L(u, q, λ) = 1/2 ∫ (u − z)² dx + α/2 ∫ q² dx + (∇u, ∇λ) − (f + q, λ)

and the optimality conditions read:

L_u(u, q, λ)(v) = (u − z, v) + (∇λ, ∇v) = 0   ∀v,
L_q(u, q, λ)(χ) = α(q, χ) − (λ, χ) = 0   ∀χ,
L_λ(u, q, λ)(v) = (∇u, ∇v) − (f + q, v) = 0   ∀v.

   

17
Questions about the optimality system
The optimality conditions form a system of coupled PDEs:

u − z − Δλ = 0,
αq − λ = 0,
−Δu − f − q = 0

Even for this simple problem of linear PDEs, there are a number of
questions:
Do Lagrange multipliers exist?
Does this system have a solution at all?
If so, can we solve it analytically?

If not, can we at least solve it approximately on a computer?


Does an approximate system admit Lagrange multipliers?
Does the approximate system have a stable solution?
   

18
Discretization
General idea of discretization:
Subdivide space (and time) into little pieces: discretization
Derive equations that a numerical approximation (not the exact
solution) has to satisfy on each little piece
Through coupling between pieces obtain one large linear or
nonlinear system of equations
Solve it in an efficient way

   
Credit: Charbel Farhat
19
Discretization
In the finite element method, one replaces the solution u by the
ansatz
u_h(x) = Σ_i U_i φ_i(x)
Clearly, a variational statement like
(∇u_h, ∇v) = (f, v)   ∀v
can then no longer hold for all v, since it is unlikely that
−Δu_h = f

However, to determine the N coefficients U_i, we can consider the
following N moments of the equation:

(∇u_h, ∇φ_i) = (f, φ_i)   ∀ i = 1...N

   

20
Discretization
Using the expansion
u_h(x) = Σ_i U_i φ_i(x)
in
(∇u_h, ∇φ_i) = (f, φ_i)   ∀ i = 1...N

yields the linear system

A U = F
where
A_ij = (∇φ_i, ∇φ_j)
F_i = (f, φ_i)

From this, the expansion coefficients U_i of the approximation u_h
can be determined, as in the sketch below.
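
To make this concrete, here is a minimal NumPy/SciPy sketch (the 1d
setting and all names are illustrative, not from the slides) that
assembles and solves A U = F for −u'' = f on (0,1) with piecewise
linear elements:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Poisson problem -u'' = f on (0,1), u(0) = u(1) = 0, with N
    # interior nodes on a uniform mesh of width h.
    N = 100
    h = 1.0 / (N + 1)
    x = h * np.arange(1, N + 1)
    f = np.ones(N)                     # right hand side f(x) = 1

    # Stiffness matrix A_ij = (phi_i', phi_j'): tridiagonal [-1, 2, -1] / h.
    A = sp.diags([-np.ones(N - 1) / h, 2 * np.ones(N) / h,
                  -np.ones(N - 1) / h], [-1, 0, 1], format="csc")
    # Load vector F_i = (f, phi_i), here approximated by h * f(x_i).
    F = h * f

    U = spla.spsolve(A, F)             # expansion coefficients U_i of u_h

The same pattern, with element-by-element quadrature instead of
closed-form entries, carries over to 2d and 3d.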

   

21
Discretization
A similar approach applied to the optimality conditions
L_u(u, q, λ)(v) = (u − z, v) + (∇λ, ∇v) = 0   ∀v,
L_q(u, q, λ)(χ) = α(q, χ) − (λ, χ) = 0   ∀χ,
L_λ(u, q, λ)(v) = (∇u, ∇v) − (f + q, v) = 0   ∀v

yields the variational statement
(u_h, φ_i) + (∇λ_h, ∇φ_i) = (z, φ_i)   ∀ i = 1...N,
α(q_h, χ_i) − (λ_h, χ_i) = 0           ∀ i = 1...N,
(∇u_h, ∇φ_i) − (q_h, φ_i) = (f, φ_i)   ∀ i = 1...N

that then gives rise to the following linear system:

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]

with mass matrix M_ij = (φ_j, φ_i), control mass matrix
R_ij = (χ_j, χ_i), coupling matrix C_ij = (χ_j, φ_i), stiffness
matrix A as before, Z_i = (z, φ_i), and F_i = (f, φ_i). A small
assembly sketch follows.

   

22
Solvers
What to do with

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]

Problems:

The system is large: if we approximate each variable with 1M
unknowns, then the matrix is 3M x 3M. Gaussian elimination or LU
decompositions won't work
Most standard optimization software fails with systems that large
The matrix A from discretizing the Laplace operator is typically
ill-conditioned (condition number > 1e6-1e8)
The condition number of the entire system is often even worse:
>1e10-1e12, so iterative solvers won't readily work either
The system is indefinite, so things like multigrid, AMG, ... don't
work
Of the blocks, typically only A is invertible

23
Solvers
What to do with

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]

Answers:
From years of experience in PDEs, we have very good solvers for
the forward problem, i.e. for A, e.g. multigrid, CG, ...
We should try to reduce the matrix to a form that mostly requires
us to solve forward problems rather than the awkward form above
Do block elimination (= form the Schur complement = form the
projected matrix):

[αR + Cᵀ A⁻ᵀ M A⁻¹ C] Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F)
A U = F + C Q
Aᵀ λ = Z − M U
   

24
Solvers
What to do with
[αR + Cᵀ A⁻ᵀ M A⁻¹ C] Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F)
A U = F + C Q
Aᵀ λ = Z − M U

The second and third equations only need solves with A and Aᵀ. We
know how to do this
The Schur complement
S = αR + Cᵀ A⁻ᵀ M A⁻¹ C
is symmetric and positive definite
It is also a much smaller problem, being only the size of the
controls
Apply the Conjugate Gradient (CG) method to the Schur complement
equation!
   

25
Solvers
Applying CG to
S Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F),    S = αR + Cᵀ A⁻ᵀ M A⁻¹ C

Building up S explicitly is not usually an option
Every CG iteration requires one multiplication of a vector with S
Every multiplication with S requires one forward and one adjoint
solve
S is still an ill-conditioned matrix, so many iterations may be
necessary (sometimes 1000s)
Much research goes into preconditioning S (a sketch of the basic,
unpreconditioned iteration follows below)
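
A matrix-free sketch of exactly this procedure, continuing the
hypothetical 1d example from before (A is symmetric there, so the
adjoint solve can reuse the same factorization; all names are
illustrative):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Same hypothetical 1d setup: A stiffness, M = R = C lumped mass.
    N, h, alpha = 100, 1.0 / 101, 1e-4
    A = sp.diags([-np.ones(N - 1) / h, 2 * np.ones(N) / h,
                  -np.ones(N - 1) / h], [-1, 0, 1], format="csc")
    M = sp.diags(h * np.ones(N), format="csc")
    R, C = M, M
    x = h * np.arange(1, N + 1)
    Z = M @ np.sin(np.pi * x)        # (z, phi_i) with z = sin(pi x)
    F = np.zeros(N)                  # f = 0

    solve_A = spla.factorized(A)     # factor A once; A = A^T here

    def apply_S(q):
        # S q = alpha R q + C^T A^{-T} M A^{-1} C q:
        # one forward and one adjoint solve per application.
        u = solve_A(C @ q)           # forward solve  A u = C q
        lam = solve_A(M @ u)         # adjoint solve  A^T lam = M u
        return alpha * (R @ q) + C.T @ lam

    S = spla.LinearOperator((N, N), matvec=apply_S, dtype=float)
    b = C.T @ solve_A(Z - M @ solve_A(F))   # reduced right hand side

    Q, info = spla.cg(S, b)          # CG on the SPD Schur complement
    U = solve_A(F + C @ Q)           # recover the state ...
    lam = solve_A(Z - M @ U)         # ... and the adjoint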
   

26
Challenges
Consider solving
[αR + Cᵀ A⁻ᵀ M A⁻¹ C] Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F)
A U = F + C Q
Aᵀ λ = Z − M U
This requires
2 solves for the right hand side
2 × (number of CG iterations) solves to invert the Schur complement
2 solves for the state and adjoint equation

All this times the number of Newton iterations for nonlinear
problems.

For a nonlinear problem with a few hundred controls, we often have
to do 1000s to 10,000s of solves with A! (For instance, 10 Newton
steps with 500 CG iterations each already cost about 10,000
forward/adjoint solves.)
27
Alternatives
We could also attempt to directly solve

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]
System is indefinite, so only GMRES, SymmLQ, or QMR might work
System is very badly conditioned so we need to expect many
iterations unless we have good preconditioners
Could precondition with inexact solves with the Schur complement:
applying the exact block-elimination inverse to a residual
(r_U, r_Q, r_λ) amounts to
Q = S⁻¹ [ r_Q + Cᵀ A⁻ᵀ (r_U − M A⁻¹ r_λ) ]
U = A⁻¹ (r_λ + C Q)
λ = A⁻ᵀ (r_U − M U)
and the preconditioner replaces the solve with S by an inexact one

The basic problem remains that we need to do many forward/adjoint
solves (see the sketch below)
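
A sketch of such a preconditioned iteration, again for the
hypothetical 1d example, using GMRES with the block-elimination
preconditioner and the deliberately crude approximation S ≈ αR
(as the slides stress, finding good approximations of S is the hard
part; all names are illustrative):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Same hypothetical 1d KKT system as before.
    N, h, alpha = 100, 1.0 / 101, 1e-4
    A = sp.diags([-np.ones(N - 1) / h, 2 * np.ones(N) / h,
                  -np.ones(N - 1) / h], [-1, 0, 1], format="csc")
    M = sp.diags(h * np.ones(N), format="csc")
    R, C = M, M
    x = h * np.arange(1, N + 1)
    Z, F = M @ np.sin(np.pi * x), np.zeros(N)
    K = sp.bmat([[M, None, A.T], [None, alpha * R, -C.T],
                 [A, -C, None]], format="csc")
    rhs = np.concatenate([Z, np.zeros(N), F])

    solve_A = spla.factorized(A)     # A symmetric, so A^{-T} = A^{-1}
    Rdiag = alpha * R.diagonal()     # crude Schur approximation alpha*R

    def apply_Pinv(r):
        # Block elimination with exact A-solves, but with the Schur
        # complement S replaced by the diagonal alpha*R.
        rU, rQ, rlam = np.split(r, 3)
        q = (rQ + C.T @ solve_A(rU - M @ solve_A(rlam))) / Rdiag
        u = solve_A(rlam + C @ q)
        lam = solve_A(rU - M @ u)
        return np.concatenate([u, q, lam])

    P = spla.LinearOperator(K.shape, matvec=apply_Pinv, dtype=float)
    xsol, info = spla.gmres(K, rhs, M=P)
    U, Q, lam = np.split(xsol, 3)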
   

28
Alternatives
Other alternatives include:
Preconditioning with accurate solves with an approximation S̃ of
the Schur complement in the block elimination above, where S̃ is
obtained from
S = αR + Cᵀ A⁻ᵀ M A⁻¹ C
by replacing the solves with A by forward preconditioners
Preconditioning with limited-memory BFGS (L-BFGS) updates of the
inverse of S
Direct multigrid on the KKT system
Multigrid on the Schur complement
...

 
But: Nobody really knows how to do all this efficiently!
29
The basic problem in PDE optimization
For a nonlinear problem with a few hundred controls, we often
have to do 1000s to 10,000s of solves with A !

For complicated 3d models with a few 100,000 or a million unknowns,
every forward solve can easily cost minutes, bringing the total
compute time to hours/days/weeks.
This gets even worse if we have time-dependent problems.

And all this even though we have a fairly trivial optimization problem:
Convex objective function (but possibly nonlinear constraints)
No state constraints (though possibly control constraints of bounds
type)
No other complicated constraints.
 

30
Summary and outlook
To date, PDE-constrained optimization problems are fairly trivial
but huge from an optimization perspective, and moderately large but
very complex from a PDE perspective

Even solving the simplest problems is considered frontier research

Because efficient linear solvers for saddle point problems like the
ones originating from optimization are virtually unknown, one tries
to go back to forward solvers through the Schur complement

Inclusion of bounds on controls allows one to keep this structure

Inclusion of state constraints would yield a variational inequality
that requires different techniques and for which we don't have
solvers yet

Multiple-experiment parameter estimation problems can also make the
computational complexity very large
31
Summary and outlook
PDE constrained optimization has not seen anything complex yet:

No optimal experimental design

No optimization under uncertainty

No optimization for stability or worst case behavior

Not even simple optimization for complex models like turbulent flow
Credit: Charbel Farhat

It is not without reason that PDE-constrained optimization is a
field with a huge amount of activity at present!
32
