
An introduction to

PDE-constrained optimization

Wolfgang Bangerth

Department of Mathematics
Texas A&M University

   

1
Overview
Why partial differential equations?
Why optimization?
Examples of PDE optimization
Why is this hard?

Formulation of PDE-constrained problems


Discretization
Solvers
Summary and outlook

   

2
Why partial differential equations?
Partial differential equations describe almost every aspect of
physics, chemistry, and engineering that can be put into a continuum
framework:

Elastic and inelastic deformation of bodies, for example bridges
under load or cars upon impact during accidents
Flow around a car, an air foil, or a space ship
Reactive turbulent flow inside a combustion engine
Reactive slow flow inside chemical reactors
Electromagnetic waves

Quantum mechanics and quantum field theory


Light or X-ray intensities in biomedical imaging
Behavior of bacteria in response to chemical substances
(chemotaxis)
  ...  

3
Why partial differential equations?
PDEs are typically solved by one of the three established methods:
Finite element method (FEM)
Finite difference method (FDM)
Finite volume method (FVM)

Applying these methods to an equation leads to:

A large linear or nonlinear system of equations: realistic,
three-dimensional problems often have hundreds of thousands or
millions of equations
A huge body of work exists on how to solve these resulting
systems efficiently (e.g. by iterative linear solvers, multigrid, ...)
An equally large body of work exists on the analysis of such
methods (e.g. for convergence, stability, ...)
A major development of the last 15 years is error estimation

In other words, the numerical solution of PDEs is a mature field.
4
Why optimization?
Models (e.g. PDEs) describe how a system behaves if external
forcing factors are known and if the characteristics (e.g. material
makeup, material properties) are known.
In other words: by solving a known model we can reproduce how a
system would react.

On the other hand, this is rarely what we are interested in:


We may wish to optimize certain parameters of a model to obtain a
more desirable outcome (e.g. “shape optimization”, “optimal
control”, ...)
We may wish to determine unknown parameters in a model by
comparing the predicted reaction with actual measurements
(“parameter estimation”, “inverse problems”)

   

5
Why optimization?
Optimization is also a mature field:

Many methods are available to deal with the many, often very
complicated, aspects of real problems (strong nonlinearities,
equality and inequality constraints, ...)

A large body of work exists on the analysis of these methods
(convergence, stability)

Many methods are tailored to the efficient solution of problems
with many unknowns and many constraints

A huge amount of experience exists on applying these methods to
realistic problems in science and engineering

   

6
PDE-constrained optimization: Examples
Elastic/inelastic deformation:
Models the behavior of bodies subject to forces
Goals of optimization: minimize production cost, maximize
strength, minimize maximal stress under load
Goals of inverse problem: find material parameters

The forward model for 3d elasticity can easily have 1-10M unknowns
but is a simple elliptic (possibly degenerate) problem.

   

7
PDE-constrained optimization: Examples
Flow simulations:
Models fluid or air flow around a body
Optimization: maximize lift-to-drag ratio, maximize fuel efficiency,
minimize production cost, optimize flight trajectory, optimize safety,
extend operable flight regime
Inverse problem: identify effective parameters in reduced models

Credit: Charbel Farhat

The nonlinear forward model in 3d can easily have 10-100M unknowns
and has nasty properties. It is also time dependent.
8
PDE-constrained optimization: Examples
Reactive flow simulations:
Models flow of liquids or gases that react with each other
Optimization: maximize throughput of chemical reactor, minimize
harmful byproducts, optimize yield, ...
Inverse problem: identify reaction parameters that can't be
determined directly

The nonlinear forward model in 3d can easily have 100M unknowns
and has nasty properties. It is also time dependent.
9
PDE-constrained optimization: Examples
Biomedical imaging:
Model describes propagation of radiation in bodies
Inverse problem: to learn about the interior of the body, i.e. to find
internal material parameters that hopefully represent pathologic
structure

[Figure: example images from X-ray, ultrasound, MRI, and PET]


Linear forward models in 3d often have 100k to 1M unknowns.
The forward problem is very stable (often of diffusive type), but
this very stability makes the inverse problem ill-posed.
   

10
PDE-constrained optimization

So what's the problem – both PDE solution and optimization are
mature fields (or so you say)!

From the PDE side:


Many of the PDE solvers use special features of the equations
under consideration, but optimization problems don't have them
Optimization problems are almost always indefinite and sometimes
ill-conditioned, making the analysis much more complicated
Approaches to error estimation and multigrid are not available for
these non-standard problems
There is very little experience (in theory and practice) with
inequalities in PDEs
In other words, for PDE guys pretty much everything is new!
11
PDE-constrained optimization

So what's the problem – both PDE solution and optimization are
mature fields (or so you say)!

From the optimization side:


Discretized PDEs are huge problems: 100,000s or millions of
unknowns
Linear systems are typically very badly conditioned
Model can rarely be solved to high accuracy and doesn't allow for
internal differentiation
Maybe most importantly, unknowns are “artificial” since they result
from somewhat arbitrary discretization

 
In other words, for optimization guys pretty much everything is
new as well!
12
Formulation of PDE-constrained problems
In the language of PDEs, let
u be the state variable(s)
q be the controls (either a set of numbers or functions themselves)
f be given external forces
Then PDEs are typically of the form
A u = f + B q   or   A(q) u = f

where A, B are in general nonlinear partial differential operators.

For example, using the Laplace equation:
−Δu = f + q   or   −∇⋅(q ∇u) = f
This equation has to hold everywhere, i.e. at all of the infinitely
many points in the domain!

   

13
Formulation of PDE-constrained problems
Instead of requiring a PDE to hold at every point, it is typically
posed in weak form. For example, instead of
−Δu = f + q
we would require that
∫ ∇u(x)⋅∇v(x) dx = ∫ [f(x) + q(x)] v(x) dx
for every test function v, or shorter

(∇u, ∇v) = (f + q, v)

The general problem can then be written with semilinear forms as
either
A(u, v) = (f, v) + B(q, v)   ∀v
or
A(q; u, v) = (f, v)   ∀v
   

14
Formulation of PDE-constrained problems
Objective functionals often have the form (linear objective
functional)
J(u, q) = ∫ j(x) u(x) dx + (α/2) ∥q∥²
or (quadratic objective functional)
J(u, q) = 1/2 ∫ [j(x) u(x) − z(x)]² dx + (α/2) ∥q∥²

For example:
Euler flow of fluids: calculation of the lift force as a function
of shape parameters q:
J(u) = ∫_∂Ω [n(x)⋅e_z] u_p(x) dx      (u_p: pressure, n: normal vector)

Parameter estimation: minimization of the misfit between predicted
and actual measurements:
J(u, q) = ∫_∂Ω [u(x) − z(x)]² dx + (α/2) ∥q∥²      (u: light intensity)

As a rule of thumb, objective functionals for PDEs are fairly simple.
15
Formulation of PDE-constrained problems
The optimization problem is then written as
min_{u,q} J(u, q)
such that A(q; u, v) = (f, v)   ∀v
Sometimes, bound constraints on q are added.

A Lagrangian based on functions (not vectors) then reads

L(u, q, λ) = J(u, q) + A(q; u, λ) − (f, λ)

and the optimality conditions are then a set of three coupled
partial differential equations:
L_u(u, q, λ)(v) = J_u(u, q)(v) + A_u(q; u, λ)(v) = 0   ∀v,
L_q(u, q, λ)(χ) = J_q(u, q)(χ) + A_q(q; u, λ)(χ) = 0   ∀χ,
L_λ(u, q, λ)(v) = A(q; u, v) − (f, v) = 0   ∀v.

   

16
Formulation of PDE-constrained problems
Example (a QP of optimal control type):
min_{u,q} 1/2 ∫ (u − z)² dx + α/2 ∫ q² dx
such that −Δu = f + q
Then the Lagrangian is
L(u, q, λ) = 1/2 ∫ (u − z)² dx + α/2 ∫ q² dx + (∇u, ∇λ) − (f + q, λ)

and the optimality conditions read:

L_u(u, q, λ)(v) = (u − z, v) + (∇λ, ∇v) = 0   ∀v,
L_q(u, q, λ)(χ) = α(q, χ) − (λ, χ) = 0   ∀χ,
L_λ(u, q, λ)(v) = (∇u, ∇v) − (f + q, v) = 0   ∀v.

   

17
Questions about the optimality system
The optimality conditions form a system of coupled PDEs:

u − z − Δλ = 0,
αq − λ = 0,
−Δu − f − q = 0

Even for this simple problem of linear PDEs, there are a number of
questions:
Do Lagrange multipliers exist?
Does this system have a solution at all?
If so, can we solve it analytically?

If not, can we at least solve it approximately on a computer?


Does an approximate system admit Lagrange multipliers?
Does the approximate system have a stable solution?
   

18
Discretization
General idea of discretization:
Subdivide space (and time) into little pieces: discretization
Derive equations that a numerical approximation (not the exact
solution) has to satisfy on each little piece
Through coupling between pieces obtain one large linear or
nonlinear system of equations
Solve it in an efficient way

   
Credit: Charbel Farhat
19
Discretization
In the finite element method, one replaces the solution u by the
ansatz
u_h(x) = Σ_i U_i φ_i(x)
Clearly, a variational statement like
(∇u_h, ∇v) = (f, v)   ∀v
can then no longer hold for all v, since it is unlikely that
−Δu_h = f

However, to determine the N coefficients U_i, we can consider the
following N moments of the equation:

(∇u_h, ∇φ_i) = (f, φ_i)   ∀ i = 1...N

   

20
Discretization
Using the expansion
u_h(x) = Σ_i U_i φ_i(x)
in
(∇u_h, ∇φ_i) = (f, φ_i)   ∀ i = 1...N

yields the linear system

A U = F
where
A_ij = (∇φ_i, ∇φ_j)
F_i = (f, φ_i)

From this, the expansion coefficients U_i of the approximation u_h
can be determined, as in the sketch below.
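
To make this concrete, here is a minimal NumPy/SciPy sketch (the 1d
setting and all names are illustrative, not from the slides) that
assembles and solves A U = F for −u'' = f on (0,1) with piecewise
linear elements:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Poisson problem -u'' = f on (0,1), u(0) = u(1) = 0, with N
    # interior nodes on a uniform mesh of width h.
    N = 100
    h = 1.0 / (N + 1)
    x = h * np.arange(1, N + 1)
    f = np.ones(N)                     # right hand side f(x) = 1

    # Stiffness matrix A_ij = (phi_i', phi_j'): tridiagonal [-1, 2, -1] / h.
    A = sp.diags([-np.ones(N - 1) / h, 2 * np.ones(N) / h,
                  -np.ones(N - 1) / h], [-1, 0, 1], format="csc")
    # Load vector F_i = (f, phi_i), here approximated by h * f(x_i).
    F = h * f

    U = spla.spsolve(A, F)             # expansion coefficients U_i of u_h

The same pattern, with element-by-element quadrature instead of
closed-form entries, carries over to 2d and 3d.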

   

21
Discretization
A similar approach applied to the optimality conditions
L_u(u, q, λ)(v) = (u − z, v) + (∇λ, ∇v) = 0   ∀v,
L_q(u, q, λ)(χ) = α(q, χ) − (λ, χ) = 0   ∀χ,
L_λ(u, q, λ)(v) = (∇u, ∇v) − (f + q, v) = 0   ∀v

yields the variational statement
(u_h, φ_i) + (∇λ_h, ∇φ_i) = (z, φ_i)   ∀ i = 1...N,
α(q_h, χ_i) − (λ_h, χ_i) = 0           ∀ i = 1...N,
(∇u_h, ∇φ_i) − (q_h, φ_i) = (f, φ_i)   ∀ i = 1...N

that then gives rise to the following linear system:

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]

with mass matrix M_ij = (φ_j, φ_i), control mass matrix
R_ij = (χ_j, χ_i), coupling matrix C_ij = (χ_j, φ_i), stiffness
matrix A as before, Z_i = (z, φ_i), and F_i = (f, φ_i). A small
assembly sketch follows.

   

22
Solvers
What to do with

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]

Problems:

The system is large: if we approximate each variable with 1M
unknowns, then the matrix is 3M x 3M. Gaussian elimination or LU
decompositions won't work
Most standard optimization software fails with systems that large
The matrix A from discretizing the Laplace operator is typically
ill-conditioned (condition number > 1e6-1e8)
The condition number of the entire system is often even worse:
>1e10-1e12, so iterative solvers won't readily work either
The system is indefinite, so things like multigrid, AMG, ... don't
work
Of the blocks, typically only A is invertible

23
Solvers
What to do with

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]

Answers:
From years of experience in PDEs, we have very good solvers for
the forward problem, i.e. for A, e.g. multigrid, CG, ...
We should try to reduce the matrix to a form that mostly requires
us to solve forward problems rather than the awkward form above
Do block elimination (= form the Schur complement = form the
projected matrix):

[αR + Cᵀ A⁻ᵀ M A⁻¹ C] Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F)
A U = F + C Q
Aᵀ λ = Z − M U
   

24
Solvers
What to do with
[αR + Cᵀ A⁻ᵀ M A⁻¹ C] Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F)
A U = F + C Q
Aᵀ λ = Z − M U

The second and third equations only need solves with A and Aᵀ. We
know how to do this
The Schur complement
S = αR + Cᵀ A⁻ᵀ M A⁻¹ C
is symmetric and positive definite
It is also a much smaller problem, being only the size of the
controls
Apply the Conjugate Gradient (CG) method to the Schur complement
equation!
   

25
Solvers
Applying CG to
S Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F),    S = αR + Cᵀ A⁻ᵀ M A⁻¹ C

Building up S explicitly is not usually an option
Every CG iteration requires one multiplication of a vector with S
Every multiplication with S requires one forward and one adjoint
solve
S is still an ill-conditioned matrix, so many iterations may be
necessary (sometimes 1000s)
Much research goes into preconditioning S (a sketch of the basic,
unpreconditioned iteration follows below)
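
A matrix-free sketch of exactly this procedure, continuing the
hypothetical 1d example from before (A is symmetric there, so the
adjoint solve can reuse the same factorization; all names are
illustrative):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Same hypothetical 1d setup: A stiffness, M = R = C lumped mass.
    N, h, alpha = 100, 1.0 / 101, 1e-4
    A = sp.diags([-np.ones(N - 1) / h, 2 * np.ones(N) / h,
                  -np.ones(N - 1) / h], [-1, 0, 1], format="csc")
    M = sp.diags(h * np.ones(N), format="csc")
    R, C = M, M
    x = h * np.arange(1, N + 1)
    Z = M @ np.sin(np.pi * x)        # (z, phi_i) with z = sin(pi x)
    F = np.zeros(N)                  # f = 0

    solve_A = spla.factorized(A)     # factor A once; A = A^T here

    def apply_S(q):
        # S q = alpha R q + C^T A^{-T} M A^{-1} C q:
        # one forward and one adjoint solve per application.
        u = solve_A(C @ q)           # forward solve  A u = C q
        lam = solve_A(M @ u)         # adjoint solve  A^T lam = M u
        return alpha * (R @ q) + C.T @ lam

    S = spla.LinearOperator((N, N), matvec=apply_S, dtype=float)
    b = C.T @ solve_A(Z - M @ solve_A(F))   # reduced right hand side

    Q, info = spla.cg(S, b)          # CG on the SPD Schur complement
    U = solve_A(F + C @ Q)           # recover the state ...
    lam = solve_A(Z - M @ U)         # ... and the adjoint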
   

26
Challenges
Consider solving
[αR + Cᵀ A⁻ᵀ M A⁻¹ C] Q = Cᵀ A⁻ᵀ (Z − M A⁻¹ F)
A U = F + C Q
Aᵀ λ = Z − M U
This requires
2 solves for the right hand side
2 × (number of CG iterations) solves to invert the Schur complement
2 solves for the state and adjoint equation

All this times the number of Newton iterations for nonlinear
problems.

For a nonlinear problem with a few hundred controls, we often have
to do 1000s to 10,000s of solves with A! (For instance, 10 Newton
steps with 500 CG iterations each already cost about 10,000
forward/adjoint solves.)
27
Alternatives
We could also attempt to directly solve

[ M    0    Aᵀ ] [ U ]   [ Z ]
[ 0   αR   −Cᵀ ] [ Q ] = [ 0 ]
[ A   −C    0  ] [ λ ]   [ F ]
System is indefinite, so only GMRES, SymmLQ, or QMR might work
System is very badly conditioned so we need to expect many
iterations unless we have good preconditioners
Could precondition with inexact solves with the Schur complement:
applying the exact block-elimination inverse to a residual
(r_U, r_Q, r_λ) amounts to
Q = S⁻¹ [ r_Q + Cᵀ A⁻ᵀ (r_U − M A⁻¹ r_λ) ]
U = A⁻¹ (r_λ + C Q)
λ = A⁻ᵀ (r_U − M U)
and the preconditioner replaces the solve with S by an inexact one

The basic problem remains that we need to do many forward/adjoint
solves (see the sketch below)
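
A sketch of such a preconditioned iteration, again for the
hypothetical 1d example, using GMRES with the block-elimination
preconditioner and the deliberately crude approximation S ≈ αR
(as the slides stress, finding good approximations of S is the hard
part; all names are illustrative):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Same hypothetical 1d KKT system as before.
    N, h, alpha = 100, 1.0 / 101, 1e-4
    A = sp.diags([-np.ones(N - 1) / h, 2 * np.ones(N) / h,
                  -np.ones(N - 1) / h], [-1, 0, 1], format="csc")
    M = sp.diags(h * np.ones(N), format="csc")
    R, C = M, M
    x = h * np.arange(1, N + 1)
    Z, F = M @ np.sin(np.pi * x), np.zeros(N)
    K = sp.bmat([[M, None, A.T], [None, alpha * R, -C.T],
                 [A, -C, None]], format="csc")
    rhs = np.concatenate([Z, np.zeros(N), F])

    solve_A = spla.factorized(A)     # A symmetric, so A^{-T} = A^{-1}
    Rdiag = alpha * R.diagonal()     # crude Schur approximation alpha*R

    def apply_Pinv(r):
        # Block elimination with exact A-solves, but with the Schur
        # complement S replaced by the diagonal alpha*R.
        rU, rQ, rlam = np.split(r, 3)
        q = (rQ + C.T @ solve_A(rU - M @ solve_A(rlam))) / Rdiag
        u = solve_A(rlam + C @ q)
        lam = solve_A(rU - M @ u)
        return np.concatenate([u, q, lam])

    P = spla.LinearOperator(K.shape, matvec=apply_Pinv, dtype=float)
    xsol, info = spla.gmres(K, rhs, M=P)
    U, Q, lam = np.split(xsol, 3)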
   

28
Alternatives
Other alternatives include:
Preconditioning with accurate solves with an approximation S̃ of
the Schur complement in the block elimination above, where S̃ is
obtained from
S = αR + Cᵀ A⁻ᵀ M A⁻¹ C
by replacing the solves with A by forward preconditioners
Preconditioning with limited-memory BFGS (L-BFGS) updates of the
inverse of S
Direct multigrid on the KKT system
Multigrid on the Schur complement
...

 
But: Nobody really knows how to do all this efficiently!
29
The basic problem in PDE optimization
For a nonlinear problem with a few hundred controls, we often
have to do 1000s to 10,000s of solves with A !

For complicated 3d models with a few 100,000 or a million unknowns,
every forward solve can easily cost minutes, bringing the total
compute time to hours/days/weeks.
This gets even worse if we have time-dependent problems.

And all this even though we have a fairly trivial optimization problem:
Convex objective function (but possibly nonlinear constraints)
No state constraints (though possibly control constraints of bounds
type)
No other complicated constraints.
 

30
Summary and outlook
To date, PDE-constrained optimization problems are fairly trivial
but huge from an optimization perspective, and moderately large but
very complex from a PDE perspective

Even solving the simplest problems is considered frontier research

Because efficient linear solvers for saddle point problems like the
ones originating from optimization are virtually unknown, one tries
to go back to forward solvers through the Schur complement

Inclusion of bounds on controls allows one to keep this structure

Inclusion of state constraints would yield a variational inequality
that requires different techniques and for which we don't have
solvers yet

Multiple-experiment parameter estimation problems can also make the
computational complexity very large
31
Summary and outlook
PDE constrained optimization has not seen anything complex yet:

No optimal experimental design

No optimization under uncertainty

No optimization for stability or worst case behavior

Not even simple optimization for complex models like turbulent flow
Credit: Charbel Farhat

It is not without reason that PDE-constrained optimization is a
field with a huge amount of activity at present!
32
