
INF5620: Numerical Methods for Partial Differential Equations

Hans Petter Langtangen
Simula Research Laboratory, and Dept. of Informatics, Univ. of Oslo

Last update: April 29, 2011

Nonlinear PDEs

Examples

Some nonlinear model problems to be treated next:

  −u″(x) = f(u),  u(0) = uL, u(1) = uR
  −(α(u)u′)′ = 0,  u(0) = uL, u(1) = uR
  −∇·[α(u)∇u] = g(x),  with u or −α ∂u/∂n given as boundary condition

Discretization methods:
  standard finite difference methods
  standard finite element methods
  the group finite element method

Solution method: iterate over linear equations

Nonlinear discrete equations; FDM

Finite differences for −u″ = f(u):

  −(1/h²)(u_{i−1} − 2u_i + u_{i+1}) = f(u_i)

⇒ nonlinear system of algebraic equations

  F(u) = 0, or Au = b(u),  u = (u_0, ..., u_N)^T

Finite differences for (α(u)u′)′ = 0:

  (1/h²)(α(u_{i+1/2})(u_{i+1} − u_i) − α(u_{i−1/2})(u_i − u_{i−1})) = 0

With the arithmetic mean α(u_{i+1/2}) ≈ (α(u_i) + α(u_{i+1}))/2 we get nonlinear algebraic equations

  (1/(2h²))([α(u_{i+1}) + α(u_i)](u_{i+1} − u_i) − [α(u_i) + α(u_{i−1})](u_i − u_{i−1})) = 0

⇒ nonlinear system of algebraic equations

  F(u) = 0 or A(u)u = b
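The discrete equations above map directly onto code. Below is a minimal NumPy sketch of the residual vector F(u) for −u″ = f(u) with Dirichlet conditions; the function name and the way the boundary conditions enter (as extra equations) are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def residual(u, f, h, uL, uR):
    """Residual F(u) of -u'' = f(u) discretized by centered differences.

    u holds the grid values u_0..u_N; the boundary values are enforced
    directly in the first and last equations.
    """
    F = np.zeros_like(u)
    F[0] = u[0] - uL
    F[-1] = u[-1] - uR
    # Interior equations: -(u_{i-1} - 2u_i + u_{i+1})/h^2 - f(u_i) = 0
    F[1:-1] = -(u[:-2] - 2*u[1:-1] + u[2:])/h**2 - f(u[1:-1])
    return F
```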

Nonlinear discrete equations; FEM

Finite elements for −u″ = f(u) with u(0) = u(1) = 0

Galerkin approach: find u = Σ_{k=1}^n u_k φ_k(x) ∈ V such that

  ∫_0^1 u′v′ dx = ∫_0^1 f(u)v dx   ∀v ∈ V

Left-hand side is easy to assemble; v = φ_i gives

  −(1/h)(u_{i−1} − 2u_i + u_{i+1}) = ∫_0^1 f(Σ_k u_k φ_k(x)) φ_i dx

We write u = Σ_k u_k φ_k instead of u = Σ_k c_k φ_k since c_k = u(x_k) = u_k and u_k is used in the finite difference schemes

Must use numerical integration in general to evaluate ∫ f(u)φ_i dx

Nonlinearities in the FEM

Note that f(Σ_k φ_k(x)u_k) is a complicated function of u_0, ..., u_N

For example, f(u) = u²:

  ∫_0^1 (Σ_k φ_k u_k)² φ_i dx

gives rise to a difference representation

  (h/12)(u_{i−1}² + 2u_i(u_{i−1} + u_{i+1}) + 6u_i² + u_{i+1}²)

(compare with f(u_i) = u_i² in the FDM!)

The group finite element method

The group finite element method:

  f(u) = f(Σ_j u_j φ_j(x)) ≈ Σ_{j=1}^n f(u_j)φ_j

Resulting term:

  ∫_0^1 f(u)φ_i dx ≈ Σ_j (∫_0^1 φ_i φ_j dx) f(u_j)

This integral and formulation also arise from approximating some function by u = Σ_j u_j φ_j

We can write the term as Mf(u), where M has rows consisting of (h/6)(1, 4, 1), and row i in Mf(u) becomes

  (h/6)(f(u_{i−1}) + 4f(u_i) + f(u_{i+1}))

FEM for a nonlinear coefficient

We now look at

  (α(u)u′)′ = 0,  u(0) = uL, u(1) = uR

Using a finite element method (fine exercise!) results in an integral

  ∫_0^1 α(Σ_k u_k φ_k) φ_i′ φ_j′ dx

⇒ complicated to integrate by hand or symbolically

Linear P1 elements and the trapezoidal rule (do it!):

  (1/2)(α(u_i) + α(u_{i+1}))(u_{i+1} − u_i) − (1/2)(α(u_{i−1}) + α(u_i))(u_i − u_{i−1}) = 0

⇒ same as the FDM with the arithmetic mean for α(u_{i+1/2})
Nonlinear algebraic equations

FEM/FDM for nonlinear PDEs gives nonlinear algebraic equations:

  (α(u)u′)′ = 0  ⇒  A(u)u = b
  −u″ = f(u)  ⇒  Au = b(u)

In general a nonlinear PDE gives

  F(u) = 0

or

  F_0(u_0, ..., u_N) = 0
  ...
  F_N(u_0, ..., u_N) = 0

Solving nonlinear algebraic eqs.

Have

  A(u)u − b = 0,  Au − b(u) = 0,  F(u) = 0

Idea: solve the nonlinear problem as a sequence of linear subproblems
Must perform some kind of linearization
Iterative method: guess u^0, solve linear problems for u^1, u^2, ... and hope that

  lim_{q→∞} u^q = u

i.e. the iteration converges

Picard iteration (1)

Model problem: A(u)u = b

Simple iteration scheme:

  A(u^q)u^{q+1} = b,  q = 0, 1, ...

Must provide a (good) guess u^0

Termination:

  ||u^{q+1} − u^q|| ≤ ε_u

or using the residual (expensive, requires the new A(u^{q+1})!)

  ||b − A(u^{q+1})u^{q+1}|| ≤ ε_r

Relative criteria:

  ||u^{q+1} − u^q|| ≤ ε_u ||u^q||

or (more expensive)

  ||b − A(u^{q+1})u^{q+1}|| ≤ ε_r ||b − A(u^0)u^0||

Picard iteration (2)

Model problem: Au = b(u)

Simple iteration scheme:

  Au^{q+1} = b(u^q),  q = 0, 1, ...

Relaxation:

  Au* = b(u^q),  u^{q+1} = ωu* + (1 − ω)u^q

(may improve convergence, avoids too large steps)

This method is also called successive substitutions
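A minimal sketch of the Picard scheme with relaxation and the relative termination criterion, assuming the coefficient matrix is available as a callable A(u) that assembles A evaluated at the previous iterate; all names and tolerances are illustrative assumptions.

```python
import numpy as np

def picard(A, b, u0, omega=1.0, eps_u=1e-6, max_iter=100):
    """Picard (successive substitution) iteration for A(u)u = b.

    A: callable returning the coefficient matrix at the given iterate;
    omega: relaxation parameter (omega = 1 gives plain Picard).
    """
    u = u0.copy()
    for q in range(max_iter):
        u_star = np.linalg.solve(A(u), b)       # solve A(u^q) u* = b
        u_new = omega*u_star + (1 - omega)*u    # relaxation step
        if np.linalg.norm(u_new - u) <= eps_u*np.linalg.norm(u):
            return u_new, q + 1
        u = u_new
    raise RuntimeError('Picard iteration did not converge')
```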

Newton’s method for scalar equations

The Newton method for f(x) = 0, x ∈ ℝ

Given an approximation x^q
Approximate f by a linear function at x^q:

  f(x) ≈ M(x; x^q) = f(x^q) + f′(x^q)(x − x^q)

Find a new x^{q+1} such that M(x^{q+1}; x^q) = 0:

  x^{q+1} = x^q − f(x^q)/f′(x^q)

Newton’s method for systems of equations

Systems of nonlinear equations:

  F(u) = 0,  F(u) ≈ M(u; u^q)

Multi-dimensional Taylor-series expansion:

  M(u; u^q) = F(u^q) + J(u − u^q),  J ≡ ∇F,  J_{i,j} = ∂F_i/∂u_j

Iteration no. q:
  solve the linear system J(u^q)(δu)^{q+1} = −F(u^q)
  update: u^{q+1} = u^q + (δu)^{q+1}

Can use relaxation: u^{q+1} = u^q + ω(δu)^{q+1}
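The system version of Newton's method translates almost line by line into code. A sketch with NumPy, assuming callables F and J for the residual and the Jacobian; the stopping criterion on ||F(u)|| is one common choice among several.

```python
import numpy as np

def newton(F, J, u0, omega=1.0, eps=1e-10, max_iter=25):
    """Newton's method for F(u) = 0 with optional relaxation."""
    u = u0.copy()
    for q in range(max_iter):
        delta = np.linalg.solve(J(u), -F(u))  # J(u^q) du = -F(u^q)
        u = u + omega*delta                   # u^{q+1} = u^q + omega*du
        if np.linalg.norm(F(u)) <= eps:
            return u, q + 1
    raise RuntimeError("Newton's method did not converge")
```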

The Jacobian matrix; FDM (1)

Model equation: u″ = −f(u)

Scheme:

  F_i ≡ (1/h²)(u_{i−1} − 2u_i + u_{i+1}) + f(u_i) = 0

Jacobian matrix term (FDM):

  J_{i,j} = ∂F_i/∂u_j

F_i contains only u_i, u_{i±1}
Only

  J_{i,i−1} = ∂F_i/∂u_{i−1},  J_{i,i} = ∂F_i/∂u_i,  J_{i,i+1} = ∂F_i/∂u_{i+1}

are nonzero ⇒ the Jacobian is tridiagonal

The Jacobian matrix; FDM (2)

  F_i ≡ (1/h²)(u_{i−1} − 2u_i + u_{i+1}) + f(u_i) = 0

Derivation:

  J_{i,i−1} = ∂F_i/∂u_{i−1} = 1/h²
  J_{i,i+1} = ∂F_i/∂u_{i+1} = 1/h²
  J_{i,i} = ∂F_i/∂u_i = −2/h² + f′(u_i)

Must form the Jacobian matrix J in each iteration and solve

  J δu^{q+1} = −F(u^q)

and then update

  u^{q+1} = u^q + ωδu^{q+1}
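For this model problem the Jacobian entries above can be assembled directly. A sketch assuming dfdu computes f′(u); dense storage is used for clarity, whereas a tridiagonal or sparse format would be used in practice.

```python
import numpy as np

def jacobian(u, dfdu, h):
    """Tridiagonal Jacobian of F_i = (u_{i-1} - 2u_i + u_{i+1})/h^2 + f(u_i).

    Boundary rows (Dirichlet conditions) are identity rows.
    """
    N = len(u)
    J = np.zeros((N, N))
    J[0, 0] = J[-1, -1] = 1.0
    for i in range(1, N - 1):
        J[i, i-1] = 1.0/h**2
        J[i, i]   = -2.0/h**2 + dfdu(u[i])
        J[i, i+1] = 1.0/h**2
    return J
```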
The Jacobian matrix; FEM

−u″ = f(u) on Ω = (0, 1) with u(0) = u(1) = 0 and FEM:

  F_i ≡ ∫_0^1 [φ_i′ Σ_k φ_k′ u_k − f(Σ_s u_s φ_s) φ_i] dx = 0

First term of the Jacobian J_{i,j} = ∂F_i/∂u_j:

  ∂/∂u_j ∫_0^1 φ_i′ Σ_k φ_k′ u_k dx = ∫_0^1 φ_i′ φ_j′ dx

Second term:

  −∂/∂u_j ∫_0^1 f(Σ_s u_s φ_s) φ_i dx = −∫_0^1 f′(Σ_s u_s φ_s) φ_j φ_i dx

because when u = Σ_s u_s φ_s,

  ∂f(u)/∂u_j = f′(u) ∂u/∂u_j = f′(u) ∂(Σ_s u_s φ_s)/∂u_j = f′(u)φ_j

A 2D/3D transient nonlinear PDE (1)

PDE for heat conduction in a solid where the conduction depends on the temperature u:

  ϱC ∂u/∂t = ∇·[κ(u)∇u]

(e.g. u = g on the boundary and u = I at t = 0)

Stable Backward Euler FDM in time:

  (u^n − u^{n−1})/Δt = ∇·[α(u^n)∇u^n]

with α = κ/(ϱC)

Next step: Galerkin formulation, where u^n = Σ_j u_j^n φ_j is the unknown and u^{n−1} is just a known function

A 2D/3D transient nonlinear PDE (2)

FEM gives nonlinear algebraic equations:

  F_i(u_0^n, ..., u_N^n) = 0,  i = 0, ..., N

where

  F_i ≡ ∫_Ω [(u^n − u^{n−1})v + Δt α(u^n)∇u^n·∇v] dΩ,  v = φ_i

Newton’s method: need the Jacobian,

  J_{i,j} = ∂F_i/∂u_j^n

A 2D/3D transient nonlinear PDE (3)

Picard iteration:

Use the “old” u^{n,q} in the α(u^n) term, solve a linear problem for u^{n,q+1}, q = 0, 1, ...

  A_{i,j} = ∫_Ω (φ_i φ_j + Δt α(u^{n,q})∇φ_i·∇φ_j) dΩ

  b_i = ∫_Ω u^{n−1} φ_i dΩ

Newton’s method:

  J_{i,j} = ∫_Ω (φ_i φ_j + Δt(α′(u^{n,q})φ_j ∇u^{n,q}·∇φ_i + α(u^{n,q})∇φ_i·∇φ_j)) dΩ

Iteration methods at the PDE level

Consider −u″ = f(u)
Could introduce Picard iteration at the PDE level:

  −d²u^{q+1}/dx² = f(u^q),  q = 0, 1, ...

⇒ linear problem for u^{q+1}
A PDE-level Newton method can also be formulated (see the HPL book for details)
We get identical results for our model problem
Time-dependent problems: first use finite differences in time, then use an iteration method (Picard or Newton) at the time-discrete PDE level

Continuation methods

Challenging nonlinear PDE:

  ∇·(||∇u||^q ∇u) = 0

For q = 0 this problem is simple
Idea: solve a sequence of problems, starting with q = 0, and increase q towards a target value
Sequence of PDEs:

  ∇·(||∇u_r||^{q_r} ∇u_r) = 0,  r = 0, 1, 2, ...

with 0 = q_0 < q_1 < q_2 < ··· < q_m = q
The start guess for u_r is u_{r−1} (the solution of a “simpler” problem)
CFD: the Reynolds number is often the continuation parameter q
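The continuation strategy is just an outer loop that feeds each solution in as the start guess for the next, harder problem. A sketch where solve(q, u_start) stands for any nonlinear solver (Picard or Newton) for the PDE with exponent q; both names are hypothetical placeholders.

```python
def continuation(solve, q_values, u0):
    """Solve a sequence of problems with increasing parameter q.

    solve(q, u_start) runs a Picard/Newton iteration for the PDE with
    exponent q, started from u_start. The solution for q_r serves as
    the start guess for q_{r+1}.
    """
    u = u0
    for q in q_values:      # e.g. q_values = [0.0, 0.5, 1.0, ..., q_target]
        u = solve(q, u)     # u_r computed with start guess u_{r-1}
    return u
```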

Exercise 1

Derive the nonlinear algebraic equations for the problem

  d/dx(α(u) du/dx) = 0 on (0, 1),  u(0) = 0, u(1) = 1,

using a finite difference method and the Galerkin method with P1 finite elements and the Trapezoidal rule for approximating integrals.

Exercise 2

For the problem in Exercise 1, use the group finite element method with P1 elements and the Trapezoidal rule for integrals and show that the resulting equations coincide with those obtained in Exercise 1.


Exercise 3

For the problem in Exercises 1 and 2, identify the F vector in the system F = 0 of nonlinear equations. Derive the Jacobian J for a general α(u). Then write the expressions for the Jacobian when α(u) = α_0 + α_1 u + α_2 u².

Exercise 4

Explain why discretization of nonlinear differential equations by finite difference and finite element methods normally leads to a Jacobian with the same sparsity pattern as one would encounter in an associated linear problem. Hint: which unknowns will enter equation number i?

Exercise 5

Show that if F(u) = 0 is a linear system of equations, F = Au − b, for a constant matrix A and vector b, then Newton’s method (with ω = 1) finds the correct solution in the first iteration.

Exercise 6

The operator ∇·(α∇u), with α = ||∇u||^q, q ∈ ℝ, and ||·|| being the Euclidean norm, appears in several physical problems, especially flow of non-Newtonian fluids. The quantity ∂α/∂u_j is central when formulating a Newton method, where u_j is the coefficient in the finite element approximation u = Σ_j u_j φ_j. Show that

  ∂/∂u_j ||∇u||^q = q||∇u||^{q−2} ∇u·∇φ_j.

Exercise 7

Consider the PDE

  ∂u/∂t = ∇·(α(u)∇u)

discretized by a Forward Euler difference in time. Explain why this nonlinear PDE gives rise to a linear problem (and hence no need for Newton or Picard iteration) at each time level.

Exercise 8

Repeat Exercise 7 for the PDE

  ϱ(u) ∂u/∂t = ∇·(α(u)∇u)

Discretize the PDE by a Backward Euler difference in time and realize that there is a need for solving nonlinear algebraic equations. Formulate a Picard iteration method for the spatial PDE to be solved at each time level. Formulate a Galerkin method for discretizing the spatial problem at each time level. Choose some appropriate boundary conditions. Explain how to incorporate an initial condition.

Exercise 9

For the problem in Exercise 8, assume that a nonlinear Newton cooling law applies at the whole boundary:

  −α(u) ∂u/∂n = H(u)(u − u_S),

where H(u) is a nonlinear heat transfer coefficient and u_S is the temperature of the surroundings (and u is the temperature). Use a Backward Euler scheme in time and a Galerkin method in space. Identify the nonlinear algebraic equations to be solved at each time level. Derive the corresponding Jacobian.

The PDE problem in this exercise is highly relevant when the temperature variations are large. Then the density times the heat capacity (ϱ), the heat conduction coefficient (α), and the heat transfer coefficient (H) normally vary with the temperature (u).

Exercise 10

In Exercise 8, restrict the problem to one space dimension, choose simple boundary conditions like u = 0, use the group finite element method for all nonlinear coefficients, apply P1 elements, use the Trapezoidal rule for all integrals, and derive the system of nonlinear algebraic equations that must be solved at each time level. Set up some finite difference method and compare the form of the nonlinear algebraic equations.


Exercise 11

In Exercise 8, use the Picard iteration method with one iteration at each time level, and introduce this method at the PDE level. Realize the similarities between the resulting discretization and that of the corresponding linear diffusion problem.

Shallow water waves

Tsunamis

Waves in fjords, lakes, or oceans, generated by
  slides
  earthquakes
  subsea volcanoes
  asteroids
  human activity; nuclear detonations, or slides generated by oil drilling, may generate tsunamis

Propagation over large distances
Hardly recognizable in the open ocean, but the wave amplitude increases near shore
Run-up at the coasts may result in severe damage
Giant events: Dec 26 2004 (≈ 300000 killed), 1883 (similar to 2004), 65 My ago (extinction of the dinosaurs)

Norwegian tsunamis

[Map of Norway and Sweden with incident sites near Tromsø, Bodø, Trondheim, Bergen, Oslo, and Stockholm. Circles: major incidents, > 10 killed; triangles: selected smaller incidents; square: Storegga (5000 B.C.)]

Tsunamis in the Pacific

Scenario: an earthquake outside Chile generates a tsunami, propagating at 800 km/h across the Pacific, with run-up on densely populated coasts in Japan.

Selected events; slides

  location              year        run-up   dead
  Loen                  1905        40m      61
  Tafjord               1934        62m      41
  Loen                  1936        74m      73
  Storegga              5000 B.C.   10m(?)   ??
  Vaiont, Italy         1963        270m     2600
  Lituya Bay, Alaska    1958        520m     2
  Shimabara, Japan      1792        10m(?)   15000

Selected events; earthquakes etc.

  location    year        strength   run-up   dead
  Thera       1640 B.C.   volcano    ?        ?
  Thera       1650        volcano    ?        ?
  Lisboa      1755        M=9        15(?)m   ?000
  Portugal    1969        M=7.9      1m
  Amorgos     1956        M=7.4      5(?)m    1
  Krakatao    1883        volcano    40 m     36 000
  Flores      1992        M=7.5      25 m     1 000
  Nicaragua   1992        M=7.2      10 m     168
  Sumatra     2004        M=9        50 m     300 000

The selection is biased wrt. European events; 150 catastrophic tsunami events have been recorded along the Japanese coast in modern times.

Tsunamis: no. 5 killer among natural hazards

Why simulation?

Increase the understanding of tsunamis
Assist warning systems
Assist the building of harbor protection (breakwaters)
Recognize critical coastal areas (e.g. move population)
Hindcast historical tsunamis (assist geologists/biologists)
Problem sketch

[Figure: coordinate system (x, y, z) with surface elevation η(x, y, t) above the still water level and depth H(x, y, t) down to the bottom]

Assume wavelength ≫ depth (long waves)
Assume small amplitudes relative to the depth
An appropriate approximation for many ocean wave phenomena
Reference: HPL chapter 6.2

Mathematical model

PDEs:

  ∂η/∂t = −∂(uH)/∂x − ∂(vH)/∂y − ∂H/∂t
  ∂u/∂t = −∂η/∂x,  x ∈ Ω, t > 0
  ∂v/∂t = −∂η/∂y,  x ∈ Ω, t > 0

η(x, y, t): surface elevation
u(x, y, t) and v(x, y, t): horizontal (depth averaged) velocities
H(x, y, t): stillwater depth (given)
Boundary conditions: either η, u or v given at each point
Initial conditions: all of η, u and v given

Primary unknowns

Discretization: finite differences
Staggered mesh in time and space
⇒ η, u, and v unknown at different points:

  η^ℓ_{i+1/2, j+1/2},  u^{ℓ+1/2}_{i, j+1/2},  v^{ℓ+1/2}_{i+1/2, j}

[Figure: one cell of the staggered mesh, with η at the cell center (i+1/2, j+1/2), u at the vertical cell faces (i, j+1/2) and (i+1, j+1/2), and v at the horizontal cell faces (i+1/2, j) and (i+1/2, j+1)]

A global staggered mesh

Widely used mesh in computational fluid dynamics (CFD)
Important for Navier-Stokes solvers
Basic idea: centered differences in time and space

Discrete equations; η

  ∂η/∂t = −∂(uH)/∂x − ∂(vH)/∂y  at (i + 1/2, j + 1/2, ℓ − 1/2)

  [D_t η = −D_x(uH) − D_y(vH)]^{ℓ−1/2}_{i+1/2, j+1/2}

  (1/Δt)[η^ℓ_{i+1/2, j+1/2} − η^{ℓ−1}_{i+1/2, j+1/2}]
    = −(1/Δx)[(Hu)^{ℓ−1/2}_{i+1, j+1/2} − (Hu)^{ℓ−1/2}_{i, j+1/2}]
      − (1/Δy)[(Hv)^{ℓ−1/2}_{i+1/2, j+1} − (Hv)^{ℓ−1/2}_{i+1/2, j}]

Discrete equations; u

  ∂u/∂t = −∂η/∂x  at (i, j + 1/2, ℓ)

  [D_t u = −D_x η]^ℓ_{i, j+1/2}

  (1/Δt)[u^{ℓ+1/2}_{i, j+1/2} − u^{ℓ−1/2}_{i, j+1/2}] = −(1/Δx)[η^ℓ_{i+1/2, j+1/2} − η^ℓ_{i−1/2, j+1/2}]

Discrete equations; v

  ∂v/∂t = −∂η/∂y  at (i + 1/2, j, ℓ)

  [D_t v = −D_y η]^ℓ_{i+1/2, j}

  (1/Δt)[v^{ℓ+1/2}_{i+1/2, j} − v^{ℓ−1/2}_{i+1/2, j}] = −(1/Δy)[η^ℓ_{i+1/2, j+1/2} − η^ℓ_{i+1/2, j−1/2}]

Complicated coastline boundary

Saw-tooth approximation to the real boundary
Successful method, widely used
Warning: can lead to nonphysical waves
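The three discrete equations map directly onto array updates. Below is a sketch of one time step on the staggered grid with NumPy slicing; the array shapes, the arithmetic mean for the depth at the cell faces, a time-independent H, and closed (zero-flux) boundaries are assumptions made for brevity.

```python
import numpy as np

def advance(eta, u, v, H, dt, dx, dy):
    """One step of the linear shallow water equations on a staggered grid.

    eta: (nx, ny) at cell centers; u: (nx+1, ny) at vertical cell faces;
    v: (nx, ny+1) at horizontal cell faces; H: depth at cell centers.
    eta is updated first (from the old u, v), then u and v from the new
    eta, which realizes the staggering in time.
    """
    # Depth at the interior faces (arithmetic mean of neighboring centers);
    # zero at the outer faces, i.e. closed boundaries
    Hu = np.zeros_like(u); Hv = np.zeros_like(v)
    Hu[1:-1, :] = 0.5*(H[1:, :] + H[:-1, :])
    Hv[:, 1:-1] = 0.5*(H[:, 1:] + H[:, :-1])
    # Continuity: D_t eta = -D_x(uH) - D_y(vH)
    eta -= dt*((Hu[1:, :]*u[1:, :] - Hu[:-1, :]*u[:-1, :])/dx +
               (Hv[:, 1:]*v[:, 1:] - Hv[:, :-1]*v[:, :-1])/dy)
    # Momentum: D_t u = -D_x eta, D_t v = -D_y eta (interior faces only)
    u[1:-1, :] -= dt*(eta[1:, :] - eta[:-1, :])/dx
    v[:, 1:-1] -= dt*(eta[:, 1:] - eta[:, :-1])/dy
    return eta, u, v
```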


Relation to the wave equation

Eliminate u and v (easy in the PDEs) and get

  ∂²η/∂t² = ∇·[H(x, y)∇η]

Eliminate the discrete u and v
⇒ standard 5-point explicit finite difference scheme for the discrete η (quite some algebra needed, try 1D first)
If H is constant, an exact numerical solution is possible for one-dimensional wave propagation

Stability and accuracy

Centered differences in time and space
Truncation error, dispersion analysis: O(Δx², Δy², Δt²)
Stability as for the standard wave equation in 2D:

  Δt ≤ (1/√H) [1/Δx² + 1/Δy²]^{−1/2}

(CFL condition)

Verification of an implementation

How can we verify that the program works?
Compare with an analytical solution (if possible)
Check that basic physical mechanisms are reproduced in a qualitatively correct way by the program

Tsunami due to a slide

Surface elevation ahead of the slide, a dump behind
Initially, the negative dump propagates backwards
The surface waves propagate faster than the slide moves

Tsunami due to faulting

The sea surface deformation reflects the bottom deformation
Velocity of surface waves (H ∼ 5 km): 790 km/h
Velocity of seismic waves in the bottom: 6000–25000 km/h

Tsunami approaching the shore

The velocity of a tsunami is √(gH(x, y, t))
The back part of the wave moves at higher speed ⇒ the wave becomes more peak-formed
Deep water (H ∼ 3 km): wave length 40 km, height 1 m
Shallow water (H ∼ 10 m): wave length 2 km, height 4 m

Tsunamis experienced from shore

As a fast tide, with strong currents in fjords
A wall of water approaching the beach
Wave breaking: the top has larger effective depth and moves faster than the front part (requires a nonlinear PDE)

Convection-dominated flow


Typical transport PDE

Transport of a scalar u (heat, pollution, ...) in fluid flow v:

  ∂u/∂t + v·∇u = α∇²u + f

Convection (change of u due to the flow): v·∇u
Diffusion (change of u due to molecular collisions): α∇²u
Common case: convection ≫ diffusion → numerical difficulties
Important dimensionless number: the Peclet number Pe

  Pe = |v·∇u|/|α∇²u| ∼ (V U/L)/(αU/L²) = VL/α

V: characteristic velocity v, L: characteristic length scale, α: diffusion constant, U: characteristic size of u

The transport PDE for fluid flow

The fluid flow itself is governed by the Navier-Stokes equations:

  ∂v/∂t + v·∇v = −(1/ϱ)∇p + ν∇²v + f
  ∇·v = 0

Important dimensionless number: the Reynolds number Re

  Re = convection/diffusion = |v·∇v|/|ν∇²v| ∼ (V²/L)/(νV/L²) = VL/ν

Re ≫ 1 and Pe ≫ 1: numerical difficulties

A 1D stationary transport problem

Assumptions: no time, 1D, no source term

  v·∇u = α∇²u  →  vu′ = αu″  →  u′ = εu″,  ε = α/v

Complete model problem:

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

ε small: boundary layer at x = 1
Standard numerics (i.e. centered differences) will fail!
Cure: upwind differences

Notation for difference equations (1)

Define

  [D_x u]^n_{i,j,k} ≡ (u^n_{i+1/2, j, k} − u^n_{i−1/2, j, k})/h

with similar definitions of D_y, D_z, and D_t
Another difference:

  [D_{2x} u]^n_{i,j,k} ≡ (u^n_{i+1, j, k} − u^n_{i−1, j, k})/(2h)

Compound difference:

  [D_x D_x u]^n_i = (1/h²)(u^n_{i−1} − 2u^n_i + u^n_{i+1})

Notation for difference equations (2)

One-sided forward difference:

  [D_x^+ u]^n_i ≡ (u^n_{i+1} − u^n_i)/h

and the backward difference:

  [D_x^− u]^n_i ≡ (u^n_i − u^n_{i−1})/h

Put the whole equation inside brackets:

  [D_x D_x u = −f]_i

is a finite difference scheme for u″ = −f

Centered differences

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

  (u_{i+1} − u_{i−1})/(2h) = ε(u_{i−1} − 2u_i + u_{i+1})/h²,  i = 2, ..., n − 1
  u_1 = 0,  u_n = 1

or

  [D_{2x} u = εD_x D_x u]_i

Analytical solution:

  u(x) = (1 − e^{x/ε})/(1 − e^{1/ε})

⇒ u′(x) > 0, i.e., a monotone function

Numerical experiments (1)–(4)

[Figures: centered scheme vs. exact solution for (n, ε) = (20, 0.1), (20, 0.01), (80, 0.01), and (20, 0.001); the centered solution oscillates when h > 2ε]

Numerical experiments; summary

The solution is not monotone if h > 2ε
The convergence rate is h² (expected since both differences are of 2nd order), provided h ≤ 2ε
Completely wrong qualitative behavior for h ≫ 2ε

Analysis

Can find an analytical solution of the discrete problem (!)
Method: insert u_i ∼ β^i and solve for β

  β_1 = 1,  β_2 = (1 + h/(2ε))/(1 − h/(2ε))

cf. HPL app. A.4.4
Complete solution:

  u_i = C_1 β_1^i + C_2 β_2^i

Determine C_1 and C_2 from the boundary conditions:

  u_i = (β_2^i − β_2)/(β_2^n − β_2)

Important result

Observe: u_i oscillates if β_2 < 0

  (1 + h/(2ε))/(1 − h/(2ε)) < 0  ⇒  h > 2ε

Must require h ≤ 2ε for u_i to have the same qualitative property as u(x)
This explains why we observed oscillations in the numerical solution

Upwind differences

Problem:

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

Use a backward difference, called an upwind difference, for the u′ term:

  (u_i − u_{i−1})/h = ε(u_{i−1} − 2u_i + u_{i+1})/h²,  i = 2, ..., n − 1
  u_1 = 0,  u_n = 1

The scheme can be written as

  [D_x^− u = εD_x D_x u]_i
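Both schemes are easy to compare numerically. A sketch that assembles and solves the linear system for either discretization (dense storage for clarity); running it with n = 20 and ε = 0.01, where h > 2ε, reproduces the oscillatory centered solution and the monotone upwind one.

```python
import numpy as np

def solve_1d(eps, n, upwind=False):
    """Solve u' = eps*u'' on (0,1), u(0)=0, u(1)=1, on n grid points,
    with a centered or an upwind difference for u'.
    """
    h = 1.0/(n - 1)
    A = np.zeros((n, n)); b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0; b[-1] = 1.0   # boundary conditions
    for i in range(1, n - 1):
        # eps*(u_{i-1} - 2u_i + u_{i+1})/h^2 minus the u' approximation
        A[i, i-1] += eps/h**2; A[i, i] += -2*eps/h**2; A[i, i+1] += eps/h**2
        if upwind:   # -(u_i - u_{i-1})/h
            A[i, i] += -1.0/h; A[i, i-1] += 1.0/h
        else:        # -(u_{i+1} - u_{i-1})/(2h)
            A[i, i+1] += -0.5/h; A[i, i-1] += 0.5/h
    return np.linalg.solve(A, b)
```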

Numerical experiments (1)–(2)

[Figures: upwind scheme vs. exact solution for (n, ε) = (20, 0.1) and (20, 0.01); the upwind solution is monotone but smears the boundary layer]


Numerical experiments; summary Analysis

The solution is always monotone, i.e., always qualitatively correct Analytical solution of the discrete equations:
The boundary layer is too thick
ui = β i ⇒ β1 = 1, β2 = 1 + h/ǫ
The convergence rate is h
(in agreement with truncation error analysis)
ui = C1 + C2 β2i
Using boundary conditions:

β2i − β2
ui =
β2n − β2

Since β2 > 0 (actually β2 > 1), β2i does not oscillate

Convection-dominated flow – p.73/148 Convection-dominated flow – p.74/148

Centered vs. upwind scheme

Truncation error: centered is more accurate than upwind
Exact analysis: centered is more accurate than upwind when centered is stable (i.e. monotone u_i), but otherwise useless
ε = 10⁻⁶ ⇒ 500 000 grid points to make h ≤ 2ε
Upwind gives the best reliability, at the cost of a too thick boundary layer

An interpretation of the upwind scheme

The upwind scheme

  (u_i − u_{i−1})/h = ε(u_{i−1} − 2u_i + u_{i+1})/h²

or

  [D_x^− u = εD_x D_x u]_i

can be rewritten as

  (u_{i+1} − u_{i−1})/(2h) = (ε + h/2)(u_{i−1} − 2u_i + u_{i+1})/h²

or

  [D_{2x} u = (ε + h/2)D_x D_x u]_i

Upwind = centered + artificial diffusion (h/2)

Finite elements for the model problem

Galerkin formulation of

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

and linear (P1) elements leads to a centered scheme (show it!)

  (u_{i+1} − u_{i−1})/(2h) = ε(u_{i−1} − 2u_i + u_{i+1})/h²,  i = 2, ..., n − 1
  u_1 = 0,  u_n = 1

or

  [D_{2x} u = εD_x D_x u]_i

Stability problems when h > 2ε

Finite elements and upwind differences

How to construct upwind differences in a finite element context?
One possibility: add artificial diffusion (h/2):

  u′(x) = (ε + h/2)u″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

Can be solved by a Galerkin method
Another, equivalent strategy: use perturbed weighting functions

Perturbed weighting functions in 1D

Take

  w_i(x) = φ_i(x) + τφ_i′(x)

or alternatively written

  w(x) = v(x) + τv′(x)

where v is the standard test function in a Galerkin method
Use this w_i or w as test function for the convective term u′:

  ∫_0^1 u′w dx = ∫_0^1 u′v dx + τ∫_0^1 u′v′ dx

The new term τu′v′ is the weak formulation of an artificial diffusion term τu″v
With τ = h/2 we then get the upwind scheme

Optimal artificial diffusion

Try a weighted sum of a centered and an upwind discretization:

  [u′]_i ≈ [θD_x^− u + (1 − θ)D_{2x} u]_i,  0 ≤ θ ≤ 1

  [θD_x^− u + (1 − θ)D_{2x} u = εD_x D_x u]_i

Is there an optimal θ?
Yes, for

  θ(h/ε) = coth(h/(2ε)) − 2ε/h

we get exact u_i (i.e. u exact at the nodal points)
Equivalent artificial diffusion: τ_o = 0.5hθ(h/ε)
Exact finite element method: w(x) = v(x) + τ_o v′(x) for the convective term u′
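The optimal weight is a one-liner in code. A small sketch; the values of h and ε below are example inputs only.

```python
import numpy as np

def theta_opt(h, eps):
    """Optimal upwind weight theta(h/eps) = coth(h/(2*eps)) - 2*eps/h.

    tau = 0.5*h*theta is the equivalent artificial diffusion that
    reproduces the exact solution at the nodes for the model problem.
    """
    return 1.0/np.tanh(h/(2*eps)) - 2*eps/h

h, eps = 0.05, 0.01
tau = 0.5*h*theta_opt(h, eps)   # equivalent artificial diffusion
```

Note the limits: for h/ε → 0, θ → 0 (pure centered differences); for h/ε → ∞, θ → 1 (full upwinding).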


Multi-dimensional problems

Model problem:

  v·∇u = α∇²u

or written out:

  v_x ∂u/∂x + v_y ∂u/∂y = α∇²u

Non-physical oscillations occur with centered differences or Galerkin methods when the left-hand side terms are large
Remedy: upwind differences
Downside: too much diffusion
Important result: extra stabilizing diffusion is needed only in the streamline direction, i.e., in the direction of v = (v_x, v_y)

Streamline diffusion

Idea: add diffusion in the streamline direction
Isotropic physical diffusion, expressed through a diffusion tensor:

  Σ_{i=1}^d Σ_{j=1}^d ∂/∂x_i (αδ_{ij} ∂u/∂x_j) = α∇²u

αδ_{ij} is the diffusion tensor (here: the same in all directions)
Streamline diffusion makes use of an anisotropic diffusion tensor α_{ij}:

  Σ_{i=1}^d Σ_{j=1}^d ∂/∂x_i (α_{ij} ∂u/∂x_j),  α_{ij} = τ v_i v_j / ||v||²

Implementation: artificial diffusion term or perturbed weighting function

Perturbed weighting functions (1)

Consider the weighting function

  w = v + τ* v·∇v

for the convective (left-hand side) term: ∫ w v·∇u dΩ
This expands to

  ∫ v v·∇u dΩ + τ* ∫ v·∇u v·∇v dΩ

The latter term can be viewed as the Galerkin formulation of (write v·∇u = Σ_i v_i ∂u/∂x_i etc.)

  Σ_{i=1}^d Σ_{j=1}^d ∂/∂x_i (τ* v_i v_j ∂u/∂x_j)

Perturbed weighting functions (2)

⇒ Streamline diffusion can be obtained by perturbing the weighting function
Common name: SUPG (streamline-upwind/Petrov-Galerkin)

Consistent SUPG

Why not just add artificial diffusion? Why bother with perturbed weighting functions?
In standard FEM (method of weighted residuals),

  ∫_Ω L(u)w dΩ = 0

the exact solution is a solution of the FEM equations (it fulfills L(u) = 0)
This no longer holds if we
  add an artificial diffusion term (∼ h/2)
  use different weighting functions on different terms
Idea: use consistent SUPG
  no artificial diffusion term
  the same (perturbed) weighting function applies to all terms

A step back to 1D

Let us try to use

  w(x) = v(x) + τv′(x)

on both terms in u′ = εu″:

  ∫_0^1 (u′v + (ε + τ)u′v′) dx + τ∫_0^1 v″u′ dx = 0

Problem: the last term contains v″
Remedy: drop it (!)
Justification: v″ = 0 on each linear (P1) element
Drop 2nd-order derivatives of v in 2D/3D too
Consistent SUPG is not so consistent...

Choosing τ*

Choosing τ* is a research topic
Many suggestions
Two classes:
  τ* ∼ h
  τ* ∼ Δt (time-dependent problems)
Little theory

A test problem (1)

[Figure: unit square test domain; flow v at angle θ crosses the domain, with discontinuous inflow data (u = 1 below the line y = x tan θ + 0.25, u = 0 elsewhere) and ∂u/∂n = 0 or u = 0 on the outflow boundaries]


A test problem (2) Galerkin’s method

Methods:
1. Classical SUPG:
2
Brooks and Hughes: "A streamline upwind/Petrov-Galerkin finite
element formulation for advection domainated flows with particular
emphasis on the incompressible Navier-Stokes equations", Comp.
Methods Appl. Mech. Engrg., 199-259, 1982. 1

2. An additional discontinuity-capturing term

v · ∇u
w = v + τ ∗ v · ∇v + τ̂ ∇u 0
||∇u||2
−0.65
was proposed in 0

Hughes, Mallet and Mizukami: "A new finite element formulation


for computational fluid dynamics: II. Beyond SUPG", Comp. 1
1
Methods Appl. Mech. Engrg., 341-355, 1986. Z 0

X
Convection-dominated flow – p.89/148 Convection-dominated flow – p.90/148

SUPG

[Figure: surface plot of the SUPG solution of the test problem]

Time-dependent problems

Model problem:

  ∂u/∂t + v·∇u = ε∇²u

Can add an artificial streamline diffusion term
Can use the perturbed weighting function

  w = v + τ* v·∇v

on all terms
How to choose τ*?

Taylor-Galerkin methods (1)

Idea: Lax-Wendroff + Galerkin
Model equation:

  ∂u/∂t + U ∂u/∂x = 0

Lax-Wendroff: 2nd-order Taylor series in time,

  u^{n+1} = u^n + Δt (∂u/∂t)^n + (1/2)Δt² (∂²u/∂t²)^n

Replace temporal by spatial derivatives,

  ∂/∂t = −U ∂/∂x

Result:

  u^{n+1} = u^n − UΔt (∂u/∂x)^n + (1/2)U²Δt² (∂²u/∂x²)^n

Taylor-Galerkin methods (2)

We can write the scheme on the form

  [D_t^+ u + U ∂u/∂x = (1/2)U²Δt ∂²u/∂x²]^n

⇒ a forward scheme with artificial diffusion
Lax-Wendroff: centered spatial differences,

  [D_t^+ u + U D_{2x} u = (1/2)U²Δt D_x D_x u]^n_i

Alternative: Galerkin’s method in space,

  [D_t^+ u + U D_{2x} u = (1/2)U²Δt D_x D_x u]^n_i

provided that we lump the mass matrix
This is the Taylor-Galerkin method

Taylor-Galerkin methods (3)

In multi-dimensional problems,

  ∂u/∂t + v·∇u = 0

we have

  ∂/∂t = −v·∇

and (∇·v = 0)

  ∂²/∂t² = ∇·(vv·∇) = Σ_{r=1}^d Σ_{s=1}^d ∂/∂x_r (v_r v_s ∂/∂x_s)

This is streamline diffusion with τ* = Δt/2:

  [D_t^+ u + v·∇u = (1/2)Δt ∇·(vv·∇u)]^n

Taylor-Galerkin methods (4)

Can use the Galerkin method in space (gives centered differences)
The result is close to that of SUPG, but τ* is different
⇒ The Taylor-Galerkin method points to τ* = Δt/2 for SUPG in time-dependent problems
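For the 1D model equation the Lax-Wendroff/Taylor-Galerkin update is a three-point formula. A sketch with periodic boundaries; the Courant number C = UΔt/Δx is assumed to satisfy C ≤ 1 for stability.

```python
import numpy as np

def lax_wendroff_step(u, C):
    """One Lax-Wendroff step for u_t + U u_x = 0 with Courant number
    C = U*dt/dx and periodic boundaries. With a lumped mass matrix the
    Taylor-Galerkin method in space reduces to the same update.
    """
    up = np.roll(u, -1)   # u_{i+1}
    um = np.roll(u, 1)    # u_{i-1}
    # u^{n+1} = u^n - (C/2)(u_{i+1}-u_{i-1}) + (C^2/2)(u_{i-1}-2u_i+u_{i+1})
    return u - 0.5*C*(up - um) + 0.5*C**2*(um - 2*u + up)
```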
The importance of linear system solvers

PDE problems often (usually) result in linear systems of algebraic equations

  Ax = b

Special methods utilizing that A is sparse are much faster than Gaussian elimination!
Most of the CPU time in a PDE solver is often spent on solving Ax = b
⇒ Important to use fast methods

Solving linear systems

Example: Poisson eq. on the unit cube (1)

−∇²u = f on an n = q × q × q grid
FDM/FEM result in an Ax = b system
FDM: 7 entries per row in A are nonzero
FEM: 7 (tetrahedra), 27 (trilinear elms.), or 125 (triquadratic elms.) entries per row in A are nonzero
A is sparse (mostly zeroes)
Fraction of nonzeroes: Rq⁻³ (R is the number of nonzero entries per row)
Important to work with the nonzeroes only!

Example: Poisson eq. on the unit cube (2)

Compare banded Gaussian elimination (BGE) versus Conjugate Gradients (CG)
Work in BGE: O(q⁷) = O(n^{2.33})
Work in CG: O(q³) = O(n) (multigrid; optimal); for the numbers below we use incomplete factorization preconditioning: O(n^{1.17})
n = 27000:
  CG 72 times faster than BGE
  BGE needs 20 times more memory than CG
n = 8 million:
  CG 107 times faster than BGE
  BGE needs 4871 times more memory than CG

Classical iterative methods Convergence

Ax = b, A ∈ IRn,n , x, b ∈ IRn . M xk = N xk−1 + b, k = 1, 2, . . .

Split A: A = M − N The iteration converges if G = M −1 N has its largest eigenvalue,


̺(G), less than 1
Write Ax = b as
M x = N x + b, Rate of convergence: R∞ (G) = − ln ̺(G)

and introduce an iteration To reduce the initial error by a factor ǫ,

k
Mx = Nx k−1
+ b, k = 1, 2, . . . ||x − xk || ≤ ǫ||x − x0 ||

Systems M y = z should be easy/cheap to solve one needs


− ln ǫ/R∞ (G)
Different choices of M correspond to different classical iteration
methods: iterations
Jacobi iteration
Gauss-Seidel iteration
Successive Over Relaxation (SOR)
Symmetric Successive Over Relaxation (SSOR)
Solving linear systems – p.101/148 Solving linear systems – p.102/148

Some classical iterative methods

Split: A = L + D + U
L and U are the lower and upper triangular parts, D is A’s diagonal
Jacobi iteration: M = D (N = −L − U)
Gauss-Seidel iteration: M = L + D (N = −U)
SOR iteration: Gauss-Seidel + relaxation
SSOR: two (forward and backward) SOR steps
Rate of convergence R_∞(G) for −∇²u = f in 2D with u = 0 as BC:
  Jacobi: π²h²/2
  Gauss-Seidel: π²h²
  SOR: 2πh
  SSOR: > πh
SOR/SSOR is superior (h vs. h², and h → 0 is small)

Jacobi iteration

M = D
Put everything, except the diagonal, on the right-hand side
2D Poisson equation −∇²u = f:

  u_{i,j−1} + u_{i−1,j} + u_{i+1,j} + u_{i,j+1} − 4u_{i,j} = −h²f_{i,j}

Solve for the diagonal element and use old values on the right-hand side:

  u^k_{i,j} = (1/4)(u^{k−1}_{i,j−1} + u^{k−1}_{i−1,j} + u^{k−1}_{i+1,j} + u^{k−1}_{i,j+1} + h²f_{i,j})

for k = 1, 2, ...
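The Jacobi sweep vectorizes naturally. A sketch for the 2D Poisson problem with homogeneous Dirichlet conditions; the tolerance and iteration limit are illustrative assumptions.

```python
import numpy as np

def jacobi(f, h, tol=1e-6, max_iter=10000):
    """Jacobi iteration for -laplace(u) = f on the unit square,
    u = 0 on the boundary; f is an (n, n) array of nodal values.
    """
    u = np.zeros_like(f)
    for k in range(max_iter):
        u_new = u.copy()
        u_new[1:-1, 1:-1] = 0.25*(u[1:-1, :-2] + u[:-2, 1:-1] +
                                  u[2:, 1:-1] + u[1:-1, 2:] +
                                  h**2*f[1:-1, 1:-1])
        if np.abs(u_new - u).max() < tol:
            return u_new, k + 1
        u = u_new
    return u, max_iter
```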


Relaxed Jacobi iteration

Idea: compute a new x approximation x* from

  Dx* = (−L − U)x^{k−1} + b

Set

  x^k = ωx* + (1 − ω)x^{k−1}

a weighted mean of x^{k−1} and x* if ω ∈ (0, 1)

Relation to explicit time stepping

Relaxed Jacobi iteration for −∇²u = f is equivalent to solving

  α ∂u/∂t = ∇²u + f

by an explicit forward scheme until ∂u/∂t ≈ 0, provided ω = 4Δt/(αh²)
Stability for the forward scheme implies ω ≤ 1
In this example: ω = 1 is best (⇔ largest Δt)
The forward scheme for t → ∞ is a slow scheme, hence Jacobi iteration is slow

Gauss-Seidel/SOR iteration

M = L + D
For our 2D Poisson eq. scheme:

  u^k_{i,j} = (1/4)(u^k_{i,j−1} + u^k_{i−1,j} + u^{k−1}_{i+1,j} + u^{k−1}_{i,j+1} + h²f_{i,j})

i.e. solve for the diagonal term and use the most recently computed values on the right-hand side
SOR is relaxed Gauss-Seidel iteration:
  compute x* from a Gauss-Seidel iteration
  set x^k = ωx* + (1 − ω)x^{k−1}
ω ∈ (0, 2), with ω = 2 − O(h) as the optimal choice
Very easy to implement!

Symmetric/double SOR: SSOR

SSOR = Symmetric SOR
One (forward) SOR sweep for unknowns 1, 2, 3, ..., n
One (backward) SOR sweep for unknowns n, n − 1, n − 2, ..., 1
M can be shown to be

  M = (1/(2 − ω)) ((1/ω)D + L) ((1/ω)D)⁻¹ ((1/ω)D + U)

Notice that each factor in M is diagonal or lower/upper triangular
(⇒ very easy to solve systems My = z)
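A Gauss-Seidel/SOR sweep differs from Jacobi only in updating in place. A sketch with plain Python loops, since the in-place update prevents direct vectorization; ω = 1 recovers Gauss-Seidel.

```python
def sor_sweep(u, f, h, omega):
    """One lexicographic SOR sweep for -laplace(u) = f, u = 0 on the
    boundary. Updated values are used immediately (Gauss-Seidel) and
    relaxed with omega.
    """
    n, m = u.shape
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            u_gs = 0.25*(u[i, j-1] + u[i-1, j] +
                         u[i+1, j] + u[i, j+1] + h**2*f[i, j])
            u[i, j] = omega*u_gs + (1 - omega)*u[i, j]
    return u
```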

Status: classical iterative methods

Jacobi, Gauss-Seidel/SOR, and SSOR are too slow for practical PDE computations
The simplest possible solution method for −∇²u = f and other stationary PDEs in 2D/3D is to use SOR (!)
Classical iterative methods converge quickly in the beginning but slow down after a few iterations
Classical iterative methods are important ingredients in multigrid methods

Conjugate Gradient-like methods

  Ax = b,  A ∈ ℝ^{n,n},  x, b ∈ ℝ^n

Use a Galerkin or least-squares method to solve the linear system
Idea: write

  x^k = x^{k−1} + Σ_{j=1}^k α_j q_j

α_j: unknown coefficients, q_j: known vectors
Compute the residual:

  r^k = b − Ax^k = r^{k−1} − Σ_{j=1}^k α_j Aq_j

and apply the ideas of the Galerkin or least-squares methods

Galerkin

Residual:

  r^k = b − Ax^k = r^{k−1} − Σ_{j=1}^k α_j Aq_j

Galerkin’s method (r ∼ R, q_j ∼ N_j, α_j ∼ u_j):

  (r^k, q_i) = 0,  i = 1, ..., k

(·,·): the Euclidean inner product
Result: a linear system for the α_j,

  Σ_{j=1}^k (Aq_j, q_i)α_j = (r^{k−1}, q_i),  i = 1, ..., k

Least squares

Residual:

  r^k = b − Ax^k = r^{k−1} − Σ_{j=1}^k α_j Aq_j

Least squares: minimize (r^k, r^k), i.e.

  ∂/∂α_i (r^k, r^k) = 0

Result: a linear system for the α_j:

  Σ_{j=1}^k (Aq_i, Aq_j)α_j = (r^{k−1}, Aq_i),  i = 1, ..., k


The nature of the methods Extending the basis

Start with a guess x0 Vk is normally selected as a so-called Krylov subspace:


In iteration k: seek xk in a k-dimensional vector space Vk
Vk = span{r 0 , Ar 0 , . . . , Ak−1 r 0 }
Basis for the space: q 1 , . . . , q k
Use Galerkin or least squares to compute the (optimal) Alternatives for computing q k+1 ∈ Vk+1 :
approximation xk in Vk
k
X
Extend the basis from Vk to Vk+1 (i.e. find q k+1 ) q k+1 = rk + βj q j
j=1
k
X
q k+1 = Aqk + βj q j
j=1

The first dominates in frequently used algorithms – only that


choice is used hereafter
How to choose βj ?

Solving linear systems – p.113/148 Solving linear systems – p.114/148

Orthogonality properties

Bad news: must solve a k × k linear system for the α_j in each iteration (as k → n, the work in each iteration approaches the work of solving Ax = b!)
The coefficient matrix in the α_j system:

  (Aq_i, q_j)  or  (Aq_i, Aq_j)

Idea: make the coefficient matrices diagonal
That is,
  Galerkin: (Aq_i, q_j) = 0 for i ≠ j
  Least squares: (Aq_i, Aq_j) = 0 for i ≠ j
Use the β_j to enforce orthogonality of the q_i

Formula for updating the basis vectors

Define

  ⟨u, v⟩ ≡ (Au, v) = uᵀAv

and

  [u, v] ≡ (Au, Av) = uᵀAᵀAv

Galerkin: require A-orthogonal q_j vectors, which then results in

  β_i = −⟨r^k, q_i⟩ / ⟨q_i, q_i⟩

Least squares: require AᵀA-orthogonal q_j vectors, which then results in

  β_i = −[r^k, q_i] / [q_i, q_i]

Simplifications

Galerkin: ⟨q_i, q_j⟩ = 0 for i ≠ j gives

  α_k = (r^{k−1}, q_k) / ⟨q_k, q_k⟩

and α_i = 0 for i < k (!):

  x^k = x^{k−1} + α_k q_k

That is, hand-derived formulas for the α_j
Least squares:

  α_k = (r^{k−1}, Aq_k) / [q_k, q_k]

and α_i = 0 for i < k

Symmetric A

If A is symmetric (Aᵀ = A) and positive definite (positive eigenvalues ⇔ yᵀAy > 0 for any y ≠ 0), also β_i = 0 for i < k
⇒ need to store q_k only
(q_1, ..., q_{k−1} are not used in iteration k)

Summary: least squares algorithm

  given a start vector x^0,
  compute r^0 = b − Ax^0 and set q^1 = r^0
  for k = 1, 2, ... until the termination criteria are fulfilled:
      α_k = (r^{k−1}, Aq^k) / [q^k, q^k]
      x^k = x^{k−1} + α_k q^k
      r^k = r^{k−1} − α_k Aq^k
      if A is symmetric then
          β_k = [r^k, q^k] / [q^k, q^k]
          q^{k+1} = r^k − β_k q^k
      else
          β_j = [r^k, q^j] / [q^j, q^j],  j = 1, ..., k
          q^{k+1} = r^k − Σ_{j=1}^k β_j q^j

The Galerkin version requires A to be symmetric and positive definite and results in the famous Conjugate Gradient method

Truncation and restart

Problem: need to store q^1, ..., q^k
Much storage and computation when k becomes large
Truncation: work with a truncated sum for x^k,

  x^k = x^{k−1} + Σ_{j=k−K+1}^k α_j q^j

where a possible choice is K = 5
Small K might give convergence problems
Restart: restart the algorithm after K iterations (an alternative to truncation)
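For symmetric positive definite A the Galerkin branch of the algorithm collapses to the classical Conjugate Gradient method. A standard sketch; the relative-residual stopping test is an assumption.

```python
import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-8, max_iter=1000):
    """Conjugate Gradient method (the Galerkin variant) for Ax = b
    with A symmetric positive definite.
    """
    x = x0.copy()
    r = b - A @ x
    q = r.copy()
    for k in range(max_iter):
        Aq = A @ q
        alpha = (r @ r)/(q @ Aq)
        x += alpha*q
        r_new = r - alpha*Aq
        if np.linalg.norm(r_new) <= eps*np.linalg.norm(b):
            return x, k + 1
        beta = (r_new @ r_new)/(r @ r)   # enforces A-orthogonal q's
        q = r_new + beta*q
        r = r_new
    raise RuntimeError('CG did not converge')
```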


Family of methods

Generalized Conjugate Residual method = least squares + restart
Orthomin method = least squares + truncation
Conjugate Gradient method = Galerkin + symmetric and positive definite A
Conjugate Residuals method = least squares + symmetric and positive definite A
Many other related methods: BiCGStab, Conjugate Gradients Squared (CGS), Generalized Minimum Residuals (GMRES), Minimum Residuals (MinRes), SYMMLQ
Common name: Conjugate Gradient-like methods
All of these are easily called in Diffpack

Convergence

Conjugate Gradient-like methods converge slowly (but usually faster than SOR/SSOR)
To reduce the initial error by a factor ε,

  (1/2)√κ ln(2/ε)

iterations are needed, where κ is the condition number:

  κ = (largest eigenvalue of A) / (smallest eigenvalue of A)

κ = O(h⁻²) when solving 2nd-order PDEs (incl. elasticity and the Poisson eq.)

Preconditioning

Idea: introduce an equivalent system

  M⁻¹Ax = M⁻¹b

solve it with a Conjugate Gradient-like method, and construct M such that
  1. κ = O(1) ⇒ M ≈ A (i.e. fast convergence)
  2. M is cheap to compute
  3. M is sparse (little storage)
  4. systems My = z (occurring in the algorithm due to M⁻¹Av-like products) are efficiently solved (O(n) op.)
Contradictory requirements!
The preconditioning business: find a good balance between 1-4

Classical methods as preconditioners

Idea: “solve” My = z by one iteration with a classical iterative method (Jacobi, SOR, SSOR)
Jacobi preconditioning: M = D (the diagonal of A)
  No extra storage as M is stored in A
  No extra computations as M is a part of A
  Efficient solution of My = z
But: M is probably not a good approximation to A
⇒ poor quality of this type of preconditioners?
Conjugate Gradient method + SSOR preconditioner is widely used

M as a factorization of A

Idea: let M be an LU factorization of A, i.e.,

  M = LU

where L and U are lower and upper triangular matrices, respectively
Implications:
  1. M = A (κ = 1): a very efficient preconditioner!
  2. M is not cheap to compute (requires Gaussian elimination on A!)
  3. M is not sparse (L and U are dense!)
  4. systems My = z are not efficiently solved (an O(n²) process when L and U are dense)

M as an incomplete factorization of A

New idea: compute sparse L̂ and Û
How? compute only with the nonzeroes in A
⇒ incomplete factorization, M = L̂Û ≠ LU
M is not a perfect approximation to A
M is cheap to compute and store (O(n) complexity)
My = z is efficiently solved (O(n) complexity)
This method works well; much better than SOR/SSOR preconditioning
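In practice one rarely codes the incomplete factorization by hand. A sketch using SciPy, where spilu provides an incomplete LU factorization that is wrapped as a preconditioner for CG; the drop tolerance and fill factor are illustrative settings, and the small 1D model matrix stands in for a real PDE matrix.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
# 1D Poisson model matrix (tridiagonal), CSC format as spilu requires
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=2)     # incomplete LU
M = spla.LinearOperator(A.shape, ilu.solve)           # applies M^{-1}

x, info = spla.cg(A, b, M=M)
assert info == 0   # 0 means the iteration converged
```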

How to compute M

Run through a standard Gaussian elimination, which factors A as A = LU
Normally, L and U have nonzeroes where A has zeroes
Idea: let L and U be as sparse as A
Compute only with the nonzeroes of A
Such a preconditioner is called an Incomplete LU Factorization, ILU
Option: add the contributions outside A’s sparsity pattern to the diagonal, multiplied by ω
  Relaxed Incomplete Factorization (RILU): 0 < ω < 1
  Modified Incomplete Factorization (MILU): ω = 1
See algorithm C.3 in the book

Numerical experiments

Two test cases:
  −∇²u = f on the unit cube and FDM
  −∇²u = f on the unit cube and FEM
Diffpack makes it easy to run through a series of numerical experiments, using multiple loops, e.g.,

  sub LinEqSolver_prm
  set basic method = { ConjGrad & MinRes }
  ok
  sub Precond_prm
  set preconditioning type = PrecRILU
  set RILU relaxation parameter = { 0.0 & 0.4 & 0.7 & 1.0 }
  ok

Test case 1: 3D FDM Poisson eq.

Equation: −∇²u = 1
Boundary condition: u = 0
7-pt star standard finite difference scheme
Grid sizes: 20 × 20 × 20 = 8000 points and 30 × 30 × 30 = 27000 points
Source code: $NOR/doc/Book/src/linalg/LinSys4/
All details in HPL Appendix D
Input files: $NOR/doc/Book/src/linalg/LinSys4/experiments
The solver’s CPU time is written to standard output

Jacobi vs. SOR vs. SSOR

n = 20³ = 8000 and n = 30³ = 27000
Jacobi: not converged in 1000 iterations
SOR(ω = 1.8): 2.0s and 9.2s
SSOR(ω = 1.8): 1.8s and 9.8s
Gauss-Seidel: 13.2s and 97s
SOR’s sensitivity to the relaxation parameter ω:
  1.0: 96s, 1.6: 23s, 1.7: 16s, 1.8: 9s, 1.9: 11s
SSOR’s sensitivity to the relaxation parameter ω:
  1.0: 66s, 1.6: 17s, 1.7: 13s, 1.8: 9s, 1.9: 11s
⇒ relaxation is important; great sensitivity to ω

Conjugate Residuals or Gradients?

Compare Conjugate Residuals with Conjugate Gradients
Or: least squares vs. Galerkin
Diffpack names: MinRes and ConjGrad
MinRes: not converged in 1000 iterations
ConjGrad: 0.7s and 3.9s
⇒ ConjGrad is clearly faster than the best SOR/SSOR
Add an ILU preconditioner:
  MinRes: 0.7s and 4s
  ConjGrad: 0.6s and 2.7s
The importance of preconditioning grows as n grows

Different preconditioners

ILU, Jacobi, SSOR preconditioners (ω = 1.2)
MinRes: Jacobi: not conv., SSOR: 11.4s, ILU: 4s
ConjGrad: Jacobi: 4.8s, SSOR: 2.8s, ILU: 2.7s
Sensitivity to the relaxation parameter in SSOR, with ConjGrad as solver:
  1.0: 3.3s, 1.6: 2.1s, 1.8: 2.1s, 1.9: 2.6s
Sensitivity to the relaxation parameter in RILU, with ConjGrad as solver:
  0.0: 2.7s, 0.6: 2.4s, 0.8: 2.2s, 0.9: 1.9s, 0.95: 1.9s, 1.0: 2.7s
⇒ ω slightly less than 1 is optimal; RILU and SSOR are equally fast (here)

Test case 2: 3D FEM Poisson eq.

Equation: −∇²u = A₁π² sin πx + 4A₂π² sin 2πy + 9A₃π² sin 3πz
Boundary condition: u known
ElmB8n3D and ElmB27n3D elements
Grid sizes: 21 × 21 × 21 = 9261 nodes and 31 × 31 × 31 = 29791 nodes
Source code: $NOR/doc/Book/src/fem/Poisson2
All details in HPL Chapters 3.2 and 3.5
Input files: $NOR/doc/Book/src/fem/Poisson2/linsol-experiments
The solver’s CPU time is available in casename-summary.txt

Jacobi vs. SOR vs. SSOR

n = 9261 and n = 31³ = 29791, trilinear and triquadratic elms.
Jacobi: not converged in 1000 iterations
SOR(ω = 1.8): 9.1s and 81s, 42s and 338s
SSOR(ω = 1.8): 47s and 248s, 138s and 755s
Gauss-Seidel: not converged in 1000 iterations
SOR’s sensitivity to the relaxation parameter ω:
  1.0: not conv., 1.6: 200s, 1.8: 83s, 1.9: 57s
  (n = 29791 and trilinear elements)
SSOR’s sensitivity to the relaxation parameter ω:
  1.0: not conv., 1.6: 212s, 1.7: 207s, 1.8: 245s, 1.9: 435s
  (n = 29791 and trilinear elements)
⇒ relaxation is important; great sensitivity to ω

Conjugate Residuals or Gradients?

Compare Conjugate Residuals with Conjugate Gradients
Or: least squares vs. Galerkin
Diffpack names: MinRes and ConjGrad
MinRes: not converged in 1000 iterations
9261 vs. 29791 unknowns, trilinear elements
ConjGrad: 5s and 22s
⇒ ConjGrad is clearly faster than the best SOR/SSOR!
Add an ILU preconditioner:
  MinRes: 5s and 28s
  ConjGrad: 4s and 16s

Different preconditioners

ILU, Jacobi, SSOR preconditioners (ω = 1.2)
MinRes: Jacobi: 68s, SSOR: 57s, ILU: 28s
ConjGrad: Jacobi: 19s, SSOR: 14s, ILU: 16s
Sensitivity to the relaxation parameter in SSOR, with ConjGrad as solver:
  1.0: 17s, 1.6: 12s, 1.8: 13s, 1.9: 18s
Sensitivity to the relaxation parameter in RILU, with ConjGrad as solver:
  0.0: 16s, 0.6: 15s, 0.8: 13s, 0.9: 12s, 0.95: 11s, 1.0: 16s
⇒ ω slightly less than 1 is optimal; RILU and SSOR are equally fast (here)
ILU preconditioning has a greater impact when using triquadratic elements (and when n grows)


More experiments Multigrid methods

Convection-diffusion equations: Multigrid methods are the most efficient methods for solving linear
$NOR/doc/Book/src/app/Cd/Verify systems
Files: linsol_a.i etc as for LinSys4 and Poisson2 Multigrid methods have optimal complexity O(n)
Elasticity equations: Multigrid can be used as stand-alone solver or preconditioner
$NOR/doc/Book/src/app/Elasticity1/Verify Multigrid applies a hierarchy of grids
Files: linsol_a.i etc as for the others Multigrid is not as robust as Conjugate Gradient-like methods and
Run experiments and learn! incomplete factorization as preconditioner, but faster when it
works
Multigrid is complicated to implement
Diffpack has a multigrid toolbox that simplifies the use of multigrid
dramatically

Solving linear systems – p.137/148 Solving linear systems – p.138/148

The rough ideas of multigrid Damping in Gauss-Seidel’s method (1)

Observation: e.g. Gauss-Seidel methods are very efficient during Model problem: −u′′ = f by finite differences:
the first iterations
High-frequency errors are efficiently damped by Gauss-Seidel −uj−1 + 2uj − uj+1 = h2 fj

Low-frequence errors are slowly reduced by Gauss-Seidel solved by Gauss-Seidel iteration:


Idea: jump to a coarser grid such that low-frequency errors get
ℓ−1
higher frequency 2uℓj = uℓj−1 + uj+1 + h2 fj
Repeat the procedure
Study the error eℓi = uℓi − u∞
i :
On the coarsest grid: solve the system exactly
ℓ−1
Transfer the solution to the finest grid 2eℓj = eℓj−1 + ej+1
Iterate over this procedure
This is like a time-dependent problem, where the iteration index ℓ
is a pseudo time

Solving linear systems – p.139/148 Solving linear systems – p.140/148

Damping in Gauss-Seidel’s method (2)

Can find e^ℓ_j with techniques from Appendix A.4:

  e^ℓ_j = Σ_k A_k exp(i(kjh − ω̃ℓΔt))

or (easier to work with here):

  e^ℓ_j = Σ_k A_k ξ^ℓ exp(ikjh),  ξ = exp(−iω̃Δt)

Inserting a wave component in the scheme:

  ξ = exp(−iω̃Δt) = exp(ikh)/(2 − exp(−ikh)),  |ξ| = 1/√(5 − 4 cos kh)

Interpretation of |ξ|: the reduction in the error per iteration

Gauss-Seidel’s damping factor

  |ξ| = 1/√(5 − 4 cos p),  p = kh ∈ [0, π]

[Figure: |ξ| as a function of p; the damping factor decreases from 1 at p = 0 to 1/3 at p = π]

Small p = kh ∼ h/λ: low frequency (relative to the grid) and small damping
Large (→ π) p = kh ∼ h/λ: high frequency (relative to the grid) and efficient damping
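The damping factor is easy to tabulate, which makes the smoothing property concrete. A small sketch:

```python
import numpy as np

def damping_factor(p):
    """|xi| = 1/sqrt(5 - 4*cos(p)) for a wave with p = k*h: the error
    reduction per Gauss-Seidel sweep for the -u'' = f model problem.
    """
    return 1.0/np.sqrt(5.0 - 4.0*np.cos(p))

for p in [0.1, 0.5, np.pi/2, np.pi]:
    print(f'p = {p:4.2f}: |xi| = {damping_factor(p):.3f}')
# Low frequencies (small p) are barely damped; high ones die quickly.
```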

More than one grid

From the previous analysis: error components with high frequency are quickly damped
Jump to a coarser grid, e.g. h′ = 2h
p is increased by a factor of 2, i.e., not-so-high-frequency waves on the h grid are efficiently damped by Gauss-Seidel on the h′ grid
Repeat the procedure
On the coarsest grid: solve by Gaussian elimination
Interpolate the solution to a finer grid, perform Gauss-Seidel iterations, and repeat until the finest grid is reached

Transferring the solution between grids

From fine to coarser: restriction
  simple restriction
  weighted restriction
From coarse to finer: prolongation

[Figure: a fine grid with points 1, ..., q and a coarse grid with every second point; restriction maps the fine grid function to the coarse grid, and prolongation interpolates the coarse grid function linearly back to the fine grid]


Smoothers A multigrid algorithm

The Gauss-Seidel method is called a smoother when used to Start with the finest grid
damp high-frequency error components in multigrid Perform smoothing (pre-smoothing)
Other smoothers: Jacobi, SOR, SSOR, incomplete factorization Restrict to coarser grid
No of iterations is called no of smoothing sweeps Repeat the procedure (recursive algorithm!)
Common choice: one sweep On the coarsest grid: solve accurately
Prolongate to finer grid
Perform smoothing (post-smoothing)
One cycle is finished when reaching the finest grid again
Can repeat the cycle
Multigrid solves the system in O(n) operations

Check out HPL C.4.2 for details!!

Solving linear systems – p.145/148 Solving linear systems – p.146/148
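The recursive structure of the algorithm is visible in a compact 1D sketch. Gauss-Seidel smoothing, simple restriction, and linear interpolation are the choices from the preceding slides; the sweep counts and the 2^p + 1 grid size are assumptions.

```python
import numpy as np

def v_cycle(u, f, h, n_pre=1, n_post=1):
    """One V-cycle for -u'' = f on (0,1), u(0) = u(1) = 0, on a grid
    with 2^p + 1 points.
    """
    def gauss_seidel(u, f, h, sweeps):
        for _ in range(sweeps):
            for j in range(1, len(u) - 1):
                u[j] = 0.5*(u[j-1] + u[j+1] + h**2*f[j])
        return u

    u = gauss_seidel(u, f, h, n_pre)               # pre-smoothing
    if len(u) == 3:                                # coarsest grid: exact
        u[1] = 0.5*(u[0] + u[2] + h**2*f[1])
        return u
    # Residual r = f + u'' and its restriction to the coarse grid
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] + (u[:-2] - 2*u[1:-1] + u[2:])/h**2
    rc = r[::2].copy()                             # simple restriction
    ec = v_cycle(np.zeros_like(rc), rc, 2*h, n_pre, n_post)
    e = np.zeros_like(u)                           # prolongation:
    e[::2] = ec                                    #   copy coarse values
    e[1::2] = 0.5*(ec[:-1] + ec[1:])               #   interpolate linearly
    u += e                                         # coarse-grid correction
    return gauss_seidel(u, f, h, n_post)           # post-smoothing
```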

V- and W-cycles

Different strategies for constructing cycles:

[Figure: grid-level diagrams of a V-cycle (γ_q = 1) and a W-cycle (γ_q = 2) over four levels, marking the smoothing steps and the coarse grid solve]

Multigrid requires flexible software

Many ingredients in multigrid:
  pre- and post-smoother
  number of smoothing sweeps
  solver on the coarsest level
  cycle strategy
  restriction and prolongation methods
  how to construct the various grids?
There are also other variants of multigrid (e.g. for nonlinear problems)
The optimal combination of ingredients is only known for simple model problems (e.g. the Poisson eq.)
In general: numerical experimentation is required!
(Diffpack has a special multigrid toolbox for this)
