
INF5620: Numerical Methods for Partial Differential Equations

Hans Petter Langtangen
Simula Research Laboratory, and Dept. of Informatics, Univ. of Oslo

Last update: April 29, 2011

Nonlinear PDEs

Examples

Some nonlinear model problems to be treated next:

  −u″(x) = f(u),  u(0) = uL, u(1) = uR
  −(α(u)u′)′ = 0,  u(0) = uL, u(1) = uR
  −∇·[α(u)∇u] = g(x),  with u or −α ∂u/∂n given as boundary condition

Discretization methods:
  standard finite difference methods
  standard finite element methods
  the group finite element method

Solution method: iterate over linear equations

Nonlinear discrete equations; FDM

Finite differences for −u″ = f(u):

  −(1/h²)(u_{i−1} − 2u_i + u_{i+1}) = f(u_i)

⇒ nonlinear system of algebraic equations

  F(u) = 0, or Au = b(u),  u = (u_0, ..., u_N)^T

Finite differences for (α(u)u′)′ = 0:

  (1/h²)(α(u_{i+1/2})(u_{i+1} − u_i) − α(u_{i−1/2})(u_i − u_{i−1})) = 0

With the arithmetic mean α(u_{i+1/2}) ≈ (α(u_i) + α(u_{i+1}))/2 we get nonlinear algebraic equations

  (1/(2h²))([α(u_{i+1}) + α(u_i)](u_{i+1} − u_i) − [α(u_i) + α(u_{i−1})](u_i − u_{i−1})) = 0

⇒ nonlinear system of algebraic equations

  F(u) = 0 or A(u)u = b
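The discrete equations above map directly onto code. Below is a minimal NumPy sketch of the residual vector F(u) for −u″ = f(u) with Dirichlet conditions; the function name and the way the boundary conditions enter (as extra equations) are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def residual(u, f, h, uL, uR):
    """Residual F(u) of -u'' = f(u) discretized by centered differences.

    u holds the grid values u_0..u_N; the boundary values are enforced
    directly in the first and last equations.
    """
    F = np.zeros_like(u)
    F[0] = u[0] - uL
    F[-1] = u[-1] - uR
    # Interior equations: -(u_{i-1} - 2u_i + u_{i+1})/h^2 - f(u_i) = 0
    F[1:-1] = -(u[:-2] - 2*u[1:-1] + u[2:])/h**2 - f(u[1:-1])
    return F
```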

Nonlinear discrete equations; FEM

Finite elements for −u″ = f(u) with u(0) = u(1) = 0

Galerkin approach: find u = Σ_{k=1}^n u_k φ_k(x) ∈ V such that

  ∫_0^1 u′v′ dx = ∫_0^1 f(u)v dx   ∀v ∈ V

Left-hand side is easy to assemble; v = φ_i gives

  −(1/h)(u_{i−1} − 2u_i + u_{i+1}) = ∫_0^1 f(Σ_k u_k φ_k(x)) φ_i dx

We write u = Σ_k u_k φ_k instead of u = Σ_k c_k φ_k since c_k = u(x_k) = u_k and u_k is used in the finite difference schemes

Must use numerical integration in general to evaluate ∫ f(u)φ_i dx

Nonlinearities in the FEM

Note that f(Σ_k φ_k(x)u_k) is a complicated function of u_0, ..., u_N

For example, f(u) = u²:

  ∫_0^1 (Σ_k φ_k u_k)² φ_i dx

gives rise to a difference representation

  (h/12)(u_{i−1}² + 2u_i(u_{i−1} + u_{i+1}) + 6u_i² + u_{i+1}²)

(compare with f(u_i) = u_i² in the FDM!)

The group finite element method

The group finite element method:

  f(u) = f(Σ_j u_j φ_j(x)) ≈ Σ_{j=1}^n f(u_j)φ_j

Resulting term:

  ∫_0^1 f(u)φ_i dx ≈ Σ_j (∫_0^1 φ_i φ_j dx) f(u_j)

This integral and formulation also arise from approximating some function by u = Σ_j u_j φ_j

We can write the term as Mf(u), where M has rows consisting of (h/6)(1, 4, 1), and row i in Mf(u) becomes

  (h/6)(f(u_{i−1}) + 4f(u_i) + f(u_{i+1}))

FEM for a nonlinear coefficient

We now look at

  (α(u)u′)′ = 0,  u(0) = uL, u(1) = uR

Using a finite element method (fine exercise!) results in an integral

  ∫_0^1 α(Σ_k u_k φ_k) φ_i′ φ_j′ dx

⇒ complicated to integrate by hand or symbolically

Linear P1 elements and the trapezoidal rule (do it!):

  (1/2)(α(u_i) + α(u_{i+1}))(u_{i+1} − u_i) − (1/2)(α(u_{i−1}) + α(u_i))(u_i − u_{i−1}) = 0

⇒ same as the FDM with the arithmetic mean for α(u_{i+1/2})
Nonlinear algebraic equations

FEM/FDM for nonlinear PDEs gives nonlinear algebraic equations:

  (α(u)u′)′ = 0  ⇒  A(u)u = b
  −u″ = f(u)  ⇒  Au = b(u)

In general a nonlinear PDE gives

  F(u) = 0

or

  F_0(u_0, ..., u_N) = 0
  ...
  F_N(u_0, ..., u_N) = 0

Solving nonlinear algebraic eqs.

Have

  A(u)u − b = 0,  Au − b(u) = 0,  F(u) = 0

Idea: solve the nonlinear problem as a sequence of linear subproblems
Must perform some kind of linearization
Iterative method: guess u^0, solve linear problems for u^1, u^2, ... and hope that

  lim_{q→∞} u^q = u

i.e. the iteration converges

Picard iteration (1)

Model problem: A(u)u = b

Simple iteration scheme:

  A(u^q)u^{q+1} = b,  q = 0, 1, ...

Must provide a (good) guess u^0

Termination:

  ||u^{q+1} − u^q|| ≤ ε_u

or using the residual (expensive, requires the new A(u^{q+1})!)

  ||b − A(u^{q+1})u^{q+1}|| ≤ ε_r

Relative criteria:

  ||u^{q+1} − u^q|| ≤ ε_u ||u^q||

or (more expensive)

  ||b − A(u^{q+1})u^{q+1}|| ≤ ε_r ||b − A(u^0)u^0||

Picard iteration (2)

Model problem: Au = b(u)

Simple iteration scheme:

  Au^{q+1} = b(u^q),  q = 0, 1, ...

Relaxation:

  Au* = b(u^q),  u^{q+1} = ωu* + (1 − ω)u^q

(may improve convergence, avoids too large steps)

This method is also called successive substitutions
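A minimal sketch of the Picard scheme with relaxation and the relative termination criterion, assuming the coefficient matrix is available as a callable A(u) that assembles A evaluated at the previous iterate; all names and tolerances are illustrative assumptions.

```python
import numpy as np

def picard(A, b, u0, omega=1.0, eps_u=1e-6, max_iter=100):
    """Picard (successive substitution) iteration for A(u)u = b.

    A: callable returning the coefficient matrix at the given iterate;
    omega: relaxation parameter (omega = 1 gives plain Picard).
    """
    u = u0.copy()
    for q in range(max_iter):
        u_star = np.linalg.solve(A(u), b)       # solve A(u^q) u* = b
        u_new = omega*u_star + (1 - omega)*u    # relaxation step
        if np.linalg.norm(u_new - u) <= eps_u*np.linalg.norm(u):
            return u_new, q + 1
        u = u_new
    raise RuntimeError('Picard iteration did not converge')
```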

Newton’s method for scalar equations

The Newton method for f(x) = 0, x ∈ ℝ

Given an approximation x^q
Approximate f by a linear function at x^q:

  f(x) ≈ M(x; x^q) = f(x^q) + f′(x^q)(x − x^q)

Find a new x^{q+1} such that M(x^{q+1}; x^q) = 0:

  x^{q+1} = x^q − f(x^q)/f′(x^q)

Newton’s method for systems of equations

Systems of nonlinear equations:

  F(u) = 0,  F(u) ≈ M(u; u^q)

Multi-dimensional Taylor-series expansion:

  M(u; u^q) = F(u^q) + J(u − u^q),  J ≡ ∇F,  J_{i,j} = ∂F_i/∂u_j

Iteration no. q:
  solve the linear system J(u^q)(δu)^{q+1} = −F(u^q)
  update: u^{q+1} = u^q + (δu)^{q+1}

Can use relaxation: u^{q+1} = u^q + ω(δu)^{q+1}
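The system version of Newton's method translates almost line by line into code. A sketch with NumPy, assuming callables F and J for the residual and the Jacobian; the stopping criterion on ||F(u)|| is one common choice among several.

```python
import numpy as np

def newton(F, J, u0, omega=1.0, eps=1e-10, max_iter=25):
    """Newton's method for F(u) = 0 with optional relaxation."""
    u = u0.copy()
    for q in range(max_iter):
        delta = np.linalg.solve(J(u), -F(u))  # J(u^q) du = -F(u^q)
        u = u + omega*delta                   # u^{q+1} = u^q + omega*du
        if np.linalg.norm(F(u)) <= eps:
            return u, q + 1
    raise RuntimeError("Newton's method did not converge")
```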

The Jacobian matrix; FDM (1)

Model equation: u″ = −f(u)

Scheme:

  F_i ≡ (1/h²)(u_{i−1} − 2u_i + u_{i+1}) + f(u_i) = 0

Jacobian matrix term (FDM):

  J_{i,j} = ∂F_i/∂u_j

F_i contains only u_i, u_{i±1}
Only

  J_{i,i−1} = ∂F_i/∂u_{i−1},  J_{i,i} = ∂F_i/∂u_i,  J_{i,i+1} = ∂F_i/∂u_{i+1}

are nonzero ⇒ the Jacobian is tridiagonal

The Jacobian matrix; FDM (2)

  F_i ≡ (1/h²)(u_{i−1} − 2u_i + u_{i+1}) + f(u_i) = 0

Derivation:

  J_{i,i−1} = ∂F_i/∂u_{i−1} = 1/h²
  J_{i,i+1} = ∂F_i/∂u_{i+1} = 1/h²
  J_{i,i} = ∂F_i/∂u_i = −2/h² + f′(u_i)

Must form the Jacobian matrix J in each iteration and solve

  J δu^{q+1} = −F(u^q)

and then update

  u^{q+1} = u^q + ωδu^{q+1}
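For this model problem the Jacobian entries above can be assembled directly. A sketch assuming dfdu computes f′(u); dense storage is used for clarity, whereas a tridiagonal or sparse format would be used in practice.

```python
import numpy as np

def jacobian(u, dfdu, h):
    """Tridiagonal Jacobian of F_i = (u_{i-1} - 2u_i + u_{i+1})/h^2 + f(u_i).

    Boundary rows (Dirichlet conditions) are identity rows.
    """
    N = len(u)
    J = np.zeros((N, N))
    J[0, 0] = J[-1, -1] = 1.0
    for i in range(1, N - 1):
        J[i, i-1] = 1.0/h**2
        J[i, i]   = -2.0/h**2 + dfdu(u[i])
        J[i, i+1] = 1.0/h**2
    return J
```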
The Jacobian matrix; FEM

−u″ = f(u) on Ω = (0, 1) with u(0) = u(1) = 0 and FEM:

  F_i ≡ ∫_0^1 [φ_i′ Σ_k φ_k′ u_k − f(Σ_s u_s φ_s) φ_i] dx = 0

First term of the Jacobian J_{i,j} = ∂F_i/∂u_j:

  ∂/∂u_j ∫_0^1 φ_i′ Σ_k φ_k′ u_k dx = ∫_0^1 φ_i′ φ_j′ dx

Second term:

  −∂/∂u_j ∫_0^1 f(Σ_s u_s φ_s) φ_i dx = −∫_0^1 f′(Σ_s u_s φ_s) φ_j φ_i dx

because when u = Σ_s u_s φ_s,

  ∂f(u)/∂u_j = f′(u) ∂u/∂u_j = f′(u) ∂(Σ_s u_s φ_s)/∂u_j = f′(u)φ_j

A 2D/3D transient nonlinear PDE (1)

PDE for heat conduction in a solid where the conduction depends on the temperature u:

  ϱC ∂u/∂t = ∇·[κ(u)∇u]

(e.g. u = g on the boundary and u = I at t = 0)

Stable Backward Euler FDM in time:

  (u^n − u^{n−1})/Δt = ∇·[α(u^n)∇u^n]

with α = κ/(ϱC)

Next step: Galerkin formulation, where u^n = Σ_j u_j^n φ_j is the unknown and u^{n−1} is just a known function

A 2D/3D transient nonlinear PDE (2)

FEM gives nonlinear algebraic equations:

  F_i(u_0^n, ..., u_N^n) = 0,  i = 0, ..., N

where

  F_i ≡ ∫_Ω [(u^n − u^{n−1})v + Δt α(u^n)∇u^n·∇v] dΩ,  v = φ_i

Newton’s method: need the Jacobian,

  J_{i,j} = ∂F_i/∂u_j^n

A 2D/3D transient nonlinear PDE (3)

Picard iteration:

Use the “old” u^{n,q} in the α(u^n) term, solve a linear problem for u^{n,q+1}, q = 0, 1, ...

  A_{i,j} = ∫_Ω (φ_i φ_j + Δt α(u^{n,q})∇φ_i·∇φ_j) dΩ

  b_i = ∫_Ω u^{n−1} φ_i dΩ

Newton’s method:

  J_{i,j} = ∫_Ω (φ_i φ_j + Δt(α′(u^{n,q})φ_j ∇u^{n,q}·∇φ_i + α(u^{n,q})∇φ_i·∇φ_j)) dΩ

Iteration methods at the PDE level

Consider −u″ = f(u)
Could introduce Picard iteration at the PDE level:

  −d²u^{q+1}/dx² = f(u^q),  q = 0, 1, ...

⇒ linear problem for u^{q+1}
A PDE-level Newton method can also be formulated (see the HPL book for details)
We get identical results for our model problem
Time-dependent problems: first use finite differences in time, then use an iteration method (Picard or Newton) at the time-discrete PDE level

Continuation methods

Challenging nonlinear PDE:

  ∇·(||∇u||^q ∇u) = 0

For q = 0 this problem is simple
Idea: solve a sequence of problems, starting with q = 0, and increase q towards a target value
Sequence of PDEs:

  ∇·(||∇u_r||^{q_r} ∇u_r) = 0,  r = 0, 1, 2, ...

with 0 = q_0 < q_1 < q_2 < ··· < q_m = q
The start guess for u_r is u_{r−1} (the solution of a “simpler” problem)
CFD: the Reynolds number is often the continuation parameter q
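The continuation strategy is just an outer loop that feeds each solution in as the start guess for the next, harder problem. A sketch where solve(q, u_start) stands for any nonlinear solver (Picard or Newton) for the PDE with exponent q; both names are hypothetical placeholders.

```python
def continuation(solve, q_values, u0):
    """Solve a sequence of problems with increasing parameter q.

    solve(q, u_start) runs a Picard/Newton iteration for the PDE with
    exponent q, started from u_start. The solution for q_r serves as
    the start guess for q_{r+1}.
    """
    u = u0
    for q in q_values:      # e.g. q_values = [0.0, 0.5, 1.0, ..., q_target]
        u = solve(q, u)     # u_r computed with start guess u_{r-1}
    return u
```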

Exercise 1

Derive the nonlinear algebraic equations for the problem

  d/dx(α(u) du/dx) = 0 on (0, 1),  u(0) = 0, u(1) = 1,

using a finite difference method and the Galerkin method with P1 finite elements and the Trapezoidal rule for approximating integrals.

Exercise 2

For the problem in Exercise 1, use the group finite element method with P1 elements and the Trapezoidal rule for integrals and show that the resulting equations coincide with those obtained in Exercise 1.


Exercise 3

For the problem in Exercises 1 and 2, identify the F vector in the system F = 0 of nonlinear equations. Derive the Jacobian J for a general α(u). Then write the expressions for the Jacobian when α(u) = α_0 + α_1 u + α_2 u².

Exercise 4

Explain why discretization of nonlinear differential equations by finite difference and finite element methods normally leads to a Jacobian with the same sparsity pattern as one would encounter in an associated linear problem. Hint: which unknowns will enter equation number i?

Exercise 5

Show that if F(u) = 0 is a linear system of equations, F = Au − b, for a constant matrix A and vector b, then Newton’s method (with ω = 1) finds the correct solution in the first iteration.

Exercise 6

The operator ∇·(α∇u), with α = ||∇u||^q, q ∈ ℝ, and ||·|| being the Euclidean norm, appears in several physical problems, especially flow of non-Newtonian fluids. The quantity ∂α/∂u_j is central when formulating a Newton method, where u_j is the coefficient in the finite element approximation u = Σ_j u_j φ_j. Show that

  ∂/∂u_j ||∇u||^q = q||∇u||^{q−2} ∇u·∇φ_j.

Exercise 7

Consider the PDE

  ∂u/∂t = ∇·(α(u)∇u)

discretized by a Forward Euler difference in time. Explain why this nonlinear PDE gives rise to a linear problem (and hence no need for Newton or Picard iteration) at each time level.

Exercise 8

Repeat Exercise 7 for the PDE

  ϱ(u) ∂u/∂t = ∇·(α(u)∇u)

Discretize the PDE by a Backward Euler difference in time and realize that there is a need for solving nonlinear algebraic equations. Formulate a Picard iteration method for the spatial PDE to be solved at each time level. Formulate a Galerkin method for discretizing the spatial problem at each time level. Choose some appropriate boundary conditions. Explain how to incorporate an initial condition.

Exercise 9

For the problem in Exercise 8, assume that a nonlinear Newton cooling law applies at the whole boundary:

  −α(u) ∂u/∂n = H(u)(u − u_S),

where H(u) is a nonlinear heat transfer coefficient and u_S is the temperature of the surroundings (and u is the temperature). Use a Backward Euler scheme in time and a Galerkin method in space. Identify the nonlinear algebraic equations to be solved at each time level. Derive the corresponding Jacobian.

The PDE problem in this exercise is highly relevant when the temperature variations are large. Then the density times the heat capacity (ϱ), the heat conduction coefficient (α), and the heat transfer coefficient (H) normally vary with the temperature (u).

Exercise 10

In Exercise 8, restrict the problem to one space dimension, choose simple boundary conditions like u = 0, use the group finite element method for all nonlinear coefficients, apply P1 elements, use the Trapezoidal rule for all integrals, and derive the system of nonlinear algebraic equations that must be solved at each time level. Set up some finite difference method and compare the form of the nonlinear algebraic equations.


Exercise 11

In Exercise 8, use the Picard iteration method with one iteration at each time level, and introduce this method at the PDE level. Realize the similarities between the resulting discretization and that of the corresponding linear diffusion problem.

Shallow water waves

Tsunamis

Waves in fjords, lakes, or oceans, generated by
  slides
  earthquakes
  subsea volcanoes
  asteroids
  human activity; nuclear detonations, or slides generated by oil drilling, may generate tsunamis

Propagation over large distances
Hardly recognizable in the open ocean, but the wave amplitude increases near shore
Run-up at the coasts may result in severe damage
Giant events: Dec 26 2004 (≈ 300000 killed), 1883 (similar to 2004), 65 My ago (extinction of the dinosaurs)

Norwegian tsunamis

[Map of Norway and Sweden with incident sites near Tromsø, Bodø, Trondheim, Bergen, Oslo, and Stockholm. Circles: major incidents, > 10 killed; triangles: selected smaller incidents; square: Storegga (5000 B.C.)]

Tsunamis in the Pacific

Scenario: an earthquake outside Chile generates a tsunami, propagating at 800 km/h across the Pacific, with run-up on densely populated coasts in Japan.

Selected events; slides

  location              year        run-up   dead
  Loen                  1905        40m      61
  Tafjord               1934        62m      41
  Loen                  1936        74m      73
  Storegga              5000 B.C.   10m(?)   ??
  Vaiont, Italy         1963        270m     2600
  Lituya Bay, Alaska    1958        520m     2
  Shimabara, Japan      1792        10m(?)   15000

Selected events; earthquakes etc.

  location    year        strength   run-up   dead
  Thera       1640 B.C.   volcano    ?        ?
  Thera       1650        volcano    ?        ?
  Lisboa      1755        M=9        15(?)m   ?000
  Portugal    1969        M=7.9      1m
  Amorgos     1956        M=7.4      5(?)m    1
  Krakatao    1883        volcano    40 m     36 000
  Flores      1992        M=7.5      25 m     1 000
  Nicaragua   1992        M=7.2      10 m     168
  Sumatra     2004        M=9        50 m     300 000

The selection is biased wrt. European events; 150 catastrophic tsunami events have been recorded along the Japanese coast in modern times.

Tsunamis: no. 5 killer among natural hazards

Why simulation?

Increase the understanding of tsunamis
Assist warning systems
Assist the building of harbor protection (breakwaters)
Recognize critical coastal areas (e.g. move population)
Hindcast historical tsunamis (assist geologists/biologists)
Problem sketch

[Figure: coordinate system (x, y, z) with surface elevation η(x, y, t) above the still water level and depth H(x, y, t) down to the bottom]

Assume wavelength ≫ depth (long waves)
Assume small amplitudes relative to the depth
An appropriate approximation for many ocean wave phenomena
Reference: HPL chapter 6.2

Mathematical model

PDEs:

  ∂η/∂t = −∂(uH)/∂x − ∂(vH)/∂y − ∂H/∂t
  ∂u/∂t = −∂η/∂x,  x ∈ Ω, t > 0
  ∂v/∂t = −∂η/∂y,  x ∈ Ω, t > 0

η(x, y, t): surface elevation
u(x, y, t) and v(x, y, t): horizontal (depth averaged) velocities
H(x, y, t): stillwater depth (given)
Boundary conditions: either η, u or v given at each point
Initial conditions: all of η, u and v given

Primary unknowns

Discretization: finite differences
Staggered mesh in time and space
⇒ η, u, and v unknown at different points:

  η^ℓ_{i+1/2, j+1/2},  u^{ℓ+1/2}_{i, j+1/2},  v^{ℓ+1/2}_{i+1/2, j}

[Figure: one cell of the staggered mesh, with η at the cell center (i+1/2, j+1/2), u at the vertical cell faces (i, j+1/2) and (i+1, j+1/2), and v at the horizontal cell faces (i+1/2, j) and (i+1/2, j+1)]

A global staggered mesh

Widely used mesh in computational fluid dynamics (CFD)
Important for Navier-Stokes solvers
Basic idea: centered differences in time and space

Discrete equations; η

  ∂η/∂t = −∂(uH)/∂x − ∂(vH)/∂y  at (i + 1/2, j + 1/2, ℓ − 1/2)

  [D_t η = −D_x(uH) − D_y(vH)]^{ℓ−1/2}_{i+1/2, j+1/2}

  (1/Δt)[η^ℓ_{i+1/2, j+1/2} − η^{ℓ−1}_{i+1/2, j+1/2}]
    = −(1/Δx)[(Hu)^{ℓ−1/2}_{i+1, j+1/2} − (Hu)^{ℓ−1/2}_{i, j+1/2}]
      − (1/Δy)[(Hv)^{ℓ−1/2}_{i+1/2, j+1} − (Hv)^{ℓ−1/2}_{i+1/2, j}]

Discrete equations; u

  ∂u/∂t = −∂η/∂x  at (i, j + 1/2, ℓ)

  [D_t u = −D_x η]^ℓ_{i, j+1/2}

  (1/Δt)[u^{ℓ+1/2}_{i, j+1/2} − u^{ℓ−1/2}_{i, j+1/2}] = −(1/Δx)[η^ℓ_{i+1/2, j+1/2} − η^ℓ_{i−1/2, j+1/2}]

Discrete equations; v

  ∂v/∂t = −∂η/∂y  at (i + 1/2, j, ℓ)

  [D_t v = −D_y η]^ℓ_{i+1/2, j}

  (1/Δt)[v^{ℓ+1/2}_{i+1/2, j} − v^{ℓ−1/2}_{i+1/2, j}] = −(1/Δy)[η^ℓ_{i+1/2, j+1/2} − η^ℓ_{i+1/2, j−1/2}]

Complicated coastline boundary

Saw-tooth approximation to the real boundary
Successful method, widely used
Warning: can lead to nonphysical waves
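The three discrete equations map directly onto array updates. Below is a sketch of one time step on the staggered grid with NumPy slicing; the array shapes, the arithmetic mean for the depth at the cell faces, a time-independent H, and closed (zero-flux) boundaries are assumptions made for brevity.

```python
import numpy as np

def advance(eta, u, v, H, dt, dx, dy):
    """One step of the linear shallow water equations on a staggered grid.

    eta: (nx, ny) at cell centers; u: (nx+1, ny) at vertical cell faces;
    v: (nx, ny+1) at horizontal cell faces; H: depth at cell centers.
    eta is updated first (from the old u, v), then u and v from the new
    eta, which realizes the staggering in time.
    """
    # Depth at the interior faces (arithmetic mean of neighboring centers);
    # zero at the outer faces, i.e. closed boundaries
    Hu = np.zeros_like(u); Hv = np.zeros_like(v)
    Hu[1:-1, :] = 0.5*(H[1:, :] + H[:-1, :])
    Hv[:, 1:-1] = 0.5*(H[:, 1:] + H[:, :-1])
    # Continuity: D_t eta = -D_x(uH) - D_y(vH)
    eta -= dt*((Hu[1:, :]*u[1:, :] - Hu[:-1, :]*u[:-1, :])/dx +
               (Hv[:, 1:]*v[:, 1:] - Hv[:, :-1]*v[:, :-1])/dy)
    # Momentum: D_t u = -D_x eta, D_t v = -D_y eta (interior faces only)
    u[1:-1, :] -= dt*(eta[1:, :] - eta[:-1, :])/dx
    v[:, 1:-1] -= dt*(eta[:, 1:] - eta[:, :-1])/dy
    return eta, u, v
```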


Relation to the wave equation

Eliminate u and v (easy in the PDEs) and get

  ∂²η/∂t² = ∇·[H(x, y)∇η]

Eliminate the discrete u and v
⇒ standard 5-point explicit finite difference scheme for the discrete η (quite some algebra needed, try 1D first)
If H is constant, an exact numerical solution is possible for one-dimensional wave propagation

Stability and accuracy

Centered differences in time and space
Truncation error, dispersion analysis: O(Δx², Δy², Δt²)
Stability as for the standard wave equation in 2D:

  Δt ≤ (1/√H) [1/Δx² + 1/Δy²]^{−1/2}

(CFL condition)

Verification of an implementation

How can we verify that the program works?
Compare with an analytical solution (if possible)
Check that basic physical mechanisms are reproduced in a qualitatively correct way by the program

Tsunami due to a slide

Surface elevation ahead of the slide, a dump behind
Initially, the negative dump propagates backwards
The surface waves propagate faster than the slide moves

Tsunami due to faulting

The sea surface deformation reflects the bottom deformation
Velocity of surface waves (H ∼ 5 km): 790 km/h
Velocity of seismic waves in the bottom: 6000–25000 km/h

Tsunami approaching the shore

The velocity of a tsunami is √(gH(x, y, t))
The back part of the wave moves at higher speed ⇒ the wave becomes more peak-formed
Deep water (H ∼ 3 km): wave length 40 km, height 1 m
Shallow water (H ∼ 10 m): wave length 2 km, height 4 m

Tsunamis experienced from shore

As a fast tide, with strong currents in fjords
A wall of water approaching the beach
Wave breaking: the top has larger effective depth and moves faster than the front part (requires a nonlinear PDE)

Convection-dominated flow


Typical transport PDE

Transport of a scalar u (heat, pollution, ...) in fluid flow v:

  ∂u/∂t + v·∇u = α∇²u + f

Convection (change of u due to the flow): v·∇u
Diffusion (change of u due to molecular collisions): α∇²u
Common case: convection ≫ diffusion → numerical difficulties
Important dimensionless number: the Peclet number Pe

  Pe = |v·∇u|/|α∇²u| ∼ (V U/L)/(αU/L²) = VL/α

V: characteristic velocity v, L: characteristic length scale, α: diffusion constant, U: characteristic size of u

The transport PDE for fluid flow

The fluid flow itself is governed by the Navier-Stokes equations:

  ∂v/∂t + v·∇v = −(1/ϱ)∇p + ν∇²v + f
  ∇·v = 0

Important dimensionless number: the Reynolds number Re

  Re = convection/diffusion = |v·∇v|/|ν∇²v| ∼ (V²/L)/(νV/L²) = VL/ν

Re ≫ 1 and Pe ≫ 1: numerical difficulties

A 1D stationary transport problem

Assumptions: no time, 1D, no source term

  v·∇u = α∇²u  →  vu′ = αu″  →  u′ = εu″,  ε = α/v

Complete model problem:

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

ε small: boundary layer at x = 1
Standard numerics (i.e. centered differences) will fail!
Cure: upwind differences

Notation for difference equations (1)

Define

  [D_x u]^n_{i,j,k} ≡ (u^n_{i+1/2, j, k} − u^n_{i−1/2, j, k})/h

with similar definitions of D_y, D_z, and D_t
Another difference:

  [D_{2x} u]^n_{i,j,k} ≡ (u^n_{i+1, j, k} − u^n_{i−1, j, k})/(2h)

Compound difference:

  [D_x D_x u]^n_i = (1/h²)(u^n_{i−1} − 2u^n_i + u^n_{i+1})

Notation for difference equations (2)

One-sided forward difference:

  [D_x^+ u]^n_i ≡ (u^n_{i+1} − u^n_i)/h

and the backward difference:

  [D_x^− u]^n_i ≡ (u^n_i − u^n_{i−1})/h

Put the whole equation inside brackets:

  [D_x D_x u = −f]_i

is a finite difference scheme for u″ = −f

Centered differences

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

  (u_{i+1} − u_{i−1})/(2h) = ε(u_{i−1} − 2u_i + u_{i+1})/h²,  i = 2, ..., n − 1
  u_1 = 0,  u_n = 1

or

  [D_{2x} u = εD_x D_x u]_i

Analytical solution:

  u(x) = (1 − e^{x/ε})/(1 − e^{1/ε})

⇒ u′(x) > 0, i.e., a monotone function

Numerical experiments (1)–(4)

[Figures: centered scheme vs. exact solution for (n, ε) = (20, 0.1), (20, 0.01), (80, 0.01), and (20, 0.001); the centered solution oscillates when h > 2ε]

Numerical experiments; summary

The solution is not monotone if h > 2ε
The convergence rate is h² (expected since both differences are of 2nd order), provided h ≤ 2ε
Completely wrong qualitative behavior for h ≫ 2ε

Analysis

Can find an analytical solution of the discrete problem (!)
Method: insert u_i ∼ β^i and solve for β

  β_1 = 1,  β_2 = (1 + h/(2ε))/(1 − h/(2ε))

cf. HPL app. A.4.4
Complete solution:

  u_i = C_1 β_1^i + C_2 β_2^i

Determine C_1 and C_2 from the boundary conditions:

  u_i = (β_2^i − β_2)/(β_2^n − β_2)

Important result

Observe: u_i oscillates if β_2 < 0

  (1 + h/(2ε))/(1 − h/(2ε)) < 0  ⇒  h > 2ε

Must require h ≤ 2ε for u_i to have the same qualitative property as u(x)
This explains why we observed oscillations in the numerical solution

Upwind differences

Problem:

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

Use a backward difference, called an upwind difference, for the u′ term:

  (u_i − u_{i−1})/h = ε(u_{i−1} − 2u_i + u_{i+1})/h²,  i = 2, ..., n − 1
  u_1 = 0,  u_n = 1

The scheme can be written as

  [D_x^− u = εD_x D_x u]_i
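Both schemes are easy to compare numerically. A sketch that assembles and solves the linear system for either discretization (dense storage for clarity); running it with n = 20 and ε = 0.01, where h > 2ε, reproduces the oscillatory centered solution and the monotone upwind one.

```python
import numpy as np

def solve_1d(eps, n, upwind=False):
    """Solve u' = eps*u'' on (0,1), u(0)=0, u(1)=1, on n grid points,
    with a centered or an upwind difference for u'.
    """
    h = 1.0/(n - 1)
    A = np.zeros((n, n)); b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0; b[-1] = 1.0   # boundary conditions
    for i in range(1, n - 1):
        # eps*(u_{i-1} - 2u_i + u_{i+1})/h^2 minus the u' approximation
        A[i, i-1] += eps/h**2; A[i, i] += -2*eps/h**2; A[i, i+1] += eps/h**2
        if upwind:   # -(u_i - u_{i-1})/h
            A[i, i] += -1.0/h; A[i, i-1] += 1.0/h
        else:        # -(u_{i+1} - u_{i-1})/(2h)
            A[i, i+1] += -0.5/h; A[i, i-1] += 0.5/h
    return np.linalg.solve(A, b)
```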

Numerical experiments (1)–(2)

[Figures: upwind scheme vs. exact solution for (n, ε) = (20, 0.1) and (20, 0.01); the upwind solution is monotone but smears the boundary layer]


Numerical experiments; summary Analysis

The solution is always monotone, i.e., always qualitatively correct Analytical solution of the discrete equations:
The boundary layer is too thick
ui = β i ⇒ β1 = 1, β2 = 1 + h/ǫ
The convergence rate is h
(in agreement with truncation error analysis)
ui = C1 + C2 β2i
Using boundary conditions:

β2i − β2
ui =
β2n − β2

Since β2 > 0 (actually β2 > 1), β2i does not oscillate

Convection-dominated flow – p.73/148 Convection-dominated flow – p.74/148

Centered vs. upwind scheme

Truncation error: centered is more accurate than upwind
Exact analysis: centered is more accurate than upwind when centered is stable (i.e. monotone u_i), but otherwise useless
ε = 10⁻⁶ ⇒ 500 000 grid points to make h ≤ 2ε
Upwind gives the best reliability, at the cost of a too thick boundary layer

An interpretation of the upwind scheme

The upwind scheme

  (u_i − u_{i−1})/h = ε(u_{i−1} − 2u_i + u_{i+1})/h²

or

  [D_x^− u = εD_x D_x u]_i

can be rewritten as

  (u_{i+1} − u_{i−1})/(2h) = (ε + h/2)(u_{i−1} − 2u_i + u_{i+1})/h²

or

  [D_{2x} u = (ε + h/2)D_x D_x u]_i

Upwind = centered + artificial diffusion (h/2)

Finite elements for the model problem

Galerkin formulation of

  u′(x) = εu″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

and linear (P1) elements leads to a centered scheme (show it!)

  (u_{i+1} − u_{i−1})/(2h) = ε(u_{i−1} − 2u_i + u_{i+1})/h²,  i = 2, ..., n − 1
  u_1 = 0,  u_n = 1

or

  [D_{2x} u = εD_x D_x u]_i

Stability problems when h > 2ε

Finite elements and upwind differences

How to construct upwind differences in a finite element context?
One possibility: add artificial diffusion (h/2):

  u′(x) = (ε + h/2)u″(x),  x ∈ (0, 1),  u(0) = 0, u(1) = 1

Can be solved by a Galerkin method
Another, equivalent strategy: use perturbed weighting functions

Perturbed weighting functions in 1D

Take

  w_i(x) = φ_i(x) + τφ_i′(x)

or alternatively written

  w(x) = v(x) + τv′(x)

where v is the standard test function in a Galerkin method
Use this w_i or w as test function for the convective term u′:

  ∫_0^1 u′w dx = ∫_0^1 u′v dx + τ∫_0^1 u′v′ dx

The new term τu′v′ is the weak formulation of an artificial diffusion term τu″v
With τ = h/2 we then get the upwind scheme

Optimal artificial diffusion

Try a weighted sum of a centered and an upwind discretization:

  [u′]_i ≈ [θD_x^− u + (1 − θ)D_{2x} u]_i,  0 ≤ θ ≤ 1

  [θD_x^− u + (1 − θ)D_{2x} u = εD_x D_x u]_i

Is there an optimal θ?
Yes, for

  θ(h/ε) = coth(h/(2ε)) − 2ε/h

we get exact u_i (i.e. u exact at the nodal points)
Equivalent artificial diffusion: τ_o = 0.5hθ(h/ε)
Exact finite element method: w(x) = v(x) + τ_o v′(x) for the convective term u′
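The optimal weight is a one-liner in code. A small sketch; the values of h and ε below are example inputs only.

```python
import numpy as np

def theta_opt(h, eps):
    """Optimal upwind weight theta(h/eps) = coth(h/(2*eps)) - 2*eps/h.

    tau = 0.5*h*theta is the equivalent artificial diffusion that
    reproduces the exact solution at the nodes for the model problem.
    """
    return 1.0/np.tanh(h/(2*eps)) - 2*eps/h

h, eps = 0.05, 0.01
tau = 0.5*h*theta_opt(h, eps)   # equivalent artificial diffusion
```

Note the limits: for h/ε → 0, θ → 0 (pure centered differences); for h/ε → ∞, θ → 1 (full upwinding).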


Multi-dimensional problems

Model problem:

  v·∇u = α∇²u

or written out:

  v_x ∂u/∂x + v_y ∂u/∂y = α∇²u

Non-physical oscillations occur with centered differences or Galerkin methods when the left-hand side terms are large
Remedy: upwind differences
Downside: too much diffusion
Important result: extra stabilizing diffusion is needed only in the streamline direction, i.e., in the direction of v = (v_x, v_y)

Streamline diffusion

Idea: add diffusion in the streamline direction
Isotropic physical diffusion, expressed through a diffusion tensor:

  Σ_{i=1}^d Σ_{j=1}^d ∂/∂x_i (αδ_{ij} ∂u/∂x_j) = α∇²u

αδ_{ij} is the diffusion tensor (here: the same in all directions)
Streamline diffusion makes use of an anisotropic diffusion tensor α_{ij}:

  Σ_{i=1}^d Σ_{j=1}^d ∂/∂x_i (α_{ij} ∂u/∂x_j),  α_{ij} = τ v_i v_j / ||v||²

Implementation: artificial diffusion term or perturbed weighting function

Perturbed weighting functions (1)

Consider the weighting function

  w = v + τ* v·∇v

for the convective (left-hand side) term: ∫ w v·∇u dΩ
This expands to

  ∫ v v·∇u dΩ + τ* ∫ v·∇u v·∇v dΩ

The latter term can be viewed as the Galerkin formulation of (write v·∇u = Σ_i v_i ∂u/∂x_i etc.)

  Σ_{i=1}^d Σ_{j=1}^d ∂/∂x_i (τ* v_i v_j ∂u/∂x_j)

Perturbed weighting functions (2)

⇒ Streamline diffusion can be obtained by perturbing the weighting function
Common name: SUPG (streamline-upwind/Petrov-Galerkin)

Consistent SUPG

Why not just add artificial diffusion? Why bother with perturbed weighting functions?
In standard FEM (method of weighted residuals),

  ∫_Ω L(u)w dΩ = 0

the exact solution is a solution of the FEM equations (it fulfills L(u) = 0)
This no longer holds if we
  add an artificial diffusion term (∼ h/2)
  use different weighting functions on different terms
Idea: use consistent SUPG
  no artificial diffusion term
  the same (perturbed) weighting function applies to all terms

A step back to 1D

Let us try to use

  w(x) = v(x) + τv′(x)

on both terms in u′ = εu″:

  ∫_0^1 (u′v + (ε + τ)u′v′) dx + τ∫_0^1 v″u′ dx = 0

Problem: the last term contains v″
Remedy: drop it (!)
Justification: v″ = 0 on each linear (P1) element
Drop 2nd-order derivatives of v in 2D/3D too
Consistent SUPG is not so consistent...

Choosing τ*

Choosing τ* is a research topic
Many suggestions
Two classes:
  τ* ∼ h
  τ* ∼ Δt (time-dependent problems)
Little theory

A test problem (1)

[Figure: unit square test domain; flow v at angle θ crosses the domain, with discontinuous inflow data (u = 1 below the line y = x tan θ + 0.25, u = 0 elsewhere) and ∂u/∂n = 0 or u = 0 on the outflow boundaries]


A test problem (2) Galerkin’s method

Methods:
1. Classical SUPG:
2
Brooks and Hughes: "A streamline upwind/Petrov-Galerkin finite
element formulation for advection domainated flows with particular
emphasis on the incompressible Navier-Stokes equations", Comp.
Methods Appl. Mech. Engrg., 199-259, 1982. 1

2. An additional discontinuity-capturing term

v · ∇u
w = v + τ ∗ v · ∇v + τ̂ ∇u 0
||∇u||2
−0.65
was proposed in 0

Hughes, Mallet and Mizukami: "A new finite element formulation


for computational fluid dynamics: II. Beyond SUPG", Comp. 1
1
Methods Appl. Mech. Engrg., 341-355, 1986. Z 0

X
Convection-dominated flow – p.89/148 Convection-dominated flow – p.90/148

SUPG

[Figure: surface plot of the SUPG solution of the test problem]

Time-dependent problems

Model problem:

  ∂u/∂t + v·∇u = ε∇²u

Can add an artificial streamline diffusion term
Can use the perturbed weighting function

  w = v + τ* v·∇v

on all terms
How to choose τ*?

Taylor-Galerkin methods (1)

Idea: Lax-Wendroff + Galerkin
Model equation:

  ∂u/∂t + U ∂u/∂x = 0

Lax-Wendroff: 2nd-order Taylor series in time,

  u^{n+1} = u^n + Δt (∂u/∂t)^n + (1/2)Δt² (∂²u/∂t²)^n

Replace temporal by spatial derivatives,

  ∂/∂t = −U ∂/∂x

Result:

  u^{n+1} = u^n − UΔt (∂u/∂x)^n + (1/2)U²Δt² (∂²u/∂x²)^n

Taylor-Galerkin methods (2)

We can write the scheme on the form

  [D_t^+ u + U ∂u/∂x = (1/2)U²Δt ∂²u/∂x²]^n

⇒ a forward scheme with artificial diffusion
Lax-Wendroff: centered spatial differences,

  [D_t^+ u + U D_{2x} u = (1/2)U²Δt D_x D_x u]^n_i

Alternative: Galerkin’s method in space,

  [D_t^+ u + U D_{2x} u = (1/2)U²Δt D_x D_x u]^n_i

provided that we lump the mass matrix
This is the Taylor-Galerkin method

Taylor-Galerkin methods (3)

In multi-dimensional problems,

  ∂u/∂t + v·∇u = 0

we have

  ∂/∂t = −v·∇

and (∇·v = 0)

  ∂²/∂t² = ∇·(vv·∇) = Σ_{r=1}^d Σ_{s=1}^d ∂/∂x_r (v_r v_s ∂/∂x_s)

This is streamline diffusion with τ* = Δt/2:

  [D_t^+ u + v·∇u = (1/2)Δt ∇·(vv·∇u)]^n

Taylor-Galerkin methods (4)

Can use the Galerkin method in space (gives centered differences)
The result is close to that of SUPG, but τ* is different
⇒ The Taylor-Galerkin method points to τ* = Δt/2 for SUPG in time-dependent problems
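For the 1D model equation the Lax-Wendroff/Taylor-Galerkin update is a three-point formula. A sketch with periodic boundaries; the Courant number C = UΔt/Δx is assumed to satisfy C ≤ 1 for stability.

```python
import numpy as np

def lax_wendroff_step(u, C):
    """One Lax-Wendroff step for u_t + U u_x = 0 with Courant number
    C = U*dt/dx and periodic boundaries. With a lumped mass matrix the
    Taylor-Galerkin method in space reduces to the same update.
    """
    up = np.roll(u, -1)   # u_{i+1}
    um = np.roll(u, 1)    # u_{i-1}
    # u^{n+1} = u^n - (C/2)(u_{i+1}-u_{i-1}) + (C^2/2)(u_{i-1}-2u_i+u_{i+1})
    return u - 0.5*C*(up - um) + 0.5*C**2*(um - 2*u + up)
```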
The importance of linear system solvers

PDE problems often (usually) result in linear systems of algebraic equations

  Ax = b

Special methods utilizing that A is sparse are much faster than Gaussian elimination!
Most of the CPU time in a PDE solver is often spent on solving Ax = b
⇒ Important to use fast methods

Solving linear systems

Example: Poisson eq. on the unit cube (1)

−∇²u = f on an n = q × q × q grid
FDM/FEM result in an Ax = b system
FDM: 7 entries per row in A are nonzero
FEM: 7 (tetrahedra), 27 (trilinear elms.), or 125 (triquadratic elms.) entries per row in A are nonzero
A is sparse (mostly zeroes)
Fraction of nonzeroes: Rq⁻³ (R is the number of nonzero entries per row)
Important to work with the nonzeroes only!

Example: Poisson eq. on the unit cube (2)

Compare banded Gaussian elimination (BGE) versus Conjugate Gradients (CG)
Work in BGE: O(q⁷) = O(n^{2.33})
Work in CG: O(q³) = O(n) (multigrid; optimal); for the numbers below we use incomplete factorization preconditioning: O(n^{1.17})
n = 27000:
  CG 72 times faster than BGE
  BGE needs 20 times more memory than CG
n = 8 million:
  CG 107 times faster than BGE
  BGE needs 4871 times more memory than CG

Classical iterative methods Convergence

Ax = b, A ∈ IRn,n , x, b ∈ IRn . M xk = N xk−1 + b, k = 1, 2, . . .

Split A: A = M − N The iteration converges if G = M −1 N has its largest eigenvalue,


̺(G), less than 1
Write Ax = b as
M x = N x + b, Rate of convergence: R∞ (G) = − ln ̺(G)

and introduce an iteration To reduce the initial error by a factor ǫ,

k
Mx = Nx k−1
+ b, k = 1, 2, . . . ||x − xk || ≤ ǫ||x − x0 ||

Systems M y = z should be easy/cheap to solve one needs


− ln ǫ/R∞ (G)
Different choices of M correspond to different classical iteration
methods: iterations
Jacobi iteration
Gauss-Seidel iteration
Successive Over Relaxation (SOR)
Symmetric Successive Over Relaxation (SSOR)
Solving linear systems – p.101/148 Solving linear systems – p.102/148

Some classical iterative methods

Split: A = L + D + U
L and U are the lower and upper triangular parts, D is A’s diagonal
Jacobi iteration: M = D (N = −L − U)
Gauss-Seidel iteration: M = L + D (N = −U)
SOR iteration: Gauss-Seidel + relaxation
SSOR: two (forward and backward) SOR steps
Rate of convergence R_∞(G) for −∇²u = f in 2D with u = 0 as BC:
  Jacobi: π²h²/2
  Gauss-Seidel: π²h²
  SOR: 2πh
  SSOR: > πh
SOR/SSOR is superior (h vs. h², and h → 0 is small)

Jacobi iteration

M = D
Put everything, except the diagonal, on the right-hand side
2D Poisson equation −∇²u = f:

  u_{i,j−1} + u_{i−1,j} + u_{i+1,j} + u_{i,j+1} − 4u_{i,j} = −h²f_{i,j}

Solve for the diagonal element and use old values on the right-hand side:

  u^k_{i,j} = (1/4)(u^{k−1}_{i,j−1} + u^{k−1}_{i−1,j} + u^{k−1}_{i+1,j} + u^{k−1}_{i,j+1} + h²f_{i,j})

for k = 1, 2, ...
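The Jacobi sweep vectorizes naturally. A sketch for the 2D Poisson problem with homogeneous Dirichlet conditions; the tolerance and iteration limit are illustrative assumptions.

```python
import numpy as np

def jacobi(f, h, tol=1e-6, max_iter=10000):
    """Jacobi iteration for -laplace(u) = f on the unit square,
    u = 0 on the boundary; f is an (n, n) array of nodal values.
    """
    u = np.zeros_like(f)
    for k in range(max_iter):
        u_new = u.copy()
        u_new[1:-1, 1:-1] = 0.25*(u[1:-1, :-2] + u[:-2, 1:-1] +
                                  u[2:, 1:-1] + u[1:-1, 2:] +
                                  h**2*f[1:-1, 1:-1])
        if np.abs(u_new - u).max() < tol:
            return u_new, k + 1
        u = u_new
    return u, max_iter
```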


Relaxed Jacobi iteration

Idea: compute a new x approximation x* from

  Dx* = (−L − U)x^{k−1} + b

Set

  x^k = ωx* + (1 − ω)x^{k−1}

a weighted mean of x^{k−1} and x* if ω ∈ (0, 1)

Relation to explicit time stepping

Relaxed Jacobi iteration for −∇²u = f is equivalent to solving

  α ∂u/∂t = ∇²u + f

by an explicit forward scheme until ∂u/∂t ≈ 0, provided ω = 4Δt/(αh²)
Stability for the forward scheme implies ω ≤ 1
In this example: ω = 1 is best (⇔ largest Δt)
The forward scheme for t → ∞ is a slow scheme, hence Jacobi iteration is slow

Gauss-Seidel/SOR iteration

M = L + D
For our 2D Poisson eq. scheme:

  u^k_{i,j} = (1/4)(u^k_{i,j−1} + u^k_{i−1,j} + u^{k−1}_{i+1,j} + u^{k−1}_{i,j+1} + h²f_{i,j})

i.e. solve for the diagonal term and use the most recently computed values on the right-hand side
SOR is relaxed Gauss-Seidel iteration:
  compute x* from a Gauss-Seidel iteration
  set x^k = ωx* + (1 − ω)x^{k−1}
ω ∈ (0, 2), with ω = 2 − O(h) as the optimal choice
Very easy to implement!

Symmetric/double SOR: SSOR

SSOR = Symmetric SOR
One (forward) SOR sweep for unknowns 1, 2, 3, ..., n
One (backward) SOR sweep for unknowns n, n − 1, n − 2, ..., 1
M can be shown to be

  M = (1/(2 − ω)) ((1/ω)D + L) ((1/ω)D)⁻¹ ((1/ω)D + U)

Notice that each factor in M is diagonal or lower/upper triangular
(⇒ very easy to solve systems My = z)
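A Gauss-Seidel/SOR sweep differs from Jacobi only in updating in place. A sketch with plain Python loops, since the in-place update prevents direct vectorization; ω = 1 recovers Gauss-Seidel.

```python
def sor_sweep(u, f, h, omega):
    """One lexicographic SOR sweep for -laplace(u) = f, u = 0 on the
    boundary. Updated values are used immediately (Gauss-Seidel) and
    relaxed with omega.
    """
    n, m = u.shape
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            u_gs = 0.25*(u[i, j-1] + u[i-1, j] +
                         u[i+1, j] + u[i, j+1] + h**2*f[i, j])
            u[i, j] = omega*u_gs + (1 - omega)*u[i, j]
    return u
```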

Status: classical iterative methods

Jacobi, Gauss-Seidel/SOR, and SSOR are too slow for practical PDE computations
The simplest possible solution method for −∇²u = f and other stationary PDEs in 2D/3D is to use SOR (!)
Classical iterative methods converge quickly in the beginning but slow down after a few iterations
Classical iterative methods are important ingredients in multigrid methods

Conjugate Gradient-like methods

  Ax = b,  A ∈ ℝ^{n,n},  x, b ∈ ℝ^n

Use a Galerkin or least-squares method to solve the linear system
Idea: write

  x^k = x^{k−1} + Σ_{j=1}^k α_j q_j

α_j: unknown coefficients, q_j: known vectors
Compute the residual:

  r^k = b − Ax^k = r^{k−1} − Σ_{j=1}^k α_j Aq_j

and apply the ideas of the Galerkin or least-squares methods

Galerkin

Residual:

  r^k = b − Ax^k = r^{k−1} − Σ_{j=1}^k α_j Aq_j

Galerkin’s method (r ∼ R, q_j ∼ N_j, α_j ∼ u_j):

  (r^k, q_i) = 0,  i = 1, ..., k

(·,·): the Euclidean inner product
Result: a linear system for the α_j,

  Σ_{j=1}^k (Aq_j, q_i)α_j = (r^{k−1}, q_i),  i = 1, ..., k

Least squares

Residual:

  r^k = b − Ax^k = r^{k−1} − Σ_{j=1}^k α_j Aq_j

Least squares: minimize (r^k, r^k), i.e.

  ∂/∂α_i (r^k, r^k) = 0

Result: a linear system for the α_j:

  Σ_{j=1}^k (Aq_i, Aq_j)α_j = (r^{k−1}, Aq_i),  i = 1, ..., k


The nature of the methods Extending the basis

Start with a guess x0 Vk is normally selected as a so-called Krylov subspace:


In iteration k: seek xk in a k-dimensional vector space Vk
Vk = span{r 0 , Ar 0 , . . . , Ak−1 r 0 }
Basis for the space: q 1 , . . . , q k
Use Galerkin or least squares to compute the (optimal) Alternatives for computing q k+1 ∈ Vk+1 :
approximation xk in Vk
k
X
Extend the basis from Vk to Vk+1 (i.e. find q k+1 ) q k+1 = rk + βj q j
j=1
k
X
q k+1 = Aqk + βj q j
j=1

The first dominates in frequently used algorithms – only that


choice is used hereafter
How to choose βj ?

Solving linear systems – p.113/148 Solving linear systems – p.114/148

Orthogonality properties

Bad news: must solve a k × k linear system for the α_j in each iteration (as k → n, the work in each iteration approaches the work of solving Ax = b!)
The coefficient matrix in the α_j system:

  (Aq_i, q_j)  or  (Aq_i, Aq_j)

Idea: make the coefficient matrices diagonal
That is,
  Galerkin: (Aq_i, q_j) = 0 for i ≠ j
  Least squares: (Aq_i, Aq_j) = 0 for i ≠ j
Use the β_j to enforce orthogonality of the q_i

Formula for updating the basis vectors

Define

  ⟨u, v⟩ ≡ (Au, v) = uᵀAv

and

  [u, v] ≡ (Au, Av) = uᵀAᵀAv

Galerkin: require A-orthogonal q_j vectors, which then results in

  β_i = −⟨r^k, q_i⟩ / ⟨q_i, q_i⟩

Least squares: require AᵀA-orthogonal q_j vectors, which then results in

  β_i = −[r^k, q_i] / [q_i, q_i]

Simplifications

Galerkin: ⟨q_i, q_j⟩ = 0 for i ≠ j gives

  α_k = (r^{k−1}, q_k) / ⟨q_k, q_k⟩

and α_i = 0 for i < k (!):

  x^k = x^{k−1} + α_k q_k

That is, hand-derived formulas for the α_j
Least squares:

  α_k = (r^{k−1}, Aq_k) / [q_k, q_k]

and α_i = 0 for i < k

Symmetric A

If A is symmetric (Aᵀ = A) and positive definite (positive eigenvalues ⇔ yᵀAy > 0 for any y ≠ 0), also β_i = 0 for i < k
⇒ need to store q_k only
(q_1, ..., q_{k−1} are not used in iteration k)

Summary: least squares algorithm

  given a start vector x^0,
  compute r^0 = b − Ax^0 and set q^1 = r^0
  for k = 1, 2, ... until the termination criteria are fulfilled:
      α_k = (r^{k−1}, Aq^k) / [q^k, q^k]
      x^k = x^{k−1} + α_k q^k
      r^k = r^{k−1} − α_k Aq^k
      if A is symmetric then
          β_k = [r^k, q^k] / [q^k, q^k]
          q^{k+1} = r^k − β_k q^k
      else
          β_j = [r^k, q^j] / [q^j, q^j],  j = 1, ..., k
          q^{k+1} = r^k − Σ_{j=1}^k β_j q^j

The Galerkin version requires A to be symmetric and positive definite and results in the famous Conjugate Gradient method

Truncation and restart

Problem: need to store q^1, ..., q^k
Much storage and computation when k becomes large
Truncation: work with a truncated sum for x^k,

  x^k = x^{k−1} + Σ_{j=k−K+1}^k α_j q^j

where a possible choice is K = 5
Small K might give convergence problems
Restart: restart the algorithm after K iterations (an alternative to truncation)
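For symmetric positive definite A the Galerkin branch of the algorithm collapses to the classical Conjugate Gradient method. A standard sketch; the relative-residual stopping test is an assumption.

```python
import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-8, max_iter=1000):
    """Conjugate Gradient method (the Galerkin variant) for Ax = b
    with A symmetric positive definite.
    """
    x = x0.copy()
    r = b - A @ x
    q = r.copy()
    for k in range(max_iter):
        Aq = A @ q
        alpha = (r @ r)/(q @ Aq)
        x += alpha*q
        r_new = r - alpha*Aq
        if np.linalg.norm(r_new) <= eps*np.linalg.norm(b):
            return x, k + 1
        beta = (r_new @ r_new)/(r @ r)   # enforces A-orthogonal q's
        q = r_new + beta*q
        r = r_new
    raise RuntimeError('CG did not converge')
```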


Family of methods

Generalized Conjugate Residual method = least squares + restart
Orthomin method = least squares + truncation
Conjugate Gradient method = Galerkin + symmetric and positive definite A
Conjugate Residuals method = least squares + symmetric and positive definite A
Many other related methods: BiCGStab, Conjugate Gradients Squared (CGS), Generalized Minimum Residuals (GMRES), Minimum Residuals (MinRes), SYMMLQ
Common name: Conjugate Gradient-like methods
All of these are easily called in Diffpack

Convergence

Conjugate Gradient-like methods converge slowly (but usually faster than SOR/SSOR)
To reduce the initial error by a factor ε,

  (1/2)√κ ln(2/ε)

iterations are needed, where κ is the condition number:

  κ = (largest eigenvalue of A) / (smallest eigenvalue of A)

κ = O(h⁻²) when solving 2nd-order PDEs (incl. elasticity and the Poisson eq.)

Preconditioning

Idea: introduce an equivalent system

  M⁻¹Ax = M⁻¹b

solve it with a Conjugate Gradient-like method, and construct M such that
  1. κ = O(1) ⇒ M ≈ A (i.e. fast convergence)
  2. M is cheap to compute
  3. M is sparse (little storage)
  4. systems My = z (occurring in the algorithm due to M⁻¹Av-like products) are efficiently solved (O(n) op.)
Contradictory requirements!
The preconditioning business: find a good balance between 1-4

Classical methods as preconditioners

Idea: “solve” My = z by one iteration with a classical iterative method (Jacobi, SOR, SSOR)
Jacobi preconditioning: M = D (the diagonal of A)
  No extra storage as M is stored in A
  No extra computations as M is a part of A
  Efficient solution of My = z
But: M is probably not a good approximation to A
⇒ poor quality of this type of preconditioners?
Conjugate Gradient method + SSOR preconditioner is widely used

M as a factorization of A

Idea: let M be an LU factorization of A, i.e.,

  M = LU

where L and U are lower and upper triangular matrices, respectively
Implications:
  1. M = A (κ = 1): a very efficient preconditioner!
  2. M is not cheap to compute (requires Gaussian elimination on A!)
  3. M is not sparse (L and U are dense!)
  4. systems My = z are not efficiently solved (an O(n²) process when L and U are dense)

M as an incomplete factorization of A

New idea: compute sparse L̂ and Û
How? compute only with the nonzeroes in A
⇒ incomplete factorization, M = L̂Û ≠ LU
M is not a perfect approximation to A
M is cheap to compute and store (O(n) complexity)
My = z is efficiently solved (O(n) complexity)
This method works well; much better than SOR/SSOR preconditioning
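In practice one rarely codes the incomplete factorization by hand. A sketch using SciPy, where spilu provides an incomplete LU factorization that is wrapped as a preconditioner for CG; the drop tolerance and fill factor are illustrative settings, and the small 1D model matrix stands in for a real PDE matrix.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
# 1D Poisson model matrix (tridiagonal), CSC format as spilu requires
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=2)     # incomplete LU
M = spla.LinearOperator(A.shape, ilu.solve)           # applies M^{-1}

x, info = spla.cg(A, b, M=M)
assert info == 0   # 0 means the iteration converged
```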

How to compute M

Run through a standard Gaussian elimination, which factors A as A = LU
Normally, L and U have nonzeroes where A has zeroes
Idea: let L and U be as sparse as A
Compute only with the nonzeroes of A
Such a preconditioner is called an Incomplete LU Factorization, ILU
Option: add the contributions outside A’s sparsity pattern to the diagonal, multiplied by ω
  Relaxed Incomplete Factorization (RILU): 0 < ω < 1
  Modified Incomplete Factorization (MILU): ω = 1
See algorithm C.3 in the book

Numerical experiments

Two test cases:
  −∇²u = f on the unit cube and FDM
  −∇²u = f on the unit cube and FEM
Diffpack makes it easy to run through a series of numerical experiments, using multiple loops, e.g.,

  sub LinEqSolver_prm
  set basic method = { ConjGrad & MinRes }
  ok
  sub Precond_prm
  set preconditioning type = PrecRILU
  set RILU relaxation parameter = { 0.0 & 0.4 & 0.7 & 1.0 }
  ok

Test case 1: 3D FDM Poisson eq.

Equation: −∇²u = 1
Boundary condition: u = 0
7-pt star standard finite difference scheme
Grid sizes: 20 × 20 × 20 = 8000 points and 30 × 30 × 30 = 27000 points
Source code: $NOR/doc/Book/src/linalg/LinSys4/
All details in HPL Appendix D
Input files: $NOR/doc/Book/src/linalg/LinSys4/experiments
The solver’s CPU time is written to standard output

Jacobi vs. SOR vs. SSOR

n = 20³ = 8000 and n = 30³ = 27000
Jacobi: not converged in 1000 iterations
SOR(ω = 1.8): 2.0s and 9.2s
SSOR(ω = 1.8): 1.8s and 9.8s
Gauss-Seidel: 13.2s and 97s
SOR’s sensitivity to the relaxation parameter ω:
  1.0: 96s, 1.6: 23s, 1.7: 16s, 1.8: 9s, 1.9: 11s
SSOR’s sensitivity to the relaxation parameter ω:
  1.0: 66s, 1.6: 17s, 1.7: 13s, 1.8: 9s, 1.9: 11s
⇒ relaxation is important; great sensitivity to ω

Conjugate Residuals or Gradients?

Compare Conjugate Residuals with Conjugate Gradients
Or: least squares vs. Galerkin
Diffpack names: MinRes and ConjGrad
MinRes: not converged in 1000 iterations
ConjGrad: 0.7s and 3.9s
⇒ ConjGrad is clearly faster than the best SOR/SSOR
Add an ILU preconditioner:
  MinRes: 0.7s and 4s
  ConjGrad: 0.6s and 2.7s
The importance of preconditioning grows as n grows

Different preconditioners

ILU, Jacobi, SSOR preconditioners (ω = 1.2)
MinRes: Jacobi: not conv., SSOR: 11.4s, ILU: 4s
ConjGrad: Jacobi: 4.8s, SSOR: 2.8s, ILU: 2.7s
Sensitivity to the relaxation parameter in SSOR, with ConjGrad as solver:
  1.0: 3.3s, 1.6: 2.1s, 1.8: 2.1s, 1.9: 2.6s
Sensitivity to the relaxation parameter in RILU, with ConjGrad as solver:
  0.0: 2.7s, 0.6: 2.4s, 0.8: 2.2s, 0.9: 1.9s, 0.95: 1.9s, 1.0: 2.7s
⇒ ω slightly less than 1 is optimal; RILU and SSOR are equally fast (here)

Test case 2: 3D FEM Poisson eq.

Equation: −∇²u = A₁π² sin πx + 4A₂π² sin 2πy + 9A₃π² sin 3πz
Boundary condition: u known
ElmB8n3D and ElmB27n3D elements
Grid sizes: 21 × 21 × 21 = 9261 nodes and 31 × 31 × 31 = 29791 nodes
Source code: $NOR/doc/Book/src/fem/Poisson2
All details in HPL Chapters 3.2 and 3.5
Input files: $NOR/doc/Book/src/fem/Poisson2/linsol-experiments
The solver’s CPU time is available in casename-summary.txt

Jacobi vs. SOR vs. SSOR

n = 9261 and n = 31³ = 29791, trilinear and triquadratic elms.
Jacobi: not converged in 1000 iterations
SOR(ω = 1.8): 9.1s and 81s, 42s and 338s
SSOR(ω = 1.8): 47s and 248s, 138s and 755s
Gauss-Seidel: not converged in 1000 iterations
SOR’s sensitivity to the relaxation parameter ω:
  1.0: not conv., 1.6: 200s, 1.8: 83s, 1.9: 57s
  (n = 29791 and trilinear elements)
SSOR’s sensitivity to the relaxation parameter ω:
  1.0: not conv., 1.6: 212s, 1.7: 207s, 1.8: 245s, 1.9: 435s
  (n = 29791 and trilinear elements)
⇒ relaxation is important; great sensitivity to ω

Conjugate Residuals or Gradients?

Compare Conjugate Residuals with Conjugate Gradients
Or: least squares vs. Galerkin
Diffpack names: MinRes and ConjGrad
MinRes: not converged in 1000 iterations
9261 vs. 29791 unknowns, trilinear elements
ConjGrad: 5s and 22s
⇒ ConjGrad is clearly faster than the best SOR/SSOR!
Add an ILU preconditioner:
  MinRes: 5s and 28s
  ConjGrad: 4s and 16s

Different preconditioners

ILU, Jacobi, SSOR preconditioners (ω = 1.2)
MinRes: Jacobi: 68s, SSOR: 57s, ILU: 28s
ConjGrad: Jacobi: 19s, SSOR: 14s, ILU: 16s
Sensitivity to the relaxation parameter in SSOR, with ConjGrad as solver:
  1.0: 17s, 1.6: 12s, 1.8: 13s, 1.9: 18s
Sensitivity to the relaxation parameter in RILU, with ConjGrad as solver:
  0.0: 16s, 0.6: 15s, 0.8: 13s, 0.9: 12s, 0.95: 11s, 1.0: 16s
⇒ ω slightly less than 1 is optimal; RILU and SSOR are equally fast (here)
ILU preconditioning has a greater impact when using triquadratic elements (and when n grows)


More experiments Multigrid methods

Convection-diffusion equations: Multigrid methods are the most efficient methods for solving linear
$NOR/doc/Book/src/app/Cd/Verify systems
Files: linsol_a.i etc as for LinSys4 and Poisson2 Multigrid methods have optimal complexity O(n)
Elasticity equations: Multigrid can be used as stand-alone solver or preconditioner
$NOR/doc/Book/src/app/Elasticity1/Verify Multigrid applies a hierarchy of grids
Files: linsol_a.i etc as for the others Multigrid is not as robust as Conjugate Gradient-like methods and
Run experiments and learn! incomplete factorization as preconditioner, but faster when it
works
Multigrid is complicated to implement
Diffpack has a multigrid toolbox that simplifies the use of multigrid
dramatically

Solving linear systems – p.137/148 Solving linear systems – p.138/148

The rough ideas of multigrid Damping in Gauss-Seidel’s method (1)

Observation: e.g. Gauss-Seidel methods are very efficient during Model problem: −u′′ = f by finite differences:
the first iterations
High-frequency errors are efficiently damped by Gauss-Seidel −uj−1 + 2uj − uj+1 = h2 fj

Low-frequence errors are slowly reduced by Gauss-Seidel solved by Gauss-Seidel iteration:


Idea: jump to a coarser grid such that low-frequency errors get
ℓ−1
higher frequency 2uℓj = uℓj−1 + uj+1 + h2 fj
Repeat the procedure
Study the error eℓi = uℓi − u∞
i :
On the coarsest grid: solve the system exactly
ℓ−1
Transfer the solution to the finest grid 2eℓj = eℓj−1 + ej+1
Iterate over this procedure
This is like a time-dependent problem, where the iteration index ℓ
is a pseudo time

Solving linear systems – p.139/148 Solving linear systems – p.140/148

Damping in Gauss-Seidel’s method (2)

Can find e^ℓ_j with techniques from Appendix A.4:

  e^ℓ_j = Σ_k A_k exp(i(kjh − ω̃ℓΔt))

or (easier to work with here):

  e^ℓ_j = Σ_k A_k ξ^ℓ exp(ikjh),  ξ = exp(−iω̃Δt)

Inserting a wave component in the scheme:

  ξ = exp(−iω̃Δt) = exp(ikh)/(2 − exp(−ikh)),  |ξ| = 1/√(5 − 4 cos kh)

Interpretation of |ξ|: the reduction in the error per iteration

Gauss-Seidel’s damping factor

  |ξ| = 1/√(5 − 4 cos p),  p = kh ∈ [0, π]

[Figure: |ξ| as a function of p; the damping factor decreases from 1 at p = 0 to 1/3 at p = π]

Small p = kh ∼ h/λ: low frequency (relative to the grid) and small damping
Large (→ π) p = kh ∼ h/λ: high frequency (relative to the grid) and efficient damping
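The damping factor is easy to tabulate, which makes the smoothing property concrete. A small sketch:

```python
import numpy as np

def damping_factor(p):
    """|xi| = 1/sqrt(5 - 4*cos(p)) for a wave with p = k*h: the error
    reduction per Gauss-Seidel sweep for the -u'' = f model problem.
    """
    return 1.0/np.sqrt(5.0 - 4.0*np.cos(p))

for p in [0.1, 0.5, np.pi/2, np.pi]:
    print(f'p = {p:4.2f}: |xi| = {damping_factor(p):.3f}')
# Low frequencies (small p) are barely damped; high ones die quickly.
```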

More than one grid

From the previous analysis: error components with high frequency are quickly damped
Jump to a coarser grid, e.g. h′ = 2h
p is increased by a factor of 2, i.e., not-so-high-frequency waves on the h grid are efficiently damped by Gauss-Seidel on the h′ grid
Repeat the procedure
On the coarsest grid: solve by Gaussian elimination
Interpolate the solution to a finer grid, perform Gauss-Seidel iterations, and repeat until the finest grid is reached

Transferring the solution between grids

From fine to coarser: restriction
  simple restriction
  weighted restriction
From coarse to finer: prolongation

[Figure: a fine grid with points 1, ..., q and a coarse grid with every second point; restriction maps the fine grid function to the coarse grid, and prolongation interpolates the coarse grid function linearly back to the fine grid]


Smoothers A multigrid algorithm

The Gauss-Seidel method is called a smoother when used to Start with the finest grid
damp high-frequency error components in multigrid Perform smoothing (pre-smoothing)
Other smoothers: Jacobi, SOR, SSOR, incomplete factorization Restrict to coarser grid
No of iterations is called no of smoothing sweeps Repeat the procedure (recursive algorithm!)
Common choice: one sweep On the coarsest grid: solve accurately
Prolongate to finer grid
Perform smoothing (post-smoothing)
One cycle is finished when reaching the finest grid again
Can repeat the cycle
Multigrid solves the system in O(n) operations

Check out HPL C.4.2 for details!!

Solving linear systems – p.145/148 Solving linear systems – p.146/148
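The recursive structure of the algorithm is visible in a compact 1D sketch. Gauss-Seidel smoothing, simple restriction, and linear interpolation are the choices from the preceding slides; the sweep counts and the 2^p + 1 grid size are assumptions.

```python
import numpy as np

def v_cycle(u, f, h, n_pre=1, n_post=1):
    """One V-cycle for -u'' = f on (0,1), u(0) = u(1) = 0, on a grid
    with 2^p + 1 points.
    """
    def gauss_seidel(u, f, h, sweeps):
        for _ in range(sweeps):
            for j in range(1, len(u) - 1):
                u[j] = 0.5*(u[j-1] + u[j+1] + h**2*f[j])
        return u

    u = gauss_seidel(u, f, h, n_pre)               # pre-smoothing
    if len(u) == 3:                                # coarsest grid: exact
        u[1] = 0.5*(u[0] + u[2] + h**2*f[1])
        return u
    # Residual r = f + u'' and its restriction to the coarse grid
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] + (u[:-2] - 2*u[1:-1] + u[2:])/h**2
    rc = r[::2].copy()                             # simple restriction
    ec = v_cycle(np.zeros_like(rc), rc, 2*h, n_pre, n_post)
    e = np.zeros_like(u)                           # prolongation:
    e[::2] = ec                                    #   copy coarse values
    e[1::2] = 0.5*(ec[:-1] + ec[1:])               #   interpolate linearly
    u += e                                         # coarse-grid correction
    return gauss_seidel(u, f, h, n_post)           # post-smoothing
```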

V- and W-cycles

Different strategies for constructing cycles:

[Figure: grid-level diagrams of a V-cycle (γ_q = 1) and a W-cycle (γ_q = 2) over four levels, marking the smoothing steps and the coarse grid solve]

Multigrid requires flexible software

Many ingredients in multigrid:
  pre- and post-smoother
  number of smoothing sweeps
  solver on the coarsest level
  cycle strategy
  restriction and prolongation methods
  how to construct the various grids?
There are also other variants of multigrid (e.g. for nonlinear problems)
The optimal combination of ingredients is only known for simple model problems (e.g. the Poisson eq.)
In general: numerical experimentation is required!
(Diffpack has a special multigrid toolbox for this)
