
FINITE ELEMENT METHODS FOR THE NUMERICAL

SOLUTION OF PARTIAL DIFFERENTIAL EQUATIONS

Vassilios A. Dougalis

Department of Mathematics, University of Athens, Greece


and
Institute of Applied and Computational Mathematics, FORTH, Greece

Revised edition 2013


PREFACE

This is the current version of notes that I have used for the past thirty-five years
in graduate courses at the University of Tennessee, Knoxville, the University of Crete,
the National Technical University of Athens, and the University of Athens. They have
evolved, and aged, over the years, but I hope they may still prove useful to students
interested in learning the basic theory of Galerkin–finite element methods and some
facts about Sobolev spaces.
I first heard about approximation with cubic Hermite functions and splines from
George Fix in the Numerical Analysis graduate course at Harvard in the fall of 1971, and
also, subsequently, from Garrett Birkhoff, of course. But most of the basic techniques of
the analysis of Galerkin methods I learnt from courses and seminars that Garth Baker
taught at Harvard during the period 1973–75.
Over the years I was fortunate to be associated with and learn more about Galerkin
methods from Max Gunzburger, Ohannes Karakashian, Larry Bales, Bill McKinney,
George Akrivis, Vidar Thomée and my students; my debt to the latter is great.
I would like to thank very much Dimitri Mitsoudis and Gregory Kounadis for
transforming the manuscript into TeX.

V. A. Dougalis
Athens, March 2012

In the revised 2013 edition, two new chapters, 6 and 7, on Galerkin finite element
methods for parabolic and second-order hyperbolic equations were added. These had
previously existed in hand-written form, and I would like to thank Gregory Kounadis
and Leetha Saridakis for writing them in TeX.

V. A. Dougalis
Athens, February 2013

Contents

1 Some Elements of Hilbert Space Theory  1
 1.1 Vector spaces  1
 1.2 Inner product, Norm  2
 1.3 Some topological concepts  3
 1.4 Hilbert space  4
 1.5 Examples of Hilbert spaces  5
 1.6 The Projection Theorem  10
 1.7 Bounded (continuous) linear functionals on a Hilbert space  14
 1.8 Bounded (continuous) linear operators on a Hilbert space  17
 1.9 The Lax–Milgram and the Galerkin theorems  19

2 Elements of the Theory of Sobolev Spaces and Variational Formulation of Boundary–Value Problems in One Dimension  26
 2.1 Motivation  26
 2.2 Notation and preliminaries  28
 2.3 The Sobolev space H¹(I)  31
 2.4 The Sobolev spaces Hᵐ(I), m = 2, 3, 4, …  46
 2.5 The space H¹₀(I)  47
 2.6 Two–point boundary–value problems  53
  2.6.1 Zero Dirichlet boundary conditions  53
  2.6.2 Neumann boundary conditions  58

3 Galerkin Finite Element Methods for Two–Point Boundary–Value Problems  63
 3.1 Introduction  63
 3.2 The Galerkin–finite element method with piecewise linear, continuous functions  65
 3.3 An indefinite problem  78
 3.4 Approximation by Hermite cubic functions and cubic splines  84
  3.4.1 Hermite, piecewise cubic functions  84
  3.4.2 Cubic splines  90

4 Results from the Theory of Sobolev Spaces and the Variational Formulation of Elliptic Boundary–Value Problems in ℝᴺ  99
 4.1 The Sobolev space H¹(Ω)  99
 4.2 The Sobolev space H¹₀(Ω)  103
 4.3 The Sobolev spaces Hᵐ(Ω), m = 2, 3, 4, …  104
 4.4 Sobolev's inequalities  106
 4.5 Variational formulation of some elliptic boundary–value problems  107
  4.5.1 (a) Homogeneous Dirichlet boundary conditions  107
  4.5.2 (b) Homogeneous Neumann boundary conditions  110

5 The Galerkin Finite Element Method for Elliptic Boundary–Value Problems  112
 5.1 Introduction  112
 5.2 Piecewise linear, continuous functions on a triangulation of a plane polygonal domain  114
 5.3 Implementation of the finite element method with P1 triangles  128

6 The Galerkin Finite Element Method for the Heat Equation  139
 6.1 Introduction. Elliptic projection  139
 6.2 Standard Galerkin semidiscretization  141
 6.3 Full discretization with the implicit Euler and the Crank–Nicolson methods  147
 6.4 The explicit Euler method. Inverse inequalities and stiffness  154

7 The Galerkin Finite Element Method for the Wave Equation  161
 7.1 Introduction  161
 7.2 Standard Galerkin semidiscretization  162
 7.3 Fully discrete schemes  167

References  179
Chapter 1

Some Elements of Hilbert Space Theory

1.1 Vector spaces


A set V of elements u, v, w, … is called a vector space (over the complex numbers) if

1. For every pair of elements u ∈ V, v ∈ V we define a new element w ∈ V, their sum, denoted by w = u + v.

2. For every complex number α and every u ∈ V we define an element w = αu ∈ V, the product of α and u.

3. Sum and product obey the following laws:

i. ∀u, v ∈ V: u + v = v + u.

ii. ∀u, v, w ∈ V: (u + v) + w = u + (v + w).

iii. ∃0 ∈ V such that u + 0 = u, ∀u ∈ V.

iv. ∀u ∈ V ∃(−u) ∈ V such that u + (−u) = 0.

v. 1 · u = u, ∀u ∈ V.

vi. α(βu) = (αβ)u, ∀u ∈ V and complex α, β.

vii. (α + β)u = αu + βu, for u ∈ V, α, β complex.

viii. α(u + v) = αu + αv, for u, v ∈ V, α complex.

The elements u, v, w, … of V are called vectors.
An expression of the form

α₁u₁ + α₂u₂ + … + αₙuₙ,

where the αᵢ are complex numbers and uᵢ ∈ V, is called a linear combination of the vectors uᵢ, 1 ≤ i ≤ n. The vectors u₁, …, uₙ are called linearly dependent if there exist complex numbers αᵢ, not all zero, for which

α₁u₁ + α₂u₂ + … + αₙuₙ = 0.

They are called linearly independent if they are not linearly dependent, i.e. if α₁u₁ + α₂u₂ + … + αₙuₙ = 0 holds only in the case α₁ = α₂ = … = αₙ = 0.
A vector space V is called finite-dimensional (of dimension n) if V contains n linearly independent elements and if any n + 1 vectors in V are linearly dependent. As a consequence, a set of n linearly independent vectors forms a basis of V, i.e. it is a set of linearly independent vectors that spans V, so that any u in V can be written uniquely as a linear combination of the basis vectors.

1.2 Inner product, Norm


A vector space V is called an inner product space if for every pair of elements u ∈ V, v ∈ V we can define a complex number, denoted by (u, v) and called the inner product of u and v, with the following properties:

1. ∀u ∈ V: (u, u) ≥ 0. If (u, u) = 0 then u = 0.

2. (u, v) = conj (v, u), ∀u, v ∈ V, where conj z denotes the complex conjugate of the complex number z.

3. (αu + βv, w) = α(u, w) + β(v, w) for u, v, w ∈ V, α, β complex.

As a consequence of (2) and (3), (u, αv) = conj(α) (u, v) for u, v ∈ V and α complex. The vectors u, v are called orthogonal if (u, v) = 0.
For every u ∈ V we define the nonnegative number ‖u‖ by

‖u‖ = (u, u)^{1/2},

which is called the norm of u. As a consequence of the properties of the inner product we see that:

i. ‖u‖ ≥ 0, and if ‖u‖ = 0, then u = 0.

ii. ∀ complex α, ∀u ∈ V: ‖αu‖ = |α| ‖u‖.

iii. ∀u, v ∈ V: ‖u + v‖ ≤ ‖u‖ + ‖v‖ (Triangle inequality).

To prove (iii) we first prove the Cauchy–Schwarz inequality:

iv. |(u, v)| ≤ ‖u‖ ‖v‖, ∀u, v ∈ V.

To prove (iv) we may assume that (u, v) ≠ 0 (otherwise (iv) is trivial). We let now α = (u, v)/|(u, v)|. We find then for any real λ that

0 ≤ (u + λαv, u + λαv) = λ²(v, v) + 2λ|(u, v)| + (u, u).

Hence for any real λ the quadratic in λ on the right-hand side of the above inequality is nonnegative; hence, necessarily, its discriminant is nonpositive, i.e.

|(u, v)|² ≤ (u, u)(v, v),

which gives (iv). To prove now the triangle inequality (iii) we see that

‖u + v‖² = (u + v, u + v) = (u, u) + (v, v) + (u, v) + (v, u)
≤ ‖u‖² + ‖v‖² + 2|(u, v)| ≤ ‖u‖² + ‖v‖² + 2‖u‖ ‖v‖
= (‖u‖ + ‖v‖)²,

from which (iii) follows. (Supplied only with a norm that just satisfies properties (i)–(iii), V becomes a normed vector space.)
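The inequalities just proved are easy to check numerically. The following sketch (not part of the original notes; plain Python, with randomly chosen complex vectors in Cⁿ) verifies the Cauchy–Schwarz and triangle inequalities and the conjugate-symmetry property (2):

```python
import random

def inner(u, v):
    # inner product (u, v) = sum_i u_i * conj(v_i) on C^n
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def norm(u):
    # norm ||u|| = (u, u)^{1/2}; (u, u) is real and nonnegative
    return inner(u, u).real ** 0.5

random.seed(0)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10)]
v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10)]

cauchy_schwarz_holds = abs(inner(u, v)) <= norm(u) * norm(v)
w = [ui + vi for ui, vi in zip(u, v)]
triangle_holds = norm(w) <= norm(u) + norm(v) + 1e-12  # tiny slack for rounding
```

The small tolerance in the triangle-inequality check only guards against floating-point rounding; mathematically the inequalities are exact.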

1.3 Some topological concepts


In V we define the distance of two vectors u and v as ρ(u, v) = ‖u − v‖. If u₀ is a fixed vector in V and ρ a given positive number, then the set of vectors v in V which satisfy ‖v − u₀‖ < ρ is called the open ball with center u₀ and radius ρ. The set ‖v − u₀‖ ≤ ρ is the closed ball with center u₀ and radius ρ. We say that a sequence of vectors u₁, u₂, u₃, … in V is convergent if there exists a vector u ∈ V such that, given ε > 0, there exists a positive integer N = N(ε) for which

‖uₙ − u‖ < ε, for all n ≥ N.

We call u the limit of the sequence {uᵢ}, i ≥ 1, and write limₙ→∞ uₙ = u, or uₙ → u in V as n → ∞. It is easy to see that a convergent sequence has only one limit. Obviously uₙ → u in V ⇔ ‖uₙ − u‖ → 0 as n → ∞.
A sequence of vectors u₁, u₂, u₃, … is said to be a Cauchy sequence if, given any ε > 0, there exists an integer N = N(ε) such that

‖uₙ − uₘ‖ < ε, for all m, n ≥ N.

It is easy to see that every convergent sequence is Cauchy. The converse is not always true. We will say that V is complete whenever every Cauchy sequence in V is convergent. A subset A of V is called a dense subset of V if for every u ∈ V there exists a sequence u₁, u₂, u₃, … ∈ A such that uₙ → u as n → ∞.
Exercise: If uₙ → u, vₙ → v in V then:

(a) limₙ→∞ (αuₙ + βvₙ) = αu + βv for complex α, β.

(b) limₙ→∞ (uₙ, vₙ) = (u, v) (and as a consequence we say that the inner product is a continuous function of its arguments).

(c) limₙ→∞ ‖uₙ‖ = ‖u‖.

(d) limₙ→∞ λₙuₙ = λu for every convergent sequence λₙ → λ of complex numbers.

1.4 Hilbert space


A complete inner product space V is called a Hilbert space. In other words, a Hilbert space is an inner product space in which every Cauchy sequence is convergent. We will usually denote Hilbert spaces by H.
A subset S of a Hilbert space H is called a subspace of H if u ∈ S, v ∈ S imply that αu + βv ∈ S for any complex numbers α, β. S is said to be a dense subspace of H if it is a dense subset of H and a subspace of H. S is said to be a closed subspace of H if S is a subspace of H with the following property: let {uₙ} be a convergent sequence in H such that uₙ ∈ S, n = 1, 2, 3, …; then u = limₙ→∞ uₙ belongs to S too.
Exercise: A dense, closed subspace of H coincides with H.
Given any (noncomplete) inner product space V we can prove that by adding new elements to V we can extend V to a (complete) Hilbert space H such that V is a dense subspace of H. The process is referred to as the completion of V, or as the closure of V in H.

1.5 Examples of Hilbert spaces


A. H = ℂⁿ with the Euclidean inner product (u, v) = ∑_{i=1}^{n} uᵢ conj(vᵢ) and norm ‖u‖ = (∑_{i=1}^{n} |uᵢ|²)^{1/2}.

B. ℓ²
We denote by ℓ² the set of all complex sequences u = {u₁, u₂, u₃, …} ≡ {uᵢ}_{i=1}^{∞} which satisfy the inequality

∑_{j=1}^{∞} |uⱼ|² < ∞.

In ℓ² we define u + v, λu and (u, v) in the following way:

w = u + v with w = {wᵢ}_{i=1}^{∞}, wᵢ = uᵢ + vᵢ,

z = λu (λ a complex number) with z = {zᵢ}_{i=1}^{∞}, zᵢ = λuᵢ,

(u, v) = ∑_{j=1}^{∞} uⱼ conj(vⱼ).

Then it follows that ∑_{j=1}^{∞} |zⱼ|² < ∞ and ∑_{j=1}^{∞} |wⱼ|² < ∞, because

|wⱼ|² = |uⱼ + vⱼ|² ≤ 2|uⱼ|² + 2|vⱼ|².

The convergence of the series defining the inner product follows from

|uⱼ conj(vⱼ)| = |uⱼ||vⱼ| ≤ (1/2)(|uⱼ|² + |vⱼ|²).

By verifying the axioms one by one we easily conclude that the set of all vectors u, v, w, … with the above-mentioned properties and the given operations forms an inner product space. (The zero vector is the sequence 0 = {0, 0, …}.) The space ℓ² is a complete inner product space and hence a Hilbert space. To show that, let u⁽¹⁾, u⁽²⁾, … be a Cauchy sequence in ℓ² with

u⁽ⁿ⁾ = {u₁⁽ⁿ⁾, u₂⁽ⁿ⁾, …}.

Then, given ε > 0,

‖u⁽ⁿ⁾ − u⁽ᵐ⁾‖ = (u⁽ⁿ⁾ − u⁽ᵐ⁾, u⁽ⁿ⁾ − u⁽ᵐ⁾)^{1/2} = ( ∑_{j=1}^{∞} |uⱼ⁽ⁿ⁾ − uⱼ⁽ᵐ⁾|² )^{1/2} < ε   (1.1)

for all n, m ≥ N(ε). In particular it follows that

|uⱼ⁽ⁿ⁾ − uⱼ⁽ᵐ⁾| < ε, for all n, m ≥ N(ε) and every j = 1, 2, 3, ….

Fix j. Then the sequence uⱼ⁽¹⁾, uⱼ⁽²⁾, …, being a Cauchy sequence of complex numbers, is convergent. We denote the limit of this sequence by uⱼ, i.e.

lim_{n→∞} uⱼ⁽ⁿ⁾ = uⱼ for j = 1, 2, 3, ….   (1.2)

Now it follows from (1.1) that for every positive integer k

∑_{j=1}^{k} |uⱼ⁽ⁿ⁾ − uⱼ⁽ᵐ⁾|² < ε² for all n, m ≥ N(ε).

Letting m → ∞ in the above, since it is a finite sum, we obtain by (1.2) that

∑_{j=1}^{k} |uⱼ⁽ⁿ⁾ − uⱼ|² ≤ ε² for all n ≥ N(ε).

Letting now k → ∞ in the above, we obtain that

∑_{j=1}^{∞} |uⱼ⁽ⁿ⁾ − uⱼ|² ≤ ε² for all n ≥ N(ε).   (1.3)

We set now u = {u₁, u₂, …}. By (1.3), u − u⁽ⁿ⁾ ∈ ℓ². Hence u = (u − u⁽ⁿ⁾) + u⁽ⁿ⁾ ∈ ℓ².
By (1.3) it also follows that

‖u⁽ⁿ⁾ − u‖ = ( ∑_{j=1}^{∞} |uⱼ⁽ⁿ⁾ − uⱼ|² )^{1/2} ≤ ε for all n ≥ N(ε).

Hence there exists u ∈ ℓ² such that u⁽ⁿ⁾ → u as n → ∞. We conclude that the Cauchy sequence {u⁽¹⁾, u⁽²⁾, …} is actually a convergent sequence. Therefore ℓ² is complete and hence a Hilbert space. We remark that the Cauchy–Schwarz inequality |(u, v)| ≤ ‖u‖‖v‖ becomes

| ∑_{j=1}^{∞} uⱼ conj(vⱼ) | ≤ ( ∑_{j=1}^{∞} |uⱼ|² )^{1/2} ( ∑_{j=1}^{∞} |vⱼ|² )^{1/2}.

The triangle inequality ‖u + v‖ ≤ ‖u‖ + ‖v‖ becomes

( ∑_{j=1}^{∞} |uⱼ + vⱼ|² )^{1/2} ≤ ( ∑_{j=1}^{∞} |uⱼ|² )^{1/2} + ( ∑_{j=1}^{∞} |vⱼ|² )^{1/2}.

C. L²(Ω)
Let Ω be an open set of ℝⁿ. We describe the points in ℝⁿ by n-tuples x = (x₁, x₂, …, xₙ) and denote the (Euclidean) length of the vector x by

|x| = ( ∑_{i=1}^{n} xᵢ² )^{1/2}.

We now consider the set of complex-valued continuous functions u(x) = u(x₁, …, xₙ) defined on Ω. Addition u + v and multiplication λu by a complex number λ are defined, as usual, by:

w = u + v, w(x) = u(x) + v(x),
z = λu, z(x) = λu(x).

We define now an inner product for such functions by

(u, v) = ∫_Ω u(x) conj(v(x)) dx,   (1.4)

where dx is the volume element in Ω, i.e. dx = dx₁dx₂…dxₙ, and ∫_Ω … dx is the multiple integral (in the Riemann sense)

∫_Ω … dx = ∫∫…∫_Ω … dx₁dx₂…dxₙ.

Since Ω is an arbitrary open set of ℝⁿ, the integral in (1.4) defining the inner product may not exist. We restrict therefore our attention to complex-valued functions u(x), defined on Ω, with the property that

∫_Ω |u(x)|² dx < ∞.

Let now V be the above-described vector space, i.e. let

V = { u | u continuous on Ω, ∫_Ω |u(x)|² dx < ∞ }.

V is an inner product space with the inner product defined by (1.4). To see this, note that

|u(x) + v(x)|² ≤ 2(|u(x)|² + |v(x)|²).

It follows that u ∈ V, v ∈ V ⇒ u + v ∈ V and, easily, λu ∈ V for λ complex. Finally, the existence of the integral in (1.4) is proved by noting that ∀x ∈ Ω:

2|u(x)||v(x)| ≤ |u(x)|² + |v(x)|².

Hence, integrating:

|(u, v)| = | ∫_Ω u(x) conj(v(x)) dx | ≤ ∫_Ω |u(x)||v(x)| dx ≤ (1/2) ∫_Ω |u(x)|² dx + (1/2) ∫_Ω |v(x)|² dx.

By verifying now the axioms one by one we confirm that V is an inner product space. The norm on V is given by

‖u‖ ≡ (u, u)^{1/2} = ( ∫_Ω |u(x)|² dx )^{1/2}.

The zero element in V is the function u(x) ≡ 0, x ∈ Ω. For u, v ∈ V, the Cauchy–Schwarz and the triangle inequalities take the form:

| ∫_Ω u(x) conj(v(x)) dx | ≤ ( ∫_Ω |u(x)|² dx )^{1/2} ( ∫_Ω |v(x)|² dx )^{1/2},

( ∫_Ω |u(x) + v(x)|² dx )^{1/2} ≤ ( ∫_Ω |u(x)|² dx )^{1/2} + ( ∫_Ω |v(x)|² dx )^{1/2}.

The functions u₁, u₂, … ∈ V form a Cauchy sequence in V if ∀ε > 0

‖uₘ − uₙ‖ = ( ∫_Ω |uₘ(x) − uₙ(x)|² dx )^{1/2} < ε,

for all m, n ≥ N = N(ε). The sequence is convergent if there exists a function u ∈ V such that for every ε > 0 there exists an integer N = N(ε) such that

‖uₙ − u‖ = ( ∫_Ω |uₙ(x) − u(x)|² dx )^{1/2} < ε

holds for all n ≥ N(ε).
The space V is not complete. To see that, let Ω = (−1, 1) in ℝ and let V be the set of continuous, real-valued functions u defined on (−1, 1) such that ∫₋₁¹ |u(x)|² dx < ∞. Let (u, v) be defined as ∫₋₁¹ u(x)v(x) dx and let ‖u‖ = (u, u)^{1/2}. Consider the sequence u₁, u₂, …, where

uⱼ(x) = −1 for −1 < x < −1/j,
uⱼ(x) = jx for −1/j ≤ x ≤ 1/j,
uⱼ(x) = 1 for 1/j < x < 1.

Exercise: Prove that {uⱼ}_{j=1}^{∞} is a Cauchy sequence in V.
However, there is no continuous function u on (−1, 1) for which ‖uₙ − u‖ → 0 as n → ∞. It is easy to see that ‖uₙ − f‖ → 0 as n → ∞, where f is the discontinuous function

f(x) = −1 for −1 < x < 0,
f(x) = 0 for x = 0,
f(x) = 1 for 0 < x < 1.

Thus, V is a noncomplete inner product space. By our assertion in §1.4 we can complete the space V by adding new elements to it. This extended complete space we call L²(Ω). The elements that we add to V can be considered as representatives of those Cauchy sequences of V for which there do not exist functions u ∈ V such that ‖uₙ − u‖ → 0. Some of those additional functions may be piecewise continuous, but in general they will be highly discontinuous functions defined on Ω. (For example f, above, belongs to L²(Ω).)
It is well known that L²(Ω) is isometrically isomorphic to the set of (equivalence classes of) complex-valued Lebesgue measurable functions u on Ω for which the (Lebesgue) integral ∫_Ω |u(x)|² dx is finite. The inner product (understood in the Lebesgue sense) is given again by (1.4).
As a consequence of the process of completion of V:

(i) L²(Ω) is a complete inner product space (a Hilbert space).

(ii) V is dense in L²(Ω), i.e. for every u ∈ L²(Ω) there exists a sequence {uₙ}_{n=1}^{∞} of functions in V such that ‖uₙ − u‖ = ( ∫_Ω |uₙ(x) − u(x)|² dx )^{1/2} → 0 as n → ∞.

1.6 The Projection Theorem

The following result provides important information about the geometric properties of a Hilbert space.

Theorem 1.1 (Projection Theorem). Let G be a closed subspace of a Hilbert space H, properly included in H. Then, given h ∈ H there exists a unique element g ∈ G such that:

(i) ‖h − g‖ = inf_{φ∈G} ‖h − φ‖.

Moreover,

(ii) (h − g, φ) = 0 for each φ ∈ G.

Figure 1.1: The orthogonal projection g of h on the subspace G; the vector h − g is orthogonal to G.

Proof. Let h ∈ H be such that h ∉ G (if h ∈ G, pick g = h and the theorem is proved). Now, since ‖h − φ‖ ≥ 0 for all φ ∈ G, the quantity δ = inf_{φ∈G} ‖h − φ‖ exists, and there exists a sequence φₙ ∈ G such that

lim_{n→∞} ‖h − φₙ‖ = δ = inf_{φ∈G} ‖h − φ‖.   (1.5)

We first show that {φₙ}, n ≥ 1, is a Cauchy sequence. Let f₁, f₂ ∈ H. Then the parallelogram law holds:

2‖f₁‖² + 2‖f₂‖² = ‖f₁ + f₂‖² + ‖f₁ − f₂‖².

(Proof: Exercise.) Set f₁ = h − φₘ, f₂ = h − φₙ. Then

‖φₘ − φₙ‖² = 2‖h − φₘ‖² + 2‖h − φₙ‖² − 4‖h − (φₘ + φₙ)/2‖².   (1.6)

Now

‖h − (φₘ + φₙ)/2‖ ≤ (1/2)‖h − φₘ‖ + (1/2)‖h − φₙ‖

and by (1.5),

lim sup_{m,n→∞} ‖h − (φₘ + φₙ)/2‖ ≤ (1/2)δ + (1/2)δ = δ.

But by the definition of δ,

lim inf_{m,n→∞} ‖h − (φₘ + φₙ)/2‖ ≥ δ.

Hence lim_{m,n→∞} ‖h − (φₘ + φₙ)/2‖ = δ. Then by (1.6) and (1.5) we conclude that

lim_{m,n→∞} ‖φₘ − φₙ‖² = 2δ² + 2δ² − 4δ² = 0.

Hence {φₙ}, n ≥ 1, is a Cauchy sequence in G. Since G is closed, the sequence is convergent in G, i.e. there exists g ∈ G such that φₙ → g as n → ∞. We show that this g satisfies ‖h − g‖ = inf_{φ∈G} ‖h − φ‖. Obviously ‖h − g‖ ≤ ‖h − φₙ‖ + ‖g − φₙ‖, and taking the limit of both sides as n → ∞ we get that ‖h − g‖ ≤ δ. By the definition of δ, ‖h − g‖ ≥ δ. Hence ‖h − g‖ = δ as required. We also show that g is unique. Indeed, let g₁, g₂ ∈ G, g₁ ≠ g₂, have the property that

inf_{φ∈G} ‖h − φ‖ = ‖h − g₁‖ = ‖h − g₂‖ = δ.

Then, since (1/2)(g₁ + g₂) ∈ G, δ ≤ ‖h − (1/2)(g₁ + g₂)‖. But by the triangle inequality

‖h − (1/2)(g₁ + g₂)‖ ≤ (1/2)‖h − g₁‖ + (1/2)‖h − g₂‖ = (1/2)δ + (1/2)δ = δ.

Hence

‖h − (1/2)(g₁ + g₂)‖ = δ = (1/2)‖h − g₁‖ + (1/2)‖h − g₂‖,

i.e. the triangle inequality holds as an equality. Now for any φ, ψ ∈ H such that φ ≠ 0, ψ ≠ 0, ‖φ + ψ‖ = ‖φ‖ + ‖ψ‖ implies φ = λψ for some λ > 0. (Proof: Exercise.) Hence there exists λ such that h − g₁ = λ(h − g₂), i.e. h(1 − λ) = g₁ − λg₂. If λ = 1, g₁ = g₂ (contradiction). If λ ≠ 1, h = (g₁ − λg₂)/(1 − λ), i.e. h ∈ G (contradiction). Hence g is unique, and we have proved (i) above.
To prove (ii), with g constructed as above, suppose that there exists ψ ≠ 0 in G for which (ii) fails, i.e. (h − g, ψ) ≠ 0. Define the element g̃ ∈ G by

g̃ = g + [(h − g, ψ)/(ψ, ψ)] ψ.

Then

‖h − g̃‖² = (h − g̃, h − g̃) = ‖h − g‖² − |(h − g, ψ)|²/‖ψ‖².

Hence

‖h − g̃‖ < ‖h − g‖ = inf_{φ∈G} ‖h − φ‖ (contradiction).

Hence (h − g, φ) = 0, ∀φ ∈ G, and we have (ii). □


Exercise: Prove that if for some g ∈ G, (h − g, φ) = 0 ∀φ ∈ G, then (i) in Theorem 1.1 holds.
Given h ∈ H we call g, the existence and uniqueness of which is guaranteed by Theorem 1.1, the orthogonal projection of h on the closed subspace G, or the best approximation of h in G. If we denote f = h − g, then we can write

h = f + g, where g ∈ G and (f, φ) = 0, ∀φ ∈ G.

Hence f is orthogonal to all vectors of the closed subspace G. Let G^⊥ be the set of all such vectors, i.e. let

G^⊥ = {u ∈ H : (u, φ) = 0 ∀φ ∈ G}.

G^⊥ is called the orthogonal complement of G, and it is a closed subspace of H. To see that, let uₙ ∈ G^⊥ be such that uₙ → u. Then (u, φ) = (u, φ) − (uₙ, φ) for all φ ∈ G, since (uₙ, φ) = 0. Hence for all φ ∈ G, |(u, φ)| = |(u − uₙ, φ)| ≤ ‖u − uₙ‖ ‖φ‖ → 0 as n → ∞, i.e. (u, φ) = 0, i.e. u ∈ G^⊥. Hence G^⊥ is closed. (To show that it is a subspace is trivial.) It is easy to see that G ∩ G^⊥ = {0}. Hence we can write H as the direct sum of G and G^⊥,

H = G ⊕ G^⊥,

meaning by that that there exist two closed subspaces G and G^⊥, intersecting only at 0, such that every element h ∈ H can be written uniquely as the sum of g ∈ G and f ∈ G^⊥, h = g + f, as above.
Exercise: With g, f defined as above, prove the Pythagorean theorem:

‖h‖² = ‖g‖² + ‖f‖².


A particular case of importance occurs when G is finite-dimensional. Then G is closed. (Proof: Exercise.) Let {φ₁, φ₂, …, φₛ} be a basis of G. Given h ∈ H we can explicitly construct the best approximation g of h in G as follows. By (ii), g satisfies

(h − g, φ) = 0, ∀φ ∈ G.

Hence

(h − g, φᵢ) = 0, i = 1, 2, …, s.   (1.7)

Let g = ∑_{i=1}^{s} cᵢφᵢ. We seek the coefficients {cᵢ}_{i=1}^{s}. By (1.7), the cᵢ satisfy the following linear system of equations:

∑_{j=1}^{s} Mᵢⱼ cⱼ = (h, φᵢ), 1 ≤ i ≤ s,   (1.8)

where M = {Mᵢⱼ} is the s × s Gram matrix (or mass matrix) associated with the basis {φᵢ}_{i=1}^{s} of G and defined by

Mᵢⱼ = (φⱼ, φᵢ), 1 ≤ i, j ≤ s.

To see that M is invertible, suppose that for some complex s-vector d = [d₁, …, dₛ]ᵀ we have M d = 0. Thus

∑_{j=1}^{s} Mᵢⱼ dⱼ = 0 ⇒ ( ∑_{j=1}^{s} dⱼφⱼ, φᵢ ) = 0, ∀i: 1 ≤ i ≤ s.

Hence we conclude easily that the vector u = ∑_{j=1}^{s} dⱼφⱼ ∈ G is orthogonal to all φ ∈ G, i.e. u ∈ G ∩ G^⊥ ⇒ u = 0. Hence ∑_{j=1}^{s} dⱼφⱼ = 0, and by the linear independence of the φⱼ, dⱼ = 0 ∀j, i.e. d = 0. Hence M d = 0 ⇒ d = 0, i.e. M is invertible. Actually, M is positive-definite (Exercise).
Example: Let Ω = (0, 1) and H = L²(0, 1) (real-valued). Suppose f is a given element of L²(0, 1) and let G be the subspace of H consisting of all real-valued polynomials of degree ≤ n − 1, n > 1. Find the best approximation to f in G.

Solution. A basis for G obviously consists of the functions φⱼ(x) = x^{j−1}, 1 ≤ j ≤ n. Let g be the best approximation (orthogonal projection) of f in G. Suppose that g = ∑_{j=1}^{n} aⱼφⱼ. Then g satisfies

(g − f, φᵢ) = 0, 1 ≤ i ≤ n,

from which

∑_{j=1}^{n} Mᵢⱼ aⱼ = (f, φᵢ), 1 ≤ i ≤ n,   (1.9)

where M is the n × n Gram matrix,

Mᵢⱼ = (φᵢ, φⱼ) = ∫₀¹ φᵢφⱼ dx = ∫₀¹ x^{i+j−2} dx = 1/(i + j − 1).

M is positive-definite but very ill-conditioned. □
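The Gram system (1.9) can be set up and solved numerically; the matrix with entries 1/(i + j − 1) is the classical Hilbert matrix, whose condition number grows extremely fast with n, which is the ill-conditioning noted above. The sketch below (ours) takes f(x) = x², so that for n ≥ 3 the best approximation must reproduce f exactly:

```python
import numpy as np

def hilbert(n):
    # Gram matrix M_ij = integral_0^1 x^{i+j-2} dx = 1/(i+j-1), 1 <= i, j <= n
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)

def best_poly_coeffs(n):
    # right-hand side (f, phi_i) = integral_0^1 x^2 * x^{i-1} dx = 1/(i+2), for f(x) = x^2
    i = np.arange(1, n + 1)
    b = 1.0 / (i + 2.0)
    return np.linalg.solve(hilbert(n), b)

a = best_poly_coeffs(3)   # should recover x^2 exactly: a = (0, 0, 1)
cond_growth = [np.linalg.cond(hilbert(n)) for n in (3, 6, 9)]
```

For moderate n the solve already loses many digits of accuracy, which is why in practice one prefers better-conditioned (e.g. orthogonal or locally supported) bases.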

1.7 Bounded (continuous) linear functionals on a Hilbert space

Let H be a Hilbert space. By a functional F on H we mean a function from H into the complex numbers ℂ, i.e. a map which assigns to every φ ∈ H a unique complex number F(φ),

F: H → ℂ, φ ↦ F(φ).

A functional F on H is called a linear functional if for every φ, ψ ∈ H and α, β ∈ ℂ:

F(αφ + βψ) = αF(φ) + βF(ψ).

A functional F on H is called bounded if

sup_{0≠φ∈H} |F(φ)|/‖φ‖ < ∞.

If a functional F on H is bounded we define its norm, denoted by ‖F‖ (do not confuse with the norm ‖·‖ of H), by

‖F‖ = sup_{0≠φ∈H} |F(φ)|/‖φ‖.   (1.10)

Let F be a bounded linear functional on H. Then it is easy to see that F is a continuous function of its argument. Indeed, let φₙ → φ in H. Then

|F(φₙ) − F(φ)| = |F(φₙ − φ)| ≤ ‖F‖ ‖φₙ − φ‖ → 0 as n → ∞.

Hence F(φₙ) → F(φ) in ℂ, i.e. F is continuous. (Note that the inequality

|F(φ)| ≤ ‖F‖ ‖φ‖

follows from the definition (1.10) of the norm ‖F‖ of F.)
Conversely, let F be a linear functional on H and suppose that F is a continuous function of its argument on H. We shall show that F is bounded. Indeed, if F is continuous at φ₀ ∈ H, then for each ε > 0 there exists a δ > 0 such that |F(φ₀) − F(h)| < ε for ‖h − φ₀‖ ≤ δ. Now let φ ≠ 0 be an arbitrary element of H. By the linearity of F we obtain that

F(φ) = (‖φ‖/δ) F(δφ/‖φ‖) = (‖φ‖/δ) { F(δφ/‖φ‖ + φ₀) − F(φ₀) }.

Since the vector δφ/‖φ‖ + φ₀ = h satisfies the relation ‖h − φ₀‖ ≤ δ, we have that |F(φ)| < (‖φ‖/δ)ε, i.e. |F(φ)|/‖φ‖ < ε/δ. Fix ε = ε₀ > 0. Then δ = δ(ε₀) ≡ δ₀, and since ε₀, δ₀ are independent of φ we see that

‖F‖ = sup_{0≠φ∈H} |F(φ)|/‖φ‖ ≤ ε₀/δ₀ < ∞,

i.e. F is bounded.
Hence we proved that for a linear functional F on H, boundedness ⇔ continuity. We speak thus of a bounded (continuous) linear functional (b.l.f.).
Exercise: Let F be a b.l.f. on H. With the norm ‖F‖ defined by (1.10), show that

‖F‖ = sup_{φ∈H: ‖φ‖≤1} |F(φ)| = sup_{φ∈H: ‖φ‖=1} |F(φ)|.

Now let F, G be b.l.f.'s on a Hilbert space H. We can define the sum of two b.l.f.'s, F + G = L, by L(φ) = F(φ) + G(φ) for each φ ∈ H, and the scalar product λF as λF = G, G(φ) = λF(φ).
Exercise: We denote by H* the space of bounded linear functionals F, G, … on a Hilbert space H. (H* is called the dual of H.) With addition and scalar multiplication defined as above, show that H* forms a vector space. Then ‖F‖, defined by (1.10), is a norm on H*, i.e. H* is a normed linear space. Finally show that H* is complete, i.e. every Cauchy sequence in H* converges to an element of H*.
An example of a bounded linear functional on H is furnished by the inner product of the elements of H with a fixed element f ∈ H. Given f ∈ H, define for every φ ∈ H, F(φ) by

F(φ) = (φ, f).

Clearly F is a linear functional on H. To see that it is bounded, observe that

|F(φ)| = |(φ, f)| ≤ ‖φ‖ ‖f‖.

So for all φ ≠ 0:

|F(φ)|/‖φ‖ ≤ ‖f‖ < ∞.

Hence

‖F‖ = sup_{0≠φ∈H} |F(φ)|/‖φ‖ ≤ ‖f‖.

In fact, since F(f) = ‖f‖², we see that the sup is attained for φ = f ∈ H. Hence ‖F‖ = ‖f‖.
It turns out that the converse of the above statement is also true, namely that every bounded linear functional on H has the form (·, f) for some f ∈ H. This is the content of:

Theorem 1.2 (Riesz Representation Theorem). Every bounded (continuous) linear functional F on a Hilbert space H can be expressed in the form F(φ) = (φ, f), for each φ ∈ H, where f is an element of H which is uniquely determined by F; moreover, ‖F‖ = ‖f‖.

Proof. We denote by G the set of all elements g ∈ H such that F(g) = 0, i.e. G = Ker F. Obviously G is a subspace of H. Moreover G is a closed subspace of H. To see that, let gₙ → g with gₙ ∈ G. Then F(g) = F(g − gₙ) + F(gₙ) = F(g − gₙ). Hence

|F(g)| ≤ ‖F‖ ‖g − gₙ‖ → 0 as n → ∞, i.e. F(g) = 0 ⇒ g ∈ G.

There are two possibilities now: either G = H or G ⊊ H (properly included in H). In the first case F is the zero functional on H and the theorem is proved with f = 0. Hence, assume that G ⊊ H. In this case G^⊥ contains nonzero elements. Let f₀ ∈ G^⊥, f₀ ≠ 0. For φ ∈ H, consider the vector F(φ)f₀ − F(f₀)φ. This vector belongs to G because F(F(φ)f₀ − F(f₀)φ) = F(φ)F(f₀) − F(f₀)F(φ) = 0. Hence, since f₀ ∈ G^⊥, we see that for all φ ∈ H:

(F(φ)f₀ − F(f₀)φ, f₀) = 0.

Hence F(φ)‖f₀‖² = F(f₀)(φ, f₀). Setting c = F(f₀), this gives

F(φ) = ( φ, (conj c/‖f₀‖²) f₀ ), for all φ ∈ H.

We set now

f = (conj c/‖f₀‖²) f₀,

and the equality above provides the required representation, i.e. we have the existence of f.
To prove that f is unique, suppose that there exist two vectors f₁ ≠ f₂ such that for all φ ∈ H: F(φ) = (φ, f₁) = (φ, f₂). Hence (φ, f₁ − f₂) = 0 for all φ ∈ H. In particular, set φ = f₁ − f₂, from which it follows that f₁ − f₂ = 0. It remains to prove that ‖F‖ = ‖f‖. We immediately obtain from F(φ) = (φ, f) that

|F(φ)| = |(φ, f)| ≤ ‖φ‖ ‖f‖, i.e. |F(φ)|/‖φ‖ ≤ ‖f‖ for φ ≠ 0.

Hence

‖F‖ = sup_{0≠φ∈H} |F(φ)|/‖φ‖ ≤ ‖f‖.

On the other hand, taking φ = f we see that F(f) = (f, f) = ‖f‖², from which

‖f‖² ≤ ‖F‖ ‖f‖, i.e. ‖f‖ ≤ ‖F‖.

Hence ‖F‖ = ‖f‖. □

1.8 Bounded (continuous) linear operators on a Hilbert space

Let H be a Hilbert space as usual. Let M be a subspace of H. By a linear operator T: M → H we mean a function defined on M with values in H which assigns to the vector u ∈ M the (unique) vector Tu ∈ H, and which satisfies:

T(αu + βv) = αT(u) + βT(v) for u, v ∈ M, α, β ∈ ℂ.

The subspace M of H on which T is defined is called the domain of T and is denoted by D(T). The range of T is the set of vectors v ∈ H to each one of which there corresponds at least one u ∈ D(T) such that Tu = v, i.e.

Range(T) ≡ Ran(T) = {v ∈ H : ∃u ∈ D(T) such that Tu = v}.

We also define

Ker(T) ≡ {u ∈ D(T) : Tu = 0}.

The operator T is called one-to-one (1–1) if u₁ ≠ u₂ ⇒ Tu₁ ≠ Tu₂. Equivalently, T is one-to-one if Tu₁ = Tu₂ ⇒ u₁ = u₂. T is called onto if Ran(T) = H, i.e. if for every v ∈ H we can find a u ∈ D(T) such that Tu = v.
Exercise: Ran(T) is a subspace of H. So is Ker(T).
A linear operator T defined on the whole of H (i.e. D(T) = H) will be called a linear operator on H. Unless otherwise indicated we will assume henceforth that D(T) = H. A linear operator T on H is said to be bounded if

sup_{0≠φ∈H} ‖Tφ‖/‖φ‖ < ∞.

Let T be a bounded linear operator (b.l.op.) on H. We define the norm of T by

‖T‖ = sup_{0≠φ∈H} ‖Tφ‖/‖φ‖.   (1.11)

(It follows that ‖Tφ‖ ≤ ‖T‖ ‖φ‖, ∀φ ∈ H.)


A linear operator T on H is continuous if whenever fₙ → f in H, then ‖Tfₙ − Tf‖ → 0. As in the case of bounded linear functionals we can prove (Exercise) that a linear operator T on H is bounded if and only if it is continuous. As in the case of bounded linear functionals we can show (Exercise) that

‖T‖ = sup_{φ∈H: ‖φ‖≤1} ‖Tφ‖ = sup_{φ∈H: ‖φ‖=1} ‖Tφ‖.

Now, let T, S, … be b.l.op.'s on H. We define their sum T + S as that (linear) operator W on H such that Wφ = Tφ + Sφ, ∀φ ∈ H. Similarly λT = S, where Sφ = λTφ. It is easy to see that with these definitions of addition and scalar multiplication, the set of b.l.op.'s on H forms a vector space.
Exercise: With the norm defined by (1.11) the vector space of b.l.op.'s on H becomes a normed linear space.
We denote this normed linear space by B(H).
Exercise: B(H) is a complete normed linear space.
Now, let T, S be b.l.op.'s in B(H). Their product TS is defined as the function D: H → H which maps the element u ∈ H to the element T(Su). It is easily seen that (TS)u = T(Su) defines a linear operator on H. Moreover T(S₁ + S₂) = TS₁ + TS₂, etc., while in general TS ≠ ST. Since ‖TSu‖ = ‖T(Su)‖ ≤ ‖T‖ ‖Su‖ ≤ ‖T‖ ‖S‖ ‖u‖, we see that ‖TS‖ ≤ ‖T‖ ‖S‖, i.e. TS ∈ B(H).
Referring to the Projection Theorem 1.1, set g = Ph, where g is the orthogonal projection (best approximation) of h on a closed subspace G of H. P is called the projection operator onto G.
Exercise: Show that

(i) P is a linear operator on H.

(ii) P is a bounded linear operator on H.

(iii) Ran P = G, Ker P = G^⊥, Ran(I − P) = G^⊥, Ker(I − P) = G, where I is the identity operator Iu = u, ∀u ∈ H (obviously I ∈ B(H) with ‖I‖ = 1).

(iv) P² = P, ‖P‖ = 1.

(v) I − P is the projection operator onto G^⊥.
1.9 The Lax–Milgram and the Galerkin theorems

Henceforth we will usually consider real Hilbert spaces, i.e. complete inner product spaces over the real numbers, with (αf, g) = α(f, g), α real, f, g ∈ H, and (f, g) = (g, f), ∀f, g ∈ H.
A (real) bilinear form on a real Hilbert space H is a map from H × H into ℝ, denoted by B(f, g) for f, g ∈ H, which satisfies:

B(α₁f₁ + α₂f₂, g) = α₁B(f₁, g) + α₂B(f₂, g),

B(f, β₁g₁ + β₂g₂) = β₁B(f, g₁) + β₂B(f, g₂),

for fᵢ, f, gᵢ, g ∈ H, αᵢ, βᵢ ∈ ℝ. In general B(f, g) ≠ B(g, f), i.e. B is not necessarily symmetric.
The following theorem will be central in the sequel:

Theorem 1.3 (Lax–Milgram Theorem). Let H be a (real) Hilbert space and let B(·, ·): H × H → ℝ be a bilinear form on H which satisfies:

(i) |B(φ, ψ)| ≤ c₁‖φ‖ ‖ψ‖, ∀φ, ψ ∈ H,

(ii) B(φ, φ) ≥ c₂‖φ‖², ∀φ ∈ H,

where c₁, c₂ are positive constants independent of φ, ψ ∈ H.
Let F: H → ℝ be a given (real-valued) bounded linear functional on H. Then there exists a unique u ∈ H satisfying

B(u, v) = F(v) for all v ∈ H.

Moreover,

‖u‖ ≤ (1/c₂)‖F‖.
Proof. Let φ ∈ H be fixed. Then ℓ_φ: H → ℝ, defined for every v ∈ H by ℓ_φ(v) = B(φ, v), is a continuous linear functional on H. (Linearity follows from the fact that B is a bilinear form. For boundedness, observe that for each v ∈ H:

|ℓ_φ(v)| = |B(φ, v)| ≤ c₁‖φ‖ ‖v‖.

Hence ‖ℓ_φ‖ ≤ c₁‖φ‖ < ∞.)
By the Riesz Representation Theorem (Theorem 1.2), therefore, there exists a unique element w ∈ H such that

ℓ_φ(v) = B(φ, v) = (v, w) for every v ∈ H.   (1.12)

Hence for every φ ∈ H we define a w ∈ H by (1.12), and denote the correspondence φ ↦ w by w = Aφ, i.e.

B(φ, v) = (v, Aφ), ∀φ ∈ H, ∀v ∈ H.   (1.13)

Now A is a linear operator defined on H. To show linearity, observe that, given φ, ψ ∈ H, for every v ∈ H and α, β real we have

(v, A(αφ + βψ)) = B(αφ + βψ, v) = αB(φ, v) + βB(ψ, v)
= α(v, Aφ) + β(v, Aψ) = (v, αAφ + βAψ).

Hence A(αφ + βψ) = αAφ + βAψ, i.e. A is linear.
We claim now that A, defined by (1.13), has a range Ran(A) which is a closed subspace of H. It is (easily) a subspace. To show that it is closed, let ψₙ = Aφₙ be a convergent sequence, such that ψₙ → ψ.
Now, since B(φₙ, v) = (v, Aφₙ), ∀v ∈ H, we have B(φₙ − φₘ, v) = (v, Aφₙ − Aφₘ), ∀v ∈ H. Choosing v = φₙ − φₘ and using (ii), we get

‖φₙ − φₘ‖ ≤ (1/c₂)‖Aφₙ − Aφₘ‖.

Hence {φₙ} is a Cauchy sequence in H, i.e. there exists φ ∈ H such that φₙ → φ. We now show that ψ = Aφ, thus showing that ψ ∈ Ran(A), i.e. that Ran(A) is closed.
Now |B(φₙ, v) − B(φ, v)| ≤ c₁‖φₙ − φ‖ ‖v‖ gives that

lim_{n→∞} B(φₙ, v) = B(φ, v), ∀v ∈ H.

Also (Aφₙ, v) → (ψ, v), since |(Aφₙ, v) − (ψ, v)| ≤ ‖Aφₙ − ψ‖ ‖v‖. Since B(φₙ, v) = (Aφₙ, v), ∀v ∈ H, we conclude that B(φ, v) = (ψ, v), ∀v ∈ H, i.e. ψ = Aφ, by the definition of A. Hence Ran(A) is closed. We now claim that Ran(A) = H. Suppose that Ran(A) is properly included in H, so that there exists z ≠ 0, z ∈ (Ran(A))^⊥. Hence (z, v) = 0, ∀v ∈ Ran(A). In particular, ∀φ ∈ H, B(φ, z) = (z, Aφ) = 0. Hence for φ = z, 0 = B(z, z) ≥ c₂‖z‖², i.e. z = 0 (contradiction). So Ran(A) = H.
Now, given F, a b.l.f. on H, by the Riesz representation theorem there exists a unique w ∈ H such that F(v) = (w, v), ∀v ∈ H. Since Ran(A) = H, there exists u ∈ H such that Au = w. Hence there exists u such that

F(v) = (Au, v) = B(u, v), ∀v ∈ H,

and we have the existence of u as claimed in the statement of the theorem.
For uniqueness, suppose that u₁ ≠ u₂ are such that B(u₁, v) = F(v) = B(u₂, v), ∀v ∈ H. Hence

B(u₁ − u₂, v) = 0, ∀v ∈ H ⇒ 0 = B(u₁ − u₂, u₁ − u₂) ≥ c₂‖u₁ − u₂‖² ⇒ u₁ = u₂.

Finally, since B(u, u) = F(u), (i) and (ii) give (for u ≠ 0) that c₂‖u‖² ≤ |F(u)|, from which ‖u‖ ≤ (1/c₂)|F(u)|/‖u‖. Hence

‖u‖ ≤ sup_{v≠0} (1/c₂)|F(v)|/‖v‖ = (1/c₂)‖F‖.  □

We finally present a basic theorem for the Galerkin approximation (see below for the definition) $u_h$ to the solution $u$ of $B(u,v) = F(v)$ guaranteed by the Lax–Milgram theorem.

Theorem 1.4 (Galerkin). Let $H$ be a real Hilbert space and let $B(\cdot,\cdot) : H \times H \to \mathbb{R}$ be a bilinear form on $H$ which satisfies:

(i) $|B(\varphi,\psi)| \le c_1\|\varphi\|\,\|\psi\|$ $\forall \varphi, \psi \in H$,

(ii) $B(\varphi,\varphi) \ge c_2\|\varphi\|^2$ $\forall \varphi \in H$,

for some positive constants $c_1, c_2$ independent of $\varphi, \psi \in H$. Let $F$ be a given real-valued b.l.f. on $H$ and let $u$ be the unique element of $H$, guaranteed by the Lax–Milgram theorem, satisfying $B(u,v) = F(v)$, $\forall v \in H$.

Let $\{S_h\}$, $0 < h \le 1$, be a family of finite-dimensional subspaces of $H$. For every $h$ there exists a unique $u_h \in S_h$ such that
\[ B(u_h, v_h) = F(v_h) \quad \forall v_h \in S_h. \tag{1.14} \]
We call $u_h$ the Galerkin approximation of $u$ in $S_h$.

Moreover we have the error estimate
\[ \|u - u_h\| \le \frac{c_1}{c_2}\inf_{\chi\in S_h}\|u - \chi\|. \]

Proof. The existence–uniqueness of $u_h \in S_h$ is guaranteed by the Lax–Milgram theorem applied to the Hilbert space $(S_h, \|\cdot\|)$. Alternatively, let $\{\varphi_j\}_{j=1}^m$ be a basis for $S_h$, where $m = m(h) = \dim S_h$, and try to find $u_h \in S_h$ in the form $u_h = \sum_{j=1}^m c_j\varphi_j$. By (1.14), $u_h$ satisfies $B(u_h, \varphi_i) = F(\varphi_i)$, $1 \le i \le m$, i.e.
\[ B\Big(\sum_{j=1}^m c_j\varphi_j,\ \varphi_i\Big) = F(\varphi_i),\ 1 \le i \le m \ \Rightarrow\ \sum_{j=1}^m c_j B(\varphi_j, \varphi_i) = F(\varphi_i), \quad 1 \le i \le m. \]
Hence if $A$ is the $m \times m$ matrix given by $A_{ij} = B(\varphi_j, \varphi_i)$, $1 \le i, j \le m$, the $c_j$'s are the solution of the linear system
\[ \sum_{j=1}^m A_{ij}c_j = F(\varphi_i), \quad 1 \le i \le m. \tag{1.15} \]
The associated homogeneous system $\sum_{j=1}^m A_{ij}c_j = 0$, $1 \le i \le m$, has only the zero solution. (Since $\sum_j A_{ij}c_j = 0 \Rightarrow B(\sum_j c_j\varphi_j, \varphi_i) = 0$, $1 \le i \le m$, $\Rightarrow B(v_h, v_h) = 0$, where $v_h = \sum_j c_j\varphi_j$. Hence, by (ii), $v_h = 0 \Rightarrow c_i = 0$.) Therefore $A$ is invertible and (1.15) has a unique solution, i.e. (1.14) has a unique solution $u_h \in S_h$. (Exercise: Show that $A$ is positive definite.)

For the error estimate observe that by (ii)
\[ c_2\|u - u_h\|^2 \le B(u - u_h, u - u_h) = B(u - u_h, u) \tag{1.16} \]
(since $B(u_h, \chi) = F(\chi) = B(u, \chi)$ $\forall\chi \in S_h$ $\Rightarrow B(u - u_h, \chi) = 0$ $\forall\chi \in S_h$). For the same reason, for any $\chi \in S_h$,
\[ B(u - u_h, u) = B(u - u_h, u - \chi) \le c_1\|u - u_h\|\,\|u - \chi\|, \]
using (i). By (1.16) we conclude therefore that $c_2\|u - u_h\|^2 \le c_1\|u - u_h\|\,\|u - \chi\|$, i.e.
\[ \|u - u_h\| \le \frac{c_1}{c_2}\|u - \chi\| \quad \forall\chi \in S_h, \quad \text{i.e.} \]
\[ \|u - u_h\| \le \frac{c_1}{c_2}\inf_{\chi\in S_h}\|u - \chi\| = \frac{c_1}{c_2}\|P_hu - u\|, \]
where $P_h$ is the projection operator onto $S_h$. $\square$
Here is an immediate corollary to Galerkin's Theorem 1.4.

Corollary 1.1. With the notation introduced in Theorem 1.4, suppose that the family $\{S_h\}$ of subspaces satisfies
\[ \lim_{h\to 0}\,\inf_{\chi\in S_h}\|u - \chi\| = 0. \]
Then $\lim_{h\to 0}\|u - u_h\| = 0$. $\square$

Finally we mention that in the case of a symmetric bilinear form $B$, i.e. when (in addition to (i), (ii) of Theorem 1.3)

(iii) $B(u,v) = B(v,u)$ $\forall u, v \in H$,

we can obtain a variational formulation of the problem of finding $u \in H$ such that
\[ B(u,v) = F(v) \quad \forall v \in H, \]
where $F$ is a bounded linear functional on $H$.

For $v \in H$ consider the following (nonlinear) functional $J : H \to \mathbb{R}$ defined by
\[ J(v) = \tfrac12 B(v,v) - F(v) \tag{1.17} \]
and the associated problem of finding $z \in H$ such that
\[ J(z) = \min_{v\in H} J(v). \tag{1.18} \]

We have the following theorem:

Theorem 1.5 (Rayleigh–Ritz). Suppose $B$ is a symmetric bilinear form which satisfies the hypotheses (i), (ii) of Theorem 1.3. Then the problem (1.18) of minimizing over $H$ the functional $J$ defined by (1.17) has a unique solution, which coincides with $u$, the existence–uniqueness of which was guaranteed by the Lax–Milgram Theorem 1.3.

Proof. Let $u$ be the solution of the problem $B(u,v) = F(v)$ $\forall v \in H$. Then $\forall w \in H$:
\[ J(u+w) = \tfrac12 B(u+w, u+w) - F(u+w) = (\text{due to the symmetry of } B) \]
\[ = \Big(\tfrac12 B(u,u) - F(u)\Big) + \tfrac12 B(w,w) + \big(B(u,w) - F(w)\big) = J(u) + \tfrac12 B(w,w), \]
since $B(u,w) = F(w)$ by Theorem 1.3. Hence
\[ J(u+w) = J(u) + \tfrac12 B(w,w) \ge J(u) + \tfrac{c_2}{2}\|w\|^2 \quad \text{by (ii)}. \]
Therefore $\forall w \in H$, $w \neq 0$: $J(u+w) > J(u)$, i.e.
\[ J(u) = \min_{v\in H} J(v) \quad \text{and} \quad J(v) > J(u) \ \text{ if } v \neq u. \quad \square \]

Immediately, we have the following corollary, which is the analog of Theorem 1.4.
Corollary 1.2 (Rayleigh–Ritz, Galerkin). With the notation introduced in Theorem 1.4 and the additional hypothesis of symmetry of $B$, the problem of minimizing the functional $J$ defined by (1.17) over $S_h$, i.e. finding $u_h \in S_h$ such that
\[ J(u_h) = \min_{\chi\in S_h} J(\chi), \]
has a unique solution $u_h$, which coincides with the Galerkin approximation in $S_h$ of $u$, constructed in Theorem 1.4. $\square$
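The identity $J(u+w) = J(u) + \tfrac12 B(w,w)$ at the heart of the proof can be observed numerically. In the finite-dimensional sketch below (an illustration only, not part of the notes), $B(v,w) = v^{T}Aw$ with $A$ symmetric positive definite plays the role of a bounded, coercive, symmetric bilinear form, and the minimizer of $J$ is found by solving $Au = b$:

```python
import numpy as np

# Finite-dimensional model of Rayleigh-Ritz: B(v,w) = v^T A w with A SPD
# (bounded + coercive + symmetric), F(v) = b^T v.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # symmetric positive definite
b = rng.standard_normal(4)

J = lambda v: 0.5 * v @ A @ v - b @ v
u = np.linalg.solve(A, b)            # Ritz/Galerkin solution: B(u,v) = F(v) for all v

# J(u + w) = J(u) + (1/2) w^T A w > J(u) for every w != 0:
for _ in range(100):
    w = rng.standard_normal(4)
    assert J(u + w) >= J(u)
print("u minimizes J")
```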

Chapter 2

Elements of the Theory of Sobolev Spaces and Variational Formulation of Boundary-Value Problems in One Dimension

This chapter (and chapter 4) follows closely the analogous material in H. Brezis, Analyse fonctionnelle, théorie et applications, Masson, Paris, 1983. (For the translation in Greek and the new English edition, see the References.)

2.1 Motivation

We consider the following two-point boundary-value problem in one dimension. Find a real-valued function $u(x)$, defined for $x \in [a,b]$ and satisfying
\[ -(p(x)u'(x))' + q(x)u(x) = f(x), \quad a \le x \le b, \qquad u(a) = u(b) = 0. \tag{$*$} \]
Here $p(x), q(x), f(x)$ are real-valued functions defined on $[a,b]$ such that $p \in C^1([a,b])$, $p(x) > 0$ for $x \in [a,b]$, $q \in C([a,b])$, $q(x) \ge 0$ $\forall x \in [a,b]$, $f \in C([a,b])$. A classical (or strong) solution of the boundary-value problem (b.v.p.) ($*$) is a function $u$ of class $C^2([a,b])$ which satisfies ($*$) in the usual sense.
If we multiply the equation in ($*$) by a function $\varphi \in C^1([a,b])$, such that $\varphi(a) = \varphi(b) = 0$, and integrate by parts we obtain
\[ \int_a^b pu'\varphi'\,dx + \int_a^b qu\varphi\,dx = \int_a^b f\varphi\,dx, \quad \forall\varphi \in C^1([a,b]),\ \varphi(a) = \varphi(b) = 0. \tag{$**$} \]
Note that ($**$) makes sense for $u \in C^1([a,b])$ e.g. (as opposed to ($*$), which requires $u \in C^2([a,b])$). In fact ($**$) just requires that $u, u'$ be integrable functions. One may say (vaguely) that a solution $u \in C^1([a,b])$ (such that $u(a) = u(b) = 0$) of ($**$) is (one kind of) a weak or generalized solution of ($*$).

The variational method for solving (i.e. proving existence and uniqueness of solutions of) problems such as ($*$), and also boundary-value problems for partial differential equations, proceeds roughly as follows:

(i) We define precisely what we mean by a weak solution of ($*$). Typically it will be the solution of a weak (or variational) form of the problem ($*$), such as ($**$), or, equivalently, the solution of an appropriate minimization problem. Here the Sobolev spaces will play a central role.

(ii) We show existence and uniqueness of the weak solution, for example by the Lax–Milgram theorem; note that ($**$) suggests the variational problem
\[ B(u,\varphi) \equiv \int_a^b (pu'\varphi' + qu\varphi) = F(\varphi) \equiv \int_a^b f\varphi, \quad \forall\varphi \in C^1([a,b]) \text{ such that } \varphi(a) = \varphi(b) = 0. \]

(iii) We then prove that the weak solution is sufficiently regular. For example, here we must prove that the weak solution is in $C^2([a,b])$.

(iv) We finally prove that a weak solution, which is in $C^2([a,b])$, is a strong (classical) solution of ($*$).

Note that a weak formulation of the problem provides us also with a method (Galerkin) for approximating its weak solution in a suitably chosen finite-dimensional subspace of functions with good approximation properties, that are also suitable for numerical computations.
2.2 Notation and preliminaries

We now introduce some notation on function spaces that will be used in the sequel and collect (without proof) some useful facts about $L^2$.

We let $\Omega$ denote an open subset of $\mathbb{R}^N$; for this part of the notes take, for example, $\Omega$ to be an open interval $(a,b)$ in $\mathbb{R}$. For simplicity we shall consider only real-valued functions defined on $\Omega$ or $\bar\Omega$. We define:

$C(\Omega)$ = space of continuous functions on $\Omega$.

$C^k(\Omega)$ = space of $k$-times differentiable functions on $\Omega$, i.e. the space of those functions $f(x)$, $x \in \Omega$, such that the partial derivatives
\[ \frac{\partial^{\alpha_1+\dots+\alpha_N} f(x)}{\partial x_1^{\alpha_1}\cdots\partial x_N^{\alpha_N}} \]
are continuous functions on $\Omega$ for all integers $0 \le \alpha_i \le k$, $1 \le i \le N$, such that $\alpha_1 + \alpha_2 + \dots + \alpha_N \le k$. Put $C^0(\Omega) \equiv C(\Omega)$.

$C^\infty(\Omega) = \bigcap_{k\ge 0} C^k(\Omega)$.

$C_c(\Omega)$ = space of functions in $C(\Omega)$ whose support is a compact set included in $\Omega$. (If $f \in C(\Omega)$, support of $f$ = $\mathrm{supp}\,f = \overline{\{x \in \Omega : f(x) \neq 0\}}$.) Hence these functions vanish outside a compact set (strictly) included in $\Omega$.

$C_c^k(\Omega) = C^k(\Omega) \cap C_c(\Omega)$.

$C_c^\infty(\Omega) = C^\infty(\Omega) \cap C_c(\Omega)$. Often the notation $C_0^\infty(\Omega)$ is used instead of $C_c^\infty(\Omega)$.

We recall a few facts about the spaces $L^p(\Omega)$. Let $dx$ denote the Lebesgue measure in $\mathbb{R}^N$. By $L^1(\Omega)$ we denote the space of (Lebesgue) integrable functions $f$ on $\Omega$, i.e. the functions for which
\[ \|f\|_{L^1} = \|f\|_{L^1(\Omega)} = \int_\Omega |f(x)|\,dx < \infty. \]
(We denote usually $\int_\Omega f = \int_\Omega f(x)\,dx$.) Let $1 \le p < \infty$. Then
\[ L^p(\Omega) = \{f : \Omega \to \mathbb{R}\ ;\ |f|^p \in L^1(\Omega)\}. \]
We put
\[ \|f\|_{L^p} = \|f\|_{L^p(\Omega)} = \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p}. \]
For $p = \infty$ we define $L^\infty(\Omega) = \{f : \Omega \to \mathbb{R}$, $f$ measurable, such that there exists a constant $C < \infty$ such that $|f(x)| \le C$ a.e. (almost everywhere) in $\Omega$, i.e. such that $|f(x)| \le C$ for all $x \in \Omega$ except possibly for some $x$ belonging to a subset of $\Omega$ of Lebesgue measure zero$\}$. We put
\[ \|f\|_{L^\infty} = \|f\|_{L^\infty(\Omega)} = \inf\{C : |f(x)| \le C \text{ a.e. in } \Omega\} \]
and note that $|f(x)| \le \|f\|_{L^\infty}$ for every $x \in \Omega\setminus O$, where $O$ has measure $0$. (The quantities $\|f\|_{L^p}$, $1 \le p \le \infty$, are norms on the respective spaces $L^p$. We have already introduced the Hilbert space $L^2 = L^2(\Omega)$.)

We shall mainly be concerned with $L^2(\Omega)$. We would like to emphasize that the elements $f$ of $L^2(\Omega)$ are not really functions but equivalence classes of functions, where the equivalence $f \sim g$ holds if and only if $f(x) = g(x)$ a.e. in $\Omega$. For example, when we say that $f = 0$ as an element of $L^2(\Omega)$, we mean that $f(x) = 0$ for all $x$ outside a set of measure zero in $\Omega$, i.e. that $f(x) = 0$ a.e. in $\Omega$; contrast with the situation $f \in C(\Omega)$, where $f = 0 \Leftrightarrow f(x) = 0$ $\forall x \in \Omega$.

The following results (see Brezis for proofs) will be used in the sequel. ($L^1_{loc}(\Omega)$ will denote the functions $f$ on $\Omega$ for which $\int_K |f(x)|\,dx < \infty$ for every compact set $K \subset \Omega$. For example $f(x) = \frac1x \in L^1_{loc}((0,1))$ but $f \notin L^1((0,1))$.)

Lemma 2.1. If $f \in L^1_{loc}(\Omega)$ is such that
\[ \int_\Omega fu = 0 \quad \forall u \in C_c(\Omega), \]
then $f = 0$ a.e. in $\Omega$.

Lemma 2.2. The space $C_c(\Omega)$ is dense in $L^2(\Omega)$, i.e.
\[ \forall f \in L^2(\Omega),\ \forall\varepsilon > 0\ \ \exists\tilde f \in C_c(\Omega) : \|f - \tilde f\|_{L^2} < \varepsilon. \]

Definition 2.1. A regularizing sequence (or a sequence of mollifiers) is a sequence of functions $\{\rho_n\}$, $n = 1, 2, \dots$, such that:
\[ \rho_n \in C_c^\infty(\mathbb{R}^N), \qquad \rho_n \ge 0 \ \text{on } \mathbb{R}^N, \]
\[ \mathrm{supp}\,\rho_n \subset B(0, \tfrac1n) \equiv \Big\{x \in \mathbb{R}^N : |x| = \Big(\sum_{i=1}^N x_i^2\Big)^{1/2} \le \tfrac1n\Big\}, \]
\[ \int_{\mathbb{R}^N} \rho_n\,dx = 1. \]
Such functions clearly exist. E.g. in $\mathbb{R}$, let
\[ \rho(x) = \begin{cases} e^{-\frac{1}{1-x^2}} & \text{if } |x| < 1 \\ 0 & \text{if } |x| \ge 1. \end{cases} \]
Clearly $\rho(x)$ is continuous on $\mathbb{R}$ and
\[ \rho^{(j)}(x) = \frac{\pi_j(x)}{(1-x^2)^{2j}}\,e^{-\frac{1}{1-x^2}}, \quad |x| < 1, \]
where the $\pi_j(x)$ are polynomials. Since $y^je^{-y} \to 0$ as $y \to +\infty$ we see that $\rho(x) \in C_c^\infty(\mathbb{R})$. Of course $\mathrm{supp}\,\rho = [-1,1]$. Let
\[ C = \Big(\int \rho(x)\,dx\Big)^{-1}. \]
Define
\[ \rho_n(x) = C\,n\,\rho(nx). \]
Then
\[ \rho_n \in C_c^\infty(\mathbb{R}), \quad \rho_n(x) \ge 0 \ \text{on } \mathbb{R}, \quad \mathrm{supp}\,\rho_n = [-\tfrac1n, \tfrac1n], \quad \int \rho_n\,dx = 1. \]
In $\mathbb{R}^N$ define
\[ \rho(x) = \begin{cases} e^{-\frac{1}{1-|x|^2}} & \text{if } |x| < 1 \\ 0 & \text{if } |x| \ge 1 \end{cases} \]
(where $|x| = (\sum_{i=1}^N x_i^2)^{1/2}$) and
\[ \rho_n(x) = C\,n^N\rho(nx), \quad C = \Big(\int_{\mathbb{R}^N} \rho(x)\,dx\Big)^{-1}. \]
Denote the convolution of two functions $f(x)$, $g(x)$ defined on $\mathbb{R}^N$ as the function
\[ (f * g)(x) = \int_{\mathbb{R}^N} f(x-y)\,g(y)\,dy \]
(provided the integrals exist).

Lemma 2.3. (i) Let $f \in C_c(\Omega)$. Extend $f$ by zero on the whole of $\mathbb{R}^N$. Then, for sufficiently large $n$,
\[ \rho_n * f \in C_c^\infty(\Omega) \]
and $\sup_{x\in\Omega} |f(x) - (\rho_n * f)(x)| \to 0$, $n \to \infty$.

(ii) Let $f \in C(\mathbb{R}^N)$. Then $\rho_n * f \in C^\infty(\mathbb{R}^N)$ and $\rho_n * f \to f$ uniformly on every compact set $K \subset \mathbb{R}^N$, i.e.
\[ \sup_{x\in K}|f(x) - (\rho_n * f)(x)| \to 0, \quad n \to \infty, \]
$\forall K$ compact $\subset \mathbb{R}^N$.

(iii) Let $f \in L^2(\mathbb{R}^N)$. Then $\rho_n * f \in C^\infty(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$ and
\[ \|\rho_n * f - f\|_{L^2(\mathbb{R}^N)} \to 0, \quad n \to \infty. \]

Finally we mention:

Lemma 2.4. $C_c^\infty(\Omega)$ is dense in $L^2(\Omega)$, i.e.
\[ \forall f \in L^2(\Omega),\ \forall\varepsilon > 0\ \ \exists\tilde f \in C_c^\infty(\Omega) : \|f - \tilde f\|_{L^2} < \varepsilon. \]
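A quick numerical experiment (an illustration only; the grid, the choice $n = 4$, and the discrete convolution are ad hoc) shows the one-dimensional mollifier $\rho_n(x) = Cn\rho(nx)$ integrating to $1$ and smoothing the kink of $f(x) = |x|$ while leaving $f$ essentially unchanged away from the kink:

```python
import numpy as np

# Mollifier rho_n(x) = C n rho(n x), with rho(x) = exp(-1/(1-x^2)) on |x| < 1.
def rho(x):
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

dx = 1e-3
x = np.arange(-3.0, 3.0, dx)
C = 1.0 / (rho(np.arange(-1.0, 1.0, dx)).sum() * dx)  # normalizing constant

n = 4
rho_n = C * n * rho(n * x)
assert abs(rho_n.sum() * dx - 1.0) < 1e-3             # ∫ rho_n = 1 (discretely)

f = np.abs(x)
smooth = np.convolve(f, rho_n, mode="same") * dx      # discrete rho_n * f
# Away from the kink at 0 (and from the domain edges) mollification barely changes f:
m2 = (np.abs(x) > 1.0) & (np.abs(x) < 2.5)
assert np.max(np.abs(smooth[m2] - f[m2])) < 1e-2
print("mollified |x|; value near 0:", float(smooth[np.abs(x).argmin()]))
```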

2.3 The Sobolev space $H^1(I)$

Let $I = (a,b)$ be an open interval in $\mathbb{R}$. (We will mainly use a bounded interval $(a,b)$ in the applications, but here we may suppose that $I$ could be unbounded, i.e. that possibly $a = -\infty$ and/or $b = +\infty$.)

Definition 2.2. The Sobolev space $H^1(I)$ is defined by
\[ H^1(I) = \Big\{u \in L^2(I) : \exists g \in L^2(I) \text{ such that } \int_I u\varphi' = -\int_I g\varphi \ \ \forall\varphi \in C_c^1(I)\Big\}. \]
For $u \in H^1(I)$ we denote $g = u'$ and call $g$ the weak (generalized) derivative of $u$ (in the $L^2$ sense).

Remarks.

(i) When there is no reason for confusion we shall denote $H^1 = H^1(I)$, $L^2 = L^2(I)$, etc.

(ii) It is clear that the generalized derivative $g$ in the above definition is unique. For suppose $g_1, g_2 \in L^2(I)$ are such that $\int_I (g_1 - g_2)\varphi = 0$ $\forall\varphi \in C_c^1(I)$. Since $C_c^\infty(I) \subset C_c^1(I) \subset L^2(I)$ and since (by Lemma 2.4) $C_c^\infty(I)$ is dense in $L^2(I)$, it follows that $C_c^1(I)$ is dense in $L^2(I)$. It follows that $g_1 - g_2 = 0$ in $L^2(I)$. (N.B. In general in a Hilbert space $H$, where $D \subset H$ is dense in $H$, we prove that $(g,\varphi) = 0$ $\forall\varphi \in D$ $\Rightarrow g = 0$, since $\exists\varphi_i \in D$ such that $\varphi_i \to g$ in $H$. Therefore $0 = (g,\varphi_i) \to (g,g) \Rightarrow g = 0$.) We emphasize again that $g_1 = g_2$ in $L^2$ means that $g_1(x) = g_2(x)$ a.e. in $I$.

(iii) The functions $\varphi \in C_c^1(I)$ in the definition of $g$ are called test functions. One could take $C_c^\infty(I)$ to be the set of test functions instead of $C_c^1(I)$. (The only thing to show is that if $\int_I u\varphi' = -\int_I g\varphi$, for $u, g \in L^2(I)$, holds for every $\varphi \in C_c^\infty(I)$, then it will hold for every $\varphi \in C_c^1(I)$. This follows from the Cauchy–Schwarz inequality and the facts that given $\varphi \in C_c^1(I)$ there exist $\varphi_n \in C_c^\infty(I)$ such that $\varphi_n \to \varphi$, e.g. in $L^2(I)$, and also $(\varphi_n)' = \varphi_n' \in C_c^\infty(I)$ with $\varphi_n' \to \varphi'$ in $L^2(I)$.)

(iv) It is clear that if $u \in C^1(I) \cap L^2(I)$ and if the (classical) derivative $u'$ of $u$ belongs to $L^2(I)$, then integration by parts gives that $\int_I u\varphi' = -\int_I u'\varphi$ $\forall\varphi \in C_c^1(I)$, i.e. that $u'$ is the weak derivative of $u$, i.e. that $u \in H^1(I)$. Of course, if $I$ is bounded, then $u \in C^1(\bar I) \Rightarrow u, u' \in L^2(I)$ and we have $C^1(\bar I) \subset H^1(I)$.

(v) There are other ways of defining the Sobolev space $H^1$. Using e.g. the theory of distributions we may conclude that every $u \in L^2(I)$ has a distributional derivative $u'$. We say that $u \in H^1(I)$ if $u'$ coincides as a distribution with a function in $L^2(I)$. If $I = \mathbb{R}$ we may also define $H^1$ using Fourier transforms.

Examples.

(i) Consider $u(x) = |x|$ on $I = (-1,1)$. Clearly $u \in C(\bar I)$, $u \in L^2(I)$, but $u$ fails to have a classical derivative at $x = 0$. Consider the function
\[ g(x) = \begin{cases} -1 & \text{if } -1 < x \le 0 \\ 1 & \text{if } 0 < x < 1. \end{cases} \]
Clearly $g \in L^2(I)$. In addition, for each $\varphi \in C_c^1(I)$, integrating by parts on $(-1,0)$ and $(0,1)$ separately (recall $\varphi(-1) = \varphi(1) = 0$),
\[ \int_{-1}^1 u(x)\varphi'(x)\,dx = \int_{-1}^0 (-x)\varphi'\,dx + \int_0^1 x\varphi'\,dx = [-x\varphi]_{-1}^0 + \int_{-1}^0 \varphi\,dx + [x\varphi]_0^1 - \int_0^1 \varphi\,dx \]
\[ = \int_{-1}^0 \varphi\,dx - \int_0^1 \varphi\,dx = -\int_{-1}^1 g(x)\varphi(x)\,dx. \]
It follows that $u(x) = |x| \in H^1((-1,1))$ and $u' = g$ is the weak derivative of $u$.

(ii) More generally, if $I$ is a bounded interval and $u \in C(\bar I)$ with $u'$ (classical derivative) piecewise continuous on $\bar I$ (as would be the case e.g. if $u$ is a piecewise polynomial, continuous function on $\bar I$), then $u \in H^1(I)$ and its weak derivative coincides with the classical derivative a.e. in $I$.

(iii) As in (i), the function
\[ u(x) = \tfrac12(|x| + x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \]
on $I = (-1,1)$ belongs to $H^1$ and its weak derivative $u'$ is the function
\[ H(x) = \begin{cases} 0 & \text{if } -1 < x \le 0 \\ 1 & \text{if } 0 < x < 1, \end{cases} \]
which is called Heaviside's function. Clearly $H \in L^2(I)$. Does $H$ belong to $H^1(I)$? The answer is no. Suppose that $H \in H^1(I)$. Then there must exist $v(x) \in L^2(I)$ such that $\int_I H\varphi' = -\int_I v\varphi$ $\forall\varphi \in C_c^1(I)$, i.e. a $v \in L^2(I)$ such that
\[ -\int_{-1}^1 v\varphi = \int_{-1}^1 H\varphi' = \int_0^1 \varphi' = \varphi(1) - \varphi(0) = -\varphi(0) \quad \forall\varphi \in C_c^1(I). \]
Take then such a $\varphi$ with support in the interval $(-1,0)$. It follows that $\int_{-1}^1 v\varphi = \int_{-1}^0 v(x)\varphi(x)\,dx = 0$ $\forall\varphi \in C_c^1((-1,0))$. Since $C_c^1((-1,0))$ is dense in $L^2((-1,0))$, as in Remark (ii) above, we see that $v(x) = 0$ a.e. in $(-1,0)$. Analogously, taking $\varphi \in C_c^1((0,1))$, we prove that $v(x) = 0$ a.e. in $(0,1)$. We conclude therefore that $v = 0$ a.e. in $(-1,1)$. But that would contradict $\int_{-1}^1 v\varphi = \varphi(0)$ $\forall\varphi \in C_c^1(I)$ (choose $\varphi$ with $\varphi(0) \neq 0$). (Of course, as a distribution, $H(x)$ has a distributional derivative, which coincides with the Dirac functional, $H' = \delta_0$. We just proved that $\delta_0 \notin L^2(I)$.)
It is clear that $H^1(I)$ is a linear subspace of $L^2(I)$, since if $u, v \in H^1(I)$ and $u', v'$ are their weak derivatives, then $\lambda u' + \mu v'$ is the weak derivative of $\lambda u + \mu v$. Hence $\lambda u + \mu v \in H^1(I)$, and $(\lambda u + \mu v)' = \lambda u' + \mu v'$ for $\lambda, \mu \in \mathbb{R}$. We denote by $(\cdot,\cdot)$, $\|\cdot\|$, respectively, the inner product, norm of $L^2 = L^2(I)$, i.e. we let, for $u, v \in L^2(I)$, $(u,v) = \int_I u(x)v(x)\,dx$, $\|u\| = (u,u)^{1/2}$. Then, for $u, v \in H^1(I)$ we define
\[ (u,v)_1 \equiv (u,v) + (u',v'), \]
\[ \|u\|_1 \equiv \big(\|u\|^2 + \|u'\|^2\big)^{1/2} = (u,u)_1^{1/2}. \]
It is clear that $(\cdot,\cdot)_1$ defines an inner product on $H^1 = H^1(I)$ and $\|\cdot\|_1$ the induced norm on $H^1(I)$. (To be precise, sometimes we shall denote $\|\cdot\|_1 = \|\cdot\|_{H^1(I)}$ etc.) Hence $H^1(I)$ becomes an inner product space.

Theorem 2.1. The space $(H^1, \|\cdot\|_1)$ is a Hilbert space.

Proof. We only need to show that $H^1(I)$ is complete in the norm $\|\cdot\|_1$. Let $\{u_n\}_{n=1,2,\dots} \subset H^1(I)$ be a Cauchy sequence in the norm $\|\cdot\|_1$, i.e. let
\[ \lim_{m,n\to\infty}\|u_m - u_n\|_1 = 0. \]
By the definition of $\|\cdot\|_1$ it follows that $\{u_n\}$ and $\{u_n'\}$ are Cauchy sequences in $L^2$. Since $L^2$ is complete, it follows that $\exists u, g \in L^2(I)$ such that $u_n \to u$ in $L^2$, $u_n' \to g$ in $L^2$. Now, by definition, $(u_n, \varphi') = -(u_n', \varphi)$ $\forall\varphi \in C_c^1(I)$, for $n = 1, 2, 3, \dots$. Since for $\varphi \in C_c^1(I)$
\[ |(u_n, \varphi') - (u, \varphi')| \le \|u_n - u\|\,\|\varphi'\| \to 0, \quad n \to \infty, \]
and
\[ |(u_n', \varphi) - (g, \varphi)| \le \|u_n' - g\|\,\|\varphi\| \to 0, \quad n \to \infty, \]
it follows that $(u, \varphi') = -(g, \varphi)$ $\forall\varphi \in C_c^1(I)$, i.e. that $u \in H^1(I)$ and $u' = g$. It remains to show that $u_n \to u$ as $n \to \infty$ in $H^1$. But this follows from
\[ \|u_n - u\|_1^2 = \|u_n - u\|^2 + \|u_n' - u'\|^2 = \|u_n - u\|^2 + \|u_n' - g\|^2 \to 0 \quad \text{as } n \to \infty. \]
Hence $\exists u \in H^1(I)$ such that $u_n \to u$ in $H^1$, i.e. $H^1(I)$ is complete. $\square$

Remark: Consider the map $T : H^1 \to L^2 \times L^2$ given by $Tu = [u, u']$, $u \in H^1$. Equipping $L^2 \times L^2$ with the norm $\|[u,v]\| = (\|u\|^2 + \|v\|^2)^{1/2}$, we see that $T$ is an isometry of $H^1$ onto a closed subspace of $L^2 \times L^2$. It follows that $H^1$ is separable, since $L^2$ is.

The following theorem will be very important in the sequel:

Theorem 2.2. If $u \in H^1(I)$, then there exists $\tilde u \in C(\bar I)$ such that $u = \tilde u$ a.e. in $I$ and
\[ \tilde u(x) - \tilde u(y) = \int_y^x u'(t)\,dt, \quad \forall x, y \in \bar I. \]

Before proving the theorem we make some comments on its content. Note first that if $u \in H^1(I)$ and $u = v$ a.e. on $I$, then $v \in H^1(I)$. Then Theorem 2.2 tells us that in the equivalence class of an element $u \in H^1(I)$ there is one (and only one, since $\tilde u, \tilde v \in C(\bar I)$, $\tilde u = \tilde v$ a.e. on $I$ $\Rightarrow \tilde u(x) = \tilde v(x)$ $\forall x \in \bar I$) continuous representative of $u$, denoted in the theorem by $\tilde u$. Hence, when there is need to do so, we shall use instead of $u$ its continuous representative $\tilde u$. For example, as the value $u(x)$ for some $x \in \bar I$ (not well-defined if $u \in L^2$) we mean the value of $\tilde u$ at that $x$. Sometimes we shall replace $u$ by $\tilde u$ with no special mention, or by just noting that $u$ is continuous "upon modification on a set of measure zero in $I$". We emphasize that the statement "$\exists \tilde u \in C(\bar I)$ such that $u = \tilde u$ a.e. in $I$" is different from the statement that "$u$ is continuous a.e. in $I$".

For the proof of Theorem 2.2 we shall need two lemmata.

Lemma 2.5. Let $f \in L^1_{loc}(I)$ be such that
\[ \int_I f\varphi' = 0 \quad \forall\varphi \in C_c^1(I). \]
Then there exists a constant $C$ such that $f = C$ a.e. in $I$.

Proof. Let $\psi$ be a fixed function in $C_c(I)$ such that $\int_I \psi = 1$. We shall show that, given $w \in C_c(I)$, there exists $\varphi \in C_c^1(I)$ such that $\varphi' = w - (\int_I w)\psi$. Indeed, given $w \in C_c(I)$, consider $h(x) = w(x) - (\int_I w)\psi(x)$. Clearly $h \in C_c(I)$. Put $\varphi(x) = \int_a^x h(t)\,dt$. Let $\mathrm{supp}\,h \subset [c,d] \subset (a,b) = I$. Clearly, for $a < y < c$, $\varphi(y) = \int_a^y h(t)\,dt = 0$, and for $d < z < b$
\[ \varphi(z) = \int_a^z h(t)\,dt = \int_a^b h(t)\,dt = \int_I h = \int_I w - \Big(\int_I w\Big)\Big(\int_I \psi\Big) = \int_I w - \int_I w = 0. \]
It follows that $\varphi \in C_c(I)$. Also $\varphi'(x) = h(x) \in C_c(I)$, i.e. $\varphi \in C_c^1(I)$, and $\varphi' = h = w - (\int_I w)\psi$. Now, by hypothesis, $\int_I f\varphi' = 0$ $\forall\varphi \in C_c^1(I)$. In particular, for each $w \in C_c(I)$,
\[ \int_I f\Big(w - \Big(\int_I w\Big)\psi\Big) = 0 \ \Rightarrow\ \int_I fw - \Big(\int_I f\psi\Big)\int_I w = 0 \ \Rightarrow\ \int_I \Big(f - \int_I f\psi\Big)w = 0. \]
By Lemma 2.1 we conclude that $f(x) = \int_I f\psi$ a.e. on $I$, i.e. $f(x) = C \equiv \int_I f\psi$ a.e. on $I$. $\square$

Lemma 2.6. Let $g \in L^1_{loc}(I)$. For $y_0 \in I$ fixed, put
\[ v(x) = \int_{y_0}^x g(t)\,dt, \quad x \in I. \]
Then $v \in C(I)$ and
\[ \int_I v\varphi' = -\int_I g\varphi \quad \forall\varphi \in C_c^1(I). \]

Proof. That $v \in C(I)$ when $g \in L^1_{loc}(I)$ is a well-known fact from measure theory.

(Figure: the triangular region $A = \{(x,t) : a \le x \le t \le y_0\}$ in the $(x,t)$-plane, over which Fubini's theorem is applied below.)

We now have for $\varphi \in C_c^1(I)$ that
\[ \int_I v\varphi' = \int_I \Big(\int_{y_0}^x g(t)\,dt\Big)\varphi'(x)\,dx = \int_a^{y_0} dx\,\Big(-\int_x^{y_0} g(t)\,dt\Big)\varphi'(x) + \int_{y_0}^b dx\,\Big(\int_{y_0}^x g(t)\,dt\Big)\varphi'(x). \]
Now
\[ \int_a^{y_0} dx\,\Big(-\int_x^{y_0} g(t)\,dt\Big)\varphi'(x) = (\text{since } g(t)\varphi'(x) \text{ is integrable on } A) \]
\[ = -\int\!\!\int_A g(t)\varphi'(x)\,dx\,dt = -\int_a^{y_0} dt\int_a^t dx\; g(t)\varphi'(x) = -\int_a^{y_0} g(t)\big(\varphi(t) - \varphi(a)\big)\,dt = -\int_a^{y_0} g(t)\varphi(t)\,dt, \]
since $\varphi(a) = 0$ ($\varphi$ has compact support in $I$). Similarly we may prove that
\[ \int_{y_0}^b dx\,\Big(\int_{y_0}^x g(t)\,dt\Big)\varphi'(x) = -\int_{y_0}^b g(t)\varphi(t)\,dt. \]
We conclude that
\[ \int_I v\varphi' = -\int_a^{y_0} g\varphi - \int_{y_0}^b g\varphi = -\int_I g\varphi \quad \forall\varphi \in C_c^1(I). \quad \square \]


Proof of Theorem 2.2. Fix $y_0 \in I$. Note that since $u \in H^1(I)$, $u' \in L^2(I) \Rightarrow u' \in L^1_{loc}(I)$. Put
\[ \bar u(x) = \int_{y_0}^x u'(t)\,dt. \]
By Lemma 2.6, $\bar u \in C(I)$ and $\int_I \bar u\varphi' = -\int_I u'\varphi$ $\forall\varphi \in C_c^1(I)$. But by definition of $u'$, $\int_I u\varphi' = -\int_I u'\varphi$ $\forall\varphi \in C_c^1(I)$. Therefore
\[ \int_I (\bar u - u)\varphi' = 0 \quad \forall\varphi \in C_c^1(I). \]
By Lemma 2.5, we conclude that there exists a constant $C$ such that $\bar u - u = C$ a.e. on $I$. Define now $\tilde u(x) = \bar u(x) - C$. It follows that $\tilde u \in C(I)$ and $u = \tilde u$ a.e. on $I$. Moreover for $x, y \in I$,
\[ \tilde u(x) - \tilde u(y) = \bar u(x) - \bar u(y) = \int_{y_0}^x u' - \int_{y_0}^y u' = \int_y^x u'(t)\,dt. \quad \square \]

Remark: Lemma 2.6 gives in particular that the primitive (antiderivative) $v$ of a function $g \in L^2(I)$ is in $H^1(I)$, provided $v \in L^2(I)$. (The latter fact is always true if $I$ is bounded.)

The following theorem gives a technical tool that will be often used in the sequel.

Theorem 2.3 (Extension operator). There exists an extension operator
\[ E : H^1(I) \to H^1(\mathbb{R}), \]
linear and continuous, such that

(i) $Eu|_I = u$ $\forall u \in H^1(I)$ ($f|_I$ denotes the restriction of $f$ to $I$),

(ii) $\|Eu\|_{L^2(\mathbb{R})} \le C\|u\|_{L^2(I)}$ $\forall u \in H^1(I)$,

(iii) $\|Eu\|_{H^1(\mathbb{R})} \le C\|u\|_{H^1(I)}$ $\forall u \in H^1(I)$.

(In (ii) we can take $C = 2\sqrt{2}$ and in (iii) $C = C_0(1 + 1/\ell(I))$, where $C_0$ is some constant, independent of $u$ and $I$, and $\ell(I)$ the length of $I$ — possibly $\ell(I) = \infty$.)

Proof. We begin with the case $I = (0,\infty)$. We will show that the extension operator defined by even reflection about $x = 0$, i.e. by
\[ (Eu)(x) \equiv u^*(x) = \begin{cases} u(x) & \text{if } x \ge 0 \\ u(-x) & \text{if } x < 0, \end{cases} \]
$u \in H^1(I)$, solves the problem. Indeed
\[ \|u^*\|^2_{L^2(\mathbb{R})} = \int_{-\infty}^0 (u(-x))^2\,dx + \int_0^\infty (u(x))^2\,dx = 2\|u\|^2_{L^2(I)}. \]
So (ii) is satisfied. (Obviously $E$ is linear and satisfies (i).) Now put
\[ v(x) = \begin{cases} u'(x) & \text{if } x > 0 \\ -u'(-x) & \text{if } x < 0. \end{cases} \]
Clearly $v \in L^2(\mathbb{R})$, since $\|v\|^2_{L^2(\mathbb{R})} = 2\|u'\|^2_{L^2(I)}$. By Theorem 2.2 we also have that
\[ u^*(x) - u(0) = \int_0^x u'(t)\,dt = \int_0^x v(t)\,dt \quad \text{for } x \ge 0. \]
Also, for $x < 0$, substituting $t = -s$,
\[ u^*(x) - u(0) = u(-x) - u(0) = \int_0^{-x} u'(t)\,dt = \int_0^x \big(-u'(-s)\big)\,ds = \int_0^x v(s)\,ds. \]
Hence $u^*(x) - u(0) = \int_0^x v(t)\,dt$, $x \in \mathbb{R}$. Since $u^* \in L^2(\mathbb{R})$ and $v \in L^2(\mathbb{R})$, it follows by Lemma 2.6 (see remark after end of proof of Theorem 2.2) that $u^* \in H^1(\mathbb{R})$ and $(u^*)' = v$. Hence
\[ \|u^*\|^2_{H^1(\mathbb{R})} = \|u^*\|^2_{L^2(\mathbb{R})} + \|v\|^2_{L^2(\mathbb{R})} = 2\big(\|u\|^2_{L^2(I)} + \|u'\|^2_{L^2(I)}\big) = 2\|u\|^2_{H^1(I)}. \]
Hence in the case $I = (0,\infty)$, with $Eu = u^*$, (ii) and (iii) are satisfied (as equalities) with $C = \sqrt{2}$. (The proof holds for any unbounded interval of the form $(a,\infty)$ or $(-\infty,a)$, $a \in \mathbb{R}$. For example, for $u \in H^1((a,\infty))$, define $Eu$ by reflecting evenly about $x = a$, i.e. as
\[ (Eu)(x) = \begin{cases} u(x) & \text{if } x > a \\ u(2a-x) & \text{if } x \le a, \end{cases} \]
and the proof follows with the same constants $C$ mutatis mutandis.)
We now go to the case of a bounded interval. It suffices to consider the case of $I = (0,1)$. Consider a fixed function $\eta \in C^1(\mathbb{R})$, $0 \le \eta(x) \le 1$ $\forall x \in \mathbb{R}$, such that
\[ \eta(x) = \begin{cases} 1 & \text{if } x < \tfrac14 \\ 0 & \text{if } x > \tfrac34, \end{cases} \]
and for every $f$ defined on $(0,1)$ denote by $\tilde f$ its extension by zero to $(0,\infty)$, i.e. put
\[ \tilde f(x) = \begin{cases} f(x) & \text{if } x \in (0,1) \\ 0 & \text{if } x \ge 1. \end{cases} \]
Now if $u \in H^1(I)$ it follows that $\widetilde{\eta u} \in H^1((0,\infty))$ and that $(\widetilde{\eta u})' = \widetilde{\eta'u} + \widetilde{\eta u'}$, where by $\widetilde{\eta u'}$ we mean the extension by zero to $(0,\infty)$ of $\eta u' \in L^2((0,1))$. To see this, note first that $\widetilde{\eta u} \in L^2((0,\infty))$ since
\[ \int_0^\infty (\widetilde{\eta u})^2 = \int_0^{3/4} \eta^2u^2 \le \|u\|^2_{L^2((0,1))}. \]
Moreover, for any $\varphi \in C_c^1((0,\infty))$ we have that
\[ \int_0^\infty \widetilde{\eta u}\,\varphi' = \int_0^1 \eta u\varphi' = \int_0^1 u\big((\eta\varphi)' - \eta'\varphi\big) = \int_0^1 u(\eta\varphi)' - \int_0^1 u\eta'\varphi \]
\[ = \big(\text{since } \eta\varphi \in C_c^1((0,1)) \text{ for } \varphi \in C_c^1((0,\infty)), \text{ and } u \in H^1((0,1))\big) \]
\[ = -\int_0^1 u'\eta\varphi - \int_0^1 \eta'u\varphi = -\int_0^1 (\eta'u + \eta u')\varphi = -\int_0^\infty \tilde g\varphi, \]
where
\[ \tilde g(x) = \begin{cases} \eta'u + \eta u' & \text{if } x \in (0,1) \\ 0 & \text{if } x \ge 1. \end{cases} \]
Now $\tilde g \in L^2((0,\infty))$ since
\[ \|\tilde g\|^2_{L^2((0,\infty))} = \int_0^1 (\eta'u + \eta u')^2 \le 2\Big(\int_0^1 (u')^2 + \max_{0\le x\le 1}|\eta'(x)|^2\int_0^1 u^2\Big) \le C_1\|u\|^2_{H^1(I)}. \]
(Note that we can easily arrange that $\max_{0\le x\le 1}|\eta'(x)|$ be equal to e.g. $2.5$.) It follows that $\widetilde{\eta u} \in H^1((0,\infty))$ and $(\widetilde{\eta u})' = \tilde g = \widetilde{\eta'u} + \widetilde{\eta u'}$.

Returning to the proof of the theorem, for $u \in H^1(I)$, $I = (0,1)$, write $u$ as
\[ u = \eta u + (1 - \eta)u, \]
as above. The function $\eta u$ can be extended to $(0,\infty)$ by $\widetilde{\eta u} \in H^1((0,\infty))$ as before. Clearly $\|\widetilde{\eta u}\|^2_{L^2((0,\infty))} \le \|u\|^2_{L^2((0,1))}$. Also, as above,
\[ \|(\widetilde{\eta u})'\|^2_{L^2((0,\infty))} = \int_0^\infty \tilde g^2 \le 2\Big(\|u'\|^2_{L^2(I)} + \max_{0\le x\le 1}|\eta'(x)|^2\,\|u\|^2_{L^2((0,1))}\Big) \le C_1\|u\|^2_{H^1(I)}. \]
It follows that
\[ \|\widetilde{\eta u}\|_{H^1((0,\infty))} \le C_2\|u\|_{H^1(I)}. \]

Now, extend $\widetilde{\eta u} \in H^1((0,\infty))$ as in the first part of the proof to a function $v_1(x) \in H^1(\mathbb{R})$ by even reflection about $x = 0$. It follows that
\[ \|v_1\|_{L^2(\mathbb{R})} = \sqrt{2}\,\|\widetilde{\eta u}\|_{L^2((0,\infty))} \le \sqrt{2}\,\|u\|_{L^2(I)} \]
and that
\[ \|v_1\|_{H^1(\mathbb{R})} = \sqrt{2}\,\|\widetilde{\eta u}\|_{H^1((0,\infty))} \le \sqrt{2}\,C_2\|u\|_{H^1(I)}. \]
(It is clear that $v_1|_I = \eta u$ and that the operation $u \mapsto v_1$ is linear in $u$.)

Analogously the function $(1-\eta)u$ (for which $(1-\eta)u = 0$ for $0 \le x \le 1/4$) can be extended to $(-\infty,1)$ by $\widetilde{(1-\eta)u}$, where now the extension by zero is taken to the left, i.e.
\[ \tilde u(x) = \begin{cases} u(x) & \text{if } 0 < x < 1 \\ 0 & \text{if } -\infty < x \le 0. \end{cases} \]
We obtain again $\widetilde{(1-\eta)u} \in H^1((-\infty,1))$ with
\[ \|\widetilde{(1-\eta)u}\|_{L^2((-\infty,1))} \le \|u\|_{L^2(I)} \]
and
\[ \|\widetilde{(1-\eta)u}\|_{H^1((-\infty,1))} \le C_2\|u\|_{H^1(I)}. \]
Extend now $\widetilde{(1-\eta)u}$ to a function $v_2 \in H^1(\mathbb{R})$ by (even) reflection about $x = 1$. It follows that
\[ \|v_2\|_{L^2(\mathbb{R})} = \sqrt{2}\,\|\widetilde{(1-\eta)u}\|_{L^2((-\infty,1))} \le \sqrt{2}\,\|u\|_{L^2(I)} \]
and that
\[ \|v_2\|_{H^1(\mathbb{R})} = \sqrt{2}\,\|\widetilde{(1-\eta)u}\|_{H^1((-\infty,1))} \le \sqrt{2}\,C_2\|u\|_{H^1(I)}, \]
that $v_2|_I = (1-\eta)u$, and that $(1-\eta)u \mapsto v_2$ is linear.

We define now the operator $E$ as $Eu = v_1 + v_2$. Clearly $E$ satisfies (i) and is linear; (ii) and (iii) follow by the above and the triangle inequality. (Note that when $I = (a,b)$, $\eta$ must be redefined as
\[ \eta_{a,b}(x) = \eta\Big(\frac{x-a}{b-a}\Big), \]
so that
\[ \eta'_{a,b}(x) = \frac{1}{b-a}\,\eta'\Big(\frac{x-a}{b-a}\Big).\Big) \quad \square \]
The following result is a basic density theorem for $H^1(I)$ and will be used very often in the sequel.

Theorem 2.4. Let $u \in H^1(I)$. There exists a sequence $\{u_n\}_{n=1,2,\dots}$ of functions in $C_c^\infty(\mathbb{R})$ such that
\[ u_n|_I \to u \ \text{ in } H^1(I), \quad n \to \infty. \]

Comment: The theorem asserts that if $I = \mathbb{R}$, then $C_c^\infty(\mathbb{R})$ is dense in $H^1(\mathbb{R})$. Otherwise, $C_c^\infty(I)$ is not dense in $H^1(I)$ — in fact we shall see later that the closure of $C_c^\infty(I)$ in $H^1(I)$ is the space $H_0^1$ consisting of those functions of $H^1(I)$ which are zero at the boundary of $I$. If $I$ is bounded, Theorem 2.4 asserts that there is a sequence of functions $u_n \in C^\infty(\bar I)$ such that $u_n \to u$ in $H^1(I)$.

Proof of Theorem 2.4. First note that it suffices to consider the case $I = \mathbb{R}$. For suppose that the result holds for $\mathbb{R}$. If $I \neq \mathbb{R}$, extend $u$ to $Eu$ in $H^1(\mathbb{R})$ as in Theorem 2.3. Then there exists a sequence $u_n \in C_c^\infty(\mathbb{R})$ such that $\|u_n - Eu\|_{H^1(\mathbb{R})} \to 0$, $n \to \infty$. But then
\[ \|u_n|_I - u\|_{H^1(I)} = \|u_n - Eu\|_{H^1(I)} \le \|u_n - Eu\|_{H^1(\mathbb{R})} \to 0, \quad n \to \infty, \]
i.e. the result holds for $I$.


Hence consider the case $I = \mathbb{R}$. The approximating sequence is constructed by regularization and truncation as $u_n = \zeta_n(\rho_n * u)$. Here $\{\rho_n\}$, $n = 1, 2, \dots$, is the regularizing sequence defined in Def. 2.1 and $\zeta_n = \zeta_n(x) \in C_c^\infty(\mathbb{R})$ is a truncation function, defined for $n = 1, 2, \dots$ by
\[ \zeta_n(x) = \zeta\Big(\frac{x}{n}\Big), \]
where $\zeta(x)$ is a fixed function in $C_c^\infty(\mathbb{R})$ such that
\[ \zeta(x) = \begin{cases} 1 & \text{if } |x| \le 1 \\ 0 & \text{if } |x| \ge 2. \end{cases} \]
Hence $\zeta_n(x) = 1$ for $|x| \le n$ and $\zeta_n(x) = 0$ for $|x| \ge 2n$. Moreover,
\[ |\zeta_n'(x)| = \frac1n\Big|\zeta'\Big(\frac{x}{n}\Big)\Big| \le \frac{C}{n}, \quad \text{where } C = \max_{x\in\mathbb{R}}|\zeta'(x)|. \]
Note, by Lebesgue's dominated convergence theorem, that we have $\zeta_nf \to f$ as $n \to \infty$ in $L^2$ for every $f \in L^2(\mathbb{R})$. Let now $u_n = \zeta_n(\rho_n * u)$. Clearly $u_n \in C_c^\infty(\mathbb{R})$, since $\zeta_n \in C_c^\infty(\mathbb{R})$ and $\rho_n * u \in C^\infty(\mathbb{R})$, cf. Lemma 2.3. We have
\[ u_n - u = \zeta_n(\rho_n * u) - u = \zeta_n[(\rho_n * u) - u] + (\zeta_nu - u). \]
It follows that
\[ \|u_n - u\|_{L^2(\mathbb{R})} \le \|\zeta_n[(\rho_n * u) - u]\|_{L^2(\mathbb{R})} + \|\zeta_nu - u\|_{L^2(\mathbb{R})} \le \|\rho_n * u - u\|_{L^2(\mathbb{R})} + \|\zeta_nu - u\|_{L^2(\mathbb{R})} \to 0, \quad n \to \infty, \]
by the above and Lemma 2.3(iii). Now $u_n' = \zeta_n'(\rho_n * u) + \zeta_n(\rho_n * u)'$. (Note that for $u \in H^1(\mathbb{R})$, $\rho_n * u \in H^1(\mathbb{R})$ and $(\rho_n * u)' = \rho_n * u'$: with $\check\rho_n(x) = \rho_n(-x)$, the interested reader may verify that for $\varphi \in C_c^1(\mathbb{R})$ the following equalities hold:
\[ \int (\rho_n * u)\varphi' = \int u(\check\rho_n * \varphi') = \int u(\check\rho_n * \varphi)' = -\int u'(\check\rho_n * \varphi) = -\int (\rho_n * u')\varphi.\Big) \]
Hence, since $u_n' - u' = \zeta_n'(\rho_n * u) + \zeta_n[(\rho_n * u') - u'] + (\zeta_nu' - u')$, we have
\[ \|u_n' - u'\|_{L^2(\mathbb{R})} \le \|\zeta_n'(\rho_n * u)\|_{L^2(\mathbb{R})} + \|\zeta_n[(\rho_n * u') - u']\|_{L^2(\mathbb{R})} + \|\zeta_nu' - u'\|_{L^2(\mathbb{R})} \]
\[ \le \max_{x\in\mathbb{R}}|\zeta_n'(x)|\,\|\rho_n * u\|_{L^2(\mathbb{R})} + \|\rho_n * u' - u'\|_{L^2(\mathbb{R})} + \|\zeta_nu' - u'\|_{L^2(\mathbb{R})} \to 0 \ \text{ as } n \to \infty, \]
by Lemma 2.3, since $\|\rho_n * u\|_{L^2(\mathbb{R})}$, $n = 1, 2, 3, \dots$, is bounded and $\max_x|\zeta_n'(x)| \le C/n \to 0$. $\square$


We are able now to prove Sobolev's imbedding theorem for $H^1(I)$.

Theorem 2.5. There exists a constant $C$ (depending only on $\ell(I)$) such that
\[ \|u\|_{L^\infty(I)} \le C\|u\|_{H^1(I)} \quad \forall u \in H^1(I). \tag{2.1} \]
(We say that $H^1(I) \subset L^\infty(I)$ — i.e. that $H^1(I) \subset C(\bar I)$, in view of Theorem 2.2, if $I$ is bounded — with continuous imbedding.)

Proof. Again it suffices to prove the result for $I = \mathbb{R}$. (For suppose it holds for $\mathbb{R}$ and let $u \in H^1(I)$. Extend $u$ to $Eu$ in $H^1(\mathbb{R})$ as in Theorem 2.3. Then
\[ \|u\|_{L^\infty(I)} \le \|Eu\|_{L^\infty(\mathbb{R})} \le C\|Eu\|_{H^1(\mathbb{R})} \le C'\|u\|_{H^1(I)}, \]
using (iii) in Theorem 2.3 and (2.1) for $I = \mathbb{R}$.) Suppose first that $v \in C_c^\infty(\mathbb{R})$. Then for every $x \in \mathbb{R}$:
\[ v^2(x) = \int_{-\infty}^x (v^2)' = 2\int_{-\infty}^x vv' \le 2\|v\|_{L^2(\mathbb{R})}\|v'\|_{L^2(\mathbb{R})} \le \|v\|^2_{L^2(\mathbb{R})} + \|v'\|^2_{L^2(\mathbb{R})} = \|v\|^2_{H^1(\mathbb{R})}. \]
Hence $\|v\|_{L^\infty(\mathbb{R})} \le \|v\|_{H^1(\mathbb{R})}$ $\forall v \in C_c^\infty(\mathbb{R})$, i.e. (2.1) holds on $C_c^\infty(\mathbb{R})$. Now given $u \in H^1(\mathbb{R})$, since $C_c^\infty(\mathbb{R})$ is dense in $H^1(\mathbb{R})$ (Theorem 2.4), we can find a sequence $\{u_n\}_{n=1,2,\dots}$ in $C_c^\infty(\mathbb{R})$ such that $u_n \to u$ in $H^1(\mathbb{R})$. It follows that $u_n \to u$ in $L^2(\mathbb{R})$. Therefore there is a subsequence of $u_n$ (denote it again by $u_n$, i.e. consider that subsequence to be the original sequence) such that $u_n(x) \to u(x)$ a.e. on $\mathbb{R}$ as $n \to \infty$. By (2.1), which was established on $C_c^\infty(\mathbb{R})$, we have that $\{u_n\}$ is Cauchy in $L^\infty(\mathbb{R})$ (since it is Cauchy in $H^1(\mathbb{R})$). Therefore it converges in $L^\infty$ to some element $\bar u \in L^\infty(\mathbb{R})$. It follows that $u = \bar u$ a.e. on $\mathbb{R}$, i.e. that $u \in L^\infty(\mathbb{R})$. Taking limits in $\|u_n\|_{L^\infty(\mathbb{R})} \le \|u_n\|_{H^1(\mathbb{R})}$ we obtain (2.1). $\square$
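The pointwise bound $v^2(x) \le \|v\|^2_{H^1(\mathbb{R})}$ used in the proof can be sampled numerically. The sketch below (Gaussians are an arbitrary choice of smooth, rapidly decaying functions; the integrals are approximated by Riemann sums) checks $\|v\|_{L^\infty} \le \|v\|_{H^1(\mathbb{R})}$ for a few test functions:

```python
import numpy as np

# Check ||v||_inf <= ||v||_{H^1(R)} = (||v||^2 + ||v'||^2)^{1/2}
# for v(x) = exp(-k x^2), several k, on a wide grid.
x, dx = np.linspace(-20.0, 20.0, 400001, retstep=True)
for k in (0.5, 1.0, 3.0):
    v = np.exp(-k * x**2)
    dv = -2 * k * x * v
    sup = np.max(np.abs(v))                              # ||v||_inf = 1
    h1 = np.sqrt(np.sum(v**2) * dx + np.sum(dv**2) * dx) # H^1 norm (Riemann sum)
    assert sup <= h1 + 1e-8
    print(f"k = {k}: ||v||_inf = {sup:.3f} <= ||v||_H1 = {h1:.3f}")
```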

Remarks.

(i) If $I$ is bounded, the imbedding $H^1(I) \subset C(\bar I)$ is compact. This follows from Theorem 2.2: if $u \in N$ (= the unit ball in $H^1(I)$ with center zero), we have
\[ |u(x) - u(y)| = \Big|\int_y^x u'(t)\,dt\Big| \le \|u'\|_{L^2(I)}\,|x - y|^{1/2} \quad \forall x, y \in \bar I. \]
Hence $\forall u \in N$, $|u(x) - u(y)| \le |x - y|^{1/2}$, and the conclusion follows from the Arzelà–Ascoli theorem.

(ii) (2.1) for $I = \mathbb{R}$ implies that if $u \in H^1(\mathbb{R})$, then
\[ \lim_{|x|\to\infty} u(x) = 0. \]
For if $u_n \in C_c^\infty(\mathbb{R})$, $u_n \to u$ in $H^1(\mathbb{R})$, then $\|u_n - u\|_{L^\infty(\mathbb{R})} \to 0$, $n \to \infty$. Hence $\forall\varepsilon > 0$ $\exists N$ such that $\|u_N - u\|_{L^\infty(\mathbb{R})} < \varepsilon$ $\Rightarrow |u(x)| < \varepsilon$ for $|x|$ sufficiently large, since $u_N \in C_c^\infty(\mathbb{R})$; i.e. $\lim_{|x|\to\infty} u(x) = 0$.

The following two propositions follow from Theorem 2.5 and they are useful in the applications.

Proposition 2.1. Let $u, v \in H^1(I)$. Then $uv \in H^1(I)$ and $(uv)' = u'v + uv'$. Moreover we can integrate by parts:
\[ \int_y^x u'v = u(x)v(x) - u(y)v(y) - \int_y^x uv' \quad \forall x, y \in \bar I. \]

Proof. Since $u \in H^1(I)$, Theorem 2.5 gives $u \in L^\infty(I)$. Hence $v \in L^2(I) \Rightarrow uv \in L^2(I)$. Now let $u_n, v_n$, $n = 1, 2, \dots$, be sequences in $C_c^\infty(\mathbb{R})$ such that $u_n|_I \to u$ in $H^1(I)$ and $v_n|_I \to v$ in $H^1(I)$ as $n \to \infty$ (Theorem 2.4). It follows by Theorem 2.5 that $u_n|_I \to u$ in $L^\infty(I)$ and $v_n|_I \to v$ in $L^\infty(I)$. Now
\[ \|u_nv_n - uv\|_{L^2(I)} \le \|u_nv_n - u_nv\|_{L^2(I)} + \|u_nv - uv\|_{L^2(I)} \le \|u_n\|_{L^\infty(I)}\|v_n - v\|_{L^2(I)} + \|v\|_{L^\infty(I)}\|u_n - u\|_{L^2(I)} \to 0, \quad n \to \infty. \]
Hence $u_nv_n|_I \to uv$ in $L^2(I)$. In addition $(u_nv_n)' = u_n'v_n + u_nv_n' \to u'v + uv'\ (\in L^2(I))$ in $L^2(I)$. (To see this note e.g. that
\[ \|u_n'v_n - u'v\|_{L^2(I)} \le \|u_n'v_n - u'v_n\|_{L^2(I)} + \|u'v_n - u'v\|_{L^2(I)} \le \|v_n\|_{L^\infty(I)}\|u_n' - u'\|_{L^2(I)} + \|u'\|_{L^2(I)}\|v_n - v\|_{L^\infty(I)} \to 0, \quad n \to \infty. \]
Similarly $u_nv_n' \to uv'$ in $L^2(I)$.)

We now have a sequence $\psi_n \equiv u_nv_n|_I$ in $H^1(I)$ such that $\psi_n \xrightarrow{L^2} uv$, and such that $\psi_n' \xrightarrow{L^2} u'v + uv' \in L^2(I)$. Hence $\psi_n$ is Cauchy in $L^2$ and $\psi_n'$ is Cauchy in $L^2$ $\Rightarrow \psi_n$ is Cauchy in $H^1$ $\Rightarrow \psi_n \to w$ in $H^1$. Hence $\psi_n \to w$ in $L^2$ $\Rightarrow w = uv$, and $\psi_n' \to w'$ in $L^2$ $\Rightarrow w' = u'v + uv'$; thus $uv \in H^1(I)$ and $(uv)' = u'v + uv'$. The integration by parts formula follows by integrating both members of $(uv)' = u'v + uv'$ and using, since $uv \in H^1(I)$, Theorem 2.2. $\square$

Remark: It is well known that $u, v \in L^2(I)$ does not imply $uv \in L^2(I)$. (Take e.g. $I = (0,1)$, $u = v = x^{-1/4}$.) Hence $\{L^2(I), \|\cdot\|_{L^2(I)}\}$ is not a Banach algebra, whereas $\{H^1(I), \|\cdot\|_{H^1(I)}\}$ is.

Proposition 2.2. Let $G \in C^1(\mathbb{R})$ be such that $G(0) = 0$ and let $u \in H^1(I)$. Then $G(u) \in H^1(I)$ and $(G(u(x)))' = G'(u(x))\,u'(x)$.

Proof. Since $u \in H^1(I)$, $u \in L^\infty(I)$. Let $M = \|u\|_{L^\infty(I)}$. Since $G \in C^1(\mathbb{R})$ and $G(0) = 0$, by the mean value theorem, given $s$ there exists $\xi$ such that $G(s) = G'(\xi)s$. Hence, given $\varepsilon > 0$, there exists a constant $C = C(M, G, \varepsilon)$ such that
\[
|G(s)| \le C|s| \quad \text{for } s \in [-M-\varepsilon,\, M+\varepsilon]. \tag{2.2}
\]
Since $|u(x)| \le \|u\|_{L^\infty(I)}$ a.e. on $I$, it follows that $-M \le u(x) \le M$ a.e. on $I$, i.e. that $|G(u(x))| \le C|u(x)|$ a.e. on $I$. Since $u \in L^2(I)$ it follows that $G(u) \in L^2(I)$. Also, since $G'$ is continuous on $[-M-\varepsilon, M+\varepsilon]$, we have $|G'(u(x))| \le C$ a.e. on $I$ (with a possibly larger constant, still denoted $C$). It follows that $|G'(u)||u'| \le C|u'|$ a.e., so $G'(u)u' \in L^2(I)$, since $u' \in L^2(I)$. It remains to show that
\[
\int_I G(u)\varphi' = -\int_I G'(u)u'\varphi \qquad \forall \varphi \in C_c^1(I).
\]
Since $u \in H^1(I)$, it follows by Theorem 2.4 that there exists a sequence $\{u_n\} \subset C_c^\infty(\mathbb{R})$ such that $u_n \to u$ in $H^1(I)$. Moreover, by Theorem 2.5 we have that $u_n \to u$ in $L^\infty(I)$. By continuity, $G(u_n) \to G(u)$ in $L^\infty(I)$. Since $\|u_n\|_{L^\infty(I)} \to \|u\|_{L^\infty(I)}$, it follows that for $n$ large enough $\|u_n\|_{L^\infty(I)} \le M + \varepsilon$. Hence (2.2) gives that for $n$ large enough $|G(u_n)| \le C|u_n|$, so $G(u_n) \in L^2(I)$ and $\|G(u_n)\|_{L^2(I)} \le C\|u_n\|_{L^2(I)}$. By the dominated convergence theorem it follows that $G(u_n) \to G(u)$ in $L^2(I)$. Hence, for every $\varphi \in C_c^1(I)$,
\[
\int_I G(u_n)\varphi' \to \int_I G(u)\varphi', \quad \text{as } n \to \infty.
\]
Now
\[
\|G'(u_n)u_n' - G'(u)u'\|_{L^2(I)} \le \|G'(u_n)u_n' - G'(u_n)u'\|_{L^2(I)} + \|G'(u_n)u' - G'(u)u'\|_{L^2(I)}
\le \|G'(u_n)\|_{L^\infty(I)}\|u_n' - u'\|_{L^2(I)} + \|u'\|_{L^2(I)}\|G'(u_n) - G'(u)\|_{L^\infty(I)}.
\]
Now $\|u_n' - u'\|_{L^2(I)} \to 0$ since $u_n \to u$ in $H^1(I)$. Also, for $n$ large enough, $\|u_n\|_{L^\infty(I)} \le M + \varepsilon$ and therefore $\|G'(u_n)\|_{L^\infty(I)} \le C$. Since $u_n \to u$ in $L^\infty(I)$, by continuity we have that $G'(u_n) \to G'(u)$ in $L^\infty(I)$. It follows that $G'(u_n)u_n' \to G'(u)u'$ in $L^2(I)$; hence $\int_I G'(u_n)u_n'\varphi \to \int_I G'(u)u'\varphi$ $\forall \varphi \in C_c^1(I)$. Since $\varphi \in C_c^1(I)$ and $u_n \in C_c^\infty(\mathbb{R})$, we have that
\[
\int_I G(u_n)\varphi' = -\int_I G'(u_n)u_n'\varphi, \qquad n = 1, 2, 3, \dots.
\]
Letting $n \to \infty$ we obtain the desired equality
\[
\int_I G(u)\varphi' = -\int_I G'(u)u'\varphi. \qquad \square
\]
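The identity just proved can be checked numerically. The sketch below (illustration only; the particular choices $G(s) = \sin s$, $u(x) = x(1-x)$ and the test function $\varphi(x) = (x(1-x))^2$ are mine, not from the text) verifies $\int_I G(u)\varphi' = -\int_I G'(u)u'\varphi$ by quadrature; since $\varphi$ and $\varphi'$ vanish at the endpoints, it behaves like a compactly supported test function for this purpose:

```python
import math

def integral(f, a=0.0, b=1.0, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

G, dG = math.sin, math.cos            # G in C^1(R), G(0) = 0
u    = lambda x: x * (1.0 - x)        # u in H^1(0,1)
du   = lambda x: 1.0 - 2.0 * x
phi  = lambda x: (x * (1.0 - x)) ** 2 # phi and phi' vanish at 0 and 1
dphi = lambda x: 2.0 * x * (1.0 - x) * (1.0 - 2.0 * x)

lhs = integral(lambda x: G(u(x)) * dphi(x))
rhs = -integral(lambda x: dG(u(x)) * du(x) * phi(x))
print(abs(lhs - rhs))   # small: the two sides agree up to quadrature error
```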

2.4 The Sobolev spaces $H^m(I)$, $m = 2, 3, 4, \dots$

In analogy to $H^1(I)$ we define for $m \ge 2$ integer the space
\[
H^m(I) = \Big\{u \in L^2(I) : \exists\, g_i \in L^2(I),\ i = 1, \dots, m, \text{ such that } \int_I u\,\varphi^{(i)} = (-1)^i \int_I g_i\varphi,\ 1 \le i \le m,\ \forall\varphi \in C_c^\infty(I)\Big\}.
\]
Here
\[
\varphi^{(i)} = \Big(\frac{d}{dx}\Big)^i \varphi.
\]
It follows easily that $H^m(I) \subset H^1(I)$ and that $g_1 = u'$, that $u' \in H^1(I)$ and that $(u')' = g_2$, etc., and that finally
\[
u^{(m-1)} = \underbrace{((u')')'\cdots{}'}_{m-1 \text{ times}} \in H^1(I) \quad\text{and}\quad u^{(m)} = (u^{(m-1)})' = g_m.
\]
We call $g_i$ the (unique; uniqueness is easy) weak (generalized) derivative of order $i$ (in the $L^2$ sense) of $u \in H^m(I)$ and define
\[
D^i u \equiv u^{(i)} \equiv g_i, \quad 1 \le i \le m.
\]
It follows easily that for $m \ge 1$
\[
H^m(I) = \{u \in H^{m-1}(I) : u' \in H^{m-1}(I)\}.
\]
Here $H^0(I) \equiv L^2(I)$. We can easily construct examples of functions in $H^m(I)$. For example, on a bounded interval, if $u \in C^1(\bar I)$ with $u''$ (classical derivative) piecewise continuous on $\bar I$, then $u \in H^2(I)$ and its weak second derivative coincides a.e. with $u''$.
We equip $H^m(I)$ with the inner product $(\cdot,\cdot)_m$, where
\[
(u,v)_m = \sum_{j=0}^m (D^j u, D^j v), \qquad (u^{(0)} \equiv D^0 u \equiv u), \quad u, v \in H^m(I).
\]
This inner product induces the norm
\[
\|u\|_m = (u,u)_m^{1/2} = \Big(\sum_{i=0}^m \|D^i u\|^2\Big)^{1/2}.
\]
An obvious modification of Theorem 2.1 shows that $\{H^m(I), (\cdot,\cdot)_m\}$ is a Hilbert space.

By definition and Theorem 2.2, it follows that if $u \in H^m(I)$, then there exists $\tilde u \in C^{m-1}(\bar I)$ such that $u = \tilde u$ a.e. on $I$ and
\[
D^i u(x) - D^i u(y) = \int_y^x D^{i+1} u(t)\,dt, \quad x, y \in \bar I, \quad i = 0, 1, 2, \dots, m-1.
\]
Also, given $u \in H^m(I)$, there exists a sequence $\{u_n\} \subset C_c^\infty(\mathbb{R})$ such that $u_n|_I \to u$ in $H^m(I)$ (density), and $H^m(I) \hookrightarrow C^{m-1}(\bar I)$ with
\[
\sum_{j=0}^{m-1} \|D^j u\|_{L^\infty(I)} \le C\|u\|_{H^m(I)}, \quad u \in H^m(I).
\]
If $u, v \in H^m(I)$, then $uv \in H^m(I)$ and
\[
D^m(uv) = \sum_{j=0}^m \binom{m}{j} D^j u\, D^{m-j} v, \quad m \ge 1 \quad \text{(Leibniz's rule)}.
\]
These results follow easily with techniques similar to the ones used in their $H^1$ counterparts.

We finally mention without proof the following interpolation result. If $1 \le j \le m-1$, then $\forall \varepsilon > 0$ $\exists\, C = C(\varepsilon, \ell(I))$ such that
\[
\|D^j u\| \le \varepsilon\|D^m u\| + C\|u\| \qquad \forall u \in H^m(I).
\]
It follows that the quantity $\|u\| + \|D^m u\|$ for $u \in H^m(I)$ is a norm on $H^m(I)$, equivalent to $\|u\|_m$.

2.5 The space $H_0^1(I)$

Definition. We define $H_0^1(I)$ to be the closure of $C_c^1(I)$ in $H^1(I)$, i.e. the (closed) subspace of $H^1(I)$ whose elements are limits in $H^1(I)$ of sequences of functions in $C_c^1(I)$.

It follows that $\{H_0^1(I), (\cdot,\cdot)_1\}$ is a complete (separable) Hilbert space.

Remarks.

(i) If $I = \mathbb{R}$, since $C_c^\infty(\mathbb{R}) \subset C_c^1(\mathbb{R}) \subset H^1(\mathbb{R})$ and $C_c^\infty(\mathbb{R})$ is dense in $H^1(\mathbb{R})$ (cf. Theorem 2.4), it follows that $C_c^1(\mathbb{R})$ is dense in $H^1(\mathbb{R})$, so that $H_0^1(\mathbb{R}) = H^1(\mathbb{R})$. However, if $I \ne \mathbb{R}$, then $H_0^1(I) \subsetneq H^1(I)$. For example, on a bounded interval $I$ consider $u(x) = c_1 e^x + c_2 e^{-x} \in C^\infty(\bar I)$, for which $u'' = u$. Hence
\[
0 = (u'' - u, \varphi) = \int_I u''\varphi - \int_I u\varphi = -\int_I u'\varphi' - \int_I u\varphi = -(u, \varphi)_1 \quad \forall \varphi \in C_c^1(I).
\]
Hence $u$ is orthogonal in $H^1(I)$ to $C_c^1(I)$, which therefore cannot be dense in $H^1(I)$.

(ii) In fact $C_c^\infty(I)$ is dense in $H_0^1(I)$. To see this let, for $u \in H_0^1(I)$ and $\varepsilon > 0$, $\varphi \in C_c^1(I)$ be such that $\|u - \varphi\|_1 \le \varepsilon/2$. Now, extending $\varphi \in C_c^1(I)$ by zero outside its support to the whole of $\mathbb{R}$, we have $\varphi \in C_c^1(\mathbb{R})$ and $\rho_n * \varphi \in C_c^\infty(I)$ for sufficiently large $n$ (cf. Lemma 2.3). Moreover (cf. Remark (iii), p. 30), $\|\rho_n * \varphi - \varphi\|_{H^1(I)} \to 0$ as $n \to \infty$. Therefore choose $n$ so that $\|\rho_n * \varphi - \varphi\|_1 \le \varepsilon/2$, from which it follows that $\|u - \rho_n * \varphi\|_1 \le \varepsilon$, i.e. that $C_c^\infty(I)$ is dense in $H_0^1(I)$.

(iii) Let us also remark that $u \in H^1(I) \cap C_c(I) \Rightarrow u \in H_0^1(I)$. In fact, if $u \in H^1(I) \cap C_c(I)$, extending $u$ by zero outside $I$ to the whole of $\mathbb{R}$, we have, by Lemma 2.3, that, for $n$ sufficiently large, $\rho_n * u \in C_c^\infty(I)$ and (cf. proof of Theorem 2.4) $\|u - \rho_n * u\|_1 \to 0$ as $n \to \infty$. Hence, by (ii) above, $u \in H_0^1(I)$.

These remarks have prepared the ground for the following theorem, which characterizes the functions in $H_0^1(I)$ in a very useful way.

Theorem 2.6. Let $u \in H^1(I)$. Then $u \in H_0^1(I)$ if and only if $u = 0$ on $\partial I$ (= the boundary of $I$).

Comments: Again, for $u \in H^1(I)$ the statement "$u = 0$ on $\partial I$" is well understood, in the sense of Theorem 2.2 (see the remarks there). Theorem 2.6 makes then $H_0^1(I)$ a very useful space, in which homogeneous (zero) boundary conditions are automatically satisfied for $u|_{\partial I}$, i.e. a space in which weak solutions of boundary-value problems such as (*), p. 25, may be naturally sought. Again assume $I \ne \mathbb{R}$, since $H_0^1(\mathbb{R}) = H^1(\mathbb{R})$.
Proof. If $u \in H_0^1(I)$, then there exists a sequence $\{u_n\} \subset C_c^1(I)$ such that $u_n \to u$ in $H^1(I)$. By Sobolev's Theorem 2.5, $u_n \to \tilde u$ in $L^\infty(I)$, where $\tilde u \in C(\bar I)$, $u = \tilde u$ a.e. on $I$, i.e. $\sup_{x\in\bar I}|u_n(x) - \tilde u(x)| \to 0$, $n \to \infty$. Since $u_n|_{\partial I} = 0$, it follows that $\tilde u|_{\partial I} = 0$, i.e. $u|_{\partial I} = 0$.

Suppose now that $u \in H^1(I)$ and $u|_{\partial I} = 0$. We shall show that $u \in H_0^1(I)$.

First, let us examine the case of a bounded interval and take, with no loss of generality, $I = (0,1)$. Consider the intervals
\[
K_1 = \Big({-\frac13}, \frac13\Big), \qquad I = (0,1), \qquad K_2 = \Big(\frac23, \frac43\Big).
\]
Then we may find functions $\theta_1 \in C_c^\infty(K_1)$, $\theta_0 \in C_c^\infty(I)$, $\theta_2 \in C_c^\infty(K_2)$ such that $0 \le \theta_i \le 1$ and (extending them by zero outside their intervals of definition) $\theta_1(x) + \theta_0(x) + \theta_2(x) = 1$, $x \in \bar I$. (The functions $\theta_i$ form a partition of unity corresponding to the open cover $\{K_1, I, K_2\}$ of $\bar I$, and can be constructed e.g. as follows. It is clear that given any interval $(a,b)$ we can find $\psi \in C_c^\infty(a,b)$ such that (for $\delta > 0$ small enough) $\psi(x) = 1$ for $a + 2\delta \le x \le b - 2\delta$, and $\psi(x) = 0$ for $x \in (a, a+\delta]$ and $x \in [b-\delta, b)$. [Figure: graph of such a $\psi$ on $(a,b)$.] With the same $\delta$ (take any $0 < \delta < \frac{1}{12}$), construct such functions $\psi_1(x)$ for $(-\frac13, \frac13)$, $\psi_0(x)$ for $(0,1)$, $\psi_2(x)$ for $(\frac23, \frac43)$. It is clear then that $\psi_0(x) + \psi_1(x) + \psi_2(x) \ge 1$, $x \in [-\frac13 + 2\delta, \frac43 - 2\delta]$.

Let $\omega(x)$ be such a function (with the same $\delta$, e.g.) for the interval $[-2\delta, 1+2\delta]$. [Figure: graphs of $\psi_1$, $\psi_0$, $\psi_2$ and $\omega$ over $[-\frac13, \frac43]$.] Then let
\[
\theta_i(x) = \frac{\psi_i(x)}{\sum_j \psi_j(x)}\,\omega(x), \quad i = 0, 1, 2.
\]
It is easily seen that the $\theta_i$'s satisfy the desired properties.)

Let now $u_i(x) = u(x)\theta_i(x)$, $i = 0, 1, 2$, so that
\[
u(x) = \sum_{i=0}^2 u_i(x), \quad x \in \bar I.
\]
Now $u_0(x) = u(x)\theta_0(x)$. Since $u \in H^1(I)$ and $\theta_0 \in C_c^\infty(I)$, $u_0 \in H^1(I) \cap C_c(I)$ (identifying $u_0$ with its continuous representative). By Remark (iii) above, $u_0 \in H_0^1(I)$.

We consider now $u_1(x) = u(x)\theta_1(x)$. Extending $\theta_1$ by zero to the whole of $\bar I$, we see that $u \in H^1(I)$ and $\theta_1 \in C_c^\infty(-\frac13, 1)$ imply $u_1 \in H^1(I)$. By hypothesis we also have that $u_1(0) = u(0)\theta_1(0) = 0\cdot\theta_1(0) = 0$. Also $\operatorname{supp} u_1 \subset [0, \frac13)$. [Figure: graph of $u_1$, supported in $[0, \frac13)$.] Extend now $u_1$ by zero to $\mathbb{R}$, i.e. consider the function
\[
\tilde u_1(x) = \begin{cases} u_1(x) & \text{if } x \in \bar I, \\ 0 & \text{if } x \notin \bar I. \end{cases}
\]
This function belongs to $H^1(\mathbb{R})$, since, for any $\varphi \in C_c^\infty(\mathbb{R})$,
\[
\int_{\mathbb{R}} \tilde u_1\varphi' = \int_0^1 u_1\varphi' \overset{\text{Prop.\,2.1}}{=} \underbrace{u_1(1)\varphi(1)}_{=0} - \underbrace{u_1(0)\varphi(0)}_{=0} - \int_0^1 u_1'\varphi = -\int_{\mathbb{R}} \widetilde{u_1'}\,\varphi,
\]
where $\widetilde{u_1'}$, the extension by zero outside $I$ of $u_1'$, is in $L^2(\mathbb{R})$ since $u_1 \in H^1(I)$.

Now, for any function $f$ on $\mathbb{R}$ let, for $h > 0$, $\tau_h f$ denote the right $h$-translate of $f$,
\[
(\tau_h f)(x) \equiv f(x - h).
\]
[Figure: graphs of $\tilde u_1$ and its right translate $\tau_h \tilde u_1$.] Since $\tilde u_1 \in H^1(\mathbb{R})$, it follows that $\tau_h \tilde u_1 \in H^1(\mathbb{R})$ and
\[
\lim_{h\to 0}\|\tau_h \tilde u_1 - \tilde u_1\|_{H^1(\mathbb{R})} = 0.
\]
(To see this, given $\varepsilon > 0$ let $\varphi \in C_c^\infty(\mathbb{R})$ be such that $\|\tilde u_1 - \varphi\|_{H^1(\mathbb{R})} < \varepsilon/3$. It follows, for any $h > 0$, that
\[
\|\tau_h \tilde u_1 - \tau_h\varphi\|_{H^1(\mathbb{R})} = \|\tilde u_1 - \varphi\|_{H^1(\mathbb{R})} < \frac{\varepsilon}{3}.
\]
But it is obvious, since $\varphi, \tau_h\varphi \in C_c^\infty(\mathbb{R})$, that there exists $h_0$ such that
\[
0 \le h < h_0 \Rightarrow \|\tau_h\varphi - \varphi\|_{H^1(\mathbb{R})} \le \frac{\varepsilon}{3}.
\]
Hence, given $\varepsilon > 0$, $\exists h_0$ such that
\[
0 \le h < h_0 \Rightarrow \|\tau_h\tilde u_1 - \tilde u_1\|_{H^1(\mathbb{R})} < \varepsilon
\]
by the triangle inequality.) Now, the restriction $\tau_h\tilde u_1|_I$, for $h$ sufficiently small, belongs to $H^1(I) \cap C_c(I)$ (possibly upon modification on a set of measure zero). Hence, by Remark (iii) above, $\tau_h\tilde u_1|_I \in H_0^1(I)$. Therefore, given $\varepsilon > 0$, we have that $\exists\varphi \in C_c^\infty(I)$ such that $\|\tau_h\tilde u_1 - \varphi\|_{H^1(I)} < \varepsilon/2$. Since $\lim_{h\to 0}\|\tau_h\tilde u_1 - \tilde u_1\|_{H^1(\mathbb{R})} = 0$, $\exists h$ such that
\[
\|\tau_h\tilde u_1|_I - u_1\|_{H^1(I)} \le \|\tau_h\tilde u_1 - \tilde u_1\|_{H^1(\mathbb{R})} < \frac{\varepsilon}{2}.
\]
It follows that $\|u_1 - \varphi\|_{H^1(I)} < \varepsilon$, i.e. that $u_1 \in H_0^1(I)$.

Entirely analogous considerations show that $u_2 \in H_0^1(I)$. Since $u = u_0 + u_1 + u_2$, it follows that $u \in H_0^1(I)$, q.e.d.

For a semi-infinite interval the proof follows in an analogous manner. Let $I = (0, \infty)$, with no loss of generality. Construct $\theta_1 \in C_c^\infty(-\frac13, \frac13)$ and $\theta_0 \in C^\infty(0, \infty)$ with $\operatorname{supp}\theta_0 \subset [\delta, \infty)$, $\delta > 0$ sufficiently small, so that $0 \le \theta_i \le 1$ and so that (extending $\theta_1$ by zero to $[\frac13, \infty)$) $\theta_1(x) + \theta_0(x) = 1$ for $x \in [0, \infty)$. (This can be achieved as before, by taking $\psi_1$ as before and extending $\psi_0(x)$ and $\omega(x)$ by setting them equal to $1$ for $x \ge \frac12$.) Again with $u_i = u\theta_i$ we have, as before, that since $u(0) = 0$, $u_1 \in H_0^1(0, \infty)$.

Consider $u_0 = u\theta_0$. Extend it by zero to the whole of $\mathbb{R}$, i.e. put
\[
\tilde u_0(x) = \begin{cases} u_0(x) & \text{if } 0 \le x < \infty, \\ 0 & \text{if } -\infty < x < 0. \end{cases}
\]
It is clear that $\tilde u_0 \in H^1(\mathbb{R})$ (since $u_0 \in H^1(I)$). Since $\tilde u_0$ has support in $[\delta, \infty)$, $\delta > 0$, it is not hard to see that, for $n$ sufficiently large, $\rho_n * \tilde u_0$ has support in $[\delta', \infty)$, $\delta' > 0$, belongs to $C^\infty(\mathbb{R}) \cap H^1(\mathbb{R})$ and, of course,
\[
\|\tilde u_0 - \rho_n * \tilde u_0\|_{H^1(\mathbb{R})} \ge \|u_0 - (\rho_n * \tilde u_0)|_I\|_{H^1(I)} \to 0 \quad \text{as } n \to \infty.
\]
Consider now the functions
\[
u_{0,n} = \zeta_n|_I\,(\rho_n * \tilde u_0)|_I,
\]
where $\zeta_n(x)|_I$ is the restriction to $\bar I = [0, \infty)$ of the truncation function $\zeta_n(x)$ introduced in the proof of Theorem 2.4. It follows that, for $n$ sufficiently large, $u_{0,n} \in C_c^\infty(I)$. A calculation similar to the one used in the proof of Theorem 2.4 shows finally that
\[
\|u_{0,n} - u_0\|_{H^1(I)} \to 0 \quad \text{as } n \to \infty.
\]
Hence $u_0 \in H_0^1(I)$. $\square$
0
Remark: Essentially the proof above shows the following characterization of $H_0^1(I)$, which is of interest by itself. For $u \in L^2(I)$ let $\tilde u(x)$ be its extension by zero to the whole real line, i.e. let
\[
\tilde u(x) = \begin{cases} u(x) & \text{if } x \in I, \\ 0 & \text{if } x \in \mathbb{R}\setminus I. \end{cases}
\]
Then $u \in H_0^1(I)$ if and only if $\tilde u \in H^1(\mathbb{R})$.

We finally mention a result which will be very useful in the existence-uniqueness theory of weak solutions of boundary-value problems.

Proposition 2.3 (Inequality of Poincaré–Friedrichs). Suppose that $I$ is a bounded interval. Then there exists a constant $C$ (depending on $\ell(I)$) such that
\[
\|u\|_1 \le C\|u'\| \qquad \forall u \in H_0^1(I); \tag{2.3}
\]
in other words, the quantity $\|u'\|$ is a norm on $H_0^1(I)$, equivalent to $\|u\|_1$.

Proof. If $u \in H_0^1(I) \equiv H_0^1(a,b)$, we have, by Theorems 2.2 and 2.6, that for $x \in \bar I$
\[
|u(x)| = \Big|\int_a^x u'(t)\,dt\Big| \le \int_a^b |u'(t)|\,dt \le (b-a)^{1/2}\|u'\|.
\]
Hence
\[
\|u\|^2 = \int_a^b u^2(t)\,dt \le (b-a)^2\|u'\|^2,
\]
and (2.3) holds with $C = (1 + (b-a)^2)^{1/2}$. $\square$


Remarks:

(i) It follows that on $H_0^1(I)$, $I$ bounded, the expression $(u', v')$ defines an inner product.

(ii) Entirely analogously one may define, for $m \ge 2$ integer, the spaces $H_0^m(I)$ as completions of $C_c^\infty(I)$ in $H^m(I)$. One may show that
\[
H_0^m(I) = \{u \in H^m(I) : u = Du = \cdots = D^{m-1}u = 0 \text{ on } \partial I\}.
\]
We should keep in mind the distinction between e.g.
\[
H_0^2(I) = \{u \in H^2(I) : u = Du = 0 \text{ on } \partial I\}
\]
and
\[
H^2(I) \cap H_0^1(I) = \{u \in H^2(I) : u = 0 \text{ on } \partial I\}.
\]
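The Poincaré–Friedrichs inequality (2.3) is easy to test numerically. The sketch below (an illustration, not part of the text; the choice of test functions $u_k(x) = \sin(k\pi x) \in H_0^1(0,1)$ is mine) computes the ratios $\|u_k\|/\|u_k'\| = 1/(k\pi)$ by quadrature, which indeed stay below the constant $(b-a) = 1$ appearing in the proof:

```python
import math

def l2norm(f, a=0.0, b=1.0, n=20000):
    """L^2(a,b) norm by the composite midpoint rule."""
    h = (b - a) / n
    return math.sqrt(h * sum(f(a + (i + 0.5) * h) ** 2 for i in range(n)))

# u_k(x) = sin(k*pi*x) vanishes at both endpoints, so u_k is in H^1_0(0,1);
# here ||u_k|| / ||u_k'|| = 1/(k*pi) <= 1 = (b - a), consistent with (2.3).
ratios = []
for k in (1, 2, 5):
    u  = lambda x, k=k: math.sin(k * math.pi * x)
    du = lambda x, k=k: k * math.pi * math.cos(k * math.pi * x)
    ratios.append(l2norm(u) / l2norm(du))
print(ratios)    # approximately [1/pi, 1/(2*pi), 1/(5*pi)]
```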

2.6 Two-point boundary-value problems

We return now to the two-point boundary-value problem (*) (p. 25) (also called a Sturm–Liouville problem). We shall mainly discuss homogeneous (zero) Dirichlet and Neumann boundary conditions and follow the variational method outlined in §2.1.

2.6.1 Zero Dirichlet boundary conditions.

We let $I = (a,b)$ be a bounded interval. We consider the problem of finding $u(x)$, $x \in [a,b]$, such that
\[
-(pu')' + qu = f \quad \text{in } (a,b), \tag{2.4}
\]
\[
u(a) = u(b) = 0, \tag{2.5}
\]
where $p, q, f$ are given functions on $\bar I$ such that $p \in C^1(\bar I)$, $p(x) > 0$ $\forall x \in \bar I$, $q \in C(\bar I)$ with $q(x) \ge 0$ $\forall x \in \bar I$, and where we shall assume $f \in C(\bar I)$ (sometimes just $f \in L^2(I)$). Based on the discussion on p. 26 we make the following

Definition. Let $f \in C(\bar I)$. Then a classical solution of (2.4), (2.5) is a function $u(x) \in C^2(\bar I)$ which satisfies the D.E. (2.4) in the usual sense for each $x \in \bar I$ and which also satisfies the b.c. (2.5) in the usual sense. (We shall say that $u$ satisfies (2.4) and (2.5) "in the usual sense".) If $f \in L^2(I)$, then a weak solution of (2.4), (2.5) is a function $u \in H_0^1(I)$ which satisfies
\[
\int_I pu'v' + \int_I quv = \int_I fv \qquad \forall v \in H_0^1(I). \tag{2.6}
\]
Following the program outlined on pp. 25-26 we prove a series of results:

(i) A classical solution of (2.4), (2.5) is a weak solution of (2.4), (2.5) as well.

Let $u$ satisfy (2.4), (2.5) classically. Since $u \in C^2(\bar I)$ and $u(a) = u(b) = 0$, $u \in H^2(I) \cap H_0^1(I)$. We multiply the D.E. (2.4) by any $v \in H_0^1(I)$ and integrate over $I$. By Proposition 2.1, p. 42, since $pu' \in H^1$ we can integrate by parts and obtain
\[
\int_I fv = \int_I \big({-(pu')'}v + quv\big) = -pu'v\Big|_a^b + \int_I pu'v' + \int_I quv = \int_I pu'v' + \int_I quv,
\]
since $v \in H_0^1 \equiv H_0^1(I)$ (we suppress $I$ from the symbols of the function spaces). Hence $u$ is a weak solution.

(ii) Existence and uniqueness of the weak solution.

Let $f \in L^2(I)$ and let $B(v,w) = \int_I pv'w' + \int_I qvw$. Clearly, by our hypotheses, $B(\cdot,\cdot)$ is a bilinear, symmetric form on $H_0^1 \times H_0^1$ (in fact on $H^1 \times H^1$). Moreover, for $v, w \in H^1$ we have
\[
|B(v,w)| \le \int_I |p||w'||v'| + \int_I |q||w||v| \le \max_{x\in\bar I}|p(x)|\,\|w'\|\|v'\| + \max_{x\in\bar I}|q(x)|\,\|w\|\|v\| \le c_1\|v\|_1\|w\|_1, \tag{2.7}
\]
where $c_1 = \max_{x\in\bar I}|p(x)| + \max_{x\in\bar I}|q(x)|$. Hence $B$ is continuous on $H^1 \times H^1$.

Now, for $v \in H_0^1$ we have, by our hypotheses on $p$ and $q$ and with $\alpha = \min_{x\in\bar I}p(x) > 0$, that
\[
B(v,v) = \int_I p(v')^2 + \int_I qv^2 \ge \alpha\int_I (v')^2 \ge c_2\|v\|_1^2, \tag{2.8}
\]
where $c_2 = \alpha/C_*^2$, with $C_*$ being the constant in the Poincaré–Friedrichs inequality, which was used (since $v \in H_0^1$) in the last step.

Hence the bilinear form $B$ satisfies, on the Hilbert space $\{H_0^1, \|\cdot\|_1\}$, the hypotheses (i) and (ii) of the Lax–Milgram theorem (p. 19). In addition, it is straightforward to see that the R.H.S. of (2.6), $F(v) = \int_I fv$, is a continuous linear functional on $H_0^1$, since $|F(v)| \le \|f\|\|v\| \le \|f\|\|v\|_1$ $\forall v \in H_0^1$. (Note that $|||F||| \le \|f\|$, where by $|||F|||$ we denote the norm of the b.l.f. $F$ on $H_0^1$, i.e. $\sup_{0\ne v\in H_0^1} \frac{|F(v)|}{\|v\|_1}$.) Hence the Lax–Milgram theorem applies and shows that the problem
\[
\int_I pu'v' + \int_I quv = \int_I fv, \quad \text{i.e. } B(u,v) = F(v) \quad \forall v \in H_0^1,
\]
has a unique solution $u \in H_0^1$. Moreover,
\[
\|u\|_1 \le \frac{1}{c_2}\|f\|. \tag{2.9}
\]
Since a classical solution is a weak solution and we have now shown uniqueness of the weak solution, it follows that there is at most one classical solution.
(iii) Regularity of the weak solution.

Let $f \in L^2(I)$. Then (2.6) gives that if $u$ is the weak solution, then
\[
\int_I pu'v' = \int_I (f - qu)v \qquad \forall v \in H_0^1.
\]
Hence
\[
\int_I pu'\varphi' = \int_I (f - qu)\varphi \qquad \forall\varphi \in C_c^\infty(I),
\]
which shows that $pu' \in H^1$, since $pu' \in L^2$ and since there exists $g$ ($= qu - f$), $g \in L^2$, such that $\int_I pu'\varphi' = -\int_I g\varphi$ $\forall\varphi \in C_c^\infty(I)$. It follows that $pu' \in H^1$ and $(pu')' = qu - f$, i.e. $-(pu')' + qu = f$ holds in $L^2$.

Now, since $p(x) > 0$, $x \in \bar I$, and $p \in C^1(\bar I)$, it follows that $\frac{1}{p(x)} \in C^1(\bar I)$. Hence $u' = \frac1p(pu') \in H^1$, since $pu' \in H^1$ and $\frac1p \in H^1$. It follows that the weak solution $u$ (shown to be in $H_0^1$ by the Lax–Milgram theorem) actually belongs to $H^2 \cap H_0^1$. Moreover, since $-(pu')' + qu = f$ in $L^2$, we now have $pu'' = -p'u' + qu - f$, and therefore, with $\alpha = \min_{\bar I} p$,
\[
\alpha\|u''\| \le \max_{x\in\bar I}|p'(x)|\,\|u'\| + \max_{x\in\bar I}|q(x)|\,\|u\| + \|f\|.
\]
This estimate, coupled with (2.9) ($\|u\|_1 \le c_2^{-1}\|f\|$), shows that there exists a constant $c_3 = c_3(p, q, I)$ such that for the weak solution $u \in H^2 \cap H_0^1$ of (2.4), (2.5) we also have
\[
\|u\|_2 \le c_3\|f\|. \tag{2.10}
\]
(Estimates such as (2.10) are called elliptic regularity estimates ($H^2$-$L^2$) for equation (2.4).)

Now let $f \in C(\bar I)$. We shall show that the weak solution $u$ is in $C^2(\bar I)$ ($u \in H_0^1(I)$ guarantees that $u(a) = u(b) = 0$ in the sense of Sobolev's theorem). Since $u \in H^2(I)$, $u' \in C(\bar I)$ (take the continuous representative of $u'$). Hence $u \in C^1(\bar I)$. But $pu'' = -p'u' + qu - f$ (in $L^2$). Since $f \in C(\bar I)$, the R.H.S. is continuous, so $u'' \in C(\bar I)$ by our hypotheses on $p(x)$. Hence $u \in C^2(\bar I)$.

(iv) If $f \in C(\bar I)$, the weak solution is a classical solution.

Since $f \in C(\bar I)$, the above shows that $u \in C^2(\bar I)$. Now (2.6) gives that
\[
\int_I pu'\varphi' + \int_I qu\varphi = \int_I f\varphi \qquad \forall\varphi \in C_c^\infty(I).
\]
Integrating by parts, since $u \in C^2(\bar I)$, we have that
\[
\int_I \big({-(pu')'} + qu - f\big)\varphi = 0 \qquad \forall\varphi \in C_c^\infty(I).
\]
Since $C_c^\infty(I)$ is dense in $L^2(I)$, $-(pu')' + qu - f = 0$ a.e. on $I$. (This also follows from the fact that $-(pu')' + qu - f = 0$ in $L^2$.) But $-(pu')' + qu - f \in C(\bar I)$. Therefore $-(pu')' + qu = f$ $\forall x \in \bar I$. Since $u(a) = u(b) = 0$ as well, it follows that for $f \in C(\bar I)$ the weak solution is also classical.

Remarks

(i) Since $B(u,v)$ is symmetric, the weak solution of (2.4), (2.5), i.e. the solution of the variational problem
\[
B(u,v) = (f,v) \qquad \forall v \in H_0^1,
\]
may be characterized as the (unique) solution of the minimization problem
\[
J(u) = \min_{v \in H_0^1} J(v),
\]
where $J$ is the "energy" functional
\[
J(v) = \frac12 B(v,v) - F(v) = \frac12\int_I \big(p(v')^2 + qv^2\big) - \int_I fv.
\]
This follows from the Rayleigh–Ritz Theorem 1.5 and is known in the present context as Dirichlet's principle.

(ii) It is evident that the weak solution of (2.4), (2.5), i.e. the solution of (2.6), exists under much weaker conditions on $p$ and $q$ than those assumed. For example, the two integrals appearing in the definition of $B(v,w)$ exist if e.g. $p, q \in L^\infty(I)$. Similarly, (2.7) and (2.8) hold only under the additional assumptions $p \ge \alpha > 0$, $q \ge 0$ a.e. on $I$. Moreover, the right-hand side $F(v)$ makes sense (interpreting $(f,v)$ properly) as a bounded linear functional on $H_0^1$ with much more general $f$ (than $f \in L^2$). More precisely, $f$ may belong to the dual of $H_0^1$, the so-called "negative" Sobolev space $H^{-1}(I)$; e.g. the delta function is an $H^{-1}$ function in one dimension. It is evident that such weak conditions guarantee only existence-uniqueness of $u$ in $H_0^1$. To obtain more smoothness for $u$, i.e. to show that $u \in H^2$ or that $u \in C^2(\bar I)$, it is clear that one has to assume more smoothness for the coefficients $p$, $q$ and $f$. In the same vein of thought, one may allow $q(x)$ to take on negative values as long as $q(x) \ge -\lambda$, $x \in \bar I$, where $\lambda$ is such that in (2.8), $B(v,v) \ge c_2\|v\|_1^2$ for some $c_2 > 0$. (A bound on $\lambda$ may be easily found in terms of $0 < \alpha = \min_{x\in\bar I}p(x)$ and the constant $\mu$ of Poincaré's inequality $\int_I v^2 \le \mu\int_I (v')^2$. It is not hard to see that the best such constant is equal to $\frac{1}{\lambda_1}$, where $\lambda_1$ is the smallest eigenvalue of the problem $-u'' = \lambda u$ with zero Dirichlet b.c. at $a$ and $b$, i.e. $\lambda_1 = \frac{\pi^2}{(b-a)^2}$.)

(iii) Using (2.9) and Sobolev's inequality, we have that under our hypotheses the weak solution $u$ belongs to $L^\infty(I)$ and that $\|u\|_{L^\infty} \le c\|f\|$. Using the elliptic regularity estimate (2.10) and Sobolev's inequality, we can conclude that $u' \in L^\infty$ and $\|u'\|_{L^\infty} \le c\|f\|$. Finally, using the equation, we conclude that $\|u''\|_{L^\infty} \le c\|f\|_{L^\infty}$, i.e. that the weak solution $u \in H^2 \cap H_0^1$ actually belongs to a space of functions with bounded generalized derivatives. For $u \in C^2(\bar I)$, the classical solution, we then have
\[
\max_{x\in\bar I}\big(|u| + |u'| + |u''|\big) \le c\max_{x\in\bar I}|f|.
\]

(iv) Consider again the weak solution, i.e. $u \in H_0^1$ solving $B(u,v) = (f,v)$ $\forall v \in H_0^1$, for $f \in L^2$. The map $f \mapsto u$ is linear; let us denote it by $Tf = u$. $T$, the solution operator of the problem (2.6), is, by the above, a bounded linear operator from $L^2$ into $H^2 \cap H_0^1$: we interpret (2.10) as $\|Tf\|_2 \le c_3\|f\|$. $T$ is then the inverse of the elliptic operator $Lu = -(pu')' + qu$ defined on $H^2 \cap H_0^1$. Note that $T$ is defined by
\[
B(Tf, v) = (f, v), \qquad \forall v \in H_0^1,
\]
for each $f \in L^2$. $T$ is actually a bounded, selfadjoint operator from $L^2$ into $L^2$; that it is bounded follows from (2.10). To see that it is selfadjoint, note that for $f, g \in L^2$, $Tg \in H^2 \cap H_0^1$ and
\[
(f, Tg) = B(Tf, Tg) = B(Tg, Tf) = (g, Tf) = (Tf, g).
\]
Note that $TL = I$ on $H^2 \cap H_0^1$ and $LT = I$ on $L^2$ ($I$ = identity). Note also that $(Tf, f) = B(Tf, Tf) \ge c_2\|Tf\|_1^2$. If $Tf = 0$ then $u = 0$, hence $f = 0$, i.e. $T$ is positive definite.

(v) The case of the b.v.p. with nonhomogeneous Dirichlet boundary conditions $u(a) = a_1$, $u(b) = a_2$ easily reverts to the problem (2.4), (2.5). Let $\mu(x)$ be a linear function (or any other function in $C^2(\bar I)$) such that $\mu(a) = a_1$, $\mu(b) = a_2$. Then, if $u$ is the solution of the nonhomogeneous b.v.p., the function $v = u - \mu$ satisfies the homogeneous b.c. $v(a) = v(b) = 0$ and the D.E.
\[
-(pv')' + qv = g := f - L\mu = f + (p\mu')' - q\mu,
\]
i.e. a D.E. of the same form with new R.H.S. $g = f - L\mu \in C(\bar I)$.

2.6.2 Neumann boundary conditions.

We now consider the problem
\[
-(pu')' + qu = f \quad \text{in } (a,b), \tag{2.11}
\]
\[
u'(a) = u'(b) = 0, \tag{2.12}
\]
i.e. the (homogeneous) Neumann b.c. problem. Again we let $p \in C^1(\bar I)$, $p(x) > 0$ $\forall x \in \bar I$, $f \in C(\bar I)$ (or $f \in L^2(I)$), but now we assume that $q \in C(\bar I)$ with $q(x) > 0$ $\forall x \in \bar I$. (Note possible nonuniqueness if $q = 0$: e.g. consider $p = 1$, $f = 0$, $q = 0$, i.e. $u'' = 0$, $u'(a) = 0$, $u'(b) = 0$, which has any constant as its solution.) It is not hard to motivate the following

Definition. Let $f \in C(\bar I)$. Then a classical solution of the Neumann b.v.p. (2.11), (2.12) is a function $u(x) \in C^2(\bar I)$ which satisfies (2.11), (2.12) in the usual sense. If $f \in L^2(I)$, then a weak solution of (2.11), (2.12) is a function $u \in H^1(I)$ such that
\[
\int_I pu'v' + \int_I quv = \int_I fv \qquad \forall v \in H^1(I). \tag{2.13}
\]
Following the steps of the Dirichlet b.c. case we have:

(i) A classical solution of (2.11), (2.12) is a weak solution as well.

The proof is obvious, as before (now multiply by $v \in H^1$ and use the "natural" b.c. $u'(a) = u'(b) = 0$ on $u$; (2.13) follows).

(ii) Existence and uniqueness of the weak solution.

Let $f \in L^2(I)$ and $B(v,w) = \int_I pv'w' + \int_I qvw$. By our hypotheses, $B(\cdot,\cdot)$ is a bilinear, symmetric form on $H^1 \times H^1$. (2.7) holds with the same constant $c_1$ as before. For $v \in H^1$ we have, with $\alpha = \min_{\bar I}p$ and $\beta = \min_{\bar I}q$, that
\[
B(v,v) = \int_I p(v')^2 + \int_I qv^2 \ge \alpha\int_I (v')^2 + \beta\int_I v^2 \ge c_2\|v\|_1^2, \tag{2.14}
\]
where $c_2 = \min\{\alpha, \beta\} > 0$. Since $F(v) = \int_I fv$ is a b.l.f. on $H^1$, it follows from the Lax–Milgram theorem that there exists a unique weak solution $u \in H^1$ of (2.13) and that $\|u\|_1 \le \frac{1}{c_2}\|f\|$. (Note that, since $u \in H^1$, we cannot give meaning to the point values $u'(a)$, $u'(b)$ unless we prove that the weak solution is in $H^2$, something that we do in the next step.)

(iii) Regularity of the weak solution.

For $f \in L^2(I)$ the weak solution $u$ satisfies
\[
\int_I pu'v' = \int_I (f - qu)v \qquad \forall v \in H^1.
\]
Hence, a fortiori,
\[
\int_I pu'\varphi' = \int_I (f - qu)\varphi \qquad \forall\varphi \in C_c^\infty(I).
\]
It follows, as before, that $pu' \in H^1$ and that $-(pu')' + qu = f$ holds in $L^2$. Since $u' = p^{-1}(pu')$, it follows, as before, that $u \in H^2$. Returning now to
\[
\int_I pu'v' = \int_I (f - qu)v \qquad \forall v \in H^1,
\]
since $pu' \in H^1$ and $v \in H^1$, integration by parts gives (note that we can assign meaning to the values $u'(a)$, $u'(b)$: the values of the continuous representative of $u' \in H^1$ at $a$, $b$) that
\[
\int_I \big({-(pu')'}v + quv - fv\big)\,dx + p(b)u'(b)v(b) - p(a)u'(a)v(a) = 0
\]
holds for each $v \in H^1$. Since we already saw that $-(pu')' + qu = f$ in $L^2$, it follows that $p(b)u'(b)v(b) - p(a)u'(a)v(a) = 0$ $\forall v \in H^1$, whence $u'(b) = 0$, $u'(a) = 0$. (Choose e.g. $v(x) = x - a$ or $v(x) = x - b$.) Hence the weak solution of (2.11), (2.12), which exists uniquely by Lax–Milgram in $H^1$, actually belongs to $H^2$, satisfies $-(pu')' + qu = f$ in $L^2$ and $u'(a) = u'(b) = 0$. Moreover, exactly as before we obtain $\|u\|_2 \le c\|f\|$ (elliptic regularity).

Assume now $f \in C(\bar I)$. Then, exactly as in the Dirichlet b.c. case, use of Sobolev's theorem gives that $u \in C^2(\bar I)$ for the weak solution.

(iv) If $f \in C(\bar I)$, the weak solution is classical.

The proof that $-(pu')' + qu = f$ a.e. implies $-(pu')' + qu = f$ everywhere in $\bar I$, i.e. that the weak solution (which is $C^2$ if $f \in C(\bar I)$) is classical, is identical to the one of the Dirichlet b.c. case. Of course, $u'(a) = u'(b) = 0$ holds already for the weak solution (in $H^2$).
Remarks.

(i) The weak solution is now characterized (by the Rayleigh–Ritz theorem) as the solution of the minimization problem
\[
J(u) = \min_{v\in H^1} J(v), \qquad J(v) = \frac12\int_I \big(p(v')^2 + qv^2\big) - \int_I fv, \quad v \in H^1.
\]
This follows from the symmetry of $B$.

(ii) Remarks analogous to the Dirichlet b.c. remarks (ii)-(v) follow, mutatis mutandis.

(iii) We can of course consider other homogeneous b.v. problems for the equation $-(pu')' + qu = f$ on $(a,b)$ as well. For example, the problem with b.c. (assume the Dirichlet-b.c. assumptions on $p$, $q$)
\[
u(a) = 0, \quad u'(b) = 0 \quad \text{("mixed" b.c.)}
\]
has the following weak formulation: Let
\[
\mathring H_a \equiv \{v \in H^1(I) : v(a) = 0\}.
\]
Clearly $\{\mathring H_a, \|\cdot\|_1\}$ is a Hilbert space (a closed subspace of $H^1(I)$ that includes $H_0^1$). We then seek $u \in \mathring H_a$ such that
\[
B(u,v) \equiv \int_I pu'v' + \int_I quv = \int_I fv \quad \text{holds } \forall v \in \mathring H_a.
\]
The Lax–Milgram theorem shows existence-uniqueness of the weak solution, since $B(\cdot,\cdot)$ is bilinear, bounded and coercive on $\mathring H_a \times \mathring H_a$. The rest ($f \in C(\bar I) \Rightarrow u$ classical, etc.) follows easily.

The problem with b.c.
\[
u'(a) - k\,u(a) = 0 \ (k \text{ const.}), \qquad u(b) = 0
\]
has the following weak formulation on $\mathring H_b \equiv \{v \in H^1(I) : v(b) = 0\}$: Seek $u \in \mathring H_b$ such that
\[
B_k(u,v) \equiv \int_I pu'v' + \int_I quv + p(a)\,k\,u(a)v(a) = \int_I fv \qquad \forall v \in \mathring H_b.
\]
It is straightforward to see that $B_k(\cdot,\cdot)$ is bilinear, symmetric, continuous on $\mathring H_b \times \mathring H_b$ (by Sobolev's theorem) and coercive, provided $k \ge 0$ (or $k < 0$ with $|k|$ sufficiently small). Hence a weak solution $u \in \mathring H_b$ exists, and its classical analog follows.

The periodic boundary condition problem, i.e. the problem with b.c.'s
\[
u(a) = u(b), \qquad u'(a) = u'(b),
\]
may be similarly treated with the following weak formulation: Seek $u \in H_\pi^1$ (assume $p(a) = p(b)$) such that
\[
\int_I (pu'v' + quv) = \int_I fv, \qquad \forall v \in H_\pi^1,
\]
where $H_\pi^1 \equiv \{v \in H^1(I) : v(a) = v(b)\}$, the so-called "periodic" $H^1$.

Let us also remark here that with the machinery already developed in this section (and the added fact that the injection $H^1(I) \hookrightarrow L^2(I)$ is compact for bounded $I$) we can use the spectral theorem for the bounded, selfadjoint, positive definite operator $T$ (cf. remark (iv) above; $T$ is compact as an operator from $L^2$ into $L^2$) to prove e.g. theorems about Sturm–Liouville eigenproblems. For example, we may thus prove that, with $p \in C^1(\bar I)$, $p(x) > 0$, $q \in C(\bar I)$, there exists a sequence of reals $\{\lambda_n\}_{n=1}^\infty$ such that $\lambda_n \to \infty$ when $n \to \infty$, and a complete orthonormal system $\{\varphi_n\}_{n=1}^\infty$ in $L^2(I)$, such that $\varphi_n \in C^2(\bar I)$ and such that for $n = 1, 2, 3, \dots$
\[
\begin{cases} -(p\varphi_n')' + q\varphi_n = \lambda_n\varphi_n & \text{in } I, \\ \varphi_n(a) = \varphi_n(b) = 0. \end{cases}
\]
The numbers $\lambda_n$ are called the eigenvalues and the functions $\varphi_n(x)$ the eigenfunctions of the Sturm–Liouville operator $L(\cdot) \equiv -\frac{d}{dx}\big(p\frac{d}{dx}(\cdot)\big) + q$, with homogeneous Dirichlet boundary conditions.
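For the simplest case $p = 1$, $q = 0$ on $(0,1)$, the eigenvalues are known in closed form: $\lambda_n = (n\pi)^2$. The sketch below (a numerical illustration, not part of the text; it uses NumPy and a standard central-difference discretization, which is not discussed in these notes) approximates the first few eigenvalues and compares them to the exact values:

```python
import numpy as np

# Finite-difference check of the Sturm-Liouville eigenvalues for p = 1, q = 0:
# -phi'' = lambda*phi on (0,1), phi(0) = phi(1) = 0; exact eigenvalues (n*pi)^2.
N = 200                                   # number of interior grid points
h = 1.0 / (N + 1)
A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h ** 2
lam = np.sort(np.linalg.eigvalsh(A))      # discrete eigenvalues, ascending

for n in (1, 2, 3):
    print(lam[n - 1], (n * np.pi) ** 2)   # discrete vs exact, close for small n
```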

Chapter 3

Galerkin Finite Element Methods for Two-Point Boundary-Value Problems

3.1 Introduction

We consider again the two-point boundary-value problem on a bounded interval $(a,b)$ with homogeneous Dirichlet b.c.'s:
\[
-(pu')' + qu = f \quad \text{in } (a,b) \equiv I, \tag{3.1}
\]
\[
u(a) = u(b) = 0, \tag{3.2}
\]
for which we assume, as in §2.6, that $p \in C^1(\bar I)$, $p(x) > 0$ in $\bar I$, $q \in C(\bar I)$, $q(x) \ge 0$ on $\bar I$. We recall that if $f \in L^2(I)$, there exists a unique weak solution of (3.1), (3.2) satisfying
\[
u \in H_0^1(I), \qquad B(u,v) = (f,v) \quad \forall v \in H_0^1(I), \tag{3.3}
\]
where
\[
B(v,w) = \int_a^b (pv'w' + qvw), \quad v, w \in H^1(I), \qquad (v,w) = \int_a^b vw.
\]
In fact, $u \in H^2 \cap H_0^1$ (we suppress the $I$ in the notation $H_0^1(I)$ etc.) and we have the elliptic regularity estimate
\[
\|u\|_2 \le C\|f\|, \tag{3.4}
\]
where $C$ is a nonnegative constant independent of $u$ and $f$. If $f \in C(\bar I)$, then the (unique) weak solution is in fact in $C^2(\bar I)$ and solves (3.1), (3.2) in the classical sense.

The Galerkin method for the approximation of the weak solution, i.e. of the solution of (3.3), follows the abstract framework of §1.9. We note that $B$ satisfies (see §2.6)
\[
|B(v,w)| \le C_1\|v\|_1\|w\|_1 \qquad \forall v, w \in H^1(I), \tag{3.5}
\]
\[
B(v,v) \ge C_2\|v\|_1^2 \qquad \forall v \in H_0^1, \tag{3.6}
\]
where in fact e.g. $C_1 = \|p\|_\infty + \|q\|_\infty$, $C_2 = \alpha/C_*^2$, where $C_*$ is the constant in the Poincaré–Friedrichs inequality. Hence, if $S_h$, $0 < h \le 1$, is a family of finite-dimensional subspaces of $H_0^1$, Galerkin's Theorem 1.4 gives that for each $h$ there exists a unique element $u_h \in S_h$, the Galerkin approximation of $u$ in $S_h$, satisfying
\[
B(u_h, v_h) = (f, v_h) \qquad \forall v_h \in S_h. \tag{3.7}
\]
Moreover, $u_h$ satisfies the $H^1$ error estimate
\[
\|u - u_h\|_1 \le \frac{C_1}{C_2}\inf_{\chi\in S_h}\|u - \chi\|_1. \tag{3.8}
\]
The discrete problem (3.7) (the discrete version of (3.3)) is equivalent to a linear system of equations. Let $N = N_h = \dim S_h$ and let $\{\varphi_i\}_{i=1}^N$ be a basis of $S_h$. Then, as we saw in Ch. 1, the coefficients $\{c_i\}$ of $u_h$ with respect to the basis $\varphi_i$, i.e. the numbers $c_i$ in
\[
u_h(x) = \sum_{i=1}^N c_i\varphi_i(x), \tag{3.9}
\]
satisfy the linear system
\[
A\,c = f_h, \tag{3.10}
\]
where $A$ is the $N \times N$ matrix with elements $A_{ij} = B(\varphi_j, \varphi_i) = \int_a^b (p\varphi_j'\varphi_i' + q\varphi_j\varphi_i)$, $1 \le i, j \le N$, $c = [c_1, \dots, c_N]^T$, $f_h = [(f,\varphi_1), \dots, (f,\varphi_N)]^T$. The matrix $A$ is symmetric and positive definite on $\mathbb{R}^N$ due to our assumptions on $B$ (i.e. on $p$ and $q$).

The question is, of course, how to choose the finite-dimensional subspace $S_h$ of $H_0^1$. The choice should be such that:

$\inf_{\chi\in S_h}\|u - \chi\|_1$ is small (cf. (3.8)), i.e. we can approximate well elements $u$ of $H^2 \cap H_0^1$ by elements of $S_h$.
The system (3.10) is easy to solve.

A classical choice (Ritz, Galerkin, spectral methods) is to let $S_h \equiv S_N$ be the span of the first $N$ eigenfunctions $\varphi_j$ of the S-L operator $Lu = -(pu')' + qu$ with zero b.c.'s at $x = a$ and $x = b$. If $\lambda_n$, $\varphi_n$ are the eigenvalues, resp. eigenfunctions, of this problem (cf. the concluding remarks of Ch. 2), then the system (3.10) may be solved explicitly by
\[
c_i = \frac{(f,\varphi_i)}{\lambda_i}, \quad 1 \le i \le N,
\]
and the Galerkin approximation $u_h \equiv u_N = \sum_{i=1}^N c_i\varphi_i(x)$ has good approximation properties; in particular
\[
\inf_{\chi\in S_N}\|u - \chi\|_1 \to 0, \text{ as } N \to \infty, \text{ for } u \in H^2 \cap H_0^1.
\]
The difficulty in this approach is of course that it requires the explicit knowledge of the eigenpairs $(\lambda_i, \varphi_i)$, $1 \le i \le N$, which are, in general, not easy to find analytically.
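In the special case $Lu = -u''$ on $(0,1)$ the eigenpairs are explicit, $\lambda_n = (n\pi)^2$, $\varphi_n(x) = \sqrt{2}\sin(n\pi x)$, so the spectral-Galerkin formula $c_i = (f,\varphi_i)/\lambda_i$ can actually be carried out. The sketch below (illustration only; the particular right-hand side $f = 1$, whose exact solution is $u(x) = x(1-x)/2$, is my choice) does this numerically:

```python
import math

# Spectral Galerkin solution of -u'' = f on (0,1), u(0) = u(1) = 0, using the
# known eigenpairs lambda_n = (n*pi)^2, phi_n(x) = sqrt(2)*sin(n*pi*x):
# u_N = sum_{n=1}^N (f, phi_n)/lambda_n * phi_n.
def inner(f, g, n=4000):
    """(f, g) on (0,1) by the composite midpoint rule."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

f = lambda x: 1.0                  # exact solution is then u(x) = x(1 - x)/2
N = 50
phi = [lambda x, n=n: math.sqrt(2) * math.sin(n * math.pi * x)
       for n in range(1, N + 1)]
c = [inner(f, p) / ((n * math.pi) ** 2) for n, p in enumerate(phi, start=1)]
uN = lambda x: sum(cn * p(x) for cn, p in zip(c, phi))

print(abs(uN(0.5) - 0.125))        # small: exact u(1/2) = 1/8
```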
Another obvious choice that would also guarantee good approximation properties is to take $S_h$ to be the vector space of polynomials of a fixed degree that vanish at the endpoints. The problem with this approach is that, in general, the condition of the linear system (3.10) will be bad. Moreover, the matrix $A$ will, in general, be full, since the polynomial basis functions $\varphi_j$ will not be of small support in $I$. Since we expect $N$ to be large for approximability, we conclude that, in general, the system (3.10) will be very hard to solve accurately. Moreover, since $N \sim$ degree of the polynomials in $S_h$, we will run into problems trying to compute $u_h$ as a polynomial function of large degree.

A good choice turns out to be piecewise polynomial functions (consisting of polynomials of small degree on each subinterval of a partition of $I$), continuous on $\bar I$ and endowed with basis functions of small support. In this way we obtain various "finite element" subspaces $S_h$ of $H_0^1(I)$.

3.2 The Galerkin finite element method with piecewise linear, continuous functions

Let $a = x_0 < x_1 < x_2 < \dots < x_N < x_{N+1} = b$ define an arbitrary partition of $[a,b]$ with $N$ interior points $x_i$, $1 \le i \le N$. Let $h_i = x_{i+1} - x_i$, $0 \le i \le N$, and put $h = \max_i h_i$. (For uniform partitions $h = h_i = (b-a)/(N+1)$.) Let $S_h$ be the vector space of functions defined by
\[
S_h = \{\varphi : \varphi \in C[a,b],\ \varphi(a) = \varphi(b) = 0,\ \varphi \text{ is a linear polynomial on each } (x_i, x_{i+1}),\ 0 \le i \le N\} \tag{3.11}
\]
(we write the last condition as $\varphi|_{(x_i,x_{i+1})} \in P_1$).

It is not hard to see that $S_h$ is a finite-dimensional subspace of $H_0^1$ and that $\dim S_h = N$. The latter follows e.g. from the fact that the set of functions $\{\varphi_i\}_{i=1}^N$ defined by
\[
\varphi_i \in S_h, \qquad \varphi_i(x_j) = \delta_{ij},
\]
or, explicitly, for $1 \le i \le N$, as
\[
\varphi_i(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}}, & \text{if } x_{i-1} \le x \le x_i, \\[2mm] \dfrac{x_{i+1} - x}{x_{i+1} - x_i}, & \text{if } x_i \le x \le x_{i+1}, \\[2mm] 0, & \text{if } x \in \bar I\setminus(x_{i-1}, x_{i+1}), \end{cases} \tag{3.12}
\]
forms a basis of $S_h$: Obviously $\varphi_i \in S_h$. [Figure: the "hat" basis functions $\varphi_i$, $1 \le i \le N$, each equal to $1$ at $x_i$ and vanishing outside $(x_{i-1}, x_{i+1})$.] Moreover, if $\sum_{i=1}^N c_i\varphi_i(x) = 0$ $\forall x \in \bar I$, putting $x = x_j$, $1 \le j \le N$, we see that $c_j = 0$, i.e. that the $\{\varphi_i\}$ are linearly independent. In addition, since for each $\varphi \in S_h$
\[
\varphi(x) = \sum_{i=1}^N c_i\varphi_i(x), \quad \text{where } c_i = \varphi(x_i),
\]
i.e. since each $\varphi$ in $S_h$ is uniquely determined by its nodal values $\varphi(x_i)$, $1 \le i \le N$, we conclude that $\{\varphi_i\}$ spans $S_h$. [Figure: a piecewise linear $\varphi \in S_h$ with nodal values $\varphi(x_i) = c_i$.]
Evaluating the elements $A_{ij} = \int_a^b (p\varphi_j'\varphi_i' + q\varphi_j\varphi_i)$ of the matrix $A$ of the system (3.10) yields (since $\operatorname{supp}\varphi_i = [x_{i-1}, x_{i+1}]$) that $A_{ii} \ne 0$, $1 \le i \le N$, $A_{i,i+1} \ne 0$, $A_{i+1,i} \ne 0$, $1 \le i \le N-1$, while all other $A_{ij} = 0$. Hence, due to the small support of the basis functions, the matrix $A$ is sparse, in this case tridiagonal. Since it is also symmetric and positive definite, the system (3.10) may be efficiently solved e.g. by a banded (tridiagonal) Cholesky algorithm. In the special case $p = q = 1$ and a uniform partition of $I = (0,1)$ with $h = 1/(N+1)$, $A$ is the sum of the "stiffness" matrix $S$, where $S_{ij} = \int_0^1 \varphi_j'\varphi_i'$, and the "mass" matrix $M$, where $M_{ij} = \int_0^1 \varphi_j\varphi_i$. Indeed, $S$ and $M$ are then the tridiagonal, symmetric and positive definite matrices
\[
S = \frac{1}{h}\begin{bmatrix} 2 & -1 & & & 0 \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ 0 & & & -1 & 2 \end{bmatrix}, \qquad
M = \frac{h}{6}\begin{bmatrix} 4 & 1 & & & 0 \\ 1 & 4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 4 & 1 \\ 0 & & & 1 & 4 \end{bmatrix}.
\]
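These matrices can be assembled and the system (3.10) solved in a few lines. The sketch below (an illustration, not from the text; it uses NumPy, approximates the load vector by the simple quadrature $(f,\varphi_i) \approx h f(x_i)$, and checks against the manufactured solution $u(x) = \sin(\pi x)$ of $-u'' + u = (\pi^2+1)\sin(\pi x)$) shows the expected second-order decrease of the nodal errors:

```python
import numpy as np

def solve_fem(N, f):
    """Galerkin FEM for -u'' + u = f on (0,1), u(0) = u(1) = 0:
    uniform mesh, hat basis, A = S + M exactly as displayed above."""
    h = 1.0 / (N + 1)
    S = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h          # stiffness
    M = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))      # mass
    x = h * np.arange(1, N + 1)
    fh = h * f(x)                    # (f, phi_i) approximated by h*f(x_i)
    c = np.linalg.solve(S + M, fh)   # c_i = u_h(x_i), since phi_i(x_j) = delta_ij
    return x, c

# Manufactured solution: u(x) = sin(pi*x) solves -u'' + u = (pi^2 + 1)*sin(pi*x)
f = lambda x: (np.pi ** 2 + 1) * np.sin(np.pi * x)
errors = []
for N in (9, 19, 39):
    x, c = solve_fem(N, f)
    errors.append(np.max(np.abs(c - np.sin(np.pi * x))))
print(errors)    # decreases roughly like h^2 as the mesh is refined
```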

We now return to the error estimate (3.8). As $h \to 0$, i.e. as $\dim S_h \to \infty$, we would like to prove that $\inf_{\chi\in S_h}\|u - \chi\|_1 \to 0$, where $u \in H^2 \cap H_0^1$ is the weak solution of (3.1), (3.2). In fact, we shall prove that there is a constant $C$, independent of $h$ and $v$, such that $\inf_{\chi\in S_h}\|v - \chi\|_1 \le C\,h\,\|v''\|$ for any $v \in H^2 \cap H_0^1$.

A simple and convenient way to do this is to study the properties of the interpolant of $v$ in the space $S_h$.

The interpolant $(I_h v)(x)$ of a continuous function $v(x)$ (such that $v(a) = v(b) = 0$) in the space $S_h$ is defined as the (unique) element of $S_h$ satisfying
\[
(I_h v)(x_i) = v(x_i), \quad 1 \le i \le N, \tag{3.13}
\]
[Figure: a function $v$ and its piecewise linear interpolant $I_h v$.] i.e. as the element $(I_h v)(x) = \sum_{i=1}^N v(x_i)\varphi_i(x)$ of $S_h$. Hence $I_h : H_0^1 \to S_h$ is a linear operator on $H_0^1$ (the "interpolation operator" onto $S_h$).
Lemma 3.1. Let $v \in H_0^1$. Then $I_h v$ satisfies

(i) $\big((I_h v - v)', \chi'\big) = 0 \quad \forall\chi \in S_h$, (3.14)

(ii) $\|v - I_h v\| \le C\,h\,\|(v - I_h v)'\|$ (3.15)

for some constant $C$ independent of $h$ and $v$.

Proof. (i) Let $\chi \in S_h$. Then
\[
\big((I_h v - v)', \chi'\big) = \int_a^b (I_h v - v)'\chi'\,dx = \sum_{i=0}^N \int_{x_i}^{x_{i+1}} (I_h v - v)'\chi'\,dx
= \sum_{i=0}^N \Big\{\big[(I_h v - v)\chi'\big]_{x=x_i}^{x_{i+1}} - \int_{x_i}^{x_{i+1}} (I_h v - v)\chi''\,dx\Big\} = 0,
\]
since $(I_h v)(x_i) = v(x_i)$, $0 \le i \le N+1$, and $\chi''|_{[x_i,x_{i+1}]} = 0$.

(ii) Since $(I_h v - v)(x_i) = 0$, $0 \le i \le N+1$, $I_h v - v \in H_0^1(x_i, x_{i+1})$ for each $i = 0, 1, \dots, N$. By the proof of Proposition 2.3 (the Poincaré–Friedrichs inequality), we conclude that there is a constant $C$, independent of the $x_i$ or $v$, such that
\[
\|I_h v - v\|_{L^2(x_i,x_{i+1})} \le C\,(x_{i+1} - x_i)\,\|(I_h v - v)'\|_{L^2(x_i,x_{i+1})}
\]
for all $i$ (in fact the best value of $C$ is $1/\pi$). We conclude that
\[
\|I_h v - v\|^2 = \sum_{i=0}^N \|I_h v - v\|_{L^2(x_i,x_{i+1})}^2 \le C^2 h^2 \sum_{i=0}^N \|(I_h v - v)'\|_{L^2(x_i,x_{i+1})}^2 = C^2 h^2 \|(I_h v - v)'\|^2, \quad \text{q.e.d.}
\]


Using Lemma 3.1 we may now prove the following approximation properties of the interpolant $I_h v$:

Proposition 3.1. There is a constant $C$ independent of $h$ such that

(i) $\quad \|v - I_h v\| + h\,\|(v - I_h v)'\| \le C\,h\,\|v'\| \quad \forall v \in \mathring{H}^1, \qquad (3.16)$

(ii) $\quad \|v - I_h v\| + h\,\|(v - I_h v)'\| \le C\,h^2\,\|v''\| \quad \forall v \in H^2 \cap \mathring{H}^1. \qquad (3.17)$

Proof: (i) For $v \in \mathring{H}^1$, $\|(v - I_h v)'\| \le \|v'\| + \|(I_h v)'\|$. But, by (3.14), $((I_h v)', \chi') = (v', \chi')$ $\forall \chi \in S_h$. Put $\chi = I_h v$ and obtain

$$\|(I_h v)'\|^2 = (v', (I_h v)') \le \|v'\|\,\|(I_h v)'\|, \quad \text{i.e.} \quad \|(I_h v)'\| \le \|v'\|.$$

We conclude that $\|(v - I_h v)'\| \le 2\|v'\|$. Hence, by (3.15),

$$\|v - I_h v\| \le C\,h\,\|(v - I_h v)'\| \le 2C\,h\,\|v'\|,$$

proving (3.16).

(ii) Let now $v \in H^2 \cap \mathring{H}^1$. Then,

$$\|(v - I_h v)'\| \le C\,h\,\|v''\| \qquad (3.18)$$

for some constant $C$ independent of $h$ and $v$. To see this, we have

$$\|(v - I_h v)'\|^2 = ((v - I_h v)', (v - I_h v)') = (v', (v - I_h v)') \quad \text{(why?)}$$
$$= \int_a^b v'\,(v - I_h v)' = \big[v'(v - I_h v)\big]_a^b - \int_a^b v''\,(v - I_h v)$$

(since $v' \in H^1$, $v - I_h v \in H^1$)

$$= -\int_a^b v''\,(v - I_h v) \le \|v''\|\,\|v - I_h v\| \overset{\text{by }(3.15)}{\le} C\,h\,\|v''\|\,\|(v - I_h v)'\|.$$

We conclude therefore that $\|(v - I_h v)'\| \le C\,h\,\|v''\|$. This gives, in view of (3.15), $\|v - I_h v\| \le C\,h^2\,\|v''\|$. Hence (3.17) holds. q.e.d.
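Estimate (3.17) is easy to observe numerically. The following sketch (an illustration added here, not from the notes) measures the $L^2$ error of the piecewise linear interpolant of $v(x) = \sin \pi x$ on two uniform grids; since the error is $O(h^2)$, halving $h$ should reduce it by a factor of about $4$:

```python
import numpy as np

def l2_interp_error(N, f):
    """L2 norm of v - I_h v on (0,1), where I_h is piecewise linear
    interpolation at the nodes x_i = i/(N+1); the integral is approximated
    by the composite trapezoidal rule on a fine grid."""
    nodes = np.linspace(0.0, 1.0, N + 2)
    xx = np.linspace(0.0, 1.0, 20001)
    err = f(xx) - np.interp(xx, nodes, f(nodes))
    return np.sqrt(np.sum(0.5 * (err[:-1]**2 + err[1:]**2) * np.diff(xx)))

v = lambda x: np.sin(np.pi * x)
e1 = l2_interp_error(9, v)    # h = 1/10
e2 = l2_interp_error(19, v)   # h = 1/20
ratio = e1 / e2               # expected close to 4 (second order in h)
```

The observed ratio approaches $4$ as $h \to 0$, consistent with the power $h^2$ in (3.17).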
In view of Proposition 3.1, we now have, since

$$\inf_{\chi \in S_h} \big(\|v - \chi\| + h\,\|(v - \chi)'\|\big) \le \|v - I_h v\| + h\,\|(v - I_h v)'\|,$$

that

$$\inf_{\chi \in S_h} \big(\|v - \chi\| + h\,\|(v - \chi)'\|\big) \le C_k\,h^k\,\|D^k v\| \qquad (3.19)$$

for $k = 1, 2$, if $v \in H^k \cap \mathring{H}^1$. Hence,

$$\inf_{\chi \in S_h} \|v - \chi\|_1 \le C\,h\,\|v''\| \qquad (3.20)$$

for $v \in H^2 \cap \mathring{H}^1$.

We are now ready to prove error estimates in the $H^1$ and the $L^2$ norms for the error $u - u_h$ of the Galerkin approximation $u_h \in S_h$ of $u$, the weak solution of (3.1), (3.2).

Theorem 3.1. Let $u$ be the weak solution of (3.1), (3.2) and $u_h \in S_h$ its Galerkin approximation in $S_h$. Then,

$$\|u - u_h\|_1 \le C\,h\,\|u''\| \qquad (3.21)$$

and

$$\|u - u_h\| \le C\,h^2\,\|u''\| \qquad (3.22)$$

for some constant $C$ independent of $h$ and $u$.
Proof: (3.21) follows from (3.8) and (3.20). The proof of (3.22) follows from the following duality argument (Nitsche trick). Set $e = u - u_h \in \mathring{H}^1$. Let $\psi$ be the solution of the problem

$$B(\psi, v) = (e, v) \quad \forall v \in \mathring{H}^1, \qquad (3.23)$$

i.e. the weak solution of the problem $-(p\psi')' + q\psi = e$ in $(a,b)$, $\psi(a) = \psi(b) = 0$. We know that $\psi$ exists uniquely in $\mathring{H}^1$, and in fact $\psi \in H^2 \cap \mathring{H}^1$, and that it satisfies (elliptic regularity)

$$\|\psi\|_2 \le C\,\|e\|. \qquad (3.24)$$

Put $v = e$ in (3.23); then

$$\|e\|^2 = (e, e) = B(\psi, e) = B(e, \psi) = B(u - u_h, \psi - \chi) = B(e, \psi - \chi)$$

for any $\chi \in S_h$ (why?). Take $\chi = I_h\psi$. Then

$$\|e\|^2 = B(e, \psi - I_h\psi) \le C_1\,\|e\|_1\,\|\psi - I_h\psi\|_1 \overset{\text{by }(3.17)}{\le} C_1\,\|e\|_1\,C\,h\,\|\psi''\| \le C\,h\,\|e\|_1\,\|e\| \quad \text{(by (3.24))}.$$

Hence, $\|e\| \le C\,h\,\|e\|_1$, and (3.22) follows from (3.21). $\square$
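The rates in (3.21) and (3.22) can be checked numerically. The sketch below (an added illustration, not from the notes; the manufactured right-hand side is our choice) solves $-u'' + u = (\pi^2 + 1)\sin\pi x$ on $(0,1)$ with $u(0) = u(1) = 0$, whose exact solution is $u(x) = \sin\pi x$, using the hat-function basis; the load vector uses the interpolant of $f$, which still keeps the $O(h^2)$ rate in $L^2$:

```python
import numpy as np

def fem_l2_error(N):
    """L2 error of the P1 Galerkin solution of -u'' + u = (pi^2+1) sin(pi x),
    u(0) = u(1) = 0 (exact solution u = sin(pi x)), on a uniform grid."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)          # all nodes
    xi = x[1:-1]                              # interior nodes
    S = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
         + np.diag(np.full(N - 1, -1.0), -1)) / h
    M = (np.diag(np.full(N, 4.0)) + np.diag(np.full(N - 1, 1.0), 1)
         + np.diag(np.full(N - 1, 1.0), -1)) * h / 6.0
    f = (np.pi**2 + 1.0) * np.sin(np.pi * xi)
    c = np.linalg.solve(S + M, M @ f)         # Galerkin coefficients
    # L2 error on a fine grid (u_h is piecewise linear, zero at the endpoints)
    xx = np.linspace(0.0, 1.0, 20001)
    uh = np.interp(xx, x, np.concatenate(([0.0], c, [0.0])))
    err = np.sin(np.pi * xx) - uh
    return np.sqrt(np.sum(0.5 * (err[:-1]**2 + err[1:]**2) * np.diff(xx)))

e1, e2 = fem_l2_error(9), fem_l2_error(19)    # h = 1/10 and h = 1/20
ratio = e1 / e2                               # expected close to 4
```

An error ratio near $4$ when $h$ is halved is exactly the $O(h^2)$ behavior predicted by (3.22).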


Remarks:
1. Property (3.14) characterizes the interpolant uniquely as an element of $S_h$. I.e. the equation

$$((v_h - v)', \chi') = 0 \quad \forall \chi \in S_h \qquad (3.25)$$

(given $v \in \mathring{H}^1$) has a unique solution $v_h \in S_h$, which, by Lemma 3.1 (i), coincides with the interpolant $I_h v$ of $v$. To prove uniqueness, note that (3.25) may be written as $(v_h', \chi') = (v', \chi')$ $\forall \chi \in S_h$. Hence, if there existed two such elements $v_h^1$, $v_h^2$, we would have

$$((v_h^1 - v_h^2)', \chi') = 0 \quad \forall \chi \in S_h \overset{\chi = v_h^1 - v_h^2}{\Longrightarrow} (v_h^1 - v_h^2)' = 0 \Rightarrow v_h^1 = v_h^2$$

by the Poincaré–Friedrichs inequality.

Equation (3.25) also states that, in our case, $I_h v$ is the projection of $v$ onto $S_h$ with respect to the inner product $(u', v')$ on $\mathring{H}^1$. We conclude that

$$\|(I_h v - v)'\| = \inf_{\chi \in S_h} \|(v - \chi)'\|.$$

Hence, by Poincaré–Friedrichs,

$$\|I_h v - v\|_1 \le C\,\|(I_h v - v)'\| \le C \inf_{\chi \in S_h} \|v - \chi\|_1.$$

Since, obviously, $\inf_{\chi \in S_h} \|v - \chi\|_1 \le \|I_h v - v\|_1$, we finally obtain that

$$\inf_{\chi \in S_h} \|v - \chi\|_1 \le \|I_h v - v\|_1 \le C \inf_{\chi \in S_h} \|v - \chi\|_1,$$

i.e. that $I_h v$ is a quasioptimal approximation of $v$ in $S_h$ (just as the Galerkin solution $u_h$ is a quasioptimal approximation of $u$ in $S_h$). Hence, nothing essential was lost by using the upper bound $\|I_h u - u\|_1$ of $\inf_{\chi \in S_h} \|u - \chi\|_1$ in (3.8), which led to (3.21).
2. It may be proved that the exponents of $h$ in the (right-hand sides of the) estimates

$$\|I_h u - u\|_1 \le C_1\,h\,\|u''\|, \qquad \|I_h u - u\| \le C_2\,h^2\,\|u''\|,$$
$$\|u_h - u\|_1 \le C_1\,h\,\|u''\|, \qquad \|u_h - u\| \le C_2\,h^2\,\|u''\|,$$

i.e. the powers 1 (for $H^1$ norms) and 2 (for $L^2$ norms) in the errors of $I_h u$ and $u_h$ (viewed as approximations of $u \in H^2 \cap \mathring{H}^1$), are optimal, i.e. they cannot be increased in general.

We may also derive error estimates in other norms. For example, if the solution $u$ of (3.1), (3.2) belongs to $C^2[a,b]$ (if it is a classical solution, that is), we may prove that there exists a constant $C$, independent of $h$ and $u$, such that

$$\|u_h - u\|_\infty + h\,\|u_h' - u'\|_\infty \le C\,h^2\,\|u''\|_\infty.$$

(Again, the interpolant satisfies a similar estimate. E.g. it is obvious (Lagrange interpolation) that $\|I_h u - u\|_\infty \le (h^2/8)\,\|u''\|_\infty$.)
3. Other boundary conditions. Let us consider for example (3.1) with the Neumann b.c.

$$u'(a) = u'(b) = 0, \qquad (3.26)$$

where we assume now $q(x) > 0$, $x \in [a,b]$, for uniqueness. As the space in which the weak solution now lies is $H^1$, we consider subspaces of $H^1$ in which we seek the Galerkin approximation $u_h$ of $u$.

Let $a = x_0 < x_1 < \dots < x_{N+1} = b$ be an arbitrary partition of $[a,b]$ and let $h = \max_i (x_{i+1} - x_i)$ as before. The finite-dimensional space

$$S_h = \{\varphi : \varphi \in C[a,b],\ \varphi|_{(x_i, x_{i+1})} \in P_1\}$$

is a subspace of $H^1$ and has the basis $\{\varphi_i\}_{i=0}^{N+1}$ consisting of the hat functions $\varphi_i$, $1 \le i \le N$, that were defined by (3.12), plus the endpoint half-hat functions $\varphi_0$, $\varphi_{N+1}$,

[Figure: the basis of $S_h$: the interior hat functions $\varphi_i$, $1 \le i \le N$, and the half-hat functions $\varphi_0$, $\varphi_{N+1}$ at the endpoints of $[a,b]$.]

defined as follows:

$$\varphi_0 \in S_h,\quad \varphi_0(x_0) = 1,\ \varphi_0(x_j) = 0,\ j \ne 0; \qquad \varphi_{N+1}(x_{N+1}) = 1,\ \varphi_{N+1}(x_j) = 0,\ j \ne N+1.$$

Hence, $\dim S_h = N + 2$.
Defining again the interpolant $I_h v$ of a continuous function $v$ on $[a,b]$ by the formula

$$(I_h v)(x) = \sum_{i=0}^{N+1} v(x_i)\,\varphi_i(x),$$

i.e. as the unique element of $S_h$ that satisfies

$$(I_h v)(x_i) = v(x_i), \quad 0 \le i \le N+1,$$

we may prove again that:

(i) $\quad ((I_h v)', \chi') = (v', \chi') \quad \forall v \in H^1,\ \forall \chi \in S_h. \qquad (3.27)$

(The proof is identical to that of Lemma 3.1 (i).)

(ii) $\quad \|(I_h v)'\|^2 + \|(v - I_h v)'\|^2 = \|v'\|^2 \quad \forall v \in H^1. \qquad (3.28)$

(This follows from (i) by putting $\chi = I_h v$.)

With these in mind we may now see that

$$\|(v - I_h v)'\| \le \|v'\| \quad \forall v \in H^1. \qquad (3.29)$$

(A simple consequence of (3.28).) Since for each $i$, $v - I_h v \in \mathring{H}^1((x_i, x_{i+1}))$, we have again, as in Lemma 3.1, that

$$\|v - I_h v\| \le h\,\|(v - I_h v)'\| \quad \forall v \in H^1. \qquad (3.30)$$

As a consequence of (3.29) and (3.30),

$$\|v - I_h v\| \le h\,\|v'\| \quad \forall v \in H^1. \qquad (3.31)$$

Now, we have that for $v \in H^2$,

$$\|(v - I_h v)'\|^2 = -(v - I_h v, v''). \qquad (3.32)$$

The proof of (3.32) is identical to that given after the estimate (3.18). As a consequence of (3.32), we have

$$\|(v - I_h v)'\|^2 \le \|v - I_h v\|\,\|v''\|,$$

which, when combined with (3.30), yields

$$\|(v - I_h v)'\| \le h\,\|v''\| \quad \forall v \in H^2. \qquad (3.33)$$

Using (3.33) in (3.30) we see that for $v \in H^2$

$$\|v - I_h v\| \le h^2\,\|v''\|. \qquad (3.34)$$

We conclude therefore that

$$\|v - I_h v\| + h\,\|(v - I_h v)'\| \le C_1\,h\,\|v'\| \quad \forall v \in H^1 \qquad (3.35)$$

and

$$\|v - I_h v\| + h\,\|(v - I_h v)'\| \le C_2\,h^2\,\|v''\| \quad \forall v \in H^2. \qquad (3.36)$$

These inequalities are the analogs of (3.16) and (3.17) and lead e.g. to the estimate

$$\inf_{\chi \in S_h} \|v - \chi\|_1 \le \|v - I_h v\|_1 \le C\,h\,\|v''\|, \quad v \in H^2. \qquad (3.37)$$

The Galerkin approximation $u_h$ of $u$ in $S_h$ is defined again as the (unique) element of $S_h$ that satisfies

$$B(u_h, \chi) = (f, \chi) \quad \forall \chi \in S_h,$$

and may be found again in the form $u_h = \sum_{i=0}^{N+1} c_i\varphi_i$, where $c = [c_0, \dots, c_{N+1}]^T$ is the solution of the linear system $Ac = f_h$, where the $(N+2)\times(N+2)$ (symmetric, positive definite) tridiagonal matrix $A$ is given by $A_{ij} = B(\varphi_j, \varphi_i)$, and $f_h$ is the vector $[(f, \varphi_0), \dots, (f, \varphi_{N+1})]^T$. As before, if $u$ is the weak solution of (3.1), (3.26), then

$$\|u - u_h\|_1 \le C \inf_{\chi \in S_h} \|u - \chi\|_1,$$

which yields, in view of (3.37), the $H^1$ error estimate

$$\|u - u_h\|_1 \le C\,h\,\|u''\|, \qquad (3.38)$$

since $u \in H^2$. The error bound ($O(h)$) is of optimal rate.

The $L^2$ error estimate (of optimal rate as well)

$$\|u - u_h\| \le C\,h^2\,\|u''\| \qquad (3.39)$$

follows again by a duality argument, exactly as in the proof of (3.22) in Theorem 3.1. (Use is made of the elliptic regularity inequality for the Neumann problem.)

Analogously, we may approximate the solution of other two-point boundary-value problems with (3.1) as the underlying differential equation. For example, with the boundary conditions $u(a) = 0$, $u'(b) = 0$, we use the finite element space (defined on our usual partition)

$$\{\varphi \in C[a,b],\ \varphi(a) = 0,\ \varphi|_{(x_i, x_{i+1})} \in P_1\},$$

which is a finite-dimensional subspace of the Hilbert space $\{v : v \in H^1(a,b),\ v(a) = 0\}$, etc.
4. A note on implementation. (From Brenner–Scott, pp. 10-12.)

A basic computational problem in the finite element method is the evaluation, or assembly, of the matrix and the right-hand side of the system $Ac = f_h$ of the Galerkin equations, and in general, of the inner product $(f, v)$ and the bilinear form

$$B(v, w) = \int_a^b \big(p(x)v'(x)w'(x) + q(x)v(x)w(x)\big)\,dx,$$

given $v, w \in S_h$. We will discuss this problem in its present simple 1-dimensional context; the strategy extends in a natural way to the multidimensional case.

Given the partition $a = x_0 < x_1 < x_2 < \dots < x_N < x_{N+1} = b$ of $[a,b]$, we will simply refer to a subinterval $I_k = [x_{k-1}, x_k]$ as an element. Hence in our problem we have $N+1$ elements,

[Figure: the elements $I_1, I_2, \dots, I_{N+1}$ of the partition of $[a,b]$.]

the subintervals $I_e = [x_{e-1}, x_e]$, $e = 1, 2, \dots, N+1$. Each element $I_e$ has two nodes, its endpoints, corresponding to the values of the local node index $j = 0$ (for the left endpoint $x_{e-1}$) and $j = 1$ (for the right endpoint $x_e$). The element number $e$ and the local index $j$ determine the global index $i$ of the node $x_i$. In our case we have

$$i = i(e, j) = e + j - 1, \qquad e = 1, 2, \dots, N+1, \quad j = 0, 1.$$

Thus the left endpoint of the element $I_{15}$ has global index $i = 15 + 0 - 1 = 14$, i.e. it is the node $x_{14}$, which of course coincides with the right endpoint of the element $I_{14}$ (given by $e = 14$, $j = 1$).
Let us consider the basis $\{\varphi_i\}_{i=0}^{N+1}$ of hat functions, introduced as a basis of the subspace $S_h$ used for the Neumann problem, i.e. the basis for the finite element space of piecewise linear, continuous functions with no boundary conditions imposed. Each function $v(x)$ in this subspace may be written as

$$v(x) = \sum_{i=0}^{N+1} v(x_i)\,\varphi_i(x) = \sum_{i=0}^{N+1} v_i\,\varphi_i(x), \qquad v_i \equiv v(x_i).$$

There is a particularly effective way of describing $v(x)$ (for the purposes of efficient assembly of $(f,v)$ and $B(v,w)$) in terms of its nodal values $v(x_i)$ and the local basis functions $\varphi_j^e$, $j = 0, 1$. The latter are simply the restrictions to $[x_{e-1}, x_e]$ of the basis functions $\varphi_{e-1}(x)$, $\varphi_e(x)$, respectively, and may be described in terms of two fixed functions $\varphi_0$, $\varphi_1$ on the reference element $[0,1]$, as follows. First define $\varphi_0$, $\varphi_1$ as

$$\varphi_0(y) = \begin{cases} 1-y & 0 \le y \le 1,\\ 0 & \text{otherwise}, \end{cases} \qquad \varphi_1(y) = \begin{cases} y & 0 \le y \le 1,\\ 0 & \text{otherwise}. \end{cases}$$

[Figure: the nodal values $v_{e-1}$, $v_e$ and the local basis functions $\varphi_0^e$, $\varphi_1^e$ on the element $I_e = [x_{e-1}, x_e]$.]

Associated with $I_e = [x_{e-1}, x_e]$ (defined by the affine transformation $x = h_e y + x_{e-1}$, $0 \le y \le 1$, where $h_e = x_e - x_{e-1}$, which maps the reference element $[0,1]$ onto $I_e$), define the local basis functions $\varphi_0^e$, $\varphi_1^e$ as

$$\varphi_0^e(x) = \varphi_0(y),\quad \varphi_1^e(x) = \varphi_1(y), \quad \text{whenever } x = h_e y + x_{e-1},$$

i.e. as

$$\varphi_j^e(x) = \varphi_j\Big(\frac{x - x_{e-1}}{h_e}\Big), \qquad j = 0, 1.$$

(Consequently, $\varphi_j^e(x) = 0$ for $x \notin I_e$.)

A function $v(x)$ in the finite element subspace may now be described as

$$v(x) = \sum_{e=1}^{N+1} \sum_{j=0}^{1} v_{i(e,j)}\,\varphi_j^e(x). \qquad (3.40)$$

If $x \in (x_{e-1}, x_e)$, (3.40) reduces to

$$v(x) = v_{e-1}\,\varphi_0^e(x) + v_e\,\varphi_1^e(x),$$

[Figure: the local basis functions $\varphi_0^1, \varphi_1^1$ on $I_1 = [x_0, x_1]$ and $\varphi_0^2, \varphi_1^2$ on $I_2 = [x_1, x_2]$.]
which is the correct description of $v(x)$ restricted to $I_e$. However, (3.40) is not a correct expression at the nodes. For example, at $x = x_1$ the right-hand side of (3.40) is equal to

$$v_{i(1,1)}\,\varphi_1^1(x_1) + v_{i(2,0)}\,\varphi_0^2(x_1) = v_1 + v_1 = 2v_1,$$

instead of the correct value $v_1 = v(x_1)$. This inconsistency will not affect the assembly of $B(v,w)$ or $(f,v)$, which involve integrals over the $I_e$'s.

The assembly e.g. of $B(v,w)$ (for $v, w$ in the finite element subspace) is now done as follows. We write

$$B(v,w) = \sum_e B_e(v,w), \qquad (3.41)$$

where $B_e(v,w)$ is a locally defined bilinear expression given by

$$B_e(v,w) = \int_{I_e} (p\,v'w' + q\,vw) = \int_{x_{e-1}}^{x_e} \big(p(x)v'(x)w'(x) + q(x)v(x)w(x)\big)\,dx,$$

where $' = \frac{d}{dx}$. Using (3.40) we write

$$B_e(v,w) = \int_{x_{e-1}}^{x_e} p(x)\Big(\sum_j v_{i(e,j)}(\varphi_j^e)'(x)\Big)\Big(\sum_j w_{i(e,j)}(\varphi_j^e)'(x)\Big)\,dx + \int_{x_{e-1}}^{x_e} q(x)\Big(\sum_j v_{i(e,j)}\varphi_j^e(x)\Big)\Big(\sum_j w_{i(e,j)}\varphi_j^e(x)\Big)\,dx.$$

Using the map $x \mapsto y = (x - x_{e-1})/h_e$ we may transform the integrals above into integrals over the reference element $[0,1]$:

$$B_e(v,w) = \frac{1}{h_e}\int_0^1 p(h_e y + x_{e-1})\Big(\sum_j v_{i(e,j)}\varphi_j'(y)\Big)\Big(\sum_j w_{i(e,j)}\varphi_j'(y)\Big)\,dy + h_e\int_0^1 q(h_e y + x_{e-1})\Big(\sum_j v_{i(e,j)}\varphi_j(y)\Big)\Big(\sum_j w_{i(e,j)}\varphi_j(y)\Big)\,dy,$$

where now the prime denotes differentiation with respect to $y$. Letting $\tilde p_e(y) = p(h_e y + x_{e-1})$, $\tilde q_e(y) = q(h_e y + x_{e-1})$ ($\tilde p_e$ and $\tilde q_e$ depend on $e$), we may rewrite the above in matrix-vector form

$$B_e(v,w) = \frac{1}{h_e}\,(v_{i(e,0)}, v_{i(e,1)})\,S^e \begin{pmatrix} w_{i(e,0)}\\ w_{i(e,1)} \end{pmatrix} + h_e\,(v_{i(e,0)}, v_{i(e,1)})\,M^e \begin{pmatrix} w_{i(e,0)}\\ w_{i(e,1)} \end{pmatrix}, \qquad (3.42)$$
where $S^e$ is the $2\times2$ local stiffness matrix given by

$$S^e = \begin{pmatrix} \int_0^1 \tilde p_e\,(\varphi_0')^2 & \int_0^1 \tilde p_e\,\varphi_0'\varphi_1'\\[4pt] \int_0^1 \tilde p_e\,\varphi_1'\varphi_0' & \int_0^1 \tilde p_e\,(\varphi_1')^2 \end{pmatrix} = \Big(\int_0^1 \tilde p_e(y)\,dy\Big)\begin{pmatrix} 1 & -1\\ -1 & 1 \end{pmatrix}$$

and $M^e$ is the $2\times2$ local mass matrix defined as

$$M^e = \begin{pmatrix} \int_0^1 \tilde q_e\,\varphi_0^2 & \int_0^1 \tilde q_e\,\varphi_0\varphi_1\\[4pt] \int_0^1 \tilde q_e\,\varphi_1\varphi_0 & \int_0^1 \tilde q_e\,\varphi_1^2 \end{pmatrix}.$$
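For constant coefficients, say $p \equiv q \equiv 1$, the reference-element integrals above are elementary: $\int_0^1 (\varphi_0')^2 = \int_0^1 (\varphi_1')^2 = 1$, $\int_0^1 \varphi_0'\varphi_1' = -1$, $\int_0^1 \varphi_0^2 = \int_0^1 \varphi_1^2 = 1/3$, $\int_0^1 \varphi_0\varphi_1 = 1/6$. The following sketch (an illustration added here, not part of the notes) verifies these values by numerical quadrature on $[0,1]$:

```python
import numpy as np

# reference element basis on [0,1]: phi0(y) = 1 - y, phi1(y) = y
y = np.linspace(0.0, 1.0, 100001)
w = np.full_like(y, 1.0 / (len(y) - 1))   # composite trapezoidal weights
w[0] *= 0.5
w[-1] *= 0.5
phi = np.vstack((1.0 - y, y))             # rows: values of phi0, phi1
dphi = np.array([-1.0, 1.0])              # constant derivatives phi0', phi1'

Se = np.outer(dphi, dphi) * np.sum(w)             # p = 1: local stiffness matrix
Me = np.einsum('iy,jy,y->ij', phi, phi, w)        # q = 1: local mass matrix
```

One obtains `Se` equal to the matrix with rows (1, -1), (-1, 1), and `Me` with entries 1/3 on the diagonal and 1/6 off it, in agreement with the formulas above.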

The formula (3.42) is quite effective for computing the local bilinear form $B_e(v,w)$. It reduces to a few matrix-vector operations and the evaluation of the integrals in the entries of $S^e$ and $M^e$. These integrals are evaluated on the reference finite element $[0,1]$, in practice by some numerical integration rule. For example, use of the trapezoidal rule

$$\int_0^1 f(y)\,dy \approx \frac{f(0) + f(1)}{2}$$

yields an approximate local stiffness matrix $\tilde S^e$ given by

$$\tilde S^e = \frac{p(x_{e-1}) + p(x_e)}{2}\begin{pmatrix} 1 & -1\\ -1 & 1 \end{pmatrix}$$

and a diagonal (lumped) local mass matrix $\tilde M^e$ (why?)

$$\tilde M^e = \frac{1}{2}\begin{pmatrix} q(x_{e-1}) & 0\\ 0 & q(x_e) \end{pmatrix}.$$

(It may be proved that using e.g. the trapezoidal rule in computing the elements of the matrix and the right-hand side of the Galerkin system $Ac = f_h$ yields a new Galerkin approximation $\tilde u_h$ that satisfies again $\|\tilde u_h - u\|_1 = O(h)$, $\|\tilde u_h - u\| = O(h^2)$, i.e. the same type of optimal-rate error estimates.)
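Putting the pieces together, the sketch below (an added illustration; the function name and the particular coefficients $p(x) = 1 + x$, $q(x) = 2 + \cos x$ are our choices) assembles the global Neumann-problem matrix element by element from the approximate local matrices $\tilde S^e$ and $\tilde M^e$ above, using the index map $i(e,j) = e + j - 1$:

```python
import numpy as np

def assemble(xnodes, p, q):
    """Element-by-element assembly of A_ij = B(phi_j, phi_i) for the
    hat-function basis with no boundary conditions, using the
    trapezoidal-rule local matrices (lumped local mass matrix)."""
    n = len(xnodes)                       # n = N + 2 global nodes
    A = np.zeros((n, n))
    for e in range(1, n):                 # elements I_e = [x_{e-1}, x_e]
        he = xnodes[e] - xnodes[e - 1]
        Se = 0.5 * (p(xnodes[e - 1]) + p(xnodes[e])) * np.array([[1.0, -1.0],
                                                                 [-1.0, 1.0]])
        Me = 0.5 * np.diag([q(xnodes[e - 1]), q(xnodes[e])])
        idx = [e - 1, e]                  # global indices i(e, j) = e + j - 1
        for j in range(2):
            for i in range(2):
                A[idx[i], idx[j]] += Se[i, j] / he + he * Me[i, j]
    return A

x = np.linspace(0.0, 1.0, 6)              # uniform partition, h = 0.2
A = assemble(x, lambda t: 1.0 + t, lambda t: 2.0 + np.cos(t))
```

Since $p > 0$ and $q > 0$, the assembled matrix is symmetric, tridiagonal and positive definite, as expected for the Neumann problem.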

3.3 An indefinite problem

(After Schatz, Math. Comp. 28 (1974), 959-962.)

Let $k > 0$ be a constant and $f \in L^2(a,b)$. Consider the model two-point boundary value problem for the one-dimensional Helmholtz equation, given by

$$-u'' - k^2 u = f(x), \quad a < x < b, \qquad (3.43)$$
$$u(a) = u(b) = 0. \qquad (3.44)$$

The weak formulation of this problem is to find $u \in \mathring{H}^1$ so that

$$(u', v') - k^2(u, v) = (f, v) \quad \forall v \in \mathring{H}^1 \qquad (3.45)$$

is satisfied. The corresponding Galerkin finite element approximation is to seek $u_h \in S_h$, where $S_h$ is a finite-dimensional subspace of $\mathring{H}^1$ (for definiteness let us take $S_h$ as the space of continuous, piecewise linear functions on the arbitrary partition $a = x_0 < x_1 < x_2 < \dots < x_{N+1} = b$ that vanish at $x = a$ and $b$), satisfying

$$(u_h', \chi') - k^2(u_h, \chi) = (f, \chi) \quad \forall \chi \in S_h. \qquad (3.46)$$

The bilinear form $B(u,v) \equiv (u', v') - k^2(u,v)$ corresponding to this problem is still continuous on $H^1 \times H^1$, but since $B(v,v) = \|v'\|^2 - k^2\|v\|^2$, it is not coercive (elliptic) on $H^1$. Hence the Lax–Milgram theorem for the existence-uniqueness of the weak solution of (3.43)-(3.44) cannot be used. In fact, a unique solution may even fail to exist. This happens e.g. if $f = 0$ and $k^2$ is one of the eigenvalues of $-\frac{d^2}{dx^2}$ with zero b.c. at $x = a$ and $x = b$, i.e. one of the numbers $\lambda_n \equiv \frac{n^2\pi^2}{(b-a)^2}$, $n = 1, 2, 3, \dots$. However, if $k^2 \ne \lambda_n$ for all $n$ (equivalently, if $f = 0 \Rightarrow u = 0$), we can easily see, by solving (3.43)-(3.44) using the Green's function integral representation of the solution, that a unique classical solution exists. More generally, we can state the following lemma:

Lemma 3.2. Suppose that for $f = 0$, (3.45) has only the trivial solution $u = 0$ in $\mathring{H}^1$. Then for each $f \in L^2$ there exists a unique solution $u \in \mathring{H}^1$ of (3.45), which actually belongs to $H^2 \cap \mathring{H}^1$ and satisfies

$$\|u\|_2 \le C_1(k)\,\|f\|. \qquad (3.47)$$

(In the sequel we shall denote constants that depend on $k$ by $C_1(k), C_2(k), \dots$. Constants independent of $k$ will be generically denoted by $C$.)

We proceed now to analyze the Galerkin approximation $u_h \in S_h$ of $u$. Because the bilinear form $B(\cdot,\cdot)$ of the problem is indefinite, the linear system that must be solved to determine $u_h$ in terms of its coefficients with respect to the usual hat function basis $\{\varphi_i\}_{i=1}^N$ may not have a unique solution (its matrix is symmetric but indefinite). We shall show however that if $h$ is sufficiently small, then, provided that $k^2 \ne \lambda_n$, (3.46) has a unique solution $u_h \in S_h$ which converges to $u$ at the optimal rates in $H^1$ and $L^2$ as $h \to 0$. We will prove the following result:

Theorem 3.2. Suppose that $f$ and $u$ satisfy the conditions of Lemma 3.2. Then there exists a constant $C_2(k)$ such that if $C_2(k)\,h < 1$, (3.46) has a unique solution $u_h \in S_h$. Moreover, for constants $C_3(k)$, $C_4(k)$ we have

$$\|u - u_h\|_1 \le C_3(k)\,h\,\|u''\| \qquad (3.48)$$

and

$$\|u - u_h\| \le C_4(k)\,h^2\,\|u''\|. \qquad (3.49)$$

Proof. Under our hypothesis, (3.45) has a unique solution $u$. Suppose for the moment that (3.46) has a solution $u_h$. Set $e = u - u_h \in \mathring{H}^1$. Then, subtracting (3.45) and (3.46) yields

$$(e', \chi') - k^2(e, \chi) = 0 \quad \forall \chi \in S_h. \qquad (3.50)$$

Now

$$\|e'\|^2 - k^2\|e\|^2 = (e', e') - k^2(e, e) = (e', (u - u_h)') - k^2(e, u - u_h) \overset{\text{(by (3.50))}}{=} (e', u') - k^2(e, u).$$

Hence, by the Cauchy–Schwarz inequality, we have

$$\|e'\|^2 - k^2\|e\|^2 \le \|e'\|\,\|u'\| + k^2\|e\|\,\|u\|.$$

Applying to the right-hand side of the above the elementary inequality $\alpha\beta \le \frac{1}{2}\alpha^2 + \frac{1}{2}\beta^2$, we see that

$$\|e'\|^2 - k^2\|e\|^2 \le \frac{1}{2}\|e'\|^2 + \frac{1}{2}\|u'\|^2 + \frac{k^2}{2}\|e\|^2 + \frac{k^2}{2}\|u\|^2,$$

from which we obtain that

$$\|e'\|^2 \le 3k^2\|e\|^2 + \|u'\|^2 + k^2\|u\|^2. \qquad (3.51)$$

Now, still under the hypothesis that $u_h$ exists, by a duality argument we may prove that there is a constant $\tilde C(k)$ such that

$$\|e\| \le \tilde C(k)\,h\,\|e'\|. \qquad (3.52)$$

Indeed, let $\psi \in \mathring{H}^1$ be the (unique) solution of the problem

$$(\psi', v') - k^2(\psi, v) = (e, v) \quad \forall v \in \mathring{H}^1. \qquad (3.53)$$

By Lemma 3.2, $\psi \in H^2 \cap \mathring{H}^1$ and

$$\|\psi\|_2 \le C_1(k)\,\|e\|. \qquad (3.54)$$

Putting $v = e$ in (3.53) we see that

$$(e, e) = (\psi', e') - k^2(\psi, e) = (e', \psi') - k^2(e, \psi) \overset{\text{(see (3.50))}}{=} (e', (\psi - I_h\psi)') - k^2(e, \psi - I_h\psi),$$

where $I_h\psi$ is the $S_h$ interpolant of $\psi$. By the estimates of §3.2 we see that

$$\|\psi - I_h\psi\| \le C\,h^2\,\|\psi''\|, \qquad \|(\psi - I_h\psi)'\| \le C\,h\,\|\psi''\|,$$

for a constant $C$ independent of $h$ and $\psi$ (and $k$). Therefore, by the above,

$$\|e\|^2 \le C\,h\,\|e'\|\,\|\psi''\| + k^2\,\|e\|\,C\,h^2\,\|\psi''\| \le C\,h\,\|e'\|\,\|\psi''\|\,(1 + Ck^2 h),$$

where use was made of the Poincaré inequality. Hence by (3.54), assuming $h < 1$, we have e.g.

$$\|e\|^2 \le C\,C_1(k)\,(1 + Ck^2)\,\|e\|\,\|e'\|\,h,$$

whence $\|e\| \le \tilde C(k)\,h\,\|e'\|$, i.e. (3.52) holds with $\tilde C(k) = C(1 + Ck^2)\,C_1(k)$.

We now combine (3.51) and (3.52) and obtain

$$\|e'\|^2 \le \|u'\|^2 + k^2\|u\|^2 + 3k^2(\tilde C(k))^2 h^2\,\|e'\|^2,$$

i.e.

$$\big(1 - 3k^2(\tilde C(k))^2 h^2\big)\,\|e'\|^2 \le \|u'\|^2 + k^2\|u\|^2. \qquad (3.55)$$

Let now $C_2(k) = \sqrt{3}\,k\,\tilde C(k)$. It follows from (3.55) that if $h$ is sufficiently small, so that $C_2(k)\,h < 1$, then

$$\|e'\|^2 \le \frac{1}{\gamma}\big(\|u'\|^2 + k^2\|u\|^2\big), \qquad (3.56)$$

where $0 < \gamma \equiv 1 - (C_2(k)\,h)^2$.
After all these preliminary estimates, we now prove that the linear system represented by the equations (3.46) has a unique solution. For this purpose, consider the homogeneous system associated with (3.46), i.e. the equations

$$(u_h', \chi') - k^2(u_h, \chi) = 0 \quad \forall \chi \in S_h, \qquad (3.57)$$

and suppose that (3.57) has a solution $u_h \in S_h$. Let $f = 0$. Then, by our hypothesis that (3.45) has a unique solution, $u = 0$. Let $h$ be sufficiently small so that $C_2(k)\,h < 1$, i.e. $\gamma > 0$. Then, since $u_h$, a solution of (3.57), may be considered as a Galerkin approximation of $u = 0$, all estimates leading to (3.56) hold, and (3.56) gives $\|e'\| \le 0$, which implies that $e = 0$, i.e. $u_h = 0$, since $u = 0$. We conclude that under the hypothesis $C_2(k)\,h < 1$, the homogeneous system of equations represented by (3.57) has only the trivial solution. Therefore the nonhomogeneous system (3.46) has, for each $f$, a unique solution $u_h$, the nodal values of which may be computed e.g. by Gauss elimination with pivoting from the matrix-vector form of (3.46).

We now turn to the proof of the error estimates (3.48) and (3.49). Note that (3.48) and (3.52) imply (3.49) with $C_4(k) = C_3(k)\,\tilde C(k)$. Hence, it suffices to prove (3.48). We have

$$\|e'\|^2 = \|(u - u_h)'\|^2 = \|(u - u_I)' + (u_I - u_h)'\|^2 \le 2\|(u - u_I)'\|^2 + 2\|(u_I - u_h)'\|^2, \qquad (3.58)$$

where $u_I$ denotes the $S_h$ interpolant of $u$, and where use was made of the elementary inequality $(\alpha + \beta)^2 \le 2\alpha^2 + 2\beta^2$.

To estimate the second term in the right-hand side of (3.58), note that (3.50) with $\chi = u_I - u_h$ gives

$$((u - u_h)', (u_I - u_h)') - k^2(u - u_h, u_I - u_h) = 0$$
$$\Rightarrow ((u - u_I)' + (u_I - u_h)', (u_I - u_h)') - k^2\big((u - u_I) + (u_I - u_h), u_I - u_h\big) = 0$$
$$\Rightarrow ((u - u_I)', (u_I - u_h)') + \|(u_I - u_h)'\|^2 - k^2(u - u_I, u_I - u_h) - k^2\|u_I - u_h\|^2 = 0.$$

Hence

$$\|(u_I - u_h)'\|^2 - k^2\|u_I - u_h\|^2 = -((u - u_I)', (u_I - u_h)') + k^2(u - u_I, u_I - u_h)$$
$$\le \|(u - u_I)'\|\,\|(u_I - u_h)'\| + k^2\|u - u_I\|\,\|u_I - u_h\|$$
$$\le \frac{1}{2}\|(u - u_I)'\|^2 + \frac{1}{2}\|(u_I - u_h)'\|^2 + \frac{k^2}{2}\|u - u_I\|^2 + \frac{k^2}{2}\|u_I - u_h\|^2.$$

Hence, we obtain an analog of (3.51), namely that

$$\|(u_I - u_h)'\|^2 \le 3k^2\|u_I - u_h\|^2 + \|(u - u_I)'\|^2 + k^2\|u - u_I\|^2.$$

Hence,

$$\|(u_I - u_h)'\|^2 \le 3k^2\big(\|u_I - u\| + \underbrace{\|u - u_h\|}_{\|e\|}\big)^2 + \|(u - u_I)'\|^2 + k^2\|u - u_I\|^2$$
$$\le 6k^2\|u - u_h\|^2 + \|(u - u_I)'\|^2 + 7k^2\|u - u_I\|^2$$
$$\overset{(3.52)}{\le} 6(\tilde C(k))^2 k^2 h^2\,\|e'\|^2 + \|(u - u_I)'\|^2 + 7k^2\|u - u_I\|^2. \qquad (3.59)$$

We now use (3.59) in (3.58), obtaining:

$$\|e'\|^2 \le 12k^2(\tilde C(k))^2 h^2\,\|e'\|^2 + 4\|(u - u_I)'\|^2 + 14k^2\|u - u_I\|^2$$
$$\Rightarrow \big(1 - 12k^2(\tilde C(k))^2 h^2\big)\,\|e'\|^2 \le 4\|(u - u_I)'\|^2 + 14k^2\|u - u_I\|^2$$
$$\le C\,h^2\,\|u''\|^2 + C\,k^2 h^4\,\|u''\|^2 \quad \text{(error of the interpolant)}$$
$$\le C\,(1 + k^2)\,h^2\,\|u''\|^2 \quad (\text{if } h < 1),$$

where $C$ is independent of $h$ and $u$.

Letting now $\tilde C_2(k) = 2\sqrt{3}\,k\,\tilde C(k)$ (note that $\tilde C_2 = 2C_2$), and supposing that $\tilde C_2(k)\,h < 1$ — which is slightly stronger than $C_2(k)\,h < 1$, the inequality needed for the existence of $u_h$ — we have that

$$\|e'\|^2 \le \frac{1}{\tilde\gamma}\,C\,(1 + k^2)\,h^2\,\|u''\|^2,$$

where $\tilde\gamma \equiv 1 - 12k^2(\tilde C(k))^2 h^2 > 0$, i.e.

$$\|e'\| \le C_3(k)\,h\,\|u''\|,$$

where $C_3 = \big(C(1 + k^2)/\tilde\gamma\big)^{1/2}$. Hence (3.48) is proved. $\square$
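The situation described in this section can be observed directly. The sketch below (an added illustration; the particular values $N = 49$, $k = 10$ are our choices) assembles the matrix of (3.46) for piecewise linears on a uniform partition of $(0,1)$: it is symmetric and indefinite, yet, since $k^2 = 100$ is not an eigenvalue $n^2\pi^2$ and $h$ is small, it is nonsingular:

```python
import numpy as np

N = 49                                    # interior nodes, h = 0.02
h = 1.0 / (N + 1)
S = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
     + np.diag(np.full(N - 1, -1.0), -1)) / h
M = (np.diag(np.full(N, 4.0)) + np.diag(np.full(N - 1, 1.0), 1)
     + np.diag(np.full(N - 1, 1.0), -1)) * h / 6.0
k = 10.0                                  # k^2 = 100 is not of the form n^2 pi^2
A = S - k**2 * M                          # matrix of the Galerkin equations (3.46)
eigs = np.linalg.eigvalsh(A)
# eigs contains both negative and positive values (indefiniteness),
# but none equal to zero (unique solvability for this pair (k, h))
```

Gaussian elimination with pivoting, as mentioned in the proof, remains applicable to such symmetric indefinite systems.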

3.4 Approximation by Hermite cubic functions and cubic splines

In this section we will construct two examples of finite dimensional subspaces of $H^1(I)$ (or $\mathring{H}^1(I)$), $I = (a,b)$, consisting of piecewise cubic polynomials. The Galerkin approximation $u_h$ of $u$, the solution of (3.1),(3.2) or of (3.1),(3.26), will have higher order of accuracy. For example, its $L^2$ error $\|u - u_h\|$ will have an $O(h^4)$ bound, provided $u$ is in $H^4(I)$.

3.4.1 Hermite, piecewise cubic functions

On $I = [a,b]$ we consider, for simplicity, the uniform partition of meshlength $h = \frac{b-a}{N+1}$, defined by $x_i = a + ih$, $i = 0, \dots, N+1$, and the associated vector space of functions

$$\mathcal{H} = \{\varphi : \varphi \in C^1[a,b],\ \varphi \text{ is a cubic polynomial on each } (x_i, x_{i+1}),\ 0 \le i \le N\}.$$

The space $\mathcal{H}$ is called the vector space of Hermite, piecewise cubic functions on $I$, relative to the partition $\{x_i\}$.

It is not hard to see that $\mathcal{H}$ is a $(2N+4)$-dimensional subspace of $H^2(I)$. To construct a suitable basis for $\mathcal{H}$ we define two functions, $V$ and $S$, on $[-1,1]$ as follows:

$$V \in C^1[-1,1], \quad V|_{[-1,0]},\ V|_{[0,1]} \in P_3,$$
$$V(-1) = 0,\ V'(-1) = 0,\ V(0) = 1,\ V'(0) = 0,\ V(1) = 0,\ V'(1) = 0;$$
$$S \in C^1[-1,1], \quad S|_{[-1,0]},\ S|_{[0,1]} \in P_3,$$
$$S(-1) = 0,\ S'(-1) = 0,\ S(0) = 0,\ S'(0) = 1,\ S(1) = 0,\ S'(1) = 0.$$

Since a cubic polynomial is uniquely defined by its values and the values of its derivative at two points, it follows that the functions $V$ and $S$ exist uniquely. In fact, they are given by the following formulas:

$$V(x) = \begin{cases} (x+1)^2(1 - 2x) & -1 \le x \le 0,\\ (x-1)^2(2x+1) & 0 \le x \le 1, \end{cases}$$

[Figure: graph of $V$ on $[-1,1]$, with $V(0) = 1$ and $V$, $V'$ vanishing at $\pm 1$.]

$$S(x) = \begin{cases} x(x+1)^2 & -1 \le x \le 0,\\ x(x-1)^2 & 0 \le x \le 1. \end{cases}$$

[Figure: graph of $S$ on $[-1,1]$; $S$ vanishes at $-1$, $0$, $1$ and $S'(0) = 1$.]
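As a quick check (an added illustration; the function names are ours), the formulas for $V$ and $S$ can be verified numerically against their defining interpolation conditions:

```python
import numpy as np

def V(x):
    # value basis function: V(0) = 1; V and V' vanish at -1 and 1
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.0, (x + 1.0)**2 * (1.0 - 2.0 * x),
                              (x - 1.0)**2 * (1.0 + 2.0 * x))

def S(x):
    # slope basis function: S'(0) = 1; S vanishes at -1, 0, 1
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.0, x * (x + 1.0)**2, x * (x - 1.0)**2)

# derivative checks at the origin by central differences
d = 1e-6
dV0 = (V(d) - V(-d)) / (2 * d)     # should be close to 0
dS0 = (S(d) - S(-d)) / (2 * d)     # should be close to 1
```

The two branches of each formula agree in value and slope at $x = 0$, which is what makes $V$ and $S$ (and hence the basis functions built from them) $C^1$ functions.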

Using $V$ and $S$ we define the functions $\{V_j, S_j\}$, $0 \le j \le N+1$, on $I$ as follows. For $j = 1, \dots, N$ we let:

$$V_j(x) = \begin{cases} V\big(\frac{x - x_j}{h}\big) & x_{j-1} \le x \le x_{j+1},\\ 0 & \text{otherwise}, \end{cases} \qquad S_j(x) = \begin{cases} h\,S\big(\frac{x - x_j}{h}\big) & x_{j-1} \le x \le x_{j+1},\\ 0 & \text{otherwise}. \end{cases}$$

We also define $V_0$, $V_{N+1}$, $S_0$, $S_{N+1}$ as

$$V_0(x) = \begin{cases} V\big(\frac{x - a}{h}\big) & a \le x \le x_1,\\ 0 & \text{otherwise}, \end{cases} \qquad S_0(x) = \begin{cases} h\,S\big(\frac{x - a}{h}\big) & a \le x \le x_1,\\ 0 & \text{otherwise}, \end{cases}$$

$$V_{N+1}(x) = \begin{cases} V\big(\frac{x - b}{h}\big) & x_N \le x \le b,\\ 0 & \text{otherwise}, \end{cases} \qquad S_{N+1}(x) = \begin{cases} h\,S\big(\frac{x - b}{h}\big) & x_N \le x \le b,\\ 0 & \text{otherwise}. \end{cases}$$

We may easily verify that $V_j, S_j \in \mathcal{H}$, $0 \le j \le N+1$, and that $V_j(x_k) = \delta_{jk}$, $V_j'(x_k) = 0$, $S_j(x_k) = 0$, $S_j'(x_k) = \delta_{jk}$, $0 \le j, k \le N+1$.

[Figure: the basis functions $V_j$ and $S_j$, $0 \le j \le N+1$, on the partition $a = x_0 < x_1 < \dots < x_{N+1} = b$.]
The $2N+4$ functions $\{V_j, S_j\}$, $0 \le j \le N+1$, form a basis of $\mathcal{H}$. Indeed, any function $\varphi \in \mathcal{H}$ may be written in the form

$$\varphi(x) = \sum_{j=0}^{N+1} \big[\varphi(x_j)\,V_j(x) + \varphi'(x_j)\,S_j(x)\big], \quad x \in I, \qquad (3.60)$$

since, on the interval $[x_k, x_{k+1}]$, the cubic polynomial $\varphi|_{[x_k, x_{k+1}]}$ is uniquely determined by the values $\varphi(x_k)$, $\varphi'(x_k)$, $\varphi(x_{k+1})$ and $\varphi'(x_{k+1})$. On the other hand, the right-hand side of (3.60) is a cubic polynomial on $[x_k, x_{k+1}]$ whose values at $x_k$, $x_{k+1}$ are $\varphi(x_k)$, $\varphi(x_{k+1})$, respectively, and whose derivative values there are $\varphi'(x_k)$ and $\varphi'(x_{k+1})$ as well. In addition, since the relation

$$\sum_{j=0}^{N+1} \big[c_j V_j(x) + d_j S_j(x)\big] = 0$$

implies (set $x = x_k$, $0 \le k \le N+1$) that $c_k = 0$ for all $k$, and that

$$\sum_{j=0}^{N+1} \big[c_j V_j'(x) + d_j S_j'(x)\big] = 0$$

(from which, similarly, $d_j = 0$ for all $j$), we conclude that the set $\{V_j, S_j\}$ is linearly independent.

Note that $\operatorname{supp}(V_j) = \operatorname{supp}(S_j) = [x_{j-1}, x_{j+1}]$ for $1 \le j \le N$. Hence, the functions $V_j$, $S_j$ have the minimal support possible in $\mathcal{H}$: a function in $\mathcal{H}$ with support in one interval $[x_k, x_{k+1}]$ is identically zero.
If we are now given a function $f \in C^1[a,b]$, there is a uniquely determined element $\hat f \in \mathcal{H}$ such that

$$\hat f(x_j) = f(x_j), \quad \hat f'(x_j) = f'(x_j), \quad 0 \le j \le N+1. \qquad (3.61)$$

$\hat f$ is called the cubic Hermite interpolant of $f$ (relative to the partition $\{x_i\}$ of $I$) and is given by the formula

$$\hat f(x) = \sum_{j=0}^{N+1} \big[f(x_j)\,V_j(x) + f'(x_j)\,S_j(x)\big], \quad x \in I. \qquad (3.62)$$

We shall study the approximation properties of the Hermite interpolant. Denote by $\|\cdot\|_k$ the norm of the Sobolev space $H^k = H^k(I)$. Then, the following theorem holds:

Theorem 3.3. There exists $C > 0$ (independent of $h$) such that for each $f \in H^m$, $m = 2, 3$ or $4$, we have

$$\|f - \hat f\|_k \le C\,h^{m-k}\,\|f^{(m)}\|, \quad k = 0, 1, 2. \qquad (3.63)$$
Before proving it, let us remark that $m \ge 2$ since, otherwise, $f'(x_k)$ may not have a meaning. In addition, since $\hat f \in H^2$ but $\hat f \notin H^3$ in general, we take $k \le 2$. If $f$ is taken in $H^4$, then we have the best order of accuracy, $O(h^{4-k})$ in $H^k$, $k = 0, 1, 2$. This rate cannot be improved even if $f$ is smoother.

Theorem 3.3 follows from two lemmas:

Lemma 3.3. Let $f \in H^2$ and $k = 0$ or $1$. Then we have, for $1 \le j \le N+1$,

$$\|D^k(f - \hat f)\|_{L^2(x_{j-1}, x_j)} \le h\,\|D^{k+1}(f - \hat f)\|_{L^2(x_{j-1}, x_j)}. \qquad (3.64)$$

Proof: Since for $k = 0$ or $1$, $D^k(f - \hat f)(x_j) = 0$ for all $j$, we have

$$D^k(f - \hat f)(x) = \int_{x_{j-1}}^{x} D^{k+1}(f - \hat f)(t)\,dt, \quad x_{j-1} \le x \le x_j.$$

Hence, for $x \in [x_{j-1}, x_j]$, from the Cauchy–Schwarz inequality:

$$|D^k(f - \hat f)(x)| \le \int_{x_{j-1}}^{x_j} |D^{k+1}(f - \hat f)|\,dt \le h^{1/2}\,\|D^{k+1}(f - \hat f)\|_{L^2(x_{j-1}, x_j)}.$$

Therefore

$$\|D^k(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} = \int_{x_{j-1}}^{x_j} \big(D^k(f - \hat f)\big)^2\,dt \le h^2\,\|D^{k+1}(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)}. \quad \text{Q.E.D.}$$

Lemma 3.4. Let $f \in H^m$, $m = 2, 3$, or $4$. Then

$$\|D^2(f - \hat f)\| \le h^{m-2}\,\|D^m f\|. \qquad (3.65)$$

Proof: For $1 \le j \le N+1$ we have the orthogonality relation $\int_{x_{j-1}}^{x_j} D^2(f - \hat f)\,D^2\hat f = 0$. This follows from the observation that

$$\int_{x_{j-1}}^{x_j} D^2(f - \hat f)\,D^2\hat f = \text{(integrating by parts and using } D(f - \hat f)(x_j) = 0 \text{ for all } j) = -\int_{x_{j-1}}^{x_j} D(f - \hat f)\,D^3\hat f = -\big[(f - \hat f)(x)\,D^3\hat f(x)\big]_{x=x_{j-1}}^{x_j} + \int_{x_{j-1}}^{x_j} (f - \hat f)\,D^4\hat f = 0,$$

since $(f - \hat f)(x_j) = 0$ for all $j$, and $D^4\hat f|_{[x_{j-1}, x_j]} = 0$ since $\hat f \in P_3[x_{j-1}, x_j]$.

This orthogonality relation implies, for $f \in H^m$, $m = 2, 3$, or $4$,

$$\|D^2(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} \le \|D^{4-m}(f - \hat f)\|_{L^2(x_{j-1}, x_j)}\,\|D^m f\|_{L^2(x_{j-1}, x_j)}, \quad 1 \le j \le N+1. \qquad (3.66)$$

To see (3.66), consider

$$\|D^2(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} = \int_{x_{j-1}}^{x_j} D^2(f - \hat f)\,D^2(f - \hat f) \overset{\text{(orthogonality)}}{=} \int_{x_{j-1}}^{x_j} D^2(f - \hat f)\,D^2 f = (-1)^m \int_{x_{j-1}}^{x_j} D^{4-m}(f - \hat f)\,D^m f,$$

where the last equality is trivial for $m = 2$, requires one integration by parts and use of the fact that $D(f - \hat f)(x_j) = 0$ for all $j$ if $m = 3$, and one more integration by parts and the fact that $(f - \hat f)(x_j) = 0$ for all $j$, if $m = 4$. The Cauchy–Schwarz inequality now yields (3.66).

If $m = 2$, (3.66) gives $\|D^2(f - \hat f)\|_{L^2(x_{j-1}, x_j)} \le \|D^2 f\|_{L^2(x_{j-1}, x_j)}$. Hence, squaring and summing over $j$, we get

$$\|D^2(f - \hat f)\|^2 = \sum_{j=1}^{N+1} \|D^2(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} \le \sum_{j=1}^{N+1} \|D^2 f\|^2_{L^2(x_{j-1}, x_j)} = \|D^2 f\|^2,$$

which is (3.65) for $m = 2$. If $m = 3$, (3.66) and (3.64) for $k = 1$ give

$$\|D^2(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} \le \|D(f - \hat f)\|_{L^2(x_{j-1}, x_j)}\,\|D^3 f\|_{L^2(x_{j-1}, x_j)} \le h\,\|D^2(f - \hat f)\|_{L^2(x_{j-1}, x_j)}\,\|D^3 f\|_{L^2(x_{j-1}, x_j)}.$$

Hence, $\|D^2(f - \hat f)\|_{L^2(x_{j-1}, x_j)} \le h\,\|D^3 f\|_{L^2(x_{j-1}, x_j)}$, which gives (3.65) after squaring and summing over $j$. Finally, if $m = 4$, (3.66) and (3.64) for $k = 0$ yield

$$\|D^2(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} \le \|f - \hat f\|_{L^2(x_{j-1}, x_j)}\,\|D^4 f\|_{L^2(x_{j-1}, x_j)} \le h\,\|D(f - \hat f)\|_{L^2(x_{j-1}, x_j)}\,\|D^4 f\|_{L^2(x_{j-1}, x_j)}.$$

Again, by (3.64) for $k = 1$, we obtain

$$\|D^2(f - \hat f)\|^2_{L^2(x_{j-1}, x_j)} \le h^2\,\|D^2(f - \hat f)\|_{L^2(x_{j-1}, x_j)}\,\|D^4 f\|_{L^2(x_{j-1}, x_j)},$$

i.e. $\|D^2(f - \hat f)\|_{L^2(x_{j-1}, x_j)} \le h^2\,\|D^4 f\|_{L^2(x_{j-1}, x_j)}$. Squaring and summing, we get (3.65). $\square$

Proof of Theorem 3.3: It follows easily from Lemmas 3.3 and 3.4. For example, if $m = 4$, we have:

$k = 0$: $\quad \|f - \hat f\| \overset{\text{(Lemma 3.3)}}{\le} h\,\|D(f - \hat f)\| \overset{\text{(Lemma 3.3)}}{\le} h^2\,\|D^2(f - \hat f)\| \overset{\text{(Lemma 3.4)}}{\le} h^4\,\|D^4 f\|.$

$k = 1$: $\quad \|(f - \hat f)'\| \overset{\text{(Lemma 3.3)}}{\le} h\,\|D^2(f - \hat f)\| \overset{\text{(Lemma 3.4)}}{\le} h^3\,\|D^4 f\|.$

Hence $\|f - \hat f\|_1^2 \le \big((h^4)^2 + (h^3)^2\big)\,\|D^4 f\|^2 \le C\,h^6\,\|D^4 f\|^2$.

$k = 2$: $\quad \|D^2(f - \hat f)\| \overset{\text{(Lemma 3.4)}}{\le} h^2\,\|D^4 f\| \Rightarrow \|f - \hat f\|_2 \le C\,h^2\,\|D^4 f\|.$

The cases $m = 2$, $0 \le k \le 2$, and $m = 3$, $0 \le k \le 2$, also follow analogously. $\square$
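The $O(h^{4-k})$ rates of Theorem 3.3 can be observed numerically. The sketch below (an added illustration, not from the notes) evaluates the cubic Hermite interpolant (3.62) directly, using the standard value/slope shape functions on each subinterval, and checks that the maximum error for $f(x) = \sin\pi x$ drops by a factor of about $2^4 = 16$ when $h$ is halved:

```python
import numpy as np

def hermite_interp(nodes, f, df, xx):
    """Evaluate the cubic Hermite interpolant of f (with derivative df)
    at the points xx, on the given nodes, element by element."""
    k = np.clip(np.searchsorted(nodes, xx, side='right') - 1, 0, len(nodes) - 2)
    h = nodes[k + 1] - nodes[k]
    t = (xx - nodes[k]) / h                 # local coordinate in [0, 1]
    H00 = (1 + 2*t) * (1 - t)**2            # value at the left node
    H10 = t * (1 - t)**2                    # slope at the left node
    H01 = t**2 * (3 - 2*t)                  # value at the right node
    H11 = t**2 * (t - 1)                    # slope at the right node
    return (H00 * f(nodes[k]) + h * H10 * df(nodes[k])
            + H01 * f(nodes[k + 1]) + h * H11 * df(nodes[k + 1]))

v = lambda x: np.sin(np.pi * x)
dv = lambda x: np.pi * np.cos(np.pi * x)
xx = np.linspace(0.0, 1.0, 20001)

def sup_err(N):
    nodes = np.linspace(0.0, 1.0, N + 2)
    return np.max(np.abs(v(xx) - hermite_interp(nodes, v, dv, xx)))

ratio = sup_err(4) / sup_err(9)   # h: 1/5 -> 1/10, expect roughly 16
```

A ratio near $16$ reflects the fourth-order accuracy of cubic Hermite interpolation for smooth $f$.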
We conclude that, given $f \in H^m$ ($m = 2, 3$, or $4$), there exists an element $\varphi \in \mathcal{H}$ (we took $\varphi = \hat f$) such that

$$\sum_{k=0}^{2} h^k\,\|f - \varphi\|_k \le C\,h^m\,\|D^m f\|. \qquad (3.67)$$

We turn now to the Galerkin solution of the two-point b.v.p. for the d.e. (3.1). Considering first Neumann boundary conditions, i.e. the b.c. (3.26), we see that $u_h$ is the unique element of $\mathcal{H}$ that satisfies

$$B(u_h, \varphi) = (f, \varphi) \quad \forall \varphi \in \mathcal{H},$$

for which, as usual (taking as our Hilbert space $H^1(I)$), it follows that

$$\|u - u_h\|_1 \le C \inf_{\chi \in \mathcal{H}} \|u - \chi\|_1.$$

Hence, by (3.67) we conclude that if $u \in H^m$, $m = 2, 3$, or $4$, then

$$\|u - u_h\|_1 \le C\,h^{m-1}\,\|D^m u\|. \qquad (3.68)$$

By a duality argument (Nitsche trick) it follows, just as in the proof of Theorem 3.1 (with $H^1$ instead of $\mathring{H}^1$), that in $L^2$ we have

$$\|u - u_h\| \le C\,h^m\,\|D^m u\|. \qquad (3.69)$$

Hence, the (optimal) rate of accuracy one may achieve for $u_h$ is 4 in $L^2$ (3 in $H^1$), provided $u$, the solution of (3.1), (3.26), is in $H^4$. (If for example $p \in C^{m-1}$, $q \in C^{m-2}$, $f \in H^{m-2}$ for some $m \ge 2$, it may be shown that the elliptic regularity estimate (3.4) generalizes and gives that $u \in H^m$ and $\|u\|_m \le C_m\,\|f\|_{m-2}$.)

In the case of homogeneous Dirichlet boundary conditions, i.e. the problem (3.1)-(3.2), we may take as our subspace of $\mathring{H}^1(I)$ the vector space $\mathring{\mathcal{H}} = \{\varphi : \varphi \in \mathcal{H},\ \varphi(a) = \varphi(b) = 0\}$. It is easily seen that $\mathring{\mathcal{H}}$ is a $(2N+2)$-dimensional subspace of $H^2 \cap \mathring{H}^1$. Its basis consists of the elements $V_j$, $1 \le j \le N$, and $S_j$, $0 \le j \le N+1$. We may define the interpolant $\hat f$ of a function $f \in H^2 \cap \mathring{H}^1$ in the natural way, and show as before that

$$\|f - \hat f\|_k \le C\,h^{m-k}\,\|D^m f\|, \quad k = 0, 1, 2,$$

provided $f \in H^m \cap \mathring{H}^1$, and $m$ is 2, 3, or 4. The analogous estimates to (3.68) and (3.69) still hold, i.e. we have

$$\|u - u_h\|_j \le C\,h^{m-j}\,\|D^m u\|, \quad j = 0, 1,$$

provided $u$, the solution of (3.1),(3.2), belongs to $H^m \cap \mathring{H}^1$.

The linear system that defines the Galerkin equations is now of size (say, for the Neumann problem) $(2N+4)\times(2N+4)$. Ordering the basis vectors $\{\varphi_1, \dots, \varphi_{2N+4}\}$ as $\{V_0, S_0, V_1, S_1, \dots, V_{N+1}, S_{N+1}\}$, we see that e.g. the Gram matrix (with elements $\int_a^b \varphi_i\varphi_j$) is block-tridiagonal with $2\times2$ blocks. Its general line of blocks is:

$$\begin{pmatrix} \int V_j V_{j-1} & \int V_j S_{j-1} & \int (V_j)^2 & \int V_j S_j & \int V_j V_{j+1} & \int V_j S_{j+1}\\[4pt] \int S_j V_{j-1} & \int S_j S_{j-1} & \int S_j V_j & \int (S_j)^2 & \int S_j V_{j+1} & \int S_j S_{j+1} \end{pmatrix}.$$

The cubic Hermite elements will, then, give an accuracy of $O(h^4)$ in $L^2$ (if $u \in H^4$) at the expense of solving a $(2N+4)\times(2N+4)$ 7-diagonal linear system.

We close this section with two remarks:
(i) We may define Hermite piecewise cubic functions on a general partition $a = x_0 < x_1 < \dots < x_{N+1} = b$ of $[a,b]$. The associated basis $\{V_j, S_j\}$, $0 \le j \le N+1$, may be defined in a straightforward way; its members have support in $[x_{j-1}, x_{j+1}]$ for $1 \le j \le N$, etc. An interpolant may be defined in the natural way. The estimate (3.63) still holds with $h := \max_j (x_{j+1} - x_j)$.
(ii) The cubic Hermite functions are a special case of the spaces $\mathcal{H}_m = \{\varphi \in C^{m-1}[a,b],\ \varphi|_{[x_i, x_{i+1}]} \in P_{2m-1}\}$, for which the error of the Galerkin approximation is $O(h^{2m})$ in $L^2$.

3.4.2 Cubic splines

The dimension of $\mathcal{H}$, the space of Hermite cubics, is, as we saw, $2N+4$. This, in particular, leads to rather large linear systems that have to be solved for the Galerkin approximation $u_h$. A natural idea is to lower the dimension of the space by requiring more continuity at the nodes $x_j$. In the cubic polynomial case this leads to the space

$$\mathcal{S} := \{\varphi : \varphi \in C^2[a,b],\ \varphi|_{[x_i, x_{i+1}]} \in P_3,\ 0 \le i \le N\}$$

of the so-called cubic splines. (We suppose that the $\{x_i\}$ define a uniform partition of $[a,b]$ with $x_0 = a$, $x_{N+1} = b$, of meshlength $h = (b-a)/(N+1)$.)

A count of the free parameters and the constraints of functions in $\mathcal{S}$ leads to the conjecture that $\mathcal{S}$ is an $(N+4)$-dimensional subspace of $H^3((a,b))$. To prove this fact, we shall construct a basis of $\mathcal{S}$, in fact a basis with elements of minimal support.

It is not hard to see that there are no nontrivial elements of $\mathcal{S}$ vanishing outside fewer than four adjacent mesh intervals. (For example, prove that the only element of $\mathcal{S}$ which vanishes identically outside $[x_{j-1}, x_{j+2}]$ is the zero element.) In addition, it is easy to see that there is a unique function $S \in C^2[-2,2]$ such that $\operatorname{supp}(S) = [-2,2]$, $S|_{[k,k+1]} \in P_3$ for $k = -2, -1, 0, 1$, $S(-2) = S'(-2) = S''(-2) = 0$, $S(0) = 1$. This function is given by the formula

$$S(x) = \begin{cases} \frac{1}{4}(x+2)^3 & -2 \le x \le -1,\\[2pt] \frac{1}{4}\big[1 + 3(x+1) + 3(x+1)^2 - 3(x+1)^3\big] & -1 \le x \le 0,\\[2pt] \frac{1}{4}\big[1 + 3(1-x) + 3(1-x)^2 - 3(1-x)^3\big] & 0 \le x \le 1,\\[2pt] \frac{1}{4}(2-x)^3 & 1 \le x \le 2,\\[2pt] 0 & x \in \mathbb{R}\setminus[-2,2]. \end{cases} \qquad (3.70)$$

[Figure: graph of the cubic B-spline $S$ on $[-2,2]$, with $S(0) = 1$ and $S(\pm 1) = 1/4$.]
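The formula (3.70) can be checked numerically. The sketch below (an added illustration; the function name is ours) evaluates $S$ using its evenness about zero, and verifies the nodal values $S(0) = 1$, $S(\pm 1) = 1/4$ together with the fact that the integer translates of $S$ sum to the constant $3/2$ (since $S$ is $3/2$ times the standard normalized cubic B-spline, whose translates form a partition of unity):

```python
import numpy as np

def bspline(x):
    """The cubic B-spline (3.70) on [-2, 2], normalized so that S(0) = 1.
    Evenness of S lets us evaluate at a = |x|."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    return np.where(a >= 2.0, 0.0,
           np.where(a >= 1.0, 0.25 * (2.0 - a)**3,
                    0.25 * (1.0 + 3.0*(1.0 - a) + 3.0*(1.0 - a)**2
                            - 3.0*(1.0 - a)**3)))

xs = np.linspace(-0.5, 0.5, 11)
total = sum(bspline(xs - j) for j in range(-3, 4))   # sum of translates
```

The translate-sum identity is a quick global consistency check on all four polynomial pieces at once.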

Using S(x), we define the functions \{\varphi_j\}, j = -1, 0, 1, \ldots, N+2, on [a,b] as follows: We introduce the extra nodes x_{-1} := a - h, x_{N+2} := b + h and put

\varphi_j(x) = S\Big(\frac{x - x_j}{h}\Big)\Big|_{[a,b]}, \quad -1 \le j \le N+2,

(the restriction of S\big(\frac{x-x_j}{h}\big) to [a,b]). It may be seen immediately that each \varphi_j, -1 \le j \le N+2, is a cubic polynomial on each interval [x_k, x_{k+1}], 0 \le k \le N, belongs to C^2[a,b], and hence \varphi_j \in S, -1 \le j \le N+2.

Moreover, \mathrm{supp}(\varphi_j) = [x_{j-2}, x_{j+2}] for 2 \le j \le N-1, while
\varphi_{-1} vanishes for x \in [a,b] \setminus [x_0, x_1],
\varphi_0 vanishes for x \in [a,b] \setminus [x_0, x_2],
\varphi_1 vanishes for x \in [a,b] \setminus [x_0, x_3],
\varphi_N vanishes for x \in [a,b] \setminus [x_{N-2}, x_{N+1}],
\varphi_{N+1} vanishes for x \in [a,b] \setminus [x_{N-1}, x_{N+1}],
\varphi_{N+2} vanishes for x \in [a,b] \setminus [x_N, x_{N+1}].
In addition, \varphi_j(x_j) = 1 for 0 \le j \le N+1.

[Figure: the B-splines \varphi_{-1}, \varphi_0, \ldots, \varphi_{N+2} on [a,b], x_0 = a, x_{N+1} = b; each interior \varphi_j attains the value 1 at x_j and the value 1/4 at x_{j\pm 1}.]

The N + 4 functions \varphi_j, -1 \le j \le N+2, are known as B-splines, since they form a basis of S. We shall show this in a series of lemmata:

Lemma 3.5. The \{\varphi_j\}_{j=-1}^{N+2} are linearly independent.

Proof: Suppose that for real constants c_j, -1 \le j \le N+2, we have

\sum_{j=-1}^{N+2} c_j \varphi_j(x) = 0, \quad x \in [a,b].

Then, in particular, we have

\sum_{j=-1}^{N+2} c_j \varphi_j(x_k) = 0, \quad k = 0, \ldots, N+1,   (3.71a)

and

\sum_{j=-1}^{N+2} c_j \varphi_j'(x_k) = 0, \quad k = 0, \ldots, N+1.   (3.71b)

Using the facts that

\varphi_j(x_k) = \begin{cases} 1 & \text{if } j = k,\\ \tfrac{1}{4} & \text{if } |j-k| = 1,\\ 0 & \text{if } |j-k| \ge 2, \end{cases}

we may write the relation (3.71a) as

\tfrac{1}{4} c_{k-1} + c_k + \tfrac{1}{4} c_{k+1} = 0, \quad k = 0, \ldots, N+1.   (3.72)

Now, using the fact that S is even (about zero) and therefore that S' is an odd function, we conclude from the definition of \varphi_j that

\varphi_j'(x_j) = 0, \quad \varphi_j'(x_{j-1}) = -\varphi_j'(x_{j+1}), \quad \varphi_j'(x_k) = 0 \ \text{ if } |j-k| \ge 2.

Therefore, (3.71b) for k = 0 gives

c_{-1}\varphi_{-1}'(x_0) + c_1\varphi_1'(x_0) = 0 \;\Rightarrow\; c_{-1} = c_1.   (3.73a)

In addition, (3.71b) for k = N+1 gives

c_N\varphi_N'(x_{N+1}) + c_{N+2}\varphi_{N+2}'(x_{N+1}) = 0 \;\Rightarrow\; c_N = c_{N+2}.   (3.73b)

The relations (3.73a) and (3.73b) are used to eliminate the unknowns c_{-1} and c_{N+2} from the linear system (3.72), which takes the form

A c = 0,   (3.74)

where c = [c_0, \ldots, c_{N+1}]^T and A is the (N+2) \times (N+2) tridiagonal matrix

A = \begin{pmatrix}
4 & 2 &        &        &   \\
1 & 4 & 1      &        &   \\
  & \ddots & \ddots & \ddots & \\
  &        & 1      & 4 & 1 \\
  &        &        & 2 & 4
\end{pmatrix}.

The matrix A is strictly diagonally dominant and therefore invertible, by Gerschgorin's lemma. Therefore c_j = 0, 0 \le j \le N+1, since c solves the homogeneous problem (3.74). By (3.73a) and (3.73b), c_{-1} = 0 and c_{N+2} = 0. We conclude that the \{\varphi_j\}, -1 \le j \le N+2, are linearly independent. \square
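The invertibility argument is easy to test numerically. The following Python sketch (my own; the size 8 is arbitrary) assembles the matrix A of (3.74) and checks strict diagonal dominance and full rank:

```python
import numpy as np

def spline_collocation_matrix(n):
    """The (N+2) x (N+2) tridiagonal matrix A of (3.74); here n = N + 2."""
    A = 4.0 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    A[0, 1] = 2.0       # first row:  4 2
    A[-1, -2] = 2.0     # last row:   2 4
    return A

A = spline_collocation_matrix(8)
# strict diagonal dominance: each diagonal entry exceeds the sum of the
# moduli of the off-diagonal entries in its row (Gerschgorin)
dominant = all(A[i, i] > np.abs(A[i]).sum() - A[i, i] for i in range(8))
# hence A c = 0 has only the trivial solution
c = np.linalg.solve(A, np.zeros(8))
```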

We let now M := \mathrm{span}\{\varphi_{-1}, \ldots, \varphi_{N+2}\} be the subspace of S which is spanned by the elements of the set \{\varphi_j\}_{j=-1}^{N+2}. We shall show that in fact M = S, thus completing the proof that \{\varphi_j\}_{j=-1}^{N+2} is a basis for S. This will be accomplished by two intermediate results of interpolation:

Lemma 3.6. Let f \in C^1[a,b]. Then, there exists a unique element \tilde f \in M satisfying the interpolation conditions

\tilde f(x_j) = f(x_j), \quad 0 \le j \le N+1,
\tilde f'(a) = f'(a),   (3.75)
\tilde f'(b) = f'(b).

Proof: The functions \{\varphi_j\}_{j=-1}^{N+2} form a basis for M. We seek therefore a function \tilde f = \sum_{j=-1}^{N+2} c_j \varphi_j \in M which satisfies (3.75), i.e. the linear system (for its coefficients c_j):

\sum_{j=-1}^{N+2} c_j \varphi_j(x_k) = f(x_k), \quad 0 \le k \le N+1,
c_{-1}\varphi_{-1}'(a) + c_1\varphi_1'(a) = f'(a),   (3.76)
c_N\varphi_N'(b) + c_{N+2}\varphi_{N+2}'(b) = f'(b).

The linear system (3.76) has a unique solution, since the associated homogeneous system (obtained by putting f(x_k) = 0, 0 \le k \le N+1, f'(a) = f'(b) = 0 in (3.76)) has only the trivial solution, as we argued in the proof of Lemma 3.5. We conclude that there is a unique element \tilde f \in M satisfying the interpolation conditions (3.75). \square

Lemma 3.7. Let f \in C^1[a,b]. Then, there is a unique element s \in S which satisfies the interpolation conditions, analogous to (3.75),

s(x_j) = f(x_j), \quad 0 \le j \le N+1,
s'(a) = f'(a),   (3.77)
s'(b) = f'(b).

Proof: See Akrivis–Dougalis, Introduction to Numerical Analysis, Thm. 4.7. The function s(x) is explicitly constructed from the values of its second derivative at the points x_j, 0 \le j \le N+1. \square
We may now conclude that the \{\varphi_i\} span S, i.e. that M = S:

Theorem 3.4. The \{\varphi_j\}_{j=-1}^{N+2} are a basis of S.

Proof: In view of Lemma 3.5 we need only show that S \subseteq M. Let f \in S. Then, by Lemma 3.6, there is a unique element \tilde f \in M that satisfies the interpolation conditions (3.75). But M \subseteq S, hence \tilde f \in S. However, by Lemma 3.7 there is a unique element of S that satisfies the interpolation conditions (3.77), which are the same as (3.75). This element coincides with f, since f \in S. We conclude that f = \tilde f. Hence f \in M, i.e. S \subseteq M. \square

Given f \in C^1[a,b], we call \tilde f(= s) its cubic spline interpolant (with first-derivative endpoint conditions \tilde f'(a) = f'(a), \tilde f'(b) = f'(b)). It is not hard to see that if we are given f''(a), f''(b), we may construct a cubic spline interpolant \tilde f satisfying, in addition to the conditions \tilde f(x_j) = f(x_j), 0 \le j \le N+1, the endpoint second-derivative boundary conditions \tilde f''(a) = f''(a), \tilde f''(b) = f''(b). Similarly, for periodic f \in C^1[a,b] we may construct the periodic spline interpolant, etc. All these interpolants satisfy error estimates of O(h^4) accuracy in L^2, provided f \in H^4(I). For example, we shall prove the following result for the interpolant with first-derivative boundary conditions:

Theorem 3.5. There exists a constant C > 0 (independent of h), such that for each f \in H^m, m = 2, 3, or 4, we have

\|f - \tilde f\|_k \le C h^{m-k} \|f^{(m)}\|, \quad k = 0, 1, 2. \ \square   (3.78)

As before, we shall prove the theorem using two intermediate results. First, we prove the analog of Lemma 3.3; but now the norms are global, i.e. they are the L^2(a,b) norms.

Lemma 3.8. Let f \in H^2 and k = 0 or 1. Then, there exists a constant C, independent of h and f, such that

\|D^k(f - \tilde f)\| \le C h \|D^{k+1}(f - \tilde f)\|.   (3.79)

Proof: For k = 0 we may prove the local result as well. Since \tilde f coincides with f at the nodes x_j, 0 \le j \le N+1, we have, for x \in [x_{j-1}, x_j], for some 1 \le j \le N+1, that

f(x) - \tilde f(x) = \int_{x_{j-1}}^{x} D(f - \tilde f)(y)\,dy.

Hence, for x_{j-1} \le x \le x_j, using the Cauchy–Schwarz inequality,

|(f - \tilde f)(x)| \le \int_{x_{j-1}}^{x_j} |D(f - \tilde f)(y)|\,dy \le (x_j - x_{j-1})^{1/2} \Big( \int_{x_{j-1}}^{x_j} \big(D(f - \tilde f)(y)\big)^2 dy \Big)^{1/2} \le h^{1/2} \|D(f - \tilde f)\|_{L^2(x_{j-1},x_j)}.

Therefore,

\|f - \tilde f\|_{L^2(x_{j-1},x_j)} \le h \|D(f - \tilde f)\|_{L^2(x_{j-1},x_j)},   (3.80)

from which, by squaring and summing, we get (3.79) for k = 0 with C = 1.


To prove (3.79) for k = 1, we note that since

\varepsilon(x_j) = 0, \quad 0 \le j \le N+1,

where \varepsilon := f - \tilde f, there follows, by Rolle's theorem, that there exist \xi_j \in (x_{j-1}, x_j), 1 \le j \le N+1, where \varepsilon'(\xi_j) = 0.

[Figure: the points \xi_1, \ldots, \xi_{N+1}, one in each interval (x_{j-1}, x_j), together with \xi_0 = x_0 = a and \xi_{N+2} = x_{N+1} = b.]

(Note that \varepsilon \in H^2[a,b]; hence \varepsilon \in C^1[a,b].) Note also that, by definition of \tilde f, \varepsilon'(x_0) = \varepsilon'(x_{N+1}) = 0. Introduce then \xi_0 := x_0 and \xi_{N+2} := x_{N+1}, so that \varepsilon'(\xi_j) = 0, 0 \le j \le N+2. Therefore, for x \in [\xi_{j-1}, \xi_j], \varepsilon'(x) = \int_{\xi_{j-1}}^{x} \varepsilon''(y)\,dy. Hence |\varepsilon'(x)| \le (x - \xi_{j-1})^{1/2} \|\varepsilon''\|_{L^2(\xi_{j-1},\xi_j)} \le (2h)^{1/2} \|\varepsilon''\|_{L^2(\xi_{j-1},\xi_j)}, since \xi_j - \xi_{j-1} \le 2h. We conclude that

\|\varepsilon'\|_{L^2(\xi_{j-1},\xi_j)}^2 \le (\xi_j - \xi_{j-1})(2h) \|\varepsilon''\|_{L^2(\xi_{j-1},\xi_j)}^2 \le 4h^2 \|\varepsilon''\|_{L^2(\xi_{j-1},\xi_j)}^2.

Hence,

\|\varepsilon'\|^2 = \sum_{j=1}^{N+2} \|\varepsilon'\|_{L^2(\xi_{j-1},\xi_j)}^2 \le 4h^2 \sum_{j=1}^{N+2} \|\varepsilon''\|_{L^2(\xi_{j-1},\xi_j)}^2 = 4h^2 \|\varepsilon''\|^2.

We conclude that (3.79) is valid for k = 1 as well, with C = 2. \square

We proceed now to the analog of Lemma 3.4:

Lemma 3.9. Let f \in H^m, m = 2, 3 or 4. Then, for some C independent of h and f:

\|D^2(f - \tilde f)\| \le C h^{m-2} \|D^m f\|.   (3.81)

Proof: First we prove the orthogonality relation

\int_a^b D^2(f - \tilde f)\, D^2 \tilde f = 0.   (3.82)

We have, using the endpoint conditions \tilde f'(a) = f'(a), \tilde f'(b) = f'(b),

\int_a^b D^2(f - \tilde f)\, D^2 \tilde f = \big[ (f - \tilde f)'\, \tilde f'' \big]_a^b - \int_a^b (f - \tilde f)'\, \tilde f''' = - \int_a^b (f - \tilde f)'\, \tilde f'''.   (Note that \tilde f \in H^3(a,b).)

Now \int_a^b (f - \tilde f)'\, \tilde f''' = \sum_{j=0}^{N} \int_{x_j}^{x_{j+1}} (f - \tilde f)'\, \tilde f''' = \sum_{j=0}^{N} \Big\{ \big[(f - \tilde f)\,\tilde f'''\big]_{x_j}^{x_{j+1}} - \int_{x_j}^{x_{j+1}} (f - \tilde f)\, D^4 \tilde f \Big\} = 0,

since (f - \tilde f)(x_j) = 0, 0 \le j \le N+1, and \tilde f \in P_3 in each [x_j, x_{j+1}]. Thus, (3.82) is proved. Using now (3.82) we have, by Cauchy–Schwarz,

\|D^2(f - \tilde f)\|^2 = \int_a^b D^2(f - \tilde f)\, D^2(f - \tilde f) = \int_a^b D^2(f - \tilde f)\, D^2 f \le \|D^2(f - \tilde f)\|\, \|D^2 f\|.
Hence, (3.81) holds, if m = 2, with C = 1. If now m = 3, we have, integrating by parts once and using the endpoint derivative conditions:

\|D^2(f - \tilde f)\|^2 = \int_a^b D^2(f - \tilde f)\, D^2 f = \big[ D(f - \tilde f)\, D^2 f \big]_a^b - \int_a^b D(f - \tilde f)\, D^3 f = - \int_a^b D(f - \tilde f)\, D^3 f,

from which, by (3.79) with k = 1, \|D^2(f - \tilde f)\|^2 \le C h \|D^2(f - \tilde f)\|\, \|D^3 f\|, which shows that (3.81) holds if m = 3. Finally, if m = 4, we may integrate by parts once more; using the interpolation conditions f(a) = \tilde f(a), f(b) = \tilde f(b), we have:

\|D^2(f - \tilde f)\|^2 = - \int_a^b D(f - \tilde f)\, D^3 f = - \big[(f - \tilde f)\, D^3 f\big]_a^b + \int_a^b (f - \tilde f)\, D^4 f = \int_a^b (f - \tilde f)\, D^4 f \le \|f - \tilde f\|\, \|D^4 f\| \le C h^2 \|D^2(f - \tilde f)\|\, \|D^4 f\|.

(In the last step, we used Lemma 3.8 twice.) We conclude that (3.81) holds as well for m = 4. \square

Proof of Theorem 3.5: It is straightforward to see that (3.78) follows from the results of Lemmata 3.8 and 3.9. \square

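The rates of (3.78) can be observed numerically. The sketch below (Python with SciPy, my own illustration) computes the clamped cubic spline interpolant of f = \exp — which is exactly the interpolant of Lemma 3.7 — and estimates discrete L^2 errors of D^k(f - \tilde f), k = 0, 1, 2; the rates come out close to 4 - k, as predicted with m = 4.

```python
import numpy as np
from scipy.interpolate import CubicSpline

f = np.exp                         # all derivatives of exp equal exp itself
errs = {0: [], 1: [], 2: []}
for N in (7, 15, 31, 63):
    x = np.linspace(0.0, 1.0, N + 2)
    # clamped end conditions: s'(0) = f'(0) = f(0), s'(1) = f'(1) = f(1)
    s = CubicSpline(x, f(x), bc_type=((1, f(0.0)), (1, f(1.0))))
    t = np.linspace(0.0, 1.0, 8001); dt = t[1] - t[0]
    for k in range(3):
        e = f(t) - s(t, k)                            # D^k error on a fine grid
        errs[k].append(np.sqrt(np.sum(e * e) * dt))   # crude L2 norm
rates = {k: np.log2(errs[k][-2] / errs[k][-1]) for k in range(3)}
```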
We conclude that, given f \in H^m (m = 2, 3, or 4), there exists an element \chi \in S (take \chi = \tilde f), such that

\sum_{k=0}^{2} h^k \|f - \chi\|_k \le C h^m \|D^m f\|.   (3.83)

As before, it follows for the Galerkin approximation u_h to u, the solution of the b.v.p. (3.1), (3.24), that

\|u - u_h\|_k \le C h^{m-k} \|D^m u\|,   (3.84)

for k = 0, 1, provided u \in H^m, m = 2, 3 or 4. In the case of homogeneous Dirichlet b.c.'s, i.e. of the problem (3.1), (3.2), we take as subspace of \mathring H^1(I) the vector space \mathring S = \{\varphi \in S : \varphi(a) = \varphi(b) = 0\}. It is easily seen that \mathring S is an (N+2)-dimensional subspace of \mathring H^1 \cap H^3. A basis of this subspace consists of the previously defined B-splines \varphi_j, 2 \le j \le N-1, plus four functions \tilde\varphi_0, \tilde\varphi_1, \tilde\varphi_N, \tilde\varphi_{N+1} \in \mathring S, taken as (independent) linear combinations of the \varphi_{-1}, \varphi_0, \varphi_1 and \varphi_N, \varphi_{N+1}, \varphi_{N+2} B-splines, which are such that \tilde\varphi_0(a) = 0, \tilde\varphi_1(a) = 0, \tilde\varphi_N(b) = 0, \tilde\varphi_{N+1}(b) = 0. For example, we take

\tilde\varphi_0 := \varphi_0 - 4\varphi_{-1}, \quad \tilde\varphi_1 := \varphi_1 - \varphi_{-1}, \quad \text{etc.}

Given f \in \mathring H^1 \cap H^2, we construct again an interpolant \tilde f \in \mathring S satisfying \tilde f(x_i) = f(x_i), 0 \le i \le N+1, and \tilde f'(x_0) = f'(x_0), \tilde f'(x_{N+1}) = f'(x_{N+1}), for example. The error estimates are entirely analogous. (The linear system that defines the Galerkin equations now has a seven-diagonal, banded matrix B(\varphi_i, \varphi_j).)
As before, we may prove the error estimates (3.83), (3.84), etc. for cubic splines defined on a general partition \{x_j\} of [a,b]. Also, one may define higher-order smooth spline spaces, as follows: For m \ge 2 we let

S^{(m)} = \{\varphi : \varphi \in C^m[a,b], \ \varphi|_{[x_i,x_{i+1}]} \in P_{2m-1}\},

in which we may prove L^2 error estimates of O(h^{2m}) for the Galerkin approximations.
Chapter 4

Results from the Theory of Sobolev


Spaces and the Variational
Formulation of Elliptic
BoundaryValue Problems in RN

In this chapter we let \Omega be a bounded domain in \mathbb{R}^N (i.e. an open, connected, bounded point set in \mathbb{R}^N). Many of the results that we shall quote hold for arbitrary open sets in \mathbb{R}^N, but we do not strive for generality here, having in mind applications in the case of boundary-value problems on bounded domains. Let x = (x_1, \ldots, x_N) denote a generic point of \Omega or \mathbb{R}^N. All functions are real-valued.

4.1 The Sobolev space H^1(\Omega)

Definition. The Sobolev space H^1 = H^1(\Omega) is defined by

H^1(\Omega) = \Big\{ u \in L^2(\Omega) : \exists\, g_1, g_2, \ldots, g_N \in L^2(\Omega), \text{ such that } \int_\Omega u \frac{\partial\varphi}{\partial x_i} = -\int_\Omega g_i \varphi \quad \forall \varphi \in C_c^\infty(\Omega), \ \forall i : 1 \le i \le N \Big\}.

For u \in H^1 we denote g_i = \frac{\partial u}{\partial x_i} and call g_i the weak (generalized) partial derivative of u with respect to x_i.
(Note e.g. that this definition does not need that \Omega be bounded.)
Remarks.

(i) Using Lemma 2.4 we conclude that each generalized partial derivative g_i in the above definition is unique, as in the case of one dimension.

(ii) Again, C_c^1(\Omega) may be used in place of C_c^\infty(\Omega) for the test functions \varphi.

(iii) It is clear that if u \in C^1(\Omega) \cap L^2(\Omega) and if the classical partial derivatives \frac{\partial u}{\partial x_i} belong to L^2(\Omega) for i = 1, 2, \ldots, N, then the weak derivatives exist and coincide with the classical ones; in particular u \in H^1(\Omega). Of course C^1(\overline\Omega) \subset H^1(\Omega). It can be shown with some care that, conversely, if u \in H^1(\Omega) \cap C(\Omega) and if the generalized derivatives \frac{\partial u}{\partial x_i} belong to C(\Omega) for 1 \le i \le N, then u \in C^1(\Omega).

(iv) Since u \in L^2(\Omega) \Rightarrow u \in L^1_{loc}(\Omega), we can define the distributional derivatives \frac{\partial u}{\partial x_i} of u in the sense of the theory of distributions. We can say that H^1 is the set of elements of L^2(\Omega) whose distributional derivatives \frac{\partial u}{\partial x_i}, 1 \le i \le N, are represented by functions in L^2.
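The defining identity of the weak derivative can be checked symbolically in a simple case. The Python/SymPy sketch below (my own illustration) takes \Omega = (0,1)^2, a smooth u, and a polynomial test function \varphi vanishing on \partial\Omega (not compactly supported, but this suffices for the integration-by-parts identity of remark (iii)):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
u   = sp.sin(x1) * x2**2                 # a smooth function on (0,1)^2
phi = x1 * (1 - x1) * x2 * (1 - x2)      # vanishes on the boundary of (0,1)^2

# integral of u * d(phi)/dx1 over the unit square ...
lhs = sp.integrate(u * sp.diff(phi, x1), (x1, 0, 1), (x2, 0, 1))
# ... equals minus the integral of (du/dx1) * phi, with no boundary term
rhs = -sp.integrate(sp.diff(u, x1) * phi, (x1, 0, 1), (x2, 0, 1))
difference = sp.simplify(lhs - rhs)
```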

It is clear that H^1 is a subspace of L^2. Denoting by (\cdot,\cdot), \|\cdot\| the inner product, respectively the norm, on L^2(\Omega), we introduce the quantities:

(u, v)_1 = (u, v) + \sum_{i=1}^{N} \Big( \frac{\partial u}{\partial x_i}, \frac{\partial v}{\partial x_i} \Big), \quad u, v \in H^1,

\|u\|_1 = \Big( \|u\|^2 + \sum_{i=1}^{N} \Big\| \frac{\partial u}{\partial x_i} \Big\|^2 \Big)^{1/2}, \quad u \in H^1,

which clearly define an inner product, resp. (the induced) norm, on H^1. Hence H^1 is a normed linear space.

Theorem 4.1. The space (H^1(\Omega), \|\cdot\|_1) is a Hilbert space.

Proof. Adapt the 1-dim. proof. \square

Remark. (H^1(\Omega), \|\cdot\|_1) is separable.
In the case of one dimension we had shown (Theorem 2.4) that the restrictions on \overline\Omega = \overline I of functions in C_c^\infty(\mathbb{R}) form a dense set in H^1. In more than one dimension this is not true for an arbitrary \Omega. We list below several density results:

1. (Friedrichs) Let u \in H^1(\Omega). Then, there exists a sequence \{u_n\} \subset C_c^\infty(\mathbb{R}^N) such that:

(\alpha) u_n|_\Omega \to u in L^2(\Omega),
(\beta) \forall i : \frac{\partial u_n}{\partial x_i}\Big|_\omega \to \frac{\partial u}{\partial x_i}\Big|_\omega in L^2(\omega) for every precompact \omega \subset\subset \Omega.

2. (Meyers–Serrin) If u \in H^1(\Omega), then there exists a sequence \{u_n\} \subset C^\infty(\Omega) \cap H^1(\Omega) such that u_n \to u in H^1(\Omega).

3. If \Omega is an arbitrary open (or even open and bounded) set and if u \in H^1(\Omega), in general there does not exist a sequence u_n \in C_c^1(\mathbb{R}^N) such that u_n|_\Omega \to u in H^1(\Omega). (The problem is at the boundary; however, compare with Theorem 4.2 below.)

With the aid e.g. of Friedrichs' result (1) above, we can show the following analog of Proposition 2.1:

Proposition 4.1. If u, v \in H^1(\Omega) \cap L^\infty(\Omega), then uv \in H^1 \cap L^\infty and

\frac{\partial}{\partial x_i}(uv) = \frac{\partial u}{\partial x_i} v + u \frac{\partial v}{\partial x_i}, \quad 1 \le i \le N.

As in the case of one dimension, many properties of H^1 are established more easily if \Omega = \mathbb{R}^N. Then an extension result is needed to establish the same property for \Omega \subset \mathbb{R}^N. Such extensions (of functions of H^1(\Omega) to functions of H^1(\mathbb{R}^N)) are not always possible to construct, unless the set \Omega has a regular boundary in a certain sense.
Definition. Let \Omega be a bounded domain in \mathbb{R}^N. We say that \partial\Omega is of class C^1 if there exist finitely many open balls B_i \subset \mathbb{R}^N, i = 1, 2, \ldots, M, such that

(i) \partial\Omega \subseteq \bigcup_{i=1}^{M} B_i, \quad B_i \cap \partial\Omega \ne \emptyset.

(ii) There is, for each i, 1 \le i \le M, a mapping y = f^{(i)}(x) in C^1(B_i) which maps the ball B_i in a one-to-one and onto way onto a domain in \mathbb{R}^N, so that B_i \cap \partial\Omega gets mapped onto a subset of the hyperplane y_N = 0 and B_i \cap \Omega onto a simply connected domain in the half-space \{y : y_N > 0\}. Moreover, the Jacobian determinant \det\Big( \frac{\partial f_k^{(i)}}{\partial x_l} \Big) does not vanish for x \in B_i.
The C^1 property of \partial\Omega permits, for example, the advertized extension result:

Proposition 4.2. Suppose that \partial\Omega is of class C^1 (or that \Omega = \mathbb{R}^N_+ = \{x \in \mathbb{R}^N : x_N > 0\}). Then there exists a linear extension operator P : H^1(\Omega) \to H^1(\mathbb{R}^N) such that

(i) P u|_\Omega = u \quad \forall u \in H^1(\Omega),
(ii) \exists C such that \|P u\|_{L^2(\mathbb{R}^N)} \le C \|u\|_{L^2(\Omega)} \quad \forall u \in H^1(\Omega),
(iii) \exists C such that \|P u\|_{H^1(\mathbb{R}^N)} \le C \|u\|_{H^1(\Omega)} \quad \forall u \in H^1(\Omega).

Using e.g. this extension one may show the following important density result:

Theorem 4.2. If \partial\Omega is of class C^1 and u \in H^1(\Omega), then there exists a sequence u_n \in C_c^\infty(\mathbb{R}^N) such that u_n|_\Omega \to u in H^1. That is to say, the restrictions to \Omega of functions in C_c^\infty(\mathbb{R}^N) are dense in H^1(\Omega).

For the purposes of studying boundary-value problems, it is important to study the behavior of functions in H^1(\Omega) at the boundary \partial\Omega of \Omega. We suppose to this effect that \partial\Omega is of class C^1. Then we can measure (in the sense of the above definition of the C^1 domain) the content of pieces on the hypersurface \partial\Omega (through measuring plane surface pieces on the hyperplane y_N = 0, i.e. measuring the area of the images of the pieces of \partial\Omega that are mapped on y_N = 0 by the mappings y = f^{(i)}(x)). One may define open sets on \partial\Omega as intersections of \partial\Omega and open sets in \mathbb{R}^N. These open sets on \partial\Omega one can then complete into a \sigma-algebra, and then extend the elementary content measure into the Lebesgue measure on \partial\Omega. With respect to this measure we may define the surface integral \int_{\partial\Omega} g(y)\,d\sigma_y. Consequently, one may consider the Hilbert space L^2(\partial\Omega) of functions defined on \partial\Omega with norm

\|g\|_{L^2(\partial\Omega)} = \Big( \int_{\partial\Omega} g^2(y)\,d\sigma_y \Big)^{1/2} \quad \text{for } g \in L^2(\partial\Omega).

One may show first that the following result holds for smooth functions:

Lemma 4.1. Let \partial\Omega be of class C^1. Then there exists a constant C such that for all functions f \in C^\infty(\overline\Omega) we have

\|f\|_{L^2(\partial\Omega)} \le C \|f\|_{H^1(\Omega)}.
This lemma, along with Theorem 4.2, permits us to define boundary values for functions in H^1(\Omega), for C^1 domains \Omega. Let f \in H^1(\Omega). Then \exists f_n \in C^\infty(\overline\Omega) such that f_n \to f in H^1(\Omega) (Theorem 4.2). By Lemma 4.1, \{f_n\} is Cauchy in L^2(\partial\Omega). So f_n \to g in L^2(\partial\Omega). (It is easily seen that this g is independent of the chosen sequence f_n.) This limiting function g we denote by f|_{\partial\Omega} and call it the boundary value on \partial\Omega of f, or trace of f on \partial\Omega (sometimes we say that g = f|_{\partial\Omega} is the boundary value of f in the sense of trace). We summarize in the following theorem.

Theorem 4.3 (Trace theorem). Let \partial\Omega be of class C^1. Then, every f \in H^1(\Omega) possesses boundary values in the above sense (also denoted by f), which belong to the Hilbert space L^2(\partial\Omega). Moreover, there exists a constant C such that

\|f\|_{L^2(\partial\Omega)} \le C \|f\|_{H^1(\Omega)} \quad \forall f \in H^1(\Omega).

Remarks.

(i) It can be shown that, under our hypotheses on \Omega, the boundary value f|_{\partial\Omega} of a function f \in H^1(\Omega) actually belongs to the so-called fractional-order space H^{1/2}(\partial\Omega), an intermediate space defined by interpolation between the spaces H^0(\partial\Omega) \equiv L^2(\partial\Omega) and H^1(\partial\Omega).

(ii) Let \Omega be a C^1 domain. Then we may show that Green's formula (Gauss's theorem) holds: \forall i : 1 \le i \le N,

\int_\Omega \frac{\partial u}{\partial x_i} v = - \int_\Omega u \frac{\partial v}{\partial x_i} + \int_{\partial\Omega} u\, v\, \nu_i \, d\sigma_y

for u, v \in H^1(\Omega). Here d\sigma_y is the surface Lebesgue measure constructed on \partial\Omega as above, and \nu_i = n \cdot e_i is the i-th component of the unit outward normal n(y) defined on the boundary of the C^1 domain \Omega. (Note that since u, v \in H^1(\Omega), we have \frac{\partial u}{\partial x_i}, \frac{\partial v}{\partial x_i} \in L^2(\Omega) and u, v \in L^2(\partial\Omega), and all terms in the above equality make sense.)

4.2 The Sobolev space \mathring H^1(\Omega)

Definition. We define the space \mathring H^1(\Omega) as the completion of C_c^\infty(\Omega) with respect to the H^1(\Omega) norm.

Hence (\mathring H^1(\Omega), \|\cdot\|_1) is a Hilbert space (a closed subspace of H^1). (We may show that the completion of C_c^\infty(\mathbb{R}^N) under the H^1(\mathbb{R}^N) norm is H^1(\mathbb{R}^N) itself, i.e. that \mathring H^1(\mathbb{R}^N) = H^1(\mathbb{R}^N). But for \Omega \subsetneq \mathbb{R}^N we have in general that \mathring H^1(\Omega) \subsetneq H^1(\Omega).) More precisely, we may show that for sufficiently smooth \partial\Omega (e.g. C^1), \mathring H^1(\Omega) consists precisely of those functions in H^1(\Omega) which vanish (in the sense of trace) on \partial\Omega.

Theorem 4.4. Let \partial\Omega be of class C^1. Then

\mathring H^1(\Omega) = \{v \in H^1(\Omega) : v|_{\partial\Omega} = 0\},

(where by v|_{\partial\Omega} = 0 we mean that the trace of v on \partial\Omega (a function in L^2(\partial\Omega)) is equal to the zero function in L^2(\partial\Omega)).

On \mathring H^1(\Omega) we also have the analog of Proposition 2.3:

Proposition 4.3 (Inequality of Poincaré–Friedrichs). Let \Omega be a bounded domain. Then, there exists a constant C_* = C_*(\Omega) such that

\|u\| \le C_* \Big( \sum_{i=1}^{N} \Big\| \frac{\partial u}{\partial x_i} \Big\|^2 \Big)^{1/2} \quad \forall u \in \mathring H^1(\Omega).

In particular, the expression \Big( \sum_{i=1}^{N} \big\| \frac{\partial u}{\partial x_i} \big\|^2 \Big)^{1/2} is a norm on \mathring H^1(\Omega), equivalent to the norm \|\cdot\|_1 on \mathring H^1(\Omega). The quantity \sum_{i=1}^{N} \int_\Omega \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_i}\,dx is an inner product on \mathring H^1(\Omega).

Remark. As in the 1-dim case, if \partial\Omega is of class C^1, then u \in \mathring H^1(\Omega) if and only if the extension

\tilde u(x) = \begin{cases} u(x) & \text{if } x \in \Omega,\\ 0 & \text{if } x \in \mathbb{R}^N \setminus \Omega, \end{cases}

belongs to H^1(\mathbb{R}^N). (In such a case also \widetilde{\frac{\partial u}{\partial x_i}} = \frac{\partial \tilde u}{\partial x_i}.)
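On a concrete domain the best constant C_* can be estimated numerically: it equals 1/\sqrt{\lambda_1}, where \lambda_1 is the first Dirichlet eigenvalue of -\Delta on \Omega (for the unit square, \lambda_1 = 2\pi^2). A finite-difference sketch in Python (my own illustration, not part of the notes):

```python
import numpy as np

n = 30                              # interior grid points per direction
h = 1.0 / (n + 1)
# 1-D second-difference matrix with Dirichlet b.c., scaled by 1/h^2
L = (2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
# 5-point Laplacian on the unit square via a Kronecker sum
A = np.kron(np.eye(n), L) + np.kron(L, np.eye(n))
lam1 = np.linalg.eigvalsh(A)[0]     # first Dirichlet eigenvalue, ~ 2 pi^2
C_star = 1.0 / np.sqrt(lam1)        # best Poincare-Friedrichs constant
```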

4.3 The Sobolev spaces H^m(\Omega), m = 2, 3, 4, \ldots

For m \ge 2 an integer, we can define the spaces H^m(\Omega) recursively by

H^m = H^m(\Omega) = \Big\{u \in H^{m-1}(\Omega) : \frac{\partial u}{\partial x_i} \in H^{m-1}(\Omega), \ i = 1, 2, \ldots, N\Big\}.

We introduce some notation. A multi-index \alpha = (\alpha_1, \ldots, \alpha_N) is an N-vector of nonnegative integers \alpha_i \ge 0, 1 \le i \le N. If \alpha is a multi-index we let |\alpha| = \sum_{i=1}^{N} \alpha_i. Then, the partial derivatives of a function of N variables may be denoted by

D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_N^{\alpha_N}}.

It follows that H^m is the set:

H^m = H^m(\Omega) = \Big\{u \in L^2(\Omega) : \forall \alpha \text{ with } |\alpha| \le m, \ \exists g_\alpha \in L^2(\Omega) \text{ such that } \int_\Omega u\, D^\alpha \varphi = (-1)^{|\alpha|} \int_\Omega g_\alpha \varphi \quad \forall \varphi \in C_c^\infty(\Omega)\Big\}.

We call g_\alpha the generalized partial derivative of order \alpha of u and denote g_\alpha = D^\alpha u. The space H^m with the norm

\|u\|_m = \Big( \sum_{0 \le |\alpha| \le m} \|D^\alpha u\|^2 \Big)^{1/2},

induced by the inner product

(u, v)_m = \sum_{0 \le |\alpha| \le m} (D^\alpha u, D^\alpha v),

is a Hilbert space. One may show that if \partial\Omega is sufficiently regular (C^1 will certainly suffice), then the norm \|\cdot\|_m on H^m(\Omega) is equivalent to the norm

\Big( \|u\|^2 + \sum_{|\alpha| = m} \|D^\alpha u\|^2 \Big)^{1/2}.

In effect, one may show that \forall \alpha, 0 < |\alpha| \le m, and \forall \varepsilon > 0, there exists a constant C = C(\varepsilon, \alpha, \Omega) such that the interpolation inequality

\|D^\alpha u\| \le \varepsilon \sum_{|\beta| = m} \|D^\beta u\| + C \|u\|, \quad \forall u \in H^m(\Omega),

holds.
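The multi-index bookkeeping is mechanical. The Python/SymPy sketch below (my own example) computes \|u\|_m^2 for m = 2 and u = x_1^2 x_2 on \Omega = (0,1)^2 by summing \|D^\alpha u\|^2 over all multi-indices with |\alpha| \le 2:

```python
import itertools
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
u, m = x1**2 * x2, 2

norm_sq = sp.Integer(0)
for a1, a2 in itertools.product(range(m + 1), repeat=2):
    if a1 + a2 > m:
        continue                                   # keep only |alpha| <= m
    Dau = sp.diff(u, x1, a1, x2, a2)               # D^alpha u
    norm_sq += sp.integrate(Dau**2, (x1, 0, 1), (x2, 0, 1))
```

For this u the six contributing terms sum to 152/45.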
Now, since u \in H^m \Rightarrow D^\alpha u \in H^1(\Omega) for each \alpha : 0 \le |\alpha| \le m-1, we can define, by the trace theorem, boundary values on \partial\Omega (for \partial\Omega, say, of class C^1) for all derivatives D^\alpha u, 0 \le |\alpha| \le m-1, of u. In this sense we can define e.g. the normal derivative on \partial\Omega of a function u \in H^2(\Omega) as the linear combination

\frac{\partial u}{\partial n} = \sum_{i=1}^{N} \frac{\partial u}{\partial x_i}\Big|_{\partial\Omega} n_i,

where n(y) is the unit outer normal on \partial\Omega. For u \in H^2(\Omega), \frac{\partial u}{\partial n} \in L^2(\partial\Omega). One has another formula of Green's too:

\int_\Omega \Delta u\, v = - \int_\Omega \sum_{i=1}^{N} \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_i} + \int_{\partial\Omega} \frac{\partial u}{\partial n}\, v \, d\sigma_y, \quad u, v \in H^2(\Omega).

One may define again \mathring H^m(\Omega) as the completion of C_c^\infty(\Omega) in the \|\cdot\|_m norm. For \partial\Omega sufficiently smooth (e.g. for a domain of class C^m — replace C^1 by C^m in the definition of the C^1 domain) one may show that \mathring H^m(\Omega) is equal to the subspace of H^m(\Omega) for which

u \in \mathring H^m(\Omega) \iff u \in H^m(\Omega) \text{ and } D^\alpha u|_{\partial\Omega} = 0 \ (\text{in the sense of trace}), \ 0 \le |\alpha| \le m-1.

Again we note the difference between the spaces

H^2 \cap \mathring H^1 = \{u \in H^2 : u|_{\partial\Omega} = 0\}

and

\mathring H^2 = \Big\{u \in H^2 : u|_{\partial\Omega} = \frac{\partial u}{\partial x_i}\Big|_{\partial\Omega} = 0, \ 1 \le i \le N\Big\}.

4.4 Sobolev's inequalities

In one dimension we proved that H^1(I) \subset C(\overline I) for a bounded interval I, and that \|u\|_{L^\infty(I)} \le C \|u\|_{H^1(I)}. In more than one dimension this is no longer true. There is a wealth of imbedding theorems, of which we quote two results:

Theorem 4.5 (Sobolev). Let \partial\Omega be of class C^1. Then

(a) If N = 2, then H^1 \subset L^p \ \forall p, 1 \le p < \infty.
    If N > 2, then H^1 \subset L^p, where 1 \le p \le \frac{2N}{N-2}.

(b) If m > \frac{N}{2}, we have H^m(\Omega) \subset C^k(\overline\Omega), where 0 \le k < m - \frac{N}{2}, and

\sup_{x \in \overline\Omega, \ 0 \le |\alpha| \le k} |D^\alpha u(x)| \le C \|u\|_m \quad \forall u \in H^m(\Omega).

This theorem tells us that if N > 1 the functions in H^1(\Omega) are no longer continuous (in the sense of a.e. equality, as usual). For example, if N = 2 we need to go to H^2(\Omega) to obtain continuous functions in \overline\Omega. As a counterexample in this direction we may verify that the function

u = \Big( \log \frac{1}{|x|} \Big)^\lambda \quad \text{with } 0 < \lambda < \frac{1}{2}, \qquad \Omega = \{x \in \mathbb{R}^2 : |x| < \tfrac{1}{2}\},

belongs to H^1(\Omega) but is not bounded, because of the singularity at x = 0.

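For this u the claim can be checked by a radial computation: with r = |x| one finds |\nabla u| = \lambda(\log 1/r)^{\lambda-1}/r, so after the substitution t = \log(1/r) the Dirichlet integral becomes \int_\Omega |\nabla u|^2\,dx = 2\pi\lambda^2 \int_{\log 2}^{\infty} t^{2\lambda-2}\,dt, which is finite precisely when \lambda < 1/2, even though u(x) \to \infty as x \to 0. A numerical sketch in Python (my own illustration):

```python
import numpy as np
from scipy.integrate import quad

lam = 0.4                                # any 0 < lam < 1/2 works
# Dirichlet integral after t = log(1/r): 2 pi lam^2 * int_{log 2}^inf t^(2 lam - 2) dt
val, _ = quad(lambda t: t**(2 * lam - 2), np.log(2.0), np.inf)
dirichlet = 2 * np.pi * lam**2 * val
# closed form of the same integral, for comparison
closed = 2 * np.pi * lam**2 * np.log(2.0)**(2 * lam - 1) / (1 - 2 * lam)
# the function itself blows up as r -> 0
u = lambda r: np.log(1.0 / r)**lam
```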
4.5 Variational formulation of some elliptic boundary-value problems

4.5.1 (a) Homogeneous Dirichlet boundary conditions

We consider the following problem. Let \Omega \subset \mathbb{R}^N be a C^1 domain. We seek a function u : \overline\Omega \to \mathbb{R} satisfying

-\Delta u + u = f \quad \text{in } \Omega, \qquad \Delta = \sum_{i=1}^{N} \frac{\partial^2}{\partial x_i^2},   (4.1)

u = 0 \quad \text{on } \partial\Omega,   (4.2)

where f is a given function on \Omega. The boundary condition u = 0 on \partial\Omega is called a homogeneous (zero) Dirichlet b.c.

Definition. Let f \in C(\overline\Omega). Then, a classical solution of (4.1), (4.2) is a function u \in C^2(\overline\Omega) satisfying the P.D.E. (4.1) and the b.c. (4.2) in the usual (pointwise) sense. Let now f \in L^2(\Omega). Then, a weak solution of (4.1), (4.2) is a function u \in \mathring H^1 which satisfies the weak form of (4.1), (4.2), i.e. the relation

\int_\Omega \Big( \sum_{i=1}^{N} \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_i} + u\,v \Big) dx = \int_\Omega f v \quad \forall v \in \mathring H^1.   (4.3)

(i) A classical solution of (4.1), (4.2) is a weak solution.

Let f \in C(\overline\Omega) and let u \in C^2(\overline\Omega) be a classical solution of (4.1), (4.2). Then, since u \in H^1(\Omega) and u = 0 on \partial\Omega, it follows that u \in \mathring H^1. Multiplying -\Delta u + u = f by a function \varphi \in C_c^\infty(\Omega), we have, using Green's theorem (cf. p. 88), that

\int_\Omega \Big( \sum_{i=1}^{N} \frac{\partial u}{\partial x_i} \frac{\partial \varphi}{\partial x_i} + u\,\varphi \Big) dx = \int_\Omega f \varphi

holds. Since now C_c^\infty(\Omega) is dense in (\mathring H^1, \|\cdot\|_1), (4.3) follows from the above by approximating each v \in \mathring H^1(\Omega) in \mathring H^1 by a sequence \varphi_i \in C_c^\infty(\Omega). Hence u is a weak solution.

(ii) Existence and uniqueness of the weak solution.

Let f \in L^2(\Omega) and consider the bilinear form a(v, w) defined on \mathring H^1 \times \mathring H^1 by

a(v, w) = \int_\Omega \Big( \sum_{i=1}^{N} \frac{\partial v}{\partial x_i} \frac{\partial w}{\partial x_i} + v\,w \Big) dx.   (4.4)
By our hypotheses, a(\cdot,\cdot) is a bilinear, symmetric form on \mathring H^1 \times \mathring H^1. Moreover, for v, w \in \mathring H^1 we have

a(v, w) \le \sum_{i=1}^{N} \int_\Omega \Big| \frac{\partial v}{\partial x_i} \frac{\partial w}{\partial x_i} \Big| + \int_\Omega |v\,w| \le \sum_{i=1}^{N} \Big\| \frac{\partial v}{\partial x_i} \Big\| \Big\| \frac{\partial w}{\partial x_i} \Big\| + \|v\|\,\|w\| \le \Big( \sum_{i=1}^{N} \Big\| \frac{\partial v}{\partial x_i} \Big\|^2 \Big)^{1/2} \Big( \sum_{i=1}^{N} \Big\| \frac{\partial w}{\partial x_i} \Big\|^2 \Big)^{1/2} + \|v\|_1 \|w\|_1 \le 2 \|v\|_1 \|w\|_1,   (4.5)

i.e. a(v, w) is continuous on \mathring H^1 \times \mathring H^1. Also

a(v, v) = \sum_{i=1}^{N} \int_\Omega \Big( \frac{\partial v}{\partial x_i} \Big)^2 + \int_\Omega v^2 \ge c \|v\|_1^2, \quad c > 0,   (4.6)

using the Poincaré–Friedrichs inequality. Hence a(\cdot,\cdot) satisfies the hypotheses of the Lax–Milgram theorem on the Hilbert space \mathring H^1. Since v \mapsto \int_\Omega f v is a bounded linear functional on \mathring H^1, it follows that (4.3) has a unique solution u \in \mathring H^1 that satisfies

\|u\|_1 \le C \|f\|.

(Note that, by the symmetry of a(\cdot,\cdot), the weak solution can also be characterized as the (unique) element of \mathring H^1 that solves the minimization problem

J(u) = \inf_{v \in \mathring H^1(\Omega)} J(v),

where

J(v) = \frac{1}{2} \int_\Omega \Big( \sum_{i=1}^{N} \Big( \frac{\partial v}{\partial x_i} \Big)^2 + v^2 \Big) - \int_\Omega f v.

This is Dirichlet's principle.)

(Note also that simply a(v, w) = (v, w)_1 on \mathring H^1 \times \mathring H^1. Hence an appeal to the Riesz theorem would solve the problem. Of course, a(v, w) \le \|v\|_1 \|w\|_1 and a(v, v) = \|v\|_1^2 too. But the proof of (4.5), (4.6) above indicates the general way of proving (4.5) and (4.6) in the case e.g. of a positive definite form with variable coefficients.)

(iii) Regularity of the weak solution.
If f \in L^2(\Omega) and \Omega is a domain of class C^2, and if u \in \mathring H^1 is a weak solution of (4.1), (4.2), i.e. a solution of (4.3), then one may show that in fact u \in H^2(\Omega) and that \|u\|_2 \le C \|f\| (elliptic regularity estimate). Here C is a constant depending only on \Omega. More generally, if \partial\Omega is of class C^{m+2} and if f \in H^m(\Omega), then u \in H^{m+2}(\Omega) and \|u\|_{m+2} \le C_m \|f\|_m. (In particular, using Sobolev's Theorem 4.5 we may conclude that if m > \frac{N}{2}, then u \in C^2(\overline\Omega).)

(iv) If the weak solution is in C^2(\overline\Omega), then it is classical.

Let u, the weak solution of (4.1), (4.2), be in C^2(\overline\Omega) and let f \in C(\overline\Omega). Since u \in \mathring H^1 \cap C^2(\overline\Omega), we conclude that u|_{\partial\Omega} = 0 (in the classical sense). Applying Green's formula we now have from (4.3)

\int_\Omega (-\Delta u + u)\, v \,dx = \int_\Omega f v \quad \forall v \in C_c^\infty(\Omega),

from which -\Delta u + u - f = 0 a.e. in \Omega, since C_c^\infty(\Omega) is dense in L^2(\Omega). We conclude, by our hypotheses, that -\Delta u + u = f \ \forall x \in \overline\Omega. Hence u is a classical solution.
Remarks.

(i) The above discussion extends with no extra difficulties to the case of a linear, selfadjoint elliptic operator with variable coefficients. Consider the problem of finding u : \overline\Omega \to \mathbb{R} such that

- \sum_{i,j=1}^{N} \frac{\partial}{\partial x_i} \Big( a_{ij} \frac{\partial u}{\partial x_j} \Big) + a_0 u = f \quad \text{in } \Omega,   (4.7)

u = 0 \quad \text{on } \partial\Omega,   (4.8)

where we suppose that a_{ij}(x) = a_{ji}(x) are functions in C^1(\overline\Omega) such that the matrix (a_{ij}) is symmetric and uniformly positive definite, i.e. that the ellipticity condition

\sum_{i,j=1}^{N} a_{ij}(x) \xi_i \xi_j \ge \alpha \sum_{i=1}^{N} \xi_i^2

holds for some \alpha > 0, \forall x \in \overline\Omega, \forall \xi \in \mathbb{R}^N. We also suppose that a_0 \in C(\overline\Omega) and that a_0(x) \ge 0 on \overline\Omega. We define a classical solution of (4.7), (4.8) to be a function u \in C^2(\overline\Omega) satisfying (4.7), (4.8) in the usual sense, while a weak solution is an element u of \mathring H^1 satisfying

A(u, v) = \int_\Omega \Big( \sum_{i,j} a_{ij} \frac{\partial u}{\partial x_j} \frac{\partial v}{\partial x_i} + a_0 u v \Big) = (f, v) \quad \forall v \in \mathring H^1.   (4.9)

For f \in L^2(\Omega) we may show that (4.9) has a unique solution u \in \mathring H^1: it is easy to see that A(u, v) \le C_1 \|u\|_1 \|v\|_1 \ \forall u, v \in \mathring H^1 and A(v, v) \ge C_2 \|v\|_1^2 \ \forall v \in \mathring H^1, for C_1, C_2 > 0. The regularity in H^2 also holds under our hypotheses on a_{ij}, a_0. In general, for m \ge 1, if f \in H^m(\Omega), a_{ij} \in C^{m+1}(\overline\Omega), a_0 \in C^m(\overline\Omega), then u \in H^{m+2}(\Omega) (elliptic regularity).

(ii) The fact that the weak solution of (4.1), (4.2) is in C^2(\overline\Omega) if f \in H^m with m > \frac{N}{2} follows from Sobolev's theorem. There is a sharper theory, based on Schauder's estimates, which states that if \partial\Omega is of class C^{2,a} (Hölder classes) with 0 < a < 1 and f \in C^{0,a}(\overline\Omega), then \exists u \in C^{2,a}(\overline\Omega), unique solution of (4.1), (4.2) in the classical sense. Moreover, if \partial\Omega is of class C^{m+2,a} (m \ge 1 integer) and f \in C^{m,a}(\overline\Omega), then u \in C^{m+2,a}(\overline\Omega), and an analogous elliptic regularity result holds. Here

C^{0,a}(\overline\Omega) = \Big\{u \in C(\overline\Omega) : \sup_{x,y \in \overline\Omega,\ x \ne y} \frac{|u(x) - u(y)|}{|x - y|^a} < \infty\Big\},

C^{m,a}(\overline\Omega) = \{u \in C^m(\overline\Omega) : D^\alpha u \in C^{0,a}(\overline\Omega) \ \forall \alpha : |\alpha| = m\}.

4.5.2 (b) Homogeneous Neumann boundary conditions

We now consider the problem of finding u : \overline\Omega \to \mathbb{R} such that

-\Delta u + u = f \quad \text{in } \Omega,   (4.10)

\frac{\partial u}{\partial n} = 0 \quad \text{on } \partial\Omega,   (4.11)

with f given on \Omega. As usual, \frac{\partial u}{\partial n} = \nabla u \cdot n is the normal derivative at the boundary \partial\Omega (again \partial\Omega is of class C^1). A classical solution of (4.10), (4.11) (for f \in C(\overline\Omega)) is a C^2(\overline\Omega) function u satisfying (4.10), (4.11) in the classical sense. A weak solution is an element u \in H^1(\Omega) satisfying (for f \in L^2(\Omega), say)

a(u, v) \equiv \int_\Omega \Big( \sum_{i=1}^{N} \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_i} + u\,v \Big) dx = \int_\Omega f v \quad \forall v \in H^1(\Omega).   (4.12)

(i) Every classical solution is a weak solution.

Let u \in C^2(\overline\Omega) be a classical solution of (4.10), (4.11). Then Green's formula gives

\int_\Omega \Delta u\, v = - \int_\Omega \sum_{i=1}^{N} \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_i} + \int_{\partial\Omega} \frac{\partial u}{\partial n}\, v \, d\sigma_y \quad \forall v \in C^\infty(\overline\Omega).

Hence (4.10), (4.11) yield that

\int_\Omega \Big( \sum_{i=1}^{N} \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_i} + u v \Big) = \int_\Omega f v \quad \forall v \in C^\infty(\overline\Omega),

and density of C^\infty(\overline\Omega) in H^1(\Omega) then yields that (4.12) is satisfied.

(ii) An immediate application of the Lax–Milgram theorem yields that there exists a unique weak solution.

(iii) More refined theory yields again the regularity of the weak solution. Exactly the same results as in the Dirichlet b.c. case hold.

(iv) If the weak solution is in C^2(\overline\Omega), then it is classical.

For in this case we have, by (4.12), that (f \in C(\overline\Omega) here)

\int_\Omega (-\Delta u + u)\, v + \int_{\partial\Omega} \frac{\partial u}{\partial n}\, v \, d\sigma_y = \int_\Omega f v \quad \forall v \in C^\infty(\overline\Omega).

Choosing v \in C_c^\infty(\Omega), we obtain, as in the Dirichlet b.c. case, that -\Delta u + u = f in \Omega. It follows that \int_{\partial\Omega} \frac{\partial u}{\partial n}\, v \, d\sigma_y = 0 \ \forall v \in C^\infty(\overline\Omega) \Rightarrow \frac{\partial u}{\partial n} = 0 on \partial\Omega.
Chapter 5

The Galerkin Finite Element Method for Elliptic Boundary-Value Problems

5.1 Introduction

Let \Omega be a bounded domain in \mathbb{R}^N. Consider the boundary-value problem of finding u = u(x), x \in \overline\Omega, such that

Lu = f, \quad x \in \Omega,
u = 0, \quad x \in \partial\Omega,   (5.1)

where L is the elliptic operator given by

Lu = - \sum_{i,j=1}^{N} \frac{\partial}{\partial x_i} \Big( a_{ij}(x) \frac{\partial u}{\partial x_j} \Big) + a_0(x)\, u.

As in the previous chapter, we assume that, for 1 \le i, j \le N, a_{ij}(x) = a_{ji}(x), x \in \overline\Omega, that \exists c_* > 0 such that

\sum_{i,j=1}^{N} a_{ij}(x) \xi_i \xi_j \ge c_* \sum_{i=1}^{N} \xi_i^2 \quad \forall \xi \in \mathbb{R}^N,

that a_{ij}, a_0 are sufficiently smooth functions of x, and that a_0 \ge 0 in \overline\Omega. Under these hypotheses, and if f \in L^2(\Omega), we have shown that there exists a weak solution u \in \mathring H^1(\Omega) of (5.1) satisfying

B(u, v) = (f, v) \quad \forall v \in \mathring H^1,   (5.2)
where

B(u, v) = \int_\Omega \Big( \sum_{i,j=1}^{N} a_{ij}(x) \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_j} + a_0(x)\, u\, v \Big) dx.

We shall assume that the data are such that the unique solution u of (5.2) belongs to H^2 \cap \mathring H^1 and satisfies the elliptic regularity estimate that, for some C > 0 independent of f and u, we have

\|u\|_2 \le C \|f\|.   (5.3)

As we have seen in Chapters 1 and 3, the (standard) Galerkin method for approximating the solution u of (5.2) amounts to constructing a family of finite-dimensional subspaces S_h of \mathring H^1, say for 0 < h < 1, and seeking u_h \in S_h satisfying the linear system of equations

B(u_h, v_h) = (f, v_h) \quad \forall v_h \in S_h.   (5.4)

Under our hypotheses, we have seen that a unique solution u_h of (5.4) exists and satisfies

\|u - u_h\|_1 \le C \inf_{\chi \in S_h} \|u - \chi\|_1   (5.5)

for some constant C independent of h. Assuming e.g. that

\inf_{\chi \in S_h} \big( \|v - \chi\| + h \|v - \chi\|_1 \big) \le C h^2 \|v\|_2 \quad \forall v \in H^2 \cap \mathring H^1,   (5.6)

we obtain from (5.5) the optimal-rate H^1 error estimate

\|u - u_h\|_1 \le C h \|u\|_2.   (5.7)

The L^2 error estimate is obtained again by the Nitsche trick, by letting e = u - u_h and considering w \in H^2 \cap \mathring H^1, the solution of the problem

B(w, v) = (e, v) \quad \forall v \in \mathring H^1.   (5.8)

Then, \|e\|^2 = (e, e) = B(w, e) = B(e, w) = B(e, w - \chi) for any \chi \in S_h (we used (5.2) and (5.4)). By the continuity of B in H^1 \times H^1 we have then

\|e\|^2 \le C \|e\|_1 \|w - \chi\|_1 \le C \|e\|_1\, h \|w\|_2 \le C h \|e\|_1 \|e\|,

where we used (5.6) and the elliptic regularity estimate \|w\|_2 \le C \|e\| for the problem (5.8). Hence \|e\| \le C h \|e\|_1 \le C h^2 \|u\|_2, by (5.7).

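The two rates — O(h) in H^1, improved to O(h^2) in L^2 by the duality argument — are easy to observe in a one-dimensional model computation with piecewise linear elements. The Python sketch below is my own illustration (the notes themselves treat the two-dimensional case next); it solves -u'' + u = f on (0,1) with u(0) = u(1) = 0 and checks the L^2 rate:

```python
import numpy as np

def p1_galerkin(n):
    """P1 Galerkin for -u'' + u = f on (0,1), u(0) = u(1) = 0, with exact
    solution u = sin(pi x), i.e. f = (pi^2 + 1) sin(pi x)."""
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    K = (2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h          # stiffness matrix
    M = (4.0 * np.eye(n) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) * h / 6.0    # mass matrix
    # load vector with f replaced by its nodal interpolant (an approximation)
    load = M @ ((np.pi**2 + 1.0) * np.sin(np.pi * x[1:-1]))
    U = np.linalg.solve(K + M, load)
    return x, np.concatenate(([0.0], U, [0.0]))

errs = []
for n in (15, 31, 63):
    x, U = p1_galerkin(n)
    # discrete L2 error sampled at the interval midpoints
    xm, Um = 0.5 * (x[:-1] + x[1:]), 0.5 * (U[:-1] + U[1:])
    errs.append(np.sqrt(np.sum((np.sin(np.pi * xm) - Um) ** 2) * (x[1] - x[0])))
rate = np.log2(errs[-2] / errs[-1])    # expect about 2, the L2 rate
```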
In general, assuming that for some integer r \ge 2 we have

\inf_{\chi \in S_h} \big( \|v - \chi\| + h \|v - \chi\|_1 \big) \le C h^r \|v\|_r \quad \forall v \in H^r \cap \mathring H^1,   (5.9)

and that the weak solution u of (5.2) is in H^r \cap \mathring H^1, we see that (5.5) and the Nitsche argument give

\|u - u_h\| + h \|u - u_h\|_1 \le C h^r \|u\|_r.   (5.10)

Hence, our task is to construct subspaces of \mathring H^1 (endowed with bases of small support, so that the linear system (5.4) is sparse) such that (5.6) or, in general, (5.9) holds. In what follows we shall consider the subspace of piecewise linear, continuous functions on a polygonal domain in \mathbb{R}^2 (subdivided into triangular elements).

5.2 Piecewise linear, continuous functions on a triangulation of a plane polygonal domain

(This section is largely based on Ciarlet (1978), Chapter 3.)

We consider a convex polygonal domain \Omega in \mathbb{R}^2 and the elliptic b.v.p. (5.1) associated with it. Although \partial\Omega is not C^1, it is known that (5.3) still holds.

We subdivide \Omega into triangles \tau_i, 1 \le i \le M, M = M(h), forming a triangulation T_h = \{\tau_i\} of \Omega. We assume that the \tau_i are open and disjoint, that \max_i(\mathrm{diam}\,\tau_i) \le h, 0 < h < 1, and that \Omega = \mathrm{Int}\big( \bigcup_{i=1}^{M} \overline\tau_i \big). The vertices of the triangles are called the nodes of the triangulation. We shall assume that T_h is such that there are no nodes in the interior of the interior sides of triangles:

[Figure: two configurations of adjacent triangles. In the one labelled NO, a vertex of one triangle lies in the interior of a side of a neighboring triangle; in the one labelled YES, adjacent triangles share complete sides.]

We let now S_h be the vector space of continuous functions on \overline\Omega that are linear on each \tau_i and vanish on \partial\Omega, i.e. let

S_h = \{\varphi \in C(\overline\Omega) : \varphi|_{\tau_i} = \alpha_i + \beta_i x + \gamma_i y \ (\text{i.e. } \varphi|_{\tau_i} \in P_1(\tau_i)), \ \varphi|_{\partial\Omega} = 0\}.
Let N = N(h) be the number of interior nodes P_i of the triangulation (i.e. of those nodes that are not on \partial\Omega). Since the values at the three (non-collinear) vertices of a triangle define a single plane, it follows that an element \varphi \in S_h is uniquely determined by its values \varphi(P_j), 1 \le j \le N. (This means that \dim S_h = N.) A suitable basis of S_h (for our purposes) consists of the functions \varphi_i \in S_h, 1 \le i \le N, such that

\varphi_i(P_j) = \delta_{ij}, \quad 1 \le i, j \le N.

[Figure: the pyramid-shaped basis function \varphi_i, equal to 1 at the node P_i and vanishing at all other nodes.]

It is clear that the support of \varphi_i consists exactly of those triangles of T_h that share P_i as a vertex. The \varphi_i are linearly independent, since if \sum_{i=1}^{N} c_i \varphi_i(x) = 0, then putting x = P_j yields c_j = 0, 1 \le j \le N. Moreover, given \varphi \in S_h, we can write

\varphi(x) = \sum_{j=1}^{N} \varphi(P_j)\, \varphi_j(x), \quad x \in \overline\Omega,   (5.11)

since both sides of (5.11) are elements of S_h that coincide at the interior nodes P_j, 1 \le j \le N. Hence, the \{\varphi_i\}_{i=1}^{N} form a basis for S_h.
$S_h$ is a subspace of $H_0^1(\Omega)$: Obviously, $S_h\subset L^2(\Omega)$ and the elements of $S_h$ vanish (pointwise) on $\partial\Omega$. Hence, to show that $v\in S_h$ belongs to $H^1(\Omega)$, it suffices to prove that there exist $g_i\in L^2(\Omega)$, $i = 1, 2$, such that
\[
\int_\Omega v\,\frac{\partial\varphi}{\partial x_i}\,dx = -\int_\Omega g_i\,\varphi\,dx\quad\forall\varphi\in C_c^\infty(\Omega),\ i = 1,2. \tag{5.12}
\]
Let $v^{(\tau)}$ be the restriction of $v$ to $\tau\in\mathcal{T}_h$. For $i = 1, 2$ let $g_i\in L^2(\Omega)$ be defined as the piecewise constant function given by
\[
g_i = \frac{\partial v^{(\tau)}}{\partial x_i},\quad i = 1,2,\ \text{ if }x\in\tau. \tag{5.13}
\]
There follows that for $\varphi\in C_c^\infty(\Omega)$
\[
\int_\Omega v\,\frac{\partial\varphi}{\partial x_i}\,dx
= \sum_{\tau\in\mathcal{T}_h}\int_\tau v\,\frac{\partial\varphi}{\partial x_i}\,dx
= \sum_{\tau\in\mathcal{T}_h}\int_\tau v^{(\tau)}\frac{\partial\varphi}{\partial x_i}\,dx
\overset{\text{(Gauss' theorem on }\tau)}{=}
-\sum_{\tau\in\mathcal{T}_h}\int_\tau g_i^{(\tau)}\varphi\,dx
+ \sum_{\tau\in\mathcal{T}_h}\int_{\partial\tau} v^{(\tau)}\varphi\,\nu_i^{(\tau)}\,ds,
\]
where $\nu^{(\tau)} = (\nu_1^{(\tau)},\nu_2^{(\tau)})^T$ is the unit outward normal on $\partial\tau$, the boundary of $\tau$.

The first term of the right-hand side of the above equality is equal to $-\int_\Omega g_i\varphi\,dx$, where $g_i\in L^2(\Omega)$ was defined (piecewise) by (5.13). The second term vanishes. To see this, note that the second term is eventually the sum of terms
\[
\int_s v^{(\tau)}\varphi\,\nu_i^{(\tau)}\,ds,
\]
where $s$ is any side of any triangle $\tau$. If $s\subset\partial\Omega$, then the corresponding term is zero since $\varphi = 0$ on $\partial\Omega$. If $s = AB$ is an interior side, suppose it is the common side of two adjacent triangles $\tau_1$ and $\tau_2$.

(Figure: adjacent triangles $\tau_1$, $\tau_2$ with common side $s = AB$ and opposite unit outward normals $\nu^{(1)} = -\nu^{(2)}$ on $s$.)

In that case, there are precisely two terms in the sum $\sum_s\int_s\cdots$ involving $AB$, namely
\[
\int_{AB} v^{(\tau_1)}\varphi\,\nu_i^{(\tau_1)}\,ds\quad\text{and}\quad\int_{AB} v^{(\tau_2)}\varphi\,\nu_i^{(\tau_2)}\,ds,
\]
which cancel each other since $v^{(\tau_1)}|_{AB} = v^{(\tau_2)}|_{AB}$ ($v$ is continuous in $\overline{\Omega}$ as an element of $S_h$), $\varphi\in C_c^\infty(\Omega)$, and $\nu_i^{(\tau_1)} = -\nu_i^{(\tau_2)}$, $i = 1, 2$. We conclude then that (5.12) holds, i.e. that $v\in H^1(\Omega)$, q.e.d.
Given $v\in C(\overline{\Omega})$, $v|_{\partial\Omega} = 0$, we define the interpolant $I_h v$ of $v$ in $S_h$ as the unique element $I_h v$ of $S_h$ that coincides with $v$ at the interior nodes $P_i$, $1\le i\le N$, of the triangulation $\mathcal{T}_h$, i.e. as
\[
(I_h v)(x) = \sum_{i=1}^N v(P_i)\,\varphi_i(x),\quad x\in\overline{\Omega}.
\]
It will be the objective of this section to show that for $v\in H^2(\Omega)\cap H_0^1(\Omega)$ (note that then $v\in C(\overline{\Omega})$, $v|_{\partial\Omega} = 0$), we have, for some constant $C$ independent of $h$ and $v$:
\[
\|v - I_h v\| + h\,\|v - I_h v\|_1 \le C\,h^2\,|v|_{2,\Omega}. \tag{5.14}
\]
(Given $v\in H^m(\Omega')$, where $\Omega'$ is a subdomain of $\Omega$, we define
\[
|v|_{0,\Omega'} = \|v\|_{L^2(\Omega')},\qquad
|v|_{1,\Omega'} = \Big(\Big\|\frac{\partial v}{\partial x_1}\Big\|_{L^2(\Omega')}^2 + \Big\|\frac{\partial v}{\partial x_2}\Big\|_{L^2(\Omega')}^2\Big)^{1/2},\ \ldots,
\]
\[
|v|_{m,\Omega'} = \Big(\sum_{|\alpha|=m}\|D^\alpha v\|_{L^2(\Omega')}^2\Big)^{1/2},\qquad D^\alpha = \frac{\partial^{\alpha_1+\alpha_2}}{\partial x_1^{\alpha_1}\partial x_2^{\alpha_2}}.
\]
Then $|v|_{m,\Omega'}$ is in general a seminorm on $H^m(\Omega')$.)


If (5.14) is established, then (5.6) holds and, as a consequence, we have our optimal order $L^2$ and $H^1$ error estimates for $u - u_h$.

The estimate (5.14) will be proved as a consequence of two facts:

(i) A local $L^2$ and $H^1$ estimate for the interpolant:

Given a function $v\in C(\overline{\tau})$, where $\tau$ is the triangle with vertices $P_1^\tau$, $P_2^\tau$, $P_3^\tau$, define the (local) interpolant $I^\tau v\in P_1(\tau)$ as the unique linear polynomial in $x_1$, $x_2$ on $\tau$ such that
\[
(I^\tau v)(P_i^\tau) = v(P_i^\tau),\quad i = 1,2,3.
\]
Note that if $v\in C(\overline{\Omega})$, $v|_{\partial\Omega} = 0$, then $I_h v|_\tau = I^\tau v$.

Following Ciarlet, we shall prove that there exists a constant $C$, independent of $\tau$ and $\mathcal{T}_h$, such that for each $v\in H^2(\tau)$, $\tau\in\mathcal{T}_h$,
\[
|v - I^\tau v|_{m,\tau} \le C\,\frac{h_\tau^2}{\rho_\tau^m}\,|v|_{2,\tau},\quad m = 0,1, \tag{5.15}
\]
where $h_\tau = \operatorname{diam}\tau$ is the length of the largest side of the triangle $\tau$, and $\rho_\tau$ is the diameter of the inscribed circle in the triangle $\tau$.

(ii) The regularity of the triangulation:

We shall assume that the triangulation $\mathcal{T}_h$ is regular, in the sense that there exists a constant $\sigma > 0$, independent of $\tau$ and $\mathcal{T}_h$, such that
\[
\frac{h_\tau}{\rho_\tau} \le \sigma\quad\forall\tau\in\mathcal{T}_h. \tag{5.16}
\]
(The regularity condition (5.16) essentially states that $h = \max_\tau h_\tau\to 0$, i.e. the triangulation is refined, if and only if $\max_\tau\rho_\tau\to 0$, i.e. all triangles tend to become points and not needles (for which $h_\tau/\rho_\tau$ would become unbounded). It may be shown that (5.16) is equivalent to requiring that there exists a $\theta_0 > 0$ independent of $\mathcal{T}_h$ such that $\theta_\tau\ge\theta_0$ $\forall\tau\in\mathcal{T}_h$, where $\theta_\tau$ is the minimum (interior) angle of $\tau$. It is also equivalent to requiring that there exists a $c_0$, independent of $\mathcal{T}_h$, such that $\mu(\tau)\ge c_0 h_\tau^2$ $\forall\tau\in\mathcal{T}_h$, where $\mu(\tau) = \operatorname{area}(\tau)$.)
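The quantities $h_\tau$ and $\rho_\tau$ entering (5.15) and (5.16) are elementary to compute for a given triangle. As a small illustration of our own (not part of the text), the following sketch uses the classical formula inradius = area / semi-perimeter:

```python
import math

def shape_parameters(tri):
    """h = length of the longest side, rho = diameter of the inscribed
    circle, for the triangle with vertices tri = [(x1,y1),(x2,y2),(x3,y3)]."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    a = math.dist((x2, y2), (x3, y3))
    b = math.dist((x1, y1), (x3, y3))
    c = math.dist((x1, y1), (x2, y2))
    area = 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    rho = 4.0 * area / (a + b + c)  # 2 * inradius = 2 * area / semi-perimeter
    return max(a, b, c), rho
```

A mesh generator enforcing (5.16) would keep the ratio $h_\tau/\rho_\tau$ below a fixed $\sigma$ over all triangles of every triangulation it produces; the equilateral triangle attains the smallest possible ratio, $\sqrt{3}$.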
Indeed, if (5.15) and (5.16) hold, we have
\[
\frac{1}{\rho_\tau^m} \le \frac{\sigma^m}{h_\tau^m},\quad m = 0,1,
\]
so that (5.15) gives
\[
|v - I^\tau v|_{m,\tau} \le C_m\,h_\tau^{2-m}\,|v|_{2,\tau},\quad m = 0,1,\ \forall v\in H^2(\tau), \tag{5.17}
\]
for constants $C_0$, $C_1$ independent of $\tau\in\mathcal{T}_h$.

We conclude then, for $v\in H^2(\Omega)\cap H_0^1(\Omega)$ ($\Rightarrow v\in C(\overline{\Omega})$ by Sobolev's theorem),
\[
\|v - I_h v\|^2 = \sum_{\tau\in\mathcal{T}_h}\|v - I_h v\|_{L^2(\tau)}^2
= \sum_{\tau\in\mathcal{T}_h}|v - I^\tau v|_{0,\tau}^2
\le C_0^2\sum_{\tau\in\mathcal{T}_h} h_\tau^4\,|v|_{2,\tau}^2
\le C_0^2\,h^4\sum_{\tau\in\mathcal{T}_h}|v|_{2,\tau}^2
= C_0^2\,h^4\,|v|_{2,\Omega}^2.
\]
Hence
\[
\|v - I_h v\| \le C\,h^2\,|v|_{2,\Omega} \tag{5.18}
\]
for some constant $C$ independent of $\mathcal{T}_h$. In addition, analogously,
\[
|v - I_h v|_{1,\Omega}^2 = \sum_{\tau\in\mathcal{T}_h}|v - I^\tau v|_{1,\tau}^2
\le C_1^2\sum_{\tau\in\mathcal{T}_h} h_\tau^2\,|v|_{2,\tau}^2
\le C_1^2\,h^2\,|v|_{2,\Omega}^2.
\]
Therefore
\[
|v - I_h v|_{1,\Omega} \le C\,h\,|v|_{2,\Omega}, \tag{5.19}
\]
for some $C$ independent of $\mathcal{T}_h$.

The estimates (5.18) and (5.19) yield then (5.14) as advertized. We turn then to proving (5.15). To accomplish this we shall need a series of results and definitions:
Definition: Let $\Omega$, $\widehat{\Omega}$ be two bounded domains in $\mathbb{R}^N$. We say that $\Omega$ and $\widehat{\Omega}$ are affinely equivalent if there exists an invertible affine map $F:\hat x\in\mathbb{R}^N\mapsto F(\hat x) = B\hat x + b\in\mathbb{R}^N$ such that $F(\widehat{\Omega}) = \Omega$.

In the definition of the invertible affine map, $B$ is an $N\times N$ invertible matrix of constants and $b\in\mathbb{R}^N$. Hence, if $x\in\mathbb{R}^N$, $\hat x = F^{-1}(x) = B^{-1}x - B^{-1}b$. Hence $x = F(\hat x)$, and if $\hat v:\widehat{\Omega}\to\mathbb{R}$ is a real-valued function defined on $\widehat{\Omega}$, then, defining $v = \hat v\circ F^{-1}:\Omega\to\mathbb{R}$, we have, if $x\in\Omega$, $x = F(\hat x)$, i.e. if $\hat x = F^{-1}(x)$, that
\[
v(x) = (\hat v\circ F^{-1})(x) = \hat v(F^{-1}(x)) = \hat v(\hat x).
\]
(Note that if $v = \hat v\circ F^{-1}:\Omega\to\mathbb{R}$, then $\hat v = v\circ F:\widehat{\Omega}\to\mathbb{R}$.)

Using this notation we may prove:

Lemma 5.1. Let $\Omega$, $\widehat{\Omega}$ be two affinely equivalent bounded domains in $\mathbb{R}^N$ and let $F$ be the associated affine map such that $F(\widehat{\Omega}) = \Omega$. Then $v\in H^m(\Omega)$, $m\ge 0$ integer, if and only if $\hat v = v\circ F\in H^m(\widehat{\Omega})$. Moreover, there exist constants $C$ and $\widehat{C}$ depending only on $m$ and $N$ such that
\[
|\hat v|_{m,\widehat{\Omega}} \le C\,|B|^m\,|\det B|^{-1/2}\,|v|_{m,\Omega} \tag{5.20}
\]
and
\[
|v|_{m,\Omega} \le \widehat{C}\,|B^{-1}|^m\,|\det B|^{1/2}\,|\hat v|_{m,\widehat{\Omega}}. \tag{5.21}
\]
Here $|B|$ denotes the matrix norm induced by the Euclidean vector norm in $\mathbb{R}^N$, i.e.
\[
|B| = \sup_{x\in\mathbb{R}^N,\,x\neq 0}\frac{|Bx|}{|x|},\quad\text{where } |x| = \Big(\sum_{i=1}^N x_i^2\Big)^{1/2}.
\]

Proof: We shall prove (5.20) for $m = 0$ and $m = 1$. Recall that whenever $x = F(\hat x) = B\hat x + b$, i.e. whenever $\hat x = B^{-1}x + c$, $c = -B^{-1}b$, then $\hat v(\hat x) = v(x)$. Hence
\[
\int_{\widehat{\Omega}}\hat v^2(\hat x)\,d\hat x = \int_\Omega v^2(x)\,|J|\,dx,
\]
where $J$ is the determinant of the Jacobian matrix of the transformation $\hat x = B^{-1}x + c$. Hence $J = \det(B^{-1}) = (\det B)^{-1}$ and we conclude that
\[
|\hat v|_{0,\widehat{\Omega}}^2 = \int_{\widehat{\Omega}}\hat v^2(\hat x)\,d\hat x = |\det B|^{-1}\int_\Omega v^2(x)\,dx = |\det B|^{-1}\,|v|_{0,\Omega}^2,
\]
which implies (5.20) with $m = 0$, $C = 1$.

Let now $m = 1$ and $v\in H^1(\Omega)$. Then, if $x = F(\hat x)$ and $1\le i\le N$,
\[
\frac{\partial\hat v}{\partial\hat x_i}(\hat x) = \frac{\partial}{\partial\hat x_i}\big(v(x)\big)
= \sum_{j=1}^N\frac{\partial v}{\partial x_j}(x)\,\frac{\partial x_j}{\partial\hat x_i}
= \sum_{j=1}^N\frac{\partial v}{\partial x_j}(x)\,\frac{\partial F_j(\hat x)}{\partial\hat x_i},
\]
where $x_j = F_j(\hat x) = \sum_{k=1}^N B_{jk}\hat x_k + b_j$. Hence $\frac{\partial F_j(\hat x)}{\partial\hat x_i} = B_{ji}$ and we conclude that
\[
\frac{\partial\hat v}{\partial\hat x_i}(\hat x) = \sum_{j=1}^N\frac{\partial v}{\partial x_j}(x)\,B_{ji},\quad 1\le i\le N,
\]
which we may write as
\[
(\widehat{\nabla}\hat v)(\hat x) = B^T(\nabla v)(x), \tag{5.22}
\]
where
\[
\widehat{\nabla}\hat v = \Big(\frac{\partial\hat v}{\partial\hat x_1},\ldots,\frac{\partial\hat v}{\partial\hat x_N}\Big)^T\quad\text{and}\quad
\nabla v = \Big(\frac{\partial v}{\partial x_1},\ldots,\frac{\partial v}{\partial x_N}\Big)^T.
\]
From (5.22), taking Euclidean norms, we see that
\[
|\widehat{\nabla}\hat v| \le |B^T|\,|\nabla v| = |B|\,|\nabla v| \tag{5.23}
\]
since $|B^T| = |B|$. (To see this, recall that
\[
|B| = \max_i\sqrt{\lambda_i(B^TB)} = \max_i\sqrt{\lambda_i(BB^T)} = |B^T|,
\]
since $BB^T$ and $B^TB$ have the same eigenvalues.)

Hence, using (5.23), we have
\[
|\hat v|_{1,\widehat{\Omega}}^2 = \int_{\widehat{\Omega}}\sum_{i=1}^N\Big(\frac{\partial\hat v}{\partial\hat x_i}(\hat x)\Big)^2 d\hat x
= \int_{\widehat{\Omega}}|\widehat{\nabla}\hat v|^2\,d\hat x
\le |B|^2\int_\Omega|\nabla v|^2\,|J|\,dx
= |B|^2\,|\det B|^{-1}\,|v|_{1,\Omega}^2,
\]
which is (5.20) for $m = 1$.

The rest of the proof is analogous. $\square$
Given two bounded, affinely equivalent domains $\Omega$ and $\widehat{\Omega}$, we let

(Figure: the domain $\widehat{\Omega}$, with two points $\hat y,\hat z\in\widehat{\Omega}$ and inscribed ball of diameter $\hat\rho$, mapped by $F$ onto $\Omega$, with $F(\hat y), F(\hat z)\in\Omega$ and $h = \operatorname{diam}\Omega$.)

\[
h = \operatorname{diam}\Omega = \sup_{x,y\in\Omega}|x - y|,\qquad
\rho = \text{diameter of the inscribed ball in }\Omega = \sup\{\operatorname{diam}S :\ S\text{ is a ball contained in }\Omega\},
\]
and $\hat h$, $\hat\rho$ be the corresponding quantities for $\widehat{\Omega}$.

Lemma 5.2. Let $\Omega$, $\widehat{\Omega}$ be two affinely equivalent bounded domains in $\mathbb{R}^N$ such that $\Omega = F(\widehat{\Omega})$, $F(\hat x) = B\hat x + b$. Then
\[
|B| \le \frac{h}{\hat\rho},\qquad |B^{-1}| \le \frac{\hat h}{\rho}. \tag{5.24}
\]
Proof: An easy scaling argument yields that
\[
|B| = \frac{1}{\hat\rho}\sup_{\xi\in\mathbb{R}^N,\ |\xi|=\hat\rho}|B\xi|.
\]
Now given $\xi\in\mathbb{R}^N$ such that $|\xi| = \hat\rho$, we may find two points $\hat y,\hat z\in\widehat{\Omega}$ such that $\hat y - \hat z = \xi$ (see Fig. 5.6). For these points we have that $F(\hat y) - F(\hat z) = B\hat y - B\hat z = B\xi$, with $F(\hat y), F(\hat z)\in\Omega$. Hence $\sup_{\xi\in\mathbb{R}^N,\,|\xi|=\hat\rho}|B\xi| \le h$. We conclude that $|B| \le h/\hat\rho$; the bound for $|B^{-1}|$ follows by interchanging the roles of $\Omega$ and $\widehat{\Omega}$. $\square$
Let us consider now our triangulation $\mathcal{T}_h = \{\tau\}$ of the polygonal domain $\Omega$ in $\mathbb{R}^2$.

(Figure: the reference triangle $\hat\tau$ with vertices $\hat Q_1$, $\hat Q_2$, $\hat Q_3$, mapped by the affine map $F_\tau$ onto a triangle $\tau$ with vertices $Q_1$, $Q_2$, $Q_3$.)

It is clear that any triangle $\tau$ of the triangulation is affinely equivalent to a fixed reference triangle $\hat\tau$. (E.g. we can take $\hat\tau$ to be the triangle with vertices (0,0), (0,1), (1,0).) Indeed we have for each $\tau\in\mathcal{T}_h$
\[
F_\tau(\hat\tau) = \tau
\]
for some invertible affine map $F_\tau(\hat x) = B_\tau\hat x + b_\tau$, $\hat x\in\hat\tau$, depending on $\tau$. (We construct the map $F_\tau$ by requiring that $Q_i = F_\tau(\hat Q_i)$, $1\le i\le 3$, where $\hat Q_i$, $Q_i$, $i = 1,2,3$, are the vertices of $\hat\tau$, $\tau$, respectively. These three 2-vector equations determine uniquely the six constants which are the entries of the matrix $B_\tau$ and the vector $b_\tau$.) Then we may easily check that the side $Q_1Q_2 = F_\tau(\hat Q_1\hat Q_2)$ etc., that each point $\hat x$ in $\hat\tau$ is mapped onto a uniquely defined point $x$ in $\tau$, and that each point $\hat x\in\partial\hat\tau$ is mapped onto a corresponding point $x$ on $\partial\tau$. The idea is to work on $\hat\tau$ and obtain corresponding estimates on $\tau$ (such as the required (5.15)) by using the properties of the interpolant in spaces of piecewise linear continuous functions as well as the scaling and transformation inequalities of Lemmata 5.1 and 5.2.
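Concretely, if the reference vertices are taken as $\hat Q_1 = (0,0)$, $\hat Q_2 = (1,0)$, $\hat Q_3 = (0,1)$ (the same reference triangle as above, with its vertices listed in a fixed order), then $B_\tau$ has columns $Q_2 - Q_1$ and $Q_3 - Q_1$, and $b_\tau = Q_1$. A throwaway sketch of our own:

```python
def affine_map(tri):
    """B and b of F(xh) = B xh + b mapping the reference triangle with
    vertices (0,0), (1,0), (0,1) onto tri = [Q1, Q2, Q3] (Qi = F(Qhat_i))."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    B = [[x2 - x1, x3 - x1],
         [y2 - y1, y3 - y1]]
    return B, (x1, y1)

def apply_map(B, b, xh):
    """Evaluate F(xh) = B xh + b."""
    return (B[0][0] * xh[0] + B[0][1] * xh[1] + b[0],
            B[1][0] * xh[0] + B[1][1] * xh[1] + b[1])
```

One can check numerically that the three reference vertices land on the three vertices of the target triangle, and that $|\det B_\tau|$ equals twice the area of $\tau$, as used later in Section 5.3.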
To this effect, define the interpolant $\widehat{I}$ on the reference triangle $\hat\tau$ as the map $\widehat{I}: C(\hat\tau)\to P_1(\hat\tau)$ such that
\[
(\widehat{I}\hat v)(\hat Q_i) = \hat v(\hat Q_i),\quad 1\le i\le 3, \tag{5.25}
\]
for any continuous real-valued function $\hat v$ defined on $\hat\tau$. Our basic step towards proving (5.15) is the following

Lemma 5.3. Let $\hat v\in H^2(\hat\tau)$. Then, there exists a constant $C(\hat\tau)$ such that for $m = 0, 1, 2$
\[
|\hat v - \widehat{I}\hat v|_{m,\hat\tau} \le C(\hat\tau)\,|\hat v|_{2,\hat\tau}. \tag{5.26}
\]
Proof: The estimates (5.26) will be proved as a consequence of three important facts:

(i) The linear map $\widehat{I}$ preserves linear polynomials, i.e. $\forall\hat p\in P_1(\hat\tau)$, $\widehat{I}\hat p = \hat p$. This is obvious and implies that for any $\hat v\in H^2(\hat\tau)$ and any $\hat p\in P_1(\hat\tau)$
\[
\hat v - \widehat{I}\hat v = \hat v - \hat p - \widehat{I}\hat v + \widehat{I}\hat p = (\hat v - \hat p) - \widehat{I}(\hat v - \hat p). \tag{5.27}
\]
(ii) $\widehat{I}$ is stable on $H^2(\hat\tau)$. By this we mean that there exists a constant $C = C(\hat\tau)$ such that
\[
\|\widehat{I}\hat w\|_{2,\hat\tau} \le C(\hat\tau)\,\|\hat w\|_{2,\hat\tau}\quad\forall\hat w\in H^2(\hat\tau). \tag{5.28}
\]
(To see this, let $\hat\varphi_i\in P_1(\hat\tau)$ be the hat basis functions associated with the vertices $\hat Q_i$ of $\hat\tau$, i.e. let $\hat\varphi_i\in P_1(\hat\tau)$ be defined for $i = 1,2,3$ by the relations
\[
\hat\varphi_i(\hat Q_j) = \delta_{ij},\quad 1\le i,j\le 3.
\]
Then, for $\hat w\in H^2(\hat\tau)$,
\[
\|\widehat{I}\hat w\|_{2,\hat\tau} = \|\hat w(\hat Q_1)\hat\varphi_1 + \hat w(\hat Q_2)\hat\varphi_2 + \hat w(\hat Q_3)\hat\varphi_3\|_{2,\hat\tau}
\le \sum_{i=1}^3|\hat w(\hat Q_i)|\,\|\hat\varphi_i\|_{2,\hat\tau}
\le C'(\hat\tau)\max_{\hat x\in\overline{\hat\tau}}|\hat w(\hat x)| \le C(\hat\tau)\,\|\hat w\|_{2,\hat\tau},
\]
by Sobolev's theorem in $\mathbb{R}^2$.)

As a consequence of (5.27) and (5.28) note that for $m = 0,1,2$ and $\hat v\in H^2(\hat\tau)$
\[
|\hat v - \widehat{I}\hat v|_{m,\hat\tau}
\le |\hat v - \hat p|_{m,\hat\tau} + |\widehat{I}(\hat v - \hat p)|_{m,\hat\tau}
\le \|\hat v - \hat p\|_{2,\hat\tau} + \|\widehat{I}(\hat v - \hat p)\|_{2,\hat\tau}
\le \|\hat v - \hat p\|_{2,\hat\tau} + C(\hat\tau)\|\hat v - \hat p\|_{2,\hat\tau}
\le \overline{C}(\hat\tau)\,\|\hat v - \hat p\|_{2,\hat\tau}\quad\forall\hat p\in P_1(\hat\tau). \tag{5.29}
\]
We invoke now the

(iii) Bramble–Hilbert Lemma, which in our case asserts that there exists a constant $C^*(\hat\tau)$ such that
\[
\min_{\hat p\in P_1(\hat\tau)}\|\hat v - \hat p\|_{2,\hat\tau} \le C^*(\hat\tau)\,|\hat v|_{2,\hat\tau}\quad\forall\hat v\in H^2(\hat\tau). \tag{5.30}
\]
(We postpone for the moment the proof of this important result; we shall prove it later in more generality.)

Putting together (5.29) and (5.30) yields now
\[
|\hat v - \widehat{I}\hat v|_{m,\hat\tau} \le \overline{C}(\hat\tau)\min_{\hat p\in P_1(\hat\tau)}\|\hat v - \hat p\|_{2,\hat\tau} \le C(\hat\tau)\,|\hat v|_{2,\hat\tau},
\]
which is the advertized result (5.26). $\square$
Before turning to the proof of (5.30) we first complete the argument and show how (5.26) implies (5.15). To do this, recall that if $w$ is defined on $\tau$, then for $x = F_\tau(\hat x)$ we have $\hat w(\hat x) = w(x)$, where $\hat w$ is the image function, i.e. $\hat w = w\circ F_\tau$, defined on $\hat\tau$. Note that the operation $\widehat{\phantom{w}}$ is linear, in the sense that $\widehat{\alpha u + \beta v} = \alpha\hat u + \beta\hat v$, $\alpha,\beta\in\mathbb{R}$. (Indeed $\widehat{(\alpha u + \beta v)}(\hat x) = (\alpha u + \beta v)(x) = \alpha u(x) + \beta v(x) = \alpha\hat u(\hat x) + \beta\hat v(\hat x)$.)

Hence, if $I^\tau v$ is the (local) linear interpolant in $P_1(\tau)$ of $v\in H^2(\tau)$, we have
\[
\widehat{(v - I^\tau v)} = \hat v - \widehat{I^\tau v}. \tag{5.31}
\]
Now, $\widehat{I^\tau v} = \widehat{I}\hat v$. To see this, note that for $i = 1,2,3$
\[
\widehat{(I^\tau v)}(\hat Q_i) = (I^\tau v)(Q_i) = v(Q_i) = \hat v(\hat Q_i) = (\widehat{I}\hat v)(\hat Q_i),
\]
i.e. $\widehat{I^\tau v}$ and $\widehat{I}\hat v$ (which are real-valued functions defined on $\hat\tau$) coincide at the vertices $\hat Q_i$ of $\hat\tau$. However $\widehat{I}\hat v\in P_1(\hat\tau)$ and $\widehat{(I^\tau v)}(\hat x) = (I^\tau v)(x) = (I^\tau v)(F_\tau(\hat x))\in P_1(\hat\tau)$. Hence $\widehat{(I^\tau v)}(\hat x) = (\widehat{I}\hat v)(\hat x)$ $\forall\hat x\in\hat\tau$. Therefore, (5.31) gives that
\[
\widehat{(v - I^\tau v)} = \hat v - \widehat{I}\hat v. \tag{5.32}
\]
Now, using (5.21) for $m = 0,1,2$, we have ($\hat h = \operatorname{diam}(\hat\tau)$)
\[
|v - I^\tau v|_{m,\tau}
\le \widehat{C}\,|B_\tau^{-1}|^m\,|\det B_\tau|^{1/2}\,|\widehat{(v - I^\tau v)}|_{m,\hat\tau}
\le \widehat{C}\,\frac{\hat h^m}{\rho_\tau^m}\,|\det B_\tau|^{1/2}\,C(\hat\tau)\,|\hat v|_{2,\hat\tau}
\]
(by (5.24), (5.26), (5.32))
\[
\le C'(\hat\tau)\,\frac{1}{\rho_\tau^m}\,|\det B_\tau|^{1/2}\,|\hat v|_{2,\hat\tau}
\le C'(\hat\tau)\,\frac{1}{\rho_\tau^m}\,|\det B_\tau|^{1/2}\,|B_\tau|^2\,|\det B_\tau|^{-1/2}\,|v|_{2,\tau}
\]
(using again (5.20))
\[
\le C'(\hat\tau)\,\frac{h_\tau^2}{\rho_\tau^m\,\hat\rho^2}\,|v|_{2,\tau}
\le C''(\hat\tau)\,\frac{h_\tau^2}{\rho_\tau^m}\,|v|_{2,\tau}
\]
(by (5.24)), which is (5.15), since $C''(\hat\tau) \equiv C$ is a constant depending only on the fixed reference triangle $\hat\tau$ and is, hence, independent of $\tau\in\mathcal{T}_h$.

We finally turn to proving (5.30). This will be done in some generality in the following Proposition, in which $\Omega$ is assumed to be a bounded, Lipschitz domain in $\mathbb{R}^N$.
Proposition 5.1 (Bramble–Hilbert / Deny–Lions). For $k\ge 1$ there exists a constant $C_k = C_k(\Omega)$ such that for every $u\in H^k(\Omega)$
\[
\min_{p\in P_{k-1}}\|u + p\|_k \le C_k\,|u|_k. \tag{5.33}
\]
Remark: (5.33) may be viewed as a type of Taylor's theorem. Also, note that obviously $|u|_k\le\min_{p\in P_{k-1}}\|u + p\|_k$. Hence (5.33) implies that $|\cdot|_k$ is a norm, equivalent to the quotient norm $\min_{p\in P_{k-1}}\|\cdot + p\|_k$, on the space $H^k/P_{k-1}$. (From now on $H^k = H^k(\Omega)$.)
Proof. The estimate (5.33) is a direct consequence of two facts:

(i) For each $u\in H^k$ there exists a unique $q\in P_{k-1}$ such that
\[
\int_\Omega D^\alpha q\,dx = \int_\Omega D^\alpha u\,dx\quad\forall\alpha:\ |\alpha|\le k-1. \tag{5.34}
\]
(ii) There exists $C_k = C_k(\Omega)$ such that $\forall u\in H^k$:
\[
\|u\|_k \le C_k\Big(|u|_k^2 + \sum_{|\alpha|<k}\Big(\int_\Omega D^\alpha u\,dx\Big)^2\Big)^{1/2}. \tag{5.35}
\]
Indeed, if (i) and (ii) hold, then for $u\in H^k$, with $q$ as in (i),
\[
\min_{p\in P_{k-1}}\|u + p\|_k \le \|u - q\|_k
\overset{\text{(ii)}}{\le} C_k\Big(|u - q|_k^2 + \sum_{|\alpha|<k}\Big(\int_\Omega(D^\alpha u - D^\alpha q)\,dx\Big)^2\Big)^{1/2}
\overset{\text{(i)}}{=} C_k\,|u - q|_k = C_k\,|u|_k,
\]
since $q\in P_{k-1}\Rightarrow D^\alpha q = 0$ for $|\alpha| = k$.

Therefore, (5.33) is a consequence of (i) and (ii). We now prove (i) and (ii).
(i). Let $u\in H^k$ be given. We shall construct a polynomial $q\in P_{k-1}$ such that the relations (5.34) hold. Let $q$ be of the form
\[
q(x) = \sum_{m=0}^{k-1}\sum_{|\beta|=m} c_\beta\,x^\beta
= \sum_{m=0}^{k-1}\sum_{|\beta|=m} c_{\beta_1\ldots\beta_N}\,x_1^{\beta_1}x_2^{\beta_2}\cdots x_N^{\beta_N}.
\]
We shall determine the unknown coefficients $c_\beta$, $|\beta|\le k-1$, from the relations (5.34), which represent a linear system of equations for the $c_\beta$ of size $M_k\times M_k$, where $M_k$ is the number of multi-indices $\alpha = (\alpha_1,\alpha_2,\ldots,\alpha_N)$ with $|\alpha|\le k-1$. We first determine the coefficients $c_\beta$ multiplying the terms $x^\beta$ of highest degree, i.e. the $c_\beta$ with $|\beta| = k-1$. For a multi-index $\alpha$ with $|\alpha| = k-1$ we have
\[
D^\alpha q = D^\alpha\!\!\sum_{\beta:\,|\beta|=k-1}\!\! c_\beta x^\beta + D^\alpha\underbrace{\sum_{\beta:\,|\beta|<k-1} c_\beta x^\beta}_{\in P_{k-2}} = c_\alpha\,\alpha!,
\]
where $\alpha! \equiv \alpha_1!\,\alpha_2!\cdots\alpha_N!$ for $\alpha = (\alpha_1,\alpha_2,\ldots,\alpha_N)$.

(To see the last equality, consider any one term $D^\alpha(c_\beta x^\beta)$ of the sum $D^\alpha\sum_{\beta:|\beta|=k-1}c_\beta x^\beta$. For such a term, $|\alpha| = |\beta| = k-1$. If $\beta\neq\alpha$ then $D^\alpha(c_\beta x^\beta) = 0$: for if $\beta\neq\alpha$ there must exist an index $j$, $1\le j\le N$, such that $\beta_j < \alpha_j$. Then for the corresponding factor in $D^\alpha x^\beta$ we will have $\partial_{x_j}^{\alpha_j}x_j^{\beta_j} = 0$. Now, if $\beta = \alpha$ we have
\[
D^\alpha(x^\alpha) = \big(\partial_{x_1}^{\alpha_1}x_1^{\alpha_1}\big)\cdots\big(\partial_{x_N}^{\alpha_N}x_N^{\alpha_N}\big) = \alpha_1!\,\alpha_2!\cdots\alpha_N! \equiv \alpha!\,.)
\]
Hence, the relations (5.34) for $\alpha$ such that $|\alpha| = k-1$ yield
\[
\int_\Omega D^\alpha u\,dx = \int_\Omega D^\alpha q\,dx = \int_\Omega c_\alpha\,\alpha!\,dx = c_\alpha\,\alpha!\,\mu(\Omega)
\ \Rightarrow\ c_\alpha = \frac{\int_\Omega D^\alpha u\,dx}{\alpha!\,\mu(\Omega)},\quad|\alpha| = k-1, \tag{5.36}
\]
i.e. the coefficients of $q$ with $|\beta| = k-1$ have been determined.

Write now $q_{k-1}(x) = \sum_{|\beta|=k-1}c_\beta x^\beta$ ($q_{k-1}$ is now known). Hence
\[
q(x) = \sum_{|\beta|=k-2}c_\beta x^\beta + q_{k-1}(x) + s(x),
\]
where $s\in P_{k-3}$, from which, for $|\alpha| = k-2$, as before,
\[
D^\alpha q = c_\alpha\,\alpha! + D^\alpha q_{k-1}.
\]
Therefore, (5.34) for $|\alpha| = k-2$ yields
\[
\int_\Omega D^\alpha u\,dx = c_\alpha\,\alpha!\,\mu(\Omega) + \int_\Omega D^\alpha q_{k-1}\,dx,
\]
from which the $c_\alpha$, $|\alpha| = k-2$, are determined. We continue in the same fashion to determine all $c_\beta$.

Note that the polynomial $q$ satisfying (5.34) is necessarily unique: if two such $q_1, q_2\in P_{k-1}$ exist, then $q = q_1 - q_2$ will satisfy $\int_\Omega D^\alpha q\,dx = 0$, $|\alpha|\le k-1$, which is a homogeneous linear system for the associated $c_\beta$. The formulas (5.36) would now yield $c_\beta = 0$, $|\beta| = k-1$. Therefore $q_{k-1} = 0$, whence $c_\beta = 0$, $|\beta| = k-2$, and so on, implying finally that $q = 0$.
(ii). We argue by contradiction: suppose (5.35) does not hold. This means that for any constant $C > 0$ there exists a $u\in H^k$ such that
\[
\|u\|_k > C\Big(|u|_k^2 + \sum_{|\alpha|<k}\Big(\int_\Omega D^\alpha u\,dx\Big)^2\Big)^{1/2},
\]
i.e. such that
\[
C\,\Bigg\{\frac{|u|_k^2}{\|u\|_k^2} + \frac{\sum_{|\alpha|<k}\big(\int_\Omega D^\alpha u\,dx\big)^2}{\|u\|_k^2}\Bigg\}^{1/2} < 1.
\]
This implies that for any constant $C > 0$ there exists a $v\in H^k$ with $\|v\|_k = 1$ (take $v = u/\|u\|_k$) such that
\[
C\Big(|v|_k^2 + \sum_{|\alpha|<k}\Big(\int_\Omega D^\alpha v\,dx\Big)^2\Big)^{1/2} < 1.
\]
Take $C = n$, $n = 1,2,\ldots$. Hence, there exists a sequence $\{u_n\}$ of functions in $H^k$ with $\|u_n\|_k = 1$, such that
\[
|u_n|_k^2 + \sum_{|\alpha|<k}\Big(\int_\Omega D^\alpha u_n\,dx\Big)^2 < \frac{1}{n^2}. \tag{5.37}
\]
We now use the fact (Rellich's theorem, cf. Adams) that for a domain such as $\Omega$ (in fact for any bounded, Lipschitz domain) and for $k\ge 1$, $H^k$ may be compactly imbedded in $H^{k-1}$, in the sense that every bounded subset of $H^k$ is relatively compact when viewed as a subset of $H^{k-1}$. This means that every bounded sequence in $H^k$ has a subsequence which converges in the $H^{k-1}$ norm. Therefore the bounded sequence $u_n\in H^k$ ($\|u_n\|_k = 1$) has a subsequence, which we denote again by $u_n$ without loss of generality, and which converges in $H^{k-1}$. But (5.37) yields that $|u_n|_k\to 0$, $n\to\infty$, i.e. that $D^\alpha u_n\to 0$ in $L^2$ for $|\alpha| = k$. Since $u_n$ converges in $H^{k-1}$ already, we conclude that $u_n$ converges in $H^k$. Let the limit of $\{u_n\}$ in $H^k$ be denoted by $w$. Since $\|u_n\|_k = 1$, $\|w\|_k = 1$. But since $D^\alpha u_n\to 0$ in $L^2$, $|\alpha| = k$, we conclude that $D^\alpha w = 0$, $|\alpha| = k$, i.e. $w\in P_{k-1}$.

Now, (5.37) also yields $\sum_{|\alpha|<k}\big(\int_\Omega D^\alpha u_n\,dx\big)^2\to 0$, $n\to\infty$, i.e. that
\[
\int_\Omega D^\alpha u_n\,dx\to 0,\quad\text{for }|\alpha|\le k-1.
\]
We conclude, since $u_n\to w$ in $H^k$, that $\int_\Omega D^\alpha w\,dx = 0$ for $|\alpha|\le k-1$. Since the polynomial $w\in P_{k-1}$ satisfies $\int_\Omega D^\alpha w\,dx = 0$ for $|\alpha|\le k-1$, the construction in (i) yields that $w = 0$, a contradiction, since $\|w\|_k = 1$; q.e.d. $\square$

5.3 Implementation of the finite element method with P1 triangles

In this section we shall study the details of the implementation of the standard Galerkin / finite element method for a simple elliptic boundary-value problem on a polygonal plane domain, based on the ideas of MODULEF, cf. Bernadou et al., 1985. We seek $u(x) = u(x_1,x_2)$ defined on $\overline{\Omega}$, where $\Omega$ is a convex, polygonal domain in $\mathbb{R}^2$, and satisfying
\[
-\Delta u + a(x)u = f(x),\quad x\in\Omega,\qquad u = 0,\quad x\in\partial\Omega. \tag{5.38}
\]
Here $f$, $a$ are given, say continuous, functions on $\overline{\Omega}$ with $a\ge 0$. The weak formulation of the problem is, as usual, to seek $u\in H_0^1 = H_0^1(\Omega)$ such that
\[
\int_\Omega\big(\nabla u\cdot\nabla v + a(x)\,u\,v\big)\,dx = \int_\Omega f\,v\,dx\quad\forall v\in H_0^1. \tag{5.39}
\]
Let $S_h$ be a finite-dimensional subspace of $H_0^1$. The standard Galerkin method for the approximation of the solution of (5.39) consists in seeking $u_h\in S_h$ such that
\[
\int_\Omega\big(\nabla u_h\cdot\nabla v_h + a(x)\,u_h\,v_h\big)\,dx = \int_\Omega f\,v_h\,dx\quad\forall v_h\in S_h. \tag{5.40}
\]
We take $S_h$ to be the space of continuous functions on $\overline{\Omega}$ that vanish on $\partial\Omega$ and are polynomials of degree at most 1 on each triangle of a triangulation $\mathcal{T}_h$ of $\overline{\Omega}$. (For notation cf. Section 5.2.) Accordingly, we refer to (5.40) as the standard Galerkin / finite element method with P1 triangles.
(i) Local degrees of freedom.

(Figure: a triangle $\tau$ with local vertices $P_1^\tau$, $P_2^\tau$, $P_3^\tau$.)

On each triangle $\tau\in\mathcal{T}_h$, $u_h$ is represented in terms of the local basis functions $\varphi_j^\tau(x)$, $j = 1,2,3$ (which are $P_1$ polynomials on $\tau$ such that $\varphi_j^\tau(P_i^\tau) = \delta_{ij}$, $1\le i,j\le 3$), as
\[
u_h(x) = \sum_{j=1}^3\varphi_j^\tau(x)\,u_h(P_j^\tau),\quad x\in\tau. \tag{5.41}
\]
The values $\{u_h(P_j^\tau)\}$, $1\le j\le 3$, coefficients of the $\varphi_j^\tau(x)$ in the linear combination of the $\{\varphi_j^\tau(x)\}$ in the r.h.s. of (5.41), are, in our case, the local degrees of freedom that determine $u_h(x)$ uniquely on $\tau$. (In general, there are $N_\tau$ degrees of freedom — not all necessarily function values of $u_h$ — on each triangle $\tau$. In our case $N_\tau = 3$.)

Introducing the $1\times 3$ matrix of local basis functions $\Phi^\tau = \Phi^\tau(x)$ by
\[
\Phi^\tau = [\varphi_1^\tau(x),\ \varphi_2^\tau(x),\ \varphi_3^\tau(x)] \tag{5.42}
\]
and the $3\times 1$ vector
\[
U^\tau = [u_h(P_1^\tau),\ u_h(P_2^\tau),\ u_h(P_3^\tau)]^T \tag{5.43}
\]
of the local degrees of freedom, we may rewrite (5.41) as
\[
u_h(x) = \Phi^\tau(x)\,U^\tau. \tag{5.44}
\]
Let $\nabla u_h = [\frac{\partial u_h}{\partial x_1},\ \frac{\partial u_h}{\partial x_2}]^T$ denote the gradient of $u_h$. Then, (5.41) gives
\[
\frac{\partial u_h}{\partial x_i} = \sum_{j=1}^3\frac{\partial\varphi_j^\tau}{\partial x_i}\,u_h(P_j^\tau),\quad i = 1,2,
\]
i.e. that
\[
\nabla u_h(x) = D\Phi^\tau(x)\,U^\tau,\quad x\in\tau, \tag{5.45}
\]
where $D\Phi^\tau = D\Phi^\tau(x)$, $x\in\tau$, denotes the $2\times 3$ matrix
\[
D\Phi^\tau = \begin{bmatrix}
\dfrac{\partial\varphi_1^\tau}{\partial x_1} & \dfrac{\partial\varphi_2^\tau}{\partial x_1} & \dfrac{\partial\varphi_3^\tau}{\partial x_1}\\[6pt]
\dfrac{\partial\varphi_1^\tau}{\partial x_2} & \dfrac{\partial\varphi_2^\tau}{\partial x_2} & \dfrac{\partial\varphi_3^\tau}{\partial x_2}
\end{bmatrix}. \tag{5.46}
\]
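For P1 elements the matrix $D\Phi^\tau$ of (5.46) is constant on $\tau$ and can be written down in closed form from the vertex coordinates. The following sketch is our own illustration (the function name is not from the text):

```python
def d_phi(tri):
    """The constant 2x3 matrix D Phi^tau of (5.46): column j holds the
    gradient of the j-th local hat function on tri = [P1, P2, P3]."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # = 2 * signed area
    return [[(y2 - y3) / det, (y3 - y1) / det, (y1 - y2) / det],
            [(x3 - x2) / det, (x1 - x3) / det, (x2 - x1) / det]]
```

Since the three hat functions sum to the constant 1 on $\tau$, the columns of $D\Phi^\tau$ sum to the zero vector, which is a cheap sanity check on an implementation.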
The Galerkin equations (5.40) become, since $\overline{\Omega} = \cup_{\tau\in\mathcal{T}_h}\overline{\tau}$,
\[
\sum_{\tau\in\mathcal{T}_h}\int_\tau\big(\nabla u_h\cdot\nabla v_h + a(x)\,u_h\,v_h\big)\,dx = \sum_{\tau\in\mathcal{T}_h}\int_\tau f(x)\,v_h\,dx\quad\forall v_h\in S_h. \tag{5.47}
\]
We shall write (5.47) in matrix-vector form in terms of the local basis functions and the local degrees of freedom $\{U^\tau\}$ and $\{V^\tau\}$ of $u_h$ and $v_h$, respectively. Writing
\[
u_h = \Phi^\tau U^\tau,\quad v_h = \Phi^\tau V^\tau,\quad x\in\tau,
\]
we have
\[
u_h v_h = (\Phi^\tau U^\tau)(\Phi^\tau V^\tau) = (V^\tau)^T(\Phi^\tau)^T\Phi^\tau U^\tau,\quad x\in\tau,
\]
\[
\nabla u_h\cdot\nabla v_h = (D\Phi^\tau U^\tau)\cdot(D\Phi^\tau V^\tau) = (D\Phi^\tau V^\tau)^T(D\Phi^\tau U^\tau) = (V^\tau)^T(D\Phi^\tau)^T D\Phi^\tau U^\tau,\quad x\in\tau,
\]
\[
f\,v_h = f\,\Phi^\tau V^\tau = (V^\tau)^T(\Phi^\tau)^T f,\quad x\in\tau.
\]
Using these expressions in (5.47) we have
\[
\sum_{\tau\in\mathcal{T}_h}\int_\tau\big\{(V^\tau)^T(D\Phi^\tau)^T D\Phi^\tau U^\tau + a(x)(V^\tau)^T(\Phi^\tau)^T\Phi^\tau U^\tau\big\}\,dx
= \sum_{\tau\in\mathcal{T}_h}\int_\tau(V^\tau)^T(\Phi^\tau)^T f(x)\,dx\quad\forall v_h\in S_h. \tag{5.48}
\]
Let $K^\tau$, $M^\tau$ denote, respectively, the $3\times 3$ local stiffness and mass matrices. These are given by the formulas
\[
K^\tau := \int_\tau(D\Phi^\tau)^T D\Phi^\tau\,dx, \tag{5.49}
\]
\[
M^\tau := \int_\tau a(x)\,(\Phi^\tau)^T\Phi^\tau\,dx. \tag{5.50}
\]
Let also $A^\tau := K^\tau + M^\tau$, and $b^\tau$ be the $3\times 1$ vector
\[
b^\tau := \int_\tau f(x)\,(\Phi^\tau)^T\,dx. \tag{5.51}
\]
Using these local quantities in (5.48) yields the desired matrix-vector form of (5.47):
\[
\sum_{\tau\in\mathcal{T}_h}(V^\tau)^T A^\tau U^\tau = \sum_{\tau\in\mathcal{T}_h}(V^\tau)^T b^\tau\quad\forall v_h\in S_h. \tag{5.52}
\]
(ii) The global-to-local degrees of freedom map.

(Figure: a triangle $\tau$ whose local vertices $P_1^\tau$, $P_2^\tau$, $P_3^\tau$ correspond to the global nodes $P_{i_1}$, $P_{i_2}$, $P_{i_3}$.)

Suppose that the triangulation $\mathcal{T}_h$ consists of $N$ nodes (vertices) (including the nodes on the boundary $\partial\Omega$), denoted by $P_i$, $i = 1,2,\ldots,N$, in some global indexing scheme. Then the vertices $P_1^\tau$, $P_2^\tau$, $P_3^\tau$ of a given triangle $\tau\in\mathcal{T}_h$ correspond to the points $P_{i_1}$, $P_{i_2}$, $P_{i_3}$, respectively, in the global enumeration. We would like to find an efficient way of expressing the correspondence $P_{i_k}\leftrightarrow P_k^\tau$. Specifically, we are really interested in expressing the local degrees of freedom, i.e. in our case the values $u_h(P_k^\tau)$, $k = 1,2,3$, in terms of the global degrees of freedom, i.e. the values $u_h(P_i)$, $i = 1,2,\ldots,N$.

Let us consider an example: let $\Omega$ be the rectangle $(0,\pi)\times(0,\pi)$. Subdivide it into 20 rectangles of size $\Delta x_1\times\Delta x_2$, where $\Delta x_1 = \pi/5$, $\Delta x_2 = \pi/4$, and then into 40 equal triangles, by bisecting the rectangles, as shown.

(Figure: the rectangle $(0,\pi)\times(0,\pi)$ subdivided into 40 triangles, numbered 1 to 40; the interior nodes are $P_1,\ldots,P_{12}$ and the boundary nodes $P_{13},\ldots,P_{30}$; the triangle $\tau = 25$ has vertices $P_6$, $P_7$, $P_{10}$, which are its local nodes $P_1^\tau$, $P_2^\tau$, $P_3^\tau$.)

We number the triangles from 1 to 40 as shown and introduce the following global indexing scheme for the nodes: the interior nodes are the points $P_i$ shown, with $i = 1,2,\ldots,12$, and the boundary nodes are the points $P_i$, $i = 13,\ldots,30$. Here, therefore, $N = 30$. For example, the triangle with $\tau = 25$ is defined by the nodes $P_6$, $P_7$ and $P_{10}$ in the global numbering. These coincide with the local nodes $P_1^\tau$, $P_2^\tau$, $P_3^\tau$ ($\tau = 25$), respectively. For $\tau = 25$, let the local degrees of freedom be represented by the $3\times 1$ vector $U^\tau = [U_1^\tau, U_2^\tau, U_3^\tau]^T$ ($U_j^\tau = u_h(P_j^\tau)$, $j = 1,2,3$). The global degrees of freedom $u_h(P_i)$, $i = 1,2,\ldots,N = 30$, are arranged in the $30\times 1$ vector $U = [U_1,\ldots,U_{30}]^T$. Clearly, we have $U^\tau = G^\tau U$, where $G^\tau$ is a $3\times 30$ matrix whose elements are 0 or 1 (a Boolean matrix). In our case, i.e. for $\tau = 25$, we have
\[
\begin{bmatrix}U_1^\tau\\ U_2^\tau\\ U_3^\tau\end{bmatrix}
= \begin{bmatrix}
0&\cdots&0&1&0&0&0&0&0&\cdots&0\\
0&\cdots&0&0&1&0&0&0&0&\cdots&0\\
0&\cdots&0&0&0&0&1&0&0&\cdots&0
\end{bmatrix}
\begin{bmatrix}U_1\\ U_2\\ \vdots\\ U_{30}\end{bmatrix},
\]
with the three 1's located in the 6th, 7th and 10th columns, respectively.
In general, let $\mathcal{T}_h$ be the triangulation consisting of triangles $\{\tau\}$ that we label, abusing notation a bit, as $\tau = 1,2,\ldots,J$. For each $\tau$, let
\[
i = g(\tau, j)
\]
be the map that associates the local index $j$, $1\le j\le 3$, of the local vertices $P_j^\tau$ to the global index $i$, $1\le i\le N$, of the corresponding points $P_i$. Then $g$ is a function defined on the set $\{1,2,\ldots,J\}\times\{1,2,3\}$ with values onto the set $\{1,2,\ldots,N\}$ that can be easily stored. In our example, the values of $i$ for $\tau = 25$ and $j = 1,2,3$ are
\[
g(25,1) = 6,\qquad g(25,2) = 7,\qquad g(25,3) = 10.
\]
Let $G^\tau$, for each $\tau$, denote the $3\times N$ matrix whose elements are given by
\[
G_{kl}^\tau = \delta_{g(\tau,k),l},\quad 1\le k\le 3,\ 1\le l\le N, \tag{5.53}
\]
where $\delta_{i,j}$ is the Kronecker delta, i.e. $\delta_{i,j} = 1$ if $i = j$, $\delta_{i,j} = 0$ if $i\neq j$. Then, the relation between the global degrees of freedom vector $U$, $U_i = u_h(P_i)$, $1\le i\le N$, and the local degrees of freedom vector $U^\tau$ on $\tau$, $U_j^\tau = u_h(P_j^\tau)$, $j = 1,2,3$, is expressed as
\[
U^\tau = G^\tau U. \tag{5.54}
\]
Substituting (5.54), and the corresponding expression $V^\tau = G^\tau V$, into (5.52), we have
\[
\sum_{\tau\in\mathcal{T}_h}V^T(G^\tau)^T A^\tau G^\tau U = \sum_{\tau\in\mathcal{T}_h}V^T(G^\tau)^T b^\tau\quad\forall v_h\in S_h,
\]
or, since $V$, $U$ do not depend on $\tau$,
\[
V^T\Big\{\sum_{\tau\in\mathcal{T}_h}(G^\tau)^T A^\tau G^\tau\Big\}U = V^T\sum_{\tau\in\mathcal{T}_h}(G^\tau)^T b^\tau\quad\forall v_h\in S_h.
\]
Hence, (5.52) in terms of global degrees of freedom may be written as
\[
V^T A U = V^T b,\quad\forall V\in\mathbb{R}^N\text{ such that }v_h\in S_h, \tag{5.55}
\]
where $A$ is the $N\times N$ matrix defined by
\[
A := \sum_{\tau\in\mathcal{T}_h}(G^\tau)^T A^\tau G^\tau, \tag{5.56}
\]
and $b$ the $N\times 1$ vector given by
\[
b := \sum_{\tau\in\mathcal{T}_h}(G^\tau)^T b^\tau. \tag{5.57}
\]
(iii) Assembly of $A$ and $b$.

The matrix $A$ and the vector $b$ defined by (5.56) and (5.57) should be assembled from their local contributions $A^\tau$ and $b^\tau$. In doing this we should not form $G^\tau$ and perform the indicated matrix-matrix and matrix-vector operations, since this would be very costly in terms of storage and number of operations. Instead, recalling the definition (5.53) of $G^\tau$, we have, for the vector $b = \{b_i\}$, $1\le i\le N$:
\[
b_i = \sum_{\tau\in\mathcal{T}_h}\big((G^\tau)^T b^\tau\big)_i = \sum_{\tau\in\mathcal{T}_h}\Big(\sum_{j=1}^3 G_{ji}^\tau\,b_j^\tau\Big) = \sum_{\tau\in\mathcal{T}_h}\Big(\sum_{j=1}^3\delta_{g(\tau,j),i}\,b_j^\tau\Big).
\]
We conclude that the following algorithm computes the $b_i$:

    For i = 1, 2, ..., N do:
        b_i = 0
    For i = 1, 2, ..., N do:
        For tau in T_h do:
            For j = 1, 2, 3 do:
                b_i = b_i + delta_{g(tau,j),i} * b_j^tau.

Since $\delta_{g(\tau,j),i} = 0$ unless $i = g(\tau,j)$, the last term contributes only to the $b_i$ with $i = g(\tau,j)$. Hence, the above algorithm can be written, more compactly, in the form:

    For i = 1, 2, ..., N do:
        b_i = 0
    For tau in T_h do:                                   (5.58)
        For j = 1, 2, 3 do:
            b_{g(tau,j)} = b_{g(tau,j)} + b_j^tau,
i.e. $b_j^\tau$ is added to the previous value of that $b_i$ that has $i = g(\tau,j)$.

Similarly, by (5.56) we have
\[
A_{ij} = \sum_{\tau\in\mathcal{T}_h}\Big(\sum_{k,l=1}^3 G_{ki}^\tau\,A_{kl}^\tau\,G_{lj}^\tau\Big) = \sum_{\tau\in\mathcal{T}_h}\Big(\sum_{k,l=1}^3\delta_{g(\tau,k),i}\,A_{kl}^\tau\,\delta_{g(\tau,l),j}\Big),
\]
i.e. the term $A_{kl}^\tau$ contributes to the element $A_{ij}$ if $i = g(\tau,k)$ and $j = g(\tau,l)$. Consequently, the following algorithm may be used to assemble the matrix $A$:

    For i = 1, 2, ..., N do:
        For j = 1, 2, ..., N do:
            A_ij = 0
    For tau in T_h do:                                   (5.59)
        For k = 1, 2, 3 do:
            For l = 1, 2, 3 do:
                A_{g(tau,k),g(tau,l)} = A_{g(tau,k),g(tau,l)} + A_kl^tau.

The algorithms (5.58) and (5.59) implement the assembly of the (global) matrix $A$ and vector $b$ in the equations (5.55) from their local parts $A^\tau$ and $b^\tau$.
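The algorithms (5.58) and (5.59) translate directly into code. The sketch below is our own (0-based indices and dense storage for clarity; a real implementation would store $A$ in a sparse format, since most of its entries are zero):

```python
def assemble(N, conn, local_A, local_b):
    """Assemble the global N x N matrix A and N-vector b from the local
    3x3 matrices local_A[t] and 3-vectors local_b[t]; conn[t] lists the
    global (0-based) node indices g(t, 1..3) of triangle t."""
    A = [[0.0] * N for _ in range(N)]
    b = [0.0] * N
    for t, nodes in enumerate(conn):
        for k in range(3):
            b[nodes[k]] += local_b[t][k]                   # algorithm (5.58)
            for l in range(3):
                A[nodes[k]][nodes[l]] += local_A[t][k][l]  # algorithm (5.59)
    return A, b
```

Note that entries of $A$ and $b$ associated with nodes shared by several triangles accumulate one contribution per adjacent triangle, exactly as the sums (5.56)-(5.57) prescribe.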
(iv) Reduction of (5.55) to a linear system of equations.

The components of the global vector of degrees of freedom $U\in\mathbb{R}^N$ may be ordered (although this is not always necessary) so that $U$ is of the form
\[
U = [U_1, U_2,\ldots,U_{N-N_0}, U_{N-N_0+1},\ldots,U_N]^T,
\]
so that the degrees of freedom $U_1, U_2,\ldots,U_{N-N_0}$ are the values of $u_h$ at the interior nodes $P_1, P_2,\ldots,P_{N-N_0}$ and $U_{N-N_0+1},\ldots,U_N$ are the values of $u_h$ at the $N_0$ boundary nodes $P_{N-N_0+1},\ldots,P_N$. (In the example above, $N_0 = 18$, $N = 30$.)

Then, writing
\[
U = \begin{bmatrix}U_I\\ U_{II}\end{bmatrix},\quad\text{with } U_I\in\mathbb{R}^{N-N_0},\ U_{II}\in\mathbb{R}^{N_0},
\]
and partitioning conformably $V$, $A$ and $b$ in (5.55), we can write it in the form
\[
[V_I^T\ V_{II}^T]\begin{bmatrix}A_{I,I} & A_{I,II}\\ A_{II,I} & A_{II,II}\end{bmatrix}\begin{bmatrix}U_I\\ U_{II}\end{bmatrix}
= [V_I^T\ V_{II}^T]\begin{bmatrix}b_I\\ b_{II}\end{bmatrix},\quad\forall v_h\in S_h. \tag{5.60}
\]
Since $S_h$ is such that $v_h\in S_h\Leftrightarrow V_I\in\mathbb{R}^{N-N_0}$, $V_{II} = 0$ in $\mathbb{R}^{N_0}$, and since $u_h\in S_h$ as well, we rewrite the above as
\[
V_I^T A_{I,I}U_I = V_I^T b_I\quad\forall V_I\in\mathbb{R}^{N-N_0},
\]
which is of course equivalent to the $(N-N_0)\times(N-N_0)$ system
\[
A_{I,I}U_I = b_I. \tag{5.61}
\]
The reduction of the $N\times N$ system (5.60) to the system (5.61) is usually referred to as "taking into account the boundary conditions" of the problem.
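In code, the passage from (5.60) to (5.61) amounts to selecting the rows and columns of $A$ (and the entries of $b$) associated with the interior nodes. A minimal sketch of our own (the interior index list is assumed supplied by the mesh data structure):

```python
def reduce_system(A, b, interior):
    """Extract the interior block A_{I,I} and vector b_I of (5.61);
    'interior' lists the (0-based) indices of the interior nodes."""
    AI = [[A[i][j] for j in interior] for i in interior]
    bI = [b[i] for i in interior]
    return AI, bI
```

The reduced symmetric positive definite system $A_{I,I}U_I = b_I$ can then be handed to any linear solver.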
(v) Computing $K^\tau$, $M^\tau$ and $b^\tau$.

There remains one important issue of implementation, namely the computation of the local stiffness and mass matrices $K^\tau$ and $M^\tau$, as well as the computation of the (local) $b^\tau$, cf. (5.49)-(5.51). This can be accomplished efficiently by letting $\tau$ be the affine image of a fixed reference triangle $\hat\tau$ and transforming the integrals in the formulas (5.49)-(5.51) to $\hat\tau$. To this end we will use the notation introduced in Section 5.2.

(Figure: the reference triangle $\hat\tau$ with vertices $\hat P_1 = (0,0)$, $\hat P_2 = (1,0)$, $\hat P_3 = (0,1)$, mapped by $F^\tau$ onto the triangle $\tau$ with vertices $P_j^\tau = (x_1^j, x_2^j)$, $j = 1,2,3$.)

Let $\hat\tau = \hat P_1\hat P_2\hat P_3$ be the unit right triangle with vertices (0,0), (1,0), (0,1). Let $\tau\in\mathcal{T}_h$ be an arbitrary triangle in the triangulation with vertices $P_j^\tau$, $1\le j\le 3$, where $P_j^\tau = (x_1^j, x_2^j)$, $j = 1,2,3$. Consider the affine map $F^\tau$ that maps $\hat\tau$ onto $\tau$ and is defined by the requirements that $P_j^\tau = F^\tau(\hat P_j)$, $j = 1,2,3$. Then $F^\tau$ is of the form
\[
x = F^\tau(\hat x) = B^\tau\hat x + c^\tau, \tag{5.62}
\]
where $B^\tau$ is the $2\times 2$ invertible matrix
\[
B^\tau = \begin{bmatrix}x_1^2 - x_1^1 & x_1^3 - x_1^1\\ x_2^2 - x_2^1 & x_2^3 - x_2^1\end{bmatrix}
\]
and $c^\tau = (x_1^1, x_2^1)^T$. Notice that $|\det B^\tau| = 2|\tau| := 2\,\operatorname{area}(\tau)$.

Recall that functions on $\tau$ are transformed to the corresponding functions on $\hat\tau$ by $\hat v(\hat x) = v(x)$, whenever $x = F^\tau(\hat x)$, i.e. $\hat v(\hat x) = v(F^\tau(\hat x))$ and $v(x) = \hat v((F^\tau)^{-1}(x))$. (Under our assumption on $B^\tau$, the map $F^\tau$ is invertible, with $\hat x = (F^\tau)^{-1}(x) = (B^\tau)^{-1}x - (B^\tau)^{-1}c^\tau$.)

(In our example above, the triangle $\tau = 25$ has
\[
B^\tau = \begin{bmatrix}\Delta x_1 & 0\\ 0 & \Delta x_2\end{bmatrix},\quad c^\tau = \begin{bmatrix}2\Delta x_1\\ 2\Delta x_2\end{bmatrix},\quad\text{where }\Delta x_1 = \frac{\pi}{5},\ \Delta x_2 = \frac{\pi}{4}.)
\]

Let $\hat\varphi_i(\hat x)$, $1\le i\le 3$, be the local basis functions on the triangle $\hat\tau$, i.e. let $\hat\varphi_i\in P_1$ be defined by the relations $\hat\varphi_i(\hat P_j) = \delta_{ij}$. Then, we easily see that
\[
\hat\varphi_1(\hat x_1,\hat x_2) = 1 - \hat x_1 - \hat x_2,\qquad
\hat\varphi_2(\hat x_1,\hat x_2) = \hat x_1,\qquad
\hat\varphi_3(\hat x_1,\hat x_2) = \hat x_2.
\]
It is easy to see that the corresponding local basis functions on $\tau$, i.e. the elements of $P_1$ that satisfy $\varphi_i^\tau(P_j^\tau) = \delta_{ij}$, $1\le i,j\le 3$, are given by the relations
\[
\varphi_i^\tau(x) = \hat\varphi_i(\hat x),\quad\text{whenever } x = F^\tau(\hat x).
\]
Defining the $1\times 3$ matrix of reference basis functions $\widehat{\Phi}$ by
\[
\widehat{\Phi} = \widehat{\Phi}(\hat x) = [\hat\varphi_1(\hat x),\ \hat\varphi_2(\hat x),\ \hat\varphi_3(\hat x)], \tag{5.63}
\]
we see that $\Phi^\tau(x) = \widehat{\Phi}(\hat x)$, whenever $x = F^\tau(\hat x)$, where $\Phi^\tau(x)$ is the matrix of the local basis functions on $\tau$ defined by (5.42).
We now compute the quantities $b^\tau$, $M^\tau$ and $K^\tau$ in terms of integrals of functions defined on $\hat\tau$. We have already seen that the area element transforms so that $dx = |\det B^\tau|\,d\hat x$, i.e. $dx = 2|\tau|\,d\hat x$. Then we have
\[
b^\tau = \int_\tau f(x)\,(\Phi^\tau(x))^T dx = 2|\tau|\int_{\hat\tau}\hat f(\hat x)\,(\widehat{\Phi}(\hat x))^T d\hat x. \tag{5.64}
\]
Hence, to compute $b^\tau$ we must evaluate the integral on $\hat\tau$ of the vector-valued function $f(F^\tau(\hat x))(\widehat{\Phi}(\hat x))^T$, where $\widehat{\Phi}$ is defined in (5.63). Unless $\hat f(\hat x)$ is a very simple function, such integrals are evaluated numerically by an integration rule on $\hat\tau$. A simple but effective rule is the barycenter rule, which is exact for $P_1$ polynomials and states that
\[
\int_{\hat\tau}\hat v(\hat x)\,d\hat x \approx |\hat\tau|\,\hat v(\widehat{M}),
\]
where $|\hat\tau| = \operatorname{area}(\hat\tau) = 1/2$ and $\widehat{M} = (1/3, 1/3)$ is the barycenter of $\hat\tau$. Hence, using
\[
\int_{\hat\tau}\hat v(\hat x)\,d\hat x \approx \frac{1}{2}\,\hat v(1/3, 1/3)
\]
in (5.64), we see that
\[
b^\tau \approx |\tau|\,\hat f(1/3,1/3)\begin{bmatrix}1/3\\ 1/3\\ 1/3\end{bmatrix},\quad\text{i.e.}\quad
b_i^\tau = \frac{|\tau|}{3}\,f(F^\tau(1/3,1/3)),\quad 1\le i\le 3.
\]

Similarly, we may easily compute the elements of $M^\tau$. Since
\[
M^\tau = \int_\tau a(x)\,(\Phi^\tau(x))^T\Phi^\tau(x)\,dx = 2|\tau|\int_{\hat\tau}\hat a(\hat x)\,(\widehat{\Phi}(\hat x))^T\widehat{\Phi}(\hat x)\,d\hat x,
\]
we have, for $1\le i,j\le 3$,
\[
M_{ij}^\tau = 2|\tau|\int_{\hat\tau}a(F^\tau(\hat x))\,\hat\varphi_i(\hat x)\,\hat\varphi_j(\hat x)\,d\hat x.
\]
(If numerical integration with the barycenter rule is used, we have that
\[
M_{ij}^\tau \approx \frac{|\tau|}{9}\,a(F^\tau(1/3,1/3)),\quad 1\le i,j\le 3.)
\]
The computation of
\[
K^\tau = \int_\tau[D\Phi^\tau(x)]^T[D\Phi^\tau(x)]\,dx
\]
requires transforming the matrix $D\Phi^\tau$. We have, for $1\le i\le 2$, $1\le j\le 3$,
\[
(D\Phi^\tau(x))_{ij} = \frac{\partial\varphi_j^\tau(x)}{\partial x_i} = \frac{\partial}{\partial x_i}\big(\hat\varphi_j(\hat x)\big) = \sum_{k=1}^2\frac{\partial\hat\varphi_j(\hat x)}{\partial\hat x_k}\,\frac{\partial\hat x_k}{\partial x_i}.
\]
In analogy to (5.46), let $\widehat{D}\widehat{\Phi}(\hat x)$ be the $2\times 3$ matrix with elements
\[
(\widehat{D}\widehat{\Phi}(\hat x))_{ij} = \frac{\partial\hat\varphi_j(\hat x)}{\partial\hat x_i}. \tag{5.65}
\]
In addition, note that from $\hat x = (B^\tau)^{-1}x - (B^\tau)^{-1}c^\tau$ we infer that
\[
\frac{\partial\hat x_k}{\partial x_i} = \big((B^\tau)^{-1}\big)_{ki}.
\]
Hence,
\[
(D\Phi^\tau(x))_{ij} = \sum_{k=1}^2\big((B^\tau)^{-1}\big)_{ki}\,(\widehat{D}\widehat{\Phi}(\hat x))_{kj},
\]
i.e.
\[
D\Phi^\tau(x) = \big((B^\tau)^{-1}\big)^T\,\widehat{D}\widehat{\Phi}(\hat x). \tag{5.66}
\]
We conclude that
\[
K^\tau = 2|\tau|\int_{\hat\tau}(\widehat{D}\widehat{\Phi}(\hat x))^T(B^\tau)^{-1}\big((B^\tau)^{-1}\big)^T\,\widehat{D}\widehat{\Phi}(\hat x)\,d\hat x.
\]
The quantities inside the integral are independent of $\hat x$. Indeed,
\[
\widehat{D}\widehat{\Phi}(\hat x) := \widehat{J} = \begin{bmatrix}-1 & 1 & 0\\ -1 & 0 & 1\end{bmatrix},
\]
and therefore
\[
K^\tau = |\tau|\,\widehat{J}^T\big((B^\tau)^T B^\tau\big)^{-1}\widehat{J}.
\]
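The closed-form expression $K^\tau = |\tau|\,\widehat{J}^T((B^\tau)^TB^\tau)^{-1}\widehat{J}$ is easy to code directly. A sketch of our own, in pure Python with explicit $2\times 2$ arithmetic:

```python
def local_stiffness(tri):
    """K^tau = |tau| * Jhat^T ((B^tau)^T B^tau)^{-1} Jhat for a P1 triangle
    tri = [P1, P2, P3]."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    B = [[x2 - x1, x3 - x1], [y2 - y1, y3 - y1]]
    area = 0.5 * abs(B[0][0] * B[1][1] - B[0][1] * B[1][0])
    # C = (B^T B)^{-1} via the 2x2 cofactor formula
    p = B[0][0] ** 2 + B[1][0] ** 2              # (B^T B)_{11}
    q = B[0][0] * B[0][1] + B[1][0] * B[1][1]    # (B^T B)_{12} = (B^T B)_{21}
    r = B[0][1] ** 2 + B[1][1] ** 2              # (B^T B)_{22}
    d = p * r - q * q
    C = [[r / d, -q / d], [-q / d, p / d]]
    J = [[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]     # Jhat = Dhat Phihat
    return [[area * sum(J[a][i] * C[a][b] * J[b][j]
                        for a in range(2) for b in range(2))
             for j in range(3)] for i in range(3)]
```

Since the hat functions sum to 1 on $\tau$, every row of $K^\tau$ sums to zero; this follows from $\widehat{J}\,(1,1,1)^T = 0$ and is another convenient check on an implementation.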
Chapter 6

The Galerkin Finite Element Method for the Heat Equation

6.1 Introduction. Elliptic projection


In this chapter we shall construct and analyze Galerkin finite element methods for the following model parabolic initial-boundary-value problem. Let $\Omega$ be a bounded domain in $\mathbb{R}^d$, $d = 1, 2$, or 3. We seek a real function $u = u(x,t)$, $x\in\overline{\Omega}$, $t\ge 0$, such that
\[
\begin{aligned}
&u_t - \Delta u = f, && x\in\Omega,\ t\ge 0,\\
&u = 0, && x\in\partial\Omega,\ t\ge 0,\\
&u(x,0) = u^0(x), && x\in\overline{\Omega}.
\end{aligned} \tag{6.1}
\]
Here $f = f(x,t)$ and $u^0$ are given real functions on $\overline{\Omega}\times[0,\infty)$ and $\overline{\Omega}$, respectively. We shall assume that the initial-boundary-value problem (ibvp) (6.1) has a unique solution which is sufficiently smooth for the purposes of the analysis of its numerical approximation. For the theory of existence-uniqueness and regularity of problems like (6.1) see [2.2]-[2.5]. In this chapter we will just introduce some basic issues of approximating ibvps like (6.1) with Galerkin methods. The reader is referred to [3.6] and [3.7] for many other related topics.

The spatial approximation of functions defined in $\Omega$ will be effected by a Galerkin finite element method. For this purpose we suppose that for $h > 0$ we have a finite-dimensional subspace $S_h$ of $H_0^1 = H_0^1(\Omega)$ such that, for integer $r\ge 2$ and $h$ sufficiently small, there holds
\[
\inf_{\chi\in S_h}\{\|v - \chi\| + h\,\|\nabla(v - \chi)\|\} \le C\,h^s\,\|v\|_s\quad\text{for }v\in H^s\cap H_0^1,\ 2\le s\le r, \tag{6.2}
\]
where $C$ is a positive constant independent of $h$ and $v$. (In (6.2) we have used the notation
\[
\|\nabla v\| \equiv |v|_1 = \Big(\sum_{i=1}^d\Big\|\frac{\partial v}{\partial x_i}\Big\|^2\Big)^{1/2} = \Big(\int_\Omega\nabla v\cdot\nabla v\,dx\Big)^{1/2} = (\nabla v,\nabla v)^{1/2}.)
\]
For example, (6.2) holds with $r = 2$ in $\mathbb{R}^2$ when $S_h = \{\chi\in C(\overline{\Omega}):\ \chi|_\tau\in P_1\ \forall\tau\in\mathcal{T}_h,\ \chi = 0\ \text{on }\overline{\Omega}\setminus\Omega_h\}$, where $\Omega_h$ is a polygonal domain included in $\Omega$ and $\mathcal{T}_h$ a regular triangulation of $\Omega_h$ with triangles $\tau$, with $h = \max_\tau(\operatorname{diam}\tau)$, whose vertices on $\partial\Omega_h$ lie on $\partial\Omega$, cf. [3.5].
Given $v\in H^1$, we define $R_h v$, the elliptic projection of $v$ in $S_h$, by the linear mapping $R_h: H^1\to S_h$, such that
\[
(\nabla R_h v,\nabla\chi) = (\nabla v,\nabla\chi)\quad\forall\chi\in S_h. \tag{6.3}
\]
Given $v\in H^1$, it is easy to see that $R_h v$ exists uniquely in $S_h$ and satisfies $\|\nabla R_h v\|\le\|\nabla v\|$. The following error estimates follow from (6.2) and (6.3). (We have essentially seen their proof in Section 5.1 but we repeat it here for the convenience of the reader.)

Proposition 6.1. Suppose that $v\in H^s\cap H_0^1$, where $2\le s\le r$. Then, there exists a constant $C$ independent of $v$ and $h$ such that
\[
\|R_h v - v\| + h\,\|\nabla(R_h v - v)\| \le C\,h^s\,\|v\|_s,\quad 2\le s\le r. \tag{6.4}
\]

Proof. We have, by (6.3)

(Rh v v)2 = ((Rh v v), Rh v v)

= ((Rh v v), v) = ((Rh v v), v)

for any Sh . Hence, by (6.2)

(Rh v v)2 5 (Rh v v) (v ) C(Rh v v) hs1 vs ,

from which the H 1 estimate in (6.4) follows. For the L2 estimate we use Nitsches trick.
Given g L2 = L2 (), consider the bvp


= g in

= 0 on .

140
0
This problem has a unique solution H 2 H 1 such that 2 C = C g
0
(Elliptic regularity). For v H s H 1 , 2 s r, we have by Gausss theorem and
(6.3)

(Rh v v, g) = (Rh v v, ) = ((Rh v v), ) = ((Rh v v), ( ))

for any Sh . Hence, the H 1 estimate in (6.4), (6.2), and elliptic regularity imply
that
(Rh v v, g) 5 Chs1 vs h 2 Chs vs g .

Taking g = Rh v v gives that the L2 estimate in (6.4). 
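The estimate (6.4) is easy to check numerically in one space dimension. The following Python sketch (ours, not part of the text; function names are our own) computes the elliptic projection of $v(x)=\sin\pi x$ onto continuous piecewise linears on a uniform mesh of $(0,1)$ by solving the linear system arising from (6.3), and verifies the $O(h^2)$ rate of (6.4) in $L^2$ for $r = s = 2$.

```python
import numpy as np

def elliptic_projection_p1(v, N):
    """Elliptic projection of v onto P1 elements on a uniform mesh of (0,1)
    with zero boundary values: solve S c = b, where b_i = (v', phi_i')."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)          # nodes, boundary included
    # stiffness matrix of the hat basis: (1/h) tridiag(-1, 2, -1)
    S = (np.diag(np.full(N, 2.0)) - np.diag(np.ones(N - 1), 1)
         - np.diag(np.ones(N - 1), -1)) / h
    vx = v(x)
    # b_i = int v' phi_i' dx = (2 v(x_i) - v(x_{i-1}) - v(x_{i+1})) / h, exactly
    b = (2 * vx[1:-1] - vx[:-2] - vx[2:]) / h
    c = np.linalg.solve(S, b)                 # nodal coefficients of R_h v
    return x, np.concatenate(([0.0], c, [0.0]))

def l2_error(v, x, coeffs, M=4000):
    """L2 norm of v - R_h v by a fine composite trapezoid rule."""
    xf = np.linspace(0.0, 1.0, M + 1)
    d = v(xf) - np.interp(xf, x, coeffs)      # R_h v is piecewise linear
    w = np.full(M + 1, 1.0 / M)
    w[0] = w[-1] = 0.5 / M                    # trapezoid weights
    return np.sqrt(np.sum(w * d ** 2))

v = lambda x: np.sin(np.pi * x)
errs = [l2_error(v, *elliptic_projection_p1(v, N)) for N in (7, 15, 31)]
rates = [np.log2(errs[i] / errs[i + 1]) for i in range(2)]
# the observed rates should be close to 2, in accordance with (6.4)
```

(It is a classical 1D fact that this $R_h v$ coincides with the nodal interpolant of $v$; the point of solving $Sc = b$ is to exercise the definition (6.3) directly.)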

6.2 Standard Galerkin semidiscretization


(In this and in the following two sections we generally follow Thomée [3.6, Ch. 1].)

Multiplying the pde in (6.1) by a function $v\in\mathring H^1$ and integrating over $\Omega$ using Gauss's theorem, we see that

$$ (u_t,v) + (\nabla u,\nabla v) = (f,v), \quad t\ge 0. \qquad (6.5) $$

Motivated by (6.5), for each $t\ge 0$ we approximate $u(t) = u(\cdot,t)$ by a function $u_h(t) = u_h(\cdot,t)$ in $S_h$, called the (standard Galerkin) semidiscrete approximation (or spatial discretization) of $u$ in $S_h$, defined by the equations

$$ (u_{ht},\chi) + (\nabla u_h,\nabla\chi) = (f,\chi) \quad \forall\chi\in S_h,\ t\ge 0, \qquad u_h(0) = u_h^0, \qquad (6.6) $$

where $u_h^0$ is an approximation of $u^0$ in $S_h$ to be specified later.

The equations (6.6), which will be called the (standard Galerkin) semidiscretization of (6.1) in $S_h$, are equivalent to a linear system of ordinary differential equations (ode's). To see this, let $\{\phi_j\}_{j=1}^{N_h}$ be a basis of $S_h$, where $N_h = \dim S_h$, and let

$$ u_h(x,t) = \sum_{j=1}^{N_h}\alpha_j(t)\phi_j(x) $$

be the unknown semidiscrete approximation of $u$. Substituting this expression for $u_h$ in (6.6) and taking $\chi = \phi_k$, $k = 1,\dots,N_h$, we see that

$$ \sum_{j=1}^{N_h}\bigl[\alpha_j'(t)(\phi_j,\phi_k) + \alpha_j(t)(\nabla\phi_j,\nabla\phi_k)\bigr] = (f,\phi_k), \quad 1\le k\le N_h,\ t\ge 0, \qquad \alpha_j(0) = \alpha_j^0, \quad 1\le j\le N_h, $$

where $u_h^0 = \sum_{j=1}^{N_h}\alpha_j^0\phi_j$. Hence the vector of unknowns $\alpha = \alpha(t) = [\alpha_1,\dots,\alpha_{N_h}]^T$ satisfies the initial-value problem

$$ G\alpha' + S\alpha = F(t),\ t\ge 0, \qquad \alpha(0) = \alpha^0, \qquad (6.7) $$

where $G = (G_{ij})$ is the $N_h\times N_h$ mass (Gram) matrix defined by $G_{ij} = (\phi_j,\phi_i)$, $S = (S_{ij})$ the $N_h\times N_h$ stiffness matrix given by $S_{ij} = (\nabla\phi_j,\nabla\phi_i)$, and $F_i = (f,\phi_i)$. As we know, $G$ and $S$ are real, symmetric, positive definite matrices. In particular $G$ is invertible and the ivp (6.7) has a unique solution $\alpha(t)$ for all $t\ge 0$. We conclude that the Galerkin semidiscrete approximation $u_h$ exists uniquely for all $t\ge 0$.
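As a concrete illustration (ours, not from the text), for continuous piecewise linear functions on a uniform mesh of $(0,1)$ the matrices in (6.7) can be written down explicitly: with the hat-function basis one finds $G = \frac h6\,\mathrm{tridiag}(1,4,1)$ and $S = \frac1h\,\mathrm{tridiag}(-1,2,-1)$. A Python sketch:

```python
import numpy as np

def mass_stiffness_p1(N):
    """Mass matrix G and stiffness matrix S of the hat-function basis on a
    uniform mesh of (0,1) with N interior nodes, h = 1/(N+1):
    G = (h/6) tridiag(1, 4, 1),  S = (1/h) tridiag(-1, 2, -1)."""
    h = 1.0 / (N + 1)
    G = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    S = 1 / h * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    return G, S, h

G, S, h = mass_stiffness_p1(9)
# both matrices are symmetric and positive definite, as used for (6.7)
```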
For each $t > 0$ choose $\chi = u_h$ in (6.6). Then

$$ (u_{ht},u_h) + \|\nabla u_h\|^2 = (f,u_h). $$

Since $(u_{ht},u_h) = \frac12\int_\Omega\partial_t\bigl(u_h^2(\cdot,t)\bigr)dx = \frac12\frac d{dt}\|u_h(t)\|^2$, we have

$$ \frac12\frac d{dt}\|u_h\|^2 + \|\nabla u_h\|^2 = (f,u_h) \le \|f\|\,\|u_h\|, \quad t\ge 0. $$

Recall the Poincaré-Friedrichs inequality, i.e. that

$$ \|v\| \le C_P\|\nabla v\|, \quad v\in\mathring H^1(\Omega), \qquad (6.8) $$

valid for some constant $C_P = C_P(\Omega)$. Using (6.8) in the above gives

$$ \frac12\frac d{dt}\|u_h\|^2 + \|\nabla u_h\|^2 \le C_P\|f\|\,\|\nabla u_h\| \le \frac{C_P^2}2\|f\|^2 + \frac12\|\nabla u_h\|^2, $$

from which

$$ \frac d{dt}\|u_h\|^2 + \|\nabla u_h\|^2 \le C_P^2\|f\|^2, \quad t\ge 0. $$

We conclude that for any $t > 0$ there holds

$$ \|u_h(t)\|^2 + \int_0^t\|\nabla u_h(s)\|^2\,ds \le \|u_h^0\|^2 + C_P^2\int_0^t\|f(s)\|^2\,ds. \qquad (6.9) $$

In particular, for $f = 0$, we get $\|u_h(t)\| \le \|u_h^0\|$ for $t\ge 0$, i.e. $u_h$ is stable in $L^2$.

We now prove the main error estimate of this section.
Theorem 6.1. Let $u_h$, $u$ be the solutions of (6.6), (6.1), respectively. Then, there exists a constant $C > 0$, independent of $h$, such that

$$ \|u_h(t) - u(t)\| \le \|u_h^0 - u^0\| + C h^r\Bigl(\|u^0\|_r + \int_0^t\|u_t\|_r\,ds\Bigr), \quad t\ge 0. \qquad (6.10) $$

Proof. Following Wheeler (SIAM J. Numer. Anal. 10 (1973), 723-759), we write

$$ u_h - u = \theta + \rho, \qquad (6.11) $$

with $\theta = u_h - R_h u$, where $R_h$ is the elliptic projection operator onto $S_h$ defined by (6.3), and $\rho = R_h u - u$. Note that $\theta\in S_h$ for $t\ge 0$, and in order to estimate $u_h - u$ we should estimate $\theta$ and $\rho$. For the latter, we have

$$ \|\rho(t)\| = \|R_h u(t) - u(t)\| \le C h^r\|u(t)\|_r, \quad t\ge 0, $$

by (6.4). Since $u(x,t) = u^0(x) + \int_0^t u_t(x,s)\,ds$, assuming $u^0\in H^r\cap\mathring H^1$ and $u_t\in H^r\cap\mathring H^1$ for $t\ge 0$ with $\int_0^t\|u_t\|_r\,ds < \infty$, we see that $u\in H^r\cap\mathring H^1$ for $t\ge 0$ and $\|u(t)\|_r \le \|u^0\|_r + \int_0^t\|u_t\|_r\,ds$. Therefore

$$ \|\rho(t)\| \le C h^r\Bigl(\|u^0\|_r + \int_0^t\|u_t\|_r\,ds\Bigr) \ \text{ for } t\ge 0. \qquad (6.12) $$

In order to get an equation for $\theta$ note that for $t\ge 0$ and any $\chi\in S_h$ we have

$$ (\theta_t,\chi) + (\nabla\theta,\nabla\chi) = (u_{ht},\chi) + (\nabla u_h,\nabla\chi) - ((R_h u)_t,\chi) - (\nabla R_h u,\nabla\chi). $$

Hence, using (6.6), (6.3), and the fact that $((R_h u)_t,\chi) = (R_h u_t,\chi)$ for $\chi\in S_h$ (this follows from (6.3) by differentiating both sides with respect to $t$), we have

$$ (\theta_t,\chi) + (\nabla\theta,\nabla\chi) = (f,\chi) - (R_h u_t,\chi) - (\nabla u,\nabla\chi). $$

Therefore, by (6.5) and the definition of $\rho$, we see that

$$ (\theta_t,\chi) + (\nabla\theta,\nabla\chi) = -(\rho_t,\chi) \quad \forall\chi\in S_h,\ t\ge 0. \qquad (6.13) $$

Given $t > 0$, take $\chi = \theta$ in the above to obtain

$$ \frac12\frac d{dt}\|\theta\|^2 + \|\nabla\theta\|^2 = -(\rho_t,\theta). \qquad (6.14) $$

We would like to argue now that $\frac12\frac d{dt}\|\theta\|^2 = \|\theta\|\frac d{dt}\|\theta\|$ and conclude from (6.14) that $\|\theta(t)\| \le \|\theta(0)\| + \int_0^t\|\rho_t\|\,ds$, but we don't know whether $\frac d{dt}\|\theta(t)\|$ exists if $\theta = 0$ for some $t$. Therefore we argue as follows. For all $\epsilon > 0$ we have from (6.14) $\frac12\frac d{dt}(\|\theta\|^2 + \epsilon^2) = \frac12\frac d{dt}\|\theta\|^2 \le \|\rho_t\|\,\|\theta\|$. Hence

$$ \sqrt{\|\theta\|^2+\epsilon^2}\;\frac d{dt}\sqrt{\|\theta\|^2+\epsilon^2} \le \|\rho_t\|\,\|\theta\| \le \|\rho_t\|\sqrt{\|\theta\|^2+\epsilon^2}. $$

Therefore $\frac d{dt}\sqrt{\|\theta\|^2+\epsilon^2} \le \|\rho_t\|$ for all $t > 0$, from which $\sqrt{\|\theta(t)\|^2+\epsilon^2} \le \int_0^t\|\rho_t\|\,ds + \sqrt{\|\theta(0)\|^2+\epsilon^2}$. Letting $\epsilon\to 0$ we obtain the desired inequality

$$ \|\theta(t)\| \le \int_0^t\|\rho_t\|\,ds + \|\theta(0)\|, \quad t\ge 0. \qquad (6.15) $$

Since $\rho_t = R_h u_t - u_t$, we have $\int_0^t\|\rho_t\|\,ds \le C h^r\int_0^t\|u_t\|_r\,ds$ from (6.4). On the other hand,

$$ \|\theta(0)\| \le \|u_h^0 - u^0\| + \|R_h u^0 - u^0\| \le \|u_h^0 - u^0\| + C h^r\|u^0\|_r. $$

Therefore, (6.15), (6.11) and (6.12) yield the desired estimate (6.10). $\square$

Remarks
a. Theorem 6.1, and subsequent error estimates, depend on assumptions of sufficient regularity of the solution $u$ of the continuous problem. Such assumptions will not normally be made explicit in the statements of theorems but will appear in the conclusions or in the course of proofs of the error estimates. For example, in the case of the estimate at hand the proof requires that $u^0\in H^r\cap\mathring H^1$ and $u_t(\cdot,s)\in H^r\cap\mathring H^1$ for $0\le s\le t$. These assumptions guarantee in particular that the bound of $\|u_h(t)-u(t)\|$ in (6.10) is of the form $\|u_h^0-u^0\| + O(h^r)$, where the $O(h^r)$ term is of optimal order of convergence in $L^2$ for $S_h$, as evidenced by the approximation property (6.2).
b. The initial value $u_h^0$ may be chosen in various ways so that $\|u_h^0-u^0\| = O(h^r)$. For example, we could choose it as $u_h^0 = R_h u^0$ or $u_h^0 = Pu^0$ (the $L^2$ projection of $u^0$ onto $S_h$) or equal to an interpolant of $u^0$ in $S_h$. If one of the first two choices is made, (6.2) and (6.4) yield that $\|u_h^0-u^0\| \le C h^r\|u^0\|_r$, provided $u^0\in H^r\cap\mathring H^1$, and the overall optimal-order accuracy $O(h^r)$ is preserved in the right-hand side of (6.10).

Exercise 1. In the proof of Theorem 6.1 take in (6.11) $u_h - u = \theta + \rho$, where, for example, $\theta = u_h - Pu$, with $P$ the $L^2$-projection operator onto $S_h$, or $\theta = u_h - I_h u$, where $I_h u$ is an interpolant of $u$ in $S_h$ satisfying $\|v - I_h v\| + h\|\nabla(v - I_h v)\| \le C h^r\|v\|_r$ for $v\in H^r\cap\mathring H^1$. For these choices show that the best $L^2$-error estimate that one could obtain is of $O(h^{r-1})$, i.e. of suboptimal order. Try to understand from these considerations why Wheeler's choice $\theta = u_h - R_h u$ is crucial.

We now prove an optimal-order $H^1$ estimate for $u_h$.

Proposition 6.2. Under the hypotheses of Theorem 6.1, there holds

$$ \|\nabla(u_h(t)-u(t))\| \le \|\nabla(u_h^0-u^0)\| + C h^{r-1}\Bigl[\|u^0\|_r + \|u(t)\|_r + \Bigl(\int_0^t\|u_t\|_{r-1}^2\,ds\Bigr)^{1/2}\Bigr], \quad t\ge 0. \qquad (6.16) $$

Proof: As before, we write $\nabla(u_h-u) = \nabla\theta + \nabla\rho$, where $\theta = u_h - R_h u$, $\rho = R_h u - u$. Note that $\|\nabla\rho\| \le C h^{r-1}\|u\|_r$. From (6.13) with $\chi = \theta_t$ it follows that

$$ \|\theta_t\|^2 + \frac12\frac d{dt}\|\nabla\theta\|^2 = -(\rho_t,\theta_t) \le \frac12\|\rho_t\|^2 + \frac12\|\theta_t\|^2. $$

Therefore

$$ \frac d{dt}\|\nabla\theta\|^2 \le \|\rho_t\|^2, \quad t\ge 0, $$

from which

$$ \|\nabla\theta(t)\| \le \|\nabla\theta(0)\| + \Bigl(\int_0^t\|\rho_t\|^2\,ds\Bigr)^{1/2}, \quad t\ge 0. $$

We conclude that

$$ \|\nabla(u_h-u)\| \le \|\nabla\theta(0)\| + \Bigl(\int_0^t\|\rho_t\|^2\,ds\Bigr)^{1/2} + \|\nabla\rho\| \le \|\nabla(u_h^0-u^0)\| + \|\nabla(R_h u^0-u^0)\| + \|\nabla\rho\| + C h^{r-1}\Bigl(\int_0^t\|u_t\|_{r-1}^2\,ds\Bigr)^{1/2}, $$

from which (6.16) follows. $\square$

Exercise 2. Consider instead of (6.1) the initial-boundary value problem with Neumann boundary conditions:

$$ u_t - \Delta u = f, \quad x\in\Omega,\ t\ge 0, \qquad \frac{\partial u}{\partial n} = 0, \quad x\in\partial\Omega,\ t\ge 0, \qquad u(x,0) = u^0(x), \quad x\in\Omega. $$

Construct the standard Galerkin semidiscretization for this problem in a finite-dimensional subspace $S_h$ of $H^1$ and prove an analog of Theorem 6.1. (Define now an elliptic projection $R_h: H^1\to S_h$ by the equations $a(R_h v,\chi) = a(v,\chi)$, $\chi\in S_h$, for $v\in H^1$, where $a(v,w) = (\nabla v,\nabla w) + (v,w)$ for $v,w\in H^1$.)

Exercise 3. Generalize the results of this section to the case of the initial-boundary value problem with variable coefficients

$$ u_t - \sum_{i,j=1}^d\frac{\partial}{\partial x_i}\Bigl(a_{ij}(x)\frac{\partial u}{\partial x_j}\Bigr) + a_0(x)u = f(x,t), \quad x\in\Omega,\ t\ge 0, \qquad u = 0 \ \text{ on }\partial\Omega,\ t\ge 0, \qquad u(x,0) = u^0(x), \quad x\in\Omega, $$

where the functions $a_{ij}$ satisfy $a_{ij}(x) = a_{ji}(x)$, $x\in\bar\Omega$, and $\sum_{i,j=1}^d a_{ij}\xi_i\xi_j \ge c_0\sum_{i=1}^d\xi_i^2$, $x\in\bar\Omega$, $\xi = (\xi_1,\dots,\xi_d)\in\mathbb R^d$, for some positive constant $c_0$ independent of $x$ and $\xi$, i.e. when the matrix-valued function $(a_{ij})$ is symmetric and uniformly positive definite for $x\in\bar\Omega$, and where $a_0(x)\ge 0$, $x\in\bar\Omega$. Assume that the coefficients $a_{ij}$, $a_0$ are smooth enough on $\bar\Omega$. The weak formulation of this ibvp is to find $u\in\mathring H^1$ for $t\ge 0$ such that

$$ (u_t,v) + a(u,v) = (f(t),v) \quad \forall v\in\mathring H^1,\ t\ge 0, \qquad u(0) = u^0, $$

where $a(u,v) := \sum_{i,j=1}^d\int_\Omega a_{ij}\frac{\partial u}{\partial x_j}\frac{\partial v}{\partial x_i}\,dx + \int_\Omega a_0uv\,dx$. (Establish first that there exist positive constants $C_1$, $C_2$ such that $|a(v,w)| \le C_1\|v\|_1\|w\|_1$ for $v,w\in\mathring H^1$, and $a(v,v) \ge C_2\|v\|_1^2$ for $v\in\mathring H^1$, and introduce now the elliptic projection $R_h v$ of $v\in\mathring H^1$ onto $S_h$ by $a(R_h v,\chi) = a(v,\chi)$, $\chi\in S_h$. Use the Lax-Milgram theorem and Nitsche's trick to prove properties of $R_h$ analogous to those of Section 6.1, assuming elliptic regularity, i.e. that $\|u\|_2 \le C\|f\|$ holds for the associated elliptic bvp

$$ -\sum_{i,j=1}^d\frac{\partial}{\partial x_i}\Bigl(a_{ij}(x)\frac{\partial u}{\partial x_j}\Bigr) + a_0(x)u = f(x), \quad x\in\Omega, \qquad u = 0 \ \text{ on }\partial\Omega.) $$
6.3 Full discretization with the implicit Euler and the Crank-Nicolson method

To solve the ode system (6.6) (or (6.7)) we need to discretize it in $t$ to obtain a fully discrete method. This may be done by various time-stepping techniques. In this section we shall examine two of them, the implicit Euler and the Crank-Nicolson methods, which are unconditionally stable in a sense that we shall make precise.

Let $\Delta t = k$ be the length of the (uniform) time step and $t^n = nk$, $n = 0,1,2,\dots$. The implicit Euler full discretization of (6.6) is defined as follows. We seek for $n = 0,1,2,\dots$ approximations $U^n\in S_h$ of $u_h(t^n)$ that satisfy

$$ \Bigl(\frac{U^n - U^{n-1}}k,\chi\Bigr) + (\nabla U^n,\nabla\chi) = (f^n,\chi) \quad \forall\chi\in S_h,\ n\ge 1, \qquad U^0 = u_h^0, \qquad (6.17) $$

where $f^n = f(\cdot,t^n)$.

Finding $U^n$ for $n\ge 1$, given $U^{n-1}$, requires solving a linear system for the coefficients of $U^n$ with respect to the basis $\{\phi_j\}_{j=1}^{N_h}$ of $S_h$. Let $U^n = \sum_{i=1}^{N_h}\alpha_i^n\phi_i$, where $\alpha^n = (\alpha_1^n,\dots,\alpha_{N_h}^n)^T\in\mathbb R^{N_h}$. Then, putting $\chi = \phi_j$, $1\le j\le N_h$, in (6.17) gives

$$ (G + kS)\alpha^n = G\alpha^{n-1} + kF^n, \quad n\ge 1, \qquad (6.18) $$

where $F_i^n = (f^n,\phi_i)$, $1\le i\le N_h$. Thus, computing $\alpha^n$ requires forming the right-hand side of (6.18) and solving an $N_h\times N_h$ linear system with the matrix $G + kS$, which is symmetric positive definite and has the sparsity structure of $G$ and $S$. If a direct method, like Cholesky's method, is used to solve this linear system, the $LL^T$ factorization of $G + kS$ may be done only once and $\alpha^n$ computed for each $n$ using two backsolves with $L$ and $L^T$. In more than one spatial dimension such linear systems are usually solved by a preconditioned conjugate-gradient type method.
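The factor-once, backsolve-per-step strategy for (6.18) can be sketched as follows in Python (a minimal illustration of ours, with names of our own choosing; the demo uses the explicit 1D piecewise linear matrices):

```python
import numpy as np

def implicit_euler(G, S, alpha0, k, F, nsteps):
    """March (6.18): (G + k S) alpha^n = G alpha^{n-1} + k F^n.
    The Cholesky factor of G + kS is computed once; each step then needs
    only the two backsolves with L and L^T."""
    L = np.linalg.cholesky(G + k * S)       # G + kS = L L^T, done once
    alpha = alpha0.copy()
    for n in range(1, nsteps + 1):
        rhs = G @ alpha + k * F(n)
        y = np.linalg.solve(L, rhs)         # forward solve  L y = rhs
        alpha = np.linalg.solve(L.T, y)     # back solve     L^T alpha = y
    return alpha

# demo on the 1D P1 matrices with f = 0: the solution decays
N = 9
hm = 1.0 / (N + 1)
G = hm / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
S = 1 / hm * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
x = np.linspace(hm, 1 - hm, N)
a0 = np.sin(np.pi * x)
aT = implicit_euler(G, S, a0, k=0.01, F=lambda n: np.zeros(N), nsteps=100)
```

(`np.linalg.solve` is a general solver; a production code would call a dedicated triangular solve, e.g. `scipy.linalg.solve_triangular`, for the two backsolves.)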
Putting $\chi = U^n$ in (6.17) gives

$$ \|U^n\|^2 + k\|\nabla U^n\|^2 = (U^{n-1},U^n) + k(f^n,U^n) \le \bigl(\|U^{n-1}\| + k\|f^n\|\bigr)\|U^n\|. $$

Hence $\|U^n\| \le \|U^{n-1}\| + k\|f^n\|$ for $n = 1,2,\dots$. This implies that

$$ \|U^n\| \le \|U^0\| + k\sum_{j=1}^n\|f^j\|, \quad n = 1,2,\dots. $$

If $f = 0$ we see that $\|U^n\| \le \|U^0\|$, i.e. the implicit Euler scheme is $L^2$-stable. In fact for each $n$ we have $\|U^n\| \le \|U^{n-1}\|$, i.e. the $L^2$ norm of $U^n$ is non-increasing. In particular, if $f^n = 0$, $U^{n-1} = 0$, we see that $U^n = 0$, i.e. the homogeneous linear system of equations of the form (6.18) has only the trivial solution, implying that (6.18) has a unique solution; this is an alternative to the matrix argument used previously to show the same result. Note that the $L^2$ stability of the scheme was proved without any assumption on the time step $k$, i.e. the scheme (6.17) is unconditionally stable in $L^2$.

As expected, the implicit Euler method is first-order accurate in the time variable, as the following estimate suggests.

Theorem 6.2. Let $U^n$, $u(t) = u(\cdot,t)$ be the solutions of (6.17), (6.1), respectively. Then, there exists a positive constant $C$, independent of $h$, $k$, and $n$, such that for $n\ge 0$

$$ \|U^n - u(t^n)\| \le \|u_h^0 - u^0\| + C h^r\Bigl[\|u^0\|_r + \int_0^{t^n}\|u_t\|_r\,ds\Bigr] + k\int_0^{t^n}\|u_{tt}\|\,ds. \qquad (6.19) $$

Proof: As in the proof of Theorem 6.1 we write $U^n - u(t^n) = \theta^n + \rho^n$, where $\theta^n = U^n - R_h u(t^n)$, $\rho^n = R_h u(t^n) - u(t^n)$. Using (6.4) we see that

$$ \|\rho^n\| \le C h^r\|u(t^n)\|_r \le C h^r\Bigl[\|u^0\|_r + \int_0^{t^n}\|u_t\|_r\,ds\Bigr], \qquad (6.20) $$

and it remains to estimate $\theta^n$. Let $\bar\partial U^n := \frac1k(U^n - U^{n-1})$. By (6.17), (6.5), for $\chi\in S_h$ we have

$$ (\bar\partial\theta^n,\chi) + (\nabla\theta^n,\nabla\chi) = (\bar\partial U^n,\chi) + (\nabla U^n,\nabla\chi) - (\bar\partial R_h u(t^n),\chi) - (\nabla R_h u(t^n),\nabla\chi) $$
$$ = (f(t^n),\chi) - (\bar\partial R_h u(t^n),\chi) - (\nabla u(t^n),\nabla\chi) = (u_t(t^n) - \bar\partial R_h u(t^n),\chi), $$

i.e.

$$ (\bar\partial\theta^n,\chi) + (\nabla\theta^n,\nabla\chi) = -(\omega^n,\chi), \quad \chi\in S_h,\ n\ge 1, \qquad (6.21) $$

where

$$ \omega^n = \bar\partial R_h u(t^n) - u_t(t^n) = (R_h - I)\bar\partial u(t^n) + \bigl(\bar\partial u(t^n) - u_t(t^n)\bigr) =: \omega_1^n + \omega_2^n. \qquad (6.22) $$

Putting $\chi = \theta^n$ in (6.21) gives

$$ (\bar\partial\theta^n,\theta^n) + \|\nabla\theta^n\|^2 \le \|\omega^n\|\,\|\theta^n\|, \ \text{ i.e. } \ \|\theta^n\|^2 \le (\theta^{n-1},\theta^n) + k\|\omega^n\|\,\|\theta^n\|, \ \text{ from which } \ \|\theta^n\| \le \|\theta^{n-1}\| + k\|\omega^n\|, \quad n\ge 1. $$

Therefore, in view of (6.22), summation with respect to $n$ gives

$$ \|\theta^n\| \le \|\theta^0\| + k\sum_{j=1}^n\|\omega_1^j\| + k\sum_{j=1}^n\|\omega_2^j\|, \quad n\ge 1. \qquad (6.23) $$

Now

$$ \omega_1^j = (R_h - I)\bar\partial u(t^j) = (R_h - I)\frac1k\bigl(u(t^j) - u(t^{j-1})\bigr) = (R_h - I)\frac1k\int_{t^{j-1}}^{t^j}u_t(s)\,ds = \frac1k\int_{t^{j-1}}^{t^j}(R_h - I)u_t(s)\,ds. $$

Therefore by (6.4)

$$ \|\omega_1^j\| \le C\frac{h^r}k\int_{t^{j-1}}^{t^j}\|u_t\|_r\,ds, \ \text{ giving } \ k\sum_{j=1}^n\|\omega_1^j\| \le C h^r\int_0^{t^n}\|u_t\|_r\,ds. \qquad (6.24) $$

For the last term in (6.23) we note

$$ \omega_2^j = \bar\partial u(t^j) - u_t(t^j) = \frac1k\bigl(u(t^j) - u(t^{j-1})\bigr) - u_t(t^j). $$

For a real function $v = v(t)$ recall Taylor's theorem with integral remainder:

$$ v(t) = v(a) + (t-a)v'(a) + \dots + \frac{(t-a)^p}{p!}v^{(p)}(a) + \frac1{p!}\int_a^t(t-s)^pv^{(p+1)}(s)\,ds. $$

Hence

$$ u(t^{j-1}) = u(t^j) - ku_t(t^j) + \int_{t^j}^{t^{j-1}}(t^{j-1}-s)u_{tt}(s)\,ds. $$

We conclude

$$ \omega_2^j = -\frac1k\int_{t^j}^{t^{j-1}}(t^{j-1}-s)u_{tt}(s)\,ds, $$

from which

$$ \|\omega_2^j\| \le \frac1k\int_{t^{j-1}}^{t^j}(s-t^{j-1})\|u_{tt}(s)\|\,ds \le \int_{t^{j-1}}^{t^j}\|u_{tt}(s)\|\,ds, $$

yielding

$$ k\sum_{j=1}^n\|\omega_2^j\| \le k\int_0^{t^n}\|u_{tt}(s)\|\,ds. \qquad (6.25) $$

Since $\|\theta^0\| = \|U^0 - R_h u^0\| \le \|u_h^0 - u^0\| + C h^r\|u^0\|_r$, we conclude from (6.23), (6.24), (6.25) that

$$ \|\theta^n\| \le \|u_h^0 - u^0\| + C h^r\Bigl[\|u^0\|_r + \int_0^{t^n}\|u_t\|_r\,ds\Bigr] + k\int_0^{t^n}\|u_{tt}(s)\|\,ds, \quad n\ge 0, $$

which, in view of the inequality $\|U^n - u(t^n)\| \le \|\theta^n\| + \|\rho^n\|$ and (6.20), gives (6.19). $\square$

Remark. The inequality (6.23) is essentially a stability inequality for the error equation (6.21), whereas the estimates (6.24) and (6.25) are bounds on the spatial and temporal truncation errors in the right-hand side of (6.23) and express the consistency of the fully discrete scheme, in that they tend to zero as $h\to 0$, $k\to 0$ under the implied regularity assumptions on $u$. In fact, they imply that the spatial accuracy of the scheme in $L^2$ is of $O(h^r)$ and the temporal accuracy of $O(k)$. Thus the proof of Theorem 6.2 is an illustration of the general principle that stability + consistency $\Rightarrow$ convergence. Such a statement has to be verified in any given particular case and depends on the choice of norms and the regularity of the solutions.
We turn now to the Crank-Nicolson scheme, which discretizes (6.6) in $t$ with second-order accuracy while retaining unconditional stability. We seek for $n\ge 0$ approximations $U^n\in S_h$ of $u_h(t^n)$ satisfying

$$ \Bigl(\frac{U^n - U^{n-1}}k,\chi\Bigr) + \Bigl(\nabla\frac{U^n + U^{n-1}}2,\nabla\chi\Bigr) = (f^{n-1/2},\chi) \quad \forall\chi\in S_h,\ n\ge 1, \qquad U^0 = u_h^0, \qquad (6.26) $$

where $f^{n-1/2} = f\bigl(\cdot,t^n-\frac k2\bigr)$. Using our previous notation, we see that for $n\ge 1$ the matrix-vector representation of (6.26) is

$$ \Bigl(G + \frac k2S\Bigr)\alpha^n = \Bigl(G - \frac k2S\Bigr)\alpha^{n-1} + kF^{n-1/2}, \quad n\ge 1, \qquad (6.27) $$

where again $\alpha^n$ is the vector of coefficients of $U^n$ with respect to the basis of $S_h$. The matrix $G + \frac k2S$ is again sparse, symmetric and positive definite, and similar remarks hold for computing $\alpha^n$ as in the case of the implicit Euler scheme. Putting $\chi = U^n + U^{n-1}$ in (6.26) gives for $n\ge 1$

$$ \|U^n\|^2 - \|U^{n-1}\|^2 + \frac k2\|\nabla(U^n + U^{n-1})\|^2 = k(f^{n-1/2},U^n + U^{n-1}) \le k\|f^{n-1/2}\|\bigl(\|U^n\| + \|U^{n-1}\|\bigr). $$

Hence $\|U^n\| \le \|U^{n-1}\| + k\|f^{n-1/2}\|$ for $n\ge 1$, implying that

$$ \|U^n\| \le \|U^0\| + k\sum_{j=1}^n\|f^{j-1/2}\|, \quad n = 1,2,\dots. $$

From these relations, if $f = 0$, we see that the Crank-Nicolson method is $L^2$-stable and that $\|U^n\|$ is non-increasing, unconditionally. In particular, if $f^{n-1/2} = 0$, $U^{n-1} = 0$, it follows that $U^n = 0$, which means that the homogeneous linear system of the form (6.27) has only the trivial solution, implying that (6.27) has a unique solution given $\alpha^{n-1}$ and $F^{n-1/2}$. We proceed with an error estimate for the scheme.
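The unconditional decrease of $\|U^n\|$ for $f = 0$, valid for both schemes, is easy to observe numerically even with an absurdly large time step. A Python sketch of ours, using the explicit 1D piecewise linear matrices:

```python
import numpy as np

N = 19
h = 1.0 / (N + 1)
G = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))   # mass
S = 1 / h * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))   # stiffness
x = np.linspace(h, 1 - h, N)
a0 = x * (1 - x)                       # some nonzero initial coefficients
l2 = lambda a: np.sqrt(a @ G @ a)      # discrete L2 norm of sum a_j phi_j

def run(step, a, nsteps=50):
    norms = [l2(a)]
    for _ in range(nsteps):
        a = step(a)
        norms.append(l2(a))
    return norms

k = 100.0                              # huge step: k >> h**2
be = lambda a: np.linalg.solve(G + k * S, G @ a)                    # (6.18), f = 0
cn = lambda a: np.linalg.solve(G + k / 2 * S, (G - k / 2 * S) @ a)  # (6.27), f = 0
norms_be, norms_cn = run(be, a0), run(cn, a0)
# both sequences of L2 norms are non-increasing, with no condition on k
```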

Theorem 6.3. Let $U^n$, $u(t) = u(\cdot,t)$ be the solutions of (6.26), (6.1), respectively. Then, there exists a positive constant $C$, independent of $h$, $k$ and $n$, such that for $n\ge 0$

$$ \|U^n - u(t^n)\| \le \|u_h^0 - u^0\| + C h^r\Bigl[\|u^0\|_r + \int_0^{t^n}\|u_t\|_r\,ds\Bigr] + C k^2\int_0^{t^n}\bigl(\|u_{ttt}\| + \|\Delta u_{tt}\|\bigr)ds. \qquad (6.28) $$

Proof: We write again $U^n - u(t^n) = \theta^n + \rho^n$, where $\theta^n = U^n - R_h u(t^n)$, $\rho^n = R_h u(t^n) - u(t^n)$, and note that

$$ \|\rho^n\| \le C h^r\Bigl[\|u^0\|_r + \int_0^{t^n}\|u_t\|_r\,ds\Bigr]. \qquad (6.29) $$

From (6.26), (6.5), for $\chi\in S_h$ we have

$$ (\bar\partial\theta^n,\chi) + \Bigl(\nabla\frac{\theta^n+\theta^{n-1}}2,\nabla\chi\Bigr) = (f(t^{n-1/2}),\chi) - (\bar\partial R_h u(t^n),\chi) - \Bigl(\nabla\frac{u(t^n)+u(t^{n-1})}2,\nabla\chi\Bigr) $$
$$ = \bigl[(f(t^{n-1/2}),\chi) - (u_t(t^{n-1/2}),\chi) - (\nabla u(t^{n-1/2}),\nabla\chi)\bigr] + (u_t(t^{n-1/2}) - \bar\partial R_h u(t^n),\chi) - \Bigl(\nabla\bigl[\tfrac12(u(t^n)+u(t^{n-1})) - u(t^{n-1/2})\bigr],\nabla\chi\Bigr). $$

The first bracket vanishes by (6.5), and using Gauss's theorem (integration by parts) in the last term we obtain

$$ (\bar\partial\theta^n,\chi) + \Bigl(\nabla\frac{\theta^n+\theta^{n-1}}2,\nabla\chi\Bigr) = -(\omega^n,\chi), \quad \chi\in S_h,\ n\ge 1, \qquad (6.30) $$

where

$$ \omega^n = \omega_1^n + \omega_2^n + \omega_3^n := (R_h - I)\bar\partial u(t^n) + \bigl(\bar\partial u(t^n) - u_t(t^{n-1/2})\bigr) + \Delta\bigl(u(t^{n-1/2}) - \tfrac12(u(t^n)+u(t^{n-1}))\bigr). \qquad (6.31) $$

If we put $\chi = \theta^n + \theta^{n-1}$ in (6.30), we obtain, as in the stability proof of the scheme,

$$ \|\theta^n\| \le \|\theta^0\| + k\sum_{j=1}^n\|\omega_1^j\| + k\sum_{j=1}^n\|\omega_2^j\| + k\sum_{j=1}^n\|\omega_3^j\|, \quad n\ge 1. \qquad (6.32) $$

As in the case of the implicit Euler scheme (cf. (6.24)), we have

$$ k\sum_{j=1}^n\|\omega_1^j\| \le C h^r\int_0^{t^n}\|u_t\|_r\,ds. \qquad (6.33) $$

To estimate $\omega_2^j$ we note that for $j\ge 1$

$$ \omega_2^j = \bar\partial u(t^j) - u_t(t^{j-1/2}) = \frac1k\bigl(u(t^j) - u(t^{j-1})\bigr) - u_t(t^{j-1/2}). $$

Using Taylor's theorem gives

$$ u(t^j) = u(t^{j-1/2}) + \frac k2u_t(t^{j-1/2}) + \frac{k^2}8u_{tt}(t^{j-1/2}) + \frac1{2!}\int_{t^{j-1/2}}^{t^j}(t^j-s)^2u_{ttt}(s)\,ds, $$
$$ u(t^{j-1}) = u(t^{j-1/2}) - \frac k2u_t(t^{j-1/2}) + \frac{k^2}8u_{tt}(t^{j-1/2}) + \frac1{2!}\int_{t^{j-1/2}}^{t^{j-1}}(t^{j-1}-s)^2u_{ttt}(s)\,ds, $$

so that

$$ \omega_2^j = \frac1{2k}\Bigl[\int_{t^{j-1/2}}^{t^j}(t^j-s)^2u_{ttt}(s)\,ds + \int_{t^{j-1}}^{t^{j-1/2}}(s-t^{j-1})^2u_{ttt}(s)\,ds\Bigr]. $$

Therefore, for $j\ge 1$,

$$ \|\omega_2^j\| \le \frac1{2k}\Bigl(\frac{k^2}4\int_{t^{j-1/2}}^{t^j}\|u_{ttt}\|\,ds + \frac{k^2}4\int_{t^{j-1}}^{t^{j-1/2}}\|u_{ttt}\|\,ds\Bigr) = \frac k8\int_{t^{j-1}}^{t^j}\|u_{ttt}\|\,ds, $$

i.e.

$$ k\sum_{j=1}^n\|\omega_2^j\| \le \frac{k^2}8\int_0^{t^n}\|u_{ttt}\|\,ds. \qquad (6.34) $$

For $\omega_3^j$, $j\ge 1$, we have

$$ \omega_3^j = \Delta\bigl(u(t^{j-1/2}) - \tfrac12(u(t^j)+u(t^{j-1}))\bigr). $$

By Taylor's theorem

$$ u(t^j) = u(t^{j-1/2}) + \frac k2u_t(t^{j-1/2}) + \int_{t^{j-1/2}}^{t^j}(t^j-s)u_{tt}(s)\,ds, $$
$$ u(t^{j-1}) = u(t^{j-1/2}) - \frac k2u_t(t^{j-1/2}) + \int_{t^{j-1/2}}^{t^{j-1}}(t^{j-1}-s)u_{tt}(s)\,ds, $$

so that

$$ \omega_3^j = -\frac12\int_{t^{j-1/2}}^{t^j}(t^j-s)\Delta u_{tt}\,ds - \frac12\int_{t^{j-1}}^{t^{j-1/2}}(s-t^{j-1})\Delta u_{tt}\,ds. $$

Therefore

$$ k\sum_{j=1}^n\|\omega_3^j\| \le \frac{k^2}4\int_0^{t^n}\|\Delta u_{tt}\|\,ds. \qquad (6.35) $$

The desired estimate (6.28) follows now from the inequalities (6.29), (6.32)-(6.35), and the fact that $\|\theta^0\| = \|U^0 - R_h u^0\| \le \|u_h^0 - u^0\| + \|u^0 - R_h u^0\| \le \|u_h^0 - u^0\| + C h^r\|u^0\|_r$. $\square$

Exercise 1. Let $\frac12\le\vartheta\le 1$ and consider the following family of fully discrete schemes:

$$ \Bigl(\frac{U^n - U^{n-1}}k,\chi\Bigr) + \bigl(\nabla(\vartheta U^n + (1-\vartheta)U^{n-1}),\nabla\chi\bigr) = \bigl(\vartheta f(t^n) + (1-\vartheta)f(t^{n-1}),\chi\bigr), \quad \chi\in S_h,\ n\ge 1, \qquad U^0 = u_h^0. $$

Show that the schemes are $L^2$-stable and prove error estimates of the form $\|U^n - u(t^n)\| \le \|u_h^0 - u^0\| + O(k^p + h^r)$, where $p = 1$ if $1/2 < \vartheta\le 1$ and $p = 2$ if $\vartheta = 1/2$.

Exercise 2. Consider the implicit Euler method with variable time step $k_n$, where $k_n = t^n - t^{n-1}$, $n\ge 1$:

$$ \Bigl(\frac{U^n - U^{n-1}}{k_n},\chi\Bigr) + (\nabla U^n,\nabla\chi) = (f(t^n),\chi), \quad \chi\in S_h,\ n\ge 1, \qquad U^0 = u_h^0. $$

Show that the scheme is $L^2$-stable and prove an error estimate of the form (6.19) with $k = \max_n k_n$.
6.4 The explicit Euler method. Inverse inequalities and stiffness

The explicit Euler full discretization of (6.6) is the following scheme: we seek for $n = 0,1,2,\dots$ approximations $U^n\in S_h$ of $u_h(t^n)$ that satisfy

$$ \Bigl(\frac{U^n - U^{n-1}}k,\chi\Bigr) + (\nabla U^{n-1},\nabla\chi) = (f^{n-1},\chi) \quad \forall\chi\in S_h,\ n\ge 1, \qquad U^0 = u_h^0. \qquad (6.36) $$

Finding $U^n$ for $n\ge 1$, given $U^{n-1}$, requires solving the linear system

$$ G\alpha^n = (G - kS)\alpha^{n-1} + kF^{n-1}, \quad n\ge 1, \qquad (6.37) $$

where $G$, $S$, $F^n$ have been previously defined and $\alpha^n$ is as usual the vector of coefficients of $U^n$ with respect to the basis of $S_h$. (Note that although the time-stepping method is explicit as a scheme for solving initial-value problems for ode's, we still have to solve linear systems with the mass matrix $G$ at each time step.) Obviously the linear system (6.37) has a unique solution $\alpha^n$ given $\alpha^{n-1}$ and $F^{n-1}$.

In order to study the stability of the scheme put $\chi = U^n$ in (6.36). Then, for $n\ge 1$ we have

$$ \|U^n\|^2 - (U^{n-1},U^n) + k(\nabla U^{n-1},\nabla U^n) = k(f^{n-1},U^n). $$

Use now the identities

$$ (U^{n-1},U^n) = -\frac12\|U^n - U^{n-1}\|^2 + \frac12\|U^n\|^2 + \frac12\|U^{n-1}\|^2, $$
$$ (\nabla U^{n-1},\nabla U^n) = \frac14\|\nabla(U^n + U^{n-1})\|^2 - \frac14\|\nabla(U^n - U^{n-1})\|^2 $$

to obtain

$$ \|U^n - U^{n-1}\|^2 + \|U^n\|^2 - \|U^{n-1}\|^2 + \frac k2\|\nabla(U^n + U^{n-1})\|^2 - \frac k2\|\nabla(U^n - U^{n-1})\|^2 = 2k(f^{n-1},U^n), \quad n\ge 1. \qquad (6.38) $$

In the left-hand side of this identity the troublesome term is $-\frac k2\|\nabla(U^n - U^{n-1})\|^2$, which is non-positive. To resolve this problem we write (6.38) in the form

$$ \|U^n - U^{n-1}\|^2 + \|U^n\|^2 - \|U^{n-1}\|^2 + \frac k2\|\nabla(U^n + U^{n-1})\|^2 = \frac k2\|\nabla(U^n - U^{n-1})\|^2 + 2k(f^{n-1},U^n) $$

and use the inverse inequality (valid for quasiuniform partitions of $\Omega$)

$$ \|\nabla\chi\| \le \frac Ch\|\chi\|, \quad \chi\in S_h, \qquad (6.39) $$

where $C$ is a constant independent of $h$ and $\chi$, in the first term of the right-hand side to get

$$ \|U^n - U^{n-1}\|^2 + \|U^n\|^2 - \|U^{n-1}\|^2 \le \frac k2\frac{C^2}{h^2}\|U^n - U^{n-1}\|^2 + 2k\|f^{n-1}\|\,\|U^n\|, $$

i.e.

$$ \Bigl(1 - \frac k2\frac{C^2}{h^2}\Bigr)\|U^n - U^{n-1}\|^2 + \|U^n\|^2 - \|U^{n-1}\|^2 \le 2k\|f^{n-1}\|\,\|U^n\|. $$

Therefore, if

$$ \frac k{h^2} \le \frac2{C^2}, \qquad (6.40) $$

the above inequality gives

$$ \|U^n\|^2 - \|U^{n-1}\|^2 \le 2k\|f^{n-1}\|\,\|U^n\| \le 2k\|f^{n-1}\|\bigl(\|U^n\| + \|U^{n-1}\|\bigr), $$

from which

$$ \|U^n\| \le \|U^{n-1}\| + 2k\|f^{n-1}\|, \quad n\ge 1, $$

and finally

$$ \|U^n\| \le \|U^0\| + 2k\sum_{j=0}^{n-1}\|f^j\|, $$

which is the required $L^2$-stability inequality, analogous to those derived for the implicit Euler and the Crank-Nicolson schemes. However, whereas such inequalities in the case of the previous schemes were valid unconditionally, the explicit Euler scheme needs a stability condition of the form (6.40). This condition is very restrictive in that it requires taking $k = O(h^2)$, i.e. very small time steps. It can be shown that such a condition is also necessary for stability. (A simple numerical experiment with piecewise linear functions in 1D gives a clear indication!)
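The suggested 1D experiment is easy to set up (the sketch below is ours). With piecewise linears on a uniform mesh the largest eigenvalue of $G^{-1}S$ is slightly below $12/h^2$, so (6.40) amounts to $k\lesssim h^2/6$; we run the explicit scheme (6.37) with $f = 0$ once below and once above that threshold:

```python
import numpy as np

N = 19
h = 1.0 / (N + 1)
G = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
S = 1 / h * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
x = np.linspace(h, 1 - h, N)
a0 = x * (1 - x)            # initial data exciting all discrete modes

def explicit_euler(k, nsteps):
    a = a0.copy()
    for _ in range(nsteps):                    # G a^n = (G - kS) a^{n-1}
        a = np.linalg.solve(G, (G - k * S) @ a)
    return a

# lambda_max(G^{-1}S) < 12/h^2, so k = h^2/6 satisfies k*lambda_max < 2: stable
stable = explicit_euler(h ** 2 / 6, 240)
# k = h^2/3 gives k*lambda_max ~ 4 > 2: the highest mode is amplified by a
# factor |1 - k*lambda| ~ 3 per step and the computed solution blows up
unstable = explicit_euler(h ** 2 / 3, 240)
```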
We postpone for the time being the proof of the inverse inequality (6.39) in order to prove an $L^2$ error estimate for the fully discrete scheme (6.36).

Theorem 6.4. Suppose that (6.39) and (6.40) hold and let $U^n$, $u$ be the solutions of (6.36), (6.1), respectively. Then, there exists a positive constant $C$, independent of $h$, $k$ and $n$, such that for $n\ge 0$

$$ \|U^n - u(t^n)\| \le \|u_h^0 - u^0\| + C h^r\Bigl[\|u^0\|_r + \int_0^{t^n}\|u_t\|_r\,ds\Bigr] + 2k\int_0^{t^n}\|u_{tt}\|\,ds. \qquad (6.41) $$

Proof: Writing again $U^n - u(t^n) = \theta^n + \rho^n$, $\theta^n = U^n - R_h u(t^n)$, $\rho^n = R_h u(t^n) - u(t^n)$, and noting that (6.29) holds for $\rho^n$, we obtain for $n\ge 1$ and any $\chi\in S_h$

$$ (\bar\partial\theta^n,\chi) + (\nabla\theta^{n-1},\nabla\chi) = -(\omega^n,\chi), $$

where

$$ \omega^n = \omega_1^n + \omega_2^n := (R_h - I)\bar\partial u(t^n) + \bigl(\bar\partial u(t^n) - u_t(t^{n-1})\bigr). $$

Putting $\chi = \theta^n$ we see, as in the derivation of (6.38), that

$$ \|\theta^n - \theta^{n-1}\|^2 + \|\theta^n\|^2 - \|\theta^{n-1}\|^2 + \frac k2\|\nabla(\theta^n + \theta^{n-1})\|^2 = \frac k2\|\nabla(\theta^n - \theta^{n-1})\|^2 - 2k(\omega^n,\theta^n) \le \frac k2\frac{C^2}{h^2}\|\theta^n - \theta^{n-1}\|^2 - 2k(\omega^n,\theta^n), $$

by (6.39). Therefore, using (6.40), as in the stability proof,

$$ \|\theta^n\| \le \|\theta^0\| + 2k\sum_{j=1}^n\|\omega^j\| \le \|\theta^0\| + 2k\sum_{j=1}^n\|\omega_1^j\| + 2k\sum_{j=1}^n\|\omega_2^j\|. \qquad (6.42) $$

For the $\omega_1^j$ term we have, as in (6.24),

$$ k\sum_{j=1}^n\|\omega_1^j\| \le C h^r\int_0^{t^n}\|u_t\|_r\,ds. $$

Since by Taylor's theorem

$$ k\,\omega_2^j = u(t^j) - u(t^{j-1}) - ku_t(t^{j-1}) = \int_{t^{j-1}}^{t^j}(t^j - s)u_{tt}(s)\,ds, \quad j\ge 1, $$

we see that

$$ k\sum_{j=1}^n\|\omega_2^j\| \le k\int_0^{t^n}\|u_{tt}\|\,ds, $$

and (6.41) follows from (6.42). $\square$

We proceed now with verifying the inverse inequality (6.39). This is straightforward to do in 1D. Consider an interval $(\alpha,\beta)$ with $\beta - \alpha < 1$ and let $P_k$ be the polynomials of degree $\le k$. Then for some constant $C = C(k)$ independent of $\alpha$ and $\beta$ it holds that

$$ \|\phi\|_{H^1(\alpha,\beta)} \le \frac{C(k)}{\beta-\alpha}\|\phi\|_{L^2(\alpha,\beta)}, \quad \phi\in P_k. \qquad (6.43) $$

To see this, observe that there exists a constant $C = C(k)$ such that

$$ \|\phi\|_{H^1(0,1)} \le C\|\phi\|_{L^2(0,1)}, \quad \phi\in P_k, \qquad (6.44) $$

as a consequence of the fact that $P_k$ is a finite-dimensional vector space and the norms $\|\cdot\|_{H^1(0,1)}$ and $\|\cdot\|_{L^2(0,1)}$ are equivalent on $P_k(0,1)$. To derive (6.43) from (6.44) requires just a change of scale. Write the inequality (6.44) for $\phi\in P_k$ as

$$ \int_0^1\bigl(\phi^2(x) + (\phi'(x))^2\bigr)dx \le C^2\int_0^1\phi^2(x)\,dx, $$

and make in the integrals the change of variable $x\mapsto y$, $y = (\beta-\alpha)x + \alpha$, which maps $[0,1]$ onto $[\alpha,\beta]$. Then the above inequality becomes

$$ \frac1{\beta-\alpha}\int_\alpha^\beta\Bigl[\phi^2(y) + (\beta-\alpha)^2\Bigl(\frac{d\phi}{dy}(y)\Bigr)^2\Bigr]dy \le \frac{C^2}{\beta-\alpha}\int_\alpha^\beta\phi^2(y)\,dy, $$

giving

$$ (\beta-\alpha)^2\int_\alpha^\beta\Bigl[\phi^2(y) + \Bigl(\frac{d\phi}{dy}(y)\Bigr)^2\Bigr]dy \le C^2\int_\alpha^\beta\phi^2(y)\,dy, $$

since we assumed $\beta - \alpha < 1$. Hence (6.43) follows.

Let now $(a,b)$ be any fixed finite interval and let $a = x_0 < x_1 < \dots < x_{J+1} = b$ be an arbitrary partition of $[a,b]$. Letting $h_i = x_{i+1} - x_i$, $0\le i\le J$, we obtain from (6.43) that

$$ \|\phi\|_{H^1(x_i,x_{i+1})} \le \frac{C(k)}{h_i}\|\phi\|_{L^2(x_i,x_{i+1})}, \quad \phi\in P_k(x_i,x_{i+1}). \qquad (6.45) $$

Let now $S_h = \bigl\{\phi\in C[a,b]:\ \phi|_{[x_i,x_{i+1}]}\in P_k\bigr\}$. Since $S_h\subset H^1$, using (6.45) we get for any $\phi\in S_h$

$$ \|\phi\|_{H^1(a,b)}^2 = \sum_{i=0}^J\|\phi\|_{H^1(x_i,x_{i+1})}^2 \le C^2(k)\sum_{i=0}^J\frac1{h_i^2}\|\phi\|_{L^2(x_i,x_{i+1})}^2. \qquad (6.46) $$

We now assume that the partition $\{x_i\}$ of $[a,b]$ is quasiuniform, i.e. that there is a constant $\gamma$ independent of the partition (in the sense that $\gamma$ does not change as the partition is refined) such that

$$ \frac h{h_i} \le \gamma \quad \forall i, \qquad (6.47) $$

where $h = \max_i h_i$. In view of (6.47), (6.46) gives

$$ \|\phi\|_{H^1(a,b)} \le \frac Ch\|\phi\|_{L^2(a,b)}, \quad \phi\in S_h, \qquad (6.48) $$

with $C = C(k)\gamma$, from which (6.39) follows in 1D.
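The 1D inequality (6.48) can also be probed numerically (a sketch of ours follows). For piecewise linears on a uniform mesh with zero boundary values, the best constant in $\|\phi'\|\le(C/h)\|\phi\|$ is $\sqrt{\lambda_{\max}(G^{-1}S)}$, which is strictly below $\sqrt{12}/h$, so the ratio $\|\phi'\|/\|\phi\|$ stays below $\sqrt{12}/h$ for every choice of coefficients:

```python
import numpy as np

def check_inverse_inequality(N, trials=100, seed=0):
    """For random phi in S_h (P1 hats on a uniform mesh of (0,1)), verify
    ||phi'|| <= (sqrt(12)/h) ||phi||, i.e. (6.39) with C = sqrt(12)."""
    h = 1.0 / (N + 1)
    G = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    S = 1 / h * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(trials):
        c = rng.standard_normal(N)                        # coefficients of phi
        ratios.append(np.sqrt((c @ S @ c) / (c @ G @ c))) # ||phi'|| / ||phi||
    return max(ratios), np.sqrt(12.0) / h

worst, bound = check_inverse_inequality(31)
# the worst observed ratio is below sqrt(12)/h, uniformly in the coefficients
```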


In order to prove (6.39) in 2D, we assume that $\Omega$ is a polygonal domain and let $T_h = \{\tau\}$ be a regular (cf. (5.16)) triangulation of $\Omega$. We recall from Section 5.2 that any triangle $\tau$ of the triangulation is affinely equivalent to a fixed reference triangle $\hat\tau$. As in the 1D case, there exists a constant $C_1 = C_1(\hat\tau,k)$ such that

$$ \|\hat\phi\|_{1,\hat\tau} \le C_1\|\hat\phi\|_{0,\hat\tau}, \quad \hat\phi\in P_k(\hat\tau), \qquad (6.49) $$

as a consequence again of the fact that $P_k(\hat\tau)$ is a finite-dimensional space and that $\|\cdot\|_{1,\hat\tau} = \|\cdot\|_{H^1(\hat\tau)}$ and $\|\cdot\|_{0,\hat\tau} = \|\cdot\|_{L^2(\hat\tau)}$ are norms on $P_k(\hat\tau)$. Suppose now that $S_h = \bigl\{\phi\in C(\bar\Omega):\ \phi|_\tau\in P_k(\tau)\bigr\}$ and let $F_\tau$ be the affine map that maps $\hat\tau$ one-to-one onto $\tau$. Following the notation of Section 5.2 we write $F_\tau(\hat x) = B_\tau\hat x + b_\tau$ for $\hat x = (\hat x_1,\hat x_2)\in\hat\tau$, where $B_\tau$ is a $2\times2$ invertible matrix and $b_\tau$ a 2-vector. If $\phi\in P_k(\tau)$ we define $\hat\phi\in P_k(\hat\tau)$ by $\hat\phi(\hat x) = \phi(x)$, where $x = (x_1,x_2)$ and $x = F_\tau(\hat x)$, as usual. Then for $\phi\in P_k(\tau)$, using the norm transformation inequalities (5.20), (5.21) and (6.49) we get

$$ |\phi|_{1,\tau} \le C(k)\|B_\tau^{-1}\|\,|\det B_\tau|^{1/2}|\hat\phi|_{1,\hat\tau} \le C_1C(k)\|B_\tau^{-1}\|\,|\det B_\tau|^{1/2}\|\hat\phi\|_{0,\hat\tau} \le C(k,\hat\tau)\|B_\tau^{-1}\|\,|\det B_\tau|^{1/2}\,|\det B_\tau|^{-1/2}\|\phi\|_{0,\tau}. $$

Therefore, using the regularity of the triangulation and (5.24), we see that there exists a constant $C$ independent of $\tau$ such that

$$ |\phi|_{1,\tau} \le \frac C{h_\tau}\|\phi\|_{0,\tau}, \quad \phi\in P_k(\tau), $$

where $h_\tau = \mathrm{diam}(\tau)$. Since $S_h\subset H^1(\Omega)$ we see that

$$ |\phi|_{1,\Omega}^2 = \sum_{\tau\in T_h}|\phi|_{1,\tau}^2 \le C^2\sum_{\tau\in T_h}\frac1{h_\tau^2}\|\phi\|_{0,\tau}^2, \quad \phi\in S_h. $$

If the triangulation is quasiuniform in the sense that for some constant $\gamma$ independent of the partition

$$ \frac h{h_\tau} \le \gamma \quad \forall\tau\in T_h, \qquad (6.50) $$

where $h = \max_\tau h_\tau$, we see that

$$ |\phi|_{1,\Omega} \le \frac Ch\|\phi\|_{0,\Omega}, \quad \phi\in S_h, \qquad (6.51) $$

from which (6.39) (and also $\|\phi\|_1 \le Ch^{-1}\|\phi\|$) follows.

We mention that similar scaling arguments yield, for quasiuniform partitions, the more general inverse inequalities

$$ \|\phi\|_\beta \le Ch^{\alpha-\beta}\|\phi\|_\alpha, \quad \phi\in S_h, $$

provided $\alpha$, $\beta$ are nonnegative integers with $\alpha < \beta$ and $S_h\subset H^\beta(\Omega)$, and

$$ \|\phi\|_{L^\infty} \le Ch^{-d/2}\|\phi\|, \quad \phi\in S_h, $$

for $\Omega\subset\mathbb R^d$. Of interest is also the nonstandard inverse "almost Sobolev" inequality

$$ \|\phi\|_{L^\infty} \le C|\ln h|^{1/2}\|\phi\|_1, \quad \phi\in S_h, $$

valid for $\Omega\subset\mathbb R^2$, cf. [3.6, p. 68].


We close this section with a few remarks about stiff systems of ode's and the interpretation of the stability condition (6.40) as a restriction on the time step related to the size of the eigenvalues of the matrix $G^{-1}S$. Let $A$ be a symmetric, positive definite, real $m\times m$ matrix and consider the ode ivp for $y = y(t)\in\mathbb R^m$

$$ y' + Ay = 0,\ t\ge 0, \qquad y(0) = y^0. \qquad (6.52) $$

Let $0 < \lambda_1\le\lambda_2\le\dots\le\lambda_m$ be the eigenvalues of $A$. Then, from the theory of the numerical solution of ode's, we know that if a method for the numerical solution of ivp's has interval of absolute stability $[-\beta,0]$, where $\beta > 0$ ($\beta = +\infty$ if the method is $A_0$-stable), it will give stable approximations, when applied to (6.52), provided the time step $\Delta t$ is chosen so that $(-\lambda_i)\Delta t\in[-\beta,0]$ for all $i$, i.e. so that $\Delta t\le\frac\beta{\lambda_m}$. We recall that for the explicit Euler method $\beta = 2$, while $\beta = +\infty$ for the implicit Euler and the trapezoidal methods. Hence, the latter two methods are suitable for stiff systems, i.e. systems for which $\lambda_1 = O(1)$ and $\lambda_m\gg 1$.

Consider now the ode system (6.7) corresponding to the Galerkin semidiscretization of the ibvp (6.1). For $f = 0$ we write the system as

$$ Gy' + Sy = 0,\ t\ge 0, \qquad y(0) = y^0, \qquad (6.53) $$

where $y\in\mathbb R^m$ with $m = N_h = \dim S_h$. This system is of the form (6.52) with $A = G^{-1}S$, but $G^{-1}S$ is not symmetric. To transform the system into the form (6.52) with a symmetric positive definite matrix, consider the matrix $G^{1/2}$. Since $G$ is symmetric and positive definite, $G^{1/2}$ is defined e.g. using the spectral representation of $G$, and is symmetric and positive definite. Multiplying both sides of the ode system in (6.53) by $G^{-1/2}$ we have

$$ G^{1/2}y' + G^{-1/2}SG^{-1/2}G^{1/2}y = 0, $$

or

$$ z' + Az = 0, \quad t\ge 0, \qquad (6.53') $$

where $z(t) = G^{1/2}y(t)$, and $A = G^{-1/2}SG^{-1/2}$ is easily seen to be symmetric and positive definite. Let $0 < \mu_1\le\mu_2\le\dots\le\mu_m$ be the eigenvalues of $A$. Then they satisfy

$$ G^{-1/2}SG^{-1/2}x = \mu_i x, \qquad (6.54) $$

where $x\in\mathbb R^m$ is an eigenvector corresponding to $\mu_i$. It follows that $SG^{-1/2}x = \mu_iG^{1/2}x$. Therefore, with $w = G^{-1/2}x$, i.e. $Gw = G^{1/2}x$, we see that the eigenvalue problem (6.54) is equivalent to the generalized eigenvalue problem

$$ Sw = \mu_iGw, \qquad (6.55) $$

and so the $\mu_i$ are eigenvalues of $G^{-1}S$. It follows from (6.55) that

$$ \mu_i = \frac{w^TSw}{w^TGw}. $$

Let now $\phi = \sum_{i=1}^m w_i\phi_i\in S_h$, where $\{\phi_i\}$ is the chosen basis of $S_h$. Then $\|\phi\|^2 = w^TGw$, $\|\nabla\phi\|^2 = w^TSw$, and $\mu_i = \frac{\|\nabla\phi\|^2}{\|\phi\|^2}$. Therefore, if the inverse inequality (6.39) holds, we have that

$$ \mu_m = \lambda_{\max}(G^{-1}S) \le \frac{C^2}{h^2}. \qquad (6.56) $$

Hence, a sufficient condition for the stability of the explicit Euler scheme for the ivp (6.53'), or equivalently for the ivp (6.53), is, since $\mu_mk \le (C^2/h^2)k$, $k = \Delta t$, that $(C^2/h^2)k \le 2$, i.e. $\frac k{h^2}\le\frac2{C^2}$, which is precisely the restriction (6.40) found by the energy method. For the implicit Euler and the Crank-Nicolson (i.e. the trapezoidal) schemes, for which $\beta = +\infty$, there is no restriction.
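The $O(h^{-2})$ growth in (6.56), and hence the stiffness of (6.7), can be observed directly by computing the generalized eigenvalues (6.55) for 1D piecewise linears. The following sketch (ours) checks that $\lambda_{\max}$ roughly quadruples when $h$ is halved, while $\lambda_{\min}$ stays at $O(1)$:

```python
import numpy as np

def extreme_eigenvalues(N):
    """Extreme eigenvalues of G^{-1}S for P1 elements on a uniform mesh of
    (0,1), from the generalized problem S w = mu G w, cf. (6.55)."""
    h = 1.0 / (N + 1)
    G = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    S = 1 / h * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    mu = np.linalg.eigvals(np.linalg.solve(G, S)).real   # real and positive
    return mu.min(), mu.max(), h

lo1, hi1, h1 = extreme_eigenvalues(15)
lo2, hi2, h2 = extreme_eigenvalues(31)      # h halved
# lo stays near pi^2 (the first eigenvalue of -d^2/dx^2 on (0,1)),
# while hi grows like 12/h^2: the ode system becomes stiffer as h -> 0
```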

Chapter 7

The Galerkin Finite Element Method for the Wave Equation

7.1 Introduction

In this chapter we shall consider Galerkin finite element methods for the following model second-order hyperbolic initial-boundary-value problem. Let $\Omega$ be a bounded domain in $\mathbb R^d$, $d = 1, 2$, or 3. We seek a real function $u = u(x,t)$, $x\in\bar\Omega$, $t\ge 0$, such that

$$ u_{tt} - \Delta u = f, \quad x\in\Omega,\ t\ge 0, $$
$$ u = 0, \quad x\in\partial\Omega,\ t\ge 0, \qquad (7.1) $$
$$ u(x,0) = u^0(x),\quad u_t(x,0) = u_t^0(x), \quad x\in\bar\Omega. $$

Here $f = f(x,t)$ and $u^0$, $u_t^0$ are given real functions on $\bar\Omega\times[0,\infty)$, and $\bar\Omega$, respectively. We shall assume that the ibvp (7.1) has a unique solution, which is sufficiently smooth for the purposes of the analysis of its numerical approximation. For the theory of existence-uniqueness and regularity of (7.1) see [2.2]-[2.5]. We just make two remarks here for the homogeneous problem, i.e. when $f = 0$ in (7.1).

i. Multiplying both sides of the pde in (7.1) by $u_t$, integrating over $\Omega$ using Gauss's theorem and the boundary and initial conditions, we easily get the energy conservation identity

$$ \|u_t\|^2 + \|\nabla u\|^2 = \|u_t^0\|^2 + \|\nabla u^0\|^2, \quad t\ge 0. \qquad (7.2) $$

ii. The regularity theory for (7.1) requires smoothness and compatibility conditions at $\partial\Omega$ of the initial data $u^0$, $u_t^0$, and establishes corresponding smoothness and compatibility properties, in the same spaces, of the solutions $u$ and $u_t$ for $t > 0$. Thus, in the case of (7.1) we do not have any smoothing effect on the solutions for $t > 0$, as in the case of the heat equation, cf. e.g. [2.4].

7.2 Standard Galerkin semidiscretization

Multiplying the pde in (7.1) by $v\in\mathring H^1$ and integrating over $\Omega$ using Gauss's theorem we obtain

$$ (u_{tt},v) + (\nabla u,\nabla v) = (f,v), \quad t\ge 0. \qquad (7.3) $$

Let $S_h$ be a finite-dimensional subspace of $\mathring H^1$ satisfying (6.2). Motivated by (7.3) we define the (standard Galerkin) semidiscrete approximation $u_h$ of $u$ as a function $u_h(t)$ in $S_h$, $t\ge 0$, such that

$$ (u_{htt},\chi) + (\nabla u_h,\nabla\chi) = (f,\chi), \quad \chi\in S_h,\ t\ge 0, \qquad u_h(0) = u_h^0, \quad u_{ht}(0) = u_{t,h}^0, \qquad (7.4) $$

where $u_h^0$, $u_{t,h}^0$ are given elements of $S_h$ approximating $u^0$, $u_t^0$, respectively. If $\{\phi_j\}_{j=1}^{N_h}$ is a basis of $S_h$ and we let

$$ u_h(x,t) = \sum_{j=1}^{N_h}\alpha_j(t)\phi_j(x), $$

we see that

$$ G\alpha'' + S\alpha = F(t),\ t\ge 0, \qquad \alpha(0) = \zeta, \quad \alpha'(0) = \eta, \qquad (7.5) $$

where $G_{ij} = (\phi_j,\phi_i)$, $S_{ij} = (\nabla\phi_j,\nabla\phi_i)$, $1\le i,j\le N_h$, $F_i = (f,\phi_i)$, $1\le i\le N_h$, $\alpha = \alpha(t) = [\alpha_1,\dots,\alpha_{N_h}]^T$, and $\zeta = (\zeta_i)$, $\eta = (\eta_i)$ are the coefficients of $u_h^0$, $u_{t,h}^0$ with respect to the basis $\{\phi_i\}$. The ivp (7.5) may also be written as

$$ \begin{pmatrix}I & 0\\ 0 & G\end{pmatrix}\frac d{dt}\begin{pmatrix}\alpha\\ \alpha'\end{pmatrix} = \begin{pmatrix}0 & I\\ -S & 0\end{pmatrix}\begin{pmatrix}\alpha\\ \alpha'\end{pmatrix} + \begin{pmatrix}0\\ F\end{pmatrix},\ t\ge 0, \qquad (\alpha(0),\alpha'(0))^T = (\zeta,\eta)^T, $$

i.e. as an ivp for a first-order system of ode's; it is then evident that (7.5) has a unique solution for $t\ge 0$. If we put $\chi = u_{ht}$ in (7.4) with $f = 0$, it is straightforward to prove that

$$ \|u_{ht}(t)\|^2 + \|\nabla u_h(t)\|^2 = \|u_{t,h}^0\|^2 + \|\nabla u_h^0\|^2, \quad t\ge 0, \qquad (7.6) $$

which is the discrete analog of (7.2) and expresses the stability (conservation) of $(u_h,u_{ht})$ in $\mathring H^1\times L^2$.
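The conservation law (7.6) can be checked on the ode system (7.5) itself: diagonalizing $Sw = \lambda Gw$ (via the Cholesky factor of $G$, as in the transformation of Section 6.4) gives the exact solution of $G\alpha'' + S\alpha = 0$, and the discrete energy $\|u_{ht}\|^2 + \|\nabla u_h\|^2 = \alpha'^TG\alpha' + \alpha^TS\alpha$ computed from it is constant in $t$. A numpy sketch of ours, with 1D P1 matrices:

```python
import numpy as np

N = 15
h = 1.0 / (N + 1)
G = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))   # mass
S = 1 / h * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))   # stiffness

# generalized eigenproblem S w = lam G w via G = L L^T, cf. (6.54)-(6.55)
L = np.linalg.cholesky(G)
A = np.linalg.solve(L, np.linalg.solve(L, S).T)   # L^{-1} S L^{-T}, symmetric
lam, V = np.linalg.eigh(A)
W = np.linalg.solve(L.T, V)      # modes: W^T G W = I, W^T S W = diag(lam)
om = np.sqrt(lam)                # modal frequencies

x = np.linspace(h, 1 - h, N)
a = W.T @ G @ np.sin(np.pi * x)  # modal coordinates of alpha(0)
b = W.T @ G @ (x * (1 - x))      # ... and of alpha'(0)

def energy(t):
    """alpha'(t)^T G alpha'(t) + alpha(t)^T S alpha(t) from the exact modal
    solution c_j(t) = a_j cos(om_j t) + (b_j / om_j) sin(om_j t)."""
    c = a * np.cos(om * t) + (b / om) * np.sin(om * t)
    cdot = -a * om * np.sin(om * t) + b * np.cos(om * t)
    return cdot @ cdot + c @ (lam * c)   # W-coordinates diagonalize both forms

E0, E1 = energy(0.0), energy(1.0)
# E1 equals E0 to machine precision, in accordance with (7.6)
```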
We recall now the denition of the elliptic projection of a function v H 1 as the
element Rh v Sh such that (Rh v, ) = (v, ) for all Sh . Following Dupont,
SIAM J. Numer. Anal., 10(1973), 880-889, we have

Theorem 7.1. Let u, uh be the solutions of the ivp s (7.1) and (7.4) respectively.
Then, given T > 0, there exists a positive constant C = C(T ) such that
{
uh (t) u(t) C (u0h Rh u0 ) + u0t,h Rh u0t
[ ( )1/2 ] }
t
+h r
u(t)r + utt 2r ds , 0 t T. (7.7)
0

Proof. We write as usual uh u = + , where = uh Rh u, = Rh u u. We have


as before
(t) Chr u(t)r , t 0. (7.8)

Now, for t 0 and any Sh we have using (7.4) and (7.3)

(tt , ) + (, ) = (uhtt , ) + (uh , ) (Rh utt , ) (Rh u, )

= (f, ) (Rh utt , ) (u, )

= (utt Rh utt , ) = (tt , ).

Given t > 0 take = t in the above to obtain

1d 1 1
(t 2 + 2 ) = (tt , t ) tt 2 + t 2 .
2 dt 2 2
163
Hence
\[
\frac{d}{dt}\big(\|\theta_t\|^2+\|\nabla\theta\|^2\big)\le\|\rho_{tt}\|^2+\big(\|\theta_t\|^2+\|\nabla\theta\|^2\big),\quad t\ge 0. \tag{7.9}
\]
We recall now Gronwall's lemma: If $\psi'(t)\le\psi(t)+\varepsilon(t)$, $t\ge 0$, then $\psi(t)\le e^t\psi(0)+\int_0^te^{t-s}\varepsilon(s)\,ds$. To see this, note that for $t\ge 0$
\[
e^{-t}\psi'(t)-e^{-t}\psi(t)\le e^{-t}\varepsilon(t).
\]
Therefore
\[
\frac{d}{dt}\big(e^{-t}\psi(t)\big)\le e^{-t}\varepsilon(t),\quad\text{i.e.}\quad
e^{-t}\psi(t)\le\psi(0)+\int_0^te^{-s}\varepsilon(s)\,ds,
\]
from which the conclusion follows. Using this result in (7.9) for $\psi=\|\theta_t\|^2+\|\nabla\theta\|^2$ we have for $0\le t\le T$
\[
\|\theta_t\|^2+\|\nabla\theta\|^2\le C(T)\Big(\|\theta_t(0)\|^2+\|\nabla\theta(0)\|^2+\int_0^t\|\rho_{tt}\|^2\,ds\Big),
\]

where $C(T)=e^T$. By the definition of $\theta$ and $\rho$ it follows that for $0\le t\le T$
\[
\|\theta_t\|^2+\|\nabla\theta\|^2\le C(T)\Big[\|u_{t,h}^0-R_hu_t^0\|^2+\|\nabla(u_h^0-R_hu^0)\|^2+Ch^{2r}\int_0^t\|u_{tt}\|_r^2\,ds\Big].
\]
Using now the inequalities $\frac{1}{\sqrt n}\sum_{i=1}^n\alpha_i\le\big(\sum_{i=1}^n\alpha_i^2\big)^{1/2}\le\sum_{i=1}^n\alpha_i$, valid for $\alpha_i\ge 0$, we conclude that for some constant $C=C(T)$
\[
\|\theta_t\|+\|\nabla\theta\|\le C\Big[\|u_{t,h}^0-R_hu_t^0\|+\|\nabla(u_h^0-R_hu^0)\|
+h^r\Big(\int_0^t\|u_{tt}\|_r^2\,ds\Big)^{1/2}\Big],\quad 0\le t\le T. \tag{7.10}
\]
We note now that $\theta(t)=\int_0^t\theta_t(s)\,ds+\theta(0)$, from which for $t\ge 0$
\[
\|\theta(t)\|\le\|\theta(0)\|+\int_0^t\|\theta_t\|\,ds\le\|\theta(0)\|+t\max_{0\le s\le t}\|\theta_t(s)\|.
\]
Therefore, using (7.10) we have for $0\le t\le T$
\begin{align*}
\|u_h(t)-u(t)\|&\le\|\theta(t)\|+\|\rho(t)\|\\
&\le Ch^r\|u(t)\|_r+\|\theta(0)\|+t\max_{0\le s\le t}\|\theta_t(s)\|\\
&\le C(T)\Big\{\|u_{t,h}^0-R_hu_t^0\|+\|\nabla(u_h^0-R_hu^0)\|
+h^r\Big[\|u(t)\|_r+\Big(\int_0^t\|u_{tt}\|_r^2\,ds\Big)^{1/2}\Big]\Big\},
\end{align*}
which is the desired estimate (7.7). (Note that we used Poincaré's inequality $\|\theta(0)\|\le C\|\nabla\theta(0)\|$ in the last inequality.) $\square$

Remarks
a. The estimate (7.7) implies that if we choose $u_h^0=R_hu^0$, $u_{t,h}^0=R_hu_t^0$, then we have the optimal-order $L^2$-estimate $\|u_h(t)-u(t)\|=O(h^r)$. We may also choose $u_{t,h}^0$ as any optimal-order (in $L^2$) approximation to $u_t^0$ in $S_h$, e.g. as $Pu_t^0$, where $P$ is the $L^2$-projection onto $S_h$. However, we cannot do the same for $u_h^0$, which must be close to $R_hu^0$ in $H^1$ to $O(h^r)$ to guarantee the optimal-order bound in (7.7).
b. From (7.10) it also follows that for $0\le t\le T$
\[
\|u_{ht}-u_t\|\le\|\theta_t\|+\|\rho_t\|\le C\Big\{\|u_{t,h}^0-R_hu_t^0\|
+\|\nabla(u_h^0-R_hu^0)\|+h^r\Big[\|u_t(t)\|_r+\Big(\int_0^t\|u_{tt}\|_r^2\,ds\Big)^{1/2}\Big]\Big\},
\]
and
\[
\|\nabla(u_h-u)\|\le C\Big\{\|u_{t,h}^0-R_hu_t^0\|
+\|\nabla(u_h^0-R_hu^0)\|+h^{r-1}\Big[\|u(t)\|_r+\Big(\int_0^t\|u_{tt}\|_r^2\,ds\Big)^{1/2}\Big]\Big\}.
\]
The latter inequality implies that initial conditions e.g. of the type $u_h^0=Pu^0$, $u_{t,h}^0=Pu_t^0$ will give an optimal-order estimate for $\|\nabla(u_h-u)\|$.

The following result, due to G. Baker, SIAM J. Numer. Anal. 13(1976), 564-576, relies on a duality argument in time and shows that one may after all get an $L^2$ estimate of optimal order starting with any optimal-order $L^2$ approximation of $u^0$ and $u_t^0$. We state it in a form that also improves Theorem 7.1 with respect to the required regularity of the solution and the dependence of $C$ on $T$.

Theorem 7.2. Let $u$, $u_h$ be the solutions of the ivp's (7.1), (7.4), respectively. Then there exists a constant $C$, independent of $t$ and $h$, such that
\[
\|u_h(t)-u(t)\|\le C\Big\{\|u_h^0-Pu^0\|+t\|u_{t,h}^0-Pu_t^0\|
+h^r\Big[\|u^0\|_r+\int_0^t\|u_t\|_r\,ds\Big]\Big\}. \tag{7.11}
\]
Proof. We put $e:=u_h-u=\theta+\rho$, where $\theta=u_h-R_hu$, $\rho=R_hu-u$. As in the proof of Theorem 7.1 we have $(\theta_{tt},\chi)+(\nabla\theta,\nabla\chi)=-(\rho_{tt},\chi)$, i.e. $(e_{tt},\chi)+(\nabla\theta,\nabla\chi)=0$, for all $\chi\in S_h$, $t\ge 0$. Since $(e_{tt},\chi)=\frac{d}{dt}(e_t,\chi)-(e_t,\chi_t)$, we have for any $\chi\in S_h$, $t>0$,
\[
\frac{d}{dt}(e_t,\chi)+(\nabla\theta,\nabla\chi)=(e_t,\chi_t). \tag{7.12}
\]
Fix $\zeta>0$ and let $\hat\theta(t):=\int_t^\zeta\theta(s)\,ds$, $t\ge 0$. Then $\hat\theta(t)\in S_h$ for $t\ge 0$, $\hat\theta(\zeta)=0$, and $\hat\theta_t=-\theta$. Select $\chi=\hat\theta$ in (7.12) to obtain
\[
\frac{d}{dt}(e_t,\hat\theta)+(\nabla\theta,\nabla\hat\theta)=(e_t,\hat\theta_t)=-(\theta_t,\theta)-(\rho_t,\theta),
\]
i.e., since $(\nabla\theta,\nabla\hat\theta)=-(\nabla\hat\theta_t,\nabla\hat\theta)$,
\[
\frac12\frac{d}{dt}\|\theta\|^2-\frac12\frac{d}{dt}\|\nabla\hat\theta\|^2=-\frac{d}{dt}(e_t,\hat\theta)-(\rho_t,\theta),\quad t\ge 0.
\]
Integrating both sides of the above with respect to $t$ from $0$ to $\zeta$, we obtain
\[
\|\theta(\zeta)\|^2-\|\theta(0)\|^2-\|\nabla\hat\theta(\zeta)\|^2+\|\nabla\hat\theta(0)\|^2
=-2(e_t(\zeta),\hat\theta(\zeta))+2(e_t(0),\hat\theta(0))-2\int_0^\zeta(\rho_t,\theta)\,dt.
\]
Since $\hat\theta(\zeta)=0$ and $\hat\theta(0)\in S_h$, we have for any $\zeta>0$:
\[
\|\theta(\zeta)\|^2\le\|\theta(0)\|^2+2(Pe_t(0),\hat\theta(0))-2\int_0^\zeta(\rho_t,\theta)\,dt.
\]
Hence
\[
\|\theta(\zeta)\|^2\le\|\theta(0)\|^2+2\|Pe_t(0)\|\,\|\hat\theta(0)\|+2\int_0^\zeta\|\rho_t\|\,\|\theta\|\,dt.
\]
We recall that $\hat\theta(0)=\int_0^\zeta\theta(t)\,dt$. Therefore $\|\hat\theta(0)\|\le\int_0^\zeta\|\theta\|\,dt$ and therefore
\[
\|\theta(\zeta)\|^2\le\|\theta(0)\|^2+2\|Pe_t(0)\|\int_0^\zeta\|\theta\|\,dt+2\int_0^\zeta\|\rho_t\|\,\|\theta\|\,dt
\le\sup_{0\le s\le\zeta}\|\theta(s)\|\Big(\|\theta(0)\|+2\zeta\|Pe_t(0)\|+2\int_0^\zeta\|\rho_t\|\,dt\Big).
\]
Since $\zeta$ was arbitrary, the inequality above holds for all $\zeta'\in[0,\zeta]$ in place of $\zeta$. Let $\tau\in[0,\zeta]$ be a point where $\|\theta(\tau)\|=\sup_{0\le s\le\zeta}\|\theta(s)\|$. Applying the inequality for $\zeta=\tau$ we get
\[
\|\theta(\tau)\|^2\le\|\theta(\tau)\|\Big(\|\theta(0)\|+2\tau\|Pe_t(0)\|+2\int_0^\tau\|\rho_t\|\,dt\Big).
\]
Hence, since $\|\theta(\zeta)\|\le\|\theta(\tau)\|$ and $\tau\le\zeta$,
\[
\|\theta(\zeta)\|\le\|\theta(0)\|+2\zeta\|Pe_t(0)\|+2\int_0^\zeta\|\rho_t\|\,dt,
\]
for any $\zeta>0$. Therefore, for $t\ge 0$, using the definition of $\rho$,
\[
\|\theta(t)\|\le\|\theta(0)\|+2t\|Pe_t(0)\|+Ch^r\int_0^t\|u_t\|_r\,ds, \tag{7.13}
\]
and
\[
\|u_h(t)-u(t)\|\le\|\theta(t)\|+\|\rho(t)\|\le\|\theta(t)\|+\|\rho(0)\|+\int_0^t\|\rho_t\|\,ds
\le C\Big\{\|u_h^0-Pu^0\|+t\|u_{t,h}^0-Pu_t^0\|+h^r\Big[\|u^0\|_r+\int_0^t\|u_t\|_r\,ds\Big]\Big\},
\]
which is (7.11). (To get the last inequality of the right-hand side we used (7.13) and the facts that $\|\rho(0)\|=\|R_hu^0-u^0\|\le Ch^r\|u^0\|_r$, $\|\theta(0)\|\le\|u_h^0-Pu^0\|+Ch^r\|u^0\|_r$, and $Pe_t(0)=P(u_{t,h}^0-u_t^0)=u_{t,h}^0-Pu_t^0$.) $\square$

It is evident that (7.11) implies that $\|u(t)-u_h(t)\|=O(h^r)$ if $u_h^0$, $u_{t,h}^0$ are any optimal-order $L^2$-approximations of $u^0$, $u_t^0$ in $S_h$, respectively.
Remark
It is straightforward to check that analogs of Theorems 7.1 and 7.2 hold for the ibvp
\[
\begin{cases}
u_{tt}-\displaystyle\sum_{i,j=1}^d\frac{\partial}{\partial x_i}\Big(a_{ij}(x)\frac{\partial u}{\partial x_j}\Big)+a_0(x)u=f(x,t), & x\in\Omega,\ t\ge 0,\\
u=0\quad\text{on }\partial\Omega,\ t\ge 0,\\
u(x,0)=u^0(x),\quad u_t(x,0)=u_t^0(x), & x\in\Omega,
\end{cases}
\]
where the coefficients $a_{ij}$, $a_0$ satisfy the conditions set forth in Exercise 3 of Section 6.2. L.A. Bales has proved (Math. Comp. 43(1984), 383-414) that similar results hold when $a_{ij}$ and $a_0$ depend also on $t$.

7.3 Fully discrete schemes

In this section we shall examine some simple methods for discretizing in time the semidiscrete Galerkin equations (7.4). As usual we put $t^n=nk$, $n=0,\dots,M$, $Mk=T>0$. We seek $U^n\in S_h$, $0\le n\le M$, approximations of the solution $u$ of (7.1) at $t^n$, satisfying
\[
\begin{cases}
\dfrac{1}{k^2}(U^{n+1}-2U^n+U^{n-1},\chi)+(\nabla U^n_\theta,\nabla\chi)=(f^n_\theta,\chi), & \chi\in S_h,\ 1\le n\le M-1,\\
U^0,\ U^1\ \text{given in } S_h,
\end{cases} \tag{7.14}
\]
where, for $1\le n\le M-1$ and $\theta\ge 0$, $U^n_\theta:=\theta U^{n+1}+(1-2\theta)U^n+\theta U^{n-1}$, $f^n_\theta=\theta f^{n+1}+(1-2\theta)f^n+\theta f^{n-1}$, $f^n=f(\cdot,t^n)$. Given $U^{n-1}$, $U^n$, (7.14) defines uniquely $U^{n+1}\in S_h$ as the solution of the linear system of equations
\[
\big(G+\theta k^2S\big)\alpha^{n+1}=\big(2G-(1-2\theta)k^2S\big)\alpha^n-\big(G+\theta k^2S\big)\alpha^{n-1}+k^2F^n_\theta,
\]
where $U^n=\sum_{i=1}^{N_h}\alpha_i^n\varphi_i$, the basis $\{\varphi_i\}$ and the matrices $G$, $S$ were defined in Section 7.2, and $F^n_\theta$ is the $N_h$-vector with components $(f^n_\theta,\varphi_i)$. If $\theta=0$, (7.14) corresponds to the classical Courant-Friedrichs-Lewy two-step scheme for the wave equation.
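A minimal sketch (not from the notes) of the time-stepping loop that (7.14) produces, for the 1D model problem with P1 elements; the initial data and the starting value $U^1$ are taken as nodal interpolants and a Taylor step, an assumption made here for simplicity in place of the elliptic projections used in the analysis below.

```python
import numpy as np

# theta-scheme (7.14) for u_tt = u_xx on (0,1), u = 0 at the ends;
# theta = 1/4 makes the step unconditionally stable.
N, theta = 63, 0.25
h, k, T = 1.0 / (N + 1), 0.01, 1.0
x = np.linspace(h, 1 - h, N)
G = (np.diag(np.full(N, 4.0)) + np.diag(np.full(N - 1, 1.0), 1)
     + np.diag(np.full(N - 1, 1.0), -1)) * h / 6
S = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
     + np.diag(np.full(N - 1, -1.0), -1)) / h

u0 = np.sin(np.pi * x)                       # u^0 = sin(pi x), u_t^0 = 0
# starting value U^1 from a Taylor expansion (cf. (7.32)), f = 0:
u1 = u0 + 0.5 * k**2 * (-np.pi**2) * u0      # here Delta u0 = -pi^2 u0

A = G + theta * k**2 * S                     # matrix of the linear system
B = 2 * G - (1 - 2 * theta) * k**2 * S
Uprev, U = u0, u1
for n in range(1, int(round(T / k))):        # advance to t = T
    Uprev, U = U, np.linalg.solve(A, B @ U - A @ Uprev)

err = np.max(np.abs(U - np.sin(np.pi * x) * np.cos(np.pi * T)))
```

Since $A=G+\theta k^2S$ does not change with $n$, in practice one factors it once and back-substitutes at every step.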
Taking for simplicity $f=0$ in (7.14) and putting $\chi=U^{n+1}-U^{n-1}$ we obtain for $1\le n\le M-1$
\[
(U^{n+1}-2U^n+U^{n-1},U^{n+1}-U^{n-1})+k^2(\nabla U^n_\theta,\nabla(U^{n+1}-U^{n-1}))=0. \tag{7.15}
\]
Now
\begin{align*}
\big(U^{n+1}-2U^n+U^{n-1},U^{n+1}-U^{n-1}\big)
&=\big((U^{n+1}-U^n)-(U^n-U^{n-1}),(U^{n+1}-U^n)+(U^n-U^{n-1})\big)\\
&=\|U^{n+1}-U^n\|^2-\|U^n-U^{n-1}\|^2.
\end{align*}
In addition,
\begin{align*}
(\nabla U^n_\theta,\nabla(U^{n+1}-U^{n-1}))
&=\theta\big(\|\nabla U^{n+1}\|^2-\|\nabla U^{n-1}\|^2\big)\\
&\quad+(1-2\theta)\big((\nabla U^{n+1},\nabla U^n)-(\nabla U^n,\nabla U^{n-1})\big).
\end{align*}
Hence, summing in (7.15) with respect to $n$ from $n=1$ to $l$, where $1\le l\le M-1$, we see that
\[
\|U^{l+1}-U^l\|^2+\theta k^2\big(\|\nabla U^{l+1}\|^2+\|\nabla U^l\|^2\big)+(1-2\theta)k^2(\nabla U^{l+1},\nabla U^l)
\]
\[
=\|U^1-U^0\|^2+\theta k^2\big(\|\nabla U^1\|^2+\|\nabla U^0\|^2\big)+(1-2\theta)k^2(\nabla U^1,\nabla U^0),
\]
which, motivated by the identity
\[
\theta(x^2+y^2)+(1-2\theta)xy=\frac{(x+y)^2}{4}+\Big(\theta-\frac14\Big)(x-y)^2,
\]
we rewrite as
\[
\|U^{l+1}-U^l\|^2+\frac{k^2}{4}\|\nabla(U^{l+1}+U^l)\|^2+k^2\Big(\theta-\frac14\Big)\|\nabla(U^{l+1}-U^l)\|^2
\]
\[
=\|U^1-U^0\|^2+\frac{k^2}{4}\|\nabla(U^1+U^0)\|^2+k^2\Big(\theta-\frac14\Big)\|\nabla(U^1-U^0)\|^2, \tag{7.16}
\]
which is valid for $0\le l\le M-1$. With the aid of the identity (7.16) we may prove the following stability result for the fully discrete scheme (7.14).

Proposition 7.1. Suppose that there exists a constant $c$, independent of $h$ and $k$, such that
\[
\|U^1-U^0\|\le ck \tag{7.17}
\]
and
\[
\|\nabla(U^1-U^0)\|\le c. \tag{7.18}
\]
Then, there exists a constant $C$, independent of $h$ and $k$, such that for $\theta\ge\frac14$ the solution of (7.14) with $f=0$ satisfies
\[
\max_{0\le n\le M}\|U^n\|\le C. \tag{7.19}
\]
If $0\le\theta<\frac14$ and the inverse inequality (6.39) is valid in $S_h$, then (7.19) holds provided $\frac kh\le\lambda$, where $\lambda$ is some constant depending on the constant $C_*$ of (6.39) and on $\theta$.

Proof. If $\theta\ge\frac14$, (7.16) and (7.17)-(7.18) give, for $l\ge 0$ and $k\le 1$, $\|U^{l+1}-U^l\|\le C$ and $\|\nabla(U^{l+1}+U^l)\|\le C$. Hence, by Poincaré's inequality, $\|U^{l+1}+U^l\|\le C$, $l\ge 0$, and (7.19) follows since $U^n=\Big(\dfrac{U^{n+1}+U^n}{2}\Big)-\Big(\dfrac{U^{n+1}-U^n}{2}\Big)$.
If $0\le\theta<\frac14$, (7.16) and (7.17)-(7.18) give for $l\ge 0$
\[
\|U^{l+1}-U^l\|^2+\frac{k^2}{4}\|\nabla(U^{l+1}+U^l)\|^2\le k^2\Big(\frac14-\theta\Big)\|\nabla(U^{l+1}-U^l)\|^2+Ck^2.
\]
Therefore, by the inverse inequality (6.39) we have
\[
\|U^{l+1}-U^l\|^2+\frac{k^2}{4}\|\nabla(U^{l+1}+U^l)\|^2\le\frac{k^2}{h^2}\Big(\frac14-\theta\Big)C_*^2\|U^{l+1}-U^l\|^2+Ck^2,
\]
i.e.
\[
\Big[1-C_*^2\frac{k^2}{h^2}\Big(\frac14-\theta\Big)\Big]\|U^{l+1}-U^l\|^2+\frac{k^2}{4}\|\nabla(U^{l+1}+U^l)\|^2\le Ck^2,\quad l\ge 0.
\]
Hence, if
\[
\frac kh\le\lambda<\frac{2}{C_*\sqrt{1-4\theta}} \tag{7.20}
\]
we have that $\|U^{l+1}-U^l\|\le C(\lambda)k$ and (7.19) follows. $\square$

As we remarked in Section 6.4, a more general sufficient stability condition is
\[
k\big[\lambda_{\max}(G^{-1}S)\big]^{1/2}<\frac{2}{\sqrt{1-4\theta}}. \tag{7.21}
\]
The inverse inequality (6.39) and (7.20) imply (7.21).
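The condition (7.21) is easy to check numerically. A sketch (an illustration, not from the notes) for P1 elements in one dimension, where $\lambda_{\max}(G^{-1}S)\approx 12/h^2$, so that for $\theta=0$ the condition amounts to $k/h<1/\sqrt3$:

```python
import numpy as np

# Check of the stability condition (7.21) for 1D P1 elements.
N = 63
h = 1.0 / (N + 1)
G = (np.diag(np.full(N, 4.0)) + np.diag(np.full(N - 1, 1.0), 1)
     + np.diag(np.full(N - 1, 1.0), -1)) * h / 6
S = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
     + np.diag(np.full(N - 1, -1.0), -1)) / h

# largest eigenvalue of the pencil S v = lambda G v
lam_max = np.max(np.linalg.eigvals(np.linalg.solve(G, S)).real)

theta = 0.0                      # explicit (CFL) scheme
# largest stable time step according to (7.21)
k_crit = 2.0 / (np.sqrt(1.0 - 4.0 * theta) * np.sqrt(lam_max))
```

For this mesh one finds $\lambda_{\max}h^2\approx 12$ and $k_{\rm crit}\approx h/\sqrt3$, consistent with (7.20)-(7.21).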


We proceed now to derive $L^2$-error estimates for the scheme (7.14), following e.g. Dupont, op. cit. For simplicity we treat only the case $\theta=\frac14$, i.e. the smallest value of $\theta$ for which (7.19) holds unconditionally. The proof for the other $\theta\ge 0$ follows along similar lines. The basic step of the proof is the following result.

Proposition 7.2. Let $u$ be the solution of (7.1), $U^n$ the solution of the scheme (7.14) for $\theta=\frac14$, and $\zeta^n=U^n-R_hu^n$, where $u^n=u(t^n)$. Then, there exists a constant $C$ independent of $h$, $k$ and $T$ such that for $0\le n\le M-1$
\[
\max_{0\le l\le n}\Big(\frac1k\|\zeta^{l+1}-\zeta^l\|+\|\nabla(\zeta^{l+1}+\zeta^l)\|\Big)
\le C\Big\{\frac1k\|\zeta^1-\zeta^0\|+\|\nabla(\zeta^1+\zeta^0)\|
\]
\[
+T^{1/2}\Big[h^r\Big(\int_0^{t^{n+1}}\|u_{tt}\|_r^2\,ds\Big)^{1/2}+k^2\Big(\int_0^{t^{n+1}}\|\partial_t^4u\|^2\,ds\Big)^{1/2}\Big]\Big\}. \tag{7.22}
\]

Proof. For $1\le n\le M-1$ we denote $\partial^2g^n=\frac{1}{k^2}(g^{n+1}-2g^n+g^{n-1})$, $\bar g^n=g^n_{1/4}=\frac14(g^{n+1}+2g^n+g^{n-1})$. If $U^n-u^n=(U^n-R_hu^n)+(R_hu^n-u^n)=\zeta^n+\rho^n$, using (7.14) with $\theta=1/4$ and (7.1), we have for $1\le n\le M-1$, $\chi\in S_h$,
\begin{align*}
(\partial^2\zeta^n,\chi)+(\nabla\bar\zeta^n,\nabla\chi)
&=(f^n_{1/4},\chi)-(\partial^2(R_hu^n),\chi)-(\nabla\bar u^n,\nabla\chi)\\
&=(\bar u^n_{tt}-\partial^2(R_hu^n),\chi)\\
&=(\partial^2u^n-\partial^2(R_hu^n)+\bar u^n_{tt}-\partial^2u^n,\chi)\\
&=(-\partial^2\rho^n+\omega^n,\chi),
\end{align*}
where $\omega^n:=\bar u^n_{tt}-\partial^2u^n$. Therefore, for any $\chi\in S_h$,
\[
(\zeta^{n+1}-2\zeta^n+\zeta^{n-1},\chi)+k^2(\nabla\bar\zeta^n,\nabla\chi)=k^2(-\partial^2\rho^n+\omega^n,\chi),\quad 1\le n\le M-1. \tag{7.23}
\]

We take now $\chi=\zeta^{n+1}-\zeta^{n-1}$ and sum both sides of (7.23) with respect to $n$ from $n=1$ to $l$, for $1\le l\le M-1$. As in the proof of Proposition 7.1 we obtain
\[
\|\zeta^{l+1}-\zeta^l\|^2+\frac{k^2}{4}\|\nabla(\zeta^{l+1}+\zeta^l)\|^2
=\|\zeta^1-\zeta^0\|^2+\frac{k^2}{4}\|\nabla(\zeta^1+\zeta^0)\|^2
+k^2\sum_{n=1}^l(-\partial^2\rho^n+\omega^n,\zeta^{n+1}-\zeta^{n-1}). \tag{7.24}
\]
For any $\varepsilon>0$ we have
\begin{align*}
k^2\sum_{n=1}^l(-\partial^2\rho^n+\omega^n,\zeta^{n+1}-\zeta^{n-1})
&\le k^2\sum_{n=1}^l\|{-\partial^2\rho^n+\omega^n}\|\,\|\zeta^{n+1}-\zeta^{n-1}\|\\
&\le\frac{k^2}{2\varepsilon}\sum_{n=1}^l\|{-\partial^2\rho^n+\omega^n}\|^2+\frac{\varepsilon k^2}{2}\sum_{n=1}^l\|\zeta^{n+1}-\zeta^{n-1}\|^2\\
&\le\frac{k^2}{\varepsilon}\sum_{n=1}^l\|\partial^2\rho^n\|^2+\frac{k^2}{\varepsilon}\sum_{n=1}^l\|\omega^n\|^2
+\frac{\varepsilon k^2}{2}\sum_{n=1}^l\big(\|\zeta^{n+1}-\zeta^n\|+\|\zeta^n-\zeta^{n-1}\|\big)^2\\
&\le\frac{k^2}{\varepsilon}\sum_{n=1}^l\|\partial^2\rho^n\|^2+\frac{k^2}{\varepsilon}\sum_{n=1}^l\|\omega^n\|^2
+2\varepsilon k^2\sum_{n=0}^l\|\zeta^{n+1}-\zeta^n\|^2. \tag{7.25}
\end{align*}

We estimate now the first two terms in the right-hand side of the above. By Taylor's theorem we have
\[
\rho^{n+1}=\rho^n+k\rho_t^n+\int_{t^n}^{t^{n+1}}(t^{n+1}-\tau)\rho_{tt}(\tau)\,d\tau,
\]
\[
\rho^{n-1}=\rho^n-k\rho_t^n+\int_{t^{n-1}}^{t^n}(\tau-t^{n-1})\rho_{tt}(\tau)\,d\tau.
\]
Therefore
\[
\partial^2\rho^n=\frac{1}{k^2}\Big[\int_{t^n}^{t^{n+1}}(t^{n+1}-\tau)\rho_{tt}(\tau)\,d\tau
+\int_{t^{n-1}}^{t^n}(\tau-t^{n-1})\rho_{tt}(\tau)\,d\tau\Big].
\]
Since
\[
\Big\|\int_{t^n}^{t^{n+1}}(t^{n+1}-\tau)\rho_{tt}\,d\tau\Big\|^2
\le\int_{t^n}^{t^{n+1}}(t^{n+1}-\tau)^2\,d\tau\int_{t^n}^{t^{n+1}}\|\rho_{tt}\|^2\,d\tau
=\frac{k^3}{3}\int_{t^n}^{t^{n+1}}\|\rho_{tt}\|^2\,d\tau,
\]
we have
\[
\|\partial^2\rho^n\|^2\le\frac Ck\int_{t^{n-1}}^{t^{n+1}}\|\rho_{tt}\|^2\,ds,
\]
and we conclude by the definition of $\rho$ that
\[
\sum_{n=1}^l\|\partial^2\rho^n\|^2\le\frac Ck\int_0^{t^{l+1}}\|\rho_{tt}\|^2\,ds\le Ck^{-1}h^{2r}\int_0^{t^{l+1}}\|u_{tt}\|_r^2\,ds.
\]

Now
\[
\omega^n=\bar u^n_{tt}-\partial^2u^n=(\bar u^n_{tt}-u^n_{tt})+(u^n_{tt}-\partial^2u^n)=:\omega_1^n+\omega_2^n.
\]
For $\omega_1^n$ by Taylor's theorem we have
\[
\omega_1^n=\frac14\Big[\int_{t^n}^{t^{n+1}}(t^{n+1}-s)\partial_t^4u\,ds+\int_{t^{n-1}}^{t^n}(s-t^{n-1})\partial_t^4u\,ds\Big].
\]
Hence $\|\omega_1^n\|^2\le ck^3\int_{t^{n-1}}^{t^{n+1}}\|\partial_t^4u\|^2\,ds$ and
\[
\sum_{n=1}^l\|\omega_1^n\|^2\le ck^3\int_0^{t^{l+1}}\|\partial_t^4u\|^2\,ds.
\]
Similarly,
\[
\sum_{n=1}^l\|\omega_2^n\|^2\le ck^3\int_0^{t^{l+1}}\|\partial_t^4u\|^2\,ds.
\]
We conclude that
\[
k^2\Big(\sum_{n=1}^l\|\partial^2\rho^n\|^2+\sum_{n=1}^l\|\omega^n\|^2\Big)
\le ck\Big[h^{2r}\int_0^{t^{l+1}}\|u_{tt}\|_r^2\,ds+k^4\int_0^{t^{l+1}}\|\partial_t^4u\|^2\,ds\Big]
=:ck\Lambda^l,\quad 1\le l\le M-1. \tag{7.26}
\]

Therefore, by (7.24)-(7.26) we obtain for $0\le l\le M-1$
\begin{align*}
\|\zeta^{l+1}-\zeta^l\|^2+\frac{k^2}{4}\|\nabla(\zeta^{l+1}+\zeta^l)\|^2
&\le\|\zeta^1-\zeta^0\|^2+\frac{k^2}{4}\|\nabla(\zeta^1+\zeta^0)\|^2\\
&\quad+\frac{ck}{\varepsilon}\Lambda^l+2\varepsilon k^2\sum_{n=0}^l\Big(\|\zeta^{n+1}-\zeta^n\|^2+\frac{k^2}{4}\|\nabla(\zeta^{n+1}+\zeta^n)\|^2\Big)\\
&\le\|\zeta^1-\zeta^0\|^2+\frac{k^2}{4}\|\nabla(\zeta^1+\zeta^0)\|^2+\frac{ck}{\varepsilon}\Lambda^l\\
&\quad+2\varepsilon kT\max_{0\le n\le l}\Big(\|\zeta^{n+1}-\zeta^n\|^2+\frac{k^2}{4}\|\nabla(\zeta^{n+1}+\zeta^n)\|^2\Big),
\end{align*}
since $(l+1)k\le T$. Choose now $\varepsilon=\frac{1}{4Tk}$. Then, for $0\le l\le M-1$,
\[
\|\zeta^{l+1}-\zeta^l\|^2+\frac{k^2}{4}\|\nabla(\zeta^{l+1}+\zeta^l)\|^2
\le\|\zeta^1-\zeta^0\|^2+\frac{k^2}{4}\|\nabla(\zeta^1+\zeta^0)\|^2
\]
\[
+CTk^2\Lambda^l+\frac12\max_{0\le n\le l}\Big(\|\zeta^{n+1}-\zeta^n\|^2+\frac{k^2}{4}\|\nabla(\zeta^{n+1}+\zeta^n)\|^2\Big). \tag{7.27}
\]
172
Fix l for the moment and let m, 0 m l, be an integer for which
( )
k2 k2
max n+1
+ (
n 2 n+1
+ ) = m+1 m 2 + (m+1 + m )2 .
n 2
0nl 4 4

Then, from (7.27), since n is increasing with n

k2 k2
m+1 m 2 + (m+1 m )2 1 0 2 + (1 + 0 )2
4 ( 4 )
2
1 k
2
+CT k l + m+1
+ (
m 2 m+1 m 2
+ ) .
2 4

Therefore

k2 k2
m+1 m 2 + (m+1 m )2 2[1 0 2 + (1 + 0 )2 ] + CT k 2 l .
4 4

But

k2 k2
l+1 l 2 + (l+1 + l )2 m+1 m 2 + (m+1 + m )2 .
4 4

Hence,

k2 k2
l+1
+ ( + ) C[ + (1 + 0 )2 + T k 2 l ].
l 2 l+1 l 2 1 0 2
4 4

Dividing both sides of the above by k 2 and taking square roots we obtain (7.22) in
view of (7.26).

Exercise 1. Using the technique of the proof of Proposition 7.1 show that an estimate of the form (7.22) holds for the solution of (7.14) for any $\theta\ge 0$, provided the stability condition (7.20) holds if $0\le\theta<\frac14$.

Exercise 2. The scheme (7.14) in the case $\theta=\frac{1}{12}$ is known as the Störmer-Numerov method. Show that this scheme is fourth-order accurate in time. Specifically, prove that for $\theta=\frac{1}{12}$
\[
\sum_{n=1}^l\|\omega^n\|^2\le ck^7\int_0^{t^{l+1}}\|\partial_t^8u\|^2\,ds,
\]
and consequently that the last term in the right-hand side of (7.22) is of $O(k^4)$, provided (7.20) holds with $\theta=\frac{1}{12}$. (The Störmer-Numerov scheme is the only fourth-order accurate in $k$ scheme of the family (7.14). For all other $\theta\ge 0$ the temporal truncation error is of $O(k^2)$.)
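A quick scalar check (an illustration, not from the notes) of these temporal orders: apply the scheme (7.14), without any space discretization, to $y''=-y$, $y(0)=1$, $y'(0)=0$, whose solution is $\cos t$, and estimate the convergence rate by halving $k$.

```python
import math

def theta_scheme_error(theta, k, T):
    # (y^{n+1} - 2y^n + y^{n-1})/k^2
    #   + theta*y^{n+1} + (1-2*theta)*y^n + theta*y^{n-1} = 0
    a = 1.0 + theta * k * k
    b = 2.0 - (1.0 - 2.0 * theta) * k * k
    yprev, y = 1.0, math.cos(k)        # exact starting values y^0, y^1
    for _ in range(1, round(T / k)):
        yprev, y = y, (b * y - a * yprev) / a
    return abs(y - math.cos(T))

rates = {}
for theta in (0.25, 1.0 / 12.0):
    e1 = theta_scheme_error(theta, 0.1, 1.0)
    e2 = theta_scheme_error(theta, 0.05, 1.0)
    rates[theta] = math.log2(e1 / e2)
# rates[0.25] is close to 2; rates[1/12] (Stormer-Numerov) close to 4
```

Exact starting values are used so that the observed rate reflects the truncation error of the scheme itself and not of the starting procedure.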
We present now some straightforward implications of Proposition 7.2.

Proposition 7.3. Let the hypotheses of Proposition 7.2 hold and assume in addition that for some constant $C$ independent of $h$ and $k$ we have
\[
\frac1k\|\zeta^1-\zeta^0\|+\|\nabla(\zeta^1+\zeta^0)\|\le C(k^2+h^r). \tag{7.28}
\]
Then, there is a constant $C=C(u,T)$ such that
\begin{align*}
&\text{(i)}\quad\Big\|\frac1k(U^{n+1}-U^n)-u_t(t^{n+1/2})\Big\|\le C(k^2+h^r),\quad t^{n+1/2}=t^n+\frac k2,\quad 0\le n\le M-1,\\
&\text{(ii)}\quad\Big\|\nabla\Big(\frac12(U^{n+1}+U^n)-u(t^{n+1/2})\Big)\Big\|\le C(k^2+h^{r-1}),\quad 0\le n\le M-1,\\
&\text{(iii)}\quad\max_{0\le n\le M}\|U^n-u^n\|\le C(k^2+h^r).
\end{align*}

Proof. (i). Let $0\le n\le M-1$. Since
\begin{align*}
\frac1k(U^{n+1}-U^n)-u_t(t^{n+1/2})
&=\frac1k(\zeta^{n+1}-\zeta^n)+\frac1kR_h(u^{n+1}-u^n)-u_t(t^{n+1/2})\\
&=\frac1k(\zeta^{n+1}-\zeta^n)+\Big(R_h\Big(\frac{u^{n+1}-u^n}{k}\Big)-\frac{u^{n+1}-u^n}{k}\Big)+\Big(\frac{u^{n+1}-u^n}{k}-u_t(t^{n+1/2})\Big)\\
&=\frac1k(\zeta^{n+1}-\zeta^n)+\frac1k\int_{t^n}^{t^{n+1}}(R_hu_t-u_t)\,ds+\Big(\frac1k(u^{n+1}-u^n)-u_t(t^{n+1/2})\Big),
\end{align*}
we have
\[
\Big\|\frac1k(U^{n+1}-U^n)-u_t(t^{n+1/2})\Big\|
\le\frac1k\|\zeta^{n+1}-\zeta^n\|+Ch^r\max_{t^n\le s\le t^{n+1}}\|u_t(s)\|_r
+Ck^2\max_{t^n\le s\le t^{n+1}}\|u_{ttt}(s)\|,
\]
and (i) follows from (7.22) and (7.28).

(ii). Let $0\le n\le M-1$. Since
\[
\frac12(U^{n+1}+U^n)-u(t^{n+1/2})=\frac12(\zeta^{n+1}+\zeta^n)+\frac12\big(R_h(u^{n+1}+u^n)-(u^{n+1}+u^n)\big)+\Big(\frac12(u^{n+1}+u^n)-u(t^{n+1/2})\Big),
\]
we have
\[
\Big\|\nabla\Big(\frac12(U^{n+1}+U^n)-u(t^{n+1/2})\Big)\Big\|
\le\frac12\|\nabla(\zeta^{n+1}+\zeta^n)\|+ch^{r-1}\big(\|u^{n+1}\|_r+\|u^n\|_r\big)
+Ck^2\max_{t^n\le s\le t^{n+1}}\|\nabla u_{tt}(s)\|,
\]
and (ii) follows from (7.22) and (7.28).
(iii). Since
\[
U^l-u^l=\zeta^l+\rho^l,\quad 0\le l\le M,
\]
we have
\[
\|U^l-u^l\|\le\|\zeta^l\|+Ch^r\|u^l\|_r,\quad 0\le l\le M. \tag{7.29}
\]
But for $0\le n\le M-1$, $\zeta^{n+1}=\dfrac{\zeta^{n+1}+\zeta^n}{2}+\dfrac k2\Big(\dfrac{\zeta^{n+1}-\zeta^n}{k}\Big)$. Hence by (7.22), (7.28) and Poincaré's inequality
\[
\|\zeta^{n+1}\|\le\frac12\|\zeta^{n+1}+\zeta^n\|+\frac k2\cdot\frac1k\|\zeta^{n+1}-\zeta^n\|
\le C\|\nabla(\zeta^{n+1}+\zeta^n)\|+\frac k2\cdot\frac1k\|\zeta^{n+1}-\zeta^n\|\le C(k^2+h^r),
\]
for $0\le n\le M-1$. In addition, since $\zeta^0=\dfrac{\zeta^1+\zeta^0}{2}-\dfrac k2\Big(\dfrac{\zeta^1-\zeta^0}{k}\Big)$, we have by the Poincaré inequality and (7.28)
\[
\|\zeta^0\|\le\frac12\|\zeta^1+\zeta^0\|+\frac k2\cdot\frac1k\|\zeta^1-\zeta^0\|
\le\frac C2\|\nabla(\zeta^1+\zeta^0)\|+\frac k2\cdot\frac1k\|\zeta^1-\zeta^0\|\le C(k^2+h^r).
\]
Therefore (iii) follows from (7.29) and these estimates. $\square$

We turn now to the matter of choosing $U^0$ and $U^1$ in $S_h$ so that (7.28) holds. We choose first
\[
U^0=R_hu^0. \tag{7.30}
\]
This implies that $\zeta^0=0$; hence, $U^1$ must be chosen so that
\[
\frac1k\|\zeta^1\|+\|\nabla\zeta^1\|\le C(k^2+h^r). \tag{7.31}
\]
A straightforward way of doing this is using Taylor expansions: Let
\[
u^*(x)=u^0(x)+ku_t^0(x)+\frac{k^2}{2}\big(\Delta u^0(x)+f(x,0)\big),\quad x\in\Omega. \tag{7.32}
\]
Since by (7.1) $u_{tt}=\Delta u+f$, it is clear that $u^*\big|_{\partial\Omega}=0$ and that we have
\[
u^*=u(k)+O(k^3),\quad \nabla u^*=\nabla u(k)+O(k^3).
\]
We let
\[
U^1=R_hu^*. \tag{7.33}
\]
Then by Poincaré's inequality
\[
\|\zeta^1\|=\|U^1-R_hu(k)\|=\|R_h(u^*-u(k))\|\le C\|\nabla R_h(u^*-u(k))\|\le C\|\nabla(u^*-u(k))\|\le Ck^3,
\]
and
\[
\|\nabla\zeta^1\|=\|\nabla R_h(u^*-u(k))\|\le\|\nabla(u^*-u(k))\|\le Ck^3.
\]
Therefore $\frac1k\|\zeta^1\|+\|\nabla\zeta^1\|\le Ck^2$ and (7.31) holds. It is then straightforward to check that the initial conditions (7.30) and (7.33) satisfy the hypotheses (7.17) and (7.18) for the $L^2$-stability of the scheme (7.14). Indeed, we have, by Poincaré's inequality, that
\[
\|U^1-U^0\|=\|R_h(u^*-u^0)\|\le C\|\nabla(u^*-u^0)\|\le Ck,
\]
and
\[
\|\nabla(U^1-U^0)\|=\|\nabla R_h(u^*-u^0)\|\le\|\nabla(u^*-u^0)\|\le C.
\]
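A numerical sanity check (an illustration, not from the notes): for the exact solution $u=\sin(\pi x)\cos(\pi t)$ of the homogeneous wave equation, the Taylor starting value (7.32) approximates $u(\cdot,k)$ to at least $O(k^3)$.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)
u0 = np.sin(np.pi * x)                     # u(x,0)
ut0 = np.zeros_like(x)                     # u_t(x,0)
lap_u0 = -np.pi**2 * np.sin(np.pi * x)     # Delta u0, and f = 0

def start_err(k):
    ustar = u0 + k * ut0 + 0.5 * k**2 * lap_u0        # formula (7.32)
    return np.max(np.abs(ustar - np.sin(np.pi * x) * np.cos(np.pi * k)))

e1, e2 = start_err(0.1), start_err(0.05)
rate = np.log2(e1 / e2)
# rate is 4 here rather than the generic 3, because u_ttt(x,0) = 0
# for these time-symmetric initial data
```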

Exercise 3. Consider the Störmer-Numerov method (see Exercise 2). Take $U^0=R_hu^0$ and $U^1=R_hu^*$, where
\[
u^*=u^0+ku_t^0+\frac{k^2}{2}u_{tt}(0)+\frac{k^3}{3!}\partial_t^3u(0)+\frac{k^4}{4!}\partial_t^4u(0).
\]
Prove that with these choices one has
\[
\frac1k\|\zeta^1-\zeta^0\|+\|\nabla(\zeta^1+\zeta^0)\|\le Ck^4, \tag{7.34}
\]
so that by Exercises 1 and 2 and Proposition 7.3(iii) one obtains for the Störmer-Numerov scheme the error estimate
\[
\max_{0\le n\le M}\|U^n-u^n\|\le C(k^4+h^r),
\]
provided (7.20) holds for $\theta=\frac{1}{12}$. (Note that for the wave equation,
\[
\partial_t^2u(0)=\Delta u^0+f(0),\quad \partial_t^3u(0)=\Delta u_t^0+f_t(0),
\]
\[
\partial_t^4u(0)=\partial_t^2(\Delta u+f)\big|_{t=0}=\Delta^2u^0+\Delta f(0)+f_{tt}(0),
\]
so that, in principle, $u^*$ can be computed from the data of (7.1).) Finally show that these choices of $U^1$ and $U^0$ also satisfy the estimates (7.17) and (7.18).

Remarks
a. The choice $U^1=R_hu^*$ has the disadvantage that it needs the computation of $\Delta u^0$. Alternatively (see Dougalis and Serbin, Comput. Math. Appl. 7(1981), 261-279), one may compute initial conditions for the scheme (7.14) for $\theta\ne\frac{1}{12}$ as follows. (For simplicity we consider only the homogeneous equation, $f=0$.) Take again $U^0=R_hu^0$ and compute $U^1\in S_h$ as the solution of the problem
\[
(U^1,\chi)+\theta k^2(\nabla U^1,\nabla\chi)=(U^0,\chi)+\Big(\theta-\frac12\Big)k^2(\nabla U^0,\nabla\chi)+k(u_t^0,\chi),\quad\chi\in S_h.
\]
It can be shown that for this choice of $U^0$ and $U^1$, (7.28) and (7.17)-(7.18) hold.
b. In the case of the Störmer-Numerov method ($\theta=\frac{1}{12}$) the choice $U^1=R_hu^*$ (see Exercise 3 above) requires computing $\Delta u^0$, $\Delta u_t^0$, $\Delta^2u^0$, $f_t(0)$, $f_{tt}(0)$, $\Delta f(0)$, which may be difficult or impossible in practice. A more efficient way (see Dougalis and Serbin, op. cit.) of computing $U^0$, $U^1$, so that (7.34) and (7.17)-(7.18) hold, is the following. (We take $f=0$ for simplicity.) Define again $U^0=R_hu^0$ and compute successively $U^{0,1}$, $U^{0,2}$, and $U^1$ in $S_h$ by the equations
\[
(U^{0,1},\chi)+\frac{k^2}{12}(\nabla U^{0,1},\nabla\chi)=6(U^0,\chi)+\frac{k^2}{2}(\nabla U^0,\nabla\chi)+k(u_t^0,\chi),\quad\chi\in S_h,
\]
\[
(U^{0,2},\chi)+\frac{k^2}{12}(\nabla U^{0,2},\nabla\chi)=(U^{0,1},\chi),\quad\chi\in S_h,
\]
\[
U^1=U^{0,2}-5U^0.
\]

c. For higher-order accurate in time full discretizations of the second-order semidiscrete problem (7.4) one may use e.g. the so-called cosine methods (cf. e.g. Baker et al., Numer. Math. 35(1980), 127-142, and RAIRO Anal. Numér. 13(1979), 201-226), extended to problems with time-dependent coefficients by Bales et al., Math. Comp. 45(1985), 65-89, and to nonlinear problems by Bales and Dougalis, Math. Comp. 52(1989), 299-319, and Makridakis, Comput. Math. Appl. 19(1990), 19-34, or linear multistep methods (cf. e.g. Dougalis, Math. Comp. 33(1979), 563-584), etc.

Another class of fully discrete Galerkin methods for the wave equation is obtained by writing (7.1) as a first-order system in the temporal variable:
\[
\begin{cases}
q_t=\Delta u+f, & x\in\Omega,\ t\ge 0,\\
u_t=q, & x\in\Omega,\ t\ge 0,\\
u=0,\quad q=0, & x\in\partial\Omega,\ t\ge 0,\\
u(x,0)=u^0(x),\quad q(x,0)=u_t^0(x), & x\in\Omega.
\end{cases} \tag{7.35}
\]
We may then consider the Galerkin semidiscretization of (7.35) and its temporal discretization by single-step or multistep schemes. For example, consider the trapezoidal method in which we seek $U^n$, $Q^n$ in $S_h$ for $0\le n\le M$ satisfying
\[
U^0=Pu^0,\quad Q^0=Pu_t^0,
\]
and for $n=0,1,\dots,M-1$:
\[
\begin{cases}
\Big(\dfrac{Q^{n+1}-Q^n}{k},\chi\Big)+\Big(\nabla\dfrac{U^{n+1}+U^n}{2},\nabla\chi\Big)=\dfrac12(f^{n+1}+f^n,\chi), & \chi\in S_h,\\[4pt]
\dfrac1k(U^{n+1}-U^n)=\dfrac12(Q^{n+1}+Q^n).
\end{cases}
\]
(Here $P$ is the $L^2$-projection onto $S_h$.) The scheme is easily solvable for $Q^{n+1}$ by substituting $U^{n+1}$ from the second equation into the first one. Taking $\chi=U^{n+1}-U^n$ in the first equation (with $f=0$) and the $L^2$-inner product of both sides of the second equation with $Q^{n+1}-Q^n$ we easily obtain the stability estimate
\[
\|Q^n\|^2+\|\nabla U^n\|^2=\|Q^0\|^2+\|\nabla U^0\|^2,
\]
that may be viewed as a discrete analog of (7.2). Baker, SIAM J. Numer. Anal. 13(1976), 564-576, has analyzed the convergence of the scheme and shown that $\max_n\|U^n-u^n\|=O(k^2+h^r)$. Higher-order temporal discretizations for the Galerkin semidiscretization of (7.35) have been studied by Baker and Bramble, RAIRO Anal. Numér. 13(1979), 75-100, and extended by Bales (Math. Comp. 43(1984), 383-414, SIAM J. Numer. Anal. 23(1986), 27-43, Comput. Math. Appl. 12A(1986), 581-604) to the case of equations with time-dependent coefficients and nonlinear terms.
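A sketch (not from the notes) of the trapezoidal scheme for the matrix form of the semidiscrete system, again for 1D P1 elements with $f=0$ and nodal-interpolant initial data; eliminating $U^{n+1}$ gives one linear solve per step, and the discrete energy above is conserved exactly in exact arithmetic.

```python
import numpy as np

# Trapezoidal time stepping for the first-order system (7.35), f = 0:
#   G (Q^{n+1}-Q^n)/k + S (U^{n+1}+U^n)/2 = 0,
#   (U^{n+1}-U^n)/k = (Q^{n+1}+Q^n)/2.
N = 63
h, k = 1.0 / (N + 1), 0.01
x = np.linspace(h, 1 - h, N)
G = (np.diag(np.full(N, 4.0)) + np.diag(np.full(N - 1, 1.0), 1)
     + np.diag(np.full(N - 1, 1.0), -1)) * h / 6
S = (np.diag(np.full(N, 2.0)) + np.diag(np.full(N - 1, -1.0), 1)
     + np.diag(np.full(N - 1, -1.0), -1)) / h

U, Q = np.sin(np.pi * x), np.zeros(N)   # coefficients of U^0, Q^0

# substituting U^{n+1} = U^n + (k/2)(Q^{n+1}+Q^n) into the first
# equation yields  (G + (k^2/4) S) Q^{n+1} = G Q^n - k S U^n - (k^2/4) S Q^n
A = G + (k**2 / 4) * S
energy0 = Q @ G @ Q + U @ S @ U         # discrete energy ||Q||^2 + ||grad U||^2
for n in range(100):
    Qnew = np.linalg.solve(A, G @ Q - k * (S @ U) - (k**2 / 4) * (S @ Q))
    U = U + (k / 2) * (Qnew + Q)
    Q = Qnew
energy = Q @ G @ Q + U @ S @ U
```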

References

INTRODUCTORY

[1.1] G. Strang and G.J. Fix, An Analysis of the Finite Element Method, Prentice Hall
1973.

[1.2] M.H. Schultz, Spline Analysis, Prentice Hall 1973.

[1.3] C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, Cambridge University Press 1989.

SOBOLEV SPACES AND PDEs

[2.1] R.A. Adams, Sobolev Spaces, Academic Press 1975.

[2.2] H. Brezis, Analyse fonctionnelle, théorie et applications, Masson, Paris 1983. (Greek translation, University Press N.T.U.A., 1997)

[2.3] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer 2011.

[2.4] L.C. Evans, Partial Differential Equations, American Math. Society, Providence 1998.

[2.5] H. Triebel, Höhere Analysis, VEB, Berlin 1972.

BOOKS EMPHASIZING THE MATHEMATICAL THEORY

[3.1] A.K. Aziz (editor), The Mathematical Foundations of the Finite Element Method with Applications to Partial Differential Equations, Academic Press 1972.

[3.2] S.C. Brenner and L.R. Scott, The Mathematical Theory of Finite Element Methods, Springer-Verlag, New York 1994; 2nd ed. 2002.

[3.3] P.G. Ciarlet, The Finite Element Method for Elliptic Problems, North Holland
1978. (Reprinted SIAM, 2002)

[3.4] P.G. Ciarlet and J.L. Lions, editors. Handbook of Numerical Analysis Vol. 2, Pt.
1, Elsevier 1991.

[3.5] P.A. Raviart et J.M. Thomas, Introduction à l'Analyse Numérique des équations aux dérivées partielles, Masson, Paris 1983.

[3.6] V. Thomée, Galerkin Finite Element Methods for Parabolic Problems, Springer-Verlag 1997.

[3.7] S. Larsson and V. Thomée, Partial Differential Equations with Numerical Methods, Springer-Verlag 2009.

FINITE ELEMENTS IN ENGINEERING

[4.1] T.J.R. Hughes, The Finite Element Method, Dover 2000.

[4.2] O.C. Zienkiewicz, The Finite Element Method, 3rd ed., McGraw Hill 1977.

[4.3] M. Bernadou et al., MODULEF, a Modular Library of Finite Elements, INRIA, France 1988.

SPLINES

[5.1] C. de Boor, A Practical Guide to Splines, Springer-Verlag 1978.

NUMERICAL ANALYSIS AND ODEs (In Greek)

[6.1] G.D. Akrivis and V.A. Dougalis, Introduction to Numerical Analysis, Crete University Press, 4th revised edition 2010. (In Greek).

[6.2] G.D. Akrivis and V.A. Dougalis, Numerical Methods for Ordinary Differential Equations, Crete University Press 2006. (In Greek).

[6.3] V.A. Dougalis, Numerical Analysis: Lecture Notes for a graduate-level course,
1987. (In Greek).

