
CSC 2702

Required textbook: Numerical Analysis, Burden & Faires, 8th edition, Thomson Brooks/Cole

Dr Azeddine M
KICT, CS, IIUM

October 12, 2009


Contents

1 Mathematical Preliminaries
1.1 Review of Calculus
1.1.1 Exercises
1.2 Round-off Errors
1.2.1 Exercises
2.1 Bisection Method
2.2 Fixed-Point Iteration
2.3 Newton's Method
2.4 Secant Method
2.5 False Position Method
3.1 Linearly and Quadratically Convergent Procedures
3.2 Zero Multiplicity
3.3 Exercises
4 Accelerating Convergence
4.1 Aitken's ∆² Method
4.2 Steffensen's Method
4.3 Zeros of Polynomials
4.4 Horner's Method
4.5 Deflation
4.6 Müller's Method
4.7 Exercises
5.1 Weierstrass Approximation Theorem
5.2 Lagrange Polynomial
5.3 Neville's Method
5.4 Newton Interpolating Polynomial
5.5 Polynomial Forms
5.6 Spline Interpolation
5.7 Parametric Curves
6 Numerical Differentiation and Integration
6.1 Numerical Differentiation
6.2 Richardson's Extrapolation
6.3 Elements of Numerical Integration
6.3.1 Trapezoidal rule
6.3.2 Simpson's rule
6.3.3 Degree of precision
6.3.4 Newton-Cotes Formula
6.4 Composite Numerical Integration
6.5 Adaptive Quadrature Methods
6.6 Gaussian Quadrature
6.7 Improper Integrals
7.1 Introduction
7.2 Elementary Theory of Initial-Value Problems
7.3 Euler's Method
7.4 Higher-Order Taylor Methods
7.5 Runge-Kutta Methods
7.6 Predictor-Corrector Methods
8.1 Gaussian Elimination
8.2 Pivoting Strategies
8.3 Matrix Inverse
8.4 Determinant of a Matrix
8.5 Matrix Factorization
9.1 Norms of Vectors and Matrices
9.2 Eigenvalues and Eigenvectors
9.3 Iterative Techniques for Solving Linear Systems
10.1 Largest Possible Root
10.2 Convergence of Bisection Method
10.3 Convergence of False Position Method
10.4 Convergence of Newton-Raphson Method
10.5 Convergence of Secant Method
10.6 Convergence of Fixed Point Method
11 Exams
11.1 Exam 1
11.2 Exam 2
11.3 Exam 3

Chapter 1

Mathematical Preliminaries

Definition of the limit:

A function f defined on a set X of real numbers has a limit L at x0, written lim_{x→x0} f(x) = L, if, for every ε > 0, there exists δ > 0 such that |f(x) − L| < ε whenever x ∈ X and 0 < |x − x0| < δ.

f is continuous at x0 if

lim_{x→x0} f(x) = f(x0)    (1.3)

An infinite sequence {xn} has a limit x (converges to x) if, for every ε > 0, there exists a positive integer N such that |xn − x| < ε whenever n > N.

Differentiable functions:

The function f is differentiable at x0 if

f′(x0) = lim_{x→x0} (f(x) − f(x0))/(x − x0)    (1.5)

exists. This limit is called the derivative of f at x0.

The set of all functions that have n continuous derivatives on X is denoted by C^n(X).

Rolle’s Theorem:

Suppose f ∈ C[a, b] and f is differentiable on (a, b). If f (a) = f (b), then a number c ∈ (a, b) exists

with f ′ (c) = 0.


Mean value theorem:

Suppose f ∈ C[a, b] and f is differentiable on (a, b). A number c ∈ (a, b) exists with

f′(c) = (f(b) − f(a))/(b − a)    (1.6)

Extreme value theorem:

If f ∈ C[a, b], then c1 and c2 exist with f(c1) ≤ f(x) ≤ f(c2) for all x ∈ [a, b]. In addition, if f is differentiable on (a, b), then c1 and c2 occur either at the endpoints of [a, b] or where f′ is zero.

Riemann integral:

The Riemann integral of a function f on the interval [a, b] is the limit (provided it exists)

∫_a^b f(x) dx = lim_{max ∆xi→0} Σ_{i=1}^n f(zi) ∆xi ,    (1.7)

where the numbers xi satisfy a = x0 ≤ x1 ≤ ... ≤ xn = b, ∆xi = xi − xi−1, and zi is arbitrarily chosen in [xi−1, xi]. If the points are equally spaced and we choose zi = xi, the Riemann integral of f on [a, b] becomes

∫_a^b f(x) dx = lim_{n→∞} (b − a)/n Σ_{i=1}^n f(xi).    (1.8)

Weighted Mean Value Theorem for Integrals: Suppose f ∈ C[a, b], the Riemann integral of g exists on [a, b], and g(x) does not change sign on [a, b]. Then there exists a number c in (a, b) with

∫_a^b f(x) g(x) dx = f(c) ∫_a^b g(x) dx.    (1.9)

When g(x) = 1, this gives the average value of the function f over the interval [a, b]:

f(c) = 1/(b − a) ∫_a^b f(x) dx.    (1.10)

Generalized Rolle’s Theorem: Suppose that f ∈ C[a, b] is n times differentiable on (a, b). If f (x)

is zero at n + 1 distinct numbers x0 , ..., xn in [a, b], then a number c in (a, b) exists with f (n) (c) = 0.

Intermediate Value Theorem: If f ∈ C[a, b] and K is any number between f (a) and f (b), then

there exists c in (a, b) for which f (c) = K.

Taylor's Theorem: Suppose f ∈ C^n[a, b], f^(n+1) exists on [a, b], and x0 ∈ [a, b]. For every x ∈ [a, b], there exists a number ξ(x) between x0 and x with f(x) = Pn(x) + Rn(x), where

Pn(x) = Σ_{k=0}^n f^(k)(x0)/k! (x − x0)^k    (1.11)

Rn(x) = f^(n+1)(ξ(x))/(n + 1)! (x − x0)^(n+1)    (1.12)


Pn(x) is called the nth Taylor polynomial for f about x0, and Rn(x) is called the remainder term (or truncation error). In the case when x0 = 0, the Taylor polynomial is often called a Maclaurin polynomial.

If we take n → ∞, the Taylor polynomial becomes the Taylor series for f about x0. For x0 = 0, the Taylor series is called a Maclaurin series.

Example:

We want to determine an approximate value of cos(0.01) using the second Maclaurin polynomial together with its remainder,

cos x = 1 − x²/2 + x³/6 sin(ξ),    (1.13)

where ξ is a number between 0 and x. For x = 0.01 this gives

cos(0.01) = 0.99995 + 0.16̄ × 10⁻⁶ sin(ξ),

where we use the bar over 6 to indicate that this digit repeats indefinitely. Bounding | sin(ξ)| ≤ 1 gives

cos(0.01) ≤ 0.99995 + 0.16̄ × 10⁻⁶ < 0.99995017.    (1.16)

The error bound is much larger than the actual error. This is due in part to the poor bound we used for sin(ξ). It can be shown that | sin x| ≤ |x|; since 0 ≤ ξ ≤ 0.01, this gives the sharper bound 0.16̄ × 10⁻⁸.

Note that eq. (1.13) can also be written as cos x = 1 − x²/2 + x⁴/24 cos(ξ′), and then the error will be no more than (1/24) × 10⁻⁸ = 0.416̄ × 10⁻⁹.

This example illustrates two objectives of numerical analysis: find an approximation to the solution and determine a bound for the error.
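To make the two objectives concrete, here is a small Python check (an added sketch, not part of the original notes) of both the approximation and the two error bounds:

```python
import math

x = 0.01
p2 = 1 - x**2 / 2                 # second Maclaurin polynomial at x
bound_crude = x**3 / 6            # remainder bound using |sin(xi)| <= 1
bound_sharp = x**4 / 6            # remainder bound using |sin(xi)| <= xi <= x
actual = abs(math.cos(x) - p2)    # actual truncation error

print(p2, actual, bound_crude, bound_sharp)
```

The actual error (about 0.42 × 10⁻⁹) indeed lies below both bounds, and the sharper bound is two orders of magnitude tighter than the crude one.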

1.1.1 Exercises

The exercises are from the textbook sec 1.1 pages 14-16.

Tutor: Exercises 1,2,3,4,15,23

Students: All odd exercises except 17

Assignment 1: Exercises 15 and 26

The Taylor polynomial P4(x) for the function f(x) = x e^{x²} is given by

P4(x) = x + x³

The remainder is given by

R4(x) = f^(5)(ξ(x))/5! x⁵
      = 1/30 e^{ξ²} (15 + 90ξ² + 60ξ⁴ + 8ξ⁶) x⁵
      ≤ 1.211406197 x⁵
      ≤ 0.01240479946

where we find the last two bounds by substituting ξ = x = 0.4.

The integral can be approximated by

∫₀^0.4 f(x) dx ≈ ∫₀^0.4 (x + x³) dx = 0.086400

with error bound

∫₀^0.4 1.211406197 x⁵ dx = 0.0008269866307

The derivative at x = 0.2 can be approximated by

f′(0.2) ≈ P4′(0.2) = (1 + 3x²)|₀.₂ = 1.12

while the exact value is

f′(0.2) = e^{x²}(1 + 2x²)|₀.₂ = 1.124075636

The actual error is

1.124075636 − 1.12 = 0.004075636
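These numbers can be checked with a short Python sketch (an addition to the notes; it uses the exact antiderivative of x e^{x²}, namely e^{x²}/2):

```python
import math

# exact integral of x*exp(x^2) on [0, 0.4] versus the integral of P4(x) = x + x^3
exact_int = (math.exp(0.16) - 1) / 2
approx_int = 0.4**2 / 2 + 0.4**4 / 4
print(exact_int, approx_int)          # their difference stays within the bound above

# derivative: P4'(x) = 1 + 3x^2 versus f'(x) = exp(x^2)*(1 + 2x^2)
approx_der = 1 + 3 * 0.2**2
exact_der = math.exp(0.04) * (1 + 2 * 0.2**2)
print(exact_der - approx_der)
```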

The nth derivative of cos(x) is

cos(x)^(n) = cos(x + nπ/2)

According to Taylor's theorem the error is

Rn(x) = (x − x0)^(n+1)/(n + 1)! cos(ξ + (n + 1)π/2)

where ξ is between x and x0. For x = 42° and x0 = π/4 (= 45°) we have |x − x0| = π/60, so the error is bounded by

|Rn(x)| ≤ E = (π/60)^(n+1)/(n + 1)!

n = 1 , E = 0.001370778390
n = 2 , E = 0.00002392459621
n = 3 , E = 3.131722321 × 10⁻⁷
n = 4 , E = 3.279531946 × 10⁻⁹

So n = 3 is sufficient to get the value accurate to within 10⁻⁶. In this case P3(x), expanded about x0 = π/4 and evaluated at x = 42π/180, is equal to

P3(42π/180) = 0.7431446016

The actual value is cos 42° = 0.7431448255..., so the error is of the order of 0.2239 × 10⁻⁶.
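A short Python sketch (an addition to the notes) reproduces P3 and confirms the error bound:

```python
import math

x0 = math.pi / 4
x = math.radians(42)
h = x - x0                      # h = -pi/60

# derivatives of cos at pi/4: cos, -sin, -cos, sin
derivs = [math.cos(x0), -math.sin(x0), -math.cos(x0), math.sin(x0)]
p3 = sum(d * h**k / math.factorial(k) for k, d in enumerate(derivs))

err_bound = (math.pi / 60) ** 4 / math.factorial(4)
print(p3, abs(math.cos(x) - p3), err_bound)
```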

Let us assume that m = min[f(x1), f(x2)] and M = max[f(x1), f(x2)], and that the constants c1 and c2 are positive. Then

c1 m ≤ c1 f(x1) ≤ c1 M
c2 m ≤ c2 f(x2) ≤ c2 M

which leads to

(c1 + c2) m ≤ c1 f(x1) + c2 f(x2) ≤ (c1 + c2) M

and therefore

m ≤ (c1 f(x1) + c2 f(x2))/(c1 + c2) ≤ M

Without loss of generality, let us assume that m = f(x1) and M = f(x2); then the last equation gives

f(x1) ≤ (c1 f(x1) + c2 f(x2))/(c1 + c2) ≤ f(x2)

According to the intermediate value theorem, ∃ξ between x1 and x2 such that

f(ξ) = (c1 f(x1) + c2 f(x2))/(c1 + c2)

An n-digit floating-point number in base β has the form

±(.d1 d2 ...dn)_β × β^e

where (.d1 d2 ...dn)_β is a β-fraction called the mantissa, and e is an integer called the exponent. Such a floating-point number is said to be normalized in case d1 ≠ 0, or else d1 = d2 = ... = dn = 0.

In the 64-bit IEEE representation:
The first bit is a sign indicator, denoted by s.
This is followed by an 11-bit exponent, c, called the characteristic,
and a 52-bit binary fraction, f, called the mantissa.
The base for the exponent is 2.
The exponent of 11 binary digits gives a range of 0 to 2¹¹ − 1 = 2047.
To represent small numbers, the exponent is shifted by 1023, so the actual range is from 0 − 1023 = −1023 to 2047 − 1023 = 1024.

To save storage and provide a unique representation for each floating-point number, we use the normalized form

(−1)^s 2^(c−1023) (1 + f)    (1.18)

Example: consider the machine number

0 10000000011 1011100100010000000000000000000000000000000000000000

The leftmost bit is zero, so the number is positive. The next eleven bits give the characteristic, (10000000011)₂ = 1 + 2¹ + 2¹⁰ = 1027, so the exponent part is 2^(1027−1023) = 2⁴. The final 52 bits specify the mantissa

f = (.101110010001)₂ = 1/2 + 1/2³ + 1/2⁴ + 1/2⁵ + 1/2⁸ + 1/2¹²

so the machine number represents

(−1)⁰ 2⁴ (1 + f) = 27.56640625

The next smallest machine number is

0 10000000011 1011100100001111111111111111111111111111111111111111

and the next largest machine number is

0 10000000011 1011100100010000000000000000000000000000000000000001

Figure 1.1: The machine number 27.56640625 between its next smallest neighbor (27.566406249...) and its next largest neighbor (27.566406250...).

This means that our original number represents not only 27.56640625, but also half of the real numbers that are between 27.56640625 and its two nearest machine-number neighbors (see Fig. 1.1).


Round-off errors:

Round-off errors arise because it is impossible to represent all real numbers exactly on a finite-state

machine (which is what all practical digital computers are).

On a pocket calculator, if one enters 0.0000000000001 (or the maximum number of zeros possible),

then a ’+’, and then 100000000000000 (again, the maximum number of zeros possible), one will obtain

the number 100000000000000 again, and not 100000000000000.0000000000001. The calculator’s

answer is incorrect because of round-off in the calculation.


Round-off errors in a computer¹

The most basic source of error in a computer is the error in representing a real number with a limited number of bits.

The machine epsilon, ε, is the interval between 1 and the next number greater than 1 that is distinguishable from 1. This means that no number between 1 and 1 + ε can be represented in the computer.

Machine epsilon can be found by the following program:

10 E=1

20 IF E+1>1 THEN PRINT E ELSE STOP

30 E=E/2: GOTO 20
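An equivalent of this halving loop in Python (a sketch; in IEEE double precision it recovers 2⁻⁵² ≈ 2.22 × 10⁻¹⁶):

```python
import sys

e = 1.0
while 1.0 + e > 1.0:
    e /= 2.0
# the loop stops one halving too far, so the machine epsilon is 2*e
machine_eps = 2 * e
print(machine_eps)   # 2.220446049250313e-16 for IEEE double precision
```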

When numbers are added or subtracted, an accurate representation of the result may require many more digits than were needed for the numbers being added or subtracted. Serious amounts of round-off error occur in two situations:

1. when adding (or subtracting) a very small number to (or from) a large number

2. when a number is subtracted from another that is very close to it

To test the first case on the computer, let us add 0.00001 to unity ten thousand times. The program to do this job would be:

10 sum=1
20 for i=1 to 10000
30 sum=sum+0.00001
40 next
50 print sum

Running it in single precision gives

sum = 1.100136

Since the exact answer is 1.1, the relative error of this computation is

(1.100136 − 1.1)/1.1 = 0.000124 = 0.0124%
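Python's floats are double precision, so to reproduce this single-precision experiment we can round every intermediate result to 32 bits with the standard `struct` module (a sketch; `f32` is a helper name introduced here):

```python
import struct

def f32(x):
    # round a double to the nearest IEEE single-precision value
    return struct.unpack('f', struct.pack('f', x))[0]

s = f32(1.0)
for _ in range(10000):
    s = f32(s + f32(1.0e-5))
print(s)   # about 1.100136 instead of the exact 1.1
```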

The cause of this round-off error can be understood as follows. Consider the computation of 1 + 0.00001. The binary representations of 1 and 0.00001 in 32-bit single precision are, respectively,

(1)₁₀ = (0.1000 0000 0000 0000 0000 0000)₂ × 2¹
(0.00001)₁₀ = (0.1010 0111 1100 0101 1010 1100)₂ × 2⁻¹⁶

Adjusting the exponents so that they agree, we get

  (0.10000 0000 0000 0000 0000 0000 0000 0000 0000 0000)₂ × 2¹
+ (0.00000 0000 0000 0000 1010 0111 1100 0101 1010 1100)₂ × 2¹
= (0.10000 0000 0000 0000 1010 0111 1100 0101 1010 1100)₂ × 2¹

¹ Applied Numerical Methods with Software, Shoichiro Nakamura


Now only 24 bits are available for the mantissa, so the sum must be rounded back to 24 bits. Thus, whenever 0.00001 is added to 1, the sum actually increases by 0.0000100136; the difference, 0.0000000136, is the round-off error of a single addition. When 0.00001 is added to 1 ten thousand times, ten thousand times this error is generated. Although the calculated result gains in the present example, it can also lose if digits are cut off. Loss and gain are both referred to as round-off error.

Several techniques can reduce round-off error:

1. Double precision

2. Grouping

3. Taylor expansion

4. Changing the definition of a variable

5. Rewriting the equation to avoid subtractions

Example:

We want to add 0.00001 ten thousand times to unity by using (a) double precision and (b) the grouping method.

Double precision:

10 SUM=1.0D0
20 DO I=1,10000
30 SUM=SUM+0.00001D0
40 END DO
50 PRINT *, SUM

Grouping method:

      SUM=1
      DO 47 I=1,100
        TOTAL=0
        DO 40 K=1,100
          TOTAL=TOTAL+0.00001
40      CONTINUE
        SUM=SUM+TOTAL
47    CONTINUE
      PRINT *, SUM

Taylor expansion: the approximation of the derivative of sin(x) at x = 1,

d = (sin(1 + θ) − sin(1))/θ,

becomes very poor for small θ because of round-off errors. By using a Taylor expansion we can write

sin(1 + θ) = sin(1) + θ cos(1) − (θ²/2) sin(1) + O(θ³).

Therefore,

d ≈ cos(1) − (θ/2) sin(1),

which contains no subtraction of nearly equal numbers and approaches cos(1) as θ → 0.

The FORTRAN program is:

program testeps

implicit none

real :: d,da,t=1.0e0,h=10.0e0

integer :: i

do i=1,7

t=t/h

da=cos(1.0e0)-0.5e0*t*sin(1.0e0)

d=(sin(1.0e0+t)-sin(1.0e0))/t

print*,t,d,da

end do

end

       t               d            da
 -------------------------------------
 0.10000000E-00   0.49736413   0.49822873
 9.99999978E-03   0.53608829   0.53609490
 9.99999931E-04   0.53993475   0.53988153
 9.99999902E-05   0.54062998   0.54026020
 9.99999884E-06   0.54383242   0.54029804
 9.99999884E-07   0.54327762   0.54030186
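In double precision the same comparison can be sketched in Python (an addition to the notes; the breakdown of d moves to much smaller θ than in the single-precision table above):

```python
import math

t = 1.0
for _ in range(7):
    t /= 10.0
    d = (math.sin(1.0 + t) - math.sin(1.0)) / t      # direct difference quotient
    da = math.cos(1.0) - 0.5 * t * math.sin(1.0)     # Taylor-based estimate
    print(t, d, da)
```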


Law of Arithmetic: Due to errors introduced in floating-point arithmetic, the associative and distributive laws of arithmetic are not always satisfied; that is,

x + (y + z) ≠ (x + y) + z
x × (y × z) ≠ (x × y) × z
x × (y + z) ≠ (x × y) + (x × z)

For example, with six-digit rounding and x = 0.00456732, y = 0.243451, z = −0.248000:

(x + y) + z = 0.248018 − 0.248000 = 0.000018 = 0.180000 × 10⁻⁴
y + z = 0.243451 − 0.248000 = −0.00454900
x + (y + z) = 0.00456732 − 0.00454900 = 0.00001832 = 0.183200 × 10⁻⁴

so (x + y) + z ≠ x + (y + z).

Let us use the normalized decimal floating-point form

±0.d1 d2 ...dk × 10ⁿ

where 1 ≤ d1 ≤ 9 and 0 ≤ di ≤ 9 for i = 2, ..., k. Numbers of this form are called k-digit decimal machine numbers. Any positive real number within the numerical range of the machine can be normalized to the form

y = 0.d1 d2 ...dk dk+1 dk+2 ... × 10ⁿ

The floating-point form fl(y) is obtained by terminating y at k decimal digits. There are two ways of performing this termination. One, called chopping, is simply to chop off the digits dk+1 dk+2 .... The other method, called rounding, adds 5 × 10^(n−(k+1)) to y and then chops the result. So, when rounding, if dk+1 ≥ 5, we add 1 to dk to obtain fl(y); this is rounding up. When dk+1 < 5, we merely chop off all but the first k digits; this is rounding down. When rounding up, the digits and even the exponent might change.

Example: the number π = 0.314159... × 10¹. The floating-point form of π using five-digit chopping is fl(π) = 0.31415 × 10¹ = 3.1415. The floating-point form of π using five-digit rounding is 3.1416, because the sixth digit of the expansion of π is 9 > 5.
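Chopping and rounding to k significant decimal digits can be mimicked in Python (a sketch; `fl_chop` and `fl_round` are helper names introduced here, and `fl_round` relies on the rounding performed by decimal formatting):

```python
import math

def fl_chop(y, k):
    # keep the first k significant decimal digits of y > 0, dropping the rest
    n = math.floor(math.log10(abs(y))) + 1          # decimal exponent of y
    return math.trunc(y * 10 ** (k - n)) / 10 ** (k - n)

def fl_round(y, k):
    # round y to k significant decimal digits
    return float(f"{y:.{k - 1}e}")

print(fl_chop(math.pi, 5), fl_round(math.pi, 5))    # 3.1415 3.1416
```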

If p∗ is an approximation to p, the absolute error is |p − p∗|, and the relative error is |p − p∗|/|p|.

Significant digits:

The number p∗ is said to approximate p to t significant digits if t is the largest nonnegative integer for which

|p − p∗|/|p| ≤ 5 × 10⁻ᵗ    (1.21)

A more formal definition of significant digits is as follows. Let the true value have digits p = d1 d2 ...dk dk+1 ...dn and let the approximate value have digits p∗ = d1 d2 ...dk ek+1 ...en, where d1 ≠ 0 and the first difference in the digits occurs at the (k + 1)st digit. We then say that p and p∗ agree to k significant digits if |dk+1 − ek+1| < 5. Otherwise, we say they agree to k − 1 significant digits.

Example: Let the true value be p = 10/3 and the approximate value p∗ = 3.333.
The absolute error is |10/3 − 3.333| = 1/3000.
The relative error is (1/3000)/(10/3) = 1/10000 = 10⁻⁴ < 5 × 10⁻⁴.
The number of significant digits is t = 4.
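Definition (1.21) translates directly into a small Python sketch (`significant_digits` is a name introduced here, not from the text):

```python
def significant_digits(p, p_star):
    # largest nonnegative t with |p - p*| / |p| <= 5 * 10**(-t); assumes p != 0
    rel = abs(p - p_star) / abs(p)
    if rel == 0:
        return float("inf")   # exact agreement
    t = 0
    while rel <= 5 * 10.0 ** (-(t + 1)):
        t += 1
    return t

print(significant_digits(10 / 3, 3.333))   # 4
```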

Assume that the floating-point representations fl(x) and fl(y) are given for the real numbers x and y, and that the symbols ⊕, ⊖, ⊗, ⊘ represent machine addition, subtraction, multiplication, and division operations, respectively. The finite-digit arithmetic is given by

x ⊕ y = fl(fl(x) + fl(y))
x ⊖ y = fl(fl(x) − fl(y))
x ⊗ y = fl(fl(x) × fl(y))
x ⊘ y = fl(fl(x) ÷ fl(y))

The relative error of the k-digit representation satisfies, for chopping,

|(y − fl(y))/y| ≤ 10^(−k+1)    (1.22)

and, for rounding,

|(y − fl(y))/y| ≤ 0.5 × 10^(−k+1)    (1.23)

One of the most common errors involves the cancellation of significant digits due to the subtraction of two nearly equal numbers. Suppose we have two nearly equal numbers x and y, with x > y, whose k-digit representations agree in the first p digits:

fl(x) = 0.d1 d2 ...dp αp+1 αp+2 ...αk × 10ⁿ
fl(y) = 0.d1 d2 ...dp βp+1 βp+2 ...βk × 10ⁿ

Then

fl(fl(x) − fl(y)) = (0.αp+1 αp+2 ...αk − 0.βp+1 βp+2 ...βk) × 10^(n−p)
                  = 0.σp+1 σp+2 ...σk × 10^(n−p)

The floating-point number used to represent x − y has at most k − p digits of significance. Any further calculations involving x − y retain the problem of having only k − p digits of significance.

Loss of significance: Consider, for example, x∗ = 0.76545421 × 10¹ and y∗ = 0.76544200 × 10¹ to be approximations to x and y, respectively, correct to seven significant digits. Then, in eight-digit floating-point arithmetic, the difference is z∗ = x∗ − y∗ = 0.12210000 × 10⁻³. But as an approximation to z = x − y it is good only to three digits, since the fourth significant digit of z∗ is derived from the eighth digits of x∗ and y∗, both possibly in error. Hence, while the error in z∗ is at most the sum of the errors in x∗ and y∗, the relative error in z∗ is possibly 10000 times the relative error in x∗ and y∗. Loss of significant digits is therefore dangerous only if we wish to keep the relative error small.


We can also have error when dividing by a small number or multiplying by a large number. Suppose, for example, that the number z has a finite-digit approximation z + δ, where the error δ is introduced by representation or by a previous calculation. If we divide by ε = 10⁻ⁿ, where n > 0, then

z/ε ≈ fl(fl(z)/fl(ε)) = (z + δ) × 10ⁿ,

so the absolute error in this approximation, |δ| × 10ⁿ, is the original absolute error, |δ|, multiplied by the factor 10ⁿ.

Example:

Let p = 0.54617 and q = 0.54601. The exact value of r = p − q is 0.16 × 10⁻³. If we perform the subtraction using four-digit rounding we find p∗ = 0.5462 and q∗ = 0.5460, so r∗ = p∗ − q∗ = 0.2 × 10⁻³. The relative error is

|r − r∗|/|r| = 0.25

which means r∗ has only one significant digit, whereas p∗ and q∗ were accurate to four and five significant digits, respectively.

Example:

The quadratic formula states that the roots of ax² + bx + c = 0, when a ≠ 0, are

x± = (−b ± √(b² − 4ac))/(2a)    (1.24)

Using four-digit rounding arithmetic, consider this formula applied to x² + 62.1x + 1 = 0, whose roots are approximately x+ = −0.01610723 and x− = −62.08390. Since b² ≫ 4ac, the numerator of x+ involves the subtraction of two nearly equal numbers. With √(b² − 4ac) = 62.06, we get fl(x+) = −0.02, which is a poor approximation to x+ = −0.01611, with a relative error of about 2.4 × 10⁻¹. The other root, fl(x−) = −62.10, has a small relative error of about 3.2 × 10⁻⁴.

To obtain a more accurate result we can rationalize the numerator:

x+ = (−b + √(b² − 4ac))/(2a)
   = (−b + √(b² − 4ac))/(2a) × (b + √(b² − 4ac))/(b + √(b² − 4ac))
   = −2c/(b + √(b² − 4ac))

so we get fl(x+) = −0.01610, which has the small relative error 6.2 × 10⁻⁴. We can also derive a similar formula for x−,

x− = −2c/(b − √(b² − 4ac))

In this case fl(x−) = −50.00, which has the large relative error 1.9 × 10⁻¹, because now the denominator involves the cancellation.
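The rationalized formula generalizes to the usual cancellation-free recipe: compute the larger-magnitude root from (1.24) with the sign chosen to avoid cancellation, then recover the other root from the product of the roots, x+ x− = c/a. A Python sketch (`quadratic_roots` is a name introduced here):

```python
import math

def quadratic_roots(a, b, c):
    # assumes real roots; avoids subtracting nearly equal numbers
    sq = math.sqrt(b * b - 4 * a * c)
    x1 = (-b - math.copysign(sq, b)) / (2 * a)   # larger-magnitude root, no cancellation
    x2 = c / (a * x1)                            # from x1 * x2 = c / a
    return x1, x2

x_minus, x_plus = quadratic_roots(1.0, 62.1, 1.0)
print(x_minus, x_plus)
```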


Example:

This example shows how we can avoid loss of significance. We want to evaluate f(x) = 1 − cos(x) near zero in six-digit arithmetic. Since cos(x) ≈ 1 for x near zero, there will be loss of significant digits if we first compute cos(x) and then subtract it from 1. Without loss of generality, assume that x is close to zero with x > 0, and write the decimal expansion cos(x) = 0.a1 a2 a3 a4 a5 a6 a7 .... If we use rounding and a7 ≥ 5, we cannot calculate the value of 1 − cos(x) using six-digit arithmetic at all for x ≤ x0, because the rounded value of cos(x) is 1 and the rounded value of 1 − cos(x) is zero. For example, the computed value of 1 − cos(0.001) is 0.000000, but the true value is 0.500000 × 10⁻⁶. To overcome this we can use another formula:

1 − cos(x) = (1 − cos(x)) (1 + cos(x))/(1 + cos(x))
           = sin²(x)/(1 + cos(x))

If we use this last equation we find, for x = 0.001,

1 − cos(0.001) = sin²(0.001)/(1 + cos(0.001))
               = (0.1 × 10⁻⁵)/2
               = 0.5 × 10⁻⁶

Alternatively, a Taylor expansion gives

1 − cos x ≈ x²/2 − x⁴/24 + ...

which gives

1 − cos(0.001) ≈ (0.001)²/2 − (0.001)⁴/24 + ...
             = 0.5 × 10⁻⁶ − (0.1 × 10⁻¹¹)/24 + ...
             = 0.5 × 10⁻⁶ − 0.416667 × 10⁻¹³ + ...
             ≈ 0.5 × 10⁻⁶
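The same cancellation occurs in double precision for small enough x; a Python sketch comparing the naive form with the rewritten one (for x = 10⁻⁹ the true value is x²/2 = 5 × 10⁻¹⁹):

```python
import math

x = 1.0e-9
naive = 1.0 - math.cos(x)                        # cos(x) rounds to 1.0: total cancellation
stable = math.sin(x) ** 2 / (1.0 + math.cos(x))  # rewritten form keeps full accuracy
print(naive, stable)
```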

Example:

The value of the polynomial p(x) = 2x³ − 3x² + 5x − 4 at x = 3 can be calculated as follows:

⋆ Directly: x² = 9, x³ = 27; then we put everything together, p(3) = 54 − 27 + 15 − 4 = 38.
We have five multiplications (x², x³, 2x³, 3x², 5x), one addition, and two subtractions. We need 8 operations in total.

⋆ The polynomial can be arranged in the nested manner p(x) = [(2x − 3)x + 5]x − 4.
We need three multiplications, one addition, and two subtractions. In total we need six operations.

– In general, for a polynomial of degree n the direct method needs (n − 1) + n = 2n − 1 multiplications: n − 1 for x², x³, ..., xⁿ, and n for the multiplications by the coefficients, an × xⁿ, an−1 × xⁿ⁻¹, ..., a1 × x. However, for the nested form we need only n multiplications.
– Both need n addition/subtraction operations.
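The nested (Horner) evaluation is a three-line loop; a Python sketch:

```python
def horner(coeffs, x):
    # coeffs listed from highest degree down to the constant term
    result = 0.0
    for a in coeffs:
        result = result * x + a   # one multiplication and one addition per coefficient
    return result

print(horner([2, -3, 5, -4], 3))   # 38.0
```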

1.2.1 Exercises

Assignment: odd exercises from section 1.2, pages 26-29.

Tutorial: 1.

Exercise 1: p = π = 3.1415926..., and p∗ = 22/7 = 3.142857.... The absolute error satisfies

0.0012644 < |p − p∗| < 0.0012645

If we round, the absolute error is 0.00126. The relative error satisfies

0.0012644/3.1415927 < |p − p∗|/|p| < 0.0012645/3.1415926
4.0247 × 10⁻⁴ < |p − p∗|/|p| < 4.0250 × 10⁻⁴

If we round, the relative error is 4.025 × 10⁻⁴.

The relative error in p∗, as an approximation to p, is defined by α = |p − p∗|/|p|. Note that this number is close to |p − p∗|/|p∗| if α ≪ 1. One can show that

|p − p∗|/|p| = α  =⇒  |p − p∗|/|p∗| = α/|1 ± α| ≈ α    (1.25)

Using three-digit rounding arithmetic:

a) 133 + 0.921 = 133.921, which rounds to 134.
b) 133 − 0.499 = 132.501, which rounds to 133.
c) (121 − 0.327) − 119 = 120.673 − 119 → 121 − 119 = 2.
d) (121 − 119) − 0.327 = 2 − 0.327 = 1.673, which rounds to 1.67.
e) (13/14 − 6/7)/(2e − 5.4) = (0.929 − 0.857)/(5.44 − 5.40) = 0.072/0.04 = 1.80.

The absolute errors and relative errors are:

a) 0.79 × 10⁻¹ , 0.59 × 10⁻³
b) 0.499 , 0.377 × 10⁻²
e) 0.154 , 0.0786

Assignment

Suppose two points (x0, y0) and (x1, y1) are on a straight line with y0 ≠ y1. The x-intercept of the line is given by

x = (x0 y1 − x1 y0)/(y1 − y0)

or

x = x0 − (x1 − x0) y0/(y1 − y0)

Group 1: Use the data (x0, y0) = (1.31, 3.24) and (x1, y1) = (1.93, 4.76) and three-digit rounding arithmetic to compute the x-intercept both ways. Which method is better and why?

Group 2: Use the data (x0, y0) = (0.2, 0.2) and (x1, y1) = (1.2, 1.01) and three-digit rounding arithmetic to compute the x-intercept both ways. Which method is better and why?

Solution

Write X1 = (x0 y1 − x1 y0)/(y1 − y0) and X2 = x0 − (x1 − x0) y0/(y1 − y0).

Group 1                                  Group 2
---------------------------------------  ---------------------------------------
x0 = 1.31, y0 = 3.24                     x0 = 0.2, y0 = 0.2
x1 = 1.93, y1 = 4.76                     x1 = 1.2, y1 = 1.01
Actual solution is -0.01157894737        Actual solution is -0.04691358025
X1 = -0.00658                            X1 = -0.0469
X2 = -0.01                               X2 = -0.047
Relative error for X2 is 0.1363636365    Relative error for X2 is 0.001842105197
X2 is better than X1                     X1 is better than X2
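Three-digit rounding arithmetic can be simulated by re-rounding after every operation (a Python sketch; `fl3` is a helper name introduced here). For the Group 1 data it reproduces the values above:

```python
def fl3(x):
    # round to three significant decimal digits
    return float(f"{x:.2e}")

x0, y0, x1, y1 = 1.31, 3.24, 1.93, 4.76
den = fl3(y1 - y0)

# first formula: x = (x0*y1 - x1*y0) / (y1 - y0)
X1 = fl3(fl3(fl3(x0 * y1) - fl3(x1 * y0)) / den)

# second formula: x = x0 - (x1 - x0)*y0 / (y1 - y0)
X2 = fl3(x0 - fl3(fl3(fl3(x1 - x0) * y0) / den))

print(X1, X2)   # -0.00658 -0.01
```

The first formula subtracts the nearly equal products 6.24 and 6.25, which is where X1 loses its accuracy.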


Chapter 2

Finding the roots of a function f is very important in science and engineering, and it is not always simple. Consider a polynomial f whose only real root is x = 1, evaluated numerically from its expanded form. The graph of the computed values of f is given in Fig. 2.1: it appears to show many roots for f, because round-off makes the computed f(x) take many small positive and negative values near x = 1.

Figure 2.1: Computed values of f(x) for 0.97 ≤ x ≤ 1.03 (vertical scale of order 10⁻¹⁴). The strange behavior of f(x) near x = 1 is due to round-off errors in the computation of the expanded form of f(x).


2.1 Bisection Method

Definition: The first technique, based on the intermediate value theorem, is called the bisection method. Suppose f is continuous on [a, b] with f(a) and f(b) of opposite signs. To begin, set a1 = a and b1 = b, and let p1 be the midpoint of the interval [a, b]; that is,

p1 = a1 + (b1 − a1)/2 = (a1 + b1)/2    (2.1)

If f(p1) = 0, then the root of f(x) = 0 is p = p1. If f(p1) ≠ 0, then f(p1) has the same sign as either f(a1) or f(b1). When f(p1) and f(a1) have the same sign, p ∈ (p1, b1), and we set a2 = p1 and b2 = b1. When f(p1) and f(b1) have the same sign, p ∈ (a1, p1), and we set a2 = a1 and b2 = p1. We then reapply the process to the interval [a2, b2].

Algorithm:

INPUT: endpoints a, b; tolerance TOL; maximum number of iterations N0.
OUTPUT: approximate solution p or message of failure.

Step 1: Set i = 1; FA = f(a).
Step 2: While i ≤ N0 do Steps 3-6.
Step 3:   Set p = a + (b − a)/2; (compute pi)
          FP = f(p).
Step 4:   If FP = 0 or (b − a)/2 < TOL then OUTPUT (p);
          (procedure completed successfully)
          STOP.
Step 5:   Set i = i + 1.
Step 6:   If FA·FP > 0 then set a = p; FA = FP; (compute ai, bi)
          else set b = p.
Step 7: OUTPUT ('Method failed after N0 iterations');
        STOP.
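The algorithm translates almost line by line into Python (a sketch; error handling kept minimal):

```python
def bisection(f, a, b, tol=1e-8, n0=100):
    # assumes f(a) and f(b) have opposite signs
    fa = f(a)
    for _ in range(n0):
        p = a + (b - a) / 2          # midpoint, written to limit round-off
        fp = f(p)
        if fp == 0 or (b - a) / 2 < tol:
            return p                 # procedure completed successfully
        if fa * fp > 0:
            a, fa = p, fp            # root lies in (p, b)
        else:
            b = p                    # root lies in (a, p)
    raise RuntimeError("method failed after n0 iterations")

print(bisection(lambda x: x**3 - 25, 2.0, 3.0, tol=1e-6))
```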

We can build a stopping procedure for Step 4 by selecting a tolerance ε > 0 and using one of the following conditions:

|pN − pN−1| < ε    (2.2)

|pN − pN−1|/|pN| < ε ,  pN ≠ 0    (2.3)

|f(pN)| < ε    (2.4)

The first and the last are not good measures of the tolerance (see Ex. 16 and 17, page 52). The middle one, which is the relative error, is better than the other two.

If you apply the bisection method to the function f(x) = 1/(x − 1) on the interval [0, 2], you will find that the method catches the singularity at x = 1 rather than a root: a stopping test based on |f(pN)| will never be satisfied for any number of iterations N0, and the method fails.

To start the algorithm we need to check that the product f(a)·f(b) ≤ 0. Rather than multiplying the function values (which may overflow or underflow), we can use the sign function defined by

sign(x) = −1 if x < 0, 0 if x = 0, 1 if x > 0    (2.5)

and test sign(f(a))·sign(f(b)) instead of f(a)·f(b).

It is good practice to set an upper bound on the number of iterations; this eliminates the possibility of entering an infinite loop. It is also good to choose the interval [a, b] to be as small as possible, so as to reduce the number of iterations. The bisection method is slow to converge: N may become quite large for a small tolerance.

Theorem: Suppose that f ∈ C[a, b] and f(a)·f(b) < 0. The bisection method generates a sequence pn approximating a zero p of f with

|pn − p| ≤ (b − a)/2ⁿ ,  n ≥ 1.    (2.6)

Proof: We have b1 = b and a1 = a, and since each step halves the interval, for each n ≥ 1

bn − an = (b − a)/2ⁿ⁻¹

Since pn is the midpoint of [an, bn] and p ∈ (an, bn), we also have

|p − pn| ≤ (bn − an)/2 = (b − a)/2ⁿ

which shows that the sequence pn converges to p.

The bound for number of iterations assumes calculation performed using infinite-digit arithmetic.

When implementing the method on a computer, we have to consider round-off error. For example,

the computation of midpoint of [a, b] should be found from the equation

b n − an

p n = an + (2.11)

2

instead from the algebraic equivalent equation

an + b n

pn = (2.12)

2

The first equation adds a small correction, (bn − an)/2, to the known value an. If bn − an is near the maximum precision of the machine, this correction will not significantly affect pn. However, (an + bn)/2 may return a midpoint that is not even in the interval [an, bn].
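A tiny demonstration of the difference (our own example, not from the text; it needs deliberately extreme values, since for moderate endpoints both formulas agree in double precision):

```python
a = b = 1.0e308          # huge but representable endpoints of a degenerate interval
naive = (a + b) / 2      # a + b overflows, so the "midpoint" is inf -- outside [a, b]
safe = a + (b - a) / 2   # the correction form stays inside the interval
print(naive, safe)       # inf 1e+308
```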

Exercises:

Odd numbers of sec. 2.1 page 51-52.

– Ex 13: We seek an approximate value of the cube root of 25, ∛25, correct to within 10−4.

Let us consider the function f(x) = x³ − 25. We can choose the interval [2, 3]: we have f(2) = −17 and f(3) = 2. The two values have different signs, so we can apply the bisection method.


n an bn pn bn − an f(pn)
1 2 3 2.5 1 −9.3750
2 2.5 3 2.75 0.5 −4.2031
3 2.75 3 2.8750 0.25 −1.2363
4 2.875 3 2.93750 0.125 +0.34741
5 2.8750 2.93750 2.906250 0.0625 −0.45297
6 2.90625 2.93750 2.921875 0.031250 −0.054930
7 2.921875 2.937500 2.929688 0.015625 +0.145710
8 2.9218750 2.9296875 2.9257812 0.0078125 +0.0452607
9 2.9218750 2.9257812 2.9238281 0.0039062 −0.0048632
10 2.9238 2.9258 2.9248 1.9531E−03 +2.0190E−02
11 2.9238 2.9248 2.9243 9.7656E−04 +7.6615E−03
12 2.9238 2.9243 2.9241 4.8828E−04 +1.3986E−03
13 2.9238 2.9241 2.9240 2.4414E−04 −1.7324E−03
14 2.9240 2.9241 2.9240 1.2207E−04 −1.6692E−04

So, the approximate value of ∛25 is p14 = 2.9240, because (b14 − a14)/2 = 6.1035E−05 and (b13 − a13)/2 = 1.2207E−04. If we use the Theorem:

|pn − p| ≤ (b − a)/2^n < 10^−4 (2.13)

we find (with b − a = 1) that

1/2^n < 10^−4 ⇒ −n log 2 < −4 ⇒ n > 4/log 2 = 13.288

So, n should be at least 14.

– Ex 18:

The function f (x) = sin (πx) has zeros at every integer. We want to show that when −1 < a < 0

and 2 < b < 3, the bisection method converges to:

0 for a + b < 2

2 for a + b > 2

1 for a + b = 2

We have to check the signs of sin(aπ), sin(((a + b)/2)π), and sin(bπ) at each iteration.

For the starting point we have: for a ∈ (−1, 0), sin(aπ) ∈ [−1, 0), and for b ∈ (2, 3), sin(bπ) ∈ (0, 1]. So, the bisection method can be applied on the interval [a, b]. The only roots that we can get are 0, 1, or 2, because these are the only integers belonging to (−1, 3).

Next, we have to check the sign of sin(((a + b)/2)π). We know that a + b ∈ (1, 3).

* If a + b < 2, we have 0.5 < p = (a + b)/2 < 1 and sin(pπ) > 0, so sin(aπ) < 0 and sin(pπ) > 0 have different signs, and the only root between a and p is 0. Therefore the bisection method converges to 0.


* If a + b > 2, we have 1 < p = (a + b)/2 < 1.5 and sin(pπ) < 0, so sin(pπ) < 0 and sin(bπ) > 0 have different signs, and the only root between p and b is 2. Therefore the bisection method converges to 2.

* If a + b = 2, we have p = (a + b)/2 = 1 and sin(pπ) = 0, so p = 1, which of course is the root 1.

2.2 Fixed-Point Iteration

Definition: A number p is a fixed point for a given function g if

g(p) = p (2.14)

Theorem: If g ∈ C[a, b] and g(x) ∈ [a, b] for all x ∈ [a, b], then g has a fixed point in [a, b]. If, in addition, g′(x) exists on (a, b) and a positive constant k < 1 exists with |g′(x)| ≤ k for all x ∈ (a, b), then the fixed point in [a, b] is unique.

Proof: Let us consider the function f(x) = g(x) − x. Since g(a) ≥ a and g(b) ≤ b, we have the following relations: f(a) = g(a) − a ≥ 0 and f(b) = g(b) − b ≤ 0. So, according to the Intermediate Value Theorem, f(x) = 0 has a root. Thus g(x) = x has a solution. Therefore g has a fixed point.

Assume that there is more than one fixed point, say p and q with p ≠ q. We know from the Mean Value Theorem that there exists ξ between p and q such that

(g(p) − g(q))/(p − q) = g′(ξ) = 1

which contradicts the fact that |g′(x)| ≤ k < 1. So, there can be only one fixed point.

What about the case |g′(x)| > 1 everywhere: does such a function (still mapping [a, b] into itself) exist? No. Suppose g(x) ∈ [a, b] for all x ∈ [a, b] and g′(x) exists on (a, b). By the Mean Value Theorem there is a ξ ∈ (a, b) with

g′(ξ) = (g(b) − g(a))/(b − a) (2.17)

and since g(a), g(b) ∈ [a, b], we have |g(b) − g(a)| ≤ b − a, so |g′(ξ)| ≤ 1. So, we can say that no function with |g′(x)| > 1 for all x exists in this setting.

Algorithm:

INPUT: initial approximation p0; tolerance TOL; maximum number of iterations N0.

OUTPUT: approximate solution p or message of failure.

Step 1: Set i = 1;

Step 2: While i ≤ N0 do Steps 3–6.

Step 3: Set p = g(p0); (compute pi)

Step 4: If |p − p0| < TOL then
OUTPUT (p);
(procedure terminated successfully)
STOP.

Step 5: Set i = i + 1.

Step 6: Set p0 = p; (update p0)

Step 7: OUTPUT('The method failed after N0 iterations');
STOP.
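A minimal Python sketch of the algorithm above; g, the starting point, and the tolerances are supplied by the caller:

```python
def fixed_point(g, p0, tol=1e-10, n0=100):
    """Functional iteration p_n = g(p_{n-1})."""
    for _ in range(n0):        # Step 2
        p = g(p0)              # Step 3
        if abs(p - p0) < tol:  # Step 4
            return p
        p0 = p                 # Steps 5-6
    raise RuntimeError("the method failed after n0 iterations")  # failure exit

# Example: g(x) = (3x^2 + 3)^(1/4), the map used in Ex. 5 below
p = fixed_point(lambda x: (3 * x ** 2 + 3) ** 0.25, 1.0)
```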

Fixed-Point Theorem: Let g ∈ C[a, b] with g(x) ∈ [a, b] for all x ∈ [a, b]. Suppose, in addition, that g′(x) exists on (a, b) and that a positive constant 0 < k < 1 exists with |g′(x)| ≤ k for all x ∈ (a, b). Then for any p0 ∈ [a, b], the sequence pn = g(pn−1) converges to the unique fixed point p ∈ [a, b].

Proof:

If g satisfies the fixed-point theorem, then bounds for the error involved in using pn to approximate p are given by

|p − pn| ≤ k^n max{p0 − a, b − p0} (2.19)

and

|p − pn| ≤ (k^n/(1 − k)) |p1 − p0| (2.20)

proof:

Exercises:

– Ex. 5: We use a fixed-point iteration method to determine a solution accurate to within 10−2 of x⁴ − 3x² − 3 = 0 on [1, 2], with p0 = 1.

Solution:
To use fixed-point iteration we have to find a function g(x) which satisfies the fixed-point theorem. The equation x⁴ − 3x² − 3 = 0 leads to x⁴ = 3x² + 3, which in turn leads to x = (3x² + 3)^(1/4). Now we check whether the function g(x) = (3x² + 3)^(1/4) satisfies the fixed-point theorem.

g′(x) = (3x/2) (3x² + 3)^(−3/4)


The derivative is always positive in the region [1, 2], and g(1) = 1.565084580, g(2) = 1.967989671. Therefore g(x) ∈ [1, 2] for x ∈ [1, 2]. Moreover,

g′(x) = (3x/2) (3x² + 3)^(−3/4) ≤ (3 · 2/2) (3 · 1² + 3)^(−3/4) = 0.7825422900 < 1

So, the function g(x) satisfies the fixed-point theorem. More precisely, the second derivative is given by

g″(x) = −(3/4) (x² − 2) / ((x² + 1)(3x² + 3)^(3/4))

so g′ attains its maximum on [1, 2] at x = √2. In the region [1, 2] we have g′(1) = 0.3912711450, g′(2) = 0.3935979342, and g′(√2) = 0.4082482906, so g′(x) ≤ 0.4082482906 < 1. Therefore our k = 0.4082482906.

According to the theorem we have

|p − pn| ≤ k^n max{p0 − a, b − p0} ≤ 0.4082482906^n ≤ 10^−2

which implies that n ≥ 6. The answer is p6 = 1.943316930, accurate to within 10−2.

– Ex 7:

We want to show that the function g(x) = π + 0.5 sin (x/2) has a unique fixed point on [0, 2π].

We know that

0 < π − 0.5 ≤ g(x) ≤ π + 0.5 < 2π (2.21)

The derivative of the function g is given by

|g′(x)| = |cos(x/2)|/4 ≤ 1/4 (2.22)

Therefore the function has a unique fixed point. To find an approximation to the fixed point that is accurate to within 10−2, let us estimate the number of iterations, using p0 = π. The first error bound gives

|p − pn| ≤ (1/4)^n π ≤ 10^−2 (2.23)

while the second gives

|p − pn| ≤ (k^n/(1 − k)) |p1 − p0| (2.24)


We have p1 = π + 1/2, so |p1 − p0| = 1/2 and the last equation becomes

|p − pn| ≤ ((1/4)^n/(3/4)) · (1/2) = (2/3)(1/4)^n ≤ 10^−2 (2.25)

This leads to the number of iterations n ≥ 4. Therefore

p0 = 3.141592654

p1 = 3.641592654

p2 = 3.626048865

p3 = 3.626995623

p4 = 3.626938795

p5 = 3.626942209

Derivation of Newton’s (or the Newton-Raphson) method:

Suppose that f ∈ C 2 [a, b], have continuous second derivative. Let p0 ∈ [a, b] be an approximation to

the solution p of f (x) = 0 such that f ′ (p) 6= 0 and |p − p0 | is “small”. The first Taylor polynomial

of f (x) about p0 is

(p − p0 )2 ′′

f (p) = f (p0 ) + (p − p0 )f ′ (p0 ) + f (ξ(p)). (2.26)

2

where ξ(p) lies between p and p0. Since f(p) = 0, neglecting the quadratic term gives

p ≈ p1 = p0 − f(p0)/f′(p0) (2.27)

We can then form the sequence

pn = pn−1 − f(pn−1)/f′(pn−1), n ≥ 1 (2.28)

The last equation is what we call Newton’s Method.

Algorithm (please see Page 64)
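A minimal Python sketch of that algorithm; f, its derivative, the starting point, and the tolerances are supplied by the caller:

```python
def newton(f, fprime, p0, tol=1e-10, n0=50):
    """Newton's method, eq. (2.28)."""
    for _ in range(n0):
        p = p0 - f(p0) / fprime(p0)   # one Newton step
        if abs(p - p0) < tol:
            return p
        p0 = p
    raise RuntimeError("no convergence within n0 iterations")

# Example: f(x) = x^3 - 2x^2 - 5 with p0 = 2.5 (the setting of Ex 5 below)
r = newton(lambda x: x ** 3 - 2 * x ** 2 - 5,
           lambda x: 3 * x ** 2 - 4 * x, 2.5)
```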

Newton’s method is a functional iteration technique of the form pn = g(pn−1 ) where

f (pn−1 )

g(pn−1 ) = pn−1 − , n≥1 (2.29)

f ′ (pn−1 )

Newton’s method derivation depends on the assumption that p0 is close to the p. So, it is important

that the initial approximation p0 is chosen to be close to the actual value p. In some cases even with

poor initial approximation the Newton’s method converges.

The following theorem illustrates the theoretical importance of the initial approximation choice of

p0 .

Theorem: Let f ∈ C 2 [a, b]. If p ∈ [a, b] is such that f (p) = 0 and f ′ (p) 6= 0, then there exists a δ > 0

such that Newton’s method generates a sequence {pn } converging to p for any initial approximation

p0 ∈ [p − δ, p + δ].

Proof: (see page 66).


The theorem states that, under reasonable assumptions, Newton's method converges provided a sufficiently accurate initial approximation is chosen. In practice, however, the theorem doesn't tell us how to calculate δ. In general, either the method converges quickly or it will be clear that convergence is unlikely.

Newton’s method is a powerful technique, but it has a major weakness: the need of the first deriva-

tive.The calculation of the first derivative f ′ (x) needs more arithmetic operations than f (x).

Newton’s method is a powerful technique, but it has a major weakness: the need of the first derivative.The

calculation of the first derivative f ′ (x) needs more arithmetic operations than f (x).

f′(pn−1) = lim_{x→pn−1} (f(x) − f(pn−1))/(x − pn−1) (2.30)

Letting x = pn−2, we have

f′(pn−1) ≈ (f(pn−2) − f(pn−1))/(pn−2 − pn−1) (2.31)

Using this last equation in Newton's method, we get

pn = pn−1 − f(pn−1)(pn−1 − pn−2)/(f(pn−1) − f(pn−2)) (2.32)

This is called the Secant Method. Starting with two initial approximation p0 and p1 , the approx-

imation p2 is the x-intercept of the line joining the two points (p0 , f (p0 )) and (p1 , f (p1 )).

The approximation p3 is the x-intercept of the line joining (p1 , f (p1 )) and (p2 , f (p2 )) (see the Fig.2.2).

Figure 2.2: Secant Method and False Position method for finding the root of f(x) = 0.
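The secant update (2.32) can be sketched as follows; note that only function values are needed, no derivative:

```python
def secant(f, p0, p1, tol=1e-10, n0=50):
    """Secant method, eq. (2.32)."""
    f0, f1 = f(p0), f(p1)
    for _ in range(n0):
        p2 = p1 - f1 * (p1 - p0) / (f1 - f0)  # x-intercept of the secant line
        if abs(p2 - p1) < tol:
            return p2
        p0, f0, p1, f1 = p1, f1, p2, f(p2)
    raise RuntimeError("no convergence within n0 iterations")

r = secant(lambda x: x ** 3 - 2 * x ** 2 - 5, 2.0, 3.0)  # example: x^3 - 2x^2 - 5
```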


2.5 False Position Method

The False Position method generates approximations in the same way as the Secant method, but includes a test to ensure that the root is always bracketed between successive iterations. This method is not recommended in general; it is presented here just to illustrate how bracketing can be incorporated.

First we choose two approximations p0 and p1 such that f(p0) · f(p1) < 0. The approximation p2 is chosen in the same manner as in the secant method, as the x-intercept of the line joining (p0, f(p0)) and (p1, f(p1)). To decide which secant line to use to compute p3, we check the sign of f(p1) · f(p2). If it is negative, then p1 and p2 bracket a root, and we choose p3 as the x-intercept of the line joining (p1, f(p1)) and (p2, f(p2)). If not, we choose p3 as the x-intercept of the line joining (p0, f(p0)) and (p2, f(p2)). In a similar manner we find pn for n ≥ 4.

Exercises

Ex 5: We want to use Newton's method to find an approximate solution, accurate to within 10−4, of the equation x³ − 2x² − 5 = 0 (the same cubic treated again in Sec. 4.7), starting from p0 = 2.5.

Solution:

The Newton’s method is:

f (pn−1 )

pn = pn−1 − , n≥1 (2.33)

f ′ (pn−1 )

The approximation is accurate to the places for which pn−1 and pn agree.

The Newton’s methods gives:

1 2.500000000 2.714285714 0.214285714

2 2.714285714 2.690951516 0.023334198

3 2.690951516 2.690647499 0.000304017

4 2.690647499 2.690647448 5.110−8

5 2.690647448 2.690647448 0.00

which is p4 = 2.690647448

If we use p0 = 2 we get

n pn−1 pn |pn − pn−1|
1 2.000000000 3.250000000 1.250000000
2 3.250000000 2.811036789 0.438963211
3 2.811036789 2.697989503 0.113047286
4 2.697989503 2.690677153 0.007312350
5 2.690677153 2.690647448 0.000029705

which is p5 = 2.690647448


Chapter 3

3.1 Linearly and Quadratically Convergent Procedures

We investigate the order of convergence of functional iteration.

Definition: Suppose {pn} is a sequence that converges to p, with pn ≠ p for all n. If positive constants λ and α exist with

lim_{n→∞} |pn+1 − p| / |pn − p|^α = λ (3.1)

then {pn} converges to p of order α, with asymptotic error constant λ. An iterative technique of the form pn = g(pn−1) is said to be of order α if the sequence {pn} converges to the solution p = g(p) of order α.

In general, a sequence with high order of convergence converges more rapidly than a sequence with

a lower order. The asymptotic constant affects the speed of convergence but is not as important as

the order.

If α = 1 (with λ < 1), the sequence is linearly convergent; if α = 2, the sequence is quadratically convergent.

Assume that we have two sequences, one converging linearly to zero and the other converging quadratically to zero. For simplicity, suppose that

|pn+1| / |pn| ≈ 0.5 (linearly convergent)

|p̃n+1| / |p̃n|² ≈ 0.5 (quadratically convergent)

For the linearly convergent sequence, |pn| ≈ 0.5 |pn−1| ≈ ... ≈ 0.5^n |p0|, while for the quadratically convergent one

|p̃n − 0| = |p̃n| ≈ 0.5 |p̃n−1|² ≈ 0.5³ |p̃n−2|⁴ ≈ ... ≈ (0.5)^(2^n − 1) |p̃0|^(2^n)

With |p0| = |p̃0| = 1 this gives

|pn| ≈ 0.5^n, |p̃n| ≈ (0.5)^(2^n − 1)


After seven iterations we get

|p7| ≈ 0.78125 × 10^−2, |p̃7| ≈ 0.58775 × 10^−38

In order for the linearly convergent sequence to reach the same accuracy as the quadratically convergent one, we need

(0.5)^n = (0.5)^(2^7 − 1) ⇒ n = 2^7 − 1 = 127
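The comparison can be reproduced in a few lines (taking |p0| = |p̃0| = 1):

```python
p = q = 1.0
for n in range(7):
    p = 0.5 * p          # linear:    |p_n| = 0.5 |p_{n-1}|    -> 0.5 ** n
    q = 0.5 * q ** 2     # quadratic: |q_n| = 0.5 |q_{n-1}|^2  -> 0.5 ** (2**n - 1)
print(p, q)              # 0.0078125 5.877471754111438e-39
```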

Theorem: Let g ∈ C[a, b] be such that g(x) ∈ [a, b] for all x ∈ [a, b]. Suppose, in addition, that g′ is continuous on (a, b) and that a positive constant k < 1 exists with |g′(x)| ≤ k for all x ∈ (a, b). If g′(p) ≠ 0, then for any p0 ∈ [a, b], the sequence

pn = g(pn−1), n ≥ 1, (3.2)

converges only linearly to the unique fixed point p in [a, b].

proof:

Theorem: Let p be a solution of the equation x = g(x). Suppose that g′(p) = 0 and g″ is continuous with |g″(x)| < M on an open interval I containing p. Then there exists a number δ > 0 such that, for p0 ∈ [p − δ, p + δ], the sequence defined by pn = g(pn−1), n ≥ 1, converges at least quadratically to p. Moreover, for sufficiently large values of n,

|pn+1 − p| < (M/2) |pn − p|². (3.3)

Proof:

The easiest way to construct a fixed-point problem associated with a root-finding problem f (x) = 0

is to subtract a multiple of f (x) from x.

pn = g(pn−1 ), with g(x) = x − φ(x)f (x), where φ(x) is a differentiable function that will be chosen

later.

If p satisfies f (p) = 0 then it is clear that g(p) = p.

If the iteration procedure derived from g to be quadratically convergent, we need g ′ (p) = 0 when

f (p) = 0. Since

g ′ (x) = 1 − φ′ (x)f (x) − φ(x)f ′ (x)

we have

g ′ (p) = 1 − φ′ (p)f (p) − φ(p)f ′ (p)

= 1 − φ(p)f ′ (p)

which implies

φ(p) = 1/f′(p)

If we let φ(x) = 1/f′(x), we ensure that φ(p) = 1/f′(p) and produce the quadratically convergent procedure

pn = g(pn−1) = pn−1 − f(pn−1)/f′(pn−1) (3.4)

This is of course Newton's method, which is quadratically convergent provided that f′(pn−1) ≠ 0.


3.2 Zero multiplicity

Definition: A solution p of f(x) = 0 is a zero of multiplicity m of f if for x ≠ p we can write f(x) = (x − p)^m q(x), where lim_{x→p} q(x) ≠ 0.

Theorem: f ∈ C 1 [a, b] has a simple zero at p in (a, b) iff f (p) = 0 but f ′ (p) 6= 0.

Proof:

If p is a simple root of f then Newton’s method converges quadratically. If p is not a simple root

then Newton’s method may not converge quadratically (see Example 2 page 79).

Theorem: The function f ∈ C^m[a, b] has a zero of multiplicity m at p in (a, b) iff

0 = f(p) = f′(p) = ... = f^(m−1)(p), but f^(m)(p) ≠ 0.

One way to handle multiple roots is to define the function

µ(x) = f(x)/f′(x) (3.5)

If p is a zero of f of multiplicity m and f(x) = (x − p)^m q(x), then

µ(x) = (x − p) q(x) / (m q(x) + (x − p) q′(x))

which has a simple zero at p. Newton's method can be applied to µ to give

g(x) = x − µ(x)/µ′(x) = x − f(x) f′(x) / ([f′(x)]² − f(x) f″(x)) (3.6)

If g has the required continuity conditions, functional iteration applied to g will be quadratically

convergent regardless of the multiplicity of the zero of f . Theoretically, the only drawback to this

method is the additional calculation of the second derivative f ′′ . In practice, multiple roots can cause

serious round-off problems since the denominator consists of the difference of two numbers that are

both close to zero.
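A sketch of the iteration (3.6). The test function f(x) = x³ − 3x + 2 = (x − 1)²(x + 2), with a double root at p = 1, is the example used again in Chapter 4; plain Newton converges only linearly there, while this iteration regains fast convergence:

```python
def modified_newton(f, fp, fpp, p0, tol=1e-5, n0=50):
    """Newton's method applied to mu = f/f', i.e. the iteration of eq. (3.6)."""
    for _ in range(n0):
        fx, fpx = f(p0), fp(p0)
        # the denominator [f']^2 - f f'' is a difference of two numbers that are
        # both close to zero near a multiple root -- the round-off hazard noted
        # above, so the tolerance is kept modest
        p = p0 - fx * fpx / (fpx ** 2 - fx * fpp(p0))
        if abs(p - p0) < tol:
            return p
        p0 = p
    return p0

p = modified_newton(lambda x: x ** 3 - 3 * x + 2,
                    lambda x: 3 * x ** 2 - 3,
                    lambda x: 6 * x, 2.0)
```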

3.3 Exercises

Ex 6: Show that the following sequences converge linearly to p = 0. How large must n be before |p − pn| ≤ 5 × 10−2?

a) pn = 1/n. b) qn = 1/n².

It is clear that

lim_{n→∞} |pn+1 − p| / |pn − p| = lim_{n→∞} n/(n + 1) = 1

(and similarly for qn).


For (a), 1/n ≤ 5 × 10−2 implies that n ≥ 20. For (b), 1/n² ≤ 5 × 10−2 implies that n² ≥ 20, which in turn gives n ≥ 5.

Ex 8a: We want to show that the sequence pn = 10^(−2^n) converges quadratically to 0.

lim_{n→∞} 10^(−2^n) = 0

lim_{n→∞} 10^(−2^(n+1)) / (10^(−2^n))² = lim_{n→∞} 10^(−2^(n+1)) / 10^(−2^(n+1)) = 1

Ex 8b: We want to show that the sequence pn = 10^(−n^k) does not converge to zero quadratically, regardless of the size of the exponent k > 1.

lim_{n→∞} 10^(−n^k) = 0

lim_{n→∞} 10^(−(n+1)^k) / (10^(−n^k))² = lim_{n→∞} 10^(2n^k − (n+1)^k)

and the exponent diverges, since

lim_{n→∞} [2n^k − (n + 1)^k] = lim_{n→∞} n^k [2 − ((n + 1)/n)^k] = ∞

So, we cannot find a positive number λ. Therefore, the sequence doesn't converge quadratically.


Chapter 4

Accelerating Convergence

We consider a technique that can be used to accelerate the convergence of a sequence that is linearly convergent.

Suppose that {pn} is a linearly convergent sequence with limit p. Aitken's ∆² method is built on the assumption that the sequence {p̂n}, defined by

p̂n = pn − (pn+1 − pn)² / (pn+2 − 2pn+1 + pn) = pn − (∆pn)²/∆²pn, for n ≥ 0 (4.1)

converges more rapidly to p than does the original sequence {pn}. The symbol ∆pn is the forward difference, defined by

∆pn = pn+1 − pn, for n ≥ 0, (4.3)

and higher powers of the operator ∆ are defined recursively by

∆^k pn = ∆(∆^(k−1) pn), for k ≥ 2. (4.5)

For example,

∆²pn = ∆(∆pn) = ∆(pn+1 − pn) = (pn+2 − pn+1) − (pn+1 − pn) = pn+2 − 2pn+1 + pn (4.6)

Theorem: Suppose that {pn} is a sequence that converges linearly to the limit p and that

lim_{n→∞} (pn+1 − p)/(pn − p) < 1 (4.7)

Then the sequence

p̂n = pn − (∆pn)²/∆²pn (4.8)

converges to p faster than {pn}, in the sense that

lim_{n→∞} (p̂n − p)/(pn − p) = 0 (4.9)

Example:
Let us consider pn = cos(1/n). This sequence converges linearly to p = 1:

lim_{n→∞} (cos(1/(n+1)) − 1)/(cos(1/n) − 1)
  = lim_{n→∞} (n² sin(1/(n+1)))/((n+1)² sin(1/n))      (L'Hôpital in n)
  = lim_{n→∞} sin(1/(n+1))/sin(1/n)
  = lim_{n→∞} (n² cos(1/(n+1)))/((n+1)² cos(1/n))      (L'Hôpital again)
  = 1

Applying Aitken's method,

p̂n = pn − (∆pn)²/∆²pn = pn − (pn+1 − pn)²/(pn+2 − 2pn+1 + pn) (4.10)

gives:

n pn p̂n

1 0.5403023059 0.9617750599

2 0.8775825619 0.9821293535

3 0.9449569463 0.9897855148

4 0.9689124217 0.9934156481

5 0.9800665778 0.9954099422

6 0.9861432316 0.9966199575

7 0.9898132604 0.9974083190
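The table can be reproduced with a few lines of Python:

```python
import math

def aitken(p):
    """Apply eq. (4.1) to a list of iterates; returns the accelerated sequence."""
    return [p[n] - (p[n + 1] - p[n]) ** 2 / (p[n + 2] - 2 * p[n + 1] + p[n])
            for n in range(len(p) - 2)]

p = [math.cos(1 / n) for n in range(1, 10)]   # p_n = cos(1/n)
phat = aitken(p)                              # the p_hat_n column above
```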

Example:

The function f (x) = x3 − 3x + 2 = (x − 1)2 (x + 2) has a double root p = 1. If Newton’s method

converges to p = 1 it converges linearly. We choose p0 = 2. The Newton’s method produces the

following sequence:

p0 = 2.

pn+1 = pn − (pn³ − 3pn + 2)/(3pn² − 3)


n pn (pn − 1)/(pn−1 − 1)

1 1.555555555555556 0.5555555560

2 1.297906602254429 0.5362318832

3 1.155390199213768 0.5216071009

4 1.079562210414361 0.5120156259

5 1.040288435171017 0.5063765197

6 1.020276809786733 0.5032910809

7 1.010172323431420 0.5016727483

8 1.005094741093272 0.5008434160

9 1.002549528082823 0.5004234759

10 1.001275305026243 0.5002121961

11 1.000637787960288 0.5001062491

12 1.000318927867152 0.5000533092

13 1.000159472408516 0.5000250840

14 1.000079738323218 0.5000125414

15 1.000039869690520 0.5000125411

It is clear that Newton’s method is linearly convergent or it converges slowly to p = 1. Let us apply

Aitken’s acceleration process to the sequence pn of iterations generated by Newton’s method

(pn+1 − pn )2

p̂n = pn −

pn+2 − 2pn+1 + pn

(4.11)

which gives:

n pn p̂n pn − 1 p̂n − 1

0 2.0 0.9425287356 1.0 −0.0574712644

1 1.555555556 0.9789767949 0.555555556 −0.0210232051

2 1.297906602 0.9933420783 0.297906602 −0.0066579217

3 1.155390199 0.9980927682 0.155390199 −0.0019072318

4 1.079562210 0.9994865474 0.079562210 −0.0005134526

5 1.040288435 0.9998665586 0.040288435 −0.0001334414

6 1.020276810 0.9999659695 0.020276810 −0.0000340305

7 1.010172323 0.9999914062 0.010172323 −0.0000085938

8 1.005094741 0.9999978406 0.005094741 −0.0000021594

9 1.002549528 0.9999994588 0.002549528 −5.412 × 10−7
10 1.001275305 0.9999998645 0.001275305 −1.355 × 10−7
11 1.000637788 0.9999999661 0.000637788 −3.39 × 10−8
12 1.000318928 0.9999999915 0.000318928 −8.5 × 10−9
13 1.000159472 0.9999999979 0.000159472 −2.1 × 10−9

14 1.000079738 ∗∗ 0.000079738 ∗∗

15 1.000039870 ∗∗ 0.000039870 ∗∗

By applying a modification of Aitken's ∆² method to a linearly convergent sequence obtained from a fixed-point iteration, we can accelerate the convergence to quadratic. This procedure is known as Steffensen's method.

p0, p1 = g(p0), p2 = g(p1), p̂0 = {∆²}(p0), p3 = g(p2), p̂1 = {∆²}(p1), ... (4.12)

where {∆²} indicates that Aitken's method, eq. (4.10), is used. Steffensen's method constructs the same first four terms, p0, p1, p2, and p̂0. However, at this step it assumes that p̂0 is a better approximation to p than p2, and applies fixed-point iteration to p̂0 instead of p2. This leads to the following sequence:

p0^(0), p1^(0) = g(p0^(0)), p2^(0) = g(p1^(0)), p0^(1) = {∆²}(p0^(0)), p1^(1) = g(p0^(1)), ... (4.13)

Note that the ∆² denominator can be zero at some iteration. If this occurs, we terminate the sequence and select the last iterate computed before the zero denominator appeared¹.

Ex 3 page 86:

Let g(x) = cos(x − 1) and p0^(0) = 2. We want to use Steffensen's method to get p0^(1).

p0^(0) = 2
p1^(0) = cos(2 − 1) = 0.5403023059
p2^(0) = cos(0.5403023059 − 1) = 0.8961866647
p0^(1) = p0^(0) − (p1^(0) − p0^(0))² / (p2^(0) − 2p1^(0) + p0^(0)) = 0.826427396
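The same computation, repeated until two successive accelerated values agree, can be sketched as follows (the denominator test implements the caveat above about a vanishing ∆² denominator):

```python
import math

def steffensen(g, p0, tol=1e-8, n0=20):
    """Steffensen's method: two fixed-point steps, then one Aitken step."""
    for _ in range(n0):
        p1 = g(p0)
        p2 = g(p1)
        denom = p2 - 2 * p1 + p0
        if denom == 0:            # Delta^2 denominator vanished: keep last value
            return p0
        p = p0 - (p1 - p0) ** 2 / denom
        if abs(p - p0) < tol:
            return p
        p0 = p
    return p0

one_step = steffensen(lambda x: math.cos(x - 1), 2.0, n0=1)  # p_0^(1) for Ex 3
```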

Ex 4 page 86 :

(0) (1) (2)

Let g(x) = 1 + (sin x)2 and p0 = 2. We want to use Steffensen’s method to get p0 and p0

(0)

p0 = 2

(0)

p1 = 1 + (sin 2)2 = 1.708073418

(0)

p2 = 1 + (sin 1.708073418)2 = 1.981273081

(0) (0)

(1) (0) (p1 − p0 )2

p0 = p0 − (0) (0) (0)

= 2.152904629

p2 − 2p1 + p0

(2)

To calculate p0 we start with:

(1)

p0 = = 2.152904629

(1)

p1 = 1 + (sin 2.152904629)2 = 1.697735097

(1)

p2 = 1 + (sin 1.697735097)2 = 1.983972911

(1) (1)

(2) (1) (p1 − p0 )2

p0 = p0 − (1) (1) (1)

= 1.873464043

p2 − 2p1 + p0

¹ See page 85 of the textbook.


4.3 Zeros of Polynomials

A polynomial of degree n has the form

P(x) = an x^n + an−1 x^(n−1) + ... + a1 x + a0, an ≠ 0.

Fundamental Theorem of Algebra: If P(x) is a polynomial of degree n ≥ 1, then P(x) = 0 has at least one root (possibly complex).

If P(x) is a polynomial of degree n ≥ 1, then there exist unique constants x1, x2, ..., xk, possibly complex, and unique positive integers m1, m2, ..., mk, with

Σ_{i=1}^{k} mi = n, such that P(x) = an (x − x1)^m1 (x − x2)^m2 ... (x − xk)^mk

Let P(x) and Q(x) be polynomials of degree at most n. If x1, x2, ..., xk, with k > n, are distinct numbers with P(xi) = Q(xi) for i = 1, 2, ..., k, then P(x) = Q(x) for all values of x.

To use Newton’s method to locate approximate zeros of a polynomial P(x), we need to evaluate P (x) and

P ′ (x) at specified values. To compute them efficiency we compute them in the nested manner. Horner’s

method incorporates this nesting technique and require only n multiplications and n additions to

evaluate an arbitrary nth-degree polynomial.

A polynomial of degree n can be written as Pn(x) = an x^n + an−1 x^(n−1) + ... + a1 x + a0. Divide this polynomial Pn(x) by (x − x1), giving a reduced polynomial Qn−1(x) of degree n − 1, and a remainder R:

Pn(x) = (x − x1)Qn−1(x) + R (4.16)

We can see that Pn(x1) = R. If we differentiate Pn(x) we get

Pn′(x) = Qn−1(x) + (x − x1)Q′n−1(x) (4.17)

thus,

Pn′(x1) = Qn−1(x1) (4.18)


We evaluate Qn−1(x1) by a second division whose remainder equals Qn−1(x1), and so on. Now we can write

Pn(x) = an x^n + an−1 x^(n−1) + ... + a1 x + a0
      = (x − x1)Qn−1(x) + R
      = (x − x1)(bn−1 x^(n−1) + bn−2 x^(n−2) + ... + b1 x + b0) + R
      = [bn−1 x^n + bn−2 x^(n−1) + ... + b1 x² + b0 x]
        − [bn−1 x1 x^(n−1) + bn−2 x1 x^(n−2) + ... + b1 x1 x + b0 x1] + R

We collect the terms

Pn (x) = bn−1 xn + [bn−2 − x1 bn−1 ] xn−1 + [bn−3 − x1 bn−2 ] xn−2 + ...

+ [b0 − x1 b1 ] x + [R − x1 b0 ]

By comparison we get:

bn−1 = an
bn−2 = an−1 + x1 bn−1
⋮
bi = ai+1 + x1 bi+1
⋮
b0 = a1 + x1 b1

So, the remainder can be evaluated from

R = a0 + x1 b0

Restated with x0 as the evaluation point: let

P(x) = an x^n + an−1 x^(n−1) + ... + a1 x + a0 (4.19)

If bn = an and

bk = ak + bk+1 x0, for k = n − 1, n − 2, ..., 1, 0 (4.20)

then b0 = P(x0). Since k runs from n − 1 down to 0, only n multiplications and n additions are needed to compute P(x0). Moreover, if

Q(x) = bn x^(n−1) + bn−1 x^(n−2) + ... + b2 x + b1 (4.21)

then

P(x) = (x − x0)Q(x) + b0 (4.22)

Proof:

P ′ (x) = Q(x) + (x − x0 )Q′ (x) (4.23)

Thus

P ′ (x0 ) = Q(x0 ) (4.24)


Example:

We want to evaluate P (x) = 2x4 − 3x2 + 3x − 4 at x0 = −2 using Horner’s method.

we start by:

a4 = 2 a3 = 0 a2 = −3 a1 = 3 a0 = −4

b 4 = a4 b3 = a3 + b4 (−2) b2 = a2 + b3 (−2) b1 = a1 + b2 (−2) b0 = a0 + b1 (−2)

b4 = 2 b3 = −4 b2 = 5 b1 = −7 b0 = 10

Therefore, P (−2) = 10 and P (x) = (x + 2)(2x3 − 4x2 + 5x − 7) + 10.
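The two synthetic divisions can be fused into a single pass that returns P(x0) and P′(x0) together (a sketch; coefficients are listed from an down to a0):

```python
def horner(a, x0):
    """Return (P(x0), P'(x0)) by synthetic division; a = [a_n, ..., a_1, a_0]."""
    b = a[0]               # b_n = a_n
    c = a[0]               # second division: accumulates Q(x0) = P'(x0)
    for coeff in a[1:-1]:
        b = coeff + b * x0
        c = b + c * x0
    b = a[-1] + b * x0     # b_0 = P(x0)
    return b, c

value, slope = horner([2, 0, -3, 3, -4], -2)  # P(x) = 2x^4 - 3x^2 + 3x - 4
print(value, slope)  # 10 -49
```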

Example:

Find an approximation to one of the zeros of P (x) = 2x4 − 3x2 + 3x − 4 using Newton’s Method and

synthetic division to evaluate P (xn ) and P ′ (xn ) for each iterate xn .

at x0 = −2 we use bn = an and bk = ak + bk+1 x0 for k = n − 1 to k = 0.

x0 = −2 | 2    0    −3     3    −4
        |      −4    8    −10   14
        | 2   −4     5    −7    10 = P(−2)

Using the theorem P ′ (x0 ) = Q(x0 ) we get

x0 = −2 | 2    −4    5    −7
        |      −4    16   −42
        | 2   −8    21   −49 = P′(−2)

and

x1 = x0 − P(x0)/Q(x0) = −2 − 10/(−49) ≈ −1.796

Repeating the procedure, for x1 = −1.796 we get P(x1) = 1.742 and P′(x1) = −32.565, so x2 ≈ −1.7425. In a similar manner we get x3 ≈ −1.73897. An actual zero to five decimal places is −1.73896.

4.5 Deflation

If the N th iterate, xN , in Newton’s method is an approximate zero for the polynomial P (x), then

P (x) = (x − xN )Q(x) + b0 = (x − xN )Q(x) + P (xN ) ≈ (x − xN )Q(x) (4.25)

so, x − xN is an approximate factor of P (x). Letting x̂1 = xN be the approximate zero of P and Q1 (x) =

Q(x) be the approximate factor given by

P (x) ≈ (x − x̂1 )Q1 (x) (4.26)

We can find a second approximate zero of P by applying Newton's method to Q1(x). If P(x) is of degree n, we can apply the procedure repeatedly to find x̂2 and Q2(x), ..., x̂n−2 and Qn−2(x). After finding (n − 2) roots, we are left with a quadratic factor, which we can solve directly to get the last two approximate roots. This procedure is called deflation.
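Synthetic division gives the reduced polynomial directly. A sketch (coefficients from the highest degree down; r is an already-computed approximate zero):

```python
def deflate(a, r):
    """Quotient of P by (x - r) via synthetic division; the remainder P(r) is dropped."""
    q = [a[0]]
    for coeff in a[1:-1]:
        q.append(coeff + q[-1] * r)
    return q

# x^3 - 3x + 2 = (x - 1)^2 (x + 2), deflated by the root r = 1:
print(deflate([1, 0, -3, 2], 1))  # [1, 1, -2], i.e. x^2 + x - 2 = (x - 1)(x + 2)
```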


The accuracy difficulty with deflation is due to the fact that, when obtaining the approximate zeros of P(x), Newton's method is used on the reduced polynomials Qk(x). An approximate zero x̂k+1 of Qk(x) will generally approximate a root of Qk(x), not of P(x), and the inaccuracy accumulates as k grows. To eliminate this, we can use the reduced equations to find initial approximations x̂i, and then apply Newton's method to the original polynomial P(x) starting from them.

One problem with applying the Secant, False Position, or Newton's methods to polynomials is the possibility of complex roots. If the initial approximation is real, all subsequent approximations will also be real. To overcome this problem, we can start with a complex initial approximation.

The Secant method uses two initial approximations, p0 and p1, to get p2, which is the x-intercept of the line joining the two points (p0, f(p0)) and (p1, f(p1)).

Müller’s method uses three initial approximations, p0 , p1 ,p2 , and determinate the next approximation p3 by

considering the intersection of the x-axis with the parabola through (p0 , f (p0 )),(p1 , f (p1 )), and (p2 , f (p2 )).

p3 − p2 = −2c / (b ± √(b² − 4ac)) (4.29)

This form has no problem with subtracting nearly equal numbers (see Example 5, Section 1.2).

This formula offers two choices of root. In Müller's method, the sign is chosen to agree with the sign of b, so

p3 = p2 − 2c / (b + sign(b)√(b² − 4ac)) (4.30)

The method involves a square root, which means that complex approximations arise naturally in Müller's method.
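A sketch of one common formulation of Müller's method; using cmath.sqrt makes the complex case automatic. For illustration, the example call finds a complex root of x² + 5x + 8, the quadratic left over in the exercise of Sec. 4.7:

```python
import cmath

def muller(f, p0, p1, p2, tol=1e-10, n0=50):
    """Muller's method: next iterate from the parabola through three points."""
    for _ in range(n0):
        f0, f1, f2 = f(p0), f(p1), f(p2)
        h1, h2 = p1 - p0, p2 - p1
        d1, d2 = (f1 - f0) / h1, (f2 - f1) / h2
        a = (d2 - d1) / (h2 + h1)     # parabola about p2:
        b = d2 + h2 * a               #   f2 + b (x - p2) + a (x - p2)^2
        disc = cmath.sqrt(b * b - 4 * a * f2)
        # pick the larger-magnitude denominator, the complex analogue of eq. (4.30)
        denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        p3 = p2 - 2 * f2 / denom
        if abs(p3 - p2) < tol:
            return p3
        p0, p1, p2 = p1, p2, p3
    return p2

r = muller(lambda x: x * x + 5 * x + 8, 0.0, 1.0, 2.0)  # converges to a complex root
```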

4.7 Exercises

We want to find approximations, accurate to within 10−4, to all real zeros of the following polynomial using Newton's method:

P(x) = x³ − 2x² − 5


Solution:

Descartes's rule of signs. The rule states that the number np of positive zeros of a polynomial P(x) is less than or equal to the number of variations v in sign of the coefficients of P(x). Moreover, the difference v − np is a nonnegative even integer.

For our example, the number of variations in sign of the coefficients of P(x) is v = 1, so there is at most one positive root. Moreover, 1 − np ≥ 0 and even, which implies np = 1. Therefore there is exactly one positive root.

Now we change x → −x: P(−x) = −x³ − 2x² − 5 has no sign variations, so there are no negative real roots. Thus, our conclusion is: there is only one real root, which is a positive real number.

We then apply Newton's method starting with p0 = 2:

f(x) = x³ − 2x² − 5
f′(x) = 3x² − 4x
p0 = 2
pn+1 = pn − f(pn)/f′(pn), n ≥ 0

n pn |pn − pn−1 |

1 3.250000000 1.250000000

2 2.811036789 0.438963211

3 2.697989503 0.113047286

4 2.690677153 0.007312350

5 2.690647448 0.000029705

6 2.690647448 0.000000001

Next, we find, to within 10−4, all zeros of P(x) = x⁴ + 5x³ − 9x² − 85x − 136 (the polynomial consistent with the computations below), by first finding the real zeros using Newton's method and then reducing to polynomials of lower degree to determine any complex zeros.

According to Descartes's rule we have:

1. For positive zeros: the number of variations in sign is 1. Thus, there is exactly one positive zero.

2. For negative zeros: the number of variations in sign of P(−x) is 3. Thus, there are one or three negative zeros.


Starting Newton's method at p0 = 0 gives:

n pn |pn − pn−1|
1 −1.600000000 1.600000000
2 −2.681394805 1.081394805
3 −5.595348023 2.913953218
4 −4.842605061 0.752742962
5 −4.377210956 0.465394105
6 −4.167343093 0.209867863
7 −4.124721017 0.042622076
8 −4.123107873 0.001613144
9 −4.123105624 0.000002249

We now use Horner’s method to get the reduced polynomial. we get

b4 = 1.

b3 = 9.123105625

b2 = 28.61552812

b1 = 32.9848450

b0:= 0.

We use Newton's method to get a solution of Q1(x) = 0; we find, to within 10−5, the root 4.123106. We use Horner's method again to get the reduced polynomial

b3 = 1
b2 = 5
b1 = 8
b0 = 0 (the remainder)

that is, Q2(x) = x² + 5x + 8, whose two (complex) roots are the remaining zeros of P.


Chapter 5

Approximation

Consider data with two columns, x and y. We plot these data and see whether we can fit them to a function y = f(x). This is what we call "interpolation". We will study the case when we fit the data to a polynomial.

Theorem (Weierstrass): Suppose that f is defined and continuous on [a, b]. For each ǫ > 0, there exists a polynomial P(x) with the property that

|f(x) − P(x)| < ǫ, for all x ∈ [a, b].

The proof of this theorem can be found in most elementary textbooks on real analysis.

Taylor polynomials are used mainly to approximate a function at a specified point, whereas a good interpolating polynomial needs to provide a relatively accurate approximation over an entire interval. So a Taylor polynomial is not always appropriate for interpolation. For example, approximating f(x) = 1/x at x = 3 using Taylor polynomials expanded about x = 1 leads to wildly inaccurate results:

n 0 1 2 3 4 5 6 7

Pn (3) 1 −1 3 −5 11 −21 43 −85

In this section we find approximating polynomials that are determined simply by specifying certain points on the plane through which they must pass.

Let us determine a polynomial of degree one that passes through the distinct points (x0, f(x0)) and (x1, f(x1)). We define the functions:

L0(x) = (x − x1)/(x0 − x1) (5.1)

L1(x) = (x − x0)/(x1 − x0) (5.2)


and define

P(x) = L0(x) f(x0) + L1(x) f(x1).

It is clear that the polynomial P(x) coincides with f(x) at x0 and x1, and that it is the unique linear function passing through (x0, f(x0)) and (x1, f(x1)). We now generalize this construction to a polynomial of degree at most n that passes through n + 1 points, (x0, f(x0)), (x1, f(x1)), ..., (xn, f(xn)).

Theorem: If x0, x1, ..., xn are n + 1 distinct numbers and f is a function whose values are given at these numbers, then a unique polynomial P(x) of degree at most n exists with P(xk) = f(xk) for each k, namely

P(x) = Σ_{k=0}^{n} f(xk) Ln,k(x) (5.5)

with

Ln,k(x) = Π_{i=0, i≠k}^{n} (x − xi)/(xk − xi) (5.6)

Proof:
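Formulas (5.5)–(5.6) translate directly into code (a minimal sketch; the example data are our own):

```python
def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial of eq. (5.5) at x."""
    total = 0.0
    for k in range(len(xs)):
        Lk = 1.0
        for i in range(len(xs)):
            if i != k:                              # the product of eq. (5.6)
                Lk *= (x - xs[i]) / (xs[k] - xs[i])
        total += ys[k] * Lk
    return total

# the unique parabola through (0,1), (1,3), (2,7) is x^2 + x + 1
print(lagrange([0, 1, 2], [1, 3, 7], 3))  # 13.0
```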

Theorem: Suppose x0, x1, ..., xn are n + 1 distinct numbers in [a, b] and f ∈ C^(n+1)[a, b]. Then

f(x) = P(x) + (f^(n+1)(ξ(x))/(n + 1)!) (x − x0)(x − x1)...(x − xn) (5.7)

where P (x) is the interpolating polynomial given by eq.(5.5) and ξ(x) ∈ (a, b).

Proof:

Definition: Let f be a function defined at x0, x1, ..., xn, and suppose that m1, m2, ..., mk are k distinct integers, with 0 ≤ mi ≤ n for each i. The Lagrange polynomial that agrees with f(x) at the k points xm1, xm2, ..., xmk is denoted by Pm1,m2,...,mk(x).

Theorem: Let f be defined at x0, x1, ..., xk, and let xj and xi be two distinct numbers in this set. Then

P(x) = [(x − xj) P0,1,...,j−1,j+1,...,k(x) − (x − xi) P0,1,...,i−1,i+1,...,k(x)] / (xi − xj) (5.8)

is the kth Lagrange polynomial that interpolates f at the k + 1 points x0, x1, ..., xk.

Proof:


5.3 Neville’s Method

This theorem implies that the interpolating polynomials can be generated recursively.

To avoid multiple subscripts, let Qi,j, for 0 ≤ j ≤ i, denote the interpolating polynomial of degree j on the (j + 1) numbers xi−j, xi−j+1, ..., xi−1, xi; that is, Qi,j = Pi−j,i−j+1,...,i. The computations can be arranged in a table:

x0 P0 = Q0,0

x1 P1 = Q1,0 P0,1 = Q1,1

x2 P2 = Q2,0 P1,2 = Q2,1 P0,1,2 = Q2,2 (5.10)

x3 P3 = Q3,0 P2,3 = Q3,1 P1,2,3 = Q3,2 P0,1,2,3 = Q3,3

x4 P4 = Q4,0 P3,4 = Q4,1 P2,3,4 = Q4,2 P1,2,3,4 = Q4,3 P0,1,2,3,4 = Q4,4

Qi,j = (5.11)

(xi − xi−j )

Q0,0 = f (x0 ), Q1,0 = f (x1 ), , ..., Qn,0 = f (xn ), (5.12)

Example:

Suppose function f is given for the following values:

x f (x)

x0 = 1.0 0.7651977

x1 = 1.3 0.6200860

x2 = 1.6 0.4554022

x3 = 1.9 0.2818186

x4 = 2.2 0.1103623

we want to approximate f (1.5) using various interpolating polynomials at x = 1.5. By using Neville's method, eq. (5.11), we can calculate Qi,j :

Q0,0 = P0 = 0.7651977
Q1,0 = P1 = 0.6200860
Q1,1 = P0,1 = [(x − x0 )Q1,0 − (x − x1 )Q0,0 ] / (x1 − x0 ) = 0.5233449
Q2,0 = P2 = 0.4554022
Q2,1 = P1,2 = [(x − x1 )Q2,0 − (x − x2 )Q1,0 ] / (x2 − x1 ) = 0.5102968
Q2,2 = P0,1,2 = [(x − x0 )Q2,1 − (x − x2 )Q1,1 ] / (x2 − x0 ) = 0.5124715
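As a cross-check, the recursion (5.11) is easy to sketch in Python (my own function names; the data are those of the table above):

```python
def neville(xs, ys, x):
    """Neville's method: Q[i][j] (eq. (5.11)) is the value at x of the
    interpolating polynomial on the nodes x_{i-j}, ..., x_i."""
    n = len(xs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][0] = ys[i]
    for i in range(1, n):
        for j in range(1, i + 1):
            Q[i][j] = ((x - xs[i - j]) * Q[i][j - 1]
                       - (x - xs[i]) * Q[i - 1][j - 1]) / (xs[i] - xs[i - j])
    return Q

xs = [1.0, 1.3, 1.6, 1.9, 2.2]
ys = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
Q = neville(xs, ys, 1.5)
print(round(Q[1][1], 7), round(Q[2][2], 7))  # 0.5233449 0.5124715
```

Each new node only adds one diagonal to the table, so the work already done is reused.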

Assignment: study Example 6, page 113.

45

5.4 Newton Interpolating Polynomial

Suppose there is a known polynomial Pn−1 (x) that interpolates the data set (xi , yi ), i = 0, 1, . . . , n − 1. When one more data point (xn , yn ), distinct from all the other data points, is added to the data set, we can construct a new polynomial Pn (x) that interpolates the new data set. To do so, consider the polynomial

Pn (x) = Pn−1 (x) + cn ∏_{i=0}^{n−1} (x − xi )    (5.13)

For the case n = 0 we write

P0 (x) = y0    (5.14)

It is clear that Pn agrees with Pn−1 at all the old points, since the added product term vanishes there:

Pn (xi ) = Pn−1 (xi ) = yi ,  i = 0, 1, . . . , n − 1.    (5.15)

At the new point we require

Pn (xn ) = Pn−1 (xn ) + cn ∏_{i=0}^{n−1} (xn − xi ) = yn    (5.16)

which gives

cn = [yn − Pn−1 (xn )] / ∏_{i=0}^{n−1} (xn − xi )    (5.17)

So, for any given data set (xi , yi ), i = 0, 1, . . . , n, we can obtain the interpolating polynomial by a recursive process that starts from P0 (x) and uses the above construction to get P1 (x), P2 (x), . . . , Pn (x). We will demonstrate this process through the following example.

i 0 1 2 3 4

xi 0 0.25 0.5 0.75 1

yi −1 0 1 0 1

P0 (x) = y0 = −1

P1 (x) = P0 (x) + c1 (x − x0 ) = −1 + c1 x

The constant c1 is given by

c1 = [y1 − P0 (x1 )] / (x1 − x0 ) = [0 − (−1)] / (0.25 − 0) = 4

Thus,

P1 (x) = −1 + 4x

P2 (x) = P1 (x) + c2 (x − x0 )(x − x1 ) = −1 + 4x + c2 x(x − 0.25)

The constant c2 is given by

c2 = [y2 − P1 (x2 )] / [(x2 − x0 )(x2 − x1 )] = [1 − (−1 + 4 × 0.5)] / [(0.5 − 0)(0.5 − 0.25)] = 0

Thus,

P2 (x) = −1 + 4x

Continuing the calculations we find

c3 = −64/3,  P3 (x) = −1 + 4x − (64/3) x (x − 1/4)(x − 1/2)

c4 = 64,  P4 (x) = −1 + 4x − (64/3) x (x − 1/4)(x − 1/2) + 64 x (x − 1/4)(x − 1/2)(x − 3/4)

Divided difference polynomial: The divided difference polynomial is a helpful method to generate interpolation polynomials.
The first order divided difference of f at x = xi is given by

f [xi , xi+1 ] = [f (xi+1 ) − f (xi )] / (xi+1 − xi )    (5.18)

The second order divided difference of f at xi is given by

f [xi , xi+1 , xi+2 ] = [f [xi+1 , xi+2 ] − f [xi , xi+1 ]] / (xi+2 − xi )    (5.19)

We can generalize this to higher order:

f [x0 , x1 , . . . , xn ] = [f [x1 , . . . , xn ] − f [x0 , . . . , xn−1 ]] / (xn − x0 )    (5.20)

With these definitions we get the interpolation polynomial as:

Pn (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + . . . + f [x0 , . . . , xn ] ∏_{i=0}^{n−1} (x − xi )
       = f [x0 ] + Σ_{i=1}^{n} f [x0 , . . . , xi ] ∏_{j=0}^{i−1} (x − xj )    (5.21)

Example:

i 0 1 2 3 4

xi 0 0.25 0.5 0.75 1

yi −1 0 1 0 1

Let us try to find the interpolation polynomial of the above table. The divided differences are:

f [x0 ] = −1,  f [x1 ] = 0,  f [x2 ] = 1,  f [x3 ] = 0,  f [x4 ] = 1

First divided differences:
f [x0 , x1 ] = (f [x1 ] − f [x0 ])/(x1 − x0 ) = 4,  f [x1 , x2 ] = 4,  f [x2 , x3 ] = −4,  f [x3 , x4 ] = 4

Second divided differences:
f [x0 , x1 , x2 ] = 0,  f [x1 , x2 , x3 ] = −16,  f [x2 , x3 , x4 ] = 16

Third divided differences:
f [x0 , x1 , x2 , x3 ] = −64/3,  f [x1 , x2 , x3 , x4 ] = 128/3

Fourth divided difference:
f [x0 , x1 , x2 , x3 , x4 ] = 64

Therefore

P4 (x) = −1 + 4(x − 0) + 0 (x − 0)(x − 1/4) − (64/3)(x − 0)(x − 1/4)(x − 1/2)
         + 64 (x − 0)(x − 1/4)(x − 1/2)(x − 3/4)
       = −1 + 4x − (64/3) x (x − 1/4)(x − 1/2) + 64 x (x − 1/4)(x − 1/2)(x − 3/4)
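The divided-difference recursion (5.18)-(5.20) can be sketched in Python with exact rational arithmetic (function names are my own), reproducing the coefficients above:

```python
from fractions import Fraction

def divided_differences(xs, ys):
    """Newton coefficients f[x0], f[x0,x1], ..., f[x0,...,xn] via the
    recursion (5.20), computed exactly with rational arithmetic."""
    n = len(xs)
    table = [[Fraction(y) for y in ys]]          # zeroth-order differences
    for order in range(1, n):
        prev = table[-1]
        table.append([(prev[i + 1] - prev[i]) / (xs[i + order] - xs[i])
                      for i in range(n - order)])
    return [table[k][0] for k in range(n)]

xs = [Fraction(k, 4) for k in range(5)]          # 0, 1/4, 1/2, 3/4, 1
ys = [-1, 0, 1, 0, 1]
coeffs = divided_differences(xs, ys)
print(coeffs)  # f[x0], ..., f[x0..x4] = -1, 4, 0, -64/3, 64
```

Using Fractions avoids any rounding, so the coefficients match the hand computation exactly.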

48

5.5 Polynomial Forms

We follow the book Introduction to Numerical Analysis, Alastair Wood, Addison-Wesley.

Pn (x) = a0 + a1 x + . . . + an x^n = Σ_{k=0}^{n} ak x^k    (5.22)

This form is convenient for analysis but may lead to loss of significance. For example, consider

P1 (x) = 18001/3 − x

This polynomial takes the value 1/3 at x = 6000 and −2/3 at x = 6001. On a finite-precision machine with 5 decimal digits the coefficients are stored as a∗0 = 6000.3 and a∗1 = −1, and hence

P1 (6001) = 6000.3 − 6001 = −0.7

Only one digit of the exact value is recovered, yet the coefficients are accurate to 5 digits! Four significant digits have been lost due to the subtraction of two nearly equal large numbers.

Shifted power form: The drawback seen in the previous example can be alleviated by changing the origin of x to a non-zero value c and writing the polynomial (5.22) as

Pn (x) = Σ_{k=0}^{n} ak x^k = b0 + b1 (x − c) + . . . + bn (x − c)^n = Σ_{k=0}^{n} bk (x − c)^k    (5.23)

This form is called the shifted power form; c is a centre and the bk are constant coefficients. The previous example can be written as

P1 (x) = 18001/3 − x = 1/3 − (x − 6000)

So, we get

P1 (6000) = 1/3 − (6000 − 6000) = 1/3 = 0.33333

P1 (6001) = 1/3 − (6001 − 6000) = 1/3 − 1 = −0.66667

These values are accurate to 5 digits and there is no loss of significance.

We can find the coefficients bk by using the Taylor polynomial at x = c. This gives

Pn (x) = Σ_{k=0}^{n} [Pn^(k) (c)/k!] (x − c)^k    (5.24)

where Pn^(k) (c) is the k-th derivative of Pn (x) at x = c. Thus,

bk = Pn^(k) (c)/k!

More generally, we may use a different centre ck for each term (the Newton form):

Pn (x) = d0 + d1 (x − c1 ) + d2 (x − c1 )(x − c2 ) + . . . + dn (x − c1 )(x − c2 ) . . . (x − cn )
       = d0 + Σ_{k=1}^{n} dk ∏_{j=1}^{k} (x − cj )    (5.25)

For c1 = c2 = . . . = cn = c we get the shifted power form, and for c1 = c2 = . . . = cn = 0 we recover the power form.
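The Newton form (5.25) is evaluated efficiently and stably by nested multiplication; a minimal Python sketch (function names are my own), applied to the shifted power form of the example above:

```python
def newton_eval(d, c, x):
    """Evaluate the Newton form (5.25),
    Pn(x) = d0 + d1(x - c1) + ... + dn(x - c1)...(x - cn),
    by nested (Horner-like) multiplication."""
    result = d[-1]
    for k in range(len(c) - 1, -1, -1):   # c[k] holds the centre c_{k+1}
        result = d[k] + (x - c[k]) * result
    return result

# Shifted power form of the example: P1(x) = 1/3 - (x - 6000).
print(newton_eval([1 / 3, -1], [6000.0], 6000.0))  # ~0.33333
print(newton_eval([1 / 3, -1], [6000.0], 6001.0))  # ~-0.66667
```

Because the subtraction x − c is performed before the large coefficients ever appear, both values keep full working precision, in contrast with the raw power-form evaluation.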

5.6 Spline Interpolation

For a large data set, a single approximating polynomial satisfying all the data (xi , f (xi )) will have high degree. In general a polynomial of high degree oscillates, which may not be an acceptable behavior. One solution to this problem is to use interpolation in a piecewise manner.
The simplest approach uses linear interpolants. Given n + 1 items of data in ascending order by x, the data (xi , f (xi )) and (xi+1 , f (xi+1 )) are interpolated by a straight line. A piecewise linear interpolation is called a linear spline S1 . The linear spline suffers from a lack of smoothness: continuity is assured, but there is a jump in the first derivative at the nodes. The solution is to use splines having greater smoothness. We concentrate on the cubic spline.

Definition1 :
For the data (xi , f (xi )), i = 0, . . . , n, S3 is a cubic spline in [x0 , xn ] if:
(1) S3 restricted to [xi−1 , xi ] is a polynomial of degree at most 3
(2) S3 ∈ C 2 [x0 , xn ]
(3) If s3,i and s3,i+1 are the cubic interpolants on adjacent sub-intervals, then they satisfy the conditions

s′3,i (xi ) = s′3,i+1 (xi )

s′′3,i (xi ) = s′′3,i+1 (xi )

A consequence of this definition is that the individual interpolants can no longer be constructed in isolation: the piecewise interpolants s3,1 , . . . , s3,n are interdependent through the derivative continuity conditions.

1
Alastair Wood, Introduction to Numerical Analysis

50

On the interval [xi−1 , xi ], and for i = 1, 2, . . . , n, we have

s3,i (x) = f (xi−1 ) + ai (x − xi−1 ) + bi (x − xi−1 )2 + ci (x − xi−1 )3 (5.26)

there are 3n constants to be determined: ai , bi , ci , i = 1, . . . , n. The continuity conditions require that

s3,i (xi ) = f (xi−1 ) + ai (xi − xi−1 ) + bi (xi − xi−1 )2 + ci (xi − xi−1 )3

s3,i+1 (xi ) = f (xi ) + ai+1 (xi − xi ) + bi+1 (xi − xi )2 + ci+1 (xi − xi )3

which leads to

f (xi−1 ) + ai (xi − xi−1 ) + bi (xi − xi−1 )2 + ci (xi − xi−1 )3 = f (xi )

f (xi−1 ) + ai hi + bi h2i + ci h3i = f (xi )

where hi = xi − xi−1 for i = 1, . . . , n.

For the first derivative we have

s′3,i (xi ) = ai + 2bi (xi − xi−1 ) + 3ci (xi − xi−1 )2

s′3,i+1 (xi ) = ai+1 + 2bi+1 (xi − xi ) + 3ci+1 (xi − xi )2

which leads to

ai + 2bi hi + 3ci h2i = ai+1

for i = 1, . . . , n − 1.

For the second derivative we get

s′′3,i (xi ) = 2bi + 6ci (xi − xi−1 )

s′′3,i+1 (xi ) = 2bi+1 + 6ci+1 (xi − xi )

which leads to

bi + 3ci hi = bi+1

for i = 1, . . . , n − 1.

The natural cubic spline is defined by

s′′3,1 (x0 ) = s′′3,n (xn ) = 0 (5.27)

in other words

b1 = 0, and bn + 3cn hn = 0 (5.28)

we have:
3n constants to be determined (ai , bi , ci for i = 1, . . . , n) and 3n conditions: the n interpolation conditions, the n − 1 first-derivative conditions, the n − 1 second-derivative conditions, and the 2 natural boundary conditions.
Therefore, we can find all the constants.

51

5.7 Parametric Curves

In some cases curves cannot be expressed as a function of one coordinate variable y in terms of the other

variable x. A straightforward method to represent such curves is to use a parametric technique. We choose

a parameter t on the interval [t0 , tn ], with t0 < t1 < ... < tn and construct approximation functions with

xi = x(ti ) and yi = y(ti ) (5.29)

Consider a curve given by the figure.3.14 page 158 from the textbook. From the curve we can extract the

following table

i 0 1 2 3 4

ti 0 0.25 0.5 0.75 1

xi −1 0 1 0 1

yi 0 1 0.5 0 −1

Please refer to page 158

Example:

The first part of the graph

x = [10, 6, 2, 1, 2, 6, 10]

y = [3, 1, 1, 4, 6, 7, 6]

t = [0, 1/6, 1/3, 1/2, 2/3, 5/6, 1]

The second part of the graph

x = [2, 6, 10, 10, 13]

y = [10, 12, 10, 1, 1]

t = [0, 1/4, 2/4, 3/4, 1]

52

The cubic polynomials for the first graph are:

fx = u ↦
  10 − (297/13) u − (540/13) u^3                             u < 1/6
  115/13 − (27/13) u − (1620/13) u^2 + (2700/13) u^3         u < 1/3
  283/13 − (1539/13) u + (2916/13) u^2 − (1836/13) u^3       u < 1/2      (5.30)
  −176/13 + (1215/13) u − (2592/13) u^2 + (1836/13) u^3      u < 2/3
  1168/13 − (4833/13) u + (6480/13) u^2 − (2700/13) u^3      u < 5/6
  −707/13 + (1917/13) u − (1620/13) u^2 + (540/13) u^3       otherwise

fy = u ↦
  3 − (1797/130) u + (4266/65) u^3                           u < 1/6
  367/130 − (1383/130) u − (1242/65) u^2 + (1350/13) u^3     u < 1/3
  2143/130 − (17367/130) u + (22734/65) u^2 − (17226/65) u^3 u < 1/2      (5.32)
  −1831/65 + (17463/130) u − (12096/65) u^2 + (5994/65) u^3  u < 2/3
  389/13 − (16521/130) u + (13392/65) u^2 − (1350/13) u^3    u < 5/6
  −2397/26 + (40629/130) u − (20898/65) u^2 + (6966/65) u^3  otherwise

The cubic polynomials for the second graph are:

fx = u ↦
  2 + (205/14) u + (152/7) u^3                               u < 1/4
  113/28 − (137/14) u + (684/7) u^2 − (760/7) u^3            u < 1/2
  −815/28 + (2647/14) u − 300 u^2 + (1096/7) u^3             u < 3/4      (5.33)
  929/14 − (2699/14) u + (1464/7) u^2 − (488/7) u^3          otherwise

fy = u ↦
  10 + (135/14) u − (184/7) u^3                              u < 1/4
  323/28 − (123/14) u + (516/7) u^2 − (872/7) u^3            u < 1/2
  −1277/28 + (4677/14) u − 612 u^2 + (2328/7) u^3            u < 3/4      (5.35)
  2399/14 − (7473/14) u + (3816/7) u^2 − (1272/7) u^3        otherwise

Merging the two graphs, we obtain the complete figure.

Example:

Given the data points

i 0 1 2

xi 4 9 16

fi 2 3 4

The cubic equations are:

f1 (x) = a0 + a1 (x − x0 ) + a2 (x − x0 )2 + a3 (x − x0 )3

f2 (x) = b0 + b1 (x − x1 ) + b2 (x − x1 )2 + b3 (x − x1 )3

which become

f2 (x) = 3 + b1 (x − 9) + b2 (x − 9)2 + b3 (x − 9)3

3 = 2 + a1 (9 − 4) + a2 (9 − 4)2 + a3 (9 − 4)3

1 = 5a1 + 25a2 + 125a3

4 = 3 + b1 (16 − 9) + b2 (16 − 9)2 + b3 (16 − 9)3

1 = 7b1 + 49b2 + 343b3

f2′ (x) = b1 + 2b2 (x − 9) + 3b3 (x − 9)2

b1 = a1 + 10a2 + 75a3

f2′′ (x) = 2b2 + 6b3 (x − 9)

54

The continuity of the first and second derivatives, together with the natural boundary conditions, leads to the system

1 = 5a1 + 25a2 + 125a3
1 = 7b1 + 49b2 + 343b3
b1 = a1 + 10a2 + 75a3
b2 = a2 + 15a3
0 = a2                    (natural condition at x0 )
0 = b2 + 21b3             (natural condition at x2 )

Substituting a2 = 0, the system reduces to

1 = 5a1 + 125a3
1 = 7b1 + 49b2 + 343b3
b1 = a1 + 75a3
b2 = 15a3
0 = b2 + 21b3

Then, using b3 = −15a3 /21, we get

1 = 5a1 + 125a3

and

1 = 7(a1 + 75a3 ) + 49(15a3 ) + 343(−15a3 /21) = 7a1 + 1015a3

Solving these two equations gives

a1 = 89/420,  a3 = −1/2100

Thus, the first cubic polynomial is

f1 (x) = 2 + (89/420)(x − 4) − (1/2100)(x − 4)^3

Therefore

f1 (7) = 459/175    (5.36)

We can find the other polynomial in the same way:

f2 (x) = 3 + (37/210)(x − 9) − (1/140)(x − 9)^2 + (1/2940)(x − 9)^3
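The reduced system above can be solved mechanically; a small Python sketch with exact rational arithmetic (variable names are my own), confirming f1 (7) = 459/175:

```python
from fractions import Fraction as F

# After eliminating b1, b2, b3 (and a2 = 0), the natural-spline system
# for the nodes 4, 9, 16 with values 2, 3, 4 reduces to
#     5*a1 +  125*a3 = 1
#     7*a1 + 1015*a3 = 1
# Solve the 2x2 system exactly by Cramer's rule.
det = F(5 * 1015 - 7 * 125)
a1 = F(1015 - 125) / det        # = 89/420
a3 = F(5 - 7) / det             # = -1/2100

def f1(x):
    """First spline piece, f1(x) = 2 + a1 (x - 4) + a3 (x - 4)^3."""
    x = F(x)
    return 2 + a1 * (x - 4) + a3 * (x - 4) ** 3

print(a1, a3, f1(7))  # 89/420 -1/2100 459/175
```

Exact rationals make it easy to see that the computed coefficients agree with the hand derivation.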

56

Chapter 6

Numerical Differentiation and Integration

6.1 Numerical Differentiation

Consider a small increment ∆x = h in x. According to Taylor's theorem, we have

f (x + h) = f (x) + h f ′ (x) + (h^2 /2) f ′′ (ξ)    (6.1)

where ξ is a real number between x and x + h. We can get

f ′ (x) = [f (x + h) − f (x)] / h − (h/2) f ′′ (ξ)    (6.2)

We can get the same formula using the linear Lagrange polynomial. Using the two points x0 and x1 = x0 + h, we get

f (x) = f (x0 ) (x − x1 )/(x0 − x1 ) + f (x1 ) (x − x0 )/(x1 − x0 ) + [f ′′ (ξ)/2!] (x − x0 )(x − x1 )
     = −f (x0 ) (x − x1 )/h + f (x1 ) (x − x0 )/h + [f ′′ (ξ)/2!] (x − x0 )(x − x1 )

Now we calculate the derivative and evaluate it at x = x0 :

f ′ (x) = −f (x0 )/h + f (x1 )/h + [f ′′ (ξ)/2!] (2x − x0 − x1 ) + [f ′′ (ξ)]′ (x − x0 )(x − x1 )/2!

f ′ (x0 ) = [f (x0 + h) − f (x0 )] / h − (h/2) f ′′ (ξ)

This formula is known as the forward-difference formula if h > 0 and the backward-difference formula if h < 0. Written with a positive step, the backward-difference formula is

f ′ (x) = [f (x) − f (x − h)] / h + (h/2) f ′′ (ξ)    (6.3)

57

(n+1)-point formula: To obtain general derivative approximation formulas, suppose that {x0 , x1 , ..., xn } are (n+1) distinct numbers in some interval I and that f ∈ C n+1 (I). We can use Lagrange polynomials:

f (x) = Σ_{k=0}^{n} f (xk ) Lk (x) + [f (n+1) (ξ(x))/(n + 1)!] ∏_{k=0}^{n} (x − xk )    (6.4)

Differentiating and evaluating at a node xj gives

f ′ (xj ) = Σ_{k=0}^{n} f (xk ) L′k (xj ) + [f (n+1) (ξj )/(n + 1)!] ∏_{k=0, k≠j}^{n} (xj − xk )    (6.5)

In general, using more evaluation points produces greater accuracy, although the number of functional

evaluations and growth of round-off error discourages this somewhat.

Three-point formulas:

f ′ (x0 ) = (1/2h) [−3f (x0 ) + 4f (x0 + h) − f (x0 + 2h)] + (h^2 /3) f (3) (ξ0 ),    (6.6)

f ′ (x0 ) = (1/2h) [f (x0 + h) − f (x0 − h)] − (h^2 /6) f (3) (ξ1 ),    (6.7)

where ξ0 lies between x0 and x0 + 2h and ξ1 lies between x0 − h and x0 + h. Although the errors in both formulas are O(h^2 ), the error in the second equation is approximately half the error in the first equation. This is because it uses data from both sides of x0 .

Five-point formulas:

f ′ (x0 ) = (1/12h) [f (x0 − 2h) − 8f (x0 − h) + 8f (x0 + h) − f (x0 + 2h)] + (h^4 /30) f (5) (ξ),    (6.8)

where ξ lies between x0 − 2h and x0 + 2h. The other five-point formula is useful for end-point approximations. It is given by

f ′ (x0 ) = (1/12h) [−25f (x0 ) + 48f (x0 + h) − 36f (x0 + 2h) + 16f (x0 + 3h) − 3f (x0 + 4h)] + (h^4 /5) f (5) (ξ),    (6.9)

where ξ lies between x0 and x0 + 4h. Left-endpoint approximations are found using h > 0 and right-endpoint approximations with h < 0.

Example:

Let f (x) = x exp (x). The values of f at different x are given

x f (x)

1.8 10.889465

1.9 12.703199

2.0 14.778112

2.1 17.148957

2.2 19.855030

58

Since f ′ (x) = (x + 1) exp (x), we have f ′ (2) = 22.167168. Let us approximate f ′ (2) using three-point formulas.

f ′ (x0 ) ≈ (1/2h) [−3f (x0 ) + 4f (x0 + h) − f (x0 + 2h)]

f ′ (2) ≈ [1/(2 × 0.1)] [−3f (2.0) + 4f (2.1) − f (2.2)] ≈ 22.032310,   h = 0.1

f ′ (2) ≈ [1/(−2 × 0.1)] [−3f (2.0) + 4f (1.9) − f (1.8)] ≈ 22.054525,   h = −0.1

f ′ (x0 ) ≈ (1/2h) [f (x0 + h) − f (x0 − h)]

f ′ (2) ≈ [1/(2 × 0.1)] [f (2.1) − f (1.9)] ≈ 22.228790,   h = 0.1

The errors are 22.167168 − 22.032310 = 0.134858, 22.167168 − 22.054525 = 0.112643, and 22.167168 − 22.228790 = −0.61622 × 10−1 , respectively.
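The three approximations above can be reproduced numerically; a minimal Python sketch (names are my own):

```python
import math

def f(x):
    return x * math.exp(x)

x0, h = 2.0, 0.1
exact = (x0 + 1) * math.exp(x0)                      # f'(x) = (x+1) e^x

# Endpoint three-point formula, eq. (6.6)
d_end = (-3 * f(x0) + 4 * f(x0 + h) - f(x0 + 2 * h)) / (2 * h)
# Centered three-point formula, eq. (6.7)
d_mid = (f(x0 + h) - f(x0 - h)) / (2 * h)

print(exact, d_end, d_mid)  # ~22.167168, ~22.032310, ~22.228790
```

As expected from the error terms, the centered formula is roughly twice as accurate as the endpoint formula for the same h.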

It is also possible to find approximations to higher derivatives of a function using only tabulated values of the function at various points.

Expand a function f in a third Taylor polynomial about x0 and evaluate at x0 ± h. Then

f (x) = f (x0 ) + (x − x0 ) f ′ (x0 ) + [(x − x0 )^2 /2] f ′′ (x0 ) + [(x − x0 )^3 /6] f (3) (x0 ) + [(x − x0 )^4 /24] f (4) (ξ),    (6.10)

f (x0 + h) = f (x0 ) + h f ′ (x0 ) + (h^2 /2) f ′′ (x0 ) + (h^3 /6) f (3) (x0 ) + (h^4 /24) f (4) (ξ+ )    (6.11)

f (x0 − h) = f (x0 ) − h f ′ (x0 ) + (h^2 /2) f ′′ (x0 ) − (h^3 /6) f (3) (x0 ) + (h^4 /24) f (4) (ξ− )    (6.12)

where x0 − h < ξ− < x0 < ξ+ < x0 + h. Adding the two last equations we get

[f (x0 + h) + f (x0 − h)] / 2 = f (x0 ) + (h^2 /2) f ′′ (x0 ) + (h^4 /48) [f (4) (ξ+ ) + f (4) (ξ− )],    (6.13)

and solving this last equation for f ′′ (x0 ) we find that

f ′′ (x0 ) = (1/h^2 ) [f (x0 − h) − 2f (x0 ) + f (x0 + h)] − (h^2 /24) [f (4) (ξ+ ) + f (4) (ξ− )]    (6.14)

Suppose that f (4) is continuous on [x0 − h, x0 + h]. Since (1/2)[f (4) (ξ+ ) + f (4) (ξ− )] is between f (4) (ξ+ ) and f (4) (ξ− ), the Intermediate Value Theorem implies that a number ξ exists between ξ+ and ξ− , and hence in (x0 − h, x0 + h), with

f (4) (ξ) = [f (4) (ξ+ ) + f (4) (ξ− )] / 2

This leads to

f ′′ (x0 ) = (1/h^2 ) [f (x0 − h) − 2f (x0 ) + f (x0 + h)] − (h^2 /12) f (4) (ξ)    (6.15)

where ξ ∈ (x0 − h, x0 + h).

59

It is important to pay attention to round-off error when approximating derivatives. Let us illustrate this by an example.

Two-point formula:

f ′ (x) ≈ [f (x + h) − f (x)] / h = (f1 − f0 ) / h    (6.16)

If we assume the round-off errors in f0 and f1 are e0 and e1 , respectively, then

f ′ (x) ≈ (f1 + e1 − f0 − e0 ) / h = (f1 − f0 ) / h + (e1 − e0 ) / h    (6.17)

If the errors are of magnitude e, we can at worst get

Round-off error = 2e/h    (6.18)

We know that the truncation error is given by −(h/2) f ′′ (ξ), bounded in magnitude by M h/2, where M = max |f ′′ (ξ)| for x ≤ ξ ≤ x + h. Thus the bound for the total error is

E2 (h) = 2e/h + M h/2    (6.20)

Note that when the step size h is increased, the truncation error increases while the round-off error decreases. The optimal value of h can be found from

E2′ (h) = M/2 − 2e/h^2 = 0 =⇒ h = 2 √(e/M).    (6.21)

Three-point formula:

f ′ (x0 ) = (1/2h) [f (x0 + h) − f (x0 − h)] − (h^2 /6) f (3) (ξ)    (6.22)

Suppose that in evaluating f (x0 ± h) we encounter round-off errors e(x0 ± h); that is, our computed values f˜(x0 ± h) are related to f (x0 ± h) by

f (x0 ± h) = f˜(x0 ± h) + e(x0 ± h).    (6.23)

The total error

f ′ (x0 ) − [f˜(x0 + h) − f˜(x0 − h)] / (2h) = [e(x0 + h) − e(x0 − h)] / (2h) − (h^2 /6) f (3) (ξ)    (6.24)

is due in part to the round-off error and in part to the truncation error. If we assume that the round-off error is bounded by ǫ > 0 and that the third derivative is bounded by M > 0, then the total error will be bounded by

| f ′ (x0 ) − [f˜(x0 + h) − f˜(x0 − h)] / (2h) |    (6.25)
≤ ǫ/h + (h^2 /6) M.    (6.26)

To reduce the truncation error, h^2 M/6, we must reduce h. But if h is reduced, the round-off error ǫ/h grows. In practice, then, it is seldom advantageous to let h be too small, since the round-off error will dominate the calculations. The minimum error is attained at the optimal value of h given by

h = (3ǫ/M)^{1/3}    (6.27)

Numerical differentiation is unstable: small values of h needed to reduce the truncation error also cause the round-off error to grow.
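This instability is easy to observe numerically; a small Python sketch (names are my own), using the centered difference for f (x) = sin x:

```python
import math

def central(f, x, h):
    """Centered three-point approximation to f'(x), eq. (6.7)."""
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)
errors = {h: abs(central(math.sin, 1.0, h) - exact)
          for h in (1e-2, 1e-5, 1e-12)}
for h, err in errors.items():
    print(f"h = {h:g}   |error| = {err:.3e}")
# The error drops from h = 1e-2 to h = 1e-5 (truncation ~ h^2 / 6), but
# it is much larger again at h = 1e-12, where round-off ~ eps/h dominates.
```

The non-monotonic behavior of the error as h shrinks is exactly the trade-off described by eq. (6.26).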

Example:

Show that the differentiation rule

f ′ (x0 ) ≈ a0 f0 + a1 f1 + a2 f2

is exact for all f ∈ C 3 with vanishing third derivative if, and only if, it is exact for

f (x) = 1, f (x) = x, f (x) = x^2 ,

and find the values of a0 , a1 , and a2 .

Solution: According to the formula

f (x) = Σ_{k=0}^{n} f (xk ) Lk (x) + [f (n+1) (ξ(x))/(n + 1)!] ∏_{k=0}^{n} (x − xk )    (6.28)

with

Lk (x) = ∏_{i=0, i≠k}^{n} (x − xi )/(xk − xi )    (6.29)

we have

f ′ (xj ) = Σ_{k=0}^{n} f (xk ) L′k (xj ) + [f (n+1) (ξj )/(n + 1)!] ∏_{k=0, k≠j}^{n} (xj − xk )    (6.30)

Therefore, for n = 2 we get

f ′ (x0 ) = f0 L′0 (x0 ) + f1 L′1 (x0 ) + f2 L′2 (x0 ) + [f (3) (ξ0 )/6] ∏_{k=1}^{2} (x0 − xk )    (6.31)

– For f (x) = 1, x, x^2 , the formula f ′ (x0 ) = a0 f0 + a1 f1 + a2 f2 with ak = L′k (x0 ) is exact, since f (3) (ξ0 ) = 0 for all ξ0 .

– Conversely, suppose the formula f ′ (x0 ) = a0 f0 + a1 f1 + a2 f2 is exact, i.e. the error term in (6.31) vanishes, f (3) (ξ0 ) = 0 for all ξ0 . Then f is of the form f (x) = α + βx + γx^2 for some α, β, γ, and by linearity the formula is in particular exact for f (x) = 1, x, x^2 .

Exercises section 4.1: 1, 3, 5, 9, 13, 15, 19

61

6.2 Richardson’s Extrapolation

Richardson's extrapolation is used to generate high-accuracy results while using low-order formulas. Extrapolation can be applied whenever it is known that an approximation technique has an error term with a predictable form like

M − N (h) = K1 h + K2 h^2 + . . . ,    (6.32)

for some collection of unknown constants Ki , where N (h) approximates an unknown value M . In general M − N (h) ≈ K1 h, which is O(h), unless there is a large variation in magnitude among the constants Ki . If M can be written in the form

M = N (h) + Σ_{j=1}^{m−1} Kj h^j + O(h^m ),    (6.33)

then higher-order approximations are generated recursively by

Nj (h) = Nj−1 (h/2) + [Nj−1 (h/2) − Nj−1 (h)] / (2^{j−1} − 1)    (6.34)

These approximations are generated by rows in the order indicated by the numbered entries in the following table:

O(h)                     O(h^2 )          O(h^3 )          O(h^4 )
1 : N1 (h/1) ≡ N (h/1)
2 : N1 (h/2) ≡ N (h/2)   3 : N2 (h)                                          (6.35)
4 : N1 (h/4) ≡ N (h/4)   5 : N2 (h/2)     6 : N3 (h)
7 : N1 (h/8) ≡ N (h/8)   8 : N2 (h/4)     9 : N3 (h/2)     10 : N4 (h)

When the error expansion contains only even powers of h, the recursion becomes

Nj (h) = Nj−1 (h/2) + [Nj−1 (h/2) − Nj−1 (h)] / (4^{j−1} − 1)    (6.36)

Example:

We want to determine an approximate the value to f ′ (1.0) with h = 0.4 where f (x) = ln x. We use

Richardson’s Extrapolation N3 (h). We have

1 h2 h4 (5)

f ′ (x0 ) = [f (x0 + h) − f (x0 − h)] − f (3) (x0 ) − f (ξ) − ... (6.37)

2h 6 120

In the case h = 0.4 and x0 = 1, we can calculate N3 (h) using

1

N1 (h) = [ln(x0 + h) − ln(x0 − h)]

2h

N1 (0.4) = 1.059122326

N1 (0.2) = 1.013662770

N1 (0.1) = 1.003353478

62

We then use

h Nj−1 (h/2) − Nj−1 (h)

Nj (h) = Nj−1 + (6.38)

2 4j−1 − 1

to get

N1 (0.2) − N1 (0.4)

N2 (0.4) = N1 (0.2) + = 0.9985095847

41 − 1

N1 (0.1) − N1 (0.2)

N2 (0.2) = N1 (0.1) + = 0.9999170473

41 − 1

N2 (0.2) − N2 (0.4)

N3 (0.4) = N2 (0.2) + = 1.000010878

42 − 1

The following Maple session approximates f ′ (x) for f (x) = exp(2x) at x = 1.1, 1.2, 1.3, 1.4 using three-point formulas; the subsequent evalf calls evaluate the corresponding truncation-error terms:

> a1:=1.1;f1:=exp(2*a1);
> a2:=1.2;f2:=exp(2*a2);
> a3:=1.3;f3:=exp(2*a3);
> a4:=1.4;f4:=exp(2*a4);
a1 := 1.1
f1 := 9.025013499
a2 := 1.2
f2 := 11.02317638
a3 := 1.3
f3 := 13.46373804
a4 := 1.4
f4 := 16.44464677
> h:=0.1;
> f1p:=1/2/h*(-3*f1+4*f2-f3);
> f2p:=1/2/h*(f3-f1);
> f3p:=1/2/h*(f4-f2);
> f4p:=-1/2/h*(-3*f4+4*f3-f2);
h := 0.1
f1p := 17.76963490
f2p := 22.19362270
f3p := 27.10735195
f4p := 32.51082265

> evalf(subs(x=1.3,h^2/3*diff(exp(2*x),x$3)));

> evalf(subs(x=1.3,h^2/6*diff(exp(2*x),x$3)));

> evalf(subs(x=1.4,h^2/6*diff(exp(2*x),x$3)));

> evalf(subs(x=1.4,h^2/3*diff(exp(2*x),x$3)));

0.3590330144

0.1795165072

0.2192619569

63

0.4385239139

6.3 Numerical Integration

The basic method involved in approximating ∫_a^b f (x) dx is called numerical quadrature. It uses a sum Σ_{i=0}^{n} ai f (xi ) to approximate the integral. The methods of quadrature in this section are based on the Lagrange interpolating polynomial.

6.3.1 Trapezoidal rule

To derive the Trapezoidal rule, we use the linear Lagrange polynomial which agrees with the function f (x) at x0 = a and x1 = b. With h = b − a, the Trapezoidal Rule is:

∫_a^b f (x) dx = (h/2) [f (x0 ) + f (x1 )] − (h^3 /12) f ′′ (ξ),    (6.39)

We want to evaluate the integral

I = ∫_1^2 (x^3 + 1) dx    (6.40)

using the trapezoidal rule. With h = 1,

∫_1^2 f (x) dx = (1/2) [f (1) + f (2)] − (1/12) f ′′ (ξ)
             = (1/2) [2 + 9] − (1/12) (6ξ)
             = 11/2 − ξ/2

Therefore the trapezoidal rule gives 5.5 with a maximum error Emax equal to

|Emax | = max_{1≤ξ≤2} ξ/2 = 1

The exact value of the integral is

I = ∫_1^2 (x^3 + 1) dx = [x^4 /4 + x]_1^2 = (2^4 /4 + 2) − (1/4 + 1) = 19/4 = 4.75

Thus, the absolute error is |5.5 − 4.75| = 0.75. Note that the error is less than the maximum error calculated from the formula.
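A one-line implementation reproduces the numbers above (a minimal sketch; names are my own):

```python
def trapezoid(f, a, b):
    """Single-interval trapezoidal rule, eq. (6.39), without the error term."""
    return (b - a) / 2 * (f(a) + f(b))

f = lambda x: x ** 3 + 1
approx = trapezoid(f, 1.0, 2.0)
exact = 19 / 4
print(approx, exact, approx - exact)  # 5.5 4.75 0.75
```

The observed error 0.75 indeed sits below the a priori bound |Emax| = 1.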

64

6.3.2 Simpson’s rule

Simpson’s rule uses the second Lagrange polynomial with nodes at x0 = a, x1 = a + h, x2 = b, where

h = (b − a)/2. There are few tricks to get the formula (please see sec 4.3 and exercise 24 of sec 4.3)

Zx2

h h5

f (x)dx = [f (x0 ) + 4f (x1 ) + f (x2 )] − f (4) (ξ). (6.41)

3 90

x0

As an example, consider

I = ∫_{−1}^{1} e^x dx    (6.42)

With h = 1, Simpson's rule gives

∫_{−1}^{1} f (x) dx = (1/3) [f (−1) + 4f (0) + f (1)] − (1/90) f (4) (ξ)    (6.43)

so we get

∫_{−1}^{1} e^x dx = (1/3) [e^{−1} + 4 + e] − (1/90) e^ξ = 2.36205 − (1/90) e^ξ

The maximum error is

|Emax | = max (1/90) e^ξ = e/90 = 0.03020

The exact value of the integral is

I = ∫_{−1}^{1} e^x dx = e − 1/e = 2.35040    (6.45)

The absolute error is |2.36205 − 2.35040| = 0.01165, which is less than the maximum error 0.03020.
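The same check for Simpson's rule (a minimal sketch; names are my own):

```python
import math

def simpson(f, a, b):
    """Single-interval Simpson's rule, eq. (6.41), without the error term."""
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f(a + h) + f(b))

approx = simpson(math.exp, -1.0, 1.0)
exact = math.e - 1 / math.e
print(approx, exact)  # ~2.36205 vs ~2.35040; |error| = 0.01165 < 0.03020
```

With the same three evaluations as a two-panel trapezoid, Simpson's rule already lands within its error bound e/90.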

The degree of accuracy, or precision, of a quadrature formula is the largest positive integer n such that the formula is exact for x^k , for each k = 0, 1, ..., n.

What degree of precision does the following formula have?

∫_{−1}^{1} f (x) dx = f (−1/√3) + f (1/√3)

65

The integral is

∫_{−1}^{1} x^k dx = [1 + (−1)^k ] / (1 + k)

and

(−1/√3)^k + (1/√3)^k = (1/√3)^k [1 + (−1)^k ]

The formula has degree of precision n if it is exact for x^k , k = 0, . . . , n:

[1 + (−1)^k ] / (1 + k) = (1/√3)^k [1 + (−1)^k ]

This is true whenever k is an odd number, since both sides vanish. For k an even number it requires

1/(1 + k) = (1/√3)^k

which is true for k = 0, 2 but fails for k = 4 (1/5 ≠ 1/9). Therefore, the formula is exact for k = 0, 1, 2, 3, and the degree of precision of the formula is 3. We can conclude from this example that the formula is exact for all polynomials of degree at most 3.

There are two types of Newton-Cotes formulas, open and closed. The (n+1)-point closed Newton-Cotes formula uses nodes xi = x0 + ih, for i = 0, 1, ..., n, where x0 = a, xn = b, and h = (b − a)/n. It is called closed because the endpoints of the closed interval [a, b] are included as nodes.
The open Newton-Cotes formulas use nodes xi = x0 + ih, for each i = 0, 1, ..., n, where h = (b − a)/(n + 2) and x0 = a + h. This implies that xn = b − h, so we label the endpoints by setting x−1 = a and xn+1 = b. In both cases

∫_a^b f (x) dx ≈ Σ_{i=0}^{n} ai f (xi ),    (6.46)

where

ai = ∫_a^b Li (x) dx    (6.47)

The error terms for the closed and open Newton-Cotes formulas are given in Theorem 4.2 and Theorem 4.3 of the textbook, respectively.

The Newton-Cotes formulas are generally unsuitable for use over large integration intervals. One way to

overcome this problem is to split the large interval into subintervals and sum Newton-Cotes formulas over

all these subintervals.

66

Composite Simpson's rule:
Let f ∈ C 4 [a, b], n be an even number, h = (b − a)/n, and xj = a + jh, for each j = 0, 1, . . . , n. There exists a µ ∈ (a, b) for which the composite Simpson's rule for n subintervals can be written as

∫_a^b f (x) dx = (h/3) [f (a) + 2 Σ_{j=1}^{n/2−1} f (x2j ) + 4 Σ_{j=1}^{n/2} f (x2j−1 ) + f (b)] − [(b − a)/180] h^4 f (4) (µ)    (6.48)

Composite trapezoidal rule:
Let f ∈ C 2 [a, b], h = (b − a)/n, and xj = a + jh, for each j = 0, 1, . . . , n. There exists a µ ∈ (a, b) for which the composite trapezoidal rule for n subintervals can be written as

∫_a^b f (x) dx = (h/2) [f (a) + 2 Σ_{j=1}^{n−1} f (xj ) + f (b)] − [(b − a)/12] h^2 f ′′ (µ)    (6.49)

Composite Midpoint rule:
Let f ∈ C 2 [a, b], n be even, h = (b − a)/(n + 2), and xj = a + (j + 1)h, for each j = −1, 0, . . . , n + 1. There exists a µ ∈ (a, b) for which the composite Midpoint rule for n + 2 subintervals can be written as

∫_a^b f (x) dx = 2h Σ_{j=0}^{n/2} f (x2j ) + [(b − a)/6] h^2 f ′′ (µ)    (6.50)

An important property shared by all these composite integration techniques is stability with respect to round-off errors. Let us demonstrate this property for the composite Simpson's rule with n subintervals applied to a function f on [a, b]. Assume that

f (xi ) = f˜(xi ) + ei    (6.51)

where ei is the round-off error and f˜(xi ) is the computed approximation to f (xi ). The accumulated round-off error in the composite Simpson's rule (6.48) is

e(h) = (h/3) [e0 + 2 Σ_{j=1}^{n/2−1} e2j + 4 Σ_{j=1}^{n/2} e2j−1 + en ]

If the round-off errors are uniformly bounded by ǫ, then

|e(h)| ≤ (h/3) ǫ [1 + (n − 2) + 2n + 1] = nhǫ = (b − a)ǫ

It is clear from the last equation that the bound is independent of h and n. This means that, even though we may need to divide an interval into more parts to ensure accuracy, the increased computation that is required does not increase the round-off error. This means that the procedure is stable as h approaches zero. Recall that this was not true for the numerical differentiation procedures.
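A composite Simpson implementation of eq. (6.48) (a minimal sketch; the names and the test integrand are my own):

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule, eq. (6.48), for an even number n of
    subintervals (the error term is omitted)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 2 * sum(f(a + 2 * j * h) for j in range(1, n // 2))
    s += 4 * sum(f(a + (2 * j - 1) * h) for j in range(1, n // 2 + 1))
    return h / 3 * s

# For int_0^pi sin x dx = 2, doubling n shrinks the O(h^4) error by ~16x,
# while round-off stays bounded by (b - a)*eps as shown above.
errs = [abs(composite_simpson(math.sin, 0.0, math.pi, n) - 2.0)
        for n in (4, 8, 16)]
print(errs)
```

Refining the mesh therefore buys accuracy without the round-off penalty that numerical differentiation suffers.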

67

Exercises section 4.4:

6.4 Adaptive Quadrature

The composite quadrature rules use equally spaced points. This is not efficient if the function to be integrated has large variations in some regions and small variations in others. So, it is useful to introduce a method that adjusts the step size: the step size has to be smaller over regions where large variations occur. This technique is called adaptive quadrature. The method below is based on Simpson's rule, but we could also use any other Newton-Cotes formula.

We know that Simpson's rule uses two subintervals over [ak , bk ]:

S(ak , bk ) = (h/3) [f (ak ) + 4f (ck ) + f (bk )],    (6.52)

where ck is the center of [ak , bk ] and h = (bk − ak )/2. Furthermore, if f ∈ C (4) [ak , bk ], then there exists ξk ∈ [ak , bk ] so that

∫_{ak}^{bk} f (x) dx = S(ak , bk ) − (h^5 /90) f (4) (ξk )    (6.53)

A composite Simpson rule using four subintervals of [ak , bk ] can be performed by bisecting this interval into two equal subintervals [ak , ck ] = [ak1 , bk1 ] and [ck , bk ] = [ak2 , bk2 ]. We then write

S(ak1 , bk1 ) + S(ak2 , bk2 ) = (h/6) [f (ak1 ) + 4f (ck1 ) + f (bk1 )] + (h/6) [f (ak2 ) + 4f (ck2 ) + f (bk2 )]    (6.54)

where only two additional evaluations of f (x) are needed, at ck1 and ck2 , the midpoints of the intervals [ak1 , bk1 ] and [ak2 , bk2 ], respectively.
Furthermore, if f ∈ C (4) [ak , bk ], then there exists ξk1 ∈ [ak , bk ] so that

∫_{ak}^{bk} f (x) dx = S(ak1 , bk1 ) + S(ak2 , bk2 ) − [h^5 /(16 × 90)] f (4) (ξk1 )    (6.56)

Assuming f (4) (ξk ) ≈ f (4) (ξk1 ) and equating the two expressions for the integral,

S(ak , bk ) − (h^5 /90) f (4) (ξk ) ≈ S(ak1 , bk1 ) + S(ak2 , bk2 ) − [h^5 /(16 × 90)] f (4) (ξk1 )

which can be written as

−(h^5 /90) f (4) (ξk1 ) ≈ (16/15) [S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk )]

so the truncation error of the refined approximation is

−[h^5 /(16 × 90)] f (4) (ξk1 ) ≈ (1/15) [S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk )]

Thus, we can find that

| ∫_{ak}^{bk} f (x) dx − S(ak1 , bk1 ) − S(ak2 , bk2 ) | ≈ (1/15) |S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk )|    (6.57)

68

Because of the assumption f (4) (ξk ) ≈ f (4) (ξk1 ), the fraction 1/15 is replaced with 1/10 when implementing the method in a program.

Assume that we want the tolerance to be ǫk > 0 for the interval [ak , bk ]. If

(1/10) |S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk )| < ǫk    (6.58)

we infer that

| ∫_{ak}^{bk} f (x) dx − S(ak1 , bk1 ) − S(ak2 , bk2 ) | < ǫk    (6.59)

so that

∫_{ak}^{bk} f (x) dx ≈ S(ak1 , bk1 ) + S(ak2 , bk2 )    (6.60)

and the error bound for this approximation over the interval [ak , bk ] is ǫk .

The adaptive quadrature is implemented by applying Simpson's rule in this way:

1. Start with the whole interval [a0 , b0 ] = [a, b] and a tolerance ǫ0 .
2. Compute

S(a0 , b0 ) = (h/3) [f (a0 ) + 4f (c0 ) + f (b0 )]

3. The interval is refined into subintervals labeled [a01 , b01 ] and [a02 , b02 ], and S(a01 , b01 ) and S(a02 , b02 ) are computed.
4. If the accuracy test

(1/10) |S(a01 , b01 ) + S(a02 , b02 ) − S(a0 , b0 )| < ǫ0    (6.61)

is passed, the quadrature formula

S(a01 , b01 ) + S(a02 , b02 ) = (h/6) [f (a01 ) + 4f (c01 ) + f (b01 )] + (h/6) [f (a02 ) + 4f (c02 ) + f (b02 )]

is applied to [a0 , b0 ] and we are done.
5. Otherwise, the two subintervals are relabeled [a1 , b1 ] and [a2 , b2 ], over which the tolerances are halved, ǫ1 = ǫ0 /2 and ǫ2 = ǫ0 /2, and we repeat steps 3-4 for the two intervals with the new tolerances.
6. We add all the quadrature formulas over the intervals where the accuracy test has passed.

69

Example: We apply the adaptive quadrature algorithm to approximate

∫_0^1 (3/2) √x dx = 1

using the initial tolerance ǫ0 = 0.001 (so the tests below compare against 10ǫk ). Here

S(µ, v) = [(v − µ)/6] [f (µ) + 4f ((µ + v)/2) + f (v)]

S(0, 1) = 0.9571067813
S(0, 0.5) = 0.3383883477
S(0.5, 1) = 0.6464010497

|S(0, 0.5) + S(0.5, 1) − S(0, 1)| − 10ǫ0 = 0.0176826161 > 0

We have to refine the interval [0, 1] into [0, 0.5] and [0.5, 1].

S(0, 0.25) = 0.1196383477
S(0.25, 0.5) = 0.2285372827
|S(0, 0.25) + S(0.25, 0.5) − S(0, 0.5)| − 10ǫ1 = 0.004787282700 > 0

We have to refine the interval [0, 0.5] into [0, 0.25] and [0.25, 0.5].

S(0, 0.125) = 0.04229854347
S(0.125, 0.25) = 0.08080013118
|S(0, 0.125) + S(0.125, 0.25) − S(0, 0.25)| − 10ǫ2 = 0.000960326900 > 0

We have to refine the interval [0, 0.25] into [0, 0.125] and [0.125, 0.25].

S(0, 0.0625) = 0.01495479346
S(0.0625, 0.125) = 0.02856716035
|S(0, 0.0625) + S(0.0625, 0.125) − S(0, 0.125)| − 10ǫ3 = −0.000026589660 < 0

The test has passed. So, we can keep the interval [0, 0.125] with
S(0, 0.0625) + S(0.0625, 0.125) = 0.04352195381.

We go now back and keep ǫ3

70

Accuracy test for [0.125,0.25]

S(0.125, 0.25) = 0.08080013118

S(0.125, 0.1875) = 0.03699538942

S(0.1875, 0.25) = 0.04381002180

|S(0.125, 0.1875) + S(0.1875, 0.25) − S(0.125, 0.25)| − 10ǫ3 = 0.001244719960 < 0

The test has passed. So, we can keep the interval [0.125, 0.25] with

S(0.125, 0.1875) + S(0.1875, 0.25) = 0.08080541122.

We go now back with ǫ2

Accuracy test for [0.25 0.5]

S(0.25, 0.5) = 0.2285372827

S(0.25, 0.375) = 0.1046387629

S(0.375, 0.5) = 0.1239134540

|S(0.25, 0.375) + S(0.375, 0.5) − S(0.25, 0.5)| − 10ǫ2 = 0.002485065800 < 0

The test has passed. So, we can keep the interval [0.25, 0.5] with

S(0.25, 0.375) + S(0.375, 0.5) = 0.2285522169.

We now go back with ǫ1.

Accuracy test for [0.5 1]

S(0.5, 1) = 0.6464010497

S(0.5, 0.75) = 0.2959631153

S(0.75, 1) = 0.3504801743

|S(0.5, 0.75) + S(0.75, 1) − S(0.5, 1)| − 10ǫ1 = −0.004957760100 < 0

The test has passed. So, we can keep the interval [0.5, 1] with

S(0.5, 0.75) + S(0.75, 1) = 0.6464432896.

Now we can add:

∫_0^1 (3/2)√x dx ≈ S(0, 0.0625) + S(0.0625, 0.125) + S(0.125, 0.1875) + S(0.1875, 0.25)
                 + S(0.25, 0.375) + S(0.375, 0.5) + S(0.5, 0.75) + S(0.75, 1)
                 = 0.9993228715

We can summarize our results:

 ak      bk      AQ
0.000   0.125   0.04352195381
0.125   0.250   0.08080541122
0.250   0.500   0.2285522169
0.500   1.000   0.6464432896
Total:          0.9993228715

[Figure: the subintervals selected by the adaptive algorithm on [0, 1], with breakpoints at 0, 0.125, 0.25, 0.5, and 1.]
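The procedure above can be sketched in code. The following is a minimal recursive version (the function names are ours, not from the textbook), assuming the tolerance ǫ = 10^{−3} that is consistent with the numbers in the worked example:

```python
# Recursive adaptive Simpson quadrature (a sketch; names are ours).
# The interval is halved until |S(a,c) + S(c,b) - S(a,b)| < 10*eps,
# and the tolerance is halved with each subdivision, as in the text.

def simpson(f, a, b):
    c = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(c) + f(b))

def adaptive_simpson(f, a, b, eps):
    c = (a + b) / 2
    whole = simpson(f, a, b)
    left, right = simpson(f, a, c), simpson(f, c, b)
    if abs(left + right - whole) < 10 * eps:   # accuracy test passed
        return left + right
    return (adaptive_simpson(f, a, c, eps / 2)
            + adaptive_simpson(f, c, b, eps / 2))

approx = adaptive_simpson(lambda x: 1.5 * x ** 0.5, 0.0, 1.0, 1e-3)
```

With ǫ = 10^{−3} this reproduces the subdivisions and the total 0.9993228715 of the example above.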

Gaussian integration is based on the idea that the accuracy of an n-point quadrature formula can be improved by choosing the nodes wisely, rather than placing them at equally spaced points. Gaussian integration assumes an approximation of the form

∫_{−1}^{1} f(x) dx ≈ Σ_{i=1}^{n} ai f(xi)    (6.62)

This equation contains 2n unknowns to be determined (the nodes xi and the coefficients ai). Gaussian quadrature chooses them so that the formula has the highest possible degree of precision.

Let us find the Gaussian quadrature formula for n = 2. In this case the functions 1, x, x^2, and x^3 should give exact results.

f(x) = 1   =>  a1 + a2 = ∫_{−1}^{1} dx = 2
f(x) = x   =>  a1 x1 + a2 x2 = ∫_{−1}^{1} x dx = 0
f(x) = x^2 =>  a1 x1^2 + a2 x2^2 = ∫_{−1}^{1} x^2 dx = 2/3
f(x) = x^3 =>  a1 x1^3 + a2 x2^3 = ∫_{−1}^{1} x^3 dx = 0

Solving this system gives

a1 = a2 = 1,  x1 = −x2 = 1/√3    (6.63)

Thus, we have the Gaussian integration rule for two nodes:

∫_{−1}^{1} f(x) dx ≈ f(−1/√3) + f(1/√3)    (6.64)

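As a quick check, the two-node rule (6.64) integrates 1, x, x^2 and x^3 exactly. A small sketch (function names are ours):

```python
# Two-node Gauss-Legendre rule on [-1, 1] (sketch; names are ours).
from math import sqrt

def gauss2(f):
    x = 1 / sqrt(3)
    return f(-x) + f(x)

# exact values of the integrals of x^k over [-1, 1] for k = 0, 1, 2, 3
exact = [2.0, 0.0, 2.0 / 3.0, 0.0]
results = [gauss2(lambda t, k=k: t ** k) for k in range(4)]
```

All four results match the exact integrals, confirming degree of precision 3.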

This method can be generalized to more than two nodes, but there is an alternative way to obtain the nodes and coefficients more easily. This alternative uses the Legendre polynomials, which are defined by two properties:

1. For each n, Pn(x) is a monic polynomial of degree n, that is, a polynomial x^n + a_{n−1} x^{n−1} + ... + a1 x + a0 in which the coefficient of the highest-order term is 1.
2. Whenever P(x) is a polynomial of degree less than n, we have

∫_{−1}^{1} P(x) Pn(x) dx = 0    (6.65)

The first few Legendre polynomials are

P0(x) = 1,  P1(x) = x,  P2(x) = x^2 − 1/3,
P3(x) = x^3 − (3/5)x,  P4(x) = x^4 − (6/7)x^2 + 3/35    (6.66)

The nodes xi, i = 1, ..., n, are the roots of Pn(x), and the coefficients ai are defined by

ai = ∫_{−1}^{1} Π_{j=1, j≠i}^{n} (x − xj)/(xi − xj) dx    (6.67)

If P(x) is the Lagrange polynomial interpolating f at these nodes, then

∫_{−1}^{1} f(x) dx ≈ ∫_{−1}^{1} P(x) dx = Σ_{i=1}^{n} ai P(xi)    (6.68)

Note that the Gaussian formula imposes a restriction on the limits of integration: they must be −1 and 1. It is possible to overcome this restriction by changing the variable:

∫_a^b f(x) dx = ∫_{−1}^{1} g(z) dz    (6.69)

We define

z = Ax + B    (6.70)
x = (z − B)/A    (6.71)

Requiring z = −1 at x = a and z = 1 at x = b, we get

−1 = Aa + B
 1 = Ab + B
 A = 2/(b − a)
 B = (a + b)/(a − b)

Therefore

∫_a^b f(x) dx = (1/A) ∫_{−1}^{1} g(z) dz    (6.72)

where

g(z) = f((z − B)/A)    (6.73)

Example:
Convert the integral

I = ∫_{−2}^{2} e^{−x/2} dx

to an integral over [−1, 1].

Solution:

I = ∫_{−2}^{2} e^{−x/2} dx = (1/A) ∫_{−1}^{1} f((z − B)/A) dz

We have

A = 2/(b − a) = 1/2
B = (a + b)/(a − b) = 0

Thus

I = ∫_{−2}^{2} e^{−x/2} dx = 2 ∫_{−1}^{1} f(2z) dz = 2 ∫_{−1}^{1} e^{−z} dz

Improper integrals arise in two cases: when the integrand is unbounded on the interval of integration, and when the interval of integration itself is infinite.

Let us consider the first case, when the integrand f(x) is unbounded; the second case can be reduced to the first. It is well known that the improper integral

∫_a^b dx/(x − a)^p    (6.74)

converges if and only if 0 < p < 1, in which case

∫_a^b dx/(x − a)^p = (b − a)^{1−p}/(1 − p)    (6.75)

If the function f(x) can be written as

f(x) = g(x)/(x − a)^p    (6.76)

where g is continuous on [a, b] and 0 < p < 1, then the improper integral

∫_a^b f(x) dx    (6.77)

converges.

Let us now approximate the integral given by eq. (6.77) using the composite Simpson's rule. Assume that g ∈ C^5[a, b]. In this case, the fourth Taylor polynomial P4(x) for g about a can be written as

P4(x) = g(a) + (x − a) g′(a) + (x − a)^2 g″(a)/2 + (x − a)^3 g‴(a)/3! + (x − a)^4 g^{(4)}(a)/4!    (6.78)

Therefore

∫_a^b f(x) dx = ∫_a^b (g(x) − P4(x))/(x − a)^p dx + ∫_a^b P4(x)/(x − a)^p dx    (6.79)

The last term can be integrated exactly:

∫_a^b P4(x)/(x − a)^p dx = Σ_{k=0}^{4} (g^{(k)}(a)/k!) ∫_a^b (x − a)^{k−p} dx
                        = Σ_{k=0}^{4} g^{(k)}(a) (b − a)^{k−p+1} / ((k − p + 1) k!)    (6.80)

For the first term we define

G(x) = (g(x) − P4(x))/(x − a)^p,  if a < x ≤ b,
G(x) = 0,                         if x = a.    (6.81)

Since 0 < p < 1 and P4^{(k)}(a) agrees with g^{(k)}(a) for each k = 0, ..., 4, we have G ∈ C^4[a, b]. This implies that the composite Simpson's rule can be applied to approximate the integral of G on [a, b].

Example:
We want to approximate

∫_0^1 sin(x)/x^{1/4} dx    (6.82)

using the composite Simpson's rule with n = 4.

1. We find the fourth-order Taylor polynomial for sin(x) at x = 0:

sin(x) ≈ P4(x) = x − x^3/6    (6.83)

2. We integrate the Taylor part exactly:

∫_0^1 P4(x)/x^{1/4} dx = ∫_0^1 (x^{3/4} − x^{11/4}/6) dx = 166/315 = 0.5269841270

3. We define

G(x) = (sin(x) − x + x^3/6)/x^{1/4},  if 0 < x ≤ 1,
G(x) = 0,                             if x = 0.

4. We apply the composite Simpson's rule with h = 0.25:

∫_0^1 G(x) dx ≈ (1/12)[G(0) + 4G(0.25) + 2G(0.5) + 4G(0.75) + G(1)] = 0.001432198742

5. Now we get

∫_0^1 sin(x)/x^{1/4} dx ≈ 0.001432198742 + 0.5269841270 = 0.5284163257
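The two-part computation in this example can be sketched as follows (function names are ours):

```python
# The integral of sin(x)/x^{1/4} over [0, 1], split as in the example:
# the Taylor part is 166/315 exactly; G(x) by composite Simpson, n = 4.
from math import sin

def G(x):
    if x == 0.0:
        return 0.0
    return (sin(x) - x + x ** 3 / 6) / x ** 0.25

h = 0.25
simpson_G = h / 3 * (G(0.0) + 4 * G(0.25) + 2 * G(0.5)
                     + 4 * G(0.75) + G(1.0))
approx = 166.0 / 315.0 + simpson_G
```

The result reproduces the value 0.5284163257 obtained above.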

To approximate an improper integral with a singularity at the right endpoint, we apply the technique used above after the transformation

∫_a^b f(x) dx = ∫_{−b}^{−a} f(−z) dz    (6.84)

which has its singularity at the left endpoint. An improper integral with a singularity at an interior point c, a < c < b, can be treated as the sum

∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx    (6.85)

The other type of improper integral involves infinite limits of integration. For a > 0 it can be treated via the substitution t = 1/x:

∫_a^∞ f(x) dx = ∫_0^{1/a} t^{−2} f(1/t) dt    (6.86)

For the case a = 0 we can split the integral into two parts: one from 0 to some c > 0, and the other from c to ∞.

Chapter 7

7.1 Introduction

An ordinary differential equation (ODE) is an equation containing one or more derivatives of an unknown function y.

Differential equations are classified according to their order. The order of a differential equation is the highest derivative that appears in the equation. When the equation contains only a first derivative, it is called a first-order differential equation. A first-order differential equation can be expressed as

dy/dt = f(t, y)    (7.1)

The degree of a differential equation is the power of the highest-order derivative. For example, ty″ + 3t^2 + 2 = 0 is of first degree.

A differential equation is a linear equation when it does not contain terms involving products of the dependent variable y or its derivatives. For example, y″ + 2y′ + t^2 = 0 is linear, but y″ + 2y′y + t^2 = 0 is not.

If the order of the equation is n, we need n conditions in order to obtain a unique solution. When all the conditions are specified at a particular value of the independent variable t, the problem is called an initial-value problem. It is also possible to specify the conditions at different values of t; such problems are called boundary-value problems.

All numerical techniques for solving differential equations involve a series of estimates of y(t) starting

from the given conditions. There are two basic approaches, one-step and multistep methods.

In one-step methods, we use information from only one preceding point: to estimate yi we only need yi−1.

Multistep methods use information at two or more previous steps to estimate a value.


7.2 Elementary Theory of Initial-Value Problems

Lipschitz condition: A function f(t, y) is said to satisfy a Lipschitz condition in the variable y on a set D ⊂ R^2 if a constant L > 0 exists with

|f(t, y1) − f(t, y2)| ≤ L |y1 − y2|

whenever (t, y1) and (t, y2) are in D.

Convex set: A set D ⊂ R^2 is said to be convex if whenever (t1, y1) and (t2, y2) belong to D and λ is in [0, 1], the point ((1 − λ)t1 + λt2, (1 − λ)y1 + λy2) also belongs to D. This means that the entire straight line segment between the two points belongs to the set D.

[Figure: a convex set and a non-convex set.]

Theorem:
Suppose that f(t, y) is defined on a convex set D ⊂ R^2. If a constant L > 0 exists with

|∂f/∂y (t, y)| ≤ L, for all (t, y) ∈ D,

then f satisfies a Lipschitz condition on D in the variable y with Lipschitz constant L. (Proof omitted.)

Theorem:
Suppose that D = {(t, y) | a ≤ t ≤ b, y ∈ R} and that f(t, y) is continuous on D. If f satisfies a Lipschitz condition on D in the variable y, then the initial-value problem

y′(t) = f(t, y), a ≤ t ≤ b, y(a) = α,

has a unique solution y(t) for a ≤ t ≤ b.

Example:

We want to show that the following initial-value problem has a unique solution

y ′ = y cos(t), 0 ≤ t ≤ 1, y(0) = 1


Let us check that

f(t, y) = y cos(t), 0 ≤ t ≤ 1,

satisfies a Lipschitz condition:

|y1 cos(t) − y2 cos(t)| = |cos(t)| |y1 − y2| ≤ |y1 − y2|

Thus L = 1 and f(t, y) satisfies the Lipschitz condition; therefore, there is a unique solution.

Example:
Show that

y′ = −(y^3 + y)/((3y^2 + 1)t)

has the solution defined implicitly by y^3 t + yt = 2 for 1 ≤ t ≤ 2, with y(1) = 1.

– Differentiating y^3 t + yt = 2 with respect to t, we find

3y^2 y′ t + y^3 + y′ t + y = 0
y′ (3y^2 t + t) + y^3 + y = 0

which gives

y′ = −(y^3 + y)/((3y^2 + 1)t)

We also have from y^3 t + yt = 2 that at t = 1 we get y^3 + y = 2, whose only real root is y = 1. To get y(2) we solve 2y^3 + 2y = 2, i.e.

f(y) = y^3 + y − 1 = 0,

using Newton's method:

p_{n+1} = p_n − (p_n^3 + p_n − 1)/(3 p_n^2 + 1)

It is clear that f (0) = −1 and f (1) = 1. We can start Newton iteration from p0 = 0.5. We get

i pi

0 0.5

1 0.7142857143

2 0.6831797236

3 0.6823284233

4 0.6823278037

5 0.6823278038

6 0.6823278038

So, the approximate value of y(2) = 0.6823278038.

Exercises section 5.1: 1, 3, 5, 7


7.3 Euler’s Method

The objective of Euler’s method is to obtain an approximate solution to the well-posed initial-value

problem

dy

= f (x, y), a ≤ t ≤ b, y(a) = α (7.3)

dt

We can obtain approximate solutions at fixed points, called mesh points. Let us assume that the mesh points are equally distributed throughout the interval [a, b]: we choose ti = a + ih for i = 0, 1, ..., N. The common distance between the points, h = (b − a)/N, is called the step size. To derive Euler's method we use Taylor's Theorem:

y(t_{i+1}) = y(ti) + (t_{i+1} − ti) y′(ti) + ((t_{i+1} − ti)^2 / 2) y″(ξi)
           = y(ti) + h y′(ti) + (h^2/2) y″(ξi)
           = y(ti) + h f(ti, y(ti)) + (h^2/2) y″(ξi)    (7.5)

Euler’s method constructs wi ≈ y(ti ), for each i = 1, 2, ..., N , by dropping the remainder term. Thus,

it is given by

w0 = α,

wi+1 = wi + hf (ti , wi ), for each i = 0, 1, ..., N. (7.6)

This last equation is called difference equation associated with Euler’s method.

Example: Consider y′ = y − t^2 + 1, 0 ≤ t ≤ 2, y(0) = 0.5, with N = 10 (see Example 1, page 259 of the textbook).
We have h = (b − a)/N = 0.2 and t0 = 0. Euler's method is:

w0 = 0.5,
w_{i+1} = wi + 0.2 × (wi − ti^2 + 1), for each i = 0, 1, ..., N − 1

ti    wi
0.0   0.50000
0.2   0.80000
0.4   1.15200
0.6   1.55040
...   ...
2.0   4.86580
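Euler's method for this example can be sketched as follows (function names are ours):

```python
# Euler's method for y' = y - t^2 + 1, y(0) = 0.5 on [0, 2], N = 10
# (sketch; names are ours), reproducing the table above.
def euler(f, a, b, alpha, N):
    h = (b - a) / N
    t, w = a, alpha
    ws = [w]                        # ws[i] approximates y(t_i)
    for i in range(N):
        w = w + h * f(t, w)
        t = a + (i + 1) * h
        ws.append(w)
    return ws

ws = euler(lambda t, y: y - t * t + 1, 0.0, 2.0, 0.5, 10)
```

The first few values and the final value match the table.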


Theorem
Suppose that f is continuous and satisfies a Lipschitz condition with constant L on D = {(t, y) | a ≤ t ≤ b, y ∈ R}, and that a constant M exists with |y″(t)| ≤ M for all t ∈ [a, b], where y(t) is the unique solution of the initial-value problem. Let wi (i = 0, ..., N) be the approximations generated by Euler's method for some positive integer N. Then, for each i,

|y(ti) − wi| ≤ (hM/2L) (e^{L(ti − a)} − 1)    (7.10)

Theorem
Assume that the hypotheses of the previous theorem hold, and let ui (i = 0, ..., N) be the approximations obtained from

u0 = α + δ0,
u_{i+1} = ui + h f(ti, ui) + δ_{i+1},    (7.11)

where |δi| ≤ δ for each i. Then

|y(ti) − ui| ≤ (1/L) (hM/2 + δ/h) (e^{L(ti − a)} − 1) + |δ0| e^{L(ti − a)}    (7.12)

The error bound is no longer linear in h. In fact, it goes to infinity as h goes to zero:

lim_{h→0+} (hM/2 + δ/h) = ∞    (7.13)

It can be shown that the minimum value of the error bound occurs when

h = √(2δ/M)    (7.14)

Example (Exercise 9):
Given the initial-value problem

y′ = (2/t) y + t^2 e^t, 1 ≤ t ≤ 2, y(1) = 0,

with the exact solution y(t) = t^2 (e^t − e), Euler's method with h = 0.1 gives

t     w(t)           y(t)           y(t) − w(t)
1.0   0.             0.             0.
1.1   0.2718281828   0.345919877    0.0740916942
1.2   0.6847555777   0.866642537    0.1818869593
1.3   1.276978344    1.607215080    0.330236736
1.4   2.093547688    2.620359552    0.526811864
1.5   3.187445123    3.967666297    0.780221174
1.6   4.620817847    5.720961530    1.100143683
1.7   6.466396379    7.963873477    1.497477098
1.8   8.809119690    10.79362466    1.984504970
1.9   11.74799654    14.32308154    2.57508500
2.0   15.39823565    18.68309709    3.28486144

A linear interpolation to approximate y(1.04) can be found as follows. Using x0 = 1, x1 = 1.1, and the values w(x0) and w(x1), we get the Lagrange polynomial

P (x) = 2.718281828 x − 2.718281828

Thus, P (1.04) = 0.108731273. The exact value is y(1.04) = 0.119987497 and the error is 0.011256224.

Euler’s method is Taylor’s method of order one.

For order n we write Taylor polynomial for

n

X hk (k) hn+1

y(ti+1 ) = y (ti ) + y (n+1) (ξi ) (7.15)

k=0

k! (n + 1)!

for some ξi ∈ (ti+1 , ti ) The difference-equation method correspond to previous equation can be found

by deleting the remainder term. we get the Taylor method order n:

ω0 = α,

ωn+1 = ωi + h T (n) (ti , ωi ), i = 0, 1, ..., N − 1, (7.16)

n−1

X hj

T (n) = f (ti , ωi ) + f (j) (ti , ωi )

j=1

(j + 1)!

Example:

We want to approximate the solution of the initial-valued problem

y ′ = t2 + y 2 , 0 ≤ t ≤ 0.4, y(0) = 0

with h = 0.2 using Taylor’s method of order 4.

We calculate the derivatives

y′ = t2 + y 2

y ′′ = 2t + 2yy ′

y ′′′ = 2 + 2y ′2 + 2yy ′′

y (4) = 6y ′ y ′′ + 2yy ′′′


we find

y(0) = 0.0

y(0.2) = 0.00266666667

y(0.4) = 0.02135325469

If we use step size h = 0.4 we get y(0.4) = 0.02133333333. The correct answer is y(0.4) = 0.021359.

It shows that the accuracy has been improved by using subintervals, i.e., decreasing the step size.
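Using the derivatives computed above, a Taylor order-4 step for this problem can be sketched as follows (function names are ours):

```python
# Taylor's method of order 4 for y' = t^2 + y^2, y(0) = 0, h = 0.2,
# using the derivative formulas above (sketch; names are ours).
def taylor4_step(t, y, h):
    d1 = t * t + y * y                   # y'
    d2 = 2 * t + 2 * y * d1              # y''
    d3 = 2 + 2 * d1 * d1 + 2 * y * d2    # y'''
    d4 = 6 * d1 * d2 + 2 * y * d3        # y^(4)
    return (y + h * d1 + h ** 2 / 2 * d2
            + h ** 3 / 6 * d3 + h ** 4 / 24 * d4)

h, t, w = 0.2, 0.0, 0.0
vals = [w]
for _ in range(2):
    w = taylor4_step(t, w, h)
    t += h
    vals.append(w)
```

Two steps reproduce the values y(0.2) ≈ 0.00266666667 and y(0.4) ≈ 0.02135325469 quoted above.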

The Taylor’s method provides the formal definition of a step-by-step numerical method for solving

initial-value problems. The difficulty of applying Taylor’s method is connected with evaluating higher

derivatives which is extremely complicated. We can explore a class of methods that agree with the

first n + 1 terms of the Taylor series using function value only (no need to construct f (r) ). These are

Runge-Kutta Methods.

Runge-Kutta methods refer to a family of one-step methods. They are all based on the general form of the extrapolation equation

y_{i+1} = yi + m h    (7.18)

where m is the slope that is weighted averages of the slopes at various points in the interval. If

we estimate m using slopes at r points in the interval (ti , ti+1 ), then m can be written as m =

w1 m1 + w2 m2 + . . . + wr mr , where wi are weights of the slopes at various points.

Runge-Kutta methods are known by their order. For instance, a method is called an r-th-order Runge-Kutta method when slopes at r points are used to construct the weighted average slope m. In Euler's method we use only the slope at the single point (ti, yi) to estimate y_{i+1}; therefore, Euler's method is a first-order Runge-Kutta method.

The second Taylor polynomial in two variables for the function f(t, y) near the point (t0, y0) can be written as

f(t, y) ≈ f(t0, y0) + (t − t0) ft + (y − y0) fy + ((t − t0)^2/2) ftt + ((y − y0)^2/2) fyy + (t − t0)(y − y0) fty    (7.19)

with the partial derivatives evaluated at (t0, y0). A second-order Runge-Kutta step has the form

y_{i+1} = yi + (w1 m1 + w2 m2) h    (7.20)

where

m1 = f (ti , yi ) (7.21)

m2 = f (ti + a1 h, yi + b1 m1 h) (7.22)

The weights w1 and w2 and the constants a1 and b1 are to be determined. The principle of the Runge-Kutta method is that these parameters are chosen such that the power series expansion of the right side of eq. (7.20) agrees with the Taylor series expansion of y_{i+1} in terms of yi and f(ti, yi).

The second-order Taylor series expansion of y_{i+1} about yi is given by

y_{i+1} = yi + y′ h + (y″/2) h^2    (7.23)

We know that

yi′ = f(ti, yi)
yi″ = dy′/dt = ∂f/∂t + (∂f/∂y) f(ti, yi)

We get

y_{i+1} = yi + f h + (h^2/2)(ft + fy f)

Now consider the right side of eq. (7.20). To expand it we need the Taylor series in two variables. We can write

y_{i+1} = yi + (w1 m1 + w2 m2) h
        = yi + (w1 f + w2 f(ti + a1 h, yi + b1 m1 h)) h
        = yi + [w1 f + w2 (f + a1 h ft + b1 m1 h fy + O(h^2))] h
        = yi + w1 h f + w2 h f + w2 a1 h^2 ft + w2 b1 m1 h^2 fy + O(h^3)
        = yi + (w1 + w2) h f + w2 (a1 ft + b1 m1 fy) h^2 + O(h^3)

Comparing term by term with

y_{i+1} = yi + f h + (h^2/2)(ft + fy f)

and using m1 = f, we find

w1 + w2 = 1,  w2 a1 = w2 b1 = 1/2

Note that we have only three equations but four unknowns, so this set of equations has no unique solution. If we choose w1 = 0, w2 = 1 and a1 = b1 = 1/2 we get the Midpoint Method (with index i = 0, 1, ..., N − 1):

m1 = f(ti, yi)
m2 = f(ti + h/2, yi + (m1/2) h)
y_{i+1} = yi + m2 h

If we choose w1 = w2 = 1/2 and a1 = b1 = 1 we get what we call the Modified Euler Method:

m1 = f(ti, yi)
m2 = f(ti + h, yi + m1 h)
y_{i+1} = yi + (h/2)(m1 + m2)

If we choose w1 = 1/4, w2 = 3/4 and a1 = b1 = 2/3 we get what we call Heun's method:

m1 = f(ti, yi)
m2 = f(ti + (2/3)h, yi + (2/3) m1 h)
y_{i+1} = yi + (h/4)(m1 + 3 m2)

The derivation of the Runge-Kutta method of order four is too long; we just give it here without details:

m1 = f(ti, yi)
m2 = f(ti + h/2, yi + (1/2) m1 h)
m3 = f(ti + h/2, yi + (1/2) m2 h)    (7.24)
m4 = f(ti + h, yi + m3 h)
y_{i+1} = yi + (h/6)(m1 + 2 m2 + 2 m3 + m4)

Example:
We want to approximate the solution of an initial-value problem on [0, 1] with y0 = 0 and h = 0.5, so that N = (1 − 0)/h = 2, using the modified Euler method:

ti = 0 + ih, i = 0, 1, ..., N − 1
m1 = f(ti, yi)
m2 = f(ti + h, yi + m1 h)
y_{i+1} = yi + (h/2)(m1 + m2)

We get:

ti    wi
0.0   0.0
0.5   0.560211134
1.0   5.301489796

Example:
We want to show that the midpoint method, the modified Euler method, and Heun's method all give the same approximations to the initial-value problem

y′ = −y + t + 1, 0 ≤ t ≤ 1, y(0) = 1

For this problem ft = 1 and fy = −1, so the second-order expansion of each method reduces to

y_{i+1} = yi + f h + w2 (a1 − b1 m1) h^2

with w1 + w2 = 1.

– Midpoint method: w2 = 1, a1 = b1 = 1/2, giving

y_{i+1} = yi + f h + (h^2/2)(1 − m1)

– Modified Euler method: w2 = 1/2, a1 = b1 = 1, giving

y_{i+1} = yi + f h + (h^2/2)(1 − m1)

– Heun's method: w2 = 3/4, a1 = b1 = 2/3, giving

y_{i+1} = yi + f h + (h^2/2)(1 − m1)

Therefore, all three methods give the same approximations.

Example:
We want to find y(0.2) using the Runge-Kutta fourth-order method with h = 0.2 for

y′ = 1 + y^2, y(0) = 0

m1 = f(t0, y0) = 1
m2 = f(t0 + h/2, y0 + h m1/2) = 1.01000000
m3 = f(t0 + h/2, y0 + h m2/2) = 1.01020100
m4 = f(t0 + h, y0 + m3 h) = 1.040820242
y(0.2) ≈ y0 + h(m1 + 2m2 + 2m3 + m4)/6 = 0.2027074080
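One RK4 step for this example, as a sketch (function names are ours):

```python
# One classical RK4 step for y' = 1 + y^2, y(0) = 0, h = 0.2 (sketch).
def rk4_step(f, t, y, h):
    m1 = f(t, y)
    m2 = f(t + h / 2, y + h * m1 / 2)
    m3 = f(t + h / 2, y + h * m2 / 2)
    m4 = f(t + h, y + h * m3)
    return y + h * (m1 + 2 * m2 + 2 * m3 + m4) / 6

y_02 = rk4_step(lambda t, y: 1 + y * y, 0.0, 0.0, 0.2)
```

This reproduces y(0.2) ≈ 0.2027074080 from the hand computation above.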


7.6 Predictor Corrector Methods

The previous methods, Euler, Heun, Taylor, and Runge-Kutta, are called one-step methods because only the value of the solution at the beginning of the interval is required. They use information from one previous point to compute the successive point; that is, yi is needed to compute y_{i+1}. Multistep methods make use of information about the solution at more than one point. A desirable feature of a multistep method is that the local truncation error can be determined and a correction term can be included, which improves the accuracy of the answer at each step.

Definition: An m-step multistep method for solving the initial-value problem

y′ = f(t, y), a ≤ t ≤ b, y(a) = α    (7.25)

has a difference equation for finding the approximation w_{i+1} at the mesh point t_{i+1} represented by the following equation, where the integer m > 1:

w_{i+1} = a_{m−1} wi + a_{m−2} w_{i−1} + ... + a0 w_{i+1−m}
        + h [bm f(t_{i+1}, w_{i+1}) + b_{m−1} f(ti, wi) + ... + b0 f(t_{i+1−m}, w_{i+1−m})]    (7.26)

for i = m − 1, ..., N − 1, where h = (b − a)/N, the a0, a1, ..., a_{m−1} and b0, ..., bm are constants, and the starting values w0 = α, w1 = α1, ..., w_{m−1} = α_{m−1} are given. When bm = 0 the method is called explicit, or open, since eq. (7.26) gives w_{i+1} explicitly in terms of previously determined values. When bm ≠ 0 the method is called implicit, or closed, since w_{i+1} occurs on both sides. Euler's method gives

y_{i+1} = yi + h f(ti, yi), i = 0, 1, ...    (7.27)

The modified Euler's method can be written as

y_{i+1} = yi + (h/2) [f(ti, yi) + f(t_{i+1}, y_{i+1})]    (7.28)

The value of yi+1 is first estimated by Euler’s method, eq (7.27), and then used in the right hand side of

the modified Euler’s method, eq (7.28), giving a better approximation of yi+1 . The value of yi+1 is again

substituted in the modified Euler’s method, eq (7.28), to find a still better approximation of yi+1 . This

procedure is repeated till two consecutive iterated values of yi+1 agree. The equation (7.27) is therefore

called predictor while eq (7.28) is called corrector.

We will describe only the multistep method called the Adams-Bashforth-Moulton method. It can be derived from the fundamental theorem of calculus:

y_{i+1} = yi + ∫_{ti}^{t_{i+1}} f(t, y) dt    (7.29)

The predictor uses the Lagrange polynomial approximation for f(t, y) based on the four values fk = f(tk, yk) at the points t_{i−3}, t_{i−2}, t_{i−1}, and ti:

P(t) = f_{i−3} (t − t_{i−2})(t − t_{i−1})(t − ti) / [(t_{i−3} − t_{i−2})(t_{i−3} − t_{i−1})(t_{i−3} − ti)]
     + f_{i−2} (t − t_{i−3})(t − t_{i−1})(t − ti) / [(t_{i−2} − t_{i−3})(t_{i−2} − t_{i−1})(t_{i−2} − ti)]
     + f_{i−1} (t − t_{i−3})(t − t_{i−2})(t − ti) / [(t_{i−1} − t_{i−3})(t_{i−1} − t_{i−2})(t_{i−1} − ti)]
     + f_i (t − t_{i−3})(t − t_{i−2})(t − t_{i−1}) / [(ti − t_{i−3})(ti − t_{i−2})(ti − t_{i−1})]

It is integrated over the interval [ti, t_{i+1}]:

∫_{ti}^{t_{i+1}} P(t) dt = (h/24)(55 fi − 59 f_{i−1} + 37 f_{i−2} − 9 f_{i−3})    (7.30)

so that

y_{i+1} = yi + (h/24)(55 fi − 59 f_{i−1} + 37 f_{i−2} − 9 f_{i−3})    (7.31)

This last equation is called the Adams-Bashforth predictor. Note that extrapolation is used here, since [ti, t_{i+1}] lies outside the interpolation nodes.

The corrector is developed in a similar way. A second Lagrange polynomial for f(t, y) is constructed, based on the points (t_{i−2}, y_{i−2}), (t_{i−1}, y_{i−1}), (ti, yi), and the new point (t_{i+1}, y_{i+1}) just calculated by the predictor (7.31):

P(t) = f_{i−2} (t − t_{i−1})(t − ti)(t − t_{i+1}) / [(t_{i−2} − t_{i−1})(t_{i−2} − ti)(t_{i−2} − t_{i+1})]
     + f_{i−1} (t − t_{i−2})(t − ti)(t − t_{i+1}) / [(t_{i−1} − t_{i−2})(t_{i−1} − ti)(t_{i−1} − t_{i+1})]
     + f_i (t − t_{i−2})(t − t_{i−1})(t − t_{i+1}) / [(ti − t_{i−2})(ti − t_{i−1})(ti − t_{i+1})]
     + f_{i+1} (t − t_{i−2})(t − t_{i−1})(t − ti) / [(t_{i+1} − t_{i−2})(t_{i+1} − t_{i−1})(t_{i+1} − ti)]

Integrating it over [ti, t_{i+1}] gives

∫_{ti}^{t_{i+1}} P(t) dt = (h/24)(9 f_{i+1} + 19 fi − 5 f_{i−1} + f_{i−2})    (7.32)

so that

y_{i+1} = yi + (h/24)(9 f_{i+1} + 19 fi − 5 f_{i−1} + f_{i−2})    (7.33)

This is the Adams-Moulton corrector. We have to repeat the corrector until we obtain the needed accuracy.

Algorithm

– Compute y_{i+1}^{(0)} using the predictor

y_{i+1}^{(0)} = yi + (h/24)[55 fi − 59 f_{i−1} + 37 f_{i−2} − 9 f_{i−3}]    (7.34)

– Compute f_{i+1}^{(0)} = f(t_{i+1}, y_{i+1}^{(0)})

– Compute y_{i+1}^{(k)} from the corrector

y_{i+1}^{(k)} = yi + (h/24)[9 f(t_{i+1}, y_{i+1}^{(k−1)}) + 19 fi − 5 f_{i−1} + f_{i−2}]    (7.35)

– Iterate on k until

|y_{i+1}^{(k)} − y_{i+1}^{(k−1)}| / |y_{i+1}^{(k)}| < ǫ    (7.36)

Example:
Consider the initial-value problem

y′ = 1 + y^2, 0 ≤ t ≤ 0.8, y(0) = 0

The first step is to calculate the four starting values w0, w1, w2, and w3. To do this we can use, for example, the fourth-order Runge-Kutta method with t0 = 0, w0 = 0, and h = 0.2:

k1 = h f(t0, y0)
k2 = h f(t0 + h/2, y0 + k1/2)
k3 = h f(t0 + h/2, y0 + k2/2)
k4 = h f(t0 + h, y0 + k3)
w1 = w0 + (k1 + 2k2 + 2k3 + k4)/6 = 0.2027074081

In the same way we can get

w2 = 0.4227889928
w3 = 0.6841334020

With t1 = 0.2, t2 = 0.4, t3 = 0.6, t4 = 0.8, the predictor gives

w4^{(0)} = w3 + (h/24)[55 f(t3, w3) − 59 f(t2, w2) + 37 f(t1, w1) − 9 f(t0, w0)] = 1.023434882

Now we can correct the predicted value using the corrector formula:

w4^{(1)} = w3 + (h/24)[9 f(t4, w4^{(0)}) + 19 f(t3, w3) − 5 f(t2, w2) + f(t1, w1)] = 1.029690402
w4^{(2)} = w3 + (h/24)[9 f(t4, w4^{(1)}) + 19 f(t3, w3) − 5 f(t2, w2) + f(t1, w1)] = 1.030653654
w4^{(3)} = w3 + (h/24)[9 f(t4, w4^{(2)}) + 19 f(t3, w3) − 5 f(t2, w2) + f(t1, w1)] = 1.030653654

So, the predictor-corrector method gives the approximate solution 1.030653654. The actual solution of the ODE is

y(t) = tan(t), y(0.8) = 1.029638557

The errors are

|w4^{(0)} − tan(0.8)| = 0.006203675
|w4^{(3)} − tan(0.8)| = 0.001015097
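The whole computation, RK4 starting values, Adams-Bashforth predictor, and one Adams-Moulton correction (a PECE step), can be sketched as follows (names are ours):

```python
# Adams-Bashforth-Moulton for y' = 1 + y^2, h = 0.2 (sketch; names ours):
# RK4 supplies w0..w3, then one predict-correct (PECE) step gives w4.
def rk4_step(f, t, y, h):
    m1 = f(t, y)
    m2 = f(t + h / 2, y + h * m1 / 2)
    m3 = f(t + h / 2, y + h * m2 / 2)
    m4 = f(t + h, y + h * m3)
    return y + h * (m1 + 2 * m2 + 2 * m3 + m4) / 6

f = lambda t, y: 1 + y * y
h = 0.2
t = [0.0, 0.2, 0.4, 0.6, 0.8]
w = [0.0]
for i in range(3):                          # starting values w1, w2, w3
    w.append(rk4_step(f, t[i], w[i], h))
fs = [f(t[i], w[i]) for i in range(4)]

# Adams-Bashforth predictor
w4_pred = w[3] + h / 24 * (55 * fs[3] - 59 * fs[2] + 37 * fs[1] - 9 * fs[0])
# one Adams-Moulton correction
w4_corr = w[3] + h / 24 * (9 * f(t[4], w4_pred)
                           + 19 * fs[3] - 5 * fs[2] + fs[1])
```

The predicted and once-corrected values reproduce w4^{(0)} = 1.023434882 and w4^{(1)} = 1.029690402 from the example; further corrector iterations can be added in the same way.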


Chapter 8

A linear system may have a unique solution, no solution, or no unique solution (infinitely many solutions); it may also be ill conditioned. For example, the system

x − 2y = −2
0.45x − 0.91y = −1

is ill conditioned because the two equations represent two nearly parallel lines.

Definition: An n × m matrix can be represented by

A = (aij) =
[ a11  a12  ...  a1m ]
[ a21  a22  ...  a2m ]
[  .    .         .  ]
[ an1  an2  ...  anm ]

Linear equations can be represented by matrices. Solving a linear system can be done using three main row operations:

1. Row Ei can be multiplied by any nonzero constant λ.
2. A multiple λ of row Ej can be added to row Ei.
3. Rows Ei and Ej can be interchanged.

Example

[  1   1   0   3 |  4 ]
[  2   1  −1   1 |  1 ]
[  3  −1  −1   2 | −3 ]
[ −1   2   3  −1 |  4 ]

After eliminating the first column it becomes

[  1   1   0   3 |   4 ]
[  0  −1  −1  −5 |  −7 ]
[  0  −4  −1  −7 | −15 ]
[  0   3   3   2 |   8 ]

and, after eliminating the remaining columns,

[  1   1   0    3 |   4 ]
[  0  −1  −1   −5 |  −7 ]
[  0   0   3   13 |  13 ]
[  0   0   0  −13 | −13 ]

The matrix is now triangular, and it is possible to solve the linear system by a backward-substitution process. In general, an n × n matrix A can be reduced to triangular form by elementary operations.

Example:

Consider the following system (augmented matrix):

[  1   1   1   1 |  3 ]
[  2  −1  −1   2 | 12 ]
[  1   3  −2  −1 | −9 ]
[ −1  −1   1   4 | 17 ]

After elementary operations we get

[  1   1   1     1 |    3 ]
[  0  −3  −3     0 |    6 ]
[  0   0  −5    −2 |   −8 ]
[  0   0   0  21/5 | 84/5 ]

Backward substitution now gives

(21/5) x4 = 84/5  ⇒  x4 = 4
−5 x3 − 2 × 4 = −8  ⇒  x3 = 0
−3 x2 − 3 × 0 = 6  ⇒  x2 = −2
x1 − 2 + 0 + 4 = 3  ⇒  x1 = 1
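The elimination and backward substitution above can be sketched in code (no pivoting; nonzero pivots are assumed, as is the case in this example):

```python
# Gaussian elimination with backward substitution (sketch; no pivoting,
# so nonzero pivots are assumed; names are ours).
def solve(aug):
    n = len(aug)
    A = [row[:] for row in aug]             # copy of the augmented matrix
    for k in range(n - 1):                  # forward elimination
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # backward substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x

x = solve([[1, 1, 1, 1, 3],
           [2, -1, -1, 2, 12],
           [1, 3, -2, -1, -9],
           [-1, -1, 1, 4, 17]])
```

Applied to the second example, it recovers the solution x1 = 1, x2 = −2, x3 = 0, x4 = 4.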

If the forward elimination gives the final row

0 0 ... 0 ann | bn

with ann ≠ 0, which gives ann xn = bn, the original system has a unique solution, obtained by backward substitution.

If the final row has

0 0 ... 0 0 | bn

where bn ≠ 0, the system has no solution.

If the final row has

0 0 ... 0 0 | 0

the system has infinitely many solutions, i.e., no unique solution.

8.3 Matrix Inverse

8.4 Determinant of Matrix

8.5 Matrix Factorization


Chapter 9

Systems

9.2 Eigenvalues and Eigenvectors

9.3 Iterative Techniques for Solving Linear Systems


Chapter 10

For a polynomial represented by

f(x) = an x^n + a_{n−1} x^{n−1} + ... + a1 x + a0    (10.1)

an estimate of the largest possible root is given by

xl = −a_{n−1}/an    (10.2)

This value is taken as the initial approximation when no other value is given.

Search bracket: all real roots lie within the interval

[ −√((a_{n−1}/an)^2 − 2 a_{n−2}/an),  √((a_{n−1}/an)^2 − 2 a_{n−2}/an) ]    (10.3)

Another relationship suggests an interval for the roots: all real roots lie within

[ −1 − (1/|an|) Max{|a0|, ..., |a_{n−1}|},  1 + (1/|an|) Max{|a0|, ..., |a_{n−1}|} ]    (10.4)

In the bisection method, after n iterations the root must lie within an interval of length (b − a)/2^n. This means that the error bound at the nth iteration is

En = (b − a)/2^n    (10.5)

Similarly,

E_{n+1} = (b − a)/2^{n+1} = En/2    (10.6)

That is, the error decreases with each step by a factor of 1/2. Therefore, the bisection method is linearly convergent.

¹ Numerical Methods, E. Balagurusamy
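A sketch verifying the bound (10.5) for bisection on f(x) = x^2 − 2 over [1, 2] (names are ours):

```python
# Bisection on f(x) = x^2 - 2 over [1, 2] (sketch); the midpoint error
# at step n stays within the bound (b - a)/2^n from eq. (10.5).
from math import sqrt

f = lambda x: x * x - 2
a, b = 1.0, 2.0
length = b - a
bounds_ok = True
c = (a + b) / 2
for n in range(1, 31):
    c = (a + b) / 2                         # midpoint of current interval
    bounds_ok = bounds_ok and abs(c - sqrt(2)) <= length / 2 ** n + 1e-15
    if f(a) * f(c) <= 0:                    # root is in [a, c]
        b = c
    else:
        a = c
```

Every midpoint satisfies the error bound, and after 30 steps the error is below 2^{−30}.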

10.3 Convergence of False Position Method

The false position formula is based on the linear interpolation model. One of the starting points is fixed

while the other moves towards the solution. Assume that the initial points bracketing the solution are a

and b and that a moves towards the solution and b is fixed.

Let p0 = a and let p be the solution. Then

E0 = p − p0, E1 = p − p1    (10.7)

that is,

Ei = p − pi    (10.8)

It can be shown that the false position method also converges linearly.

10.4 Convergence of Newton-Raphson Method

Let pn be an estimate of a root of the equation f(x) = 0. If pn and p_{n+1} are close to each other, then we can use the Taylor series expansion

f(p_{n+1}) = f(pn) + (p_{n+1} − pn) f′(pn) + (p_{n+1} − pn)^2 f″(ξ)/2    (10.9)

where ξ lies between pn and p_{n+1}. Let us assume that the exact root is p. Then

0 = f(pn) + (p − pn) f′(pn) + (p − pn)^2 f″(ξ)/2    (10.10)

Assume that f′(pn) ≠ 0. From the Newton-Raphson formula we have

p_{n+1} = pn − f(pn)/f′(pn)  ⇒  f(pn) = (pn − p_{n+1}) f′(pn)    (10.11)

so that

0 = (p − p_{n+1}) f′(pn) + (p − pn)^2 f″(ξ)/2    (10.12)

The errors in the estimates are

E_{n+1} = p − p_{n+1},  En = p − pn

so that

0 = E_{n+1} f′(pn) + En^2 f″(ξ)/2    (10.13)

which leads to

E_{n+1} = −(f″(ξ)/(2 f′(pn))) En^2    (10.14)

The last equation shows that the error is roughly proportional to the square of the error in the previous iteration. Thus, the Newton-Raphson method is quadratically convergent.
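A sketch illustrating this quadratic convergence on f(x) = x^2 − 2 (names are ours): each error is roughly the square of the previous one.

```python
# Newton's method on f(x) = x^2 - 2 (sketch); the error roughly squares
# at each step, illustrating the quadratic convergence derived above.
from math import sqrt

f = lambda x: x * x - 2
df = lambda x: 2 * x
p = 1.0
errors = []
for _ in range(5):
    p = p - f(p) / df(p)
    errors.append(abs(p - sqrt(2)))
```

Each new error is smaller than the square of the previous error, and five steps reach machine precision.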

The Newton-Raphson Method has certain limitations. The Method may fail in the following situations:


1. If f ′ (pn ) = 0.

2. If the initial guess is too far away from the required root, the process may converge to some other

root.

3. A particular value in the iteration sequence may repeat, resulting in an infinite loop. This occurs when the tangent to the curve f(x) at p_{n+1} cuts the x-axis again at pn.

The secant method uses two initial estimates but does not require that they bracket the root. The secant iteration formula is

p_{n+1} = pn − f(pn) (pn − p_{n−1}) / (f(pn) − f(p_{n−1}))    (10.15)

Let p be the actual root of f(x) = 0 and En = p − pn the error in the estimate pn, so that

pi = p − Ei, for i = n − 1, n, n + 1    (10.16)

Substituting into eq. (10.15) and simplifying, we get

E_{n+1} = (E_{n−1} f(pn) − En f(p_{n−1})) / (f(pn) − f(p_{n−1}))    (10.17)

According to the mean value theorem, there exists at least one point ξn between pn and p such that

f′(ξn) = (f(pn) − f(p))/(pn − p) = f(pn)/(pn − p) = −f(pn)/En  ⇒  f(pn) = −En f′(ξn)    (10.18)

Similarly,

f(p_{n−1}) = −E_{n−1} f′(ξ_{n−1})    (10.19)

Equation (10.17) then becomes

E_{n+1} = −En E_{n−1} (f′(ξn) − f′(ξ_{n−1})) / (f(pn) − f(p_{n−1}))    (10.20)

That is,

E_{n+1} ∝ En E_{n−1}    (10.21)

Let us now find the order of convergence of this iteration process. If the order of convergence is α, then

E_{n+1} ∝ En^α    (10.22)

and likewise En ∝ E_{n−1}^α. So, from equation (10.21), we can write

En^α ∝ En E_{n−1} ∝ E_{n−1}^α E_{n−1} = E_{n−1}^{α+1}
⇒ En ∝ E_{n−1}^{(α+1)/α}

Comparing the two expressions for En in terms of E_{n−1} gives

(α + 1)/α = α  ⇒  α^2 − α − 1 = 0  ⇒  α = (1 ± √5)/2    (10.23)

Since the order of convergence must be positive, α = (1 + √5)/2 ≈ 1.618, and the convergence of the secant method is referred to as superlinear.
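A sketch of the secant iteration on f(x) = x^2 − 2 (names are ours); the errors shrink superlinearly, consistent with (10.21):

```python
# Secant iteration on f(x) = x^2 - 2 (sketch); successive errors shrink
# superlinearly, consistent with E_{n+1} being proportional to
# E_n * E_{n-1} in eq. (10.21).
from math import sqrt

f = lambda x: x * x - 2
p0, p1 = 1.0, 2.0
errs = [abs(p0 - sqrt(2)), abs(p1 - sqrt(2))]
for _ in range(6):
    p2 = p1 - f(p1) * (p1 - p0) / (f(p1) - f(p0))
    p0, p1 = p1, p2
    errs.append(abs(p2 - sqrt(2)))
```

The errors fall faster than linearly but slower than Newton's quadratic rate, matching the order 1.618 derived above.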


Chapter 11

Exams

11.1 exam 1

Answer all questions.

1. Let x = 0.456 × 10^{−2}, y = 0.134, and z = 0.920.

Use three-digit rounding arithmetic to evaluate:

(a) (x + y) + z.

(b) x + (y + z).

(10 Marks)

(6 Marks)

3. We want to evaluate the square root of 5 using the equation x2 − 5 = 0 by applying the fixed-point

iteration algorithm.

(a) Use algebraic manipulation to show that g(x) = x/2 + 5/(2x) has a fixed point p exactly when p^2 − 5 = 0.

(b) Use the fixed-point theorem to show that the function g(x) converges to the unique fixed point for any initial p0 ∈ [2, 5].

(c) Use p0 = 3 to evaluate p2 .

(8 Marks)

4. (a) Evaluate exactly the integral ∫_0^4 e^x dx.

(b) Find an approximation to ∫_0^4 e^x dx using Simpson's rule with h = 2.

(c) Find an approximation to ∫_0^4 e^x dx using the composite Simpson's rule with h = 1.

(d) Does the composite Simpson's rule improve the approximation?

(8 Marks)

5. Given the equation y′ = 3x^2 + 1, with y(1) = 2, estimate y(2) by Euler's method with h = 0.25.

(8 Marks)

END

Horner’s Theorem Let

P (x) = an xn + an−1 xn−1 + ... + a1 x + a0 (11.1)

If bn = an and

bk = ak + bk+1 x0 , for k = n − 1, n − 2, ..., 1, 0 (11.2)

then b0 = P(x0).

Fixed-Point Theorem:

Let g ∈ C[a, b] be such that g(x) ∈ [a, b], for all x in [a, b]. Suppose, in addition, that g ′ exists on

(a, b) and that a constant 0 < k < 1 exists with

|g ′ (x)| ≤ k, for all x ∈ (a, b).

Then, for any number p0 in [a, b], the sequence defined by

pn = g(pn−1 ), n ≥ 1,

converges to the unique fixed point p in [a, b].

Euler’s method:

dy

To approximate the solution of the initial-value problem = f (t, y), a ≤ t ≤ b, y(a) = α at (N+1)

dt

equally spaced numbers in the interval [a, b], we construct the solution y(ti ) = wi for i = 0, 1, ..., N −1

and

w0 = α,

t0 = a,

wi+1 = wi + hf (ti , wi ),

ti = a + ih,

where h = (b − a)/N


11.2 exam 2

Answer all questions.

1. (a) Evaluate f (x) = x3 − 6.1x2 + 3.2x + 1.5 at x = 4.71 using three-digit rounding arithmetic.

(b) Find the relative error in (a).

(c) Use Horner’s Theorem to evaluate f (x) at x = 4.71 using three-digit rounding arityhmetic.

(d) Find the relative error in (c).

(e) Why the relative error in (c) is less than the relative error in (a).

(10 Marks)

2. Let f (x) = −x3 − cos x and p0 = −1. Use Newton’s method to find p2 .

(6 Marks)

3. (a) Let A be a given positive constant and g(x) = 2x − Ax2 . Show that if fixed-point iteration

converges to a nonzero limit, then the limit is p = 1/A, so the reciprocal of a number can be

found using only multiplications and subtractions.

(b) Use fixed-point iteration with p0 = 0.1 to find p2 that approximates 1/11.

(8 Marks)

4. Use the forward-difference and backward-difference formulas to determine each of the missing entries in the following table:

x f (x) f ′ (x)

1.0 1.0000 ....

1.2 1.2625 ....

1.4 1.6595 ....

(8 Marks)
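The table entries follow from the standard two-point formulas f′(x) ≈ (f(x+h) − f(x))/h (forward) and f′(x) ≈ (f(x) − f(x−h))/h (backward); a short Python sketch using the tabulated values (variable names illustrative):

```python
xs = [1.0, 1.2, 1.4]
fs = [1.0000, 1.2625, 1.6595]
h = xs[1] - xs[0]                 # equal spacing, h = 0.2

d10 = (fs[1] - fs[0]) / h         # forward difference at x = 1.0
d12 = (fs[2] - fs[1]) / h         # forward difference at x = 1.2
d14 = (fs[2] - fs[1]) / h         # backward difference at x = 1.4
```

The interior point x = 1.2 could use either formula; only the forward choice is shown, and it coincides with the backward value at x = 1.4 because both use the same pair of table entries.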

5. Use Euler’s method to approximate the solution of the following initial-value problem:

y′ = e^{t−y},   0 ≤ t ≤ 1,   y(0) = 1,   with h = 0.5.

(8 Marks)

END


Useful Formulas and Theorems

Horner’s Theorem: Let

P(x) = a_n x^n + a_{n−1} x^{n−1} + ... + a_1 x + a_0.   (11.3)

If b_n = a_n and b_k = a_k + b_{k+1} x_0, for k = n − 1, n − 2, ..., 1, 0, then b_0 = P(x_0).

Fixed-Point Theorem:

Let g ∈ C[a, b] be such that g(x) ∈ [a, b], for all x in [a, b]. Suppose, in addition, that g′ exists on (a, b) and that a constant 0 < k < 1 exists with

|g′(x)| ≤ k, for all x ∈ (a, b).

Then, for any number p0 in [a, b], the sequence defined by

pn = g(pn−1), n ≥ 1,

converges to the unique fixed point p in [a, b].

Euler’s method:

To approximate the solution of the initial-value problem

dy/dt = f(t, y),   a ≤ t ≤ b,   y(a) = α,

at (N + 1) equally spaced numbers in the interval [a, b], we construct approximations w_i ≈ y(t_i) by

w_0 = α,   t_0 = a,   t_i = a + ih,

w_{i+1} = w_i + h f(t_i, w_i),   for i = 0, 1, ..., N − 1,

where h = (b − a)/N.


11.3 Exam 3

Answer all questions.

1. Let

f(x) = (e^x − e^{−x}) / x.

(a) Find lim_{x→0} f(x).

(c) The actual value is f(0.100) = 2.003335000; find the relative error for the value obtained in (b).

(8 Marks)

2. Consider $\int_{-1}^{1} e^x\,dx$.

(c) Find the maximum error from Simpson’s formula.

(8 Marks)

3. Use the forward-difference and backward-difference formulas to determine each of the missing entries in the following table:

     x      f(x)       f′(x)
    1.0    1.0000      ....
    1.2    1.2625      ....
    1.4    1.6595      ....

(8 Marks)

4. (a) Find the actual value of $\int_0^2 \frac{1}{x+4}\,dx$.

(b) Use the Trapezoidal rule to approximate $\int_0^2 \frac{1}{x+4}\,dx$ and find the actual error.

(c) Determine the values of n and h required for the Composite Trapezoidal rule to approximate $\int_0^2 \frac{1}{x+4}\,dx$ to within 10^{−6}.

(8 Marks)

5. Use Euler’s method to approximate the solution of the following initial-value problem:

y′ = e^{t−y},   0 ≤ t ≤ 1,   y(0) = 1,   with h = 0.5.

(8 Marks)

END

Let f ∈ C 2 [a, b], h = (b − a)/n, and xj = a + hj, for each j = 0, 1, . . . , n. There exists a µ ∈ (a, b)

for which the composite trapezoidal rule for n subintervals can be written as

$$\int_a^b f(x)\,dx = \frac{h}{2}\left[f(a) + 2\sum_{j=1}^{n-1} f(x_j) + f(b)\right] - \frac{b-a}{12}\,h^2 f''(\mu) \qquad (11.5)$$
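Dropping the error term from (11.5) gives the composite approximation itself. A minimal Python sketch (names illustrative), checked against the integral from exam 3, question 4:

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals, h = (b - a)/n."""
    h = (b - a) / n
    interior = sum(f(a + j * h) for j in range(1, n))   # sum over x_1..x_{n-1}
    return (h / 2) * (f(a) + 2 * interior + f(b))

# Exam 3, question 4: integral of 1/(x + 4) over [0, 2]; exact value ln(6/4).
approx = composite_trapezoid(lambda x: 1 / (x + 4), 0.0, 2.0, 100)
```

The error term −((b − a)/12) h^2 f″(µ) shrinks like h^2, so doubling n cuts the error by roughly a factor of four.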

Simpson’s rule

With nodes at x0 = a, x1 = a + h, x2 = b, where h = (b − a)/2, Simpson’s rule is

$$\int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}\left[f(x_0) + 4f(x_1) + f(x_2)\right] - \frac{h^5}{90}\,f^{(4)}(\xi). \qquad (11.6)$$
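A minimal Python sketch of (11.6) without the error term (names illustrative), checked on the integral from exam 3, question 2:

```python
import math

def simpson(f, a, b):
    """Basic Simpson's rule on [a, b] with nodes a, (a + b)/2, b."""
    h = (b - a) / 2
    return (h / 3) * (f(a) + 4 * f(a + h) + f(b))

# Exam 3, question 2: integral of e^x over [-1, 1]; exact value e - 1/e.
approx = simpson(math.exp, -1.0, 1.0)
```

Here h = 1, so the error bound |h^5/90 · f^{(4)}(ξ)| is at most e/90 ≈ 0.030; the actual error is about 0.012. Since the error term has degree 4, the rule is exact for polynomials up to degree 3.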

Euler’s method

To approximate the solution of the initial-value problem

dy/dt = f(t, y),   a ≤ t ≤ b,   y(a) = α,

at (N + 1) equally spaced numbers in the interval [a, b], we construct approximations w_i ≈ y(t_i) by

w_0 = α,   t_0 = a,   t_i = a + ih,

w_{i+1} = w_i + h f(t_i, w_i),   for i = 0, 1, ..., N − 1,

where h = (b − a)/N.
