
CHAPTER 1 (4 LECTURES)

NUMERICAL ALGORITHMS AND ROUNDOFF ERRORS

1. Numerical analysis
Numerical analysis is the branch of mathematics that studies and develops algorithms which use numerical approximation to solve problems of mathematical analysis (continuous mathematics). Numerical techniques are widely used by scientists and engineers to solve their problems. A major advantage of numerical techniques is that a numerical answer can be obtained even when a problem has no analytical solution. The result from numerical analysis is, in general, an approximation, but one which can be made as accurate as desired, for example when computing an approximate value of √2.
In this chapter, we introduce and discuss some basic concepts of scientific computing. We begin with floating-point representation and then discuss the most fundamental source of imperfection in numerical computing, namely roundoff errors. We also discuss sources of error and the stability of numerical algorithms.
2. Numerical analysis and the art of scientific computing
Scientific computing is a discipline concerned with the development and study of numerical algorithms for solving mathematical problems that arise in various disciplines in science and engineering.
Typically, the starting point is a given mathematical model which has been formulated in an attempt
to explain and understand an observed phenomenon in biology, chemistry, physics, economics, or any
engineering or scientific discipline. We will concentrate on those mathematical models which are continuous (or piecewise continuous) and are difficult or impossible to solve analytically: this is usually
the case in practice. Relevant application areas within computer science include graphics, vision and
motion analysis, image and signal processing, search engines and data mining, machine learning, hybrid
and embedded systems, and many more. In order to solve such a model approximately on a computer,
the (continuous, or piecewise continuous) problem is approximated by a discrete one. Continuous functions are approximated by finite arrays of values. Algorithms are then sought which approximately
solve the mathematical problem efficiently, accurately and reliably.
3. Floating-point representation of numbers
Any real number is represented by an infinite sequence of digits. For example,

    8/3 = 2.66666… = (2/10^1 + 6/10^2 + 6/10^3 + …) × 10^1.
This is an infinite series, but computers use a finite amount of memory to represent numbers. Thus only a finite number of digits may be used to represent any number, no matter what representation method is used.
For example, we can chop the infinite decimal representation of 8/3 after 4 digits:

    8/3 ≈ (2/10^1 + 6/10^2 + 6/10^3 + 6/10^4) × 10^1 = 0.2666 × 10^1.
Generalizing this, we say that such a number has n decimal digits, and we call n the precision.
For each real number x, we associate a floating-point representation, denoted by fl(x), given by

    fl(x) = ±(0.a1 a2 … an)_β × β^e,

where the β-based fraction is called the mantissa (with all ai integers, 0 ≤ ai ≤ β − 1) and e is known as the exponent. This representation is called the β-based floating-point representation of x.
For example,

    42.965 = 4 × 10^1 + 2 × 10^0 + 9 × 10^(−1) + 6 × 10^(−2) + 5 × 10^(−3) = 0.42965 × 10^2.

    0.00234 = 0.234 × 10^(−2).


0 is written as ±(0.00…0) × β^e. Likewise, we can use the binary number system, and any real x can be written

    x = ±q × 2^m

with 1/2 ≤ q < 1 and some integer m. Both q and m are expressed in terms of binary digits. For example,

    (1001.1101)_2 = 1 × 2^3 + 1 × 2^0 + 1 × 2^(−1) + 1 × 2^(−2) + 1 × 2^(−4) = (9.8125)_10.
Remark: The above representation is not unique.
For example, 0.2666 × 10^1 = 0.02666 × 10^2.
Definition 3.1 (Normal form). A non-zero floating-point number is in normal form if the absolute value of the mantissa lies in [1/β, 1).
Therefore, we normalize the representation by requiring a1 ≠ 0. Not only is the precision limited to a finite number of digits, but the range of the exponent is also restricted: there are integers m and M such that m ≤ e ≤ M.
3.1. Rounding and chopping. Let x be any real number and fl(x) be its machine approximation. There are two ways to do the cutting to store a real number

    x = ±(0.a1 a2 … an a_{n+1} …)_β × β^e,   a1 ≠ 0.

(1) Chopping: We ignore the digits after an and write

    fl(x) = ±(0.a1 a2 … an)_β × β^e.

(2) Rounding: Rounding is defined as follows:

    fl(x) = ±(0.a1 a2 … an)_β × β^e,                    0 ≤ a_{n+1} < β/2   (rounding down),
    fl(x) = ±[(0.a1 a2 … an)_β + (0.00…01)_β] × β^e,    β/2 ≤ a_{n+1} < β   (rounding up).
Example 1.

    fl(6/7) = fl(0.857142…) = 0.86 × 10^0 (rounding), 0.85 × 10^0 (chopping).
Rules for rounding off numbers:
(1) If the digit to be dropped is greater than 5, the last retained digit is increased by one. For example,
12.6 is rounded to 13.
(2) If the digit to be dropped is less than 5, the last remaining digit is left as it is. For example,
12.4 is rounded to 12.
(3) If the digit to be dropped is 5, and if any digit following it is not zero, the last remaining digit is
increased by one. For example,
12.51 is rounded to 13.
(4) If the digit to be dropped is 5 and is followed only by zeros, the last remaining digit is increased
by one if it is odd, but left as it is if even. For example,
11.5 is rounded to 12, and 12.5 is rounded to 12.
Definition 3.2 (Absolute and relative error). If fl(x) is the approximation to the exact value x, then the absolute error is |x − fl(x)|, and the relative error is |x − fl(x)| / |x|.
Remark: As a measure of accuracy, the absolute error may be misleading and the relative error is more
meaningful.
Definition 3.3 (Overflow and underflow). An overflow occurs when a number is too large to fit into the floating-point system in use, i.e., e > M. An underflow occurs when a number is too small, i.e., e < m. When overflow occurs in the course of a calculation, it is generally fatal. But underflow is non-fatal: the system usually sets the number to 0 and continues. (Matlab does this, quietly.)


4. Errors in numerical approximations

Let x be any real number we want to represent in a computer, and let fl(x) be the representation of x in the computer. What is the largest possible value of |x − fl(x)| / |x|? In the worst case, how much information do we lose due to roundoff or chopping errors?
Chopping errors: Let

    x = ±(0.a1 a2 … an a_{n+1} …)_β × β^e = ±( ∑_{i=1}^{∞} ai β^{−i} ) β^e,   a1 ≠ 0,

    fl(x) = ±(0.a1 a2 … an)_β × β^e = ±( ∑_{i=1}^{n} ai β^{−i} ) β^e.

Therefore

    |x − fl(x)| = ( ∑_{i=n+1}^{∞} ai β^{−i} ) β^e,

    β^{−e} |x − fl(x)| = ∑_{i=n+1}^{∞} ai β^{−i}.

Now since each ai ≤ β − 1,

    β^{−e} |x − fl(x)| ≤ (β − 1) ( β^{−(n+1)} + β^{−(n+2)} + … ) = (β − 1) β^{−(n+1)} / (1 − β^{−1}) = β^{−n}.

Now, since a1 ≥ 1,

    |x| = (0.a1 a2 … an …)_β × β^e ≥ β^{−1} β^e.

Therefore

    |x − fl(x)| / |x| ≤ β^{−n} β^e / (β^{−1} β^e) = β^{1−n}.

Rounding errors: For rounding,

    fl(x) = ±(0.a1 a2 … an)_β × β^e = ±( ∑_{i=1}^{n} ai β^{−i} ) β^e,                         a_{n+1} < β/2,
    fl(x) = ±(0.a1 a2 … a_{n−1} [an + 1])_β × β^e = ±( β^{−n} + ∑_{i=1}^{n} ai β^{−i} ) β^e,   a_{n+1} ≥ β/2.

For a_{n+1} < β/2,

    β^{−e} |x − fl(x)| = ∑_{i=n+1}^{∞} ai β^{−i} = a_{n+1} β^{−(n+1)} + ∑_{i=n+2}^{∞} ai β^{−i}
                      ≤ (β/2 − 1) β^{−(n+1)} + (β − 1) ∑_{i=n+2}^{∞} β^{−i}
                      = (β/2 − 1) β^{−(n+1)} + β^{−(n+1)} = (1/2) β^{−n}.

For a_{n+1} ≥ β/2,

    β^{−e} |x − fl(x)| = | ∑_{i=n+1}^{∞} ai β^{−i} − β^{−n} | = β^{−n} − ∑_{i=n+1}^{∞} ai β^{−i}
                      ≤ β^{−n} − a_{n+1} β^{−(n+1)}.

Since a_{n+1} ≥ β/2,

    β^{−e} |x − fl(x)| ≤ β^{−n} − (β/2) β^{−(n+1)} = (1/2) β^{−n}.

Therefore, in both cases,

    |x − fl(x)| ≤ (1/2) β^{e−n}.

Now

    |x − fl(x)| / |x| ≤ (1/2) β^{−n} β^e / (β^{−1} β^e) = (1/2) β^{1−n}.
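These bounds are easy to check experimentally. The following short Python sketch (an illustration, not part of the original notes) simulates an n-digit decimal machine (β = 10), applies chopping and rounding, and verifies that the relative errors stay below β^{1−n} and (1/2)β^{1−n} respectively.

    import math

    def fl(x, n, mode="round"):
        # Represent x as (0.a1...an) * 10^e with a1 != 0, then chop or round.
        # Assumes x > 0 for simplicity.
        e = math.floor(math.log10(abs(x))) + 1   # exponent so the mantissa lies in [0.1, 1)
        mantissa = x / 10.0**e
        scaled = mantissa * 10**n
        kept = math.floor(scaled) if mode == "chop" else math.floor(scaled + 0.5)
        return kept * 10.0**(e - n)

    x = 8.0 / 3.0
    for n in (4, 6):
        for mode in ("chop", "round"):
            rel = abs(x - fl(x, n, mode)) / abs(x)
            bound = 10.0**(1 - n) * (1.0 if mode == "chop" else 0.5)
            print(n, mode, fl(x, n, mode), rel, rel <= bound)

For n = 4 this reproduces fl(8/3) = 2.666 (chopping) and 2.667 (rounding), and the printed relative errors respect the derived bounds.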
5. Significant Figures
All measurements are approximations. No measuring device can give perfect measurements without
experimental uncertainty. By convention, a mass measured to 13.2 g is said to have an absolute uncertainty of plus or minus 0.1 g and is said to have been measured to the nearest 0.1 g. In other words, we are somewhat uncertain about that last digit: it could be a 2, but it could also be a 1 or a 3. A mass of 13.20 g indicates an absolute uncertainty of plus or minus 0.01 g.
The number of significant figures in a result is simply the number of figures that are known with some
degree of reliability.
The number 25.4 is said to have 3 significant figures; the number 25.40 is said to have 4 significant figures.
Rules for deciding the number of significant figures in a measured quantity:
(1) All nonzero digits are significant:
1.234 has 4 significant figures, 1.2 has 2 significant figures.
(2) Zeros between nonzero digits are significant: 1002 has 4 significant figures.
(3) Leading zeros to the left of the first nonzero digits are not significant; such zeros merely indicate
the position of the decimal point: 0.001 has only 1 significant figure.
(4) Trailing zeros that are also to the right of a decimal point in a number are significant: 0.0230 has
3 significant figures.
(5) When a number ends in zeros that are not to the right of a decimal point, the zeros are not necessarily significant: 190 may be 2 or 3 significant figures, 50600 may be 3, 4, or 5 significant figures.
The potential ambiguity in the last rule can be avoided by the use of standard exponential, or scientific, notation. For example, depending on whether the number of significant figures is 3, 4, or 5, we
would write 50600 calories as:
0.506 × 10^5 (3 significant figures),
0.5060 × 10^5 (4 significant figures), or
0.50600 × 10^5 (5 significant figures).
What is an exact number? Some numbers are exact because they are known with complete certainty.
Most exact numbers are integers: exactly 12 inches are in a foot, there might be exactly 23 students in
a class. Exact numbers are often found as conversion factors or as counts of objects. Exact numbers
can be considered to have an infinite number of significant figures. Thus, the number of apparent
significant figures in any exact number can be ignored as a limiting factor in determining the number
of significant figures in the result of a calculation.


6. Rules for mathematical operations


In carrying out calculations, the general rule is that the accuracy of a calculated result is limited
by the least accurate measurement involved in the calculation. In addition and subtraction, the result
is rounded off so that it has the same number of digits as the measurement having the fewest decimal
places (counting from left to right). For example,
100 (assume 3 significant figures) + 23.643 (5 significant figures) = 123.643,
which should be rounded to 124 (3 significant figures). Note, however, that it is possible for two numbers to have no common digits (significant figures in the same digit column).
In multiplication and division, the result should be rounded off so as to have the same number of
significant figures as in the component with the least number of significant figures. For example,
3.0 (2 significant figures) × 12.60 (4 significant figures) = 37.8000,
which should be rounded to 38 (2 significant figures).
Let X = f(x1, x2, …, xn) be a function of n variables. To determine the error ΔX in X due to the errors Δx1, Δx2, …, Δxn in x1, x2, …, xn, we write

    X + ΔX = f(x1 + Δx1, x2 + Δx2, …, xn + Δxn).

Error in addition of numbers. Let X = x1 + x2 + ⋯ + xn. Therefore

    X + ΔX = (x1 + Δx1) + (x2 + Δx2) + ⋯ + (xn + Δxn)
           = (x1 + x2 + ⋯ + xn) + (Δx1 + Δx2 + ⋯ + Δxn).

Therefore the absolute error satisfies

    |ΔX| ≤ |Δx1| + |Δx2| + ⋯ + |Δxn|.

Dividing by X we get

    |ΔX/X| ≤ |Δx1/X| + |Δx2/X| + ⋯ + |Δxn/X|,

which is the maximum relative error. This shows that when the given numbers are added, the magnitude of the absolute error in the result is at most the sum of the magnitudes of the absolute errors in those numbers.
Error in subtraction of numbers. As in the case of addition, we can obtain the maximum absolute error for the subtraction of two numbers (X = x1 − x2):

    |ΔX| ≤ |Δx1| + |Δx2|.

Also

    |ΔX/X| ≤ |Δx1/X| + |Δx2/X|,

which is the maximum relative error in subtraction of numbers.
Error in product of numbers. Let X = x1 x2 … xn. Using the general formula for the error,

    ΔX ≈ Δx1 (∂X/∂x1) + Δx2 (∂X/∂x2) + ⋯ + Δxn (∂X/∂xn),

we have

    ΔX/X ≈ Δx1 · (1/X)(∂X/∂x1) + Δx2 · (1/X)(∂X/∂x2) + ⋯ + Δxn · (1/X)(∂X/∂xn).

Now

    (1/X)(∂X/∂x1) = (x2 x3 … xn)/(x1 x2 x3 … xn) = 1/x1,
    (1/X)(∂X/∂x2) = (x1 x3 … xn)/(x1 x2 x3 … xn) = 1/x2,
    …
    (1/X)(∂X/∂xn) = (x1 x2 … x_{n−1})/(x1 x2 x3 … xn) = 1/xn.

Therefore

    ΔX/X ≈ Δx1/x1 + Δx2/x2 + ⋯ + Δxn/xn,

and the maximum relative and absolute errors are given by

    Er = |Δx1/x1| + |Δx2/x2| + ⋯ + |Δxn/xn|,
    Ea = |ΔX| = |ΔX/X| · |x1 x2 … xn|.

Error in division of numbers. Let X = x1/x2. Then

    ΔX ≈ Δx1 (∂X/∂x1) + Δx2 (∂X/∂x2),

and

    ΔX/X ≈ Δx1 · (1/X)(∂X/∂x1) + Δx2 · (1/X)(∂X/∂x2) = Δx1/x1 − Δx2/x2.

Therefore the maximum relative error is

    Er = |Δx1/x1| + |Δx2/x2|,

and the maximum absolute error is

    Ea = |ΔX/X| · |X| = Er · |X|.
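These rules translate directly into code. Below is a small Python sketch (an illustration, not part of the original notes) of the maximum-error bounds above; the numbers reused here are those of Example 7 below.

    def max_abs_error_sum(dxs):
        # |dX| <= |dx1| + ... + |dxn| for addition and subtraction
        return sum(abs(d) for d in dxs)

    def max_rel_error_product(xs, dxs):
        # |dX/X| <= |dx1/x1| + ... + |dxn/xn| for products and quotients
        return sum(abs(d / x) for x, d in zip(xs, dxs))

    x1, x2, dx = 7.342, 0.241, 0.0005   # values correct to three decimals
    Er = max_rel_error_product([x1, x2], [dx, dx])
    Ea = Er * (x1 / x2)
    print(Er, Ea)   # about 0.0021 and 0.0639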

Example 2. Add the following floating-point numbers: 0.4546e3 and 0.5433e7.

Sol. This problem contains unequal exponents. To add these floating-point numbers, we align both operands to the larger exponent:

    0.5433e7 + 0.0000e7 = 0.5433e7,

because 0.4546e3, when shifted to exponent 7, becomes 0.0000e7 in a 4-digit mantissa.
Example 3. Add the following floating-point numbers: 0.6434e3 and 0.4845e3.
Sol. This problem has equal exponents, but on adding we get 1.1279e3; the mantissa has 5 digits and is greater than 1, so it is shifted right one place. Hence we get the resultant value 0.1127e4.
Example 4. Subtract the following floating-point numbers:
1. 0.5424e−99 from 0.5452e−99
2. 0.3862e−7 from 0.9682e−7
Sol. On subtracting we get 0.0028e−99. This is a floating-point number but not in normalized form. To convert it to normalized form, we shift the mantissa to the left and get 0.28e−101. This condition is called an underflow condition.
Similarly, after the second subtraction we get 0.5820e−7.
Example 5. Multiply the following floating-point numbers:
1. 0.1111e74 and 0.2000e80
2. 0.1234e−49 and 0.1111e−54
Sol. 1. On multiplying we obtain 0.1111e74 × 0.2000e80 = 0.2222e153. This shows an overflow condition of normalized floating-point numbers.
2. Similarly, the second multiplication gives 0.1370e−104, which shows the underflow condition of floating-point numbers.
Example 6. The error in the measurement of the area of a circle is not allowed to exceed 0.5%. How accurately should the radius be measured?
Sol. Let the area of the circle be A = πr². Then

    ∂A/∂r = 2πr.

Percentage error in A: (ΔA/A) × 100 = 0.5, therefore

    ΔA = (0.5/100) πr² = πr²/200.

Percentage error in r:

    (Δr/r) × 100 = (100/r) · ΔA/(∂A/∂r) = (100/r) · (πr²/200)/(2πr) = 0.25.

Example 7. Find the relative error in the calculation of 7.342/0.241, where the numbers 7.342 and 0.241 are correct to three decimal places. Determine the smallest interval in which the true result lies.
Sol. Let

    x1/x2 = 7.342/0.241 = 30.4647.

Here the errors are Δx1 = Δx2 = (1/2) × 10^{−3} = 0.0005.
Therefore the relative error is

    Er ≤ |0.0005/7.342| + |0.0005/0.241| = 0.0021,

and the absolute error is

    Ea ≈ 0.0021 × (x1/x2) = 0.0021 × 30.4647 = 0.0639.

Hence the true value of 7.342/0.241 lies between 30.4647 − 0.0639 = 30.4008 and 30.4647 + 0.0639 = 30.5286.
7. Loss of significance, stability and conditioning
Roundoff errors are inevitable and difficult to control. Other types of errors which occur in computation may be under our control. The subject of numerical analysis is largely preoccupied with
understanding and controlling errors of various kinds. Here we examine some of them.
7.1. Loss of significance. One of the most common error-producing calculations involves the cancellation of significant digits due to the subtraction of nearly equal numbers (or the addition of one very large number and one very small number). The phenomenon can be illustrated with the following example.
Example 8. If x = 0.3721478693 and y = 0.3720230572, what is the relative error in the computation of x − y using five decimal digits of accuracy?
Sol. We can compute with ten decimal digits of accuracy and take that as exact:

    x − y = 0.0001248121.

Both x and y are rounded to five digits before subtraction. Thus

    fl(x) = 0.37215,  fl(y) = 0.37202,  fl(x) − fl(y) = 0.13000 × 10^{−3}.

The relative error, therefore, is

    |(x − y) − (fl(x) − fl(y))| / |x − y| ≈ 0.04 = 4%.

Example 9. Consider the stability of √(x+1) − 1 when x is near 0. Rewrite the expression to rid it of subtractive cancellation.
Sol. Suppose that x = 1.2345678 × 10^{−5}. Then √(x+1) ≈ 1.000006173. If our computer (or calculator) can only keep 8 significant digits, this is rounded to 1.0000062. When 1 is subtracted, the result is 6.2 × 10^{−6}.
Thus 6 significant digits have been lost from the original. To fix this, we rationalize the expression:

    √(x+1) − 1 = (√(x+1) − 1) · (√(x+1) + 1)/(√(x+1) + 1) = x / (√(x+1) + 1).

This expression has no subtraction, and so is not subject to subtractive cancellation. When x = 1.2345678 × 10^{−5}, it evaluates approximately as

    1.2345678 × 10^{−5} / 2.0000062 = 6.17281995 × 10^{−6}

on a machine with 8 digits, there is no loss of precision.
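A Python sketch of this example (illustrative; the helper round_sig below, which simulates an 8-digit machine, is ours and not from the notes):

    import math

    def round_sig(v, s=8):
        # round v to s significant digits, mimicking an 8-digit machine
        return round(v, s - 1 - math.floor(math.log10(abs(v))))

    x = 1.2345678e-5
    r = round_sig(math.sqrt(x + 1.0))            # 1.0000062
    naive = round_sig(r - 1.0)                   # 6.2e-06: six digits lost
    stable = round_sig(x / round_sig(r + 1.0))   # ~6.17282e-06: full accuracy
    print(naive, stable)

In full IEEE double precision the naive form happens to survive here; the round_sig calls are what expose the 8-digit behaviour described above.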


Example 10. Find the roots of the following equation using floating-point arithmetic with a 4-digit mantissa:

    x² − 1000x + 25 = 0.

Sol. The quadratic formula gives

    x = (1000 ± √(10^6 − 10^2)) / 2.

Now in a four-digit mantissa,

    10^6 = 0.1000e7  and  10^2 = 0.1000e3.

Therefore

    10^6 − 10^2 = 0.1000e7  and  √(10^6 − 10^2) = 0.1000e4.

Hence the roots are given by

    x1 = (0.1000e4 + 0.1000e4)/2 = 0.1000e4,
    x2 = (0.1000e4 − 0.1000e4)/2 = 0.0000e4.

One of the roots becomes zero due to the limited precision allowed in the computation. In this equation b² is much larger than 4ac, hence b and √(b² − 4ac) become two nearly equal numbers, and the calculation of x2 involves the subtraction of two nearly equal numbers, which causes a serious loss of significant figures.
To obtain a more accurate 4-digit rounding approximation for x2, we change the formulation by rationalizing the numerator, or we use the fact that in a quadratic equation ax² + bx + c = 0 the product of the roots is c/a, so the smaller root may be obtained by dividing c/a by the larger root. Therefore the first root is 0.1000e4 and the second root is

    25 / 0.1000e4 = 0.2500e2 / 0.1000e4 = 0.2500e−1.
Example 11. The quadratic formula is used for computing the roots of the equation ax² + bx + c = 0, a ≠ 0, as

    x = (−b ± √(b² − 4ac)) / (2a).

Consider the equation x² + 62.10x + 1 = 0 and discuss the numerical results.
Sol. Using the quadratic formula and 8-digit rounding arithmetic, we obtain the two roots

    x1 = −0.01610723,  x2 = −62.08390.

We use these values as exact values. Now we perform the calculations with 4-digit rounding arithmetic. We have

    √(b² − 4ac) = √(62.10² − 4.000) = √(3856 − 4.000) = 62.06,

and

    fl(x1) = (−62.10 + 62.06)/2.000 = −0.02000.

The relative error in computing x1 is

    |fl(x1) − x1| / |x1| = |−0.02000 + 0.01610723| / |−0.01610723| = 0.2417.

In calculating x2,

    fl(x2) = (−62.10 − 62.06)/2.000 = −62.10.

The relative error in computing x2 is

    |fl(x2) − x2| / |x2| = |−62.10 + 62.08390| / |−62.08390| = 0.259 × 10^{−3}.

In this equation b² = 62.10² is much larger than 4ac = 4, hence b and √(b² − 4ac) become two nearly equal numbers. The calculation of x1 involves the subtraction of nearly equal numbers, but x2 involves the addition of nearly equal numbers, which does not cause a serious loss of significant figures.
To obtain a more accurate 4-digit rounding approximation for x1, we change the formulation by rationalizing the numerator, that is,

    x1 = −2c / (b + √(b² − 4ac)).

Then

    fl(x1) = −2.000/(62.10 + 62.06) = −2.000/124.2 = −0.01610.

The relative error in computing x1 is now reduced to 0.62 × 10^{−3}. However, if we rationalize the numerator in x2 to get

    x2 = −2c / (b − √(b² − 4ac)),

the use of this formula involves not only the subtraction of two nearly equal numbers but also division by a small number, which degrades the accuracy:

    fl(x2) = −2.000/(62.10 − 62.06) = −2.000/0.04000 = −50.00.

The relative error in x2 becomes 0.19.


Example 12. How should we evaluate y = x − sin x when x is small?
Sol. Since x ≈ sin x when x is small, direct evaluation causes a loss of significant figures. Alternatively, if we use the Taylor series for sin x, we obtain

    y = x − (x − x³/3! + x⁵/5! − x⁷/7! + …)
      = x³/6 − x⁵/(6·20) + x⁷/(6·20·42) − …
      = (x³/6) (1 − (x²/20)(1 − (x²/42)(1 − (x²/72)(…)))),

which can be evaluated without cancellation.
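A Python sketch (illustrative, not from the notes) comparing the two ways of evaluating y = x − sin x for small x:

    import math

    def x_minus_sin_series(x):
        # nested Taylor form derived above:
        # x^3/6 * (1 - x^2/20 * (1 - x^2/42 * (1 - x^2/72)))
        t = x * x
        return x**3 / 6.0 * (1.0 - t/20.0 * (1.0 - t/42.0 * (1.0 - t/72.0)))

    x = 1.0e-4
    print(x - math.sin(x))         # naive: cancellation costs roughly half the digits
    print(x_minus_sin_series(x))   # series: ~1.6666666666666665e-13, full accuracy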

7.2. Conditioning. The words condition and conditioning are used to indicate how sensitive the
solution of a problem may be to small changes in the input data. A problem is ill-conditioned if
small changes in the data can produce large changes in the results. For a certain types of problems, a
condition number can be defined. If that number is large, it indicates an ill-conditioned problem. In
contrast, if the number is modest, the problem is recognized as a well-conditioned problem.
The condition number can be calculated in the following manner:

    K = |relative change in output| / |relative change in input|
      = |(f(x) − f(x*)) / f(x)| / |(x − x*) / x|
      ≈ |x f′(x) / f(x)|   (as x* → x).

For example, if f(x) = 10/(1 − x²), then the condition number can be calculated as

    K = |x f′(x) / f(x)| = 2x² / |1 − x²|.

The condition number can be quite large for |x| ≈ 1. Therefore, the function is ill-conditioned near x = ±1.
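A small Python sketch (not from the notes) that estimates K(x) = |x f′(x)/f(x)| with a central difference and shows the growth near |x| = 1:

    def condition_number(f, x, h=1e-6):
        fprime = (f(x + h) - f(x - h)) / (2.0 * h)   # central-difference derivative
        return abs(x * fprime / f(x))

    f = lambda x: 10.0 / (1.0 - x**2)
    for x in (0.5, 0.9, 0.99):
        print(x, condition_number(f, x))   # grows like 2x^2/|1 - x^2| as |x| -> 1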


7.3. Stability of an algorithm. Another theme that occurs repeatedly in numerical analysis is the distinction between numerical algorithms that are stable and those that are not. Informally speaking, a numerical process is unstable if small errors made at one stage of the process are magnified and propagated in subsequent stages and seriously degrade the accuracy of the overall calculation.
An algorithm can be thought of as a sequence of problems, i.e., a sequence of function evaluations. In this case we consider the algorithm for evaluating f(x) to consist of the evaluation of the sequence x1, x2, …, xn. We are concerned with the condition of each of the functions f1(x1), f2(x2), …, f_{n−1}(x_{n−1}), where f(x) = fi(xi) for all i. An algorithm is unstable if any fi is ill-conditioned, i.e., if any fi(xi) has condition much worse than f(x). Consider the example

    f(x) = √(x+1) − √x,

so that there is potential loss of significance when x is large. Taking x = 12345 as an example, one possible algorithm is

    x0 := x = 12345
    x1 := x0 + 1
    x2 := √x1
    x3 := √x0
    f(x) := x4 := x2 − x3.

The loss of significance occurs with the final subtraction. We can rewrite the last step in the form f3(x3) = x2 − x3 to show how the final answer depends on x3. As f3′(x3) = −1, we have the condition

    K(x3) = |x3 f3′(x3) / f3(x3)| = |x3 / (x2 − x3)|,

from which we find K(x3) ≈ 2.2 × 10^4 when x = 12345. Note that this is the condition of a subproblem arrived at during the algorithm. To find an alternative algorithm we write

    f(x) = (√(x+1) − √x) · (√(x+1) + √x)/(√(x+1) + √x) = 1/(√(x+1) + √x).

This suggests the algorithm

    x0 := x = 12345
    x1 := x0 + 1
    x2 := √x1
    x3 := √x0
    x4 := x2 + x3
    f(x) := x5 := 1/x4.

In this case f3(x3) = 1/(x2 + x3), giving a condition for the subproblem of

    K(x3) = |x3 f3′(x3) / f3(x3)| = |x3 / (x2 + x3)|,

which is approximately 0.5 when x = 12345, and indeed in any case where x is much larger than 1. Thus the first algorithm is unstable and the second is stable for large values of x. In general such analyses are not usually so straightforward but, in principle, stability can be analysed by examining the condition of a sequence of subproblems.
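The two algorithms are easy to compare numerically. A Python sketch (illustrative; in IEEE double precision both agree to many digits, but the first loses roughly four to five of the available significant digits in the subtraction, which is fatal in low-precision arithmetic):

    import math

    x = 12345.0
    f1 = math.sqrt(x + 1.0) - math.sqrt(x)          # unstable: subtracts nearly equal numbers
    f2 = 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))  # stable: rationalized form
    print(f1, f2)   # both ~4.5e-03; f1 carries the cancellation error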
Exercises
(1) Determine the number of significant digits in the following numbers: 123, 0.124, 0.0045,
0.004300, 20.0045, 17001, 170.00, and 1800.
(2) Find the absolute, percentage, and relative errors if x = 0.005998 is rounded-off to three decimal
digits.
(3) Round-off the following numbers correct to four significant figures: 58.3643, 979.267, 7.7265,
56.395, 0.065738 and 7326853000.


(4) The following numbers are given in a decimal computer with a four-digit normalized mantissa:
A = 0.4523e−4, B = 0.2115e−3, and C = 0.2583e1.
Perform the following operations, and indicate the error in the result, assuming symmetric rounding:
(i) A + B + C (ii) A − B (iii) A/C (iv) AB/C.
(5) Assume a 3-digit mantissa with rounding.
(i) Evaluate y = x³ − 3x² + 4x + 0.21 for x = 2.73.
(ii) Evaluate y = [(x − 3)x + 4]x + 0.21 for x = 2.73.
Compare and discuss the errors obtained in parts (i) and (ii).
(6) Associativity does not necessarily hold for floating-point addition (or multiplication).
Let a = 0.8567 × 10^0, b = 0.1325 × 10^4, c = −0.1325 × 10^4; then a + (b + c) = 0.8567 × 10^0, while (a + b) + c = 0.1000 × 10^1. The two answers are NOT the same! Show the calculations.
(7) Calculate the sum of √3, √5, and √7 to four significant digits and find its absolute and relative errors.
(8) Rewrite e^x − cos x to be stable when x is near 0.
(9) Find the smaller root of the equation

    x² − 400x + 1 = 0

using four-digit rounding arithmetic.
(10) Discuss the condition number of the polynomial function f(x) = 2x² + x − 1.
(11) Suppose that a function ln is available to compute the natural logarithm of its argument. Consider the calculation of ln(1 + x), for small x, by the following algorithm:

    x0 := x
    x1 := 1 + x0
    f(x) := x2 := ln(x1).

By considering the condition K(x1) of the subproblem of evaluating ln(x1), show that such a function ln is inadequate for calculating ln(1 + x) accurately.

CHAPTER 2 (6 LECTURES)
ROOTS OF NON-LINEAR EQUATIONS

1. Introduction
Finding one or more root of the equation
f (x) = 0
is one of the more commonly occurring problems of applied mathematics. In most cases explicit
solutions are not available and we must be satisfied with being able to find a root to any specified
degree of accuracy. The numerical procedures for finding the roots are called iterative methods.
Definition 1.1 (Simple and multiple root). A root having multiplicity one is called a simple root. For example, f(x) = (x − 1)(x − 2) has a simple root at x = 1 and at x = 2, but g(x) = (x − 1)² has a root of multiplicity 2 at x = 1, which is therefore not a simple root.
A root with multiplicity m ≥ 2 is called a multiple or repeated root. For example, in the equation (x − 1)² = 0, x = 1 is a multiple (double) root. If a polynomial has a multiple root, its derivative also shares that root.
Let α be a root of the equation f(x) = 0, and imagine writing it in the factored form

    f(x) = (x − α)^m φ(x)

with some integer m ≥ 1 and some continuous function φ(x) for which φ(α) ≠ 0. Then we say that α is a root of f(x) of multiplicity m.
Definition 1.2 (Convergence). A sequence {xn} is said to converge to a point α with order p if there exists a constant c such that

    lim_{n→∞} |x_{n+1} − α| / |xn − α|^p = c.

The constant c is known as the asymptotic error constant.
Two cases are given special attention:
(i) If p = 1 (and c < 1), the sequence is linearly convergent.
(ii) If p = 2, the sequence is quadratically convergent.
Definition 1.3. Let {βn} be a sequence which converges to zero and {xn} be any sequence. If there exist a constant c > 0 and an integer N > 0 such that

    |xn − α| ≤ c|βn|,   n ≥ N,

then we say that {xn} converges to α with rate of convergence O(βn).


Now we study some iterative methods to solve the non-linear equations.
2. The Bisection Method
2.1. Method. Let f(x) be a continuous function on some given interval [a, b] satisfying the condition f(a) f(b) < 0. Then by the Intermediate Value Theorem the function f(x) must have at least one root in [a, b]. The bisection method repeatedly bisects the interval [a, b] and then selects a subinterval in which a root must lie for further processing. It is a very simple and robust method, but it is also relatively slow. Usually [a, b] is chosen to contain only one root α, but the following algorithm for the bisection method will always converge to some root in [a, b], since f(a) f(b) < 0.
Algorithm: To determine a root of f(x) = 0 that is accurate to within a specified tolerance value ε, given values a and b such that f(a) f(b) < 0, repeat:

    Define c = (a + b)/2.
    If f(a) f(c) < 0, then set b = c; otherwise set a = c.

    End if.
    Until |a − b| ≤ ε (tolerance value).
    Print root as c.
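A direct Python implementation of the algorithm above (a sketch; the function and variable names are ours):

    def bisection(f, a, b, eps=1e-4):
        # assumes f is continuous and f(a)*f(b) < 0
        if f(a) * f(b) >= 0:
            raise ValueError("f(a) and f(b) must have opposite signs")
        while abs(b - a) > eps:
            c = a + (b - a) / 2.0      # midpoint, written to avoid overflow
            if f(a) * f(c) < 0:
                b = c
            else:
                a = c
        return a + (b - a) / 2.0

    # Example 1 below: smallest root of x^3 - 5x + 1 = 0 in (0, 1)
    print(bisection(lambda x: x**3 - 5*x + 1, 0.0, 1.0))   # ~0.2016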
Example 1. Perform five iterations of the bisection method to obtain the smallest root of the equation x³ − 5x + 1 = 0.
Sol. We write f(x) = x³ − 5x + 1 = 0.
Since f(0) > 0 and f(1) < 0, the smallest positive root lies in the interval (0, 1) (see the note on initial approximations at the end of this example). Taking a0 = 0 and b0 = 1, we obtain

    c1 = (a0 + b0)/2 = 0.5.

Now f(c1) = −1.375, so f(a0) f(c1) < 0, which implies the root lies in the interval [0, 0.5].
Now we take a1 = 0 and b1 = 0.5; then c2 = (a1 + b1)/2 = 0.25, f(c2) = −0.2343, and f(a1) f(c2) < 0, which implies the root lies in the interval [0, 0.25].
Applying the same procedure, we obtain the remaining iterations given in the following table.

    Table 1. Iterations in the bisection method

    k   a_{k−1}   b_{k−1}   c_k       f(a_{k−1}) f(c_k)
    1   0         1         0.5       < 0
    2   0         0.5       0.25      < 0
    3   0         0.25      0.125     > 0
    4   0.125     0.25      0.1875    > 0
    5   0.1875    0.25      0.21875   < 0

The root lies in (0.1875, 0.21875), and we take the midpoint (0.1875 + 0.21875)/2 = 0.203125 as the root α.
Note (choice of initial approximations): Initial approximations to the root are often known from the physical significance of the problem. Graphical methods can be used to locate the zero of f(x) = 0, and any value in the neighborhood of the root can be taken as an initial approximation. If the given equation f(x) = 0 can be written as f1(x) = f2(x), then the point of intersection of the graphs y = f1(x) and y = f2(x) gives the root of the equation, and any value in the neighborhood of this point can be taken as an initial approximation.
2.2. Convergence analysis. Now we analyze the convergence of the iterations.
Theorem 2.1. Suppose that f ∈ C[a, b] and f(a) f(b) < 0. The bisection method generates a sequence {ck} approximating a zero α of f with linear convergence.
Proof. Let [a0, b0], [a1, b1], … denote the successive intervals produced by the bisection algorithm. Thus

    a = a0 ≤ a1 ≤ a2 ≤ ⋯ ≤ b0 = b,
    b = b0 ≥ b1 ≥ b2 ≥ ⋯ ≥ a0 = a.

This implies that {an} and {bn} are monotonic and bounded, and hence convergent. Since

    b1 − a1 = (b0 − a0)/2,
    b2 − a2 = (b1 − a1)/2 = (b0 − a0)/2²,
    …
    bn − an = (b0 − a0)/2ⁿ,

we have

    lim_{n→∞} (bn − an) = 0.

Taking limits,

    lim_{n→∞} an = lim_{n→∞} bn = α (say).

Since f is a continuous function,

    lim_{n→∞} f(an) = f(lim_{n→∞} an) = f(α).

The bisection method ensures that

    f(an) f(bn) < 0,

which implies

    lim_{n→∞} f(an) f(bn) = f²(α) ≤ 0  ⟹  f(α) = 0,

i.e., the common limit of {an} and {bn} is a zero of f in [a, b].
Let c_{n+1} = (an + bn)/2. Then, since both α and c_{n+1} lie in [an, bn],

    |α − c_{n+1}| ≤ (bn − an)/2 = |b0 − a0| / 2^{n+1}.

By the definition of convergence, we can say that the bisection method converges linearly with rate 1/2.
Note: 1. From the statement of the bisection algorithm, it is clear that the algorithm always converges; however, it can be very slow.
2. Computing ck: It might happen that at a certain iteration k, the computation of ck = (ak + bk)/2 gives an overflow. It is better to compute ck as

    ck = ak + (bk − ak)/2.

Stopping criteria: Since this is an iterative method, we must determine some stopping criterion that will allow the iteration to stop. The criterion "|f(ck)| very small" can be misleading, since it is possible to have |f(ck)| very small even if ck is not close to the root.
Let us now find the minimum number of iterations N needed with the bisection method to achieve a certain desired accuracy ε. The interval length after N iterations is (b0 − a0)/2^N. So, to obtain an accuracy of ε, we must have (b0 − a0)/2^N ≤ ε. That is,

    2^{−N} (b0 − a0) ≤ ε,

or

    N ≥ [log(b0 − a0) − log ε] / log 2.

Note that the number N depends only on the initial interval [a0, b0] bracketing the root.
Example 2. Find the minimum number of iterations needed by the bisection algorithm to approximate the root in the interval [2.5, 4] of x³ − 6x² + 11x − 6 = 0 with error tolerance 10^{−3}.
Sol. The number of iterations satisfies

    N ≥ [log(4 − 2.5) − log(10^{−3})] / log 2 ≈ 10.55.

Thus a minimum of 11 iterations will be needed to obtain the desired accuracy using the bisection method.

3. Iteration methods based on first-degree equations

3.1. The Secant Method. Let f(x) = 0 be the given non-linear equation.
Let y = ax + b be the equation of the secant line joining the two points (x0, f(x0)) and (x1, f(x1)) on the curve y = f(x).
Let the intersection point of the secant line with the x-axis be (x2, 0); then

    x2 = −b/a,   a ≠ 0.

Here x0 and x1 are two approximations of the root, from which we can determine a and b. At the two points,

    f0 = f(x0) = a x0 + b,
    f1 = f(x1) = a x1 + b.

Solving these two equations, we obtain

    a = (f1 − f0)/(x1 − x0),   b = (x1 f0 − x0 f1)/(x1 − x0).

The next approximation to the root is given by

    x2 = −b/a = (x0 f1 − x1 f0)/(f1 − f0) = x1 − f1 (x1 − x0)/(f1 − f0).

This is called the secant or chord method, and successive iterations are given by

    x_{k+1} = xk − fk (xk − x_{k−1})/(fk − f_{k−1}),   k = 1, 2, …

Geometrically, in this method we replace the unknown function by a straight line or chord passing through (x_{k−1}, f_{k−1}) and (xk, fk), and we take the point of intersection of the straight line with the x-axis as the next approximation to the root.

Figure 1. Secant method


Algorithm:
1. Give inputs and take two initial guesses x0 and x1.
2. Start iterations:

    x2 = x1 − f1 (x1 − x0)/(f1 − f0).

3. If |f(x2)| < ε (error tolerance), then stop and print the root.
4. Repeat the iterations (step 2), with (x0, x1) replaced by (x1, x2). Also check whether the number of iterations has exceeded the maximum number of iterations.
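A Python sketch of the secant iteration (illustrative; the stopping tests and names are ours):

    import math

    def secant(f, x0, x1, eps=1e-8, max_iter=50):
        f0, f1 = f(x0), f(x1)
        for _ in range(max_iter):
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # secant step
            if abs(f(x2)) < eps:
                return x2
            x0, f0 = x1, f1
            x1, f1 = x2, f(x2)
        raise RuntimeError("secant method did not converge")

    # Example 3 below: f(x) = cos x - x e^x with x0 = 0, x1 = 1
    print(secant(lambda x: math.cos(x) - x * math.exp(x), 0.0, 1.0))   # ~0.517757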
Example 3. Apply the secant method to find a root of the equation

    cos x − x eˣ = 0.

Sol. Let f(x) = cos x − x eˣ = 0. The successive iterations of the secant method are given by

    x_{k+1} = xk − fk (xk − x_{k−1})/(fk − f_{k−1}),   k = 1, 2, …

As f(0) f(1) < 0, we take initial guesses x0 = 0 and x1 = 1, and obtain

    x2 = 0.3146653378, x3 = 0.4467281466, etc.
3.2. Convergence analysis.
Theorem 3.1. Let f ∈ C²[a, b]. If α is a simple root of f(x) = 0, then the secant method generates a sequence {xn} converging to α for any initial approximations x0, x1 near α.
Proof. We assume that α is a simple root of f(x) = 0, so f(α) = 0. Let xk = α + εk.
An iterative method is said to have order of convergence p if

    |x_{k+1} − α| = C |xk − α|^p,  or equivalently  |ε_{k+1}| = C |εk|^p.

Successive iterations in the secant method are given by

    x_{k+1} = xk − fk (xk − x_{k−1})/(fk − f_{k−1}),   k = 1, 2, …

The error equation is written as

    ε_{k+1} = εk − (εk − ε_{k−1}) f(α + εk) / [f(α + εk) − f(α + ε_{k−1})].

By expanding f(α + εk) and f(α + ε_{k−1}) in Taylor series about α and using f(α) = 0, we obtain

    ε_{k+1} = εk − (εk − ε_{k−1}) [εk f′(α) + (1/2) εk² f″(α) + …] / [(εk − ε_{k−1}) f′(α) + (1/2)(εk² − ε_{k−1}²) f″(α) + …]

            = εk − [εk + (1/2) εk² f″(α)/f′(α) + …] [1 + (1/2)(ε_{k−1} + εk) f″(α)/f′(α) + …]^{−1}

            = εk − [εk + (1/2) εk² f″(α)/f′(α) + …] [1 − (1/2)(ε_{k−1} + εk) f″(α)/f′(α) + …]

            = (1/2) (f″(α)/f′(α)) εk ε_{k−1} + O(εk² ε_{k−1} + εk ε_{k−1}²).

Therefore

    ε_{k+1} ≈ A εk ε_{k−1},   where A = (1/2) f″(α)/f′(α).

This relation is called the error equation. Now, by the definition of the order of convergence, we expect a relation of the following type:

    ε_{k+1} = C εk^p.

Shifting the index down by one, εk = C ε_{k−1}^p, so ε_{k−1} = C^{−1/p} εk^{1/p}. Hence

    C εk^p = A εk C^{−1/p} εk^{1/p}  ⟹  εk^p = A C^{−(1+1/p)} εk^{1+1/p}.

Comparing the powers of εk on both sides, we get

    p = 1 + 1/p,

which gives two values of p; one is p = (1 + √5)/2 ≈ 1.618 and the other is negative (we neglect the negative value, as the order of convergence is non-negative).
Therefore the order of convergence of the secant method is approximately 1.618, which is superlinear but less than 2.
3.3. Newton Method. Let f(x) = 0 be the given non-linear equation.
Let y = ax + b be the equation of the tangent line to the curve y = f(x) at the point (x0, f(x0)), and let the intersection point of the tangent line with the x-axis be (x1, 0); then

    x1 = −b/a,   a ≠ 0.

Here x0 is an approximation of the root, from which we can determine a and b. Since the tangent passes through (x0, f(x0)),

    f(x0) = a x0 + b,

and since the slope of the tangent is the derivative at x0,

    a = f′(x0).

Hence

    b = f(x0) − f′(x0) x0,

and

    x1 = −b/a = −[f(x0) − f′(x0) x0] / f′(x0) = x0 − f(x0)/f′(x0).

This is called the Newton method, and successive iterations are given by

    x_{k+1} = xk − f(xk)/f′(xk),   k = 0, 1, …

The method can be obtained directly from the secant method by taking the limit x_{k−1} → xk. In the limiting case the chord joining the points (x_{k−1}, f_{k−1}) and (xk, fk) becomes the tangent at (xk, fk). In this case the problem of finding the root of the equation is equivalent to finding the point of intersection of the tangent to the curve y = f(x) at the point (xk, fk) with the x-axis.

Figure 2. Newton method


Algorithm: Let f : ℝ → ℝ be a differentiable function. The following algorithm computes an approximate solution x* of the equation f(x) = 0.
1. Choose an initial guess x0.
2. For k = 0, 1, 2, … do:
    If |f(xk)| is sufficiently small,
        then set x* = xk and return x*.
3.  Compute x_{k+1} = xk − f(xk)/f′(xk).
    If |x_{k+1} − xk| is sufficiently small,
        then set x* = x_{k+1} and return x*.
4. End (for main loop).
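A Python sketch of this algorithm (illustrative; tolerances and names are ours):

    def newton(f, fprime, x0, eps=1e-10, max_iter=50):
        x = x0
        for _ in range(max_iter):
            if abs(f(x)) < eps:            # step 2: residual small enough
                return x
            x_new = x - f(x) / fprime(x)   # step 3: Newton update
            if abs(x_new - x) < eps:
                return x_new
            x = x_new
        raise RuntimeError("Newton method did not converge")

    # Example 4 below: sqrt(2) as the root of x^2 - 2 = 0
    print(newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 1.0))   # 1.41421356...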
Example 4. Use the Newton Method to compute √2.
Sol. This number satisfies f(x) = 0 where f(x) = x² − 2.
Since f′(x) = 2x, it follows that in the Newton Method we can obtain the next iterate from the previous iterate xk by

    x_{k+1} = xk − (xk² − 2)/(2xk) = xk/2 + 1/xk.

Starting with x0 = 1, we obtain

    x1 = 1/2 + 1 = 1.5
    x2 = 1.5/2 + 1/1.5 = 1.41666667
    x3 = 1.41421569
    x4 = 1.41421356
    x5 = 1.41421356.

Since the fourth and fifth iterates agree to eight decimal places, we assume that 1.41421356 is a correct solution of f(x) = 0 to at least eight decimal places.
Example 5. Perform four iterations of the Newton method to obtain the approximate value of 17^{1/3}, starting with x0 = 2.0.
Sol. Let x = 17^{1/3}, which implies x³ = 17. Let f(x) = x³ − 17 = 0.
The Newton approximations are given by

    x_{k+1} = xk − (xk³ − 17)/(3xk²) = (2xk³ + 17)/(3xk²),   k = 0, 1, 2, …

Starting with x0 = 2.0, we obtain

    x1 = 2.75, x2 = 2.582645, x3 = 2.571332, x4 = 2.571282, etc.
3.3.1. The Newton Method can go bad.
• Once the Newton Method catches the scent of the root, it usually hunts it down with amazing speed. But since the method is based on local information, namely f(xk) and f′(xk), the Newton Method's sense of smell is deficient.
• If the initial estimate is not close enough to the root, the Newton Method may not converge, or may converge to the wrong root.
• The successive estimates of the Newton Method may converge to the root too slowly, or may not converge at all.
The following example shows that the choice of the initial guess is very important for convergence.
Example 6. Use the Newton Method to find a non-zero solution of x = 2 sin x.
Sol. Let f(x) = x − 2 sin x. Then f′(x) = 1 − 2 cos x, and the Newton iteration is

    x_{k+1} = xk − f(xk)/f′(xk) = xk − (xk − 2 sin xk)/(1 − 2 cos xk) = 2(sin xk − xk cos xk)/(1 − 2 cos xk).

Let x0 = 1.1. The next six estimates, to 3 decimal places, are

    x1 = 8.453, x2 = 5.256, x3 = 203.384, x4 = 118.019, x5 = 87.471, x6 = 203.637.

Therefore the iterations diverge.
Note that choosing x0 = π/3 ≈ 1.0472 leads to immediate disaster, since then 1 − 2 cos x0 = 0 and therefore x1 does not exist. The trouble was caused by the choice of x0.
Let us see whether we can do better. Draw the curves y = x and y = 2 sin x. A quick sketch shows that they meet a bit past π/2. If we take x0 = 1.5, the next five estimates are

    x1 = 2.076558, x2 = 1.910507, x3 = 1.895622, x4 = 1.895494, x5 = 1.895494.
Example 7. Find, correct to 5 decimal places, the x-coordinate of the point on the curve y = ln x which is closest to the origin. Use the Newton Method.
Sol. Let (x, ln x) be a general point on the curve, and let S(x) be the square of the distance from (x, ln x) to the origin. Then

    S(x) = x² + ln²x.

We want to minimize the distance, which is equivalent to minimizing the square of the distance. The minimization now takes the usual route. Note that S(x) is only defined when x > 0. We have

    S′(x) = 2x + 2 (ln x)/x = (2/x)(x² + ln x).

Our problem thus comes down to solving the equation S′(x) = 0. We can use the Newton Method directly on S′(x), but the calculations are more pleasant if we observe that S′(x) = 0 is equivalent to x² + ln x = 0.
Let f(x) = x² + ln x. Then f′(x) = 2x + 1/x, and we get the recurrence relation

    x_{k+1} = xk − (xk² + ln xk)/(2xk + 1/xk).

We need to find a suitable starting point x0. Experimentation with a calculator suggests that we take x0 = 0.65. Then x1 = 0.6529181 and x2 = 0.65291864.
Since x1 agrees with x2 to 5 decimal places, we can conclude that, to 5 places, the minimum distance occurs at x ≈ 0.65292.

3.4. Convergence Analysis.
Theorem 3.2. Let f ∈ C²[a, b]. If α is a simple root of f(x) = 0 and f′(α) ≠ 0, then the Newton method generates a sequence {xn} converging to α for any initial approximation x0 near α.
Proof. We assume that α is a simple root of f(x) = 0, so f(α) = 0. Let xk = α + εk.
Successive iterations in the Newton method are given by

    x_{k+1} = xk − f(α + εk)/f′(α + εk),   k = 0, 1, …

By expanding f(α + εk) and f′(α + εk) in Taylor series about α, we obtain the error equation

    ε_{k+1} = εk − [εk f′(α) + (1/2) εk² f″(α) + …] / [f′(α) + εk f″(α) + …]

            = εk − εk [1 + (1/2) εk f″(α)/f′(α) + …] [1 + εk f″(α)/f′(α) + …]^{−1}

            = εk − εk [1 + (1/2) εk f″(α)/f′(α) + …] [1 − εk f″(α)/f′(α) + …]

            = (1/2) (f″(α)/f′(α)) εk² + O(εk³)

            = C εk²,   where C = (1/2) f″(α)/f′(α).

This error analysis shows that the Newton method has second-order convergence.

Theorem 3.3. Let f(x) be twice continuously differentiable on the closed finite interval [a, b] and let the following conditions be satisfied:
(i) f(a) f(b) < 0;
(ii) f′(x) ≠ 0 for all x ∈ [a, b];
(iii) either f″(x) ≥ 0 or f″(x) ≤ 0 for all x ∈ [a, b];
(iv) at the end points a, b,

    |f(a)| / |f′(a)| < b − a,   |f(b)| / |f′(b)| < b − a.

Then the Newton method converges to the unique solution α of f(x) = 0 in [a, b] for any choice of x0 ∈ [a, b].
Some comments about these conditions: Conditions (i) and (ii) guarantee that there is one and only one solution in [a, b]. Condition (iii) states that the graph of f(x) is either concave from above or concave from below, and furthermore, together with condition (ii), it implies that f′(x) is monotone on [a, b]. Added to these, condition (iv) states that the tangent to the curve at either endpoint intersects the x-axis within the interval [a, b]. The proof of the theorem is left as an exercise for interested readers.
Example 8. Find an interval containing the smallest positive zero of f(x) = e^{−x} − sin x which satisfies the conditions of the previous theorem for convergence of the Newton method.
Sol. For f(x) = e^{−x} − sin x, we have f′(x) = −e^{−x} − cos x and f″(x) = e^{−x} + sin x.
We choose [a, b] = [0, 1]. Then since f(0) = 1 and f(1) = −0.47, we have f(a) f(b) < 0, so condition (i) is satisfied.
Since f′(x) < 0 for all x ∈ [0, 1], condition (ii) is satisfied, and since f″(x) > 0 for all x ∈ [0, 1], condition (iii) is satisfied.
Finally, since f(0) = 1 and f′(0) = −2,

    |f(0)| / |f′(0)| = 1/2 < b − a = 1,

and since f(1) = −0.47 and f′(1) = −0.90,

    |f(1)| / |f′(1)| = 0.52 < 1.

This verifies condition (iv). The Newton iteration will therefore converge for any choice of x0 in [0, 1].
Example 9. Find all the roots of cos x − x² − x = 0 to five decimal places.
Sol. f(x) = cos x − x² − x = 0 has two roots, one in the interval (−2, −1) and one in (0, 1). Applying the Newton method,

    x_{n+1} = xn − (cos xn − xn² − xn)/(−sin xn − 2xn − 1).

Taking x0 = −1.5 for the root in the interval (−2, −1), we obtain

    x1 = −1.27338985, x2 = −1.25137907, x3 = −1.25115186, x4 = −1.25114184.

Starting with x0 = 0.5, we can obtain the root in (0, 1), and the iterations are given by

    x1 = 0.55145650, x2 = 0.55001049, x3 = 0.55000935.

Hence the roots correct to five decimals are −1.25115 and 0.55001.
3.5. Newton method for multiple roots. Let α be a root of f(x) = 0 with multiplicity m. In this case we can write

    f(x) = (x − α)^m φ(x),

and

    f(α) = f′(α) = ⋯ = f^{(m−1)}(α) = 0,   f^{(m)}(α) ≠ 0.

Now

    f(xk) = f(α + εk) = (εk^m/m!) f^{(m)}(α) + (εk^{m+1}/(m+1)!) f^{(m+1)}(α) + …,

    f′(xk) = f′(α + εk) = (εk^{m−1}/(m−1)!) f^{(m)}(α) + (εk^m/m!) f^{(m+1)}(α) + …

Therefore

    ε_{k+1} = εk − f(α + εk)/f′(α + εk)

            = εk − (εk/m) [1 + (εk/(m+1)) f^{(m+1)}(α)/f^{(m)}(α) + …] [1 + (εk/m) f^{(m+1)}(α)/f^{(m)}(α) + …]^{−1}

            = εk (1 − 1/m) + O(εk²).

This implies that the method has a linear rate of convergence for multiple roots.
However, when the multiplicity of the root is known in advance, we can modify the method to increase the order of convergence. We consider

    x_{k+1} = xk − e f(xk)/f′(xk),

where e is an arbitrary constant to be determined. If α is a multiple root with multiplicity m, then the error equation is

    ε_{k+1} = εk (1 − e/m) + O(εk²).

If the method is to have a quadratic rate of convergence, then 1 − e/m = 0, which implies e = m. Therefore, for multiple roots the Newton method with quadratic convergence is

    x_{k+1} = xk − m f(xk)/f′(xk).


Example 10. Let f(x) = eˣ − x − 1. Show that f has a zero of multiplicity 2 at x = 0. Show that the Newton method with x0 = 1 converges to this zero but not quadratically.
Sol. We have f(x) = eˣ − x − 1, f′(x) = eˣ − 1 and f″(x) = eˣ.
Now f(0) = 1 − 0 − 1 = 0, f′(0) = 1 − 1 = 0 and f″(0) = 1. Therefore f has a zero of multiplicity 2 at x = 0.
Starting with x0 = 1, the iterations x_{k+1} = xk − f(xk)/f′(xk) give

    x1 = 0.58198, x2 = 0.31906, x3 = 0.16800, x4 = 0.08635, x5 = 0.04380, x6 = 0.02206,

so the error is only roughly halved at each step, not squared.
Example 11. The equation f(x) = x³ − 7x² + 16x − 12 = 0 has a double root at x = 2.0. Starting with x0 = 1, find the root correct to three decimals.
Sol. First we apply the simple Newton method, whose successive iterations are given by

    x_{k+1} = xk − (xk³ − 7xk² + 16xk − 12)/(3xk² − 14xk + 16),   k = 0, 1, 2, …

Starting with x0 = 1.0, we obtain

    x1 = 1.4, x2 = 1.652632, x3 = 1.806484, x4 = 1.89586,
    x5 = 1.945653, x6 = 1.972144, x7 = 1.985886, x8 = 1.992894,
    x9 = 1.996435, x10 = 1.998214, x11 = 1.999106, x12 = 1.999553.

The root correct to 3 decimal places is x12 = 2.000.
If we apply the modified Newton method (with m = 2), then

    x_{k+1} = xk − 2 (xk³ − 7xk² + 16xk − 12)/(3xk² − 14xk + 16),   k = 0, 1, 2, …

Starting with x0 = 1.0, we obtain

    x1 = 1.8, x2 = 1.984615, x3 = 1.999884.

The root correct to 3 decimal places is again 2.000, and in this case we need fewer iterations to reach the desired accuracy.
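A Python sketch (illustrative, not from the notes) comparing the plain and modified iterations of this example:

    f = lambda x: x**3 - 7*x**2 + 16*x - 12      # double root at x = 2
    df = lambda x: 3*x**2 - 14*x + 16

    def iterate(m, x=1.0, steps=12):
        for _ in range(steps):
            fx, dfx = f(x), df(x)
            if fx == 0.0 or dfx == 0.0:          # converged exactly (both vanish at x = 2)
                break
            x = x - m * fx / dfx
        return x

    print(iterate(1))   # plain Newton: ~1.9996 after 12 steps (linear convergence)
    print(iterate(2))   # modified, m = 2: ~2.000000 within a few steps (quadratic)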
Example 12. Apply the Newton method with x0 = 0.8 to the equation f(x) = x³ − x² − x + 1 = 0, and verify the first order of convergence. Then apply the modified Newton method with m = 2 and verify the order of convergence.
Sol. Successive iterations in the Newton method are given by

    x_{k+1} = xk − (xk³ − xk² − xk + 1)/(3xk² − 2xk − 1).

Starting with x0 = 0.8, we obtain

    x1 = 0.905882, x2 = 0.954132, x3 = 0.977338, x4 = 0.988734.

Since the exact root is α = 1, the errors in the approximations are

    ε0 = |α − x0| = 0.2 = 0.2 × 10^0
    ε1 = |α − x1| = 0.094118 = 0.94 × 10^{−1}
    ε2 = |α − x2| = 0.045868 = 0.46 × 10^{−1}
    ε3 = |α − x3| = 0.022662 = 0.22 × 10^{−1}
    ε4 = |α − x4| = 0.011266 = 0.11 × 10^{−1},

which shows the linear convergence (the error is almost halved in consecutive steps).
Iterations in the modified Newton method are given by

    x_{k+1} = xk − 2 (xk³ − xk² − xk + 1)/(3xk² − 2xk − 1).

Starting with x0 = 0.8, we obtain

    x1 = 1.011765, x2 = 1.000034, x3 = 1.000000.

Now the errors in the approximations are

    ε0 = |α − x0| = 0.2 = 0.2 × 10^0
    ε1 = |α − x1| = 0.011765 = 0.12 × 10^{−1}
    ε2 = |α − x2| = 0.000034 = 0.34 × 10^{−4},

which verifies the second-order convergence.
4. General theory for one-point iteration methods
We now consider solving an equation x = g(x) for a root α by the iteration

    x_{n+1} = g(xn),   n ≥ 0,

with x0 as an initial guess to α. For example, the Newton method fits in this pattern with

    g(x) = x − f(x)/f′(x),   x_{n+1} = g(xn).

Each solution of x = g(x) is called a fixed point of g.
For example, consider solving x² − a = 0, a > 0. We can write
1. x = x² + x − a, or more generally x = x + c(x² − a), c ≠ 0;
2. x = a/x;
3. x = (1/2)(x + a/x).
Let a = 3, x0 = 2.

    Table 2. Iterations in the three cases

    n   (1)    (2)    (3)
    0   2.0    2.0    2.0
    1   3.0    1.5    1.75
    2   9.0    2.0    1.732147
    3   87.0   1.5    1.73205

Now √3 = 1.73205…, and it is clear that the third choice works, but why are the other two not working? Which rearrangement converges will be answered by the convergence result below (which requires |g′(α)| < 1 in a neighborhood of α).
Lemma 4.1. Let g(x) be a continuous function on [a, b] and assume that a ≤ g(x) ≤ b for all x ∈ [a, b], i.e., g([a, b]) ⊆ [a, b]. Then x = g(x) has at least one solution in [a, b].
Proof. Let g be a continuous function on [a, b] and assume that a ≤ g(x) ≤ b for all x ∈ [a, b]. Consider φ(x) = g(x) − x.
If g(a) = a or g(b) = b, then the proof is trivial. Hence we assume that g(a) ≠ a and g(b) ≠ b. Since a ≤ g(x) ≤ b, this forces

    g(a) > a  and  g(b) < b.

Now

    φ(a) = g(a) − a > 0  and  φ(b) = g(b) − b < 0.

Since φ is continuous and φ(a) φ(b) < 0, by the Intermediate Value Theorem φ has at least one zero in [a, b], i.e., there exists some α ∈ [a, b] such that g(α) = α.
Graphically, the roots are the intersection points of y = x and y = g(x), as shown in the figure.


Figure 3. An example of Lemma


Theorem 4.2 (Contraction Mapping Theorem). Let g and g′ be continuous functions on [a, b], and assume that g satisfies a ≤ g(x) ≤ b for all x ∈ [a, b]. Furthermore, assume that there is a constant 0 < λ < 1 such that

    λ = max_{a≤x≤b} |g′(x)|.

Then
1. x = g(x) has a unique solution α in the interval [a, b].
2. The iterates x_{n+1} = g(xn), n ≥ 0, converge to α for any choice of x0 ∈ [a, b].
3. |α − xn| ≤ (λⁿ/(1 − λ)) |x1 − x0|, n ≥ 0.
4.

    lim_{n→∞} |α − x_{n+1}| / |α − xn| = |g′(α)|.

Thus for xn close to α, α − x_{n+1} ≈ g′(α)(α − xn).
Proof. Let g and g′ be continuous functions on [a, b] and assume that a ≤ g(x) ≤ b for all x ∈ [a, b]. By the previous lemma, there exists at least one solution to x = g(x). By the Mean Value Theorem, for any x, y ∈ [a, b] there exists c between x and y such that

    g(x) − g(y) = g′(c)(x − y),

hence

    |g(x) − g(y)| ≤ λ |x − y|,   0 < λ < 1,   x, y ∈ [a, b].

1. Suppose x = g(x) has two solutions, say α and β, in [a, b]; then α = g(α) and β = g(β). Now

    |α − β| = |g(α) − g(β)| ≤ λ |α − β|  ⟹  (1 − λ)|α − β| ≤ 0.

Since 0 < λ < 1, this forces α = β. Hence x = g(x) has a unique solution in [a, b], which we call α.
2. To check the convergence of the iterates {xn}, we observe that they all remain in [a, b], since xn ∈ [a, b] implies x_{n+1} = g(xn) ∈ [a, b]. Now

    |α − x_{n+1}| = |g(α) − g(xn)| = |g′(cn)| |α − xn|

for some cn between α and xn. Hence

    |α − x_{n+1}| ≤ λ |α − xn| ≤ λ² |α − x_{n−1}| ≤ ⋯ ≤ λ^{n+1} |α − x0|.

As n → ∞, λⁿ → 0, which implies xn → α.
3. To find the bound:

    |α − x0| = |α − x1 + x1 − x0| ≤ |α − x1| + |x1 − x0| ≤ λ|α − x0| + |x1 − x0|,

so

    (1 − λ)|α − x0| ≤ |x1 − x0|  ⟹  |α − x0| ≤ |x1 − x0|/(1 − λ).

Therefore

    |α − xn| ≤ λⁿ |α − x0| ≤ (λⁿ/(1 − λ)) |x1 − x0|.

4. From step 2,

    |α − x_{n+1}| / |α − xn| = |g′(cn)|

for some cn between α and xn. Since xn → α, we have cn → α. Hence

    lim_{n→∞} |α − x_{n+1}| / |α − xn| = |g′(α)|.

Remark 4.1. If |g′(α)| < 1, the formula

    lim_{n→∞} |α − x_{n+1}| / |α − xn| = |g′(α)|

shows that the iterates are linearly convergent. If in addition g′(α) ≠ 0, then the formula proves that convergence is exactly linear, with no higher order of convergence being possible. In this case, the value of |g′(α)| is the linear rate of convergence.
In practice, we do not use the above theorem directly. The main reason is that it is difficult to find an interval [a, b] for which the condition a ≤ g(x) ≤ b is satisfied. Therefore, we use the theorem in the following practical way.
Corollary 4.3. Let g and g′ be continuous on some interval c < x < d, with the fixed point α contained in this interval. Moreover, assume that

    |g′(α)| < 1.

Then there is an interval [a, b] around α for which the hypotheses, and hence the conclusions, of the theorem are true.
On the contrary, if |g′(α)| > 1, then the iteration method x_{n+1} = g(xn) will not converge to α. When |g′(α)| = 1, no conclusion can be drawn, and even if convergence were to occur, the method would be far too slow for the iteration method to be practical.
Remark 4.2. The possible behavior of the fixed-point iterates {xn} is shown in the figure for various values of g′(α). To see the convergence, consider the case of x1 = g(x0), the height of y = g(x) at x0. We bring the number x1 back to the x-axis by using the line y = x and the height y = x1. We continue this with each iterate, obtaining a stair-step behavior when g′(α) > 0. When g′(α) < 0, the iterates oscillate around the fixed point α, as can be seen in the figure. In the first figure (on top) the iterations converge monotonically, in the second they are oscillatory convergent, in the third the iterations diverge, and in the last figure the iterations are oscillatory divergent.


Figure 4. Convergent and non-convergent sequences xn+1 = g(xn )


Theorem 4.4. Let α be a root of x = g(x), and let g(x) be p times continuously differentiable for all x near α, for some p ≥ 2. Furthermore, assume

    g′(α) = ⋯ = g^{(p−1)}(α) = 0.        (4.1)

Then, if the initial guess x0 is sufficiently close to α, the iteration

    x_{n+1} = g(xn),   n ≥ 0,

will have order of convergence p, and

    lim_{n→∞} (α − x_{n+1}) / (α − xn)^p = (−1)^{p−1} g^{(p)}(α) / p!.

Proof. Let g(x) be p times continuously differentiable for all x near α and satisfy the conditions in equation (4.1) stated above.
Now expand g(xn) about α:

    x_{n+1} = g(xn) = g(α + (xn − α))
            = g(α) + (xn − α) g′(α) + ⋯ + ((xn − α)^{p−1}/(p−1)!) g^{(p−1)}(α) + ((xn − α)^p/p!) g^{(p)}(ξn)

for some ξn between xn and α. Using equation (4.1) and g(α) = α, we obtain

    x_{n+1} − α = ((xn − α)^p / p!) g^{(p)}(ξn),

so

    (x_{n+1} − α)/(xn − α)^p = g^{(p)}(ξn)/p!,

and hence

    (α − x_{n+1})/(α − xn)^p = (−1)^{p−1} g^{(p)}(ξn)/p! → (−1)^{p−1} g^{(p)}(α)/p!,

which proves the result.


Remark: The Newton method can be analyzed by this result. Here

    g(x) = x − f(x)/f′(x),   g′(x) = f(x) f″(x)/[f′(x)]²,

so that

    g′(α) = 0,   g″(α) = f″(α)/f′(α) ≠ 0,

as f′(α) ≠ 0 and f″(α) is either positive or negative.


Example 13. Use a fixed-point method to determine a solution to within 10^{−4} of x = tan x, for x in [4, 5].
Sol. Using g(x) = tan x and x0 = 4 gives x1 = g(x0) = tan 4 = 1.158, which is not in the interval [4, 5]. So we need a different fixed-point function.
If we note that x = tan x implies

    1/x = 1/tan x,

then

    x = x + 1/x − 1/tan x.

Starting with x0 = 4 and taking g(x) = x + 1/x − 1/tan x, we obtain

    x1 = 4.61369, x2 = 4.49596, x3 = 4.49341, x4 = 4.49341.

As x3 and x4 agree to five decimals, it is reasonable to assume that these values are sufficiently accurate.
Example 14. Consider the equation x³ − 7x + 2 = 0 in [0, 1]. Write a fixed-point iteration which will converge to the solution.
Sol. We rewrite the equation in the form x = (x³ + 2)/7 and define the fixed-point iteration

    x_{n+1} = (xn³ + 2)/7.

Now g(x) = (x³ + 2)/7 satisfies g : [0, 1] → [0, 1] and |g′(x)| = 3x²/7 ≤ 3/7 < 1 for all x ∈ [0, 1].
Hence, by the Contraction Mapping Theorem, the sequence {xn} defined above converges to the unique solution of the given equation in [0, 1]. Starting with x0 = 0.5, we can compute the solution as follows:

    x1 = 0.303571429
    x2 = 0.28971083
    x3 = 0.289188016.

Therefore the root correct to three decimals is 0.289.
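A Python sketch of the fixed-point iteration used here (illustrative; the stopping test and names are ours):

    def fixed_point(g, x0, eps=1e-9, max_iter=100):
        x = x0
        for _ in range(max_iter):
            x_new = g(x)
            if abs(x_new - x) < eps:
                return x_new
            x = x_new
        raise RuntimeError("fixed-point iteration did not converge")

    # Example 14 above: g(x) = (x^3 + 2)/7 is a contraction on [0, 1]
    print(fixed_point(lambda x: (x**3 + 2.0) / 7.0, 0.5))   # ~0.28916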
Example 15. The iterates x_{n+1} = 2 − (1 + c)xn + c xn³ will converge to α = 1 for some values of the constant c (provided that x0 is sufficiently close to α). Find the values of c for which convergence occurs. For what values of c, if any, is the convergence quadratic?
Sol. This is a fixed-point iteration x_{n+1} = g(xn) with

    g(x) = 2 − (1 + c)x + c x³,

and α = 1 is a fixed point. For convergence we need |g′(α)| < 1:

    |g′(1)| = |−(1 + c) + 3c| = |2c − 1| < 1  ⟹  0 < c < 1.

For quadratic convergence we need g′(α) = 0 and g″(α) ≠ 0. Here g′(1) = 2c − 1 = 0 gives c = 1/2, and for this value of c, g″(1) = 6c = 3 ≠ 0.
This gives c = 1/2.

Example 16. How should the constant a be chosen to ensure the fastest possible convergence of the iteration

    x_{n+1} = (a xn + xn^{−2} + 1)/(a + 1)?

Sol. Let

    lim_{n→∞} xn = lim_{n→∞} x_{n+1} = α.

Then (a + 1)α = aα + α^{−2} + 1, which gives

    α³ − α² − 1 = 0.

Therefore the above formula is an iteration for the roots of the equation f(x) = x³ − x² − 1 = 0.
Now substitute xn = α + εn and x_{n+1} = α + ε_{n+1}; we get

    (a + 1)(α + ε_{n+1}) = a(α + εn) + (1/α²)(1 + εn/α)^{−2} + 1,

which implies

    (a + 1) ε_{n+1} = (a − 2/α³) εn + O(εn²).

Therefore, for the fastest convergence, we have a = 2/α³. Here α is the root of the equation x³ − x² − 1 = 0 and can be computed by the Newton method.
Example 17. To compute the root of the equation

    e^{−x} = 3 ln x,

the formula

    x_{n+1} = xn − [3 ln xn − e^{−xn}]/p

is used. Show that p = 3 gives rapid convergence.
Sol. Substitute xn = α + εn and x_{n+1} = α + ε_{n+1} in the given formula:

    ε_{n+1} = εn − [3 ln(α + εn) − e^{−(α+εn)}]/p
            = εn − (1/p)[3 ln α + 3 ln(1 + εn/α) − e^{−α} e^{−εn}]
            = εn − (1/p)[3 ln α + 3(εn/α − εn²/(2α²) + O(εn³)) − e^{−α}(1 − εn + εn²/2 − …)].

Since α is the exact root, e^{−α} − 3 ln α = 0, so the error equation becomes

    ε_{n+1} = [1 − (1/p)(3/α + e^{−α})] εn + O(εn²).

The method has rapid convergence if

    p = 3/α + e^{−α},

where α is the root of e^{−x} − 3 ln x = 0. The root lies in (1, 2), and applying the Newton method with x0 = 1.5 we get

    x1 = 1.053213, x2 = 1.113665, x3 = 1.115447, x4 = 1.115448.

Taking α = 1.11545, we obtain p ≈ 2.9835. Hence p = 3.

Exercises
(1) Given the following equations: (i) x⁴ − x − 10 = 0, (ii) x − e^{−x} = 0.
Determine initial approximations for finding the smallest positive root. Use these to find the root correct to three decimals with the secant and Newton methods.
(2) Find all solutions of e^{2x} = x + 6, correct to 4 decimal places, using the Newton Method.
(3) Use the bisection method to find the indicated roots of the following equations. Use an error tolerance of ε = 0.0001.
(a) The real root of x³ − x² − x − 1 = 0.
(b) The smallest positive root of cos x = 1/2 + sin x.
(c) The real roots of x³ − 2x − 1 = 0.
(4) Suppose that

    f(x) = e^{−1/x²}, x ≠ 0;   f(0) = 0.

The function f is continuous everywhere, in fact differentiable arbitrarily often everywhere, and 0 is the only solution of f(x) = 0. Show that if x0 = 0.0001, it takes more than one hundred million iterations of the Newton Method to get below 0.00005.
(5) A calculator is defective: it can only add, subtract, and multiply. Use the equation 1/x = 1.37,
the Newton Method, and the defective calculator to find 1/1.37 correct to 8 decimal places.
(6) Use the Newton Method to find the smallest and the second smallest positive roots of the
equation tan x = 4x, correct to 4 decimal places.
(7) What is the order of convergence of the iteration
x_{n+1} = x_n(x_n^2 + 3a)/(3x_n^2 + a)
as it converges to the fixed point α = √a ?

(8) What are the solutions α, if any, of the equation x = √(1 + x) ? Does the iteration
x_{n+1} = √(1 + x_n) converge to any of these solutions (assuming x0 is chosen sufficiently close to α) ?
(9) (a) Apply Newton's method to the function
f(x) = { √x,  x ≥ 0;   −√(−x),  x < 0 }
with the root α = 0. What is the behavior of the iterates? Do they converge, and if so, at what
rate?
(b) Do the same as in (a), but with
f(x) = { x^{2/3},  x ≥ 0;   −(−x)^{2/3},  x < 0 }.
(10) Find all positive roots of the equation
10 x e^{−x^2} = 1
correct to six decimals with the Newton method.
Hint: f(x) = 10 x e^{−x^2} − 1 = 0. This equation has two positive roots, in the intervals (0, 1) and
(1, 2).
(11) Show that
x_{n+1} = x_n(x_n^2 + 3a)/(3x_n^2 + a),  n ≥ 0
is a third-order method for computing √a. Calculate
lim_{n→∞} (√a − x_{n+1})/(√a − x_n)^3,
assuming x0 has been chosen sufficiently close to the root.


(12) Show that the following two sequences have convergence of the second order with the same
limit √a:
(i) x_{n+1} = (x_n/2)(1 + a/x_n^2),    (ii) x_{n+1} = (x_n/2)(3 − x_n^2/a).
If x_n is a suitably close approximation to √a, show that the error in the first formula for
x_{n+1} is about one-third of that in the second formula, and deduce that the formula
x_{n+1} = (x_n/8)(6 + 3a/x_n^2 − x_n^2/a)
gives a sequence with third-order convergence.
(13) Suppose α is a zero of multiplicity m of f, where f^(m) is continuous on an open interval
containing α. Show that the fixed-point method x = g(x) with the following g has second-order
convergence:
g(x) = x − m f(x)/f'(x).
Bibliography
[Gerald]   Curtis F. Gerald and Patrick O. Wheatley. Applied Numerical Analysis, 7th edition,
           Pearson, 2003.
[Atkinson] K. Atkinson and W. Han. Elementary Numerical Analysis, 3rd edition, John Wiley
           and Sons, 2004.
[Jain]     M. K. Jain, S. R. K. Iyengar, and R. K. Jain. Numerical Methods for Scientific and
           Engineering Computation, 6th edition, New Age International Publishers, New Delhi,
           2012.

CHAPTER 3 (4 LECTURES)
NUMERICAL SOLUTION OF SYSTEM OF LINEAR EQUATIONS

1. Introduction
Systems of simultaneous linear equations arise in many problems in engineering and
science, as well as in applications to the social sciences and the quantitative study of business and
economic problems. They occur in a wide variety of disciplines, directly in real-world models
as well as in the solution process for other problems.
The principal objective of this chapter is to discuss the numerical aspects of solving linear systems of
equations having the form

a11 x1 + a12 x2 + ⋯ + a1n xn = b1
a21 x1 + a22 x2 + ⋯ + a2n xn = b2                    (1.1)
⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯
an1 x1 + an2 x2 + ⋯ + ann xn = bn.
This is a linear system of n equations in n unknowns x1, x2, …, xn. The system can simply be written
in the matrix equation form
Ax = b,

where
A = [a11 a12 ⋯ a1n; a21 a22 ⋯ a2n; ⋯ ; an1 an2 ⋯ ann],  x = [x1, x2, …, xn]^T,  b = [b1, b2, …, bn]^T.   (1.2)

This equation has a unique solution x = A^{−1} b when the coefficient matrix A is non-singular. Unless
otherwise stated, we shall assume that this is the case. If A^{−1} is already available,
then x = A^{−1} b provides a good way of computing the solution x.
If A^{−1} is not available, then in general A^{−1} should not be computed solely for the purpose of obtaining
x. More efficient numerical procedures are developed in this chapter. We study broadly two
categories, direct and iterative methods, and we start with a direct method for solving the linear system.
2. Gaussian Elimination
Direct methods, which are techniques that give a solution in a fixed number of steps, subject only to
round-off errors, are considered first. Gaussian elimination is the principal tool in the direct
solution of system (1.2). The method is named after Carl Friedrich Gauss (1777-1855). To solve a
system of linear equations we use a method familiar from introductory algebra.
For example, consider
x1 + 2x2 + x3 = 0
2x1 + 2x2 + 3x3 = 3
−x1 − 3x2 = 2.
Eliminating x1 from the second and third equations, we obtain
x1 + 2x2 + x3 = 0
−2x2 + x3 = 3
−x2 + x3 = 2.
Now eliminate x2 from the last equation with the help of the second equation:
x1 + 2x2 + x3 = 0
−2x2 + x3 = 3
(1/2) x3 = 1/2.
The last equation gives x3 = 1.
Therefore −2x2 + 1 = 3 ⟹ x2 = −1, and
x1 + 2(−1) + 1 = 0 ⟹ x1 = 1.

Formal structure of Gaussian elimination is the following. We consider the 3 × 3 system
a11 x1 + a12 x2 + a13 x3 = b1   (R1)
a21 x1 + a22 x2 + a23 x3 = b2   (R2)
a31 x1 + a32 x2 + a33 x3 = b3   (R3).
Let a11 ≠ 0 and eliminate x1 from R2 and R3.
Define multipliers m21 = a21/a11 and m31 = a31/a11.
We replace R2 by R2 − m21 R1 and R3 by R3 − m31 R1.
We obtain the 2 × 2 system
a22^(2) x2 + a23^(2) x3 = b2^(2)   (R2^(2))
a32^(2) x2 + a33^(2) x3 = b3^(2)   (R3^(2)).
Here
aij^(2) = aij − mi1 a1j,  i, j = 2, 3,
bi^(2) = bi − mi1 b1,  i = 2, 3.
Let a22^(2) ≠ 0 and eliminate x2 from R3^(2).
Define the multiplier m32 = a32^(2)/a22^(2) and subtract m32 times R2^(2) from R3^(2). This yields
a33^(3) x3 = b3^(3)   (R3^(3)).
Here
a33^(3) = a33^(2) − m32 a23^(2),   b3^(3) = b3^(2) − m32 b2^(2).
Using back substitution, we can find the values
x3 = b3^(3) / a33^(3),
x2 = (b2^(2) − a23^(2) x3) / a22^(2),
x1 = (b1 − a12 x2 − a13 x3) / a11.
Note that every step of the elimination procedure can be obtained through elementary row operations
on the augmented matrix (A | b).

Algorithm [Gauss Elimination]
(1) Start
(2) Declare the variables and read the order of the matrix n.
(3) Input the coefficients of the linear equations together with the right-hand side:
Do for i = 1 to n
  Do for j = 1 to n + 1
    Read a[i][j]
  End for j
End for i
(4) Do for k = 1 to n − 1
  Do for i = k + 1 to n
    Do for j = k + 1 to n + 1
      a[i][j] = a[i][j] − (a[i][k]/a[k][k]) · a[k][j]
    End for j
  End for i
End for k
(5) Compute x[n] = a[n][n + 1]/a[n][n]
(6) Do for i = n − 1 down to 1
  sum = 0
  Do for j = i + 1 to n
    sum = sum + a[i][j] · x[j]
  End for j
  x[i] = (a[i][n + 1] − sum)/a[i][i]
End for i
(7) Display the result x[i]
(8) Stop
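The algorithm above translates almost line for line into Python. The following is a minimal sketch (naive elimination, no pivoting; function and variable names are our own), shown on the introductory example:

def gauss_eliminate(a):
    """Solve Ax = b given the n x (n+1) augmented matrix a (modified in place)."""
    n = len(a)
    for k in range(n - 1):                 # elimination stage
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]          # multiplier m_ik (pivot assumed nonzero)
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x

# The introductory example: solution is x1 = 1, x2 = -1, x3 = 1.
aug = [[1, 2, 1, 0],
       [2, 2, 3, 3],
       [-1, -3, 0, 2]]
print(gauss_eliminate(aug))    # [1.0, -1.0, 1.0]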
Partial Pivoting: In the elimination process it is assumed that the pivot element aii ≠ 0, i = 1, 2, …, n.
If at any stage of elimination one of the pivots becomes small (or zero), then we bring another element
into the pivot position by interchanging rows.
Remark 2.1. Unique Solution, No Solution, or Infinitely Many Solutions.
Here are some tips that allow us to determine what type of solution set we have, based on the
reduced echelon form.
1. If we have a leading one in every column, then we have a unique solution.
2. If we have a row of zeros equated to a non-zero number on the right side, then the system has no solution.
3. If we do not have a leading one in every column — for instance in a homogeneous system (one where
all the right sides are zero) with a row of zeros — then the system has infinitely many solutions.
Example 1. Solve the following system of equations, whose exact solution is x1 = 2.6, x2 = −3.8, x3 = −5.0:
6x1 + 2x2 + 2x3 = −2
2x1 + (2/3)x2 + (1/3)x3 = 1
x1 + 2x2 − x3 = 0.
Sol. Let us use a floating-point representation with 4 digits, with all operations rounded.
The augmented matrix is
[6.000  2.000   2.000  | −2.000]
[2.000  0.6667  0.3333 |  1.000]
[1.000  2.000  −1.000  |  0.0]
Multipliers are m21 = 2/6 = 0.3333 and m31 = 1/6 = 0.1667, with
a21^(2) = a21 − m21 a11, a22^(2) = a22 − m21 a12, etc. This gives
[6.000  2.000       2.000  | −2.000]
[0.0    0.0001000  −0.3333 |  1.667]
[0.0    1.667      −1.333  |  0.3334]
The next multiplier is m32 = 1.667/0.0001 = 16670:
[6.000  2.000       2.000  | −2.000]
[0.0    0.0001000  −0.3333 |  1.667]
[0.0    0.0         5555   | −27790]
Using back substitution, we obtain
x3 = −5.003, x2 = 0.0, x1 = 1.335.
We observe that the computed solution is not compatible with the exact solution.
The difficulty is in a22^(2). This coefficient is very small (almost zero), so the value
in this position has essentially infinite relative error, and this error is carried through every computation
involving the coefficient. To avoid this, we interchange the second and third rows and then continue the
elimination.
In this case (after interchanging) the multiplier is m32 = 0.0001/1.667 = 0.00005999:
[6.000  2.000   2.000  | −2.000]
[0.0    1.667  −1.333  |  0.3334]
[0.0    0.0    −0.3332 |  1.667]
Using back substitution, we obtain
x3 = −5.003, x2 = −3.801, x1 = 2.602.
We see that after partial pivoting, we get the desired solution.
Complete Pivoting: In the first stage of elimination we search for the largest element in magnitude
in the entire matrix and bring it to the position of the first pivot. We repeat the same process at every
step of the elimination. This process requires interchanging both rows and columns.
Scaled Partial Pivoting: In this approach the algorithm selects as the pivot the entry that
is largest relative to the other entries in its row.
At the beginning, a scale factor must be computed for each equation in the system. We define
s_i = max_{1≤j≤n} |a_ij|,  1 ≤ i ≤ n.
These numbers are recorded in the scale vector s = [s1, s2, …, sn]. Note that the scale vector does not
change throughout the procedure. In starting the forward elimination process, we do not arbitrarily
use the first equation as the pivot equation. Instead, we use the equation for which the ratio |a_{i,1}|/s_i is
greatest. We repeat the process at later stages, always with the same scale factors.
Example 2. Solve the system
3x1 − 13x2 + 9x3 + 3x4 = −19
−6x1 + 4x2 + x3 − 18x4 = −34
6x1 − 2x2 + 2x3 + 4x4 = 16
12x1 − 8x2 + 6x3 + 10x4 = 26
by hand using scaled partial pivoting. Justify all row interchanges and write out the transformed matrix
after you finish working on each column.
Sol. The augmented matrix is
[ 3  −13   9    3  | −19]
[−6    4   1  −18  | −34]
[ 6   −2   2    4  |  16]
[12   −8   6   10  |  26]
and the scale factors are s1 = 13, s2 = 18, s3 = 6, and s4 = 12. We need to pick the largest of the ratios
(3/13, 6/18, 6/6, 12/12); this is attained in the third row, so we interchange row 1 and row 3 and interchange s1 and s3 to get
[ 6   −2   2    4  |  16]
[−6    4   1  −18  | −34]
[ 3  −13   9    3  | −19]
[12   −8   6   10  |  26]
with s1 = 6, s2 = 18, s3 = 13, s4 = 12. Performing R2 − (−6/6)R1 → R2, R3 − (3/6)R1 → R3, and
R4 − (12/6)R1 → R4, we obtain
[6   −2    2     4  |  16]
[0    2    3   −14  | −18]
[0  −12    8     1  | −27]
[0   −4    2     2  |  −6]
Comparing (|a22|/s2 = 2/18, |a32|/s3 = 12/13, |a42|/s4 = 4/12), the largest is 12/13, so we
interchange row 2 and row 3 and interchange s2 and s3 to get
[6   −2    2     4  |  16]
[0  −12    8     1  | −27]
[0    2    3   −14  | −18]
[0   −4    2     2  |  −6]
with s1 = 6, s2 = 13, s3 = 18, s4 = 12. Performing R3 − (2/(−12))R2 → R3 and R4 − ((−4)/(−12))R2 → R4,
we get
[6   −2    2       4     |  16]
[0  −12    8       1     | −27]
[0    0   13/3   −83/6   | −45/2]
[0    0  −2/3     5/3    |   3]
Comparing (|a33|/s3 = (13/3)/18, |a43|/s4 = (2/3)/12), the largest is the first entry, so we do not
interchange rows. Performing R4 − ((−2/3)/(13/3))R3 → R4, we get the final reduced matrix
[6   −2    2       4     |  16]
[0  −12    8       1     | −27]
[0    0   13/3   −83/6   | −45/2]
[0    0    0     −6/13   | −6/13]
Backward substitution gives x1 = 3, x2 = 1, x3 = −2, x4 = 1.
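The bookkeeping of scale factors and row swaps is easy to automate. The following is a minimal Python sketch of the procedure described above (our own implementation; names are illustrative), checked on the data of Example 2:

def scaled_pivot_solve(A, b):
    n = len(A)
    a = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix
    s = [max(abs(v) for v in row) for row in A]       # scale factors
    for k in range(n - 1):
        # pick the row whose leading entry is largest relative to its scale
        p = max(range(k, n), key=lambda i: abs(a[i][k]) / s[i])
        a[k], a[p] = a[p], a[k]
        s[k], s[p] = s[p], s[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        t = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - t) / a[i][i]
    return x

A = [[3, -13, 9, 3], [-6, 4, 1, -18], [6, -2, 2, 4], [12, -8, 6, 10]]
b = [-19, -34, 16, 26]
print(scaled_pivot_solve(A, b))   # [3.0, 1.0, -2.0, 1.0]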
Example 3. Solve this system of linear equations:
0.0001x + y = 1
x+y =2
using no pivoting, partial pivoting, and scaled partial pivoting. Carry at most five significant digits
of precision (rounding) to see how finite precision computations and roundoff errors can affect the
calculations.
Sol. By direct substitution, it is easy to verify that the true solution is x = 1.0001 and y = 0.99990 to
five significant digits.
For no pivoting, the first equation in the original system is the pivot equation, and the multiplier is
1/0.0001 = 10000. The new system of equations is
0.0001x + y = 1
9999y = 9998.
We obtain y = 9998/9999 ≈ 0.99990 and x = 1. Notice that we have lost the last significant digit in
the correct value of x.
We repeat the solution process using partial pivoting for the original system. We see that the second
entry is larger, so the second equation is used as the pivot equation. We can interchange the two
equations, obtaining
x+y =2
0.0001x + y = 1
which gives y = 0.99980/0.99990 ≈ 0.99990 and x = 2 − y = 2 − 0.99990 = 1.0001.
Both computed values of x and y are correct to five significant digits.
We repeat the solution process using scaled partial pivoting for the original system. Since the scaling
constants are s = (1, 1) and the ratios for determining the pivot equation are (0.0001/1, 1/1), the
second equation is now the pivot equation. We do not actually interchange the equations and use
the second equation as the first pivot equation. The rest of the calculations are as above for partial
pivoting. The computed values of x and y are correct to five significant digits.
Operations count for Gauss elimination. We consider the number of floating point operations
(flops) required to solve the system Ax = b. Gaussian elimination first uses row operations to
transform the problem into an equivalent problem of the form Ux = g, where U is upper triangular.
Then back substitution is used to solve for x. First we look at how many floating point operations are
required to reduce A to upper triangular form.
First a multiplier is computed for each row. Then in each row the algorithm performs n multiplies and
n adds. This gives a total of (n − 1) + (n − 1)n multiplies (counting the computation of the multiplier
in each of the (n − 1) rows) and (n − 1)n adds.
In total this is 2n^2 − n − 1 floating point operations to do a single pivot on the n × n system.
Then this has to be done recursively on the lower right subsystem, which is an (n − 1) × (n − 1) system.
This requires 2(n − 1)^2 − (n − 1) − 1 operations. Then this has to be done on the next subsystem,
requiring 2(n − 2)^2 − (n − 2) − 1 operations, and so on.
In total, then, we use I_n floating point operations, with
I_n = Σ_{k=1}^{n} (2k^2 − k − 1) = n(n + 1)(4n − 1)/6 − n ≈ (2/3) n^3.
Counts for back substitution: To find xn we require just one division. Then to solve for x_{n−1} we
require 3 flops. Similarly, solving for x_{n−2} requires 5 flops. Thus in total back substitution requires
B_n floating point operations, with
B_n = Σ_{k=1}^{n} (2k − 1) = n(n + 1) − n = n^2.

The LU Factorization: When we use matrix multiplication, another meaning can be given to
Gauss elimination: the matrix A can be factored into the product of two triangular matrices.
Let Ax = b be the system to be solved, where A is the n × n coefficient matrix. The linear system can
be reduced to the upper triangular system Ux = g with
U = [u11 u12 ⋯ u1n; 0 u22 ⋯ u2n; ⋯ ; 0 0 ⋯ unn],
where the uij are the entries produced by the elimination. Introduce an auxiliary lower triangular
matrix L based on the multipliers mij, as follows:
L = [1 0 ⋯ 0; m21 1 0 ⋯ 0; m31 m32 1 ⋯ 0; ⋯ ; mn1 mn2 ⋯ m_{n,n−1} 1].
Theorem 2.1. Let A be a non-singular matrix and let L and U be defined as above. If U is produced
without pivoting, then
LU = A.
This is called the LU factorization of A.


Example 4. Solve the system
4x1 + 3x2 + 2x3 + x4 = 1
3x1 + 4x2 + 3x3 + 2x4 = 1
2x1 + 3x2 + 4x3 + 3x4 = 1
x1 + 2x2 + 3x3 + 4x4 = 1.
Also write the LU decomposition and verify that LU = A.
Sol. We write the augmented matrix and solve the system:
[4 3 2 1 | 1]
[3 4 3 2 | 1]
[2 3 4 3 | 1]
[1 2 3 4 | 1]
Multipliers are m21 = 3/4, m31 = 1/2, and m41 = 1/4.
Replace R2 with R2 − m21 R1, R3 with R3 − m31 R1 and R4 with R4 − m41 R1:
[4  3    2    1    | 1]
[0  7/4  3/2  5/4  | 1/4]
[0  3/2  3    5/2  | 1/2]
[0  5/4  5/2  15/4 | 3/4]
Multipliers are m32 = 6/7 and m42 = 5/7.
Replace R3 with R3 − m32 R2 and R4 with R4 − m42 R2, to obtain
[4  3    2     1     | 1]
[0  7/4  3/2   5/4   | 1/4]
[0  0    12/7  10/7  | 2/7]
[0  0    10/7  20/7  | 4/7]
The multiplier is m43 = 5/6, and we replace R4 with R4 − m43 R3:
[4  3    2     1     | 1]
[0  7/4  3/2   5/4   | 1/4]
[0  0    12/7  10/7  | 2/7]
[0  0    0     5/3   | 1/3]
Using back substitution successively for x4, x3, x2, x1, we obtain
x4 = 1/5, x3 = 0, x2 = 0, x1 = 1/5.
Here
U = [4 3 2 1; 0 7/4 3/2 5/4; 0 0 12/7 10/7; 0 0 0 5/3],
L = [1 0 0 0; 3/4 1 0 0; 1/2 6/7 1 0; 1/4 5/7 5/6 1].
It can be verified that LU = A.
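The factorization is easy to check numerically. The following short sketch (assuming the NumPy library is available) verifies LU = A for Example 4 and solves the system via the two triangular factors:

import numpy as np

A = np.array([[4., 3., 2., 1.],
              [3., 4., 3., 2.],
              [2., 3., 4., 3.],
              [1., 2., 3., 4.]])
L = np.array([[1.,  0.,  0.,  0.],
              [3/4, 1.,  0.,  0.],
              [1/2, 6/7, 1.,  0.],
              [1/4, 5/7, 5/6, 1.]])
U = np.array([[4., 3.,  2.,   1.],
              [0., 7/4, 3/2,  5/4],
              [0., 0.,  12/7, 10/7],
              [0., 0.,  0.,   5/3]])

print(np.allclose(L @ U, A))     # True: LU = A
b = np.ones(4)
y = np.linalg.solve(L, b)        # forward substitution, Ly = b
x = np.linalg.solve(U, y)        # back substitution, Ux = y
print(x)                         # [0.2, 0., 0., 0.2]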

SOLUTION OF SYSTEM OF LINEAR EQUATIONS

3. Iterative Method
The linear system Ax = b may have a large order. For such systems Gauss elimination is often too
expensive in either computation time or computer memory requirements, or both.
In an iterative method, a sequence of progressively better iterates is produced to approximate the solution.
Jacobi and Gauss-Seidel Method: We start with an example. Consider the system of equations
9x1 + x2 + x3 = 10
2x1 + 10x2 + 3x3 = 19
3x1 + 4x2 + 11x3 = 0.
One class of iterative methods for solving this system proceeds as follows.
We write
x1 = (10 − x2 − x3)/9
x2 = (19 − 2x1 − 3x3)/10
x3 = (0 − 3x1 − 4x2)/11.
Let x^(0) = [x1^(0), x2^(0), x3^(0)] be an initial approximation of the solution x. Then define the sequence of iterates
x1^(k+1) = (10 − x2^(k) − x3^(k))/9
x2^(k+1) = (19 − 2x1^(k) − 3x3^(k))/10
x3^(k+1) = (0 − 3x1^(k) − 4x2^(k))/11,  k = 0, 1, 2, … .
This is called the Jacobi method, or the method of simultaneous replacements. The method is named
after the German mathematician Carl Gustav Jacob Jacobi.
We start with [0, 0, 0] and obtain
x1^(1) = 1.1111, x2^(1) = 1.900, x3^(1) = 0.0,
x1^(2) = 0.9000, x2^(2) = 1.6778, x3^(2) = −0.9939,
etc.
Another approach to solving the same system is the following:
x1^(k+1) = (10 − x2^(k) − x3^(k))/9
x2^(k+1) = (19 − 2x1^(k+1) − 3x3^(k))/10
x3^(k+1) = (0 − 3x1^(k+1) − 4x2^(k+1))/11,  k = 0, 1, 2, … .
This method is called Gauss-Seidel, or the method of successive replacements. It is named after the German
mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel. Starting with [0, 0, 0], we obtain
x1^(1) = 1.1111, x2^(1) = 1.6778, x3^(1) = −0.9131,
x1^(2) = 1.0262, x2^(2) = 1.9687, x3^(2) = −0.9958.
General approach: We rewrite the system Ax = b as
a11 x1^(k+1) = −(a12 x2^(k) + ⋯ + a1n xn^(k)) + b1
a21 x1^(k+1) + a22 x2^(k+1) = −(a23 x3^(k) + ⋯ + a2n xn^(k)) + b2
⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯
an1 x1^(k+1) + an2 x2^(k+1) + ⋯ + ann xn^(k+1) = bn
or (D + L)x^(k+1) = −U x^(k) + b,
where D, L and U are the diagonal, strictly lower triangular and strictly upper triangular parts of A, respectively. Hence
x^(k+1) = −(D + L)^{−1} U x^(k) + (D + L)^{−1} b,
x^(k+1) = T x^(k) + B,  k = 0, 1, 2, …
Here T = −(D + L)^{−1} U, and this matrix is called the iteration matrix.
Algorithm [Gauss-Seidel]
(1) Input matrix A = [aij], b, XO = x^(0), tolerance TOL, maximum number of iterations N
(2) Set k = 1
(3) While (k ≤ N) do steps 4-7
(4) For i = 1, 2, …, n
      xi = ( −Σ_{j=1}^{i−1} (aij · xj) − Σ_{j=i+1}^{n} (aij · XOj) + bi ) / aii
(5) If ||x − XO|| < TOL, then OUTPUT (x1, x2, …, xn)
      STOP
(6) k = k + 1
(7) For i = 1, 2, …, n
      Set XOi = xi
(8) OUTPUT (x1, x2, …, xn)
STOP.
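The following is a minimal Python sketch of the algorithm above (the tolerance and iteration cap are our own illustrative choices), applied to the example system of this section:

def gauss_seidel(A, b, x0, tol=1e-8, max_iter=100):
    n = len(A)
    x = list(x0)
    for k in range(max_iter):
        x_old = list(x)
        for i in range(n):
            s1 = sum(A[i][j] * x[j] for j in range(i))             # updated values
            s2 = sum(A[i][j] * x_old[j] for j in range(i + 1, n))  # previous values
            x[i] = (b[i] - s1 - s2) / A[i][i]
        if max(abs(x[i] - x_old[i]) for i in range(n)) < tol:
            return x, k + 1
    raise RuntimeError("Gauss-Seidel did not converge")

# 9x1 + x2 + x3 = 10, 2x1 + 10x2 + 3x3 = 19, 3x1 + 4x2 + 11x3 = 0.
A = [[9, 1, 1], [2, 10, 3], [3, 4, 11]]
b = [10, 19, 0]
x, iters = gauss_seidel(A, b, [0, 0, 0])
print(x, iters)    # converges to the exact solution [1.0, 2.0, -1.0]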
3.1. Convergence analysis of iterative methods.
Definition 3.1 (Eigenvalue and eigenvector). Let A be a square matrix. A number λ is called an
eigenvalue of A if there exists a nonzero vector x such that Ax = λx. Here x is called the corresponding
eigenvector.
Definition 3.2 (Characteristic polynomial). The characteristic polynomial is defined as
P(λ) = det(λI − A).
λ is an eigenvalue of matrix A if and only if λ is a root of the characteristic polynomial, i.e., P(λ) = 0.
Definition 3.3 (Spectrum and spectral radius). The set of all eigenvalues of matrix A is called the
spectrum of A and is denoted by σ(A).
The spectral radius of A is defined as
ρ(A) = max{ |λ| : λ ∈ σ(A) }.
Theorem 3.4 (Necessary and sufficient condition). A necessary and sufficient condition for the
convergence of an iterative method is that the eigenvalues of the iteration matrix T satisfy
ρ(T) < 1.
Proof. Let ρ(T) < 1. The sequence of vectors x^(k) generated by the iterative method (e.g., Gauss-Seidel) is given by
x^(1) = T x^(0) + B,
x^(2) = T x^(1) + B = T(T x^(0) + B) + B = T^2 x^(0) + (T + I)B,
⋯⋯⋯⋯⋯⋯⋯⋯
x^(k) = T^k x^(0) + (T^{k−1} + T^{k−2} + ⋯ + T + I)B.
Since ρ(T) < 1, this implies
lim_{k→∞} T^k x^(0) = 0.
Therefore
lim_{k→∞} x^(k) = (I − T)^{−1} B,
and x^(k) converges to the unique solution of x = Tx + B.
Conversely, assume that the sequence x^(k) converges to x. Now
x − x^(k) = Tx + B − Tx^(k−1) − B = T(x − x^(k−1)) = T^2(x − x^(k−2)) = ⋯ = T^k(x − x^(0)).
Let z = x − x^(0); then
lim_{k→∞} T^k z = lim_{k→∞} (x − x^(k)) = x − lim_{k→∞} x^(k) = x − x = 0
⟹ ρ(T) < 1.

Theorem 3.5. If A is strictly diagonally dominant in Ax = b, then the Gauss-Seidel iteration always
converges for any initial starting vector.
Proof. We assume that A is strictly diagonally dominant; hence aii ≠ 0 and
|aii| > Σ_{j=1, j≠i}^{n} |aij|,  i = 1, 2, …, n.
Gauss-Seidel iterations are given by
x^(k) = −(D + L)^{−1} U x^(k−1) + (D + L)^{−1} b,
x^(k) = T x^(k−1) + B.
The method is convergent iff ρ(T) < 1, where T = −(D + L)^{−1} U.
Let λ be an eigenvalue of the iteration matrix T and x a corresponding eigenvector. Since x ≠ 0,
we may normalize so that ||x||_∞ = 1. Then
T x = λx ⟹ −(D + L)^{−1} U x = λx ⟹ −U x = λ(D + L)x.
Componentwise, for i = 1, 2, …, n,
−Σ_{j=i+1}^{n} aij xj = λ Σ_{j=1}^{i} aij xj,
i.e.,
λ aii xi = −Σ_{j=i+1}^{n} aij xj − λ Σ_{j=1}^{i−1} aij xj.
Choosing the index i for which |xi| = 1 ≥ |xj| for all j, we obtain
|λ| |aii| ≤ Σ_{j=i+1}^{n} |aij| + |λ| Σ_{j=1}^{i−1} |aij|.
Hence
|λ| ≤ Σ_{j=i+1}^{n} |aij| / ( |aii| − Σ_{j=1}^{i−1} |aij| ) < Σ_{j=i+1}^{n} |aij| / Σ_{j=i+1}^{n} |aij| = 1,
where the last step uses strict diagonal dominance: |aii| − Σ_{j=1}^{i−1} |aij| > Σ_{j=i+1}^{n} |aij|.
This implies the spectral radius ρ(T) < 1, and hence Gauss-Seidel is convergent.


Example 5. Given the matrix
A = [1 2 −2; 1 1 1; 2 2 1],
decide whether Gauss-Seidel converges to the solution of Ax = b.
Sol. The iteration matrix of the Gauss-Seidel method is
T = −(D + L)^{−1} U
  = −[1 0 0; 1 1 0; 2 2 1]^{−1} [0 2 −2; 0 0 1; 0 0 0]
  = −[1 0 0; −1 1 0; 0 −2 1] [0 2 −2; 0 0 1; 0 0 0]
  = [0 −2 2; 0 2 −3; 0 0 2].
The eigenvalues of the iteration matrix T are λ = 0, 2, 2, and therefore the spectral radius is 2 > 1. The iteration
diverges.
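The spectral-radius test is easy to carry out numerically. The following short sketch (assuming NumPy) reproduces the conclusion of Example 5:

import numpy as np

A = np.array([[1., 2., -2.], [1., 1., 1.], [2., 2., 1.]])
D_plus_L = np.tril(A)                 # diagonal + strictly lower part
U = np.triu(A, k=1)                   # strictly upper part
T = -np.linalg.solve(D_plus_L, U)     # iteration matrix -(D+L)^{-1} U
rho = max(abs(np.linalg.eigvals(T)))
print(rho)                            # 2.0 > 1, so Gauss-Seidel diverges here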
4. Power method for approximating eigenvalues
The eigenvalues of an n × n matrix A are obtained by solving its characteristic equation
det(A − λI) = 0, i.e.,
λ^n + c_{n−1} λ^{n−1} + c_{n−2} λ^{n−2} + ⋯ + c0 = 0.
For large values of n, polynomial equations like this one are difficult and time-consuming to solve,
and sensitive to rounding errors. In this section we look at an alternative method for approximating
eigenvalues. The method can be used only to find the eigenvalue of A that is largest in absolute value.
We call this eigenvalue the dominant eigenvalue of A.
Definition 4.1 (Dominant Eigenvalue and Dominant Eigenvector). Let λ1, λ2, …, λn be the
eigenvalues of an n × n matrix A. λ1 is called the dominant eigenvalue of A if
|λ1| > |λi|,  i = 2, 3, …, n.
The eigenvectors corresponding to λ1 are called dominant eigenvectors of A.


4.1. The Power Method. The power method for approximating eigenvalues is iterative. First we
assume that the matrix A has a dominant eigenvalue with corresponding dominant eigenvectors. Then we
choose an initial approximation x0 of one of the dominant eigenvectors of A. This initial approximation
must be a nonzero vector in R^n. Finally we form the sequence
x1 = Ax0
x2 = Ax1 = A^2 x0
⋯⋯⋯⋯⋯⋯
xk = Ax_{k−1} = A(A^{k−1} x0) = A^k x0.
For large powers k, and by properly scaling this sequence, we obtain a good
approximation of a dominant eigenvector of A. This procedure is illustrated in the following example.
Example 6. Complete six iterations of the power method to approximate a dominant eigenvector of
A = [2 −12; 1 −5].
Sol. We begin with an initial non-zero approximation
x0 = [1; 1].
We then obtain the following approximations by the power method:
y1 = Ax0 = [2 −12; 1 −5][1; 1] = [−10; −4] = −10 [1.00; 0.40] = −10 x1
y2 = Ax1 = [2 −12; 1 −5][1.00; 0.40] = [−2.8; −1] = −2.8 [1.000; 0.357] = −2.8 x2
y3 = Ax2 = [2 −12; 1 −5][1.000; 0.357] = [−2.284; −0.785] = −2.284 [1.000; 0.3436] = −2.284 x3.
After several iterations, we note that the scale factors appear to be approaching the dominant
eigenvalue λ = −2.
Theorem 4.2. If A is an n × n diagonalizable matrix with a dominant eigenvalue, then there exists a
nonzero vector x0 such that the sequence of vectors given by
Ax0, A^2 x0, A^3 x0, …, A^k x0, …
approaches a multiple of the dominant eigenvector of A.
Proof. A is diagonalizable, which implies that it has n linearly independent eigenvectors
x1, x2, …, xn with corresponding eigenvalues λ1, λ2, …, λn.
We assume that these eigenvalues are ordered so that λ1 is the dominant eigenvalue (with
corresponding eigenvector x1).
Because the n eigenvectors x1, x2, …, xn are linearly independent, they form a basis for R^n.
For the initial approximation x0, we choose a nonzero vector such that the linear combination
x0 = c1 x1 + c2 x2 + ⋯ + cn xn
has a nonzero leading coefficient. (If c1 = 0, the power method may not converge, and a different x0
must be used as the initial approximation.)
Now, applying A to both sides of this equation produces
Ax0 = A c1 x1 + A c2 x2 + ⋯ + A cn xn
    = c1 (Ax1) + c2 (Ax2) + ⋯ + cn (Axn)
    = c1 (λ1 x1) + c2 (λ2 x2) + ⋯ + cn (λn xn),
since the λi are eigenvalues, so Axi = λi xi.
Repeated multiplication of both sides by A produces
A^k x0 = c1 (λ1^k x1) + c2 (λ2^k x2) + ⋯ + cn (λn^k xn),
which implies that
A^k x0 = λ1^k [ c1 x1 + c2 (λ2/λ1)^k x2 + ⋯ + cn (λn/λ1)^k xn ].
Now, from our original assumption that λ1 is larger in absolute value than the other eigenvalues, it
follows that each of the fractions satisfies
|λ2/λ1|, |λ3/λ1|, …, |λn/λ1| < 1.
Therefore each of the factors
(λ2/λ1)^k, (λ3/λ1)^k, …, (λn/λ1)^k
must approach 0 as k approaches infinity. This implies that the approximation
A^k x0 ≈ λ1^k c1 x1,  c1 ≠ 0,
improves as k increases. Since x1 is a dominant eigenvector, it follows that any scalar multiple of x1 is
also a dominant eigenvector. Thus we have shown that A^k x0 approaches a multiple of the dominant
eigenvector of A.
Algorithm [Power Method]
(1) Start
(2) Define matrix A and initial guess x
(3) Calculate y = Ax
(4) Find the largest element in magnitude of y and assign it to K.
(5) Calculate the fresh value x = (1/K) y
(6) If the change |K_new − K_old| exceeds the error tolerance, go to step 3.
(7) Stop
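The following is a minimal Python sketch of the power method with scaling (the stopping rule, tolerance, and names are our own choices), checked against Example 7 below:

def power_method(A, x, tol=1e-6, max_iter=200):
    n = len(A)
    K_old = 0.0
    for _ in range(max_iter):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = Ax
        K = max(y, key=abs)            # scaling factor, tends to the eigenvalue
        x = [yi / K for yi in y]
        if abs(K - K_old) < tol:
            return K, x
        K_old = K
    raise RuntimeError("power method did not converge")

A = [[1, 2, 0], [-2, 1, 2], [1, 3, 1]]
lam, v = power_method(A, [1, 1, 1])
print(lam, v)   # approximately 3.0 and [0.5, 0.5, 1.0]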
Example 7. Calculate seven iterations of the power method with scaling to approximate a dominant
eigenvector of the matrix
A = [1 2 0; −2 1 2; 1 3 1].
Sol. Using x0 = [1, 1, 1]^T as the initial approximation, we obtain
y1 = Ax0 = [1 2 0; −2 1 2; 1 3 1][1; 1; 1] = [3; 1; 5],
and by scaling we obtain the approximation
x1 = (1/5)[3; 1; 5] = [0.60; 0.20; 1.00].
Similarly we get
y2 = Ax1 = [1.00; 1.00; 2.20] = 2.20 [0.45; 0.45; 1.00] = 2.20 x2
y3 = Ax2 = [1.35; 1.55; 2.8] = 2.8 [0.48; 0.55; 1.00] = 2.8 x3
y4 = Ax3 = 3.1 [0.51; 0.51; 1.00]
etc.
After several iterations, we observe that the dominant eigenvector is
x = [0.50; 0.50; 1.00].
The scaling factors are approaching the dominant eigenvalue λ = 3.
Remark 4.1. The power method is useful for computing eigenvalues, but it gives only the dominant
eigenvalue. To find other eigenvalues we use properties of the matrix, such as: the sum of all eigenvalues
equals the trace of the matrix. Also, if λ is an eigenvalue of A then 1/λ is an eigenvalue of A^{−1}. Hence
the smallest eigenvalue of A in magnitude corresponds to the dominant eigenvalue of A^{−1}.
Remark 4.2. Consider A − qI; its eigenvalues are (λ1 − q, λ2 − q, …). The eigenvalues
of (A − qI)^{−1} are (1/(λ1 − q), 1/(λ2 − q), …).
The eigenvalue of the original matrix A that is closest to q corresponds to the eigenvalue of largest
magnitude of the shifted and inverted matrix (A − qI)^{−1}.
To find the eigenvalue closest to q, we apply the power method to obtain the dominant eigenvalue μ of
(A − qI)^{−1}. Then we recover the eigenvalue of the original problem by λ = 1/μ + q. This is the shifted and
inverted (inverse) power method. In practice we obtain y = (A − qI)^{−1} x by solving (A − qI)y = x, so we
need not compute the inverse of the matrix.
Example 8. Find the eigenvalue of the matrix
A = [2 −1 0; −1 2 −1; 0 −1 2]
nearest to 3, using the power method.
Sol. The eigenvalue of A nearest to 3 is the smallest eigenvalue in magnitude of A − 3I,
and hence corresponds to the largest eigenvalue of (A − 3I)^{−1} in magnitude. Now
A − 3I = [−1 −1 0; −1 −1 −1; 0 −1 −1],
B = (A − 3I)^{−1} = [0 −1 1; −1 1 −1; 1 −1 0].
Starting with
x0 = [1; 1; 1],
we obtain
y1 = Bx0 = [0 −1 1; −1 1 −1; 1 −1 0][1; 1; 1] = [0; −1; 0] = −1 · [0; 1; 0] = −1 · x1
y2 = Bx1 = [−1; 1; −1] = −1 · [1; −1; 1] = −1 · x2
y3 = Bx2 = [2; −3; 2] = −3 [−0.6667; 1; −0.6667] = −3 x3
y4 = Bx3 = [−1.6667; 2.3334; −1.6667] = 2.3334 [−0.7143; 1; −0.7143] = 2.3334 x4.
After six iterations, we obtain the dominant eigenvalue of the matrix B, which is approximately 2.4, with
dominant eigenvector
[−0.7143; 1; −0.7143].
Now the candidate eigenvalues of matrix A are 3 ± 1/2.4 = 3 ± 0.42, i.e. 3.42 and 2.58. Since 2.58 does not satisfy
det(A − 2.58I) = 0, the correct eigenvalue of matrix A nearest to 3 is 3.42.
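The shifted and inverted iteration of Remark 4.2 can be sketched as follows (assuming NumPy; the names and stopping rule are our own). Note that each step solves a linear system rather than forming the inverse:

import numpy as np

def shifted_inverse_power(A, q, x, tol=1e-8, max_iter=100):
    n = len(A)
    M = np.asarray(A, dtype=float) - q * np.eye(n)
    mu_old = 0.0
    for _ in range(max_iter):
        y = np.linalg.solve(M, x)        # y = (A - qI)^{-1} x without inverting
        mu = y[np.argmax(np.abs(y))]     # scaling factor, tends to dominant mu
        x = y / mu
        if abs(mu - mu_old) < tol:
            return q + 1.0 / mu          # recover the eigenvalue of A nearest q
        mu_old = mu
    raise RuntimeError("did not converge")

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(shifted_inverse_power(A, 3.0, np.ones(3)))   # about 3.4142 (= 2 + sqrt(2))

This agrees, to more digits, with the value 3.42 obtained by hand above.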
Although the power method worked well in these examples, we must say something about cases in
which the power method may fail. There are basically three such cases:
1. Using the power method when A is not diagonalizable. Recall that A has n linearly independent
eigenvectors if and only if A is diagonalizable. Of course, it is not easy to tell just by looking at A
whether it is diagonalizable.
2. Using the power method when A does not have a dominant eigenvalue, or when the dominant
eigenvalue is such that |λ1| = |λ2| (for instance λ1 = −λ2).
3. If the entries of A contain significant error: powers A^k will have significant roundoff error in their
entries.


Exercises
(1) Using a four-decimal-place computer, solve the following system of equations without and with
pivoting:
0.729x1 + 0.81x2 + 0.9x3 = 0.6867
x1 + x2 + x3 = 0.8338
1.331x1 + 1.21x2 + 1.1x3 = 1.000
This system has the exact solution, rounded to four places, x1 = 0.2245, x2 = 0.2814, x3 = 0.3279.
(2) Solve the following system of equations by Gaussian elimination with partial and scaled partial
pivoting
x1 + 2x2 + x3 = 3
3x1 + 4x2 = 3
2x1 + 10x2 + 4x3 = 10.
(3) Consider the linear system
x1 + 4x2 = 1
4x1 + x2 = 0.
The true solution is x1 = −1/15 and x2 = 4/15. Apply the Jacobi and Gauss-Seidel methods
with x^(0) = [0, 0]^T to the system and find out which method diverges more rapidly. Next, interchange
the two equations to write the system as
4x1 + x2 = 0
x1 + 4x2 = 1
and apply both methods with x^(0) = [0, 0]^T.
Iterate until ||x − x^(k)|| ≤ 10^{−5}. Which method converges faster?
(4) Solve the system of equations by the Jacobi and Gauss-Seidel methods:
8x1 + x2 + 2x3 = 1
x1 − 5x2 + x3 = 16
x1 + x2 − 4x3 = 7.
(5) Solve this system of equations by Gauss-Seidel, starting with the initial vector [0,0,0]:
4.63x1 1.21x2 + 3.22x3 = 2.22
3.07x1 + 5.48x2 + 2.11x3 = 3.17
1.26x1 + 3.11x2 + 4.57x3 = 5.11.
(6) Show that Gauss-Seidel method does not converge for the following system of equations
2x1 + 3x2 + x3 = 1
3x1 + 2x2 + 2x3 = 1
x1 + 2x2 + 2x3 = 1.
(7) Consider the iteration
x^(k+1) = b + α [2 1; 1 2] x^(k),  k ≥ 0,
where α is a real constant. For some values of α, the iteration method converges for any choice
of initial guess x^(0), and for some other values of α, the method diverges. Find the values of α
for which the method converges.
(8) Determine the largest eigenvalue and the corresponding eigenvector of the matrix
A = [4 1 0; 1 20 1; 0 1 4]
correct to three decimals using the power method.


(9) Find the smallest eigenvalue in magnitude of the matrix
[2 −1 0; −1 2 −1; 0 −1 2]
using four iterations of the inverse power method.
Bibliography
[Gerald]   Curtis F. Gerald and Patrick O. Wheatley. Applied Numerical Analysis, 7th edition,
           Pearson, 2003.
[Atkinson] K. Atkinson and W. Han. Elementary Numerical Analysis, 3rd edition, John Wiley
           and Sons, 2004.

CHAPTER 4 (6 LECTURES)
POLYNOMIAL INTERPOLATION AND APPROXIMATIONS

1. Introduction
Polynomials are used as the basic means of approximation in nearly all areas of numerical analysis.
They are used in the solution of equations and in the approximation of functions, of integrals and
derivatives, of solutions of integral and differential equations, etc. Polynomials have simple structure,
which makes it easy to construct effective approximations and then make use of them. For this reason,
the representation and evaluation of polynomials is a basic topic in numerical analysis. We discuss this
topic in the present chapter in the context of polynomial interpolation, the simplest and certainly the
most widely used technique for obtaining polynomial approximations.
Definition 1.1 (Polynomial). A polynomial Pn(x) of degree ≤ n is, by definition, a function of the
form
Pn(x) = a0 + a1 x + a2 x^2 + ⋯ + an x^n          (1.1)
with certain coefficients a0, a1, …, an. This polynomial has (exact) degree n in case its leading
coefficient an is nonzero.
The power form (1.1) is the standard way to specify a polynomial in mathematical discussions. It is
a very convenient form for differentiating or integrating a polynomial. But, in various specific contexts,
other forms are more convenient. For example, the following shifted power form may be helpful:
P(x) = a0 + a1(x − c) + a2(x − c)^2 + ⋯ + an(x − c)^n.          (1.2)

It is good practice to employ the shifted power form with the center c chosen somewhere in the interval
[a, b] when interested in a polynomial on that interval.
Remark 1.1. The coefficients in the shifted power form provide derivative values, i.e.,
ai = P^(i)(c)/i!,  i = 0, 1, 2, …, n.
In effect, the shifted power form provides the Taylor expansion for P(x) around the center c.

Definition 1.2 (Newton form). A further generalization of the shifted power form is the following
Newton form:
P(x) = a0 + a1(x − c1) + a2(x − c1)(x − c2) + ⋯ + an(x − c1)(x − c2) ⋯ (x − cn).
This form plays a major role in the construction of an interpolating polynomial. It reduces to the
shifted power form if the centers c1, …, cn all equal c, and to the power form if the centers c1, …, cn
all equal zero. The following discussion of the evaluation of the Newton form therefore applies directly
to these simpler forms as well.
It is inefficient to evaluate each of the n + 1 terms in the Newton form separately and then sum. This
would take n + n(n + 1)/2 additions and n(n + 1)/2 multiplications. Instead, we notice that the factor
(x − c1) occurs in all terms but the first, the factor (x − c2) occurs in all the remaining terms, then
(x − c3), and so on. Finally we get
P(x) = a0 + (x − c1){a1 + (x − c2)[a2 + (x − c3)[a3 + ⋯ + [(x − c_{n−1})[a_{n−1} + (x − cn) an]] ⋯ ]]}.
Now evaluation at any particular value of x takes 2n additions and n multiplications.
Theorem 1.3 (Algorithm: Nested Multiplication). Let P(x) be the polynomial in Newton form with
coefficients a0, a1, …, an and centers c1, c2, …, cn. The following algorithm computes y = P(x) for a
given real number x:
y = an
for i = n − 1, n − 2, …, 0 do
  y = ai + (x − c_{i+1}) · y
end.
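In Python this is a few lines. A minimal sketch (the list c[i] stores the center c_{i+1}, since Python lists are 0-based; the function name is our own):

def nested_eval(a, c, x):
    """Evaluate P(x) = a0 + a1(x-c1) + a2(x-c1)(x-c2) + ... at x.
    a = [a0, ..., an], c = [c1, ..., cn]."""
    y = a[-1]
    for i in range(len(a) - 2, -1, -1):   # i = n-1, ..., 0
        y = a[i] + (x - c[i]) * y         # c[i] holds c_{i+1} (0-based list)
    return y

# The polynomial of Example 1 below: P3(x) = 3 - 7(x+1) + 8(x+1)x - 6(x+1)x(x-1).
print(nested_eval([3, -7, 8, -6], [-1, 0, 1], 2))   # P3(2) = -6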
Example 1. Consider the interpolating polynomial
P3(x) = 3 − 7(x + 1) + 8(x + 1)x − 6(x + 1)x(x − 1).
We will use nested multiplication to write this polynomial in the power form
P3(x) = b0 + b1 x + b2 x^2 + b3 x^3.
This requires repeatedly applying nested multiplication to a polynomial of the form
P(x) = a0 + a1(x − c1) + a2(x − c1)(x − c2) + a3(x − c1)(x − c2)(x − c3),
and each application performs the following steps:
b3 = a3
b2 = a2 + (z − c3)b3
b1 = a1 + (z − c2)b2
b0 = a0 + (z − c1)b1,
where, in this example, we will set z = 0 each time.
The numbers b0, b1, b2 and b3 computed by the algorithm are the coefficients of P(x) in the Newton
form with the centers c1, c2 and c3 changed to z, c1 and c2; that is,
P(x) = b0 + b1(x − z) + b2(x − z)(x − c1) + b3(x − z)(x − c1)(x − c2).
It follows that b0 = P(z), which is why this algorithm is the preferred method for evaluating a
polynomial in Newton form at a given point z.
It should be noted that the algorithm can be derived by writing P(x) in the nested form
P(x) = a0 + (x − c1)[a1 + (x − c2)[a2 + (x − c3)a3]]
and computing P(z) as follows:
P(z) = a0 + (z − c1)[a1 + (z − c2)[a2 + (z − c3)a3]]
     = a0 + (z − c1)[a1 + (z − c2)[a2 + (z − c3)b3]]
     = a0 + (z − c1)[a1 + (z − c2)b2]
     = a0 + (z − c1)b1
     = b0.
Initially, we have
P(x) = 3 − 7(x + 1) + 8(x + 1)x − 6(x + 1)x(x − 1),
so the coefficients of P(x) in this Newton form are
a0 = 3, a1 = −7, a2 = 8, a3 = −6
with the centers
c1 = −1, c2 = 0, c3 = 1.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
b3 = −6
b2 = 8 + (0 − 1)(−6) = 14
b1 = −7 + (0 − 0)(14) = −7
b0 = 3 + (0 − (−1))(−7) = −4.
It follows that
P(x) = −4 + (−7)(x − 0) + 14(x − 0)(x − (−1)) + (−6)(x − 0)(x − (−1))(x − 0)
     = −4 − 7x + 14x(x + 1) − 6x^2(x + 1).
For the second application of nested multiplication, we have the centers
c1 = 0, c2 = −1, c3 = 0
with coefficients
a0 = −4, a1 = −7, a2 = 14, a3 = −6.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
b3 = −6
b2 = 14 + (0 − 0)(−6) = 14
b1 = −7 + (0 − (−1))(14) = 7
b0 = −4 + (0 − 0)(7) = −4.
It follows that
P(x) = −4 + 7(x − 0) + 14(x − 0)(x − 0) + (−6)(x − 0)(x − 0)(x − (−1))
     = −4 + 7x + 14x^2 − 6x^2(x + 1).
For the third and final application of nested multiplication, we have the centers
c1 = 0, c2 = 0, c3 = −1
with coefficients
a0 = −4, a1 = 7, a2 = 14, a3 = −6.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
b3 = −6
b2 = 14 + (0 − (−1))(−6) = 8
b1 = 7 + (0 − 0)(8) = 7
b0 = −4 + (0 − 0)(7) = −4.
It follows that
P(x) = −4 + 7(x − 0) + 8(x − 0)(x − 0) + (−6)(x − 0)(x − 0)(x − 0)
     = −4 + 7x + 8x^2 − 6x^3,
and the centers are now 0, 0 and 0. Since all of the centers are equal to zero, the polynomial is now in
the power form.
2. Interpolation
In this chapter, we consider the interpolation problem. Suppose we do not know the function f, but
have a few pieces of information (data) about f. We now try to compute a function g that approximates f.
2.1. Polynomial Interpolation. The polynomial interpolation problem, also called Lagrange
interpolation, can be described as follows: given (n + 1) data points (xi, yi), i = 0, 1, …, n, find a polynomial
P of lowest possible degree such that
yi = P(xi),  i = 0, 1, …, n.
Such a polynomial is said to interpolate the data. Here yi may be the value of some unknown function
f at xi, i.e. yi = f(xi).
One reason for considering the class of polynomials in the approximation of functions is that they
uniformly approximate continuous functions.
Theorem 2.1 (Weierstrass Approximation Theorem). Suppose that f is defined and continuous on
[a, b]. For any ε > 0, there exists a polynomial P(x) defined on [a, b] with the property that
|f(x) − P(x)| < ε,  for all x ∈ [a, b].
Another reason for considering the class of polynomials in the approximation of functions is that the
derivatives and indefinite integrals of a polynomial are easy to compute.
Theorem 2.2 (Existence and Uniqueness). Given a real-valued function f(x) and n + 1 distinct points
x0, x1, …, xn, there exists a unique polynomial Pn(x) of degree ≤ n which interpolates
f(x) at the points x0, x1, …, xn.


Proof. Existence: Let the data be (xi, f(xi)), i = 0, 1, …, n. We prove the result by mathematical induction.
The theorem clearly holds for n = 0: only one data point is given, and we can take the constant polynomial
P0(x) = f(x0) for all x.
Assume that the theorem holds for n ≤ k, i.e. there is a polynomial Pk of degree ≤ k such that
Pk(xi) = f(xi), for 0 ≤ i ≤ k.
Now we try to construct a polynomial of degree at most k + 1 to interpolate (xi, f(xi)), 0 ≤ i ≤ k + 1.
Let
P_{k+1}(x) = Pk(x) + c(x − x0)(x − x1) ⋯ (x − xk).
For x = x_{k+1},
P_{k+1}(x_{k+1}) = f(x_{k+1}) = Pk(x_{k+1}) + c(x_{k+1} − x0)(x_{k+1} − x1) ⋯ (x_{k+1} − xk)
⟹ c = [f(x_{k+1}) − Pk(x_{k+1})] / [(x_{k+1} − x0)(x_{k+1} − x1) ⋯ (x_{k+1} − xk)].
Since the xi are distinct, the polynomial P_{k+1}(x) is well-defined and has degree ≤ k + 1. Now
P_{k+1}(xi) = Pk(xi) + 0 = Pk(xi) = f(xi), 0 ≤ i ≤ k,
and
P_{k+1}(x_{k+1}) = f(x_{k+1}).
The above two equations imply
P_{k+1}(xi) = f(xi), 0 ≤ i ≤ k + 1.
Therefore P_{k+1}(x) interpolates f(x) at all k + 2 nodal points. By mathematical induction, the result is
true for all n.
Uniqueness: Suppose there are two such polynomials Pn and Qn with
Pn(xi) = f(xi),
Qn(xi) = f(xi), 0 ≤ i ≤ n.
Define
Sn(x) = Pn(x) − Qn(x).
Since both Pn and Qn have degree ≤ n, the degree of Sn is also ≤ n.
Also
Sn(xi) = Pn(xi) − Qn(xi) = f(xi) − f(xi) = 0, 0 ≤ i ≤ n.
This implies Sn has at least n + 1 zeros, which is not possible for a nonzero polynomial of degree at most n.
This implies
Sn ≡ 0 ⟹ Pn = Qn for all x.
Therefore the interpolating polynomial is unique.
2.2. Linear Interpolation. We determine a polynomial
P(x) = ax + b          (2.1)
where a and b are arbitrary constants, satisfying the interpolating conditions f(x0) = P(x0) and
f(x1) = P(x1). We have
f(x0) = P(x0) = a x0 + b,
f(x1) = P(x1) = a x1 + b.
Lagrange interpolation: Solving for a and b, we obtain
a = [f(x0) − f(x1)] / (x0 − x1),
b = [f(x0) x1 − f(x1) x0] / (x1 − x0).
Substituting these values in equation (2.1), we obtain
P(x) = { [f(x0) − f(x1)] / (x0 − x1) } x + [f(x0) x1 − f(x1) x0] / (x1 − x0)
⟹ P(x) = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1)
⟹ P(x) = l0(x) f(x0) + l1(x) f(x1),
where l0(x) = (x − x1)/(x0 − x1) and l1(x) = (x − x0)/(x1 − x0).
These functions l0(x) and l1(x) are called the Lagrange fundamental polynomials, and they satisfy the
following conditions:
l0(x) + l1(x) = 1,
l0(x0) = 1, l0(x1) = 0,
l1(x0) = 0, l1(x1) = 1,
i.e., li(xj) = δij = { 1, i = j;  0, i ≠ j }.
Newton's divided difference interpolation: We can also write P(x) in a different way, as follows:
P(x) = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1)
     = [f(x0)(x − x1) − f(x1)(x − x0)] / (x0 − x1)
     = f(x0) + (x − x0) [f(x1) − f(x0)] / (x1 − x0)
     = f(x0) + (x − x0) f[x0, x1].
The ratio f[x0, x1] = [f(x1) − f(x0)] / (x1 − x0) is called the first divided difference of f(x).
Higher-order interpolation: In this section we take a different approach and assume that the
interpolation polynomial is given as a linear combination of n + 1 polynomials of degree n. This time,
we set the coefficients to be the interpolated values {f(xi)}_{i=0}^{n}, while the unknowns are the polynomials.
We thus let
Pn(x) = Σ_{i=0}^{n} f(xi) li(x),
where the li(x) are n + 1 polynomials of degree n. Note that in this particular case the polynomials li(x)
are of degree exactly n (and not less). However, Pn(x), given by the above equation, may have a
lower degree. In either case, the degree of Pn(x) is n at the most. We now require that Pn(x) satisfies
the interpolation conditions
Pn(xj) = f(xj), 0 ≤ j ≤ n.
By substituting xj for x we have
Pn(xj) = Σ_{i=0}^{n} f(xi) li(xj), 0 ≤ j ≤ n.

Therefore we may conclude that the li(x) must satisfy
li(xj) = δij,  i, j = 0, 1, …, n,
where δij is the Kronecker delta, defined as
δij = { 1, i = j;  0, i ≠ j }.
Each polynomial li(x) has n + 1 unknown coefficients. The conditions given above through the delta
provide exactly n + 1 equations that each polynomial li(x) must satisfy, and these equations can be
solved in order to determine all the li(x). Fortunately there is a shortcut. An obvious way of constructing
polynomials li(x) of degree n that satisfy the condition is the following:
li(x) = [(x − x0)(x − x1) ⋯ (x − x_{i−1})(x − x_{i+1}) ⋯ (x − xn)] / [(xi − x0)(xi − x1) ⋯ (xi − x_{i−1})(xi − x_{i+1}) ⋯ (xi − xn)].
The uniqueness of the interpolating polynomial of degree ≤ n given n + 1 distinct interpolation points
implies that the polynomials li(x) given by the above relation are the only such polynomials of degree n.


Note that the denominator does not vanish, since we assume that all interpolation points are distinct.
We can write the formula for li(x) in a compact form using the product notation. Let
W(x) = (x − x0) ⋯ (x − x_{i−1})(x − xi)(x − x_{i+1}) ⋯ (x − xn);
then
W'(xi) = (xi − x0) ⋯ (xi − x_{i−1})(xi − x_{i+1}) ⋯ (xi − xn),
and
li(x) = W(x) / [(x − xi) W'(xi)],  i = 0, 1, …, n.
We can write the Newton divided difference formula in the following fashion (as we will prove in the
next theorem):
Pn(x) = f(x0) + (x − x0) f[x0, x1] + (x − x0)(x − x1) f[x0, x1, x2] + ⋯
        + (x − x0)(x − x1) ⋯ (x − x_{n−1}) f[x0, x1, …, xn]
      = f(x0) + Σ_{i=1}^{n} f[x0, x1, …, xi] Π_{j=0}^{i−1} (x − xj).
Divided differences are calculated as follows:
f[x0, x1, x2] = ( f[x1, x2] − f[x0, x1] ) / (x2 − x0)
  = (1/(x2 − x0)) [ (f(x2) − f(x1))/(x2 − x1) − (f(x1) − f(x0))/(x1 − x0) ]
  = f(x0)/[(x0 − x1)(x0 − x2)] + f(x1)/[(x1 − x0)(x1 − x2)] + f(x2)/[(x2 − x0)(x2 − x1)].
In general,
f[x0, x1, …, xn] = ( f[x1, x2, …, xn] − f[x0, x1, …, x_{n−1}] ) / (xn − x0)
                 = Σ_{i=0}^{n} f(xi) / Π_{j=0, j≠i}^{n} (xi − xj).

Example 2. Given the following four data points, find a polynomial in Lagrange and Newton form
to interpolate the data.
xi : 0 1 3 5
yi : 1 2 6 7
Sol. The Lagrange functions are given by
l0(x) = [(x − 1)(x − 3)(x − 5)] / [(0 − 1)(0 − 3)(0 − 5)] = −(1/15)(x − 1)(x − 3)(x − 5),
l1(x) = [(x − 0)(x − 3)(x − 5)] / [(1 − 0)(1 − 3)(1 − 5)] = (1/8) x(x − 3)(x − 5),
l2(x) = [(x − 0)(x − 1)(x − 5)] / [(3 − 0)(3 − 1)(3 − 5)] = −(1/12) x(x − 1)(x − 5),
l3(x) = [(x − 0)(x − 1)(x − 3)] / [(5 − 0)(5 − 1)(5 − 3)] = (1/40) x(x − 1)(x − 3).
The interpolating polynomial in the Lagrange form is
P3(x) = l0(x) + 2 l1(x) + 6 l2(x) + 7 l3(x).
To write the Newton form, we draw the divided difference table:
xi   yi   first d.d.   second d.d.   third d.d.
0    1
            1
1    2                   1/3
            2                         −17/120
3    6                  −3/8
           1/2
5    7
P3(x) = f(x0) + (x − 0) f[0, 1] + (x − 0)(x − 1) f[0, 1, 3] + (x − 0)(x − 1)(x − 3) f[0, 1, 3, 5]
      = 1 + x + (1/3) x(x − 1) − (17/120) x(x − 1)(x − 3).
Note that the xi can be re-ordered, but must be distinct. When the order of some xi is changed, one
obtains the same polynomial, but in a different form.
Remark 2.1. If more data points are added to the interpolation problem, we have to recalculate all the
cardinal polynomials in the Lagrange form, but in the Newton form we need not recalculate the earlier
terms, which is the great advantage of the Newton form.
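Both pieces — building the divided differences and evaluating the Newton form by nested multiplication — fit in a short Python sketch (our own implementation, checked against Example 2):

def divided_differences(xs, ys):
    """Return coefficients f[x0], f[x0,x1], ..., f[x0,...,xn]."""
    c = list(ys)
    n = len(xs)
    for k in range(1, n):                  # k-th order differences
        for i in range(n - 1, k - 1, -1):  # update in place, bottom-up
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - k])
    return c

def newton_eval(xs, c, x):
    y = c[-1]
    for i in range(len(c) - 2, -1, -1):    # nested multiplication, centers = xs
        y = c[i] + (x - xs[i]) * y
    return y

xs, ys = [0, 1, 3, 5], [1, 2, 6, 7]
c = divided_differences(xs, ys)
print(c)                       # [1, 1.0, 0.333..., -0.14166...] = [1, 1, 1/3, -17/120]
print(newton_eval(xs, c, 3))   # 6.0, reproducing the data point (3, 6)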

Example 3. Let f(x) = √(x − x^2) and let P2(x) be the interpolation polynomial on x0 = 0, x1 and x2 = 1.
Find the largest value of x1 in (0, 1) for which f(0.5) − P2(0.5) = −0.25.
Sol. If f(x) = √(x − x^2), then our nodes are [x0, x1, x2] = [0, x1, 1] and f(x0) = 0, f(x1) = √(x1 − x1^2),
f(x2) = 0. Therefore
l0(x) = [(x − x1)(x − x2)] / [(x0 − x1)(x0 − x2)] = (x − x1)(x − 1)/x1,
l1(x) = [(x − x0)(x − x2)] / [(x1 − x0)(x1 − x2)] = x(x − 1) / [x1(x1 − 1)],
l2(x) = [(x − x0)(x − x1)] / [(x2 − x0)(x2 − x1)] = x(x − x1)/(1 − x1).
Hence
P2(x) = l0(x) f(x0) + l1(x) f(x1) + l2(x) f(x2)
      = { x(x − 1) / [x1(x1 − 1)] } · √(x1 − x1^2)
      = −x(x − 1) / √(x1(1 − x1)).
If we now consider f(x) − P2(x), then
f(x) − P2(x) = √(x − x^2) + x(x − 1)/√(x1(1 − x1)).
Hence f(0.5) − P2(0.5) = −0.25 implies
√(0.5 − 0.5^2) + 0.5(0.5 − 1)/√(x1(1 − x1)) = −0.25
⟹ 0.5 − 0.25/√(x1(1 − x1)) = −0.25
⟹ 0.25/√(x1(1 − x1)) = 0.75 ⟹ x1(1 − x1) = 1/9.
Solving for x1 gives
x1 − x1^2 = 1/9,  i.e.  (x1 − 1/2)^2 = 5/36,
which gives x1 = 1/2 − √(5/36) or x1 = 1/2 + √(5/36).
The largest of these is therefore
x1 = 1/2 + √(5/36) ≈ 0.8727.
Theorem 2.3. The unique polynomial of degree ≤ n that passes through (x0, y0), (x1, y1), …, (xn, yn)
is given by
Pn(x) = f[x0] + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + ⋯ +
        f[x0, …, xn](x − x0)(x − x1) ⋯ (x − x_{n−1}).
Proof. We prove it by induction. The unique polynomial of degree 0 that passes through (x0, y0)
is obviously
P0(x) = y0 = f[x0].
Suppose that the polynomial Pk(x) of degree ≤ k that passes through (x0, y0), (x1, y1), …, (xk, yk) is
Pk(x) = f[x0] + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + ⋯ +
        f[x0, …, xk](x − x0)(x − x1) ⋯ (x − x_{k−1}).
Write P_{k+1}(x), the unique polynomial of degree ≤ k + 1 that passes through (x0, y0), (x1, y1), …, (xk, yk), (x_{k+1}, y_{k+1}),
as
P_{k+1}(x) = f[x0] + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + ⋯ +
        f[x0, …, xk](x − x0) ⋯ (x − x_{k−1}) + C(x − x0)(x − x1) ⋯ (x − x_{k−1})(x − xk).
We only need to show that
C = f[x0, x1, …, xk, x_{k+1}].
For this, let Qk(x) be the unique polynomial of degree ≤ k that passes through (x1, y1), …, (xk, yk), (x_{k+1}, y_{k+1}).
Define
R(x) = Pk(x) + [(x − x0)/(x_{k+1} − x0)] [Qk(x) − Pk(x)].
Then:
R(x) is a polynomial of degree ≤ k + 1;
R(x0) = Pk(x0) = y0;
R(xi) = Pk(xi) + [(xi − x0)/(x_{k+1} − x0)] (Qk(xi) − Pk(xi)) = Pk(xi) = yi, i = 1, …, k;
R(x_{k+1}) = Qk(x_{k+1}) = y_{k+1}.
By uniqueness, R(x) = P_{k+1}(x).
The leading coefficient of P_{k+1}(x) is C.
The leading coefficient of R(x) is the leading coefficient of [(x − x0)/(x_{k+1} − x0)][Qk(x) − Pk(x)], which is
(leading coefficient of Qk(x) − leading coefficient of Pk(x)) / (x_{k+1} − x0).
On the other hand, the leading coefficient of Qk(x) is f[x1, …, x_{k+1}], and the leading coefficient of
Pk(x) is f[x0, …, xk]. Therefore
C = ( f[x1, …, x_{k+1}] − f[x0, …, xk] ) / (x_{k+1} − x0) = f[x0, x1, …, x_{k+1}].

3. Error Analysis for Polynomial Interpolation
We are given x0, x1, …, xn and the corresponding function values
f(x0), f(x1), …, f(xn), but we do not know an expression for the function. Let Pn(x) be the
polynomial of degree ≤ n that passes through the n + 1 points (x0, f(x0)), (x1, f(x1)), …, (xn, f(xn)).
Question: What is the error between f(x) and Pn(x), even though we do not know f(x) in advance?
Theorem 3.1. Let f ∈ C^n[a, b] and let x0, …, xn be distinct numbers in [a, b]. Then there exists ξ ∈ (a, b)
such that
f[x0, x1, x2, …, xn] = f^(n)(ξ)/n!.
Proof. Let
Pn(x) = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk](x − x0)(x − x1) ⋯ (x − x_{k−1})
be the interpolating polynomial of f in Newton's form. Define
g(x) = f(x) − Pn(x).
Since Pn(xi) = f(xi) for i = 0, 1, …, n, the function g has n + 1 distinct zeros in [a, b]. By the
generalized Rolle's Theorem there exists ξ ∈ (a, b) such that
g^(n)(ξ) = f^(n)(ξ) − Pn^(n)(ξ) = 0.
Here
Pn^(n)(x) = n! f[x0, x1, …, xn].
Therefore
f[x0, x1, …, xn] = f^(n)(ξ)/n!.

Truncation error: The polynomial P(x) coincides with f(x) at all nodal points and may deviate at
other points in the interval. This deviation is called the truncation error, and we write
En(f; x) = f(x) − P(x).
Theorem 3.2. Suppose that x0, x1, …, xn are distinct numbers in [a, b] and f ∈ C^{n+1}[a, b]. Let Pn(x)
be the unique polynomial of degree ≤ n that passes through the n + 1 nodal points. Then for each
x ∈ [a, b] there exists ξ ∈ (a, b) such that
En(f; x) = f(x) − Pn(x) = [(x − x0) ⋯ (x − xn)/(n + 1)!] f^(n+1)(ξ).

Proof. Let x0, x1, …, xn be distinct numbers in [a, b] and f ∈ C^{n+1}[a, b], and let Pn(x) be the unique
polynomial of degree ≤ n that passes through the n + 1 nodal points.
The truncation error in interpolation is given by
En(f; x) = f(x) − Pn(x),   with   En(f; xi) = 0, i = 0, 1, …, n.
Now, for any t in the domain, define
g(t) = f(t) − P(t) − [f(x) − P(x)] (t − x0) ⋯ (t − xn) / [(x − x0) ⋯ (x − xn)].          (3.1)
Now g(t) = 0 at t = x, x0, x1, …, xn. Therefore g(t) satisfies the conditions of the generalized Rolle's
Theorem, which states that between n + 2 zeros of a function there is at least one zero of the (n + 1)th
derivative of the function. Hence there exists a point ξ such that
g^(n+1)(ξ) = 0,
where ξ is some point with
min(x0, x1, …, xn, x) < ξ < max(x0, x1, …, xn, x).
Now differentiate (3.1) (n + 1) times with respect to t:
g^(n+1)(t) = f^(n+1)(t) − P^(n+1)(t) − [f(x) − P(x)] (n + 1)! / [(x − x0) ⋯ (x − xn)]
           = f^(n+1)(t) − [f(x) − P(x)] (n + 1)! / [(x − x0) ⋯ (x − xn)].
Here P^(n+1)(t) = 0, as P is a polynomial of degree ≤ n.
Setting g^(n+1)(ξ) = 0 and solving for f(x) − P(x), we obtain
f(x) − P(x) = [(x − x0) ⋯ (x − xn)/(n + 1)!] f^(n+1)(ξ).
The truncation error is therefore
En(f; x) = f(x) − P(x) = [(x − x0) ⋯ (x − xn)/(n + 1)!] f^(n+1)(ξ).


If |f^(n+1)(ξ)| ≤ M, then we can obtain a bound on the error:
|En(f; x)| ≤ [M/(n + 1)!] max_{x∈[a,b]} |(x − x0) ⋯ (x − xn)|.
Example 4. Suppose f(x) = sin x is approximated by an interpolating polynomial P(x) of degree 9 in
[0, 1]. Estimate |f(x) − P(x)| for all x ∈ [0, 1].
Sol. Here f(x) = sin x and n = 9, so
|f^(10)(ξ)| ≤ 1.
Now
|x − xi| ≤ 1  ⟹  |Π_{i=0}^{9} (x − xi)| ≤ 1,  x ∈ [0, 1].
Hence
|f(x) − P(x)| = (1/10!) |f^(10)(ξ)| |Π_{i=0}^{9} (x − xi)| ≤ 1/10!.

Example 5. Denoting the interpolating polynomial of f(x) on the distinct points x0, …, xn by
Σ_{k=0}^{n} lk(x) f(xk), find an expression for Σ_{k=0}^{n} lk(0) x_k^{n+1}.
Sol. The Lagrange interpolating polynomial with remainder gives
f(x) = Σ_{k=0}^{n} lk(x) f(xk) + [(x − x0) ⋯ (x − xn)/(n + 1)!] f^(n+1)(ξ).
Let f(x) = x^{n+1}; then f^(n+1)(ξ) = (n + 1)! and
x^{n+1} = Σ_{k=0}^{n} lk(x) x_k^{n+1} + [(x − x0) ⋯ (x − xn)/(n + 1)!] (n + 1)!
        = Σ_{k=0}^{n} lk(x) x_k^{n+1} + (x − x0) ⋯ (x − xn).
Now put x = 0 to obtain
Σ_{k=0}^{n} lk(0) x_k^{n+1} = −(−x0)(−x1) ⋯ (−xn) = (−1)^n x0 x1 ⋯ xn.

The next example illustrates how the error formula can be used to prepare a table of data that will
ensure a specified interpolation error within a specified bound.
Example 6. Suppose a table is to be prepared for the function f(x) = e^x, for x in [0, 1]. Assume
the number of decimal places to be given per entry is d ≥ 8 and that the difference between adjacent
x-values, the step size, is h. What step size h will ensure that linear interpolation gives an absolute
error of at most 10^{−6} for all x in [0, 1]?
Sol. Let x0, x1, … be the numbers at which f is evaluated, let x be in [0, 1], and suppose i satisfies
xi ≤ x ≤ x_{i+1}.
The error in linear interpolation is
|f(x) − P(x)| = |f''(ξ)/2! · (x − xi)(x − x_{i+1})| = (|f''(ξ)|/2) |x − xi| |x − x_{i+1}|.
The step size is h, so xi = ih, x_{i+1} = (i + 1)h, and
|f(x) − P(x)| ≤ (1/2) |f''(ξ)| |(x − ih)(x − (i + 1)h)|.
Hence
|f(x) − P(x)| ≤ (1/2) max_{ξ∈[0,1]} e^ξ · max_{ih≤x≤(i+1)h} |(x − ih)(x − (i + 1)h)|
             ≤ (e/2) max_{ih≤x≤(i+1)h} |(x − ih)(x − (i + 1)h)|.
Consider the function g(x) = (x − ih)(x − (i + 1)h), for ih ≤ x ≤ (i + 1)h. Because
g'(x) = (x − (i + 1)h) + (x − ih) = 2(x − ih − h/2),
the only critical point for g is at x = ih + h/2, with g(ih + h/2) = −(h/2)^2 = −h^2/4. Since g(ih) = 0 and
g((i + 1)h) = 0, the maximum value of |g(x)| in [ih, (i + 1)h] must occur at the critical point, which
implies that
|f(x) − P(x)| ≤ (e/2) max |g(x)| = (e/2)(h^2/4) = e h^2/8.
Consequently, to ensure that the error in linear interpolation is bounded by 10^{−6}, it is sufficient
for h to be chosen so that
e h^2/8 ≤ 10^{−6}.
This implies that h < 1.72 × 10^{−3}.
Because n = (1 − 0)/h must be an integer, a reasonable choice for the step size is h = 0.001.
Example 7. Determine the step size h that can be used in the tabulation of a function f(x), a ≤ x ≤ b,
at equally spaced nodal points so that the truncation error of quadratic interpolation is less than ε.
Sol. Let x0, x1, x2 be three equispaced points with spacing h. The truncation error of quadratic
interpolation is bounded by
|E2(f; x)| ≤ (M/3!) max_{a≤x≤b} |(x − x0)(x − x1)(x − x2)|,
where M = max_{a≤x≤b} |f^(3)(x)|.
Let x = x0 + th, x1 = x0 + h, x2 = x0 + 2h; then
|(x − x0)(x − x1)(x − x2)| = h^3 |t(t − 1)(t − 2)| = h^3 g(t), say.
Now g(t) attains its extreme values when
dg/dt = 0,
which gives t = 1 ± 1/√3. For both values of t we obtain max |g(t)| = 2/(3√3).
The truncation error then satisfies
|E2(f; x)| ≤ (M h^3/6) · 2/(3√3) = h^3 M/(9√3) < ε
⟹ h < [9√3 ε/M]^{1/3}.
Algorithm (Divided-Difference):
for i = 0, 1, …, n do
  c(i) = y(i) = f(x(i))
end for
for k = 1, …, n do
  for i = n, n − 1, …, k do
    c(i) = [c(i) − c(i − 1)] / [x(i) − x(i − k)]
  end for
end for


4. Newton interpolation for equally spaced points
Let the n + 1 points x0, x1, …, xn be arranged consecutively with equal spacing h:
h = (xn − x0)/n = x_{i+1} − x_i,  i = 0, 1, …, n − 1,
so that each x_i = x0 + ih, i = 0, 1, …, n.
For any x ∈ [a, b], we can write x = x0 + sh, s ∈ R. Then x − x_i = (s − i)h.
Now the Newton interpolating polynomial is given by
Pn(x) = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] (x − x0) ⋯ (x − x_{k−1})
      = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] (s − 0)h (s − 1)h ⋯ (s − k + 1)h
      = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] s(s − 1) ⋯ (s − k + 1) h^k
      = f(x0) + Σ_{k=1}^{n} C(s, k) k! h^k f[x0, x1, …, xk],
where C(s, k) is the binomial coefficient
C(s, k) = s(s − 1) ⋯ (s − k + 1)/k!.
This formula is called the Newton forward divided difference formula.
Now we introduce the forward difference operator Δ:
Δf(xi) = f(x_{i+1}) − f(xi),
Δ^k f(xi) = Δ^{k−1}[Δf(xi)] = Δ^{k−1}[f(x_{i+1}) − f(xi)],  i = 0, 1, …, n − 1.
Using the Δ notation, we can write
f[x0, x1] = [f(x1) − f(x0)]/(x1 − x0) = Δf(x0)/h,
f[x0, x1, x2] = ( f[x1, x2] − f[x0, x1] )/(x2 − x0) = [Δf(x1)/h − Δf(x0)/h]/(2h) = Δ^2 f(x0)/(2! h^2).
In general,
f[x0, x1, …, xk] = Δ^k f(x0)/(k! h^k).
Therefore
Pn(x) = f(x0) + Σ_{k=1}^{n} C(s, k) Δ^k f(x0).
This is the Newton forward difference interpolation formula.


If the interpolation nodes are arranged in reverse order as xn, x_{n−1}, …, x0, a formula similar to the
previous result holds for the interpolating polynomial. In this case the Newton divided difference formula
can be written as
Pn(x) = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, …, x_{n−k}] (x − xn) ⋯ (x − x_{n−k+1}).
If the nodes are equally spaced with spacing
h = (xn − x0)/n,  x_i = xn − (n − i)h,  i = n, n − 1, …, 0,
let x = xn + sh. Therefore
Pn(x) = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, …, x_{n−k}] (x − xn) ⋯ (x − x_{n−k+1})
      = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, …, x_{n−k}] (s)h (s + 1)h ⋯ (s + k − 1)h
      = f(xn) + Σ_{k=1}^{n} (−1)^k C(−s, k) k! h^k f[xn, x_{n−1}, …, x_{n−k}],
where the binomial coefficient is extended to include all real values s:
C(−s, k) = (−s)(−s − 1) ⋯ (−s − k + 1)/k! = (−1)^k s(s + 1) ⋯ (s + k − 1)/k!.
This formula is called the Newton backward divided-difference formula. As with the forward difference
operator, we introduce the backward-difference operator ∇:
∇f(xi) = f(xi) − f(x_{i−1}),
∇^k f(xi) = ∇^{k−1}[∇f(xi)] = ∇^{k−1}[f(xi) − f(x_{i−1})].
Then
f[xn, x_{n−1}] = ∇f(xn)/h,
f[xn, x_{n−1}, x_{n−2}] = ∇^2 f(xn)/(2! h^2),
and in general
f[xn, x_{n−1}, x_{n−2}, …, x_{n−k}] = ∇^k f(xn)/(k! h^k).
Therefore, using the backward-difference operator, the Newton backward divided-difference formula
can be written as
Pn(x) = f(xn) + Σ_{k=1}^{n} (−1)^k C(−s, k) ∇^k f(xn).
This is the Newton backward difference interpolation formula.


Example 8. For the following data, calculate the differences and obtain the forward and backward difference polynomials. Interpolate at x = 0.25.

x     0.1   0.2   0.3   0.4   0.5
f(x)  1.40  1.56  1.76  2.00  2.28

Sol. The divided-difference table is

x     f(x)   1st d.d.   2nd d.d.   3rd d.d.
0.1   1.40
              1.6
0.2   1.56               2
              2.0                   0
0.3   1.76               2
              2.4                   0
0.4   2.00               2
              2.8
0.5   2.28

The forward difference polynomial is given by

P(x) = 1.4 + (x − 0.1)(1.6) + (x − 0.1)(x − 0.2)(2) = 2x^2 + x + 1.28.

The backward difference polynomial is given by

P(x) = 2.28 + (x − 0.5)(2.8) + (x − 0.5)(x − 0.4)(2) = 2x^2 + x + 1.28.

Now

f(0.25) ≈ P(0.25) = 1.655.
Differences and Derivatives:
Since

Δf(x) = f(x + h) − f(x)
      = [f(x) + h f'(x) + (h^2/2) f''(x) + ···] − f(x)
      = h f'(x) + O(h^2)
      ≈ h f'(x).

Similarly,

Δ^2 f(x) = f(x + 2h) − 2f(x + h) + f(x)
         = [f(x) + 2h f'(x) + ((2h)^2/2) f''(x) + ···] − 2[f(x) + h f'(x) + (h^2/2) f''(x) + ···] + f(x)
         = h^2 f''(x) + h^3 f'''(x) + ···

⟹ f''(x) ≈ Δ^2 f(x)/h^2.

Similarly we can obtain higher-order derivatives.

5. Curve Fitting: Principles of Least Squares


Least-squares, also called regression analysis, is one of the most commonly used methods in
numerical computation. Essentially it is a technique for solving a set of equations where there are more
equations than unknowns, i.e. an overdetermined set of equations. Least squares is a computational
procedure for fitting an equation to a set of experimental data points. The criterion of the best fit is
that the sum of the squares of the differences between the observed data points, (xi , yi ), and the value
calculated by the fitting equation, is minimum. The goal is to find the parameter values for the model
which best fits the data. The least squares method finds its optimum when the sum E of squared residuals

E = Σ_{i=1}^{n} e_i^2

is a minimum. A residual is defined as the difference between the actual value of the dependent variable and the value predicted by the model. Thus

e_i = y_i − f(x_i).
Least square fit of a straight line: Suppose that we are given a data set (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) of observations from an experiment. We are interested in fitting a straight line of the form y = a + bx to the given data. The residuals are given by

e_i = y_i − (a + b x_i).

Note that e_i is a function of the parameters a and b. We need to find a and b such that

E = Σ_{i=1}^{n} e_i^2

is minimum. The necessary condition for the minimum is given by

∂E/∂a = 0,   ∂E/∂b = 0.
The conditions yield

∂E/∂a = Σ_{i=1}^{n} [y_i − (a + b x_i)](−2) = 0
⟹ Σ_{i=1}^{n} y_i = na + b Σ_{i=1}^{n} x_i    (5.1)

∂E/∂b = Σ_{i=1}^{n} [y_i − (a + b x_i)](−2 x_i) = 0
⟹ Σ_{i=1}^{n} x_i y_i = a Σ_{i=1}^{n} x_i + b Σ_{i=1}^{n} x_i^2.    (5.2)

These equations (5.1)-(5.2) are called normal equations, which are to be solved to get the desired values for a and b.
Example 9. Obtain the least square straight line fit to the following data.

x     0.2    0.4    0.6    0.8   1
f(x)  0.447  0.632  0.775  0.894 1

Sol. The normal equations for fitting a straight line y = a + bx are

Σ_{i=1}^{5} f(x_i) = 5a + b Σ_{i=1}^{5} x_i,
Σ_{i=1}^{5} x_i f(x_i) = a Σ_{i=1}^{5} x_i + b Σ_{i=1}^{5} x_i^2.

From the data, we have Σ x_i = 3, Σ x_i^2 = 2.2, Σ f(x_i) = 3.748, and Σ x_i f(x_i) = 2.5224.
Therefore

5a + 3b = 3.748,   3a + 2.2b = 2.5224.

The solution of this system is a = 0.3392 and b = 0.684. The required approximation is y = 0.3392 + 0.684x.

Least square error = Σ_{i=1}^{5} [f(x_i) − (0.3392 + 0.684 x_i)]^2 = 0.00245.

Example 10. Find the least square approximation of second degree for the discrete data.

x     −2  −1  0  1  2
f(x)  15   1  1  3  19

Sol. We fit a second degree polynomial y = a + bx + cx^2.
By the principle of least squares, we minimize the function

E = Σ_{i=1}^{5} [y_i − (a + b x_i + c x_i^2)]^2.

The necessary condition for the minimum is given by

∂E/∂a = 0,  ∂E/∂b = 0,  ∂E/∂c = 0.

The normal equations for fitting a second degree polynomial are

Σ f(x_i) = 5a + b Σ x_i + c Σ x_i^2
Σ x_i f(x_i) = a Σ x_i + b Σ x_i^2 + c Σ x_i^3
Σ x_i^2 f(x_i) = a Σ x_i^2 + b Σ x_i^3 + c Σ x_i^4.

We have Σ x_i = 0, Σ x_i^2 = 10, Σ x_i^3 = 0, Σ x_i^4 = 34, Σ f(x_i) = 39, Σ x_i f(x_i) = 10, and Σ x_i^2 f(x_i) = 140.
From the given data

5a + 10c = 39
10b = 10
10a + 34c = 140.

The solution of this system is a = −37/35, b = 1, and c = 31/7.
The required approximation is y = (1/35)(−37 + 35x + 155x^2).

Example 11. Use the method of least square to fit the curve f(x) = c_0 x + c_1/√x to the data below. Also find the least square error.

x     0.2  0.3  0.5  1  2
f(x)  16   14   11   6  3

Sol. By the principle of least squares, we minimize the error

E(c_0, c_1) = Σ_{i=1}^{5} [f(x_i) − c_0 x_i − c_1/√x_i]^2.

We obtain the normal equations

c_0 Σ x_i^2 + c_1 Σ √x_i = Σ x_i f(x_i),
c_0 Σ √x_i + c_1 Σ (1/x_i) = Σ f(x_i)/√x_i.

We have Σ √x_i = 4.1163, Σ (1/x_i) = 11.8333, Σ x_i^2 = 5.38, Σ x_i f(x_i) = 24.9, and Σ f(x_i)/√x_i = 85.0151.
The normal equations are therefore

5.38 c_0 + 4.1163 c_1 = 24.9
4.1163 c_0 + 11.8333 c_1 = 85.0151,

whose solution is c_0 = −1.1836, c_1 = 7.5961.
Therefore, the least square fit is given as

f(x) = 7.5961/√x − 1.1836 x.

The least square error is given by

E = Σ_{i=1}^{5} [f(x_i) − 7.5961/√x_i + 1.1836 x_i]^2 = 1.6887.
Example 12. Obtain the least square fit of the form y = ab^x to the following data.

x     1    2    3    4    5    6    7    8
f(x)  1.0  1.2  1.8  2.5  3.6  4.7  6.6  9.1

Sol. Taking logarithms, the curve y = ab^x takes the form Y = A + Bx, where Y = log y, A = log a and B = log b.
Hence the normal equations are given by

Σ Y_i = 8A + B Σ x_i,
Σ x_i Y_i = A Σ x_i + B Σ x_i^2.

From the data, we form the following table.

x    y     Y = log y   xY        x^2
1    1.0   0.0         0.0       1
2    1.2   0.0792      0.1584    4
3    1.8   0.2553      0.7659    9
4    2.5   0.3979      1.5916    16
5    3.6   0.5563      2.7815    25
6    4.7   0.6721      4.0326    36
7    6.6   0.8195      5.7365    49
8    9.1   0.9590      7.6720    64
36   30.5  3.7393      22.7385   204

Putting in the values, we obtain

8A + 36B = 3.7393,   36A + 204B = 22.7385
⟹ A = −0.1656,  B = 0.1407
⟹ a = 0.68,  b = 1.38.

The required curve is y = (0.68)(1.38)^x.

Example 13. We are given the values of a function of the variable t. Obtain a least square fit of the form f = a e^{−3t} + b e^{−2t}.

t     0.1   0.2   0.3   0.4
f(t)  0.76  0.58  0.44  0.35

Sol. Using the method of least square, we minimize the error

E = Σ_{i=1}^{4} [f_i − (a e^{−3t_i} + b e^{−2t_i})]^2

and obtain the normal equations

∂E/∂a = −2 Σ_{i=1}^{4} (f_i − a e^{−3t_i} − b e^{−2t_i}) e^{−3t_i} = 0
∂E/∂b = −2 Σ_{i=1}^{4} (f_i − a e^{−3t_i} − b e^{−2t_i}) e^{−2t_i} = 0,

that is,

a Σ e^{−6t_i} + b Σ e^{−5t_i} − Σ f_i e^{−3t_i} = 0
a Σ e^{−5t_i} + b Σ e^{−4t_i} − Σ f_i e^{−2t_i} = 0.

Using the table values, we obtain the system of equations

1.106023 a + 1.332876 b − 1.16542 = 0
1.332876 a + 1.762274 b − 1.409764 = 0,

which has the solution a = 0.6853, b = 0.3058.
Therefore the least square fit is given by

f(t) = 0.6853 e^{−3t} + 0.3058 e^{−2t}.
Remark 5.1. If the data values are quite large, we can make them small by shifting the origin and appropriately scaling the variables.

Example 14. Show that the line of fit to the following data is given by y = 0.7x + 11.28.

x  0   5   10  15  20  25
y  12  15  17  22  24  30

Sol. Here n = 6. We fit a line of the form y = A + Bx.
Let u = (x − 15)/5, v = y − 20, and fit a line of the form v = a + bu.

x    y    u    v    uv   u^2
0    12   −3   −8   24   9
5    15   −2   −5   10   4
10   17   −1   −3   3    1
15   22   0    2    0    0
20   24   1    4    4    1
25   30   2    10   20   4
          −3   0    61   19

The normal equations are

0 = 6a − 3b
61 = −3a + 19b.

By solving, a = 1.7428 and b = 3.4857.
Therefore the equation of the line is v = 1.7428 + 3.4857u.
Changing to the original variables, we obtain

y − 20 = 1.7428 + 3.4857 (x − 15)/5
⟹ y = 11.2857 + 0.6971x.
Exercises
(1) Find the unique polynomial P(x) of degree 2 or less such that
P(1) = 1, P(3) = 27, P(4) = 64,
using Lagrange and Newton interpolation. Evaluate P(1.05).
(2) Let P_3(x) be the Lagrange interpolating polynomial for the data (0, 0), (0.5, y), (1, 3) and (2, 2). Find y if the coefficient of x^3 in P_3(x) is 6.
(3) Calculate a quadratic interpolate in Newton form to e^{0.826} from the function values
e^{0.82} = 2.270500, e^{0.83} = 2.293319, e^{0.84} = 2.316367.
(4) Let f(x) = ln(1 + x), x_0 = 1, x_1 = 1.1. Use Lagrange linear interpolation to find the approximate value of f(1.04) and obtain a bound on the truncation error.
(5) Use the following values and four-digit rounding arithmetic to construct a third degree Lagrange polynomial approximation to f(1.09). The function being approximated is f(x) = log10(tan x). Use this knowledge to find a bound for the error in the approximation.
f(1.00) = 0.1924, f(1.05) = 0.2414, f(1.10) = 0.2933, f(1.15) = 0.3492.

(6) Determine the step size h that can be used in the tabulation of a function f(x), a ≤ x ≤ b, at equally spaced nodal points so that the truncation error of the cubic interpolation is less than ε.
(7) If linear interpolation is used to interpolate the error function
f(x) = (2/√π) ∫_0^x e^{−t^2} dt,
show that the error of linear interpolation using data (x_0, f_0) and (x_1, f_1) cannot exceed (x_1 − x_0)^2 / (2√(2πe)).
(8) Suppose that f(x) = e^x cos x is to be approximated on [0, 1] by an interpolating polynomial on n + 1 equally spaced points. Determine n so that the truncation error will be less than 0.0001 in this interval.
(9) The following data represent the function f(x) = e^x.
x     1       1.5     2.0     2.5
f(x)  2.7183  4.4817  7.3891  12.1825
Estimate the value of f(2.25) using the Newton forward and backward difference interpolation. Compare with the exact value. Also obtain the bound of the truncation error.
(10) Construct the interpolating polynomial that fits the following data using Newton forward and backward difference interpolation.
x     0    0.1   0.2   0.3   0.4   0.5
f(x)  1.5  1.27  0.98  0.63  0.22  −0.25
Hence find the values of f(x) at x = 0.15 and 0.45.
(11) The error function erf(x) is defined by the integral
erf(x) = (2/√π) ∫_0^x e^{−t^2} dt.
(A) Approximate erf(0.08) by linear interpolation in the given table of correctly rounded values. Estimate the total error.
x       0.05     0.10     0.15     0.20
erf(x)  0.05637  0.11246  0.16800  0.22270
(B) Suppose that the table were given with 7 correct decimals and with the step size 0.001. Find the maximum total error for linear interpolation in the interval 0 ≤ x ≤ 0.10 in this table.
(12) Determine the spacing h in a table of equally spaced values of the function f(x) = √x between 1 and 2, so that interpolation with a quadratic polynomial will yield an accuracy of 5 × 10^{−8}.
(13) The following data are parts of a table for the function g(x) = sin x / x^2.
x     0.1     0.2     0.3     0.4     0.5
g(x)  9.9833  4.9667  3.2836  2.4339  1.9177
Calculate g(0.25) as accurately as possible
(a) by interpolating directly in this table, (b) by first calculating x g(x) and then interpolating in that table, (c) explain the difference between the results obtained in (a) and (b), respectively.
(14) By the method of least square, fit a curve of the form y = ax^b to the following data.
x  2     3     4    5
y  27.8  62.1  110  161
(15) Determine the least squares approximation of the type ax^2 + bx + c to the function 2^x at the points x_i = 0, 1, 2, 3, 4.
(16) Experiments with a periodic process gave the following data:
t  0      50     100    150    200
y  0.754  1.762  2.041  1.412  0.303
Estimate the parameters a and b in the model y = a + b sin t, using the least square approximation.


Bibliography
[Gerald]   Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analysis, 7th edition, Pearson, 2003.
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.

CHAPTER 5 (4 LECTURES)
NUMERICAL INTEGRATION

1. Introduction
The general problem is to find the approximate value of the integral of a given function f(x) over an interval [a, b]. Thus

I = ∫_a^b f(x) dx.    (1.1)

The problem can be solved by using the Fundamental Theorem of Calculus by finding an anti-derivative F of f, that is, F'(x) = f(x), and then

∫_a^b f(x) dx = F(b) − F(a).

But finding an anti-derivative is not an easy task in general. Hence, it is certainly not a good approach for numerical computations.
In this chapter we'll study methods for finding integration rules. We'll also consider composite versions of these rules and the errors associated with them.
2. Elements of numerical integration
The basic method involved in approximating the integration is called numerical quadrature and uses a sum of the type

∫_a^b f(x) dx ≈ Σ_{i=0}^{n} λ_i f(x_i).    (2.1)

The method of quadrature is based on polynomial interpolation. We divide the interval [a, b] into a set of distinct nodes {x_0, x_1, x_2, ..., x_n}. Then we approximate the function f(x) by an interpolating polynomial, say the Lagrange interpolating polynomial, i.e.

f(x) = P_n(x) + e_n(x)
     = Σ_{i=0}^{n} f(x_i) l_i(x) + (f^{(n+1)}(ξ)/(n + 1)!) Π_{i=0}^{n} (x − x_i).

Here ξ = ξ(x) ∈ (a, b) and

l_i(x) = Π_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j),  0 ≤ i ≤ n.

Therefore

∫_a^b f(x) dx = ∫_a^b P_n(x) dx + ∫_a^b e_n(x) dx
             = Σ_{i=0}^{n} f(x_i) ∫_a^b l_i(x) dx + (1/(n + 1)!) ∫_a^b f^{(n+1)}(ξ) Π_{i=0}^{n} (x − x_i) dx
             = Σ_{i=0}^{n} λ_i f(x_i) + E_n

where

λ_i = ∫_a^b l_i(x) dx.


The error in the numerical quadrature is given by

E_n = (1/(n + 1)!) ∫_a^b f^{(n+1)}(ξ) Π_{i=0}^{n} (x − x_i) dx.

If |f^{(n+1)}(ξ)| ≤ M, then

|E_n| ≤ (M/(n + 1)!) ∫_a^b |Π_{i=0}^{n} (x − x_i)| dx.

We can also use Newton divided-difference interpolation to approximate the function f(x).
Before we proceed, we define an alternative method to analyze the error, based on the method of undetermined coefficients.

Definition 2.1. An integration method of the form

∫_a^b f(x) dx = Σ_{i=0}^{n} λ_i f(x_i) + (1/(n + 1)!) ∫_a^b f^{(n+1)}(ξ) Π_{i=0}^{n} (x − x_i) dx

is said to be of order p if it provides exact results for all polynomials of degree less than or equal to p.

Now, if the above method gives exact results for polynomials of degree less than or equal to n, then the error term is zero for all polynomials of degree ≤ n. If |f^{(n+1)}(ξ)| ≤ M, the error term can be written as

|E_n| ≤ (M/(n + 1)!) ∫_a^b |Π_{i=0}^{n} (x − x_i)| dx ≤ (C/(n + 1)!) M.

For f(x) = x^{n+1},

∫_a^b x^{n+1} dx = Σ_{i=0}^{n} λ_i x_i^{n+1} + (C/(n + 1)!) (n + 1)!

⟹ C = ∫_a^b x^{n+1} dx − Σ_{i=0}^{n} λ_i x_i^{n+1}.

The number C is called the error constant. Using this notation, we can write the error term as

E_n = (C/(n + 1)!) f^{(n+1)}(ξ).

3. Newton-Cotes Formula
Let all nodes be equally spaced with spacing h = (b − a)/n. The number h is also called the step length. Let x_0 = a and x_n = b; then x_i = a + ih, i = 0, 1, ..., n.
The general quadrature formula is given by

∫_a^b f(x) dx = Σ_{i=0}^{n} λ_i f(x_i) + E_n.

This formula is called a Newton-Cotes formula if all points are equally spaced.

Let x = a + ht, t ∈ R. Now

l_i(x) = Π_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j)
       = Π_{j=0, j≠i}^{n} [(a + ht) − (a + jh)] / [(a + ih) − (a + jh)]
       = Π_{j=0, j≠i}^{n} (t − j)/(i − j).

Therefore

λ_i = ∫_a^b l_i(x) dx = h ∫_0^n Π_{j=0, j≠i}^{n} (t − j)/(i − j) dt    (dx = h dt).

For n = 1: x_0 = a, x_1 = b, h = b − a, and we use linear interpolation. The values of the multipliers are given by

λ_0 = h ∫_0^1 (t − 1)/(0 − 1) dt = h/2,
λ_1 = h ∫_0^1 (t − 0)/(1 − 0) dt = h/2.

Hence

∫_a^b f(x) dx ≈ λ_0 f(x_0) + λ_1 f(x_1) = (h/2)[f(a) + f(b)].
This is called the Trapezoidal rule. Now the error is given by

E_1 = (1/2) ∫_a^b f''(ξ)(x − a)(x − b) dx.

Since (x − a)(x − b) does not change its sign in [a, b], by the Weighted Mean-Value Theorem there exists η ∈ (a, b) such that

E_1 = (1/2) f''(η) ∫_a^b (x − a)(x − b) dx
    = (1/2) f''(η) (−(b − a)^3/6)
    = −(h^3/12) f''(η).

The Trapezoidal rule (with error) is given by

∫_a^b f(x) dx = (h/2)[f(a) + f(b)] − (h^3/12) f''(η).

Geometrically, the approximation is the area of the trapezium (trapezoid) with width h and parallel sides f(a) and f(b).
For n = 2: we take x_0 = a, x_1 = (a + b)/2, x_2 = b, with h = (b − a)/2. The values of the multipliers are given by

λ_0 = h ∫_0^2 (t − 1)(t − 2)/[(0 − 1)(0 − 2)] dt = h/3
λ_1 = h ∫_0^2 (t − 0)(t − 2)/[(1 − 0)(1 − 2)] dt = 4h/3
λ_2 = h ∫_0^2 (t − 0)(t − 1)/[(2 − 0)(2 − 1)] dt = h/3.

Hence

∫_a^b f(x) dx ≈ λ_0 f(x_0) + λ_1 f(x_1) + λ_2 f(x_2) = (h/3)[f(a) + 4f((a + b)/2) + f(b)].

This is Simpson's 1/3 rule.


To calculate the error in Simpson's rule, we use the alternative method, making the method exact for all polynomials of degree up to 2, so that

E_2 = (C/3!) f^{(3)}(ξ).

Since Simpson's rule is exact for polynomials of degree up to 2,

C = ∫_a^b x^3 dx − (h/3)[a^3 + 4((a + b)/2)^3 + b^3] = 0.

This implies that Simpson's rule is exact for polynomials up to degree 3 also. Therefore the error is given by

E_3 = (C/4!) f^{(4)}(ξ),
C = ∫_a^b x^4 dx − (h/3)[a^4 + 4((a + b)/2)^4 + b^4] = −(b − a)^5/120.

Hence the error in Simpson's rule is given by

E_3 = −((b − a)^5/(120 · 4!)) f^{(4)}(ξ) = −(h^5/90) f^{(4)}(ξ).

For n = 3: there are four nodal points a = x_0, x_1, x_2, x_3 = b with h = (b − a)/3. We get Simpson's 3/8 rule

∫_a^b f(x) dx ≈ (3h/8)[f(x_0) + 3f(x_1) + 3f(x_2) + f(x_3)].

The error in Simpson's three-eighths rule is given by

E_4 = −(3/80) h^5 f^{(4)}(ξ).
Example 1. Find the value of the integral

I = ∫_0^1 dx/(1 + x)

using the trapezoidal and Simpson's rules. Also obtain a bound on the errors. Compare with the exact value.

Sol. Here f(x) = 1/(1 + x).
By the trapezoidal rule with a = 0, b = 1, h = b − a = 1:

I_T = (h/2)[f(a) + f(b)] = (1/2)[1 + 1/2] = 0.75.

Exact value: I_exact = ln 2 = 0.693147.
Error = |0.75 − 0.693147| = 0.056853.
The error bound for the trapezoidal rule is given by

|E_1| ≤ (h^3/12) max_{0≤x≤1} |f''(x)| = (1/12) max_{0≤x≤1} 2/(1 + x)^3 = 1/6.

Similarly, by using Simpson's rule with h = (b − a)/2 = 1/2, we obtain

I_S = (h/3)[f(0) + 4f(1/2) + f(1)] = (1/6)(1 + 8/3 + 1/2) = 0.69444.

Error = |0.69444 − 0.693147| = 0.001297.
The error bound for Simpson's rule is given by

|E_3| ≤ (h^5/90) max_{0≤x≤1} |f^{(4)}(x)| = (1/2880) max_{0≤x≤1} 24/(1 + x)^5 = 0.008333.

Example 2. Find the quadrature formula, by the method of undetermined coefficients,

∫_0^1 f(x)/√(x(1 − x)) dx = λ_1 f(0) + λ_2 f(1/2) + λ_3 f(1)

which is exact for polynomials of highest possible degree. Then use the formula to evaluate

∫_0^1 dx/√(x − x^3)

and compare with the exact value.

Sol. We make the method exact for polynomials up to degree 2:

f(x) = 1:   I_1 = ∫_0^1 dx/√(x(1 − x)) = λ_1 + λ_2 + λ_3
f(x) = x:   I_2 = ∫_0^1 x dx/√(x(1 − x)) = (1/2)λ_2 + λ_3
f(x) = x^2: I_3 = ∫_0^1 x^2 dx/√(x(1 − x)) = (1/4)λ_2 + λ_3.

Now, substituting t = 2x − 1 (so that 4x(1 − x) = 1 − t^2),

I_1 = ∫_0^1 dx/√(x(1 − x)) = ∫_{−1}^{1} dt/√(1 − t^2) = [sin^{−1} t]_{−1}^{1} = π.

Similarly, I_2 = π/2 and I_3 = 3π/8. Therefore

λ_1 + λ_2 + λ_3 = π
(1/2)λ_2 + λ_3 = π/2
(1/4)λ_2 + λ_3 = 3π/8.

By solving these equations, we obtain λ_1 = π/4, λ_2 = π/2, λ_3 = π/4. Hence

∫_0^1 f(x)/√(x(1 − x)) dx = (π/4)[f(0) + 2f(1/2) + f(1)].

Now

I = ∫_0^1 dx/√(x − x^3) = ∫_0^1 dx/(√(1 + x) √(x(1 − x)))

which is of the above form with f(x) = 1/√(1 + x). By using the formula, we obtain

I = (π/4)[1 + 2√2/√3 + 1/√2] = 2.62331.

The exact value of the integral is

I = 2.6220575.
4. Gauss Quadrature
In a numerical integration method, if both the nodes x_i and the multipliers λ_i are unknown, then the method is called Gaussian quadrature. We can obtain the unknowns by making the method exact for polynomials of degree as high as possible. The formulas are derived for the interval [−1, 1]; any interval [a, b] can be transformed to [−1, 1] by the transformation x = At + B, which gives a = −A + B and b = A + B, and after solving we get

x = ((b − a)/2) t + (b + a)/2.

We consider

∫_{−1}^{1} w(x) f(x) dx = Σ_{i=0}^{n} λ_i f(x_i)

where w(x) is an appropriate weight function.

Gauss-Legendre Integration Methods: In this integration procedure we take w(x) = 1. The integration formula is given by

∫_{−1}^{1} f(x) dx = Σ_{i=0}^{n} λ_i f(x_i).

One-point formula: The formula is given by

∫_{−1}^{1} f(x) dx = λ_0 f(x_0).

The method has two unknowns, λ_0 and x_0. Making the method exact for f(x) = 1, x, we obtain

f(x) = 1: ∫_{−1}^{1} dx = 2 = λ_0
f(x) = x: ∫_{−1}^{1} x dx = 0 = λ_0 x_0 ⟹ x_0 = 0.

Therefore the one-point formula is given by

∫_{−1}^{1} f(x) dx = 2f(0).

The error in the approximation is given by

E_1 = (C/2!) f''(ξ)

where the error constant C is given by

C = ∫_{−1}^{1} x^2 dx − 2 · 0^2 = 2/3.

Hence

E_1 = (1/3) f''(ξ),  −1 < ξ < 1.

Two-point formula:

∫_{−1}^{1} f(x) dx = λ_0 f(x_0) + λ_1 f(x_1).

The method has four unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, we obtain

f(x) = 1:   ∫_{−1}^{1} dx = 2 = λ_0 + λ_1    (4.1)
f(x) = x:   ∫_{−1}^{1} x dx = 0 = λ_0 x_0 + λ_1 x_1    (4.2)
f(x) = x^2: ∫_{−1}^{1} x^2 dx = 2/3 = λ_0 x_0^2 + λ_1 x_1^2    (4.3)
f(x) = x^3: ∫_{−1}^{1} x^3 dx = 0 = λ_0 x_0^3 + λ_1 x_1^3    (4.4)

Now eliminate λ_0 from the second and fourth equations:

λ_1 x_1^3 − λ_1 x_1 x_0^2 = 0,

which gives

λ_1 x_1 (x_1 − x_0)(x_1 + x_0) = 0.

Since λ_1 ≠ 0, x_0 ≠ x_1 and x_1 ≠ 0 (if x_1 = 0 then by the second equation x_0 = 0), we must have x_1 = −x_0. Substituting in the second equation, we obtain λ_0 = λ_1, and by substituting these values in the first equation we get λ_0 = λ_1 = 1.
The third equation gives x_0^2 = 1/3, so x_0 = −1/√3 and x_1 = 1/√3.
Therefore, the two-point formula is given by

∫_{−1}^{1} f(x) dx = f(−1/√3) + f(1/√3).

The error is given by

E_3 = (C/4!) f^{(4)}(ξ)

with

C = ∫_{−1}^{1} x^4 dx − [(−1/√3)^4 + (1/√3)^4] = 2/5 − 2/9 = 8/45.

The error in the two-point formula is therefore

E_3 = (1/135) f^{(4)}(ξ),  −1 < ξ < 1.
Three-point formula: By taking n = 2, we obtain

∫_{−1}^{1} f(x) dx = λ_0 f(x_0) + λ_1 f(x_1) + λ_2 f(x_2).

The method has six unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, x^4, x^5, we obtain

f(x) = 1:   2 = λ_0 + λ_1 + λ_2
f(x) = x:   0 = λ_0 x_0 + λ_1 x_1 + λ_2 x_2
f(x) = x^2: 2/3 = λ_0 x_0^2 + λ_1 x_1^2 + λ_2 x_2^2
f(x) = x^3: 0 = λ_0 x_0^3 + λ_1 x_1^3 + λ_2 x_2^3
f(x) = x^4: 2/5 = λ_0 x_0^4 + λ_1 x_1^4 + λ_2 x_2^4
f(x) = x^5: 0 = λ_0 x_0^5 + λ_1 x_1^5 + λ_2 x_2^5

By solving these equations, we obtain x_0 = −√(3/5), x_1 = 0, x_2 = √(3/5), λ_0 = λ_2 = 5/9 and λ_1 = 8/9.
Therefore the formula is given by

∫_{−1}^{1} f(x) dx = (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))].

The error in the three-point formula is given by

E_5 = (C/6!) f^{(6)}(ξ)

where

C = ∫_{−1}^{1} x^6 dx − (1/9)[5(−√(3/5))^6 + 8 · 0 + 5(√(3/5))^6] = 2/7 − 6/25 = 8/175.

Hence

E_5 = (1/15750) f^{(6)}(ξ),  −1 < ξ < 1.

Example 3. Evaluate

I = ∫_1^2 2x/(1 + x^4) dx

using the Gauss-Legendre 1- and 2-point formulas. Also compare with the exact value.

Sol. First we change the interval [1, 2] into [−1, 1] by taking x = (t + 3)/2, so that dx = dt/2. Then

I = ∫_1^2 2x/(1 + x^4) dx = ∫_{−1}^{1} 8(t + 3)/(16 + (t + 3)^4) dt.

Let

f(t) = 8(t + 3)/(16 + (t + 3)^4).

By the 1-point formula,

I ≈ 2f(0) = 0.4948.

By the 2-point formula,

I ≈ f(−1/√3) + f(1/√3) = 0.5434.

The exact value of the integral is

I = ∫_1^2 2x/(1 + x^4) dx = tan^{−1} 4 − π/4 = 0.5404.
Example 4. Evaluate

I = ∫_{−1}^{1} (1 − x^2)^{3/2} cos x dx

using the Gauss-Legendre 3-point formula.

Sol. Using the Gauss-Legendre 3-point formula with f(x) = (1 − x^2)^{3/2} cos x, we obtain

I ≈ (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))]
  = (1/9)[10 (2/5)^{3/2} cos(√(3/5)) + 8]
  = 1.08979.
5. Composite Integration
As the order of an integration method is increased, the order of the derivative involved in the error term also increases. Therefore, we can use a higher-order method only if the integrand is differentiable up to the required degree. Alternatively, we can apply lower-order methods by dividing the whole interval into subintervals and then using a Newton-Cotes or Gauss quadrature method on each subinterval separately.
Composite Trapezoidal Method: We divide the interval [a, b] into N subintervals with step size

h = (b − a)/N, taking nodal points a = x_0 < x_1 < ··· < x_N = b where x_i = x_0 + ih, i = 1, 2, ..., N − 1.
Now

I = ∫_a^b f(x) dx = ∫_{x_0}^{x_1} f(x) dx + ∫_{x_1}^{x_2} f(x) dx + ··· + ∫_{x_{N−1}}^{x_N} f(x) dx.

Using the trapezoidal rule for each of the integrals on the right side, we obtain

I ≈ (h/2)[(f_0 + f_1) + (f_1 + f_2) + ··· + (f_{N−1} + f_N)]
  = (h/2)[f_0 + 2(f_1 + f_2 + ··· + f_{N−1}) + f_N]

where f_i = f(x_i), i = 0, 1, ..., N. This formula is the composite trapezoidal rule. The error in the composite integration is given by

E = −(h^3/12)[f''(ξ_1) + f''(ξ_2) + ··· + f''(ξ_N)]

where x_{i−1} ≤ ξ_i ≤ x_i, i = 1, 2, ..., N. The error in the numerical approximation decreases as N increases, since h = (b − a)/N.
Composite Simpson's Method: Simpson's rule requires three abscissas. We divide the interval [a, b] into 2N subintervals (to get an odd number of abscissas) with step size h = (b − a)/(2N), taking nodal points a = x_0 < x_1 < ··· < x_{2N} = b where x_i = x_0 + ih, i = 1, 2, ..., 2N − 1. We write

I = ∫_a^b f(x) dx = ∫_{x_0}^{x_2} f(x) dx + ∫_{x_2}^{x_4} f(x) dx + ··· + ∫_{x_{2N−2}}^{x_{2N}} f(x) dx.

Using Simpson's rule for each of the integrals on the right side, we obtain

I ≈ (h/3)[(f_0 + 4f_1 + f_2) + (f_2 + 4f_3 + f_4) + ··· + (f_{2N−2} + 4f_{2N−1} + f_{2N})]
  = (h/3)[f_0 + 4(f_1 + f_3 + ··· + f_{2N−1}) + 2(f_2 + f_4 + ··· + f_{2N−2}) + f_{2N}].

This formula is called the composite Simpson's rule. The error in the integration rule is given by

E = −(h^5/90)[f^{(4)}(ξ_1) + f^{(4)}(ξ_2) + ··· + f^{(4)}(ξ_N)]

where x_{2i−2} ≤ ξ_i ≤ x_{2i}, i = 1, 2, ..., N.

Example 5. Evaluate the integral

I = ∫_0^1 dx/(1 + x)

by using the composite trapezoidal and Simpson's rules with 2 and 4 subintervals.

Sol. Let I_T and I_S represent the values of the integral by the composite trapezoidal and composite Simpson's rules, respectively.
Case I: Number of subintervals N = 2, so h = (b − a)/N = 1/2. We have two subintervals for the trapezoidal rule and one interval for Simpson's rule:

I_T = (1/4)[f(0) + 2f(1/2) + f(1)] = 0.70833.
I_S = (1/6)[f(0) + 4f(1/2) + f(1)] = 0.69444.

Case II: Number of subintervals N = 4, so h = 1/4. We have four subintervals for the trapezoidal rule and two subintervals for Simpson's rule:

I_T = (1/8)[f(0) + 2(f(1/4) + f(1/2) + f(3/4)) + f(1)] = 0.69702.
I_S = (1/12)[f(0) + 4f(1/4) + 2f(1/2) + 4f(3/4) + f(1)] = 0.69325.
Example 6. Evaluate

I = ∫_0^1 dx/(1 + x)

by subdividing the interval [0, 1] into two equal parts and then using the Gauss-Legendre three-point formula on each part.

Sol. Recall

∫_{−1}^{1} f(x) dx ≈ (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))].

Let

I = ∫_0^1 dx/(1 + x) = ∫_0^{1/2} dx/(1 + x) + ∫_{1/2}^{1} dx/(1 + x) = I_1 + I_2.

Now substitute x = (t + 1)/4 in I_1 and x = (z + 3)/4 in I_2 to change the limits to [−1, 1]; then dx = dt/4 and dx = dz/4, respectively. Therefore

I_1 = ∫_{−1}^{1} dt/(t + 5) = (1/9)[5/(5 − √(3/5)) + 8/5 + 5/(5 + √(3/5))] = 0.405464

I_2 = ∫_{−1}^{1} dz/(z + 7) = (1/9)[5/(7 − √(3/5)) + 8/7 + 5/(7 + √(3/5))] = 0.287682.

Hence

I = I_1 + I_2 = 0.405464 + 0.287682 = 0.693146.
Example 7. The area A inside the closed curve y^2 + x^2 = cos x is given by

A = 4 ∫_0^α (cos x − x^2)^{1/2} dx

where α is the positive root of the equation cos x = x^2.
(a) Compute α to three correct decimals.
(b) Use the trapezoidal rule to compute the area A with an absolute error less than 0.05.

Sol. (a) Using Newton's method to find the root of the equation

f(x) = cos x − x^2 = 0,

we obtain the iteration scheme

x_{k+1} = x_k + (cos x_k − x_k^2)/(sin x_k + 2x_k),  k = 0, 1, 2, ...

Starting with x_0 = 0.5, we obtain

x_1 = 0.5 + 0.62758/1.47942 = 0.92420
x_2 = 0.92420 − 0.25169/2.64655 = 0.82911
x_3 = 0.82911 − 0.011882/2.39554 = 0.82414
x_4 = 0.82414 − 0.000033/2.38226 = 0.82413.

Hence the value of α correct to three decimals is 0.824.

(b) Substituting the value of α, we obtain

A = 4 ∫_0^{0.824} (cos x − x^2)^{1/2} dx.

Using the composite trapezoidal method with h = 0.824, 0.412, and 0.206 respectively, we obtain the following approximations of the area A:

A = (4(0.824)/2)[1 + 0.017753] = 1.67725
A = (4(0.412)/2)[1 + 2(0.864047) + 0.017753] = 2.262578
A = (4(0.206)/2)[1 + 2(0.967688 + 0.864047 + 0.658115) + 0.017753] = 2.470951.
2
Algorithm (Composite Trapezoidal Method):

Given a function f(x):
  Input a, b = endpoints of interval, n = number of subintervals.
  Set h = (b − a)/n.
  Set sum = 0.
  For i = 1 to n − 1 do
    Set x = a + h·i
    Set sum = sum + 2 f(x)
  End do (For i).
  Set sum = sum + f(a) + f(b).
  Set ans = sum · h/2.

The value of the integral is given by ans.


Algorithm (Composite Simpson's Method):

INPUT endpoints a, b; even positive integer n.
OUTPUT approximation XI to the given integral I.
  Set h = (b − a)/n.
  Set XI0 = f(a) + f(b);
      XI1 = 0;  (summation of f(x_{2i−1}))
      XI2 = 0.  (summation of f(x_{2i}))
  For i = 1, 2, ..., n − 1 do Steps 1 and 2.
    Step 1: Set X = a + ih.
    Step 2: If i is even then set XI2 = XI2 + f(X)
            else set XI1 = XI1 + f(X).
  Set XI = h(XI0 + 2·XI2 + 4·XI1)/3.
Print the output XI.
Exercises

(1) Given

I = ∫ x e^x dx.

Approximate the value of I using the trapezoidal and Simpson's one-third methods. Also obtain the error bounds and compare with the exact value of the integral.
(2) Evaluate

I = ∫_0^1 dx/(1 + x^2)

using the trapezoidal and Simpson's rules with 4 and 6 subintervals. Compare with the exact value of the integral.


(3) Compute

I_p = ∫_0^1 x^p/(x^3 + 10) dx

for p = 0, 1 using the trapezoidal and Simpson's rules with 3, 5 and 9 nodes.
(4) The length of the curve represented by a function y = f(x) on an interval [a, b] is given by the integral

I = ∫_a^b √(1 + [f'(x)]^2) dx.

Use the trapezoidal rule and Simpson's rule with 4 and 8 subintervals to compute the length of the curve y = tan^{−1}(1 + x^2), 0 ≤ x ≤ 2.
(5) Evaluate the integral

∫_{−1}^{1} e^{−x^2} cos x dx

by using the one- and two-point Gauss-Legendre formulas. Also obtain the bound on the error for the one-point formula.
(6) Evaluate

∫_2^3 cos 2x/(1 + sin x) dx

by using the two- and three-point Gauss-Legendre integration formulas.
(7) Determine the values of a, b, and c such that the formula

∫_0^h f(x) dx = h[a f(0) + b f(h/3) + c f(h)]

is exact for polynomials of degree as high as possible. Also obtain the order of the truncation error.
(8) Determine constants a, b, c, and d that will produce a quadrature formula

∫_{−1}^{1} f(x) dx = a f(−1) + b f(1) + c f'(−1) + d f'(1)

that has degree of precision 3.
(9) Compute by Gaussian quadrature

∫_0^1 ln(x + 1)/√(x(1 − x)) dx.

(10) Evaluate

I = ∫_0^1 sin x/(2 + x) dx

by subdividing the interval [0, 1] into two equal parts and then using the Gauss-Legendre two-point formula.
(11) The equation

(1/√(2π)) ∫_0^x e^{−t^2/2} dt = 0.45

can be solved for x by applying Newton's method to the function

f(x) = (1/√(2π)) ∫_0^x e^{−t^2/2} dt − 0.45,  with  f'(x) = (1/√(2π)) e^{−x^2/2}.

Note that Newton's method requires the evaluation of f(x_k) at various x_k, which can be estimated using a quadrature formula. Find a solution of f(x) = 0 with error no more than 10^{−5} using Newton's method, starting with x_0 = 0.5 and using the composite Simpson's rule.


(12) Determine the coefficients in the formula

∫_0^{2h} x^{−1/2} f(x) dx = (2h)^{1/2}[A_0 f(0) + A_1 f(h) + A_2 f(2h)] + R

and calculate the remainder R when f^{(3)}(x) is constant.
(13) In a quadrature formula

∫_{−1}^{1} (a − x) f(x) dx = A_{−1} f(−x_1) + A_0 f(0) + A_1 f(x_1) + E

the coefficients A_{−1}, A_0, and A_1 are functions of the parameter a, x_1 is a constant, and the error E is of the form C f^{(k)}(ξ). Determine the values of all four parameters so that the error will be of the highest possible order. Also investigate whether the order of the error is influenced by different values of the parameter a.
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
[Jain]     M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, 6th edition, New Age International Publishers, New Delhi, 2012.

CHAPTER 6 (4 LECTURES)
NUMERICAL SOLUTIONS OF ORDINARY DIFFERENTIAL EQUATIONS

1. Introduction
In this chapter, we discuss numerical methods for solving ordinary differential equations in initial-value problems (IVP) of the form

dy/dx = f(x, y),  x ∈ R,  y(x_0) = y_0    (1.1)

where y is a function of x, f is a function of x and y, and x_0 is called the initial point. The numerical values of y(x) on an interval containing x_0 are to be determined.
We divide the domain [a, b] into subintervals by

a = x_0 < x_1 < ··· < x_N = b.

These points are called mesh points or grid points. Let the spacing be equal, say h. The uniform mesh points are then given by x_i = x_0 + ih, i = 0, 1, 2, ... The set of values y_0, y_1, ..., y_N is the numerical solution of the initial-value problem (IVP).
2. Existence and Uniqueness of Solutions
Theorem 2.1. If f(x, y) is continuous on a region Ω where

Ω = {(x, y) : |x − x_0| ≤ a, |y − y_0| ≤ b},

then the IVP (1.1) has a solution y(x) for |x − x_0| ≤ min{a, b/M}, where M = max_{(x,y)∈Ω} |f(x, y)|.

Theorem 2.2. If f(x, y) and ∂f/∂y are continuous on a region Ω, then the IVP (1.1) has a unique solution y(x) in the interval |x − x_0| ≤ min{a, b/M}, where M = max_{(x,y)∈Ω} |f(x, y)|.

Theorem 2.3. If f(x, y) is continuous in a ≤ x ≤ b, −∞ < y < ∞, and

|f(x, y_1) − f(x, y_2)| ≤ L|y_1 − y_2|

for some positive constant L (which means f is Lipschitz continuous in y), then the IVP (1.1) has a unique solution in the interval [a, b].
3. Numerical methods for IVP
3.1. Euler's Method: This is one of the simplest methods to solve the IVP. Consider

dy/dx = f(x, y),  y(x_0) = y_0.

Assuming that all nodes x_i are equally spaced with spacing h, we can approximate the derivative dy/dx by

y'(x) ≈ (1/h)[y(x + h) − y(x)].

Applying this approximation to the given IVP at the point x = x_i, where y'(x_i) = f(x_i, y_i), gives

(1/h)[y(x_{i+1}) − y(x_i)] ≈ f(x_i, y_i)
⟹ y(x_{i+1}) ≈ y(x_i) + h f(x_i, y_i).

We can write

x_{i+1} = x_i + h
y_{i+1} = y_i + h f(x_i, y_i)

where y_i ≈ y(x_i). This is called Euler's method.
3.2. The Improved or Modified Euler's Method. A better approximation of the slope is the average of the two slopes at the points (x_i, y_i) and (x_{i+1}, y_{i+1}), so we can write the method as

y_{i+1} = y_i + (h/2)[f(x_i, y_i) + f(x_{i+1}, y_{i+1})].

This method falls in the category of predictor-corrector methods:

Predictor: y_1^{(0)} = y_0 + h f(x_0, y_0)
Corrector: y_1^{(1)} = y_0 + (h/2)[f(x_0, y_0) + f(x_1, y_1^{(0)})].

We repeat the corrector step until two successive iterates y_1^{(i)} agree.


Geometrically, the tangent line to the graph of y(x) at x_i has slope

y'(x_i) = f(x_i, y_i).

If we use this tangent line to approximate the curve near the point (x_i, y(x_i)), the value of the tangent line at x = x_{i+1} is given by the right side of the method.

Local truncation error and order of the method: The truncation error of the difference approximation is the difference of the exact solution y(x_{i+1}) and the numerical solution y_{i+1}, and is given by

T_{i+1} = y(x_{i+1}) − y_{i+1}
        = y(x_i + h) − [y(x_i) + h f(x_i, y_i)]
        = y(x_i) + h y'(x_i) + (h^2/2) y''(ξ) − [y(x_i) + h f(x_i, y_i)]
        = (h^2/2) y''(ξ).

Hence

|T| ≤ (h^2/2) M,  x_i < ξ < x_{i+1},  M = max |y''(x)|.

If the truncation error has the term h^{p+1}, then the order of the numerical method is p. For example, Euler's method is a first-order method.
Example 1. Consider

y' + 2y = 2 − e^{−4t},  y(0) = 1.

By taking step size 0.1, find y at t = 0.1 and 0.2 by the Euler method.

Sol.

y' = −2y + 2 − e^{−4t} = f(t, y),  y(0) = 1
f(0, 1) = −2(1) + 2 − 1 = −1.

By the Euler method with step size h = 0.1:

t_1 = t_0 + h = 0 + 0.1 = 0.1
y_1 = y_0 + h f(0, 1) = 1 + 0.1(−1) = 0.9.

Therefore y(0.1) ≈ 0.9.

t_2 = t_0 + 2h = 0 + 2(0.1) = 0.2
y_2 = y_1 + h f(0.1, 0.9) = 0.9 + 0.1(−2(0.9) + 2 − e^{−0.4})
    = 0.9 + 0.1(−0.47032) = 0.852967.

Therefore y(0.2) ≈ 0.852967.


Example 2. For the IVP y' = x + √y, y(0) = 1, calculate y in the interval [0, 0.6] with h = 0.2 by using the modified Euler's method.

Sol.

y' = x + √y = f(x, y),  x_0 = 0, y_0 = 1, h = 0.2, x_1 = 0.2.

Predictor:

y_1^{(0)} = y_0 + h f(x_0, y_0) = 1 + 0.2(1) = 1.2

Corrector:

y_1^{(1)} = y_0 + (h/2)[f(x_0, y_0) + f(x_1, y_1^{(0)})] = 1 + 0.1[1 + 1.2954] = 1.2295
y_1^{(2)} = y_0 + (h/2)[f(x_0, y_0) + f(x_1, y_1^{(1)})] = 1 + 0.1[1 + 1.3088] = 1.2309
y_1^{(3)} = y_0 + (h/2)[f(x_0, y_0) + f(x_1, y_1^{(2)})] = 1 + 0.1[1 + 1.3094] = 1.2309

Since y_1^{(2)} = y_1^{(3)}, we take y(0.2) ≈ 1.2309.

Now y_1 = 1.2309, h = 0.2, x_1 = 0.2, x_2 = 0.4:

y_2^{(0)} = y_1 + h f(x_1, y_1) = 1.2309 + 0.2(1.3094) = 1.4927
y_2^{(1)} = y_1 + (h/2)[f(x_1, y_1) + f(x_2, y_2^{(0)})] = 1.2309 + 0.1[1.3094 + 1.6218] = 1.5240
y_2^{(2)} = y_1 + (h/2)[f(x_1, y_1) + f(x_2, y_2^{(1)})] = 1.2309 + 0.1[1.3094 + 1.6345] = 1.5253
y_2^{(3)} = y_1 + (h/2)[f(x_1, y_1) + f(x_2, y_2^{(2)})] = 1.2309 + 0.1[1.3094 + 1.6350] = 1.5253

Therefore y(0.4) ≈ 1.5253.

Now y_2 = 1.5253, h = 0.2, x_2 = 0.4, x_3 = 0.6:

y_3^{(0)} = y_2 + h f(x_2, y_2) = 1.5253 + 0.2(1.635) = 1.8523
y_3^{(1)} = y_2 + (h/2)[f(x_2, y_2) + f(x_3, y_3^{(0)})] = 1.5253 + 0.1[1.635 + 1.961] = 1.8849
y_3^{(2)} = y_2 + (h/2)[f(x_2, y_2) + f(x_3, y_3^{(1)})] = 1.5253 + 0.1[1.635 + 1.9729] = 1.8861
y_3^{(3)} = y_2 + (h/2)[f(x_2, y_2) + f(x_3, y_3^{(2)})] = 1.5253 + 0.1[1.635 + 1.9734] = 1.8861

Hence y(0.6) ≈ 1.8861.
4. Taylor Series Method
Consider the one-dimensional initial value problem

y' = f(x, y),  y(x_0) = y_0,

where f is a function of the two variables x and y, and (x_0, y_0) is a known point on the solution curve.
If the existence of all higher order derivatives is assumed for y at some point x = x_i, then by Taylor series the value of y at any neighbouring point x_i + h can be written as

y(x_i + h) = y(x_i) + h y'(x_i) + (h^2/2) y''(x_i) + (h^3/3!) y'''(x_i) + ···

Since y_i is known at x_i, y' at x_i can be found by computing f(x_i, y_i). Similarly, higher derivatives of y at x_i can be computed by making use of the relation y' = f(x, y). Hence the value of y at any neighbouring point x_i + h can be obtained by summing the above series. If the series is terminated after the p-th derivative term, then the approximate formula is called the Taylor series approximation to y of order p, and the error is of order p + 1.

Example 3. Given the IVP y' = x^2 y − 1, y(0) = 1, use the Taylor series method of order 4 with step size 0.1 to find y at x = 0.1 and x = 0.2.

Sol. From the given IVP,

y' = x^2 y − 1,  y'' = 2xy + x^2 y',  y''' = 2y + 4xy' + x^2 y'',  y^{(4)} = 6y' + 6xy'' + x^2 y''',

so that

y'(0) = −1,  y''(0) = 0,  y'''(0) = 2,  y^{(4)}(0) = −6.

The fourth-order Taylor formula is given by

y(x_i + h) = y(x_i) + h y'(x_i) + (h^2/2) y''(x_i) + (h^3/3!) y'''(x_i) + (h^4/4!) y^{(4)}(x_i) + O(h^5).

Therefore

y(0.1) = 1 + (0.1)(−1) + 0 + (0.1)^3 (2)/6 − (0.1)^4 (6)/24 = 0.900308.

Similarly, y(0.2) = 0.80227.
4.1. Runge-Kutta Methods: These are among the most important methods for solving the IVP (1.1). Applying Taylor's theorem directly requires higher-order derivatives of the function; the advantage of the class of Runge-Kutta methods is that it does not involve higher-order derivatives. Euler's method is an example of a first-order Runge-Kutta method.
We now discuss the formulation of the second-order Runge-Kutta method, which is in effect the modified Euler's method.
By Taylor's theorem,

y(x + h) = y(x) + h y'(x) + (h^2/2) y''(x) + O(h^3).

By differentiating y(x), we have

y' = f(x, y) = f
y'' = df/dx = f_x + f_y y' = f_x + f_y f
y''' = f_{xx} + 2f f_{xy} + f^2 f_{yy} + f_y (f_x + f_y f).

Therefore

y(x + h) = y(x) + h y'(x) + (h^2/2) y''(x) + O(h^3)
         = y + h f + (h^2/2)(f_x + f_y f) + O(h^3)
         = y + (h/2) f + (h/2)(f + h f_x + h f_y f) + O(h^3).

Now apply Taylor's theorem in two variables to f:

f(x + h, y + hf) = f(x, y) + h f_x + h f f_y + O(h^2)
⟹ f + h f_x + h f f_y ≈ f(x + h, y + hf)
⟹ y(x + h) = y + (h/2) f + (h/2) f(x + h, y + hf) + O(h^3).

Now, at x = x_i, define K_1 = h f(x_i, y_i) and K_2 = h f(x_i + h, y_i + K_1); then

y(x_i + h) ≈ y_i + (1/2)(K_1 + K_2).

This is called the second-order Runge-Kutta method, which requires two function evaluations at each step.
Algorithm for second-order Runge-Kutta method:

for i = 0, 1, 2, ... do
  x_{i+1} = x_i + h = x_0 + (i + 1)h
  K_1 = h f(x_i, y_i)
  K_2 = h f(x_{i+1}, y_i + K_1)
  y_{i+1} = y_i + (1/2)(K_1 + K_2)
end for
Third-order Runge-Kutta method:

y_{i+1} = y_i + (1/6)(K_1 + 4K_2 + K_3)

where

K_1 = h f(x_i, y_i)
K_2 = h f(x_i + h/2, y_i + K_1/2)
K_3 = h f(x_i + h, y_i − K_1 + 2K_2)

and x_i = x_0 + ih.

Fourth-order Runge-Kutta method:

y_{i+1} = y_i + (1/6)(K_1 + 2K_2 + 2K_3 + K_4) + O(h^5)

where

K_1 = h f(x_i, y_i)
K_2 = h f(x_i + h/2, y_i + K_1/2)
K_3 = h f(x_i + h/2, y_i + K_2/2)
K_4 = h f(x_i + h, y_i + K_3).
Algorithm for fourth-order Runge-Kutta method:

for i = 0, 1, 2, ... do
  x_{i+1/2} = x_i + h/2
  x_{i+1} = x_i + h = x_0 + (i + 1)h
  K_1 = h f(x_i, y_i)
  K_2 = h f(x_{i+1/2}, y_i + K_1/2)
  K_3 = h f(x_{i+1/2}, y_i + K_2/2)
  K_4 = h f(x_{i+1}, y_i + K_3)
  y_{i+1} = y_i + (1/6)(K_1 + 2K_2 + 2K_3 + K_4)
end for

The local truncation error in a Runge-Kutta method is the error that arises in each step because of the truncated Taylor series. This error is inevitable. The fourth-order Runge-Kutta method has a local truncation error of O(h^5).
Example 4. Using the Runge-Kutta fourth-order method, solve

dy/dx = (y^2 − x^2)/(y^2 + x^2),  y_0 = 1,

at x = 0.2 and 0.4.

Sol.

f(x, y) = (y^2 − x^2)/(y^2 + x^2),  x_0 = 0, y_0 = 1, h = 0.2

K_1 = h f(x_0, y_0) = 0.2 f(0, 1) = 0.200
K_2 = h f(x_0 + h/2, y_0 + K_1/2) = 0.2 f(0.1, 1.1) = 0.19672
K_3 = h f(x_0 + h/2, y_0 + K_2/2) = 0.2 f(0.1, 1.09836) = 0.1967
K_4 = h f(x_0 + h, y_0 + K_3) = 0.2 f(0.2, 1.1967) = 0.1891

y_1 = y_0 + (1/6)(K_1 + 2K_2 + 2K_3 + K_4) = 1 + 0.19599 = 1.196.

Therefore y(0.2) ≈ 1.196.
Now x_1 = x_0 + h = 0.2:

K_1 = h f(x_1, y_1) = 0.1891
K_2 = h f(x_1 + h/2, y_1 + K_1/2) = 0.2 f(0.3, 1.2906) = 0.1795
K_3 = h f(x_1 + h/2, y_1 + K_2/2) = 0.2 f(0.3, 1.2858) = 0.1793
K_4 = h f(x_1 + h, y_1 + K_3) = 0.2 f(0.4, 1.3753) = 0.1688

y_2 = y(0.4) = y_1 + (1/6)(K_1 + 2K_2 + 2K_3 + K_4) = 1.196 + 0.1792 = 1.3752.

5. Numerical solution of systems and second-order equations

We can apply the Euler and Runge-Kutta methods to find the numerical solution of a system of differential equations. Second-order equations can be changed into a system of differential equations. The application of the numerical methods is explained in the following examples.

Example 5. Solve the following system:

dx/dt = 3x − 2y
dy/dt = 5x − 4y
x(0) = 3, y(0) = 6.

Find the solution by the Euler method at t = 0.1 and 0.2 by taking time increment 0.1.

Sol. Given t_0 = 0, x_0 = 3, y_0 = 6, h = 0.1.
Write f(x, y) = 3x − 2y, g(x, y) = 5x − 4y.
By the Euler method:

x_1 = x(0.1) = x_0 + h f(x_0, y_0) = 3 + 0.1(3·3 − 2·6) = 2.7
y_1 = y(0.1) = y_0 + h g(x_0, y_0) = 6 + 0.1(5·3 − 4·6) = 5.1.

Similarly,

x_2 = x(0.2) = x_1 + h f(x_1, y_1) = 2.7 + 0.1(3·2.7 − 2·5.1) = 2.49
y_2 = y(0.2) = y_1 + h g(x_1, y_1) = 5.1 + 0.1(5·2.7 − 4·5.1) = 4.41.
Example 6. Solve the following system

dy/dx = 1 + xz,  dz/dx = −xy

for x = 0.3 by using the fourth-order Runge-Kutta method. Given y(0) = 0, z(0) = 1.

Sol. Given

dy/dx = 1 + xz = f(x, y, z),  dz/dx = −xy = g(x, y, z)
x_0 = 0, y_0 = 0, z_0 = 1, h = 0.3

K_1 = h f(x_0, y_0, z_0) = 0.3 f(0, 0, 1) = 0.3
L_1 = h g(x_0, y_0, z_0) = 0.3 g(0, 0, 1) = 0
K_2 = h f(x_0 + h/2, y_0 + K_1/2, z_0 + L_1/2) = 0.3 f(0.15, 0.15, 1) = 0.345
L_2 = h g(x_0 + h/2, y_0 + K_1/2, z_0 + L_1/2) = −0.00675
K_3 = h f(x_0 + h/2, y_0 + K_2/2, z_0 + L_2/2) = 0.34485
L_3 = h g(x_0 + h/2, y_0 + K_2/2, z_0 + L_2/2) = −0.007762
K_4 = h f(x_0 + h, y_0 + K_3, z_0 + L_3) = 0.3893
L_4 = h g(x_0 + h, y_0 + K_3, z_0 + L_3) = −0.03104

Hence

y_1 = y(0.3) = y_0 + (1/6)(K_1 + 2K_2 + 2K_3 + K_4) = 0.34483
z_1 = z(0.3) = z_0 + (1/6)(L_1 + 2L_2 + 2L_3 + L_4) = 0.9899.
Example 7. Solve by using the fourth-order Runge-Kutta method for x = 0.2:

d^2y/dx^2 = x (dy/dx)^2 − y^2,  y(0) = 1, y'(0) = 0.

Sol. Let

dy/dx = z = f(x, y, z).

Therefore

dz/dx = xz^2 − y^2 = g(x, y, z).

Now x_0 = 0, y_0 = 1, z_0 = 0, h = 0.2:

K_1 = h f(x_0, y_0, z_0) = 0.0
L_1 = h g(x_0, y_0, z_0) = −0.2
K_2 = h f(x_0 + h/2, y_0 + K_1/2, z_0 + L_1/2) = −0.02
L_2 = h g(x_0 + h/2, y_0 + K_1/2, z_0 + L_1/2) = −0.1998
K_3 = h f(x_0 + h/2, y_0 + K_2/2, z_0 + L_2/2) = −0.02
L_3 = h g(x_0 + h/2, y_0 + K_2/2, z_0 + L_2/2) = −0.1958
K_4 = h f(x_0 + h, y_0 + K_3, z_0 + L_3) = −0.0392
L_4 = h g(x_0 + h, y_0 + K_3, z_0 + L_3) = −0.1905

Hence

y_1 = y(0.2) = y_0 + (1/6)(K_1 + 2K_2 + 2K_3 + K_4) = 0.9801
z_1 = y'(0.2) = z_0 + (1/6)(L_1 + 2L_2 + 2L_3 + L_4) = −0.1970.
Exercises

(1) Consider the IVP

y' = x(y + x) − 2,  y(0) = 2.

Use the Euler method with step size h = 0.2 to compute y(0.6) to four decimals.
(2) Use the modified Euler's method to find y(0.2) and y(0.4) with h = 0.2 for the IVP

y' = y + e^x,  y(0) = 0.

(3) Solve the initial value problem

dy/dx = (y + x^2 − 2)/(x + 1),  y(0) = 2,

by the explicit Euler method with step size h = 0.2 on the interval [0, 1].
(4) Solve the following initial value problem with step sizes h = 0.1 and 0.2:

y' = t e^t − y,  y(0) = 1,

by the explicit Euler method on the interval [0, 1].
(5) Solve the following differential equation by the second-order Runge-Kutta method:

y' = y + 2 cos t,  y(0) = 1.

Compute y(0.2), y(0.4), and y(0.6).
(6) Compute solutions to the following problems with a second-order Taylor method. Use step size h = 0.2.
(A) y' = (cos y)^2, 0 ≤ x ≤ 1, y(0) = 0.
(B) y' = 20/(1 + 19e^{−x/4}), 0 ≤ x ≤ 1, y(0) = 1.
(7) Use the Runge-Kutta fourth-order method to solve the IVP at x = 0.8 for

dy/dx = x + y,  y(0.4) = 0.41,

with step length h = 0.2.
(8) Use the Runge-Kutta fourth-order method to solve the following IVP with h = 0.1 for 0 ≤ x ≤ 0.2:

y' = xz + 1, y(0) = 0,
z' = −xy, z(0) = 1.

(9) Apply the Taylor method of order three to obtain an approximate value of y at x = 0.2 for the differential equation

y' = 2y + 3e^x,  y_0 = 0.

Compare the numerical solution with the exact solution.
(10) Use the Runge-Kutta method of order four to solve

y'' = x(y')^2 − y^2,  y(0) = 1, y'(0) = 0,

for x = 0.2 with step size 0.2.
(11) Consider the Lotka-Volterra system

du/dt = 2u − uv,  u(0) = 1.5
dv/dt = −9v + 3uv,  v(0) = 1.5.

Use Euler's method with step size 0.5 to approximate the solution at t = 2.
(12) The following system represents a much simplified model of nerve cells:

dx/dt = x + y − x^3,  x(0) = 0.5
dy/dt = −x/2,  y(0) = 0.1,

where x(t) represents the voltage across the boundary of the nerve cell and y(t) is the permeability of the cell wall at time t. Solve this system using the Runge-Kutta fourth-order method to generate the profile up to t = 0.2 with step size 0.1.
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
[Jain]     M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, 6th edition, New Age International Publishers, New Delhi, 2012.
