
Root Finding for Nonlinear Equations

S. Natesan
Department of Mathematics
Indian Institute of Technology Guwahati,
Guwahati - 781 039, India.
email: natesan@iitg.ernet.in
Contents
1 Introduction
2 The Bisection Method
3 Newton's Method
4 The Secant Method
5 Müller's Method
6 Fixed-Point Iteration Methods
7 Fixed-Point Iteration (Conte & De Boor)
8 Convergence Acceleration for Fixed-Point Iteration
9 Numerical Evaluation of Multiple Roots
9.1 Newton's Method and Multiple Roots
10 Roots of Polynomials
11 Systems of Nonlinear Equations
11.1 Fixed-Point Theory
11.2 Newton's Method for Nonlinear Systems
1 Introduction
Finding one or more roots of an equation
f(x) = 0 (1.1)
is one of the more commonly occurring problems of applied mathematics. In most cases explicit
solutions are not available, and we must be satisfied with being able to find a root to any specified
degree of accuracy. The numerical methods for finding the roots are called iterative methods.
A second major problem is that of finding one or more roots of a polynomial equation
p(x) ≡ a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n = 0, a_n ≠ 0. (1.2)
The methods of the first problem are often specialized to deal with (1.2).
The third class of problems is the solution of nonlinear systems of equations. These systems
are very diverse in form, and the associated numerical analysis is both extensive and sophisticated.
Figure 1: Iterative solution of a − (1/x) = 0.
We begin with iterative methods for solving (1.1) when f(x) is any continuously differentiable
real-valued function of a real variable x. The iterative methods for this quite general class of
equations will require knowledge of one or more initial guesses x_0 for the desired root α of f(x).
An initial guess x_0 can usually be found by using the context in which the problem first arose;
otherwise, a simple graph of y = f(x) will often suffice for estimating x_0.
Consider the following problem:
f(x) ≡ a − (1/x) = 0, a > 0. (1.3)
Its root is α = 1/a; let x_0 be an approximate solution of the equation. At the point (x_0, f(x_0)) draw the tangent
line to the graph of y = f(x); see Figure 1. Let x_1 be the point at which the tangent line intersects
the x-axis. It should be an improved approximation of the root α.
To obtain an equation for x_1, match the slopes obtained from the tangent line and the derivative
of f(x) at x_0:
f'(x_0) = (f(x_0) − 0) / (x_0 − x_1).
Substituting from (1.3) and manipulating, we obtain
x_1 = x_0 (2 − a x_0).
The general iteration formula is then obtained by repeating the process, with x_1 replacing x_0, and
repeating this, we get
x_{n+1} = x_n (2 − a x_n), n ≥ 0. (1.4)
A form more convenient for theoretical purposes is obtained by introducing the scaled residual
r_n = 1 − a x_n. (1.5)
Using it,
x_{n+1} = x_n (1 + r_n), n ≥ 0. (1.6)
For the error,
e_n = (1/a) − x_n = r_n / a. (1.7)
We will analyze the convergence of this method, its speed, and its dependence on x_0. First,
r_{n+1} = 1 − a x_{n+1} = 1 − a x_n (1 + r_n) = 1 − (1 − r_n)(1 + r_n),
i.e.,
r_{n+1} = r_n^2. (1.8)
Inductively,
r_n = r_0^{2^n}, n ≥ 0. (1.9)
From (1.7), the error e_n converges to zero as n → ∞ if and only if r_n converges to zero. From
(1.9), r_n converges to zero if and only if |r_0| < 1, or equivalently,
−1 < 1 − a x_0 < 1,
0 < x_0 < 2/a. (1.10)
In order that x_n → 1/a, it is necessary and sufficient that x_0 be chosen to satisfy (1.10).
To examine the speed of convergence when (1.10) is satisfied, we obtain formulas for the error
and relative error:
e_{n+1} = r_{n+1}/a = r_n^2/a = (a e_n)^2/a,
e_{n+1} = a e_n^2, (1.11)
e_{n+1}/(1/a) = (e_n/(1/a))^2,
Rel(x_{n+1}) = [Rel(x_n)]^2, n ≥ 0. (1.12)
Here, Rel(x_n) denotes the relative error in x_n. Based on eqn. (1.11), we say that e_n converges to
zero quadratically. To illustrate how rapidly the error will decrease, suppose that Rel(x_0) = 0.1.
Then Rel(x_4) = 10^{−16}. Each iteration doubles the number of significant digits.
This example illustrates the construction of an iterative method for solving a nonlinear algebraic
equation; a complete convergence analysis has been given. This analysis includes a proof of
convergence, a determination of the interval of convergence for the choice of x_0, and a determination
of the speed of convergence.
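As a small numerical illustration of this quadratic behavior (an added sketch, not part of the original notes; the values a = 3 and x_0 = 0.25 are chosen only as an example satisfying (1.10)), the following Python fragment runs the iteration (1.4) and prints the relative errors, which are squared at every step.

# Iteration (1.4): x_{n+1} = x_n * (2 - a*x_n) converges to 1/a
# whenever 0 < x_0 < 2/a; the relative error squares at each step.
a = 3.0
x = 0.25                       # any starting value in (0, 2/a)
alpha = 1.0 / a
for n in range(6):
    rel = (alpha - x) / alpha  # relative error Rel(x_n)
    print(f"n = {n}: x_n = {x:.15f}, Rel(x_n) = {rel:.3e}")
    x = x * (2.0 - a * x)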
Definition 1.1 A sequence of iterates {x_n : n ≥ 0} is said to converge with order p ≥ 1 to a
point α if
|α − x_{n+1}| ≤ c|α − x_n|^p, n ≥ 0, (1.13)
for some c > 0. If p = 1, the sequence is said to converge linearly to α. In that case, we
require c < 1; the constant c is called the rate of linear convergence of x_n to α.
2 The Bisection Method
Assume that f(x) is continuous on an interval [a, b] and that it also satisfies
f(a) f(b) < 0. (2.1)
Using the intermediate-value theorem, the function f(x) must have at least one root in [a, b].
Usually, [a, b] is chosen to contain only one root α, but the following algorithm for the bisection
method will always converge to some root α in [a, b] because of (2.1).
Algorithm. Bisect(f, a, b, root, ε)
1. Define c := (a + b)/2.
2. If b − c ≤ ε, then accept root := c, and exit.
3. If sign(f(b)) · sign(f(c)) ≤ 0, then a := c; otherwise b := c.
4. Return to Step 1.
The interval [a, b] is halved in size for every pass through the algorithm. Because of Step 3,
[a, b] will always contain a root of f(x). Since a root α is in [a, b], it must lie within either [a, c] or
[c, b]; and consequently
|c − α| ≤ b − c = c − a.
This is the justification for the test in Step 2. On completion of the algorithm, c will be an approximation
to the root with
|c − α| ≤ ε.
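The algorithm translates almost line for line into code. The following Python sketch is one possible rendering (the function name and test problem are ours, added for illustration):

def bisect(f, a, b, eps):
    """Bisection method following the steps of algorithm Bisect.

    Assumes f is continuous on [a, b] and f(a)*f(b) < 0.
    Returns c with |c - alpha| <= eps for some root alpha in [a, b].
    """
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while True:
        c = (a + b) / 2.0          # Step 1
        if b - c <= eps:           # Step 2
            return c
        if f(b) * f(c) <= 0:       # Step 3: root lies in [c, b]
            a = c
        else:                      # otherwise root lies in [a, c]
            b = c

# Example 2.1 below: largest real root of x^6 - x - 1 = 0 on [1, 2]
root = bisect(lambda x: x**6 - x - 1, 1.0, 2.0, 5e-5)
print(root)   # approximately 1.13474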
Example 2.1 Find the largest real root α of
f(x) ≡ x^6 − x − 1 = 0.
(Here the exact value is α = 1.13472413840152.)
It is straightforward to show that 1 < α < 2, and we will use this as our initial interval [a, b].
Here, we take ε = 5 × 10^{−5}. The answer is c_15 = 1.13474, which is an approximation to α with
|α − c_15| ≤ 4 × 10^{−5} (in fact, |α − c_15| ≈ 1.6 × 10^{−5}).
To examine the speed of convergence, let c_n denote the nth value of c in the algorithm. Then
it is easy to see that
α = lim_{n→∞} c_n
and
|α − c_n| ≤ (1/2)^n (b − a), (2.2)
where (b − a) denotes the length of the original interval input into Bisect. Using the variant (2.2)
for defining linear convergence, we say that the bisection method converges linearly with a rate of
1/2. The actual error may not decrease by a factor of 1/2 at each step, but the average rate of
decrease is 1/2, based on (2.2).
There are several deficiencies in the algorithm Bisect. First, it does not take account of the
limits of machine precision; a practical program would take account of the unit round on the machine,
adjusting the given ε if necessary. Second, it converges very slowly when compared with other methods.
The major advantages of the bisection method are that it is guaranteed to converge and that
a reasonable error bound is available. Methods that at every step give upper and lower bounds
on the root α are called enclosure methods.
3 Newton's Method
Assume that an initial estimate x_0 is known for the desired root α of f(x) = 0. Newton's method
will produce a sequence of iterates {x_n : n ≥ 1}, which we hope will converge to α. Since x_0 is
assumed close to α, approximate the graph of y = f(x) in the vicinity of its root α by constructing
its tangent line at (x_0, f(x_0)). Then use the root of this tangent line to approximate α; call this new
approximation x_1. Repeat this process, ad infinitum, to obtain a sequence of iterates x_n. As
with the example (1.3), this leads to the iteration formula
x_{n+1} = x_n − f(x_n) / f'(x_n), n ≥ 0. (3.1)
The process is illustrated in Figure 2, for the iterates x_1 and x_2.
Figure 2: Newton's method.
Newton's method is the best known procedure for finding the roots of an equation. It has been
generalized in many ways for the solution of other, more difficult nonlinear problems, for instance,
systems of nonlinear equations.
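A minimal Python sketch of the iteration (3.1) follows (added here for illustration; the function, its derivative, the starting point, and the stopping criterion are our example choices, using f(x) = x^6 − x − 1 from Example 2.1):

def newton(f, df, x0, eps=1e-12, itmax=50):
    """Newton's method (3.1): x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(itmax):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) <= eps:   # stop when successive iterates agree
            return x_new
        x = x_new
    raise RuntimeError("Newton's method did not converge in itmax steps")

# f(x) = x^6 - x - 1, the function of Example 2.1
root = newton(lambda x: x**6 - x - 1, lambda x: 6*x**5 - 1, x0=1.5)
print(root)   # approximately 1.134724138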
As another approach to (3.1), we use a Taylor series development. Expanding f(x) about x_n,
f(x) = f(x_n) + (x − x_n) f'(x_n) + ((x − x_n)^2 / 2) f''(ξ), ξ between x_n and x.
Letting x = α and using f(α) = 0, we solve for α to obtain
α = x_n − f(x_n)/f'(x_n) − ((α − x_n)^2 / 2) · f''(ξ_n)/f'(x_n), ξ_n between x_n and α.
We can drop the error term to obtain a better approximation to α than x_n, and we recognize this
approximation as x_{n+1} from (3.1). Then
α − x_{n+1} = −(α − x_n)^2 · f''(ξ_n) / (2 f'(x_n)), n ≥ 0. (3.2)
By this formula we can show that Newton's method has a quadratic rate of convergence.
Theorem 3.1 Assume that f(x), f'(x), f''(x) are continuous for all x in some neighborhood of α,
and assume that f(α) = 0, f'(α) ≠ 0. Then, if x_0 is chosen sufficiently close to α, the iterates
x_n, n ≥ 0, of (3.1) will converge to α. Moreover,
lim_{n→∞} (α − x_{n+1}) / (α − x_n)^2 = −f''(α) / (2 f'(α)), (3.3)
proving that the iterates have an order of convergence p = 2.
Proof. Consider a sufficiently small interval I = [α − ε, α + ε] on which f'(x) ≠ 0 (such an interval
exists by the continuity of f'(x)), and then let
M = max_{x∈I} |f''(x)| / (2 min_{x∈I} |f'(x)|).
From (3.2),
|α − x_1| ≤ M |α − x_0|^2,
M|α − x_1| ≤ (M|α − x_0|)^2.
Pick x_0 with |α − x_0| ≤ ε and M|α − x_0| < 1. Then M|α − x_1| < 1 and M|α − x_1| ≤ M|α − x_0|, which
says that |α − x_1| ≤ ε. We can apply the same argument to x_1, x_2, ..., inductively, showing that
|α − x_n| ≤ ε and M|α − x_n| < 1, n ≥ 1.
To show the convergence, use (3.2) to give
|α − x_{n+1}| ≤ M|α − x_n|^2, M|α − x_{n+1}| ≤ (M|α − x_n|)^2, (3.4)
and inductively,
M|α − x_n| ≤ (M|α − x_0|)^{2^n}, |α − x_n| ≤ (1/M)(M|α − x_0|)^{2^n}. (3.5)
Since M|α − x_0| < 1, this shows that x_n → α as n → ∞.
In formula (3.2), the unknown point ξ_n is between x_n and α, implying that ξ_n → α as n → ∞.
Thus
lim_{n→∞} (α − x_{n+1}) / (α − x_n)^2 = lim_{n→∞} −f''(ξ_n)/(2 f'(x_n)) = −f''(α)/(2 f'(α)).
4 The Secant Method
As with Newton's method, the graph of y = f(x) is approximated by a straight line in the vicinity of
the root α. In this case, assume that x_0 and x_1 are two initial estimates of the root α. Approximate
the graph of y = f(x) by the secant line determined by (x_0, f(x_0)) and (x_1, f(x_1)). Let its root
be denoted by x_2; we hope it will be an improved approximation of α. This is illustrated in Figure 3.
Figure 3: Secant method.
Using the slope formula with the secant line, we have
(f(x_1) − f(x_0)) / (x_1 − x_0) = (f(x_1) − 0) / (x_1 − x_2).
Solving for x_2,
x_2 = x_1 − f(x_1) · (x_1 − x_0) / (f(x_1) − f(x_0)).
Using x_1 and x_2, repeat this process to obtain x_3, etc. The general formula based on this is
x_{n+1} = x_n − f(x_n) · (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})), n ≥ 1. (4.1)
This is the secant method. As with Newton's method, it is not guaranteed to converge, but when it
does converge, the speed is usually greater than that of the bisection method.
Error analysis. Multiply both sides of (4.1) by −1 and then add α to both sides, obtaining
α − x_{n+1} = α − x_n + f(x_n) · (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})).
The RHS can be manipulated algebraically to obtain the formula
α − x_{n+1} = −(α − x_{n−1})(α − x_n) · f[x_{n−1}, x_n, α] / f[x_{n−1}, x_n]. (4.2)
The quantities f[x_{n−1}, x_n] and f[x_{n−1}, x_n, α] are first- and second-order Newton divided differences,
defined by
f[x_{n−1}, x_n] = (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1}),
f[x_{n−1}, x_n, α] = (f[x_n, α] − f[x_{n−1}, x_n]) / (α − x_{n−1}).
Using the following formulas
f[x_0, x_1] = f'(ζ), f[x_0, x_1, x_2] = (1/2) f''(ξ),
where ζ is between x_0 and x_1, and ξ is between the minimum and maximum of x_0, x_1, and x_2, (4.2)
becomes
α − x_{n+1} = −(α − x_{n−1})(α − x_n) · f''(ξ_n) / (2 f'(ζ_n)), (4.3)
where ζ_n is between x_{n−1} and x_n, and ξ_n is between the minimum and maximum of x_{n−1}, x_n, and α.
Using this error formula, we can examine the convergence of the secant method, stated in the following theorem.
Theorem 4.1 Assume that f(x), f'(x), f''(x) are continuous for all values of x in some interval
containing α, and assume f'(α) ≠ 0. Then, if the initial guesses x_0 and x_1 are chosen sufficiently
close to α, the iterates x_n of (4.1) will converge to α. The order of convergence will be
p = (1 + √5)/2 ≈ 1.62.
Comparison between Newton's and secant methods: Newton's method and the secant
method are closely related. If the approximation
f'(x_n) ≈ (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1})
is used in the Newton formula (3.1), we obtain the secant formula (4.1). The conditions for
convergence are almost the same, and the error formulae are similar. Nonetheless, there are two
major differences:
1. Newton's method requires two function evaluations per iterate, that of f(x_n) and f'(x_n),
whereas the secant method requires only one function evaluation per iterate, that of f(x_n)
[provided the needed function value f(x_{n−1}) is retained from the last iteration]. Therefore,
Newton's method is generally more expensive per iteration.
2. Newton's method converges more rapidly [order p = 2 vs. the secant method's p ≈ 1.62], and
consequently it will require fewer iterations to attain a given desired accuracy.
A small sketch of the secant iteration in code is given below.
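The following Python sketch of (4.1) is an added illustration (the test function is again x^6 − x − 1); it shows the single new function evaluation per step, since f(x_{n−1}) is retained between iterations.

def secant(f, x0, x1, eps=1e-12, itmax=50):
    """Secant method (4.1); only one new f-evaluation per iteration."""
    f0, f1 = f(x0), f(x1)
    for _ in range(itmax):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) <= eps:
            return x2
        x0, f0 = x1, f1        # retain the previous value f(x_{n-1})
        x1, f1 = x2, f(x2)     # the only new function evaluation
    raise RuntimeError("secant method did not converge in itmax steps")

print(secant(lambda x: x**6 - x - 1, 1.0, 2.0))   # about 1.134724138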
5 Müller's Method
Müller's method is useful for obtaining both real and complex roots of a function, and it is reasonably
straightforward to implement on a computer.
Müller's method is a generalization of the approach that led to the secant method. Given
three points x_0, x_1, x_2, a quadratic polynomial is constructed that passes through the three points
(x_i, f(x_i)), i = 0, 1, 2; one of the roots of this polynomial is used as an improved estimate for a root α
of f(x).
The quadratic polynomial is given by
p(x) = f(x_2) + (x − x_2) f[x_2, x_1] + (x − x_2)(x − x_1) f[x_2, x_1, x_0]. (5.1)
To check that
p(x_i) = f(x_i), i = 0, 1, 2,
just substitute x_i into (5.1) and then reduce the resulting expression using the divided differences.
Other formulas for p(x) are available, but the above form is the most convenient for defining
Müller's method. The formula (5.1) is called the Newton divided-difference form of the
interpolation polynomial.
To find the zeros of (5.1), we first rewrite it in the more convenient form
y = f(x_2) + w(x − x_2) + f[x_2, x_1, x_0](x − x_2)^2,
w = f[x_2, x_1] + (x_2 − x_1) f[x_2, x_1, x_0]
  = f[x_2, x_1] + f[x_2, x_0] − f[x_0, x_1].
We want to find the smallest value of x − x_2 that satisfies the equation y = 0, thus finding the root
of (5.1) that is closest to x_2. The solution is
x − x_2 = (−w ± √(w^2 − 4 f(x_2) f[x_2, x_1, x_0])) / (2 f[x_2, x_1, x_0]),
with the sign chosen to make the numerator as small as possible. Because of the loss-of-significance
errors implicit in this formula, we rationalize the numerator to obtain the new iteration formula
x_3 = x_2 − 2 f(x_2) / (w ± √(w^2 − 4 f(x_2) f[x_2, x_1, x_0])), (5.2)
with the sign chosen to maximize the magnitude of the denominator.
Repeat (5.2) recursively to define a sequence of iterates {x_n : n ≥ 0}. If they converge to a
point α, and if f'(α) ≠ 0, then α is a root of f(x). To see this, note that as n → ∞,
w → f'(α),
0 = −2 f(α) / (f'(α) ± √([f'(α)]^2 − 2 f(α) f''(α))),
showing that the RHS fraction must be zero. Since f'(α) ≠ 0 by assumption, the method of choosing
the sign in the denominator implies that the denominator is nonzero. Then the numerator must
be zero, showing f(α) = 0. The assumption f'(α) ≠ 0 says that α is a simple root.
By an argument similar to that used for the secant method, it can be shown that
lim_{n→∞} |α − x_{n+1}| / |α − x_n|^p = |f^{(3)}(α) / (6 f'(α))|^{(p−1)/2}, p ≈ 1.84, (5.3)
provided f(x) ∈ C^3(I), where I is a neighborhood of α, and f'(α) ≠ 0. The order p is the positive
root of
x^3 − x^2 − x − 1 = 0.
With the secant method, real choices of x_0 and x_1 lead to a real value of x_2. But with Müller's
method, real choices of x_0, x_1, x_2 can and do lead to complex roots of f(x). This is an important
aspect of Müller's method.
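A compact Python sketch of the iteration (5.2) follows (an added illustration with our own naming). Complex arithmetic via cmath is used so that complex roots can emerge from real starting points, as just described.

import cmath

def muller(f, x0, x1, x2, eps=1e-12, itmax=50):
    """Mueller's method: one step of (5.2) per loop pass."""
    for _ in range(itmax):
        f01 = (f(x1) - f(x0)) / (x1 - x0)      # f[x0, x1]
        f12 = (f(x2) - f(x1)) / (x2 - x1)      # f[x1, x2]
        f012 = (f12 - f01) / (x2 - x0)         # f[x0, x1, x2]
        w = f12 + (x2 - x1) * f012
        disc = cmath.sqrt(w * w - 4 * f(x2) * f012)
        # choose the sign that maximizes the magnitude of the denominator
        denom = w + disc if abs(w + disc) > abs(w - disc) else w - disc
        x3 = x2 - 2 * f(x2) / denom
        if abs(x3 - x2) <= eps:
            return x3
        x0, x1, x2 = x1, x2, x3
    raise RuntimeError("Mueller's method did not converge in itmax steps")

# Real starting points can lead to a complex root, e.g. for x^2 + 1 = 0:
print(muller(lambda x: x * x + 1, 0.0, 0.5, 1.0))   # one of the roots +i or -i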
6 Fixed-Point Iteration Methods
We now consider solving an equation x = g(x) for a root α by the iteration
x_{n+1} = g(x_n), n ≥ 0, (6.1)
with an initial guess x_0 to α. The Newton method fits this pattern with
g(x) ≡ x − f(x)/f'(x). (6.2)
Each solution of x = g(x) is called a fixed point of g. Although we are interested in solving an
equation f(x) = 0, there are many ways this can be reformulated as a fixed-point problem.
Example 6.1 Consider the equation x^2 − a = 0, for some a > 0.
This equation can be reformulated as a fixed-point problem in the following ways:
(1) x = x^2 + x − a, or more generally, x = x + c(x^2 − a) for some c ≠ 0.
(2) x = a/x.
(3) x = (1/2)(x + a/x).
We give a numerical example with a = 3, x_0 = 2, and α = √3 = 1.732051. The results are given in
Table 4.
It is natural to ask what makes the various iterative schemes behave in the way they do in
this example. We will develop a general theory to explain this behavior and aid in analyzing new
iterative methods.
Table 4. Iteration results for x^2 − 3 = 0.
-----------------------------------------------------
 n       case (1)        case (2)        case (3)
          x_n             x_n             x_n
-----------------------------------------------------
 0     2.0000e+000     2.0000e+000     2.0000e+000
 1     3.0000e+000     1.5000e+000     1.7500e+000
 2     9.0000e+000     2.0000e+000     1.7321e+000
 3     8.7000e+001     1.5000e+000     1.7321e+000
 4     7.6530e+003     2.0000e+000     1.7321e+000
-----------------------------------------------------
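The entries of Table 4 can be reproduced with a few lines of Python; this sketch is an added illustration that simply applies x_{n+1} = g(x_n) for each of the three iteration functions.

from math import sqrt

a = 3.0
schemes = {
    "case (1)": lambda x: x * x + x - a,       # x = x^2 + x - a
    "case (2)": lambda x: a / x,               # x = a/x
    "case (3)": lambda x: 0.5 * (x + a / x),   # x = (x + a/x)/2
}

for name, g in schemes.items():
    x = 2.0                                    # x_0 = 2
    values = [x]
    for _ in range(4):
        x = g(x)
        values.append(x)
    print(name, ["%.4e" % v for v in values])
# Only case (3) converges rapidly to sqrt(3) = 1.7321; case (2) oscillates
# between 2 and 1.5, and case (1) diverges, exactly as in Table 4.
print("alpha =", sqrt(a))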
Lemma 6.2 Let g(x) be continuous on the interval [a, b], and assume that a ≤ g(x) ≤ b for every
x ∈ [a, b]. (We say that g sends [a, b] into [a, b], and denote it by g([a, b]) ⊂ [a, b].) Then x = g(x)
has at least one solution in [a, b].
Proof. Consider the continuous function g(x) − x. At x = a it is nonnegative, and at x = b it is
nonpositive. Thus, by the intermediate value theorem, it must have a root in the interval [a, b]. In
Figure 4, the roots are the intersection points of y = x and y = g(x).
Figure 4: Plot of the function y = g(x).
Lemma 6.3 Let g(x) be continuous on the interval [a, b], and assume that g([a, b]) ⊂ [a, b]. Furthermore,
assume that there is a constant 0 < λ < 1 with
|g(x) − g(y)| ≤ λ|x − y|, for all x, y ∈ [a, b]. (6.3)
Then x = g(x) has a unique solution α in [a, b]. Also, the iterates
x_n = g(x_{n−1}), n ≥ 1,
will converge to α for any choice of x_0 in [a, b], and
|α − x_n| ≤ (λ^n / (1 − λ)) |x_1 − x_0|. (6.4)
Proof. Suppose x = g(x) has two solutions α and β in [a, b]. Then
|α − β| = |g(α) − g(β)| ≤ λ|α − β|,
i.e.,
(1 − λ)|α − β| ≤ 0.
Since 0 < λ < 1, this implies that α = β. Also, we know by Lemma 6.2 that there is at least one
root α in [a, b].
To examine the convergence of the iterates x_n, first note that they all remain in [a, b]. To see
this, note that the result
x_n ∈ [a, b] implies x_{n+1} = g(x_n) ∈ [a, b]
can be used with mathematical induction to prove x_n ∈ [a, b] for all n. For the convergence,
|α − x_{n+1}| = |g(α) − g(x_n)| ≤ λ|α − x_n|, (6.5)
and by induction,
|α − x_n| ≤ λ^n |α − x_0|, n ≥ 0. (6.6)
As n → ∞, λ^n → 0; thus x_n → α.
To prove the bound (6.4), begin with
|α − x_0| ≤ |α − x_1| + |x_1 − x_0| ≤ λ|α − x_0| + |x_1 − x_0|,
where the last step used (6.5). Then, solving for |α − x_0|, we have
|α − x_0| ≤ (1/(1 − λ)) |x_1 − x_0|. (6.7)
Combining this with (6.6) completes the proof.
The bound (6.5) shows that the sequence {x_n} is linearly convergent, with the rate of convergence
bounded by λ. Also, from the proof, we can devise a possibly more accurate error bound
than (6.4). Repeating the argument that led to (6.7), we obtain
|α − x_n| ≤ (1/(1 − λ)) |x_{n+1} − x_n|.
Further, applying (6.5) yields the bound
|α − x_{n+1}| ≤ (λ/(1 − λ)) |x_{n+1} − x_n|, n ≥ 0. (6.8)
When λ is computable, this furnishes a practical bound in most situations.
If g(x) is differentiable on [a, b], then
g(x) − g(y) = g'(c)(x − y), for some c between x and y,
for all x, y ∈ [a, b]. Define
λ = max_{x∈[a,b]} |g'(x)|.
Then
|g(x) − g(y)| ≤ λ|x − y|, for all x, y ∈ [a, b].
Theorem 6.4 Assume that g(x) is continuously differentiable on [a, b], that g([a, b]) ⊂ [a, b], and
that
λ ≡ max_{x∈[a,b]} |g'(x)| < 1. (6.9)
Then:
(i) x = g(x) has a unique solution α in [a, b].
(ii) For any choice of x_0 in [a, b], with x_{n+1} = g(x_n), n ≥ 0,
lim_{n→∞} x_n = α.
(iii)
|α − x_n| ≤ λ^n |α − x_0| ≤ (λ^n / (1 − λ)) |x_1 − x_0|
and
lim_{n→∞} (α − x_{n+1}) / (α − x_n) = g'(α). (6.10)
Proof. Every result comes from the preceding lemmas, except for the rate of convergence (6.10).
For it, use
α − x_{n+1} = g(α) − g(x_n) = g'(ξ_n)(α − x_n), n ≥ 0, (6.11)
with ξ_n an unknown point between α and x_n. Since x_n → α, we must have ξ_n → α, and thus
lim_{n→∞} (α − x_{n+1}) / (α − x_n) = lim_{n→∞} g'(ξ_n) = g'(α).
If g'(α) ≠ 0, then the sequence {x_n} converges to α with order exactly p = 1, linear convergence.
To see the importance of the assumption (6.9) on the size of g'(x), suppose that |g'(α)| > 1.
Then, if we had a sequence of iterates x_{n+1} = g(x_n) and a root α = g(α), we would have (6.11). If x_n
becomes sufficiently close to α, then |g'(ξ_n)| > 1 and the error |α − x_{n+1}| will be greater than
|α − x_n|. Thus, convergence is not possible if |g'(α)| > 1. We graphically portray the computation
of the iterates in four cases; see Figures 5 and 6.
Figure 5: Convergent sequences: 0 < g'(α) < 1 and −1 < g'(α) < 0.
Figure 6: Nonconvergent sequences: g'(α) > 1 and g'(α) < −1.
Theorem 6.5 Assume α is a solution of x = g(x), and suppose that g(x) is continuously differentiable
in some neighboring interval about α with |g'(α)| < 1. Then the results of Theorem 6.4 are
still true, provided x_0 is chosen sufficiently close to α.
Proof. Pick a number λ satisfying |g'(α)| < λ < 1. Then pick an interval I = [α − ε, α + ε] with
max_{x∈I} |g'(x)| ≤ λ < 1.
We have g(I) ⊂ I, since |α − x| ≤ ε implies
|α − g(x)| = |g(α) − g(x)| = |g'(c)| |α − x| ≤ λ|α − x| ≤ ε.
Now apply the preceding theorem using [a, b] = [α − ε, α + ε].
Now we can verify the condition |g'(α)| < 1 for Example 6.1. Calculate g'(α):
(i) g(x) = x^2 + x − 3: g'(α) = g'(√3) = 2√3 + 1 > 1.
(ii) g(x) = 3/x: g'(√3) = −3/(√3)^2 = −1.
(iii) g(x) = (1/2)(x + 3/x): g'(x) = (1/2)(1 − 3/x^2), g'(√3) = 0.
7 Fixed-Point Iteration (Conte & De Boor)
We know that the fixed-point iteration method is a possible method for obtaining a root of the equation
f(x) = 0. (7.1)
In this method, one derives from (7.1) an equation of the form
x = g(x) (7.2)
so that any solution of (7.2), i.e., any fixed point of g(x), is a solution of (7.1). For instance, if
f(x) = x^2 − x − 2, (7.3)
then among possible choices for g(x) are the following:
(1) g(x) = x^2 − 2
(2) g(x) = √(2 + x)
(3) g(x) = 1 + 2/x
(4) g(x) = x − (x^2 − x − 2)/m, for some nonzero constant m.
Each such g(x) is called an iteration function for solving (7.1) [with f(x) given by (7.3)]. Once an
iteration function is chosen, one carries out the following algorithm.
Algorithm: Fixed-point iteration. Given an iteration function g(x) and a starting point x_0.
For n = 0, 1, 2, ..., until satisfied, do:
    Calculate x_{n+1} = g(x_n).
For this algorithm to be useful, we must prove that:
(i) For the given initial guess x_0, we can calculate successively x_1, x_2, ....
(ii) The sequence {x_n} converges to some point α.
(iii) The limit α is a fixed point of g(x), i.e., α = g(α).
The real-valued function
g(x) = −√x
shows that (i) is not a trivial requirement, for in this case g(x) is defined only for x ≥ 0. Starting
with any x_0 > 0, we get x_1 = g(x_0) < 0; hence we cannot calculate x_2. Therefore, we need the
following assumption.
Assumption 1. There is an interval I = [a, b] such that, for all x ∈ I, g(x) is defined and
g(x) ∈ I; i.e., the function g(x) maps I into itself.
It follows from this assumption, by induction on n, that if x_0 ∈ I, then for all n, x_n ∈ I;
hence x_{n+1} = g(x_n) is defined and is in I.
To satisfy (iii), we need the continuity of g(x): if the sequence {x_n} converges to α as n → ∞, then
α = lim_{n→∞} x_{n+1} = lim_{n→∞} g(x_n) = g(lim_{n→∞} x_n) = g(α).
Assumption 2. The iteration function g(x) is continuous on I = [a, b].
Lemma 7.1 Let Assumptions 1 and 2 hold. Then the fixed-point problem (7.2) has a fixed point
in I = [a, b].
Proof. If either g(a) = a or g(b) = b, then the claim is true. Otherwise, we have g(a) ≠ a and
g(b) ≠ b. But, by Assumption 1, both g(a) and g(b) are in I; hence g(a) > a and g(b) < b. This
implies that the function h(x) = g(x) − x satisfies h(a) > 0, h(b) < 0, and h(x) is continuous on
I by Assumption 2. By the intermediate-value theorem for continuous functions, h(x) must vanish in
I. Thus, g(x) has a fixed point in I.
For the discussion of (ii) concerning convergence, it is instructive to carry out the iteration
graphically. This can be done as follows. Since x_n = g(x_{n−1}), the point {x_{n−1}, x_n} lies on the graph
of g(x). To locate {x_n, x_{n+1}} from {x_{n−1}, x_n}, draw the straight line through {x_{n−1}, x_n} parallel
to the x-axis. This line intersects the line y = x at the point {x_n, x_n}. Through this point, draw
the straight line parallel to the y-axis. This line intersects the graph y = g(x) of g(x) at the point
{x_n, g(x_n)}. But since g(x_n) = x_{n+1}, this is the desired point {x_n, x_{n+1}}. In Figures 7 and 8, we
have carried out the first few steps of fixed-point iteration for four typical cases. Note that α is a
fixed point of g(x) if and only if y = g(x) and y = x intersect at {α, α}.
As Figures 7 and 8 show, the fixed-point iteration may well fail to converge, as it does in Figures
7(a) and 8(b). Whether or not the iteration converges [given that g(x) has a fixed point] seems to
depend on the slope of g(x). If the slope of g(x) is too large in absolute value near a fixed point
of g(x), then we cannot hope for convergence to that fixed point. We therefore make the following
assumption.
Assumption 3. The iteration function is differentiable on I = [a, b]. Further, there exists a
nonnegative constant K < 1 such that
|g'(x)| ≤ K, x ∈ I.
Note that Assumption 3 implies Assumption 2, since a differentiable function is, in particular,
continuous.
Figure 7: Fixed-point iterations.
Figure 8: Fixed-point iterations.
Theorem 7.2 Let g(x) be an iteration function satisfying Assumptions 1 and 3. Then g(x) has
exactly one fixed point α ∈ I, and starting with any initial approximation x_0 ∈ I, the sequence {x_n}
generated by the fixed-point iteration algorithm converges to α.
Proof. We have already proved the existence of a fixed point α of g(x) in I. Let the sequence {x_n}
be generated by the fixed-point iteration algorithm. Denote the error in the nth iterate by
e_n = α − x_n, n = 0, 1, ....
Then, since α = g(α) and x_n = g(x_{n−1}), we have
e_n = α − x_n = g(α) − g(x_{n−1}) = g'(ξ_n) e_{n−1}, (7.4)
for some ξ_n between α and x_{n−1}, by the mean-value theorem for derivatives. Hence, by Assumption 3,
|e_n| ≤ K |e_{n−1}|.
It follows by induction on n that
|e_n| ≤ K|e_{n−1}| ≤ K^2|e_{n−2}| ≤ ⋯ ≤ K^n|e_0|.
Since 0 ≤ K < 1, we have lim_{n→∞} K^n = 0; therefore
lim_{n→∞} |e_n| = lim_{n→∞} K^n |e_0| = 0,
regardless of the initial error e_0. But this says that {x_n} converges to α. It also proves that α is the
only fixed point of g(x) in I. For if β is another fixed point of g(x) in I, then with x_0 = β,
we should have x_1 = g(x_0) = β; hence |e_0| = |e_1| ≤ K|e_0|. Since K < 1, this then implies |e_0| = 0,
or α = β. This completes the proof.
Corollary 7.3 If g(x) is continuously differentiable in some open interval containing the fixed point
α, and if |g'(α)| < 1, then there exists an ε > 0 so that fixed-point iteration with g(x) converges
whenever |x_0 − α| ≤ ε.
Proof. Indeed, since g'(x) is continuous near α and |g'(α)| < 1, there exists, for any K with
|g'(α)| < K < 1, an ε > 0 such that |g'(x)| ≤ K for every |x − α| ≤ ε. Fix one such K with its
corresponding ε. Then, for I = [α − ε, α + ε], Assumption 3 is satisfied. As to Assumption 1, let x
be any point in I, thus |x − α| ≤ ε. Then, as in the proof of Theorem 7.2,
g(x) − α = g(x) − g(α) = g'(ξ)(x − α)
for some point ξ between x and α, hence in I. But then
|g(x) − α| ≤ |g'(ξ)| |x − α| ≤ Kε < ε,
showing that g(x) is in I if x ∈ I. This verifies Assumption 1, and the conclusion now follows from
Theorem 7.2.
Because of the corollary, a fixed point α of g(x) for which |g'(α)| < 1 is often called a point of
attraction [for the iteration with g(x)].
We consider again the quadratic function f(x) = x^2 − x − 2. The zeros of this function are 2
and −1. Suppose that we wish to calculate the root α = 2 by fixed-point iteration. If we use the
iteration function g(x) = x^2 − 2, then for x > 1/2 we have g'(x) > 1. It follows that Assumption
3 is not satisfied for any interval containing α = 2; that is, α = 2 is not a point of attraction. In
fact, one can prove that, starting at any point x_0, the sequence {x_n} generated by this fixed-point
iteration will converge to α = 2 only if, for some n_0, x_n = 2 for all n ≥ n_0; that is, α = 2 is hit
accidentally.
On the other hand, if we choose g(x) = √(2 + x), then
g'(x) = 1 / (2√(2 + x)).
Now x ≥ 0 implies g(x) ≥ 0 and 0 < g'(x) ≤ 1/√8 < 1, while, for example, x ≤ 7 implies that
g(x) = √(2 + x) ≤ √(2 + 7) = 3. Hence, with I = [0, 7] both Assumptions 1 and 3 are satisfied, and
any x_0 ∈ [0, 7] leads, therefore, to a convergent sequence. Indeed, if we take x_0 = 0, then
x_1 = √2 = 1.41421
x_2 = √3.41421 = 1.84775
x_3 = √3.84775 = 1.96157
x_4 = √3.96157 = 1.99036
x_5 = √3.99036 = 1.99759,
which clearly converges to the root α = 2.
Consider the more realistic example of the following transcendental equation:
f(x) = x − 2 sin(x) = 0.
The most natural rearrangement here is
x = 2 sin(x),
so that g(x) = 2 sin(x). An examination of the curves y = g(x) and y = x shows that there is a
root α between π/3 and 2π/3. Further,
if π/3 ≤ x ≤ 2π/3, then √3 ≤ g(x) ≤ 2.
Hence, if π/3 ≤ a ≤ √3 and 2 ≤ b ≤ 2π/3, then Assumption 1 is satisfied. Finally, g'(x) = 2 cos(x)
strictly decreases from 1 to −1 as x increases from π/3 to 2π/3. It follows that Assumption 3 is satisfied
whenever π/3 < a ≤ √3 and 2 ≤ b < 2π/3. In conclusion, fixed-point iteration with g(x) = 2 sin(x)
converges to the unique solution α of the equation in [π/3, 2π/3] whenever x_0 ∈ (π/3, 2π/3).
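As a small numerical check of this conclusion (an added illustration, with x_0 = 1.5 chosen only as an example), the Python sketch below iterates x_{n+1} = 2 sin(x_n); the successive differences shrink by a roughly constant factor, consistent with linear convergence at rate |g'(α)| = |2 cos(α)|.

from math import sin, cos

g = lambda x: 2.0 * sin(x)
x = 1.5                          # x_0 in (pi/3, 2*pi/3)
prev_diff = None
for n in range(1, 16):
    x_new = g(x)
    diff = abs(x_new - x)
    ratio = diff / prev_diff if prev_diff else float("nan")
    print(f"n = {n:2d}, x_n = {x_new:.10f}, diff = {diff:.2e}, ratio = {ratio:.4f}")
    prev_diff, x = diff, x_new
# The ratios settle near |2*cos(alpha)| (about 0.64), the linear rate
# of convergence predicted by the theory.
print("2*cos(x) at the limit:", 2.0 * cos(x))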
8 Convergence Acceleration for Fixed-Point Iteration
Here, we investigate the rate of convergence of fixed-point iteration and show how information
about the rate of convergence can be used at times to accelerate convergence.
We assume that the iteration function g(x) is continuously differentiable and that, starting with
some point x_0, the sequence {x_n} generated by fixed-point iteration converges to some point α. This α
is then a fixed point of g(x), and we have that
e_{n+1} = α − x_{n+1} = g'(ξ_n) e_n (8.1)
for some ξ_n between α and x_n, n = 1, 2, .... Since lim_{n→∞} x_n = α, it then follows that
lim_{n→∞} ξ_n = α; hence
lim_{n→∞} g'(ξ_n) = g'(α),
g'(x) being continuous, by assumption. Consequently,
e_{n+1} = g'(α) e_n + ε_n e_n, (8.2)
where lim_{n→∞} ε_n = 0. Hence, if g'(α) ≠ 0, then for large enough n,
e_{n+1} ≈ g'(α) e_n, (8.3)
i.e., the error e_{n+1} in the (n + 1)st iterate depends (more or less) linearly on the error e_n in the
nth iterate. We therefore say that {x_n} converges linearly to α.
Now, note that we can solve (8.1) for α. For
α − x_{n+1} = g'(ξ_n)(α − x_n) (8.4)
gives
(1 − g'(ξ_n)) α = x_{n+1} − g'(ξ_n) x_n = [1 − g'(ξ_n)] x_{n+1} + g'(ξ_n)(x_{n+1} − x_n).
Therefore,
α = x_{n+1} + g'(ξ_n)(x_{n+1} − x_n) / (1 − g'(ξ_n)) = x_{n+1} + (x_{n+1} − x_n) / (g'(ξ_n)^{−1} − 1). (8.5)
Of course, we do not know the number g'(ξ_n). But we know that the ratio
r_n := (x_n − x_{n−1}) / (x_{n+1} − x_n) = (x_n − x_{n−1}) / (g(x_n) − g(x_{n−1})) = g'(η_n)^{−1} (8.6)
for some η_n between x_n and x_{n−1}, by the mean-value theorem for derivatives. For large enough n,
therefore, we have
r_n = g'(η_n)^{−1} ≈ g'(α)^{−1} ≈ g'(ξ_n)^{−1},
and then the point
x̂_n := x_{n+1} + (x_{n+1} − x_n) / (r_n − 1), with r_n = (x_n − x_{n−1}) / (x_{n+1} − x_n), (8.7)
should be a very much better approximation to α than x_n or x_{n+1}.
This can also be seen graphically. In effect, we obtained (8.7) by solving (8.4) for α after replacing
g'(ξ_n) by the number g[x_{n−1}, x_n] and calling the solution x̂_n. Thus, x̂_n − x_{n+1} = g[x_{n−1}, x_n](x̂_n − x_n).
Since x_{n+1} = g(x_n), this shows that x̂_n is a fixed point of the straight line
s(x) = g(x_n) + g[x_{n−1}, x_n](x − x_n).
This we recognize as the linear interpolant to g(x) at x_{n−1}, x_n. If now the slope of g(x) varies
little between x_{n−1} and α, that is, if g(x) is approximately a straight line between x_{n−1} and α, then the
secant s(x) should be a very good approximation to g(x) in that interval; hence the fixed point x̂_n
of the secant should be a very good approximation to the fixed point α of g(x); see Figure 9.
Figure 9: Fixed-point iterations.
In practice, we will not be able to prove that any particular x_n is close enough to α to make
x̂_n a better approximation to α than is x_n or x_{n+1}. But we can test the hypothesis that x_n is close
enough by checking the ratios r_{n−1}, r_n. If the ratios are approximately constant, we accept the
hypothesis that the slope of g(x) varies little in the interval of interest; hence we believe that the
secant s(x) is a good enough approximation to g(x) to make x̂_n a very much better approximation
to α than is x_n. In particular, we then accept |x̂_n − x_n| as a good estimate for the error |e_n|.
Whether or not any particular x̂_n is a better approximation to α than is x_n, one can prove that
the sequence {x̂_n} converges faster to α than does the original sequence {x_n}; that is,
x̂_n = α + o(e_n). (8.8)
This process of deriving from a linearly converging sequence {x_n} a faster converging sequence
{x̂_n} by (8.7) is usually called Aitken's Δ² process. Using the abbreviations
Δx_k = x_{k+1} − x_k, Δ²x_k = Δ(Δx_k) = Δx_{k+1} − Δx_k,
(8.7) can be expressed in the form
x̂_n = x_{n+1} − (Δx_n)² / Δ²x_{n−1}, (8.9)
hence the name Δ² process. This process is applicable to any linearly convergent sequence,
whether generated by fixed-point iteration or not.
Algorithm: Aitken's Δ² process. Given a sequence {x_n} converging to α, calculate the sequence
{x̂_n} by (8.9).
If the sequence {x_n} converges linearly to α, that is, if
α − x_{n+1} = K(α − x_n) + o(α − x_n), for some K ≠ 0,
then
x̂_n = α + o(α − x_n).
Furthermore, if, starting from a certain k on, the sequence Δx_{k−1}/Δx_k, Δx_k/Δx_{k+1}, ... of difference
ratios is approximately constant, then x̂_k can be assumed to be a better approximation to α than
is x_k. In particular, |x̂_k − x_k| is then a good estimate for the error |α − x_k|.
If, in the case of fixed-point iteration, we decide that a certain x̂_k is a very much better approximation
to α than x_k, then it is certainly wasteful to continue generating x_{k+1}, x_{k+2}, etc. It seems
more reasonable to start fixed-point iteration afresh with x̂_k as the initial guess. This leads to the
following algorithm.
Algorithm: Steffensen iteration. Given the iteration function g(x) and a point y_0:
For n = 0, 1, 2, ..., until satisfied, do:
    x_0 := y_n
    Calculate x_1 = g(x_0), x_2 = g(x_1)
    Calculate d = Δx_1 = x_2 − x_1, r = Δx_0/d
    Calculate y_{n+1} = x_2 + d/(r − 1)
One step of this algorithm consists of two steps of fixed-point iteration followed by one application
of (8.7), using the three iterates available, to get the starting value for the next step.
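A Python sketch of the Steffensen algorithm follows (an added illustration with our own naming; the test uses g(x) = √(2 + x) from the earlier example, whose fixed point is α = 2).

from math import sqrt

def steffensen(g, y0, eps=1e-12, itmax=25):
    """Steffensen iteration: two fixed-point steps, then one Aitken step (8.7)."""
    y = y0
    for _ in range(itmax):
        x0 = y
        x1 = g(x0)
        x2 = g(x1)
        d = x2 - x1                 # d = Delta x_1
        r = (x1 - x0) / d           # r = Delta x_0 / Delta x_1
        y_new = x2 + d / (r - 1.0)  # extrapolated value (8.7)/(8.9)
        if abs(y_new - y) <= eps:
            return y_new
        y = y_new
    raise RuntimeError("Steffensen iteration did not converge in itmax steps")

# Fixed point of g(x) = sqrt(2 + x) is the root alpha = 2 of x^2 - x - 2 = 0.
print(steffensen(lambda x: sqrt(2.0 + x), 0.0))   # approximately 2.0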
9 Numerical Evaluation of Multiple Roots
We say that the function f(x) has a root α of multiplicity p > 1 if
f(x) = (x − α)^p h(x), (9.1)
with h(α) ≠ 0 and h(x) continuous at x = α. We restrict p to be a positive integer, although
some of the following is equally valid for non-integral values. If h(x) is sufficiently differentiable at
x = α, then (9.1) is equivalent to
f(α) = f'(α) = ⋯ = f^{(p−1)}(α) = 0, f^{(p)}(α) ≠ 0. (9.2)
When finding a root of any function on a computer, there is always an interval of uncertainty
about the root, and this is made worse when the root is multiple. To see this, consider evaluating
the two functions f_1(x) = x^2 − 3 and f_2(x) = 9 + x^2(x^2 − 6). Then α = √3 has multiplicity
one as a root of f_1 and multiplicity two as a root of f_2. Using four-digit arithmetic, f_1(x) < 0 for
x ≤ 1.731, f_1(1.732) = 0, and f_1(x) > 0 for x ≥ 1.733. But f_2(x) = 0 for 1.726 ≤ x ≤ 1.738, thus
limiting the amount of accuracy that can be attained in finding a root of f_2(x).
9.1 Newton's Method and Multiple Roots
The earlier root-finding methods will not perform as before in the case of multiple roots. Consider
Newton's method as a fixed-point method, as in (6.2), with f(x) as given in (9.1):
x_{n+1} = g(x_n), g(x) = x − f(x)/f'(x), x ≠ α.
Before calculating g'(α), we first simplify g(x) using (9.1):
f'(x) = (x − α)^p h'(x) + p(x − α)^{p−1} h(x),
g(x) = x − (x − α) h(x) / (p h(x) + (x − α) h'(x)).
Differentiating,
g'(x) = 1 − h(x) / (p h(x) + (x − α) h'(x)) − (x − α) d/dx [ h(x) / (p h(x) + (x − α) h'(x)) ],
and
g'(α) = 1 − 1/p ≠ 0, for p > 1. (9.3)
Thus, Newton's method is a linear method with rate of convergence (p − 1)/p.
To improve Newton's method, we would like a function g(x) for which g'(α) = 0. Based on the
derivation of (9.3), define
g(x) = x − p f(x)/f'(x).
Then easily, g'(α) = 0; thus
α − x_{n+1} = g(α) − g(x_n) = −g'(α)(x_n − α) − (1/2)(x_n − α)^2 g''(ξ_n),
with ξ_n between x_n and α. Thus,
α − x_{n+1} = −(1/2)(x_n − α)^2 g''(ξ_n),
showing that the method
x_{n+1} = x_n − p f(x_n)/f'(x_n), n ≥ 0, (9.4)
has order of convergence two, the same as the original Newton method for simple roots.
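The following Python sketch (an added illustration; the test function is our own choice) compares ordinary Newton (3.1) with the modified iteration (9.4) on f(x) = (x − 1)^2 e^x, which has a root of multiplicity p = 2 at α = 1.

from math import exp

# f(x) = (x - 1)^2 * exp(x) has a double root (p = 2) at alpha = 1.
f  = lambda x: (x - 1.0) ** 2 * exp(x)
df = lambda x: (2.0 * (x - 1.0) + (x - 1.0) ** 2) * exp(x)

def iterate(step, x0, n):
    x, errs = x0, []
    for _ in range(n):
        x = step(x)
        errs.append(abs(x - 1.0))
    return errs

newton_step   = lambda x: x - f(x) / df(x)            # (3.1): linear, rate (p-1)/p = 1/2
modified_step = lambda x: x - 2.0 * f(x) / df(x)       # (9.4) with p = 2: quadratic

print("standard Newton:", ["%.2e" % e for e in iterate(newton_step, 2.0, 6)])
print("modified Newton:", ["%.2e" % e for e in iterate(modified_step, 2.0, 6)])
# The standard errors shrink by roughly 1/2 per step, while the modified
# errors are squared at each step.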
10 Roots of Polynomials
Consider the polynomial equation
p(x) ≡ a_0 + a_1 x + ⋯ + a_n x^n = 0, a_n ≠ 0. (10.1)
Nested multiplication: A very efficient way to evaluate the polynomial p(x) given in (10.1) is
to use nested multiplication:
p(x) = a_0 + x(a_1 + x(a_2 + ⋯ + x(a_{n−1} + a_n x) ⋯ )), a_n ≠ 0. (10.2)
With formula (10.1), there are n additions and 2n − 1 multiplications, and with (10.2) there are n
additions and n multiplications, a considerable saving.
It is convenient to introduce the following auxiliary coefficients. Let b_n = a_n and
b_k = a_k + z b_{k+1}, k = n − 1, n − 2, ..., 0. (10.3)
By considering (10.3), it is easy to see that
p(z) = b_0. (10.4)
Introduce the polynomial
q(x) = b_1 + b_2 x + ⋯ + b_n x^{n−1}. (10.5)
Then,
b_0 + (x − z) q(x) = b_0 + (x − z)[b_1 + b_2 x + ⋯ + b_n x^{n−1}]
    = (b_0 − b_1 z) + (b_1 − b_2 z)x + ⋯ + (b_{n−1} − b_n z)x^{n−1} + b_n x^n
    = a_0 + a_1 x + ⋯ + a_n x^n = p(x),
i.e.,
p(x) = b_0 + (x − z) q(x), (10.6)
where q(x) is the quotient and b_0 the remainder when p(x) is divided by (x − z). The use of (10.3)
to evaluate p(z) and to form the quotient polynomial q(x) is also called Horner's method.
If z is a root of p(x), then b_0 = 0 and p(x) = (x − z) q(x). To find additional roots of p(x), we
can restrict our search to the roots of q(x). This reduction process is called deflation.
Newton's Method: If we want to apply Newton's method to find a root of p(x), we must be able
to evaluate both p(x) and p'(x) at any point z. From (10.6),
p'(x) = (x − z) q'(x) + q(x),
i.e.,
p'(z) = q(z). (10.7)
We use (10.5) and (10.7) in the following adaptation of Newton's method to polynomial root finding;
a small sketch in code follows the algorithm.
Algorithm Polynew(a, n, x_0, ε, itmax, root, b, ier)
1. Remark: a is the vector of coefficients, itmax the maximum number of iterates to be computed,
b the vector of coefficients for the deflated polynomial, and ier an error indicator.
2. itnum := 1.
3. z := x_0, b_n := c := a_n.
4. For k = n − 1, ..., 1: b_k := a_k + z b_{k+1}, c := b_k + z c.
5. b_0 := a_0 + z b_1.
6. If c = 0, then ier := 2 and exit.
7. x_1 := x_0 − b_0/c.
8. If |x_1 − x_0| ≤ ε, then ier := 0, root := x_1, and exit.
9. If itnum = itmax, then ier := 1 and exit.
10. Otherwise, itnum := itnum + 1, x_0 := x_1, and go to Step 3.
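The steps of Polynew can be written compactly in Python; the sketch below (our own naming and test polynomial, added for illustration) evaluates p(z) and p'(z) = q(z) with one Horner pass per Step 4 and then applies the Newton update of Step 7.

def polynew(a, x0, eps=1e-12, itmax=100):
    """Newton's method for p(x) = a[0] + a[1]*x + ... + a[n]*x^n.

    Returns (root, b) where b holds the coefficients b_1..b_n of the
    deflated polynomial q(x) from (10.5).  Mirrors algorithm Polynew.
    """
    n = len(a) - 1
    for _ in range(itmax):                  # Steps 2, 9, 10
        z = x0                              # Step 3
        b = [0.0] * (n + 1)
        b[n] = c = a[n]
        for k in range(n - 1, 0, -1):       # Step 4: Horner for p and q
            b[k] = a[k] + z * b[k + 1]
            c = b[k] + z * c                # c accumulates q(z) = p'(z)
        b[0] = a[0] + z * b[1]              # Step 5: b_0 = p(z)
        if c == 0:                          # Step 6
            raise ZeroDivisionError("p'(z) = 0; choose another x0")
        x1 = x0 - b[0] / c                  # Step 7
        if abs(x1 - x0) <= eps:             # Step 8
            return x1, b[1:]
        x0 = x1
    raise RuntimeError("no convergence within itmax iterations")

# p(x) = x^3 - 2x - 5 (coefficients in increasing powers of x)
root, deflated = polynew([-5.0, -2.0, 0.0, 1.0], x0=2.0)
print(root)        # about 2.0945514815
print(deflated)    # coefficients of the deflated quadratic q(x)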
11 Systems of Nonlinear Equations
Here, we consider some numerical methods to solve systems of nonlinear equations. These problems
are widespread in applications; for instance, one encounters systems of nonlinear algebraic equations
when solving nonlinear differential equations.
Consider the following two equations:
f_1(x_1, x_2) = 0, f_2(x_1, x_2) = 0. (11.1)
The generalization to n equations in n variables should be straightforward once the principal ideas
have been grasped. Rewrite the equations (11.1) in vector notation:
f(x) = 0, x = [x_1, x_2]^T, f(x) = [f_1(x_1, x_2), f_2(x_1, x_2)]^T. (11.2)
The solution of (11.1) can be looked upon as a two-step process:
(1) Find the zero curves in the x_1x_2-plane of the surfaces z = f_1(x_1, x_2) and z = f_2(x_1, x_2).
(2) Find the points of intersection of these zero curves in the x_1x_2-plane.
This perspective will be used to generalize Newton's method to solve the system (11.1).
11.1 Fixed-Point Theory
We generalize some of the fixed-point theory to the system (11.1). Assume that the root-finding
problem (11.1) has been reformulated in an equivalent form as
x_1 = g_1(x_1, x_2), x_2 = g_2(x_1, x_2). (11.3)
Denote its solution by
α = [α_1, α_2]^T.
We study the fixed-point iteration
x_{1,n+1} = g_1(x_{1,n}, x_{2,n}), x_{2,n+1} = g_2(x_{1,n}, x_{2,n}). (11.4)
Using vector notation, we rewrite this as
x_{n+1} = g(x_n), (11.5)
with
x_n = [x_{1,n}, x_{2,n}]^T, g(x) = [g_1(x_1, x_2), g_2(x_1, x_2)]^T.
To analyze the convergence of (11.5), begin by subtracting the two equations in (11.4) from the
corresponding equations
α_1 = g_1(α_1, α_2), α_2 = g_2(α_1, α_2)
involving the exact solution α. Apply the mean-value theorem for functions of two variables to
these differences to obtain
α_i − x_{i,n+1} = ∂g_i(ξ^{(i)}_{1,n}, ξ^{(i)}_{2,n})/∂x_1 · (α_1 − x_{1,n}) + ∂g_i(ξ^{(i)}_{1,n}, ξ^{(i)}_{2,n})/∂x_2 · (α_2 − x_{2,n}), i = 1, 2.
The points ξ^{(i)}_n = (ξ^{(i)}_{1,n}, ξ^{(i)}_{2,n}) are on the line segment joining α and x_n. In matrix form, these
equations become
[ α_1 − x_{1,n+1} ]   [ ∂g_1(ξ^{(1)}_n)/∂x_1   ∂g_1(ξ^{(1)}_n)/∂x_2 ] [ α_1 − x_{1,n} ]
[ α_2 − x_{2,n+1} ] = [ ∂g_2(ξ^{(2)}_n)/∂x_1   ∂g_2(ξ^{(2)}_n)/∂x_2 ] [ α_2 − x_{2,n} ]   (11.6)
Let G_n denote the matrix in (11.6). Then we can rewrite this equation as
α − x_{n+1} = G_n (α − x_n). (11.7)
It is convenient to introduce the Jacobian matrix for the functions g_1 and g_2:
G(x) = [ ∂g_1(x)/∂x_1   ∂g_1(x)/∂x_2 ]
       [ ∂g_2(x)/∂x_1   ∂g_2(x)/∂x_2 ]   (11.8)
In (11.7), if x_n is close to α, then G_n will be close to G(α), so the size of G(α) governs the behavior
of the error. The matrix G(α) plays the same crucial role here that g'(α) plays in the single-equation
fixed-point theory.
Theorem 11.1 Let D be a closed, bounded, and convex set in the plane. Assume that the components
of g(x) are continuously differentiable at all points of D, and further assume that
g(D) ⊂ D, (11.9)
λ ≡ max_{x∈D} ‖G(x)‖_∞ < 1. (11.10)
Then we have the following:
(a) x = g(x) has a unique solution α ∈ D.
(b) For any initial point x_0 ∈ D, the iteration (11.5) will converge to α ∈ D.
(c)
‖α − x_{n+1}‖_∞ ≤ (‖G(α)‖_∞ + ε_n) ‖α − x_n‖_∞, (11.11)
with ε_n → 0 as n → ∞.
Proof. The existence of a fixed point can be shown by proving that the sequence of iterates x_n
from (11.3) is convergent in D.
Suppose α and β are both fixed points of g(x) in D. Then
α − β = g(α) − g(β). (11.12)
Apply the mean value theorem to component i, obtaining
g_i(α) − g_i(β) = ∇g_i(ξ^{(i)}) · (α − β), i = 1, 2, (11.13)
with
∇g_i(x) = [ ∂g_i/∂x_1   ∂g_i/∂x_2 ]
and ξ^{(i)} ∈ D, on the line segment joining α and β. Since ‖G(x)‖_∞ < 1, we have from the
definition of the norm that
|∂g_i(x)/∂x_1| + |∂g_i(x)/∂x_2| ≤ λ < 1, x ∈ D, i = 1, 2.
Combining this with (11.13),
|g_i(α) − g_i(β)| ≤ λ ‖α − β‖_∞, ‖g(α) − g(β)‖_∞ ≤ λ ‖α − β‖_∞. (11.14)
Combining with (11.12), this yields
‖α − β‖_∞ ≤ λ ‖α − β‖_∞,
which is possible only if α = β, showing the uniqueness of α in D.
(b) Condition (11.9) will ensure that all x_n ∈ D if x_0 ∈ D. Next, subtract x_{n+1} = g(x_n) from
α = g(α), obtaining
α − x_{n+1} = g(α) − g(x_n).
The result (11.14) applies to any two points in D. Applying it,
‖α − x_{n+1}‖_∞ ≤ λ ‖α − x_n‖_∞. (11.15)
Inductively,
‖α − x_n‖_∞ ≤ λ^n ‖α − x_0‖_∞. (11.16)
Since λ < 1, this shows that x_n → α as n → ∞.
(c) From (11.7), we have
‖α − x_{n+1}‖_∞ ≤ ‖G_n‖_∞ ‖α − x_n‖_∞. (11.17)
As n → ∞, the points ξ^{(i)}_n used in evaluating G_n will all tend to α, since they are on the line
segment joining x_n and α. Then ‖G_n‖_∞ → ‖G(α)‖_∞ as n → ∞. Result (11.11) follows from
(11.17) by letting ε_n = ‖G_n‖_∞ − ‖G(α)‖_∞.
Corollary 11.2 Let α be a fixed point of g(x), and assume that the components of g(x) are continuously
differentiable in some neighborhood about α. Further, assume that
‖G(α)‖_∞ < 1. (11.18)
Then, for x_0 chosen sufficiently close to α, the iteration x_{n+1} = g(x_n) will converge to α, and the
results of Theorem 11.1 will be valid on some closed, bounded, convex region about α.
Suppose that A is a constant nonsingular matrix of order 2 × 2. We can then reformulate (11.1)
as
x = x + A f(x) ≡ g(x). (11.19)
To see the requirement on A, we produce the Jacobian matrix. Easily,
G(x) = I + A F(x),
where F(x) is the Jacobian matrix of f_1 and f_2,
F(x) = [ ∂f_1(x)/∂x_1   ∂f_1(x)/∂x_2 ]
       [ ∂f_2(x)/∂x_1   ∂f_2(x)/∂x_2 ]   (11.20)
We want to choose A so that (11.18) is satisfied. And for rapid convergence, we want ‖G(α)‖_∞ ≈ 0,
or
A ≈ −F(α)^{−1}.
One can choose
A = −F(x_0)^{−1}.
This suggests using a continual updating of A, say A = −F(x_n)^{−1}. The resulting method is
x_{n+1} = x_n − F(x_n)^{−1} f(x_n), n ≥ 0. (11.21)
11.2 Newton's Method for Nonlinear Systems
As with Newton's method for a single equation, there is more than one way of viewing and deriving
the Newton method for solving a system of nonlinear equations. We begin with an analytic
derivation, and then we give a geometrical perspective.
Apply Taylor's theorem for functions of two variables to each of the equations f_i(x_1, x_2) = 0,
expanding f_i(α) about x_0; for i = 1, 2,
0 = f_i(α) = f_i(x_0) + (α_1 − x_{1,0}) ∂f_i(x_0)/∂x_1 + (α_2 − x_{2,0}) ∂f_i(x_0)/∂x_2
    + (1/2) [ (α_1 − x_{1,0}) ∂/∂x_1 + (α_2 − x_{2,0}) ∂/∂x_2 ]^2 f_i(ξ^{(i)}), (11.22)
with ξ^{(i)} on the line segment joining x_0 and α. If we drop the second-order terms, we obtain the
approximation
0 ≈ f_1(x_0) + (α_1 − x_{1,0}) ∂f_1(x_0)/∂x_1 + (α_2 − x_{2,0}) ∂f_1(x_0)/∂x_2,
0 ≈ f_2(x_0) + (α_1 − x_{1,0}) ∂f_2(x_0)/∂x_1 + (α_2 − x_{2,0}) ∂f_2(x_0)/∂x_2. (11.23)
In matrix form,
0 ≈ f(x_0) + F(x_0)(α − x_0), (11.24)
with F(x_0) the Jacobian matrix of f, given in (11.20).
Solving for α,
α ≈ x_0 − F(x_0)^{−1} f(x_0) ≡ x_1.
The approximation x_1 should be an improvement on x_0, provided x_0 is chosen sufficiently close to
α. This leads to the iteration method first obtained at the end of the last section,
x_{n+1} = x_n − F(x_n)^{−1} f(x_n), n ≥ 0. (11.25)
This is Newton's method for solving the nonlinear system f(x) = 0.
In practice, we do not invert F(x_n), particularly for systems having more than two equations.
Instead, we solve a linear system for a correction term δ_{n+1} to x_n:
F(x_n) δ_{n+1} = −f(x_n),
x_{n+1} = x_n + δ_{n+1}. (11.26)
This is more efficient in computation time, requiring only about one-third as many operations as
inverting F(x_n).
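A Python sketch of (11.25)/(11.26) for a 2 × 2 system follows (an added illustration; the test system is our own, not from these notes). Solving the linear system with numpy.linalg.solve plays the role of F(x_n) δ_{n+1} = −f(x_n).

import numpy as np

def newton_system(f, F, x0, eps=1e-12, itmax=50):
    """Newton's method (11.25)/(11.26) for f(x) = 0 with Jacobian F."""
    x = np.asarray(x0, dtype=float)
    for _ in range(itmax):
        delta = np.linalg.solve(F(x), -f(x))   # F(x_n) delta = -f(x_n)
        x = x + delta                          # x_{n+1} = x_n + delta
        if np.linalg.norm(delta, np.inf) <= eps:
            return x
    raise RuntimeError("Newton's method for systems did not converge")

# Test system: f1 = x1^2 + x2^2 - 4 = 0, f2 = x1*x2 - 1 = 0
f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0]*x[1] - 1.0])
F = lambda x: np.array([[2.0*x[0], 2.0*x[1]],
                        [x[1],      x[0]]])
print(newton_system(f, F, x0=[2.0, 0.5]))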
There is a geometrical derivation of Newton's method, in analogy with the tangent line approximation
used with single nonlinear equations in Section 3. The graph in space of the equation
z = f_i(x_0) + (x_1 − x_{1,0}) ∂f_i(x_0)/∂x_1 + (x_2 − x_{2,0}) ∂f_i(x_0)/∂x_2 ≡ p_i(x_1, x_2)
is a plane that is tangent to the graph of z = f_i(x_1, x_2) at the point x_0, i = 1, 2. If x_0 is near to α,
then these tangent planes should be good approximations to the associated surfaces z = f_i(x_1, x_2)
for x = (x_1, x_2) near α. Then, the intersection of the zero curves of the tangent planes z = p_i(x_1, x_2)
should be a good approximation to the corresponding intersection α of the zero curves of the original
surfaces z = f_i(x_1, x_2), i = 1, 2. This results in the statement (11.23).
Convergence Analysis: For the convergence of Newton's method (11.25), regard it as a
fixed-point iteration method with
g(x) = x − F(x)^{−1} f(x). (11.27)
Also assume that
det F(α) ≠ 0,
which is the analogue of assuming α is a simple root when dealing with a single equation, as in
Theorem 3.1. It can then be shown that the Jacobian G(x) of (11.27) is zero at x = α; consequently,
the condition (11.18) is easily satisfied.
Corollary 11.2 then implies that x_n converges to α, provided x_0 is chosen sufficiently close to
α. In addition, it can be shown that the iteration is quadratically convergent, i.e.,
‖α − x_{n+1}‖_∞ ≤ B ‖α − x_n‖_∞^2, n ≥ 0,
for some constant B > 0.