
MACM 316

Assignment 1 Solutions

1.2 Finite Precision Arithmetic


1.2:6e Rounding Arithmetic
Use four-digit rounding arithmetic to perform the following calculation. Compute the absolute error and relative
error with the exact value determined to within at least 5 digits.

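\begin{align*}
\frac{\frac{13}{14}-\frac{6}{7}}{2e-5.4}
&\approx \mathrm{fl}\!\left(\frac{\mathrm{fl}\big(\mathrm{fl}(\tfrac{13}{14})-\mathrm{fl}(\tfrac{6}{7})\big)}{\mathrm{fl}\big(\mathrm{fl}(2\,\mathrm{fl}(e))-\mathrm{fl}(5.4)\big)}\right) \tag{1}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}(0.9286-0.8571)}{\mathrm{fl}\big(\mathrm{fl}(2(2.718))-5.400\big)}\right) \tag{2}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}(0.9286-0.8571)}{\mathrm{fl}(5.436-5.400)}\right) \tag{3}\\
&= \mathrm{fl}\!\left(\frac{0.0715}{0.036}\right) = 1.986 \tag{4}
\end{align*}

Note that the subtractions left us with fewer than 4 significant digits (0.0715 and 0.036).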

However, repeating the same calculation in 16-digit rounding arithmetic gives

\[
\frac{\frac{13}{14}-\frac{6}{7}}{2e-5.4} \approx 1.953540139286012 \tag{5}
\]

which we take as the exact value.

Then the absolute error is

\[
a = |1.953540139286012 - 1.986| \approx 3.2\times10^{-2} \tag{6}
\]

and the relative error is

\[
r = \frac{|1.953540139286012 - 1.986|}{1.953540139286012} \approx 1.7\times10^{-2} = 1.7\% \tag{7}
\]

Note that the choice to report the errors to two digits is somewhat arbitrary, but it should be sufficient to give you an
idea of how accurate the approximation is.
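The four-digit computation above can be replayed mechanically. The sketch below is an illustration, not part of the assigned solution: it assumes Python's standard `decimal` module as a stand-in for k-digit arithmetic, using a `Context` whose precision is the number of significant digits and whose rounding mode is `ROUND_HALF_UP` (ordinary rounding). Every operation goes through that context, so each intermediate result is rounded as in Eqs. (1)-(4). The helper name `evaluate` is hypothetical.

```python
import math
from decimal import Context, ROUND_HALF_UP

def evaluate(digits, rounding=ROUND_HALF_UP):
    """Evaluate (13/14 - 6/7) / (2e - 5.4) with every input and every
    intermediate result rounded to `digits` significant digits."""
    ctx = Context(prec=digits, rounding=rounding)
    fl = ctx.create_decimal                          # fl(x) for exact inputs
    e = ctx.create_decimal_from_float(math.e)        # fl(e)
    num = ctx.subtract(ctx.divide(fl(13), fl(14)),   # fl(fl(13/14) - fl(6/7))
                       ctx.divide(fl(6), fl(7)))
    den = ctx.subtract(ctx.multiply(fl(2), e),       # fl(fl(2 fl(e)) - fl(5.4))
                       fl("5.4"))
    return ctx.divide(num, den)

approx = evaluate(4)                                 # four-digit rounding
exact = (13 / 14 - 6 / 7) / (2 * math.e - 5.4)       # double-precision reference
abs_err = abs(exact - float(approx))
rel_err = abs_err / abs(exact)
print(approx, f"{abs_err:.1e}", f"{rel_err:.1e}")    # -> 1.986 3.2e-02 1.7e-02
```

Calling `evaluate(16)` should agree with Eq. (5) to roughly 14 digits (the 16-digit value of e is itself rounded), and passing `decimal.ROUND_DOWN` as the second argument switches to chopping.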

1.2:14a Chopping Arithmetic and the Quadratic Formula


Use four-digit chopping arithmetic and the formulae of Example 5 to find the most accurate approximations to the
roots of the following quadratic equations. Compute the absolute and relative errors.
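\[
\frac{1}{3}x^2 - \frac{123}{4}x + \frac{1}{6} = 0 \tag{8}
\]

Applying the quadratic formula,

\[
x = \frac{\frac{123}{4} \pm \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{9}
\]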

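But we want to avoid subtracting numbers of the same sign (or adding numbers of opposite sign). Here $\frac{123}{4}$ and the square root are nearly equal, so the "$-$" root would suffer cancellation. We therefore split the formula into its two cases, rewrite the problematic one by multiplying its numerator and denominator by $\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}$, and only then do the finite-precision arithmetic.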

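\begin{align*}
x_+ &= \frac{\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{10}\\
x_- &= \frac{\frac{123}{4} - \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}}{2\left(\frac{1}{3}\right)} \tag{11}\\
&= \frac{2\left(\frac{1}{6}\right)}{\frac{123}{4} + \sqrt{\left(\frac{123}{4}\right)^2 - 4\left(\frac{1}{3}\right)\left(\frac{1}{6}\right)}} \tag{12}
\end{align*}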


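Now doing the calculations in four-digit chopping arithmetic,

\begin{align*}
x_+ \approx \tilde{x}_+ &= \mathrm{fl}\!\left(\frac{\mathrm{fl}\Big(\mathrm{fl}\big(\tfrac{123}{4}\big) + \mathrm{fl}\Big(\sqrt{\mathrm{fl}\big(\mathrm{fl}\big(\mathrm{fl}(\tfrac{123}{4})^2\big) - \mathrm{fl}\big(\mathrm{fl}(4\,\mathrm{fl}(\tfrac13))\,\mathrm{fl}(\tfrac16)\big)\big)}\Big)\Big)}{\mathrm{fl}\big(2\,\mathrm{fl}(\tfrac13)\big)}\right) \tag{13}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}\Big(30.75 + \mathrm{fl}\Big(\sqrt{\mathrm{fl}\big(\mathrm{fl}\big((30.75)^2\big) - \mathrm{fl}\big(\mathrm{fl}(4(0.3333))(0.1666)\big)\big)}\Big)\Big)}{\mathrm{fl}\big(2(0.3333)\big)}\right) \tag{14}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}\Big(30.75 + \mathrm{fl}\Big(\sqrt{\mathrm{fl}\big(945.5 - \mathrm{fl}((1.333)(0.1666))\big)}\Big)\Big)}{0.6666}\right) \tag{15}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}\Big(30.75 + \mathrm{fl}\Big(\sqrt{\mathrm{fl}(945.5 - 0.2220)}\Big)\Big)}{0.6666}\right) \tag{16}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}\big(30.75 + \mathrm{fl}(\sqrt{945.2})\big)}{0.6666}\right) \tag{17}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}(30.75 + 30.74)}{0.6666}\right) \tag{18}\\
&= \mathrm{fl}\!\left(\frac{61.49}{0.6666}\right) \tag{19}\\
&= 92.24 \tag{20}
\end{align*}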

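Similarly, reusing some of our intermediate results from above,

\begin{align*}
x_- \approx \tilde{x}_- &= \mathrm{fl}\!\left(\frac{\mathrm{fl}\big(2\,\mathrm{fl}(\tfrac16)\big)}{\mathrm{fl}\Big(\mathrm{fl}\big(\tfrac{123}{4}\big) + \mathrm{fl}\Big(\sqrt{\mathrm{fl}\big(\mathrm{fl}\big(\mathrm{fl}(\tfrac{123}{4})^2\big) - \mathrm{fl}\big(\mathrm{fl}(4\,\mathrm{fl}(\tfrac13))\,\mathrm{fl}(\tfrac16)\big)\big)}\Big)\Big)}\right) \tag{21}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}\big(2(0.1666)\big)}{\mathrm{fl}(30.75 + 30.74)}\right) \tag{22}\\
&= \mathrm{fl}\!\left(\frac{\mathrm{fl}\big(2(0.1666)\big)}{61.49}\right) \tag{23}\\
&= \mathrm{fl}\!\left(\frac{0.3332}{61.49}\right) \tag{24}\\
&= 0.005418 \tag{25}
\end{align*}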

Repeating these calculations with 16-digit chopping arithmetic, the solutions are

\[
\begin{aligned}
x_+ &= 92.24457962731231\\
x_- &= 0.005420372687697272
\end{aligned} \tag{26}
\]

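Writing $\tilde{x}_+ = 92.24$ and $\tilde{x}_- = 0.005418$ for the four-digit approximations, the absolute errors are

\[
\begin{aligned}
a_+ &= |x_+ - \tilde{x}_+| \approx 4.6\times10^{-3}\\
a_- &= |x_- - \tilde{x}_-| \approx 2.4\times10^{-6}
\end{aligned} \tag{27}
\]

and the relative errors are

\[
\begin{aligned}
r_+ &= \frac{|x_+ - \tilde{x}_+|}{|x_+|} \approx 5.0\times10^{-5} = 0.0050\%\\
r_- &= \frac{|x_- - \tilde{x}_-|}{|x_-|} \approx 4.4\times10^{-4} = 0.044\%
\end{aligned} \tag{28}
\]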
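The same `decimal` idea can replay the four-digit chopping computation of Eqs. (13)-(25). This is a sketch under stated assumptions: chopping is modelled with `ROUND_DOWN` (truncation toward zero), and because `Decimal.sqrt` may not honour the context's rounding mode (the pure-Python implementation rounds it half-even), the square root is taken with ten extra guard digits and then chopped to the working precision, which matches true chopping except in extremely rare near-tie cases. The function name `chopped_roots` is hypothetical.

```python
import math
from decimal import Context, Decimal, ROUND_DOWN

def chopped_roots(digits):
    """Roots of (1/3)x^2 - (123/4)x + 1/6 = 0 in `digits`-digit chopping,
    using Eq. (10) for x+ and the rationalized Eq. (12) for x-."""
    ctx = Context(prec=digits, rounding=ROUND_DOWN)    # ROUND_DOWN = chop
    wide = Context(prec=digits + 10)                   # guard digits for sqrt
    third = ctx.divide(Decimal(1), Decimal(3))         # fl(1/3)
    sixth = ctx.divide(Decimal(1), Decimal(6))         # fl(1/6)
    b = ctx.divide(Decimal(123), Decimal(4))           # fl(123/4)
    disc = ctx.subtract(ctx.multiply(b, b),            # fl(fl(b^2) - fl(fl(4 fl(1/3)) fl(1/6)))
                        ctx.multiply(ctx.multiply(Decimal(4), third), sixth))
    root = ctx.plus(wide.sqrt(disc))                   # fl(sqrt(disc)), chopped
    s = ctx.add(b, root)                               # same signs: no cancellation
    x_plus = ctx.divide(s, ctx.multiply(Decimal(2), third))
    x_minus = ctx.divide(ctx.multiply(Decimal(2), sixth), s)
    return x_plus, x_minus

# Double-precision reference roots from the same two formulas.
ref_disc = (123 / 4) ** 2 - 4 * (1 / 3) * (1 / 6)
ref_plus = (123 / 4 + math.sqrt(ref_disc)) / (2 / 3)
ref_minus = (2 / 6) / (123 / 4 + math.sqrt(ref_disc))

xp, xm = chopped_roots(4)
print(xp, xm)                                          # -> 92.24 0.005418
for approx, ref in ((xp, ref_plus), (xm, ref_minus)):
    a = abs(ref - float(approx))
    print(f"abs {a:.1e}  rel {a / abs(ref):.1e}")
```

Evaluating $x_-$ from Eq. (11) instead would subtract 30.74 from 30.75 and leave a single significant digit, which is exactly what the rationalized form avoids.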


1.2:18 Finite Precision Taylor Series


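We wish to approximate $e^{-5}$ using the 9th-order Taylor polynomial ($n = 9$ below),

\[
e^{-5} \approx \sum_{i=0}^{n} \frac{(-5)^i}{i!} \approx \mathrm{sum}_n \tag{29}
\]

where $\mathrm{sum}_n$ is defined by the recursion relation

\begin{align*}
\mathrm{sum}_{-1} &= 0 \tag{30}\\
\mathrm{sum}_i &= \mathrm{fl}\!\left(\mathrm{sum}_{i-1} + \mathrm{fl}\!\left(\frac{\mathrm{fl}\big((-5)^i\big)}{\mathrm{fl}(i!)}\right)\right), \quad i = 0, 1, \dots, n \tag{31}
\end{align*}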

Note that there are two approximations here. The first is that we do not take the limit $n \to \infty$. The error
associated with this approximation is called truncation error, and it should eventually get smaller as we add terms to the
series.
The second approximation is that we use finite-precision arithmetic to evaluate this truncated series.
The error associated with this approximation is called roundoff error.
Thus, when we evaluate the relative error in $\mathrm{sum}_i$,

\[
r_i = \frac{|e^{-5} - \mathrm{sum}_i|}{e^{-5}} \tag{32}
\]

we should expect contributions from both roundoff and truncation error.
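The table below shows the intermediate results in computing $\mathrm{sum}_9$. The relative error $r_i$ is of order $10^2$ to $10^3$ at every step and is still larger after ten terms than after one. Part of this is truncation error (for $x = -5$ the terms do not start to shrink until after $i = 5$, so ten terms are not enough), but adding more terms cannot fix the problem: the terms reach about 26 in magnitude and must cancel down to $e^{-5} \approx 0.0067$, so the roundoff committed in those large terms alone is bigger than the quantity we are trying to compute.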



i   fl((-5)^i)   fl(i!)      fl(fl((-5)^i)/fl(i!))   sum_i       r_i
0    1.00e+00     1.00e+00    1.00e+00                1.00e+00   1.5e+02
1   -5.00e+00     1.00e+00   -5.00e+00               -4.00e+00   5.9e+02
2    2.50e+01     2.00e+00    1.25e+01                8.50e+00   1.3e+03
3   -1.25e+02     6.00e+00   -2.08e+01               -1.23e+01   1.8e+03
4    6.25e+02     2.40e+01    2.60e+01                1.37e+01   2.0e+03
5   -3.12e+03     1.20e+02   -2.60e+01               -1.23e+01   1.8e+03
6    1.56e+04     7.20e+02    2.16e+01                9.30e+00   1.4e+03
7   -7.81e+04     5.04e+03   -1.55e+01               -6.20e+00   9.2e+02
8    3.90e+05     4.03e+04    9.67e+00                3.47e+00   5.1e+02
9   -1.95e+06     3.62e+05   -5.38e+00               -1.91e+00   2.8e+02

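The reason this is happening is that the sign of $\mathrm{sum}_{i-1}$ is always opposite the sign of $\mathrm{fl}\big(\mathrm{fl}((-5)^i)/\mathrm{fl}(i!)\big)$, so Eq. (31) adds two numbers of opposite sign at every step. As we know, this is prone to loss of precision.
The solution is to compute the series in a different way that avoids adding numbers of opposite sign. This can
be achieved by instead computing

\[
e^{-5} = \frac{1}{e^{5}} \approx \frac{1}{\sum_{i=0}^{n} \frac{5^i}{i!}} \approx \frac{1}{\mathrm{sum}_n} \tag{33}
\]

where $\mathrm{sum}_i$ is now defined by the recursion relation

\begin{align*}
\mathrm{sum}_{-1} &= 0 \tag{34}\\
\mathrm{sum}_i &= \mathrm{fl}\!\left(\mathrm{sum}_{i-1} + \mathrm{fl}\!\left(\frac{\mathrm{fl}(5^i)}{\mathrm{fl}(i!)}\right)\right) \tag{35}
\end{align*}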

The table below shows the intermediate results in computing $1/\mathrm{sum}_9$ by this method. The first thing to notice is
that every entry in this table is positive, so no addition combines numbers of opposite sign. The second thing to notice is that $r_i$ decreases as we add terms, suggesting
that truncation error is now the more important source of error.


i   fl(5^i)      fl(i!)      fl(fl(5^i)/fl(i!))      1/sum_i     r_i
0    1.00e+00     1.00e+00    1.00e+00                1.00e+00   1.5e+02
1    5.00e+00     1.00e+00    5.00e+00                1.67e-01   2.4e+01
2    2.50e+01     2.00e+00    1.25e+01                5.40e-02   7.0e+00
3    1.25e+02     6.00e+00    2.08e+01                2.54e-02   2.8e+00
4    6.25e+02     2.40e+01    2.60e+01                1.53e-02   1.3e+00
5    3.12e+03     1.20e+02    2.60e+01                1.10e-02   6.3e-01
6    1.56e+04     7.20e+02    2.16e+01                8.93e-03   3.3e-01
7    7.81e+04     5.04e+03    1.55e+01                7.87e-03   1.7e-01
8    3.90e+05     4.03e+04    9.67e+00                7.35e-03   9.1e-02
9    1.95e+06     3.62e+05    5.38e+00                7.09e-03   5.3e-02

In conclusion, we would have to say that the second method is a significant improvement over the first, as the
relative errors it produces are much smaller ($r_9 \approx 5.3\times10^{-2}$ versus $2.8\times10^{2}$). We can also reduce these errors further by
adding more terms to the series, whereas it wasn't clear whether doing so led to any appreciable improvement in the
first case.
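Both recursions can be replayed in a few lines. As an assumption, the sketch below uses three-digit chopping (the tables carry three significant digits and, for example, $\mathrm{fl}(9!) = 3.62\times10^5$ is the chopped value of 362880), again modelled with `decimal`'s `ROUND_DOWN`. Depending on exactly how the original tables were generated, a few last-digit entries may differ, but the contrast between the two methods should be the same. The helper `taylor_sums` is hypothetical.

```python
import math
from decimal import Context, Decimal, ROUND_DOWN

def taylor_sums(x, n, digits=3):
    """Partial sums sum_0..sum_n of sum_i = fl(sum_{i-1} + fl(fl(x^i)/fl(i!)))
    with sum_{-1} = 0 and every fl() chopping to `digits` significant digits."""
    ctx = Context(prec=digits, rounding=ROUND_DOWN)
    total = Decimal(0)                                 # sum_{-1} = 0
    sums = []
    for i in range(n + 1):
        num = ctx.create_decimal(x ** i)               # fl(x^i): exact int, then chopped
        den = ctx.create_decimal(math.factorial(i))    # fl(i!)
        total = ctx.add(total, ctx.divide(num, den))   # Eq. (31) / Eq. (35)
        sums.append(total)
    return sums, ctx

exact = math.exp(-5)

# Method 1, Eq. (31): alternating terms, so each addition mixes signs.
sums_a, _ = taylor_sums(-5, 9)
r_a = abs(exact - float(sums_a[-1])) / exact

# Method 2, Eqs. (33)-(35): all terms positive, invert at the end.
sums_b, ctx = taylor_sums(5, 9)
approx_b = ctx.divide(Decimal(1), sums_b[-1])
r_b = abs(exact - float(approx_b)) / exact

print(f"method 1: sum_9   = {sums_a[-1]}, r_9 = {r_a:.1e}")
print(f"method 2: 1/sum_9 = {approx_b}, r_9 = {r_b:.1e}")
```

Changing `digits` or `n` is a quick way to see that the first method stays poor at low precision while the second improves as terms are added.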

1.3 Convergence
1.3:6b Convergence as n → ∞
Find the rate of convergence of

\[
\lim_{n\to\infty} \sin\!\left(\frac{1}{n^2}\right) = 0 \tag{36}
\]

By making the substitution $h = \frac{1}{n}$, we have the related problem of finding the rate of convergence of

\[
\lim_{h\to 0} \sin\!\left(h^2\right) = 0 \tag{37}
\]

Thus we have a function

\[
f(h) = \sin\!\left(h^2\right) \tag{38}
\]

that trivially converges to $f(0) = 0$.


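We want to find the largest value of $p$ such that, for some constant $K$,

\[
|f(h) - f(0)| = |f(h)| \le K|h^p| \quad \text{for small } h \tag{39}
\]

We know from Taylor's theorem (1.14) that

\begin{align*}
f(h) &= f(0) + h f'(0) + \frac{h^2}{2} f''(c) \quad \text{for some } c \text{ between } 0 \text{ and } h \tag{40}\\
&= \left[\sin(x^2) + h\,2x\cos(x^2)\right]_{x=0} + \frac{h^2}{2}\left(-4\sin(c^2)c^2 + 2\cos(c^2)\right) \tag{41}\\
&= h^2\left(-2\sin(c^2)c^2 + \cos(c^2)\right) \tag{42}
\end{align*}

So

\begin{align*}
|f(h)| &= h^2\left|-2\sin(c^2)c^2 + \cos(c^2)\right| \tag{43}\\
&\le h^2\left(2c^2|\sin(c^2)| + |\cos(c^2)|\right) \tag{44}\\
&\le h^2\left(2c^2\cdot c^2 + 1\right) \tag{45}\\
&\le h^2\left(2h^4 + 1\right) \tag{46}\\
&\le h^2(2 + 1) \quad \text{for } |h| \le 1 \tag{47}\\
&= 3h^2 \tag{48}
\end{align*}

where we used $|\sin y| \le |y|$, $|\cos y| \le 1$ and $|c| \le |h|$.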


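This tells us that Eq. (39) holds for $K = 3$ and $p = 2$, where by "small $h$" we mean $|h| \le 1$.
To show that 2 is indeed the largest value of $p$ for which we can make Eq. (39) hold, we simply expand the Taylor
series to one more order:

\begin{align*}
f(h) &= f(0) + h f'(0) + \frac{h^2}{2} f''(0) + \frac{h^3}{6} f^{(3)}(c) \quad \text{for some } c \text{ between } 0 \text{ and } h \tag{49}\\
&= 0 + h^2\left[-2\sin(x^2)x^2 + \cos(x^2)\right]_{x=0} + \frac{h^3}{6} f^{(3)}(c) \tag{50}\\
&= h^2 + \frac{h^3}{6} f^{(3)}(c) \tag{51}
\end{align*}

In other words, the $h^2$ term does not vanish like the ones before it, so we can't do the same thing to show that
$|f(h)| \le K|h^3|$. Indeed, since $f^{(3)}$ is continuous and hence bounded near 0,

\[
\frac{|f(h)|}{|h|^3} = \left|\frac{1}{h} + \frac{f^{(3)}(c)}{6}\right| \to \infty \quad \text{as } h \to 0 \tag{52}
\]

so no constant $K$ works for $p = 3$.
So transforming

\[
|f(h) - f(0)| = |f(h)| \le 3h^2, \quad |h| \le 1 \tag{53}
\]

back to our original problem Eq. (36) via $h = \frac{1}{n}$, we have

\[
\left|\sin\!\left(\frac{1}{n^2}\right) - \lim_{n\to\infty}\sin\!\left(\frac{1}{n^2}\right)\right| = \left|\sin\!\left(\frac{1}{n^2}\right)\right| \le 3\,\frac{1}{n^2}, \quad n \ge 1 \tag{54}
\]

so the sequence converges like $\frac{1}{n^2}$, or

\[
\sin\!\left(\frac{1}{n^2}\right) = O\!\left(\frac{1}{n^2}\right) \tag{55}
\]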
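A quick numerical sanity check of Eq. (48), offered as an illustration rather than part of the solution: if $\sin(h^2) = O(h^2)$ with $K = 3$, the ratio $|f(h)|/h^2$ should stay below 3, while $|f(h)|/|h|^3$ should grow without bound as $h \to 0$. The sample step sizes and the helper name `ratios` are arbitrary choices.

```python
import math

def ratios(hs):
    """Return (h, |f(h)|/h^2, |f(h)|/h^3) for f(h) = sin(h^2)."""
    rows = []
    for h in hs:
        f = abs(math.sin(h * h))
        rows.append((h, f / h**2, f / h**3))
    return rows

for h, r2, r3 in ratios([0.5, 0.1, 0.01, 0.001]):
    print(f"h = {h:<6} |f|/h^2 = {r2:.6f}   |f|/h^3 = {r3:.3e}")
```

The second column settles near 1 (well inside the bound $K = 3$), while the third grows roughly like $1/h$.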

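1.3:7c Convergence as h → 0
Find the rate of convergence of

\[
\lim_{h\to 0} \frac{\sin(h) - h\cos(h)}{h} = 0 \tag{56}
\]

We want to examine the behaviour of the function

\[
f(h) = \frac{\sin(h) - h\cos(h)}{h} \tag{57}
\]

near $h = 0$, where we take $f(0) = 0$, the value of the limit. But we need to be careful since we have $h$ in the denominator, so we expand just the numerator in a
Taylor series.
The lowest-order nontrivial Taylor expansions about $h = 0$ for sin and cos are

\begin{align*}
\sin(h) &= h - \frac{h^3}{6}\cos(c_1) \quad \text{for } c_1 \text{ between } 0 \text{ and } h \tag{58}\\
\cos(h) &= 1 - \frac{h^2}{2}\cos(c_2) \quad \text{for } c_2 \text{ between } 0 \text{ and } h \tag{59}
\end{align*}

So we have

\begin{align*}
|f(h) - f(0)| = |f(h)| &= \left|\frac{h - \frac{h^3}{6}\cos(c_1) - h\left(1 - \frac{h^2}{2}\cos(c_2)\right)}{h}\right| \tag{60}\\
&= \left|1 - \frac{h^2}{6}\cos(c_1) - 1 + \frac{h^2}{2}\cos(c_2)\right| \tag{61}\\
&= \left|-\frac{1}{6}\cos(c_1) + \frac{1}{2}\cos(c_2)\right| h^2 \tag{62}\\
&\le \left(\frac{1}{6}|\cos(c_1)| + \frac{1}{2}|\cos(c_2)|\right) h^2 \tag{63}\\
&\le \left(\frac{1}{6} + \frac{1}{2}\right) h^2 \tag{64}\\
&= \frac{2}{3}h^2 \tag{65}
\end{align*}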


So

\[
\frac{\sin(h) - h\cos(h)}{h} = O\!\left(h^2\right) \tag{66}
\]
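Similarly, a rough numerical check of Eq. (65), not part of the original solution: the ratio $|f(h)|/h^2$ should stay below $2/3$. Expanding each series one term further suggests the ratio actually tends to $1/3$, so the bound is not tight, but the order $p = 2$ is. The step sizes are arbitrary; for much smaller $h$ the floating-point subtraction $\sin h - h\cos h$ would itself start to lose accuracy.

```python
import math

def f(h):
    """f(h) = (sin h - h cos h) / h from Eq. (57)."""
    return (math.sin(h) - h * math.cos(h)) / h

for h in [0.5, 0.1, 0.01, 0.001]:
    print(f"h = {h:<6} |f(h)|/h^2 = {abs(f(h)) / h**2:.6f}")
```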

1.3:14 Orders of convergence


Make a table listing $h$, $h^2$, $h^3$ and $h^4$ for $h = 0.5, 0.1, 0.01, 0.001$ and discuss the varying rates of convergence.

h          h^2        h^3        h^4
5.00e-01   2.50e-01   1.25e-01   6.25e-02
1.00e-01   1.00e-02   1.00e-03   1.00e-04
1.00e-02   1.00e-04   1.00e-06   1.00e-08
1.00e-03   1.00e-06   1.00e-09   1.00e-12

Clearly, the higher the power, the faster the convergence (i.e., $h > h^2 > h^3 > h^4$ for $0 < h < 1$). For every order of magnitude by
which we decrease $h$, $h^p$ decreases by $p$ orders of magnitude.
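The table can be regenerated with a short loop. The sketch below, an illustration rather than part of the assignment, also measures how many orders of magnitude each power loses when $h$ drops from 0.01 to 0.001, which makes the "$p$ orders of magnitude per order of magnitude in $h$" pattern explicit.

```python
import math

hs = [0.5, 0.1, 0.01, 0.001]
powers = [1, 2, 3, 4]
table = [[h**p for p in powers] for h in hs]

print("h".rjust(10) + "".join(f"h^{p}".rjust(10) for p in powers[1:]))
for row in table:
    print("".join(f"{v:10.2e}" for v in row))

# Going from h = 0.01 to h = 0.001 (one order of magnitude), h^p drops by p.
for j, p in enumerate(powers):
    drop = math.log10(table[2][j] / table[3][j])
    print(f"p = {p}: orders of magnitude lost = {drop:.2f}")
```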
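Now suppose that $0 < q < p$ and that $F(h) = L + O(h^p)$. Show that $F(h) = L + O(h^q)$.
We know that

\[
|h^p| \le |h^q| \quad \text{for } q \le p,\ |h| \le 1 \tag{67}
\]

and the statement that $F(h) = L + O(h^p)$ means that

\[
|F(h) - L| \le K|h^p| \quad \text{for small } h \tag{68}
\]

So we can say

\begin{align*}
|F(h) - L| \le K|h^p| &\le K|h^q| \quad \text{for small } h \text{ with } |h| \le 1 \tag{69}\\
\Rightarrow\quad |F(h) - L| &\le K|h^q| \tag{70}
\end{align*}

which translates back into

\[
F(h) = L + O(h^q) \tag{71}
\]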

6.1 Linear Algebra


6.1:10 Singular matrices
Given the linear system

\[
\begin{pmatrix} 1 & -1 & \alpha \\ -1 & 2 & -\alpha \\ \alpha & 1 & 1 \end{pmatrix} \mathbf{x} = \begin{pmatrix} -2 \\ 3 \\ 2 \end{pmatrix} \tag{72}
\]

for what values of $\alpha$ does the system have no solutions, and for what values does it have infinitely many solutions?
Finding the values of $\alpha$ for which the determinant vanishes will only tell us that there are either infinitely many solutions or none;
it will not distinguish between the two cases. Furthermore, part c asks us to solve the system for general $\alpha$,
so we'll go ahead and do that first and see which values of $\alpha$ might give us problems.

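\begin{align*}
\left(\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ -1 & 2 & -\alpha & 3 \\ \alpha & 1 & 1 & 2 \end{array}\right)
&\mapsto \left(\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 1+\alpha & 1-\alpha^2 & 2(1+\alpha) \end{array}\right) \tag{73}\\
&\mapsto \left(\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1-\alpha^2 & 1+\alpha \end{array}\right) \tag{74}\\
&\overset{\alpha \ne -1}{\mapsto} \left(\begin{array}{ccc|c} 1 & -1 & \alpha & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1-\alpha & 1 \end{array}\right) \tag{75}\\
&\overset{\alpha \ne 1}{\mapsto} \left(\begin{array}{ccc|c} 1 & -1 & 0 & -2 - \frac{\alpha}{1-\alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right) \tag{76}\\
&\mapsto \left(\begin{array}{ccc|c} 1 & 0 & 0 & -1 - \frac{\alpha}{1-\alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right) \tag{77}\\
&= \left(\begin{array}{ccc|c} 1 & 0 & 0 & -\frac{1}{1-\alpha} \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{1-\alpha} \end{array}\right) \tag{78}
\end{align*}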

So

\begin{align*}
x_1 &= -\frac{1}{1-\alpha} \tag{79}\\
x_2 &= 1 \tag{80}\\
x_3 &= \frac{1}{1-\alpha} \tag{81}
\end{align*}

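We can see in the third row of Eq. (74) that $\alpha = \pm 1$ will cause problems. If $\alpha = 1$, we have an inconsistent system
(no solutions):

\[
\left(\begin{array}{ccc|c} 1 & -1 & 1 & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 2 \end{array}\right) \tag{82}
\]

If $\alpha = -1$, we have no constraint on $x_3$ (infinitely many solutions):

\[
\left(\begin{array}{ccc|c} 1 & -1 & -1 & -2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right) \tag{83}
\]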
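To double-check the case analysis, the sketch below row-reduces the augmented matrix of Eq. (72) in exact rational arithmetic (Python's `fractions.Fraction`) for a given numeric $\alpha$ and reports whether the system has a unique solution, no solution, or infinitely many. It is a generic Gauss-Jordan routine written for this illustration (the name `classify` is hypothetical), not the hand elimination of Eqs. (73)-(78), but for $\alpha \ne \pm 1$ it should reproduce Eqs. (79)-(81).

```python
from fractions import Fraction

def classify(alpha):
    """Gauss-Jordan elimination on [A | b] for the system of Eq. (72).
    Returns ("unique", x), ("none", None) or ("infinite", None)."""
    a = Fraction(alpha)
    M = [[Fraction(1), Fraction(-1), a, Fraction(-2)],
         [Fraction(-1), Fraction(2), -a, Fraction(3)],
         [a, Fraction(1), Fraction(1), Fraction(2)]]
    n, row, pivots = 3, 0, []
    for col in range(n):
        # first nonzero entry in this column at or below the current row
        p = next((r for r in range(row, n) if M[r][col] != 0), None)
        if p is None:
            continue                      # no pivot: this variable is free
        M[row], M[p] = M[p], M[row]
        for r in range(n):                # clear the column above and below
            if r != row and M[r][col] != 0:
                factor = M[r][col] / M[row][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[row])]
        pivots.append(col)
        row += 1
    # a row [0 0 0 | c] with c != 0 is the inconsistent equation 0 = c
    if any(all(v == 0 for v in M[r][:n]) and M[r][n] != 0 for r in range(n)):
        return "none", None
    if len(pivots) < n:
        return "infinite", None
    return "unique", [M[r][n] / M[r][r] for r in range(n)]

for alpha in (1, -1, 0, 2):
    print(alpha, classify(alpha))
```

Using `Fraction` rather than floats keeps the zero tests exact, so the singular cases $\alpha = \pm 1$ are detected without any tolerance.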
