
INTRODUCTION TO STATISTICS

Applied Mathematics & Scientific Computation Group


Cranfield Defence and Security
Cranfield University
Defence Academy of the United Kingdom

Contents

1 Dealing with Uncertainty
  1.1 Measurements and Counts
  1.2 Variation
  1.3 Example
  1.4 Distributions
  1.5 Why Statistics Works
  1.6 Confidence
  1.7 Significance
  1.8 Why Study Statistics?

2 Introduction to Statistics and Data
  2.1 What is Statistics?
  2.2 Collecting Data
  2.3 Organising, Analysing and Interpreting Data
  2.4 Types of Data
  2.5 Graphical Summaries of Data
  2.6 Numerical Summaries of Data: Location
  2.7 Numerical Summaries of Data: Dispersion
  2.8 Grouped Data
  2.9 Populations and Samples

3 Introduction to Probability
  3.1 Probability and Relative Frequency
  3.2 Probability Distributions
  3.3 Independent Events
  3.4 Mutually Exclusive Events
  3.5 General Events
  3.6 Binary Outcomes (success/failure)
  3.7 Dependent Events
  3.8 Further Examples

4 The Binomial Distribution
  4.1 Application
  4.2 Derivation of the Probability Function
  4.3 Summation of Terms
  4.4 Parameters of the Binomial Distribution
  4.5 Further Examples

5 The Poisson Distribution
  5.1 Application
  5.2 Probability Function
  5.3 Summation of Terms
  5.4 Parameters of the Poisson Distribution
  5.5 Additive Property of the Poisson Distribution
  5.6 Binomial and Poisson
  5.7 Further Examples

6 The Normal Distribution
  6.1 Application
  6.2 The Form of the Normal Distribution
  6.3 Calculating Probabilities from the Normal
  6.4 Sums and Differences of Normal Random Variables
  6.5 Distribution of the Sample Mean
  6.6 Normal Approximations
  6.7 Further Examples

7 Sampling and Sampling Distributions
  7.1 Introductory Example
  7.2 Some Definitions
  7.3 Distribution of the Sample Mean
  7.4 Example: Process Control
  7.5 Two Sample Case
  7.6 Further Examples

8 Significance Tests
  8.1 Purpose
  8.2 Method
  8.3 Two Tailed and One Tailed Tests
  8.4 Critical Values
  8.5 One-Sample Test of a Normal Mean (Z Test)
  8.6 The Meaning of Significance
  8.7 Two-Sample Test of Normal Means (Z Test)
  8.8 Types of Error
  8.9 Further Examples

9 Confidence Intervals
  9.1 Introduction
  9.2 Single Sample Case
  9.3 Two Sample Case
  9.4 Intervals and Tests
  9.5 Further Examples

10 Significance Tests and Confidence Intervals: Small Samples
  10.1 Introduction
  10.2 Student's t-Distribution
  10.3 One Sample Tests
  10.4 Two Sample Tests
  10.5 Confidence Intervals
  10.6 Further Examples
  10.7 When to Use Which Test
  10.8 An Explanation of Degrees of Freedom

11 Tests and Confidence Intervals for Variances
  11.1 Introduction
  11.2 Fisher's F-Distribution
  11.3 Two Sample Tests
  11.4 One Sample Tests
  11.5 One Sample Confidence Limits
  11.6 Further Examples

12 Tests and Intervals for Binomial Proportions
  12.1 Introduction
  12.2 Large Sample Case
  12.3 Small Sample Case
  12.4 Examples

13 Comparison of Observed and Predicted Frequencies
  13.1 Introduction
  13.2 The χ² Distribution
  13.3 Goodness of Fit Tests
  13.4 Contingency Table Test of Association
  13.5 Further Examples

14 Correlation and Regression
  14.1 Introduction
  14.2 Correlation
  14.3 Regression

15 Reference Material
  15.1 Tables
  15.2 Tutorial exercises

1 Dealing with Uncertainty

1.1 Measurements and Counts

The raw material of statistics is data in the form of figures. These figures may be measurements such as the burning time of rockets of a given type or the ranges of shots fired from
a gun under specified conditions; or they may be counts, such as the number of hits on a
target when successive salvoes of, say, six rounds are fired.

1.2 Variation

The important thing about these data is that the variety of figures arises from an unintended
variation in the quantity represented. In the examples quoted the burning times should be
nominally identical and the number of hits should be the same for each salvo, but, in
practice, this is never the case.
This variation occurs almost without exception in both natural and man-made processes
and activities. However hard we try, we cannot produce two identical articles, or repeat a
trial and get an identical result. We are inclined to treat this variation as a nuisance and
ignore it, but this can be misleading; false conclusions and decisions can easily be made if
we pretend that every result can be expressed in a single firm figure.

1.3 Example

As a simple example, suppose the average petrol consumption of a certain type of scout
car operated by recce regiments over the past two years is 7.8 mpg. Recently however a
somewhat expensive and time-consuming modification to reduce petrol consumption has
been suggested. An exploratory trial has been conducted by modifying six scout cars and
the following mpg figures for these six vehicles over a two month period obtained:
8.7 9.1 7.9 10.2 7.0 8.1
It is your task to decide, on the basis of this evidence, whether it is worth modifying all
scout cars.
Note that only one of the six mpg figures is below the unmodified average value of 7.8
mpg and the mean value for these six modified cars of 8.5 mpg is almost 10% up on the
old figure. On the basis of this average figure therefore you might naively be tempted
to believe that the modification improves petrol consumption by about 10%. However if
one also looks at the car to car variation in these figures then a statistical analysis reveals
that quite frequently and purely by chance one could have obtained similar or even better
results from six unmodified cars. Thus, one would require more evidence than this before
authorising similar modifications to all scout cars, irrespective of any arguments about the
costs involved.
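The statistical analysis alluded to here can be reproduced with a one-sample t-test, a method covered later in these notes (chapter 10). The sketch below is illustrative rather than part of the original notes, and assumes Python with scipy is available:

```python
from scipy import stats

# mpg figures for the six modified scout cars (section 1.3)
mpg = [8.7, 9.1, 7.9, 10.2, 7.0, 8.1]

# Test whether the mean mpg genuinely exceeds the unmodified average
# of 7.8, or whether the apparent improvement could be chance variation.
t, p = stats.ttest_1samp(mpg, popmean=7.8, alternative='greater')
print(f"t = {t:.3f}, one-tailed p = {p:.3f}")
```

The one-tailed p-value comes out at roughly 0.09, matching the figure quoted in section 1.7: results this good or better would arise purely by chance from unmodified cars about 9% of the time.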

1.4 Distributions

The statistician recognises and accepts this variation in measurements. He or she has well-tried techniques for condensing the data and assessing the shape and size of this variation
by using the idea of distributions.
A quantity that can take different values is called a variable.
A quantity that varies in such a way that we cannot predict with certainty what its next
value will be is called a random variable. A random variable possesses a distribution
which then describes how likely each possible value is.

1.5 Why Statistics Works

In practice, it is found that these distributions follow well-defined patterns. Whereas the
outcome of a single measurement or trial is unpredictable, the outcome of a large number
of repeated events will give a predictable result in the form of a distribution. For example,
toss a coin once and the result cannot be foretold; toss it 10000 times and the proportion of
heads will be very nearly equal to 1/2 (provided the coin is unbiased). This is known as the
Law of Large Numbers. It is this fact that makes statistics tick!
correctly used, can be such a powerful and effective tool.
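The Law of Large Numbers is easy to see by simulation. A minimal sketch in Python (an illustration, not part of the original notes):

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# One toss is unpredictable, but the proportion of heads in a long run
# of tosses settles down close to 1/2 for an unbiased coin.
for n in (10, 100, 10000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:6d} tosses: proportion of heads = {heads / n:.4f}")
```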
Fortunately, the patterns that these distributions follow usually have convenient mathematical forms which the statistician can manipulate without difficulty. There are two bonuses
from this. Firstly, you do not have to do 10000 or so repeat trials to establish the distribution; useful information can be derived from 2 or 3, but of course the more the better.
Secondly, all the hard work can be done once and for all and the characteristics of the popular distributions set out in tables; a large number of these exist. The Normal, Binomial,
Poisson, F and χ² (chi-squared) distributions are those most widely used.
Given some representative data, the statistician, having identified the distribution, can then
use its properties to form some sound conclusions; for example in section 1.3, how frequently
one might obtain similar or even better results from unmodified scout cars. Predictions can
be made in the light of these conclusions.
The strength of the statistician's conclusions lies in the fact that (s)he has taken the variation
into account. The price that has to be paid for this is that no conclusion is certain, it will
always be associated with some probability. The answer that you give depends on the risk
you are prepared to take that it might be wrong. This is an essential and inescapable
element in prediction and decision making.

1.6 Confidence

For example, suppose that the average (mean) burning time of a type of rocket is to be
determined by a trial. The answer would be quoted, not as a single figure, but as lying
between two limits. These are known as confidence limits and there would be some stated
risk, say 5%, or 1 in 20, that the true mean burning time falls outside them; which is the
same as saying that there is 95% confidence that it falls within them. If greater confidence
(lower risk) were required, then the limits would have to be wider, unless further evidence
is obtained by extending the trials.

1.7 Significance

The same principle applies, for instance, to the problem of deciding whether to adopt one
type of tank track or another on the evidence of trials. Statistical evidence may show that
one type of track lasts significantly longer than another; by this we mean that the difference
between the trial results is too large to be attributed to chance variation. There will be a
small stated risk that the observed difference could occur just by chance, and the decision
must be made in the light of this risk. This risk, known as the significance level, is often
chosen as 5% or 1%.
In the example of section 1.3, results similar to, or better than, those obtained from the trial
could have been obtained purely by chance from unmodified cars about 9% of the time.

1.8 Why Study Statistics?

Because variation cannot be avoided and is often embarrassingly large, statistics finds a wide
application in the field of management. The manager does not have to become a statistician,
but ought to know enough about the subject to appreciate its relevance to his/her problems
and decisions; he or she needs to have some confidence in the validity of the arguments and
processes used, and be able to interpret the answers that the statistician gives. The manager
must know when to call in professional statistical advice both when planning investigations
and when analysing the results.

2 Introduction to Statistics and Data

2.1 What is Statistics?

Statistics involves the Collection, Organisation, Analysis and Interpretation of numerical
data. It is essentially about using data to aid decision-making in the face of uncertainty.
Example: we want to see if a new make of tyre performs better than the present type on
our lorries, so we set up a trial (experiment) to see.
(1) Pick some lorries (how many?), allocate new and old tyre types to them (how?) and
after a time measure their performance (how?).
(2) Compare the performances of the new and old types (how?). Have the new ones done
sufficiently well to persuade us to change?
Statistics is often seen as dealing only with the technical details of (2), but can in fact
contribute to (1) as well.

2.2 Collecting Data

This is the process of obtaining measurements or counts.


However, this is really the second stage, since in most cases the first stage is designing the
trial/experiment/survey. We must decide exactly what we want to measure, how to do
it and which objects or individuals to do it on. Our conclusions will only be valid if we
actually measured what we thought we were measuring and if the observations we made
were representative of the overall population we were interested in.
1. Understand the objectives of the investigation.
2. Identify the population for which information is required.
3. Decide exactly what information we want and how to measure it.
4. Select a representative sample from this population.
5. Consider how actually to collect the data to ensure accuracy and completeness.
This is of course all highly specific to each given situation, so that, in the following, we will
assume that the above has already been performed satisfactorily.
However, note that, in particular, the selection of a representative sample may be very
difficult in some situations, such as opinion polls.

2.3 Organising, Analysing and Interpreting Data

Next we need to present the data in an appropriate format so that we can extract relevant
information, usually by way of pictures and numerical summaries.

1. Find out how and why the data were collected and what is being measured/counted.
2. Obtain any background information.
3. Assess the reliability of the data and how much information they really contain concerning what we are interested in.
The point here is that if the designing and collecting are done by different people to the
organising, analysing and interpreting then they must talk to each other. This is obvious,
but often not done.
1. Explore the data, using pictures and summary statistics.
2. Use appropriate formal statistical methods to draw conclusions.
3. Communicate results clearly.
4. Keep brain engaged at all times.
Sometimes an exploratory, graphical investigation is all that is possible, or desirable. However, in many cases this will fail to uncover all of the information within the data. Statistical
analysis helps us to make the best possible use of our data.

2.4 Types of Data

We measure or count some quantity of interest.


The answers we get are data. They are a sample of observations of this quantity of
interest.
Numbers which we calculate from the data are (summary) statistics.
Data can be
Discrete: only particular values can occur.
E.g. when we count something only whole numbers are possible.
Continuous: any number (possibly within limits) can occur.
E.g. when we measure something.

2.5 Graphical Summaries of Data

Example 1 (discrete)
The numbers of track failures on each of 100 APCs in a year were:
3, 1, 0, 0, 2, 1, 1, 3, 1, 0, 2, 2, 2, 0, 1, 4, 1, 1, 2, 2, 1, 3, 3, 0, 1, 1, 2, 2, 3, 1, 5, 1, 0, 1, 0, 2,
3, 3, 0, 0, 1, 1, 1, 3, 4, 1, 4, 0, 3, 2, 2, 2, 1, 6, 1, 0, 0, 2, 1, 1, 1, 2, 3, 3, 0, 1, 0, 0, 1, 2, 1, 1,
1, 2, 1, 1, 2, 1, 2, 1, 3, 2, 3, 3, 0, 0, 1, 0, 0, 0, 4, 7, 1, 0, 1, 2, 4, 4, 3, 2.

We can display these data in various ways.


a) A histogram of frequencies, or relative frequencies. For discrete data with a small range
of values a histogram is a rod graph with a separate rod for each distinct value.

[Figure: Histogram of number of track failures for 100 APCs; x-axis: Track failures, y-axis: Number of APCs.]

b) A table of frequencies, relative frequencies, and/or cumulative relative frequencies.

    Failures   Frequency   Relative    Cumulative
                           Frequency   Relative Frequency
       0          21         0.21          0.21
       1          34         0.34          0.55
       2          21         0.21          0.76
       3          15         0.15          0.91
       4           6         0.06          0.97
       5           1         0.01          0.98
       6           1         0.01          0.99
       7           1         0.01          1.00
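The table above can be generated directly from the raw data. A short Python sketch (illustrative, not from the original notes):

```python
from collections import Counter

# Track failures for the 100 APCs (section 2.5, Example 1)
failures = [3,1,0,0,2,1,1,3,1,0,2,2,2,0,1,4,1,1,2,2,1,3,3,0,1,1,2,2,3,1,5,1,0,1,0,2,
            3,3,0,0,1,1,1,3,4,1,4,0,3,2,2,2,1,6,1,0,0,2,1,1,1,2,3,3,0,1,0,0,1,2,1,1,
            1,2,1,1,2,1,2,1,3,2,3,3,0,0,1,0,0,0,4,7,1,0,1,2,4,4,3,2]

n = len(failures)
freq = Counter(failures)  # frequency of each distinct count

print("Failures  Freq  Rel.Freq  Cum.Rel.Freq")
cum = 0.0
for k in sorted(freq):
    rel = freq[k] / n
    cum += rel
    print(f"{k:8d}  {freq[k]:4d}  {rel:8.2f}  {cum:12.2f}")
```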

c) An empirical cumulative (relative) frequency graph

[Figure: Empirical cumulative frequencies of track failures for 100 APCs; x-axis: Track failures, y-axis: Cumulative relative frequency.]

This data set is slightly skewed: the tail is slightly longer to the right than to the left.


Example 2 (continuous)
The detonation heights in metres of 50 air-burst fuses were:
10.2 9.5 7.7 8.6 5.7 6.8 11.4 8.2 7.8 9.8 7.6 8.4 6.9 0.0 12.1 8.4 8.7 5.6 7.7 11.1 14.0 7.7 8.9
6.5 1.5 8.7 8.2 6.4 9.1 8.0 8.0 4.8 10.4 11.6 2.7 9.1 7.9 10.1 4.5 8.7 9.3 9.7 8.1 7.6 10.2 5.1 9.0
3.5 7.5 9.9
As for discrete data, we can create a table, plot a histogram of frequencies or relative
frequencies and plot a cumulative relative frequency graph.
However, this time it is less obvious how to group the data for display, either pictorially or
in a table. We must choose classes (ranges of values), preferably all of the same width. This
also applies to discrete data when the range of values observed is large.

    Class        Midpoint   Freq-    Relative    Cum.
    Interval                uency    Frequency   Rel. Freq.
     0 - 0.99      0.5        1        0.02        0.02
     1 - 1.99      1.5        1        0.02        0.04
     2 - 2.99      2.5        1        0.02        0.06
     3 - 3.99      3.5        1        0.02        0.08
     4 - 4.99      4.5        2        0.04        0.12
     5 - 5.99      5.5        3        0.06        0.18
     6 - 6.99      6.5        4        0.08        0.26
     7 - 7.99      7.5        8        0.16        0.42
     8 - 8.99      8.5       12        0.24        0.66
     9 - 9.99      9.5        8        0.16        0.82
    10 - 10.99    10.5        4        0.08        0.90
    11 - 11.99    11.5        3        0.06        0.96
    12 - 12.99    12.5        1        0.02        0.98
    13 - 13.99    13.5        0        0.00        0.98
    14 - 14.99    14.5        1        0.02        1.00


This data set is fairly symmetric.


Note that changing the class boundaries will change the pictures, often quite drastically if
many observations were near the boundaries.
Hence it is important to pick the class boundaries before producing the plot, rather than
unscrupulously trying many different sets of class boundaries in order to get the shape you
want.
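Grouping the heights into the 1 m classes used in the table above can be done mechanically once the boundaries are fixed. A Python sketch (illustrative, not part of the original notes):

```python
# Detonation heights (m) of the 50 air-burst fuses (section 2.5, Example 2)
heights = [10.2, 9.5, 7.7, 8.6, 5.7, 6.8, 11.4, 8.2, 7.8, 9.8, 7.6, 8.4, 6.9,
           0.0, 12.1, 8.4, 8.7, 5.6, 7.7, 11.1, 14.0, 7.7, 8.9, 6.5, 1.5, 8.7,
           8.2, 6.4, 9.1, 8.0, 8.0, 4.8, 10.4, 11.6, 2.7, 9.1, 7.9, 10.1, 4.5,
           8.7, 9.3, 9.7, 8.1, 7.6, 10.2, 5.1, 9.0, 3.5, 7.5, 9.9]

# With classes 0-0.99, 1-1.99, ..., 14-14.99 the class index of a
# height is simply its integer part.
freq = [0] * 15
for h in heights:
    freq[int(h)] += 1

for k, f in enumerate(freq):
    print(f"{k:2d} - {k}.99: {f:2d}")
```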

[Figure: Histogram of Detonation Heights; x-axis: Detonation Height, y-axis: Number of fuses.]

Note that in the histogram the vertical bars are centred at the midpoint of the intervals.
In the cumulative frequency curve (sometimes referred to in books as an ogive, for no good
reason), however, we plot the cumulative frequencies against the upper end of the intervals.
This is because, for example, the value 0.82 for class 9 - 9.99 means precisely that 82% of
observations are ≤ 9.99.

2.6 Numerical Summaries of Data: Location

Pictures, especially histograms, show the overall distribution of the sample data, but it is
often useful to summarise this numerically. Hence we usually display the data both with
pictures and with sample (summary) statistics.

[Figure: Empirical cumulative frequencies of Detonation Heights; x-axis: Detonation Height, y-axis: Cumulative relative frequency.]

Notation:
n : the sample size
x1 , x2 , . . . , xn : the values of the observations
Three measures of location are
a) The sample mean: the average of the observations.

    x̄ = (1/n)(x1 + x2 + x3 + . . . + xn) = (1/n) Σ xi

b) The sample median: the middle observation.


This is the value of the (n+1)/2-th largest observation (or the average of the middle two if n is even).
E.g. if n = 5 then the median is just the value of the 3rd largest observation.
If n = 6 it is the average of the 3rd and 4th largest.
c) The sample mode: the most frequently occurring value. This is pretty meaningless for
continuous data so we usually consider the modal class instead.

In most cases the mean is the most useful measure, though the median may be preferable
for very skewed distributions.
APC data:

    x̄ = (1/100)(3 + 1 + . . . + 2) = 163/100 = 1.63

    median = 1
    mode = 1
Detonation height data:

    x̄ = (1/50)(10.2 + 9.5 + . . . + 9.9) = 398.9/50 = 7.978

    median = 8.2
    modal class = 8 - 8.99
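These location summaries are quick to check with Python's standard statistics module (a sketch for illustration, not part of the original notes):

```python
import statistics

# Detonation heights (m) of the 50 air-burst fuses (section 2.5, Example 2)
heights = [10.2, 9.5, 7.7, 8.6, 5.7, 6.8, 11.4, 8.2, 7.8, 9.8, 7.6, 8.4, 6.9,
           0.0, 12.1, 8.4, 8.7, 5.6, 7.7, 11.1, 14.0, 7.7, 8.9, 6.5, 1.5, 8.7,
           8.2, 6.4, 9.1, 8.0, 8.0, 4.8, 10.4, 11.6, 2.7, 9.1, 7.9, 10.1, 4.5,
           8.7, 9.3, 9.7, 8.1, 7.6, 10.2, 5.1, 9.0, 3.5, 7.5, 9.9]

print(round(statistics.mean(heights), 3))   # 7.978
print(statistics.median(heights))           # 8.2 (average of the two middle values)
```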

2.7 Numerical Summaries of Data: Dispersion

A measure of location is rarely enough by itself. Two samples can have a similar mean but
still be very different if one is more spread-out than the other.
Three measures of the spread or dispersion of a sample are:
a) The sample range: the largest value minus the smallest.
b) The sample interquartile range: the lower and upper quartiles are defined like the
median, except that they are the one-quarter and three-quarters points, rather than the
halfway point. The interquartile range is the difference between them.
If the observations are ordered from smallest to largest then the quartiles are the values of
the (n+1)/4-th and the 3(n+1)/4-th observations. E.g. if n = 7 then the lower quartile is the value
of the 2nd (i.e. 2nd smallest) observation and the upper quartile is the value of the 6th
observation.
c) The sample variance: this is defined as

    S² = (1/(n-1)) [ (x1 - x̄)² + . . . + (xn - x̄)² ]
       = (1/(n-1)) Σ (xi - x̄)²

and with a little algebra

       = (1/(n-1)) [ Σ xi² - n x̄² ]
       = (1/(n-1)) [ Σ xi² - (Σ xi)²/n ]

In other words we square the deviation of each observation from the mean, then average
these squared deviations. The variance is therefore the average squared deviation from the
mean. (We mention the reason for using 1/(n-1) rather than 1/n later.)
The larger the spread of points the larger the variance is.
Note that all we need to calculate x̄ and S² are the sum Σ x and the sum of squares Σ x².
The individual data values are not needed.

The sample standard deviation S is the square root of the sample variance. This is often
more interpretable as it is measured in the same units as the observations themselves.
In most calculators the standard deviation is calculated using the button marked σ_{n-1}, or
something similar. However, if you were to find a variance without the inbuilt calculator
button you would always use one of the last two versions of the formula to do it.
In practice we nearly always use the variance and/or standard deviation to summarise
dispersion. The range is too sensitive to one or two unusual values and the inter-quartile
range is too awkward to deal with mathematically.
APC data:

    S² = (1/99) [ (3 - 1.63)² + (1 - 1.63)² + . . . + (2 - 1.63)² ]

which is more easily calculated using Σ x = 163 and Σ x² = 459, so that

    S² = (1/99) [ 459 - 100 × 1.63² ] = 1.952626
    S = 1.397364
    range = 0 to 7 = 7
The interquartile range is not very useful here.
Detonation height data:

    S² = (1/49) [ (10.2 - 7.978)² + (9.5 - 7.978)² + . . . + (9.9 - 7.978)² ]

calculated using Σ x = 398.9 and Σ x² = 3514.23, so that

    S² = (1/49) [ 3514.23 - 50 × 7.978² ] = 6.771547
    S = 2.60222

    range = 0 to 14 = 14
    lower quartile = 6.875
    upper quartile = 9.45
    interquartile range = 9.45 - 6.875 = 2.575
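The point that only Σ x and Σ x² are needed can be demonstrated directly. The sketch below (illustrative Python, not from the original notes) computes S² from the two sums and checks it against the library routine:

```python
import statistics

# Detonation heights (m) of the 50 air-burst fuses (section 2.5, Example 2)
heights = [10.2, 9.5, 7.7, 8.6, 5.7, 6.8, 11.4, 8.2, 7.8, 9.8, 7.6, 8.4, 6.9,
           0.0, 12.1, 8.4, 8.7, 5.6, 7.7, 11.1, 14.0, 7.7, 8.9, 6.5, 1.5, 8.7,
           8.2, 6.4, 9.1, 8.0, 8.0, 4.8, 10.4, 11.6, 2.7, 9.1, 7.9, 10.1, 4.5,
           8.7, 9.3, 9.7, 8.1, 7.6, 10.2, 5.1, 9.0, 3.5, 7.5, 9.9]

n = len(heights)
sx = sum(heights)                     # Σ x  = 398.9
sxx = sum(h * h for h in heights)     # Σ x² = 3514.23
var = (sxx - sx * sx / n) / (n - 1)   # the "sum and sum of squares" form of S²

print(round(var, 6))                           # 6.771547
print(round(statistics.variance(heights), 6))  # the same value
```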

It is often very informative to summarise graphically the range, quartiles and median for
several samples at once with a boxplot. To illustrate this the figure shows a boxplot first
for the raw detonation heights, then for the data with 4 added to all the values, then for
the data with all values multiplied by 1.5.

[Figure: Boxplots of genuine and modified detonation data.]

2.8 Grouped Data

Sometimes we may wish to calculate the above summary statistics directly from frequency
tables like those in section 2.5. This may either be because we only have the grouped
(tabulated) data, or to save time.
APC data:
In this case the grouped data are the same as the raw data, so we get exactly the same
answers:

    x̄ = (1/100)(0×21 + 1×34 + 2×21 + 3×15 + 4×6 + 5×1 + 6×1 + 7×1) = 1.63

    S² = (1/99) [ (0 - 1.63)²×21 + (1 - 1.63)²×34 + . . . + (7 - 1.63)²×1 ] = 1.952626

Detonation height data:

In continuous cases the grouping into classes has lost some information so we don't obtain
exactly the same answers as before. We can calculate an approximate mean and variance
by assuming that each observation takes the value of the midpoint of the class it is in.

    x̄ ≈ (1/50)(0.5×1 + 1.5×1 + . . . + 8.5×12 + . . . + 14.5×1) = 8.02

    S² ≈ (1/49) [ (0.5 - 8.02)²×1 + (1.5 - 8.02)²×1 + . . . + (8.5 - 8.02)²×12 + . . .
                  + (14.5 - 8.02)²×1 ] = 7.11184

    S ≈ 2.6668

Similarly we can approximate the median and the quartiles by the midpoints of the classes
in which they fall:

    approx. median = 8.5
    modal class = 8 - 8.99
    approx. interquartile range = 9.5 - 6.5 = 3
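The grouped-data approximations follow the same pattern, with each class midpoint weighted by its frequency. A Python sketch (illustrative, not part of the original notes):

```python
# Midpoints and frequencies of the detonation-height classes (section 2.5)
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5,
             10.5, 11.5, 12.5, 13.5, 14.5]
freqs = [1, 1, 1, 1, 2, 3, 4, 8, 12, 8, 4, 3, 1, 0, 1]

n = sum(freqs)  # 50 observations in total
# Treat every observation as sitting at the midpoint of its class.
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)

print(round(mean, 2))  # 8.02
print(round(var, 5))   # 7.11184
```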

2.9 Populations and Samples

In statistical terminology, a population is the whole set of things about which conclusions
are to be drawn. Sometimes this is very large but finite, such as the British electorate, but
often it is in effect infinite, such as all shells of a particular type which have been or will be
produced.
The mean of the population is usually denoted μ and the variance of the population
is usually denoted σ², so that the standard deviation of the population is σ. These
have similar interpretations to the sample versions.
When we inspect or analyse the data we must allow for the fact that they are only a sample
from the overall population that we are interested in.
Example (from section 2.1):
We collect data from tyres fitted to 20 lorries over a period of 1 month. However, if we had
picked a different 20 lorries or a different month then our results would have been different.

How do we allow for this random variation in our results?


The answer is that we explicitly build random variation into our decision making, i.e. our
model of what is happening. To do this we must make assumptions as to the form which
this random variation will take.
Hence in order to analyse or interpret data, we must first understand the concept of probability, which underpins the whole thing.
Important Note: In statistical terminology a sample always means a set of observations,
not, as in some other subjects, a single observation.


3 Introduction to Probability

3.1 Probability and Relative Frequency

A trial or experiment is just something with more than one possible outcome.
E.g. Toss a coin; Roll a die; Roll ten dice and count how many 6s there are.
An event is simply something that may or may not happen when we perform a trial or
experiment.
E.g. It's a Head; It's a 1; There are fewer than two 6s.
The probability of an event occurring is measured by the relative frequency of this event
in an arbitrarily large number of trials, i.e. the relative frequency in the population.
For example, when it is stated that the probability of hitting a target with a single round
is 0.65 (or 65%), what this actually means is that if an enormously large number of rounds
were fired under the same conditions then 65% would hit the target.
Probability is therefore measured on a scale from 0 (impossible) to 1 (certain). Some
examples of familiar probabilities are:
1. Probability of a head on tossing a coin = 1/2 (50%)

2. Probability of rolling a six with a die = 1/6 (16.67%)

3. Probability of drawing an ace of spades from a pack = 1/52 (1.92%)

3.2 Probability Distributions

Probability can be interpreted in terms of relative frequency in more complicated cases too.
A probability distribution describes the probabilities of all possible outcomes of a trial,
and can be regarded as the set of relative frequencies in an infinitely large number of
trials.
For example the relative frequency of getting at least 3 hits when a very large number of
salvoes are fired can be equated to the probability of getting at least 3 hits with a single
salvo.
In this case the relative frequency distribution, describing how many salvoes would produce
no hits, how many would produce exactly one hit etc, may equally well be regarded as a
probability distribution and interpreted accordingly.
A probability distribution can be as simple as that for tossing a coin:
P(Head) = 1/2;   P(Tail) = 1/2

3.3 Independent Events

Two events A and B are independent if the chance of each occurring is unaffected by
whether the other occurs or not. This is the case if and only if
P(A AND B) = P(A) × P(B)
Similarly, a set of n events are independent if and only if
P(event 1 AND event 2 AND ... AND event n)
= P(event 1) × P(event 2) × ... × P(event n)
Example
The probability of a card drawn at random being black is 1/2 and the probability that it is a 10 is 1/13. Clearly the probability of a card being black is not related to its value, and hence the probability that it is both black and a 10 is the product of the two probabilities:

P(Black AND 10) = 1/2 × 1/13 = 1/26

Example
If on a field telephone system the probability that any handset is working is 99%, whilst independently the probability of the switchboard operating correctly is 97%, then the probability of one outpost being able to contact another outpost is given by

0.99 × 0.97 × 0.99 = 0.9507
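The multiplication rule is easy to mechanise. A minimal sketch (Python, not part of the original notes) reproducing the telephone calculation:

```python
# Probability that several independent events all occur is the product of
# their probabilities. Figures from the text: two handsets at 0.99 each,
# switchboard at 0.97.
import math

def p_all(probs):
    """P(all independent events occur) = product of individual probabilities."""
    return math.prod(probs)

p_contact = p_all([0.99, 0.97, 0.99])
print(round(p_contact, 4))  # 0.9507
```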

3.4 Mutually Exclusive Events

Two events A and B are mutually exclusive if they cannot both happen, so that
P (A AND B) = 0
Hence the two events are (very) dependent.
It follows from this that
P (A OR B) = P (A) + P (B)
Similarly, if n events are all mutually exclusive then
P(event 1 OR event 2 OR . . . OR event n)
= P(event 1) + P(event 2) +. . .+ P(event n)
Example
A card cannot both be a king (probability 1/13) and an ace (probability 1/13), so these two events are mutually exclusive. Hence the probability of drawing either a king or an ace from the pack is the sum of the two probabilities:

P(Ace OR King) = 1/13 + 1/13 = 2/13

Example
Consider a tank which may be conveniently divided into three target areas A, B and C. If
the probabilities of hitting and thereby killing the tank with a single round, aimed at the
centre point, are 8%, 30% and 14% respectively for the three areas A, B and C then the
total probability of achieving a kill is 52%.
Definition
A set of events is mutually exclusive and exhaustive if in addition they are the only possible
events, i.e. precisely one of the events must occur. If so, then
P(event 1) + ... + P(event n) = 1.
In other words P(event 1), P(event 2), . . ., P(event n) give a probability distribution.

3.5 General Events

For any pair of events

P(A OR B) = P(A) + P(B) − P(A AND B)

Hence for independent events

P(A OR B) = P(A) + P(B) − P(A) × P(B)

and for mutually exclusive events

P(A OR B) = P(A) + P(B)
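The general addition rule can be checked by brute-force counting. A small sketch (the ace/spade events are illustrative, not from the text); they are not mutually exclusive, so the subtraction matters:

```python
# Verify P(A OR B) = P(A) + P(B) - P(A AND B) by exhaustive counting over
# a 52-card deck. Events: A = card is an ace, B = card is a spade.
ranks = range(13)                                  # 0 represents the ace
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

p_a = sum(r == 0 for r, s in deck) / 52            # P(ace) = 4/52
p_b = sum(s == "spades" for r, s in deck) / 52     # P(spade) = 13/52
p_ab = sum(r == 0 and s == "spades" for r, s in deck) / 52  # 1/52
p_a_or_b = sum(r == 0 or s == "spades" for r, s in deck) / 52

# Inclusion-exclusion agrees with direct counting (both equal 16/52):
print(abs((p_a + p_b - p_ab) - p_a_or_b) < 1e-12)  # True
```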

3.6 Binary Outcomes (success/failure)

Let p be the probability of success and q the probability of failure. It is assumed that a trial must have one or other as the outcome. Thus the events are mutually exclusive and exhaustive and so the total probability must be unity. Hence by this rule the probability of either success or failure is p + q = 1, so that q = 1 − p. Success and failure are interpreted as occasion demands. The term success is often applied to the event of interest, such as finding a defective item.
Suppose that we repeat a trial of this type n times independently. Then

P(n successes) = p × p × ... × p = p^n

and similarly

P(0 successes) = q × q × ... × q = q^n

Similarly

P(r successes then n − r failures) = p × p × ... × p × q × q × ... × q = p^r q^(n−r)

This is true for any specified ordering of r successes and n − r failures, i.e. any particular sequence of successes and failures containing r successes and n − r failures in total. Such sequences are mutually exclusive.
Now consider the whole sequence of trials, with

overall success = at least one occurrence (success)
overall failure = no occurrences (all failures)

Then

P(overall failure) = P(0) = q × q × ... × q = q^n

and so

P(overall success) = 1 − P(overall failure) = 1 − q^n = 1 − (1 − p)^n
Example
Four rounds are fired simultaneously at a target, each independently having a chance of 0.4
of hitting. One hit is enough to kill the target. Then
P(kill) = 1 − P(all miss) = 1 − (1 − 0.4)^4 = 0.8704
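The 1 − (1 − p)^n shortcut can be written as a one-line function; a sketch checked against the salvo figures above:

```python
# P(at least one success in n independent trials) = 1 - (1 - p)^n.
def p_at_least_one(p, n):
    return 1 - (1 - p) ** n

print(round(p_at_least_one(0.4, 4), 4))   # 0.8704 (four rounds at 0.4 each)
print(round(p_at_least_one(0.17, 5), 4))  # 0.6061 (five rows of mines at 17% each)
```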
For some repeated trials, the opposite of no occurrences is just one occurrence. This happens when one occurrence precludes further trials. The procedure above will still apply and gives the probability of just one occurrence.
Example
It is required to find the probability of a tank being disabled by a minefield consisting of 5
rows of mines when it is estimated that the probability of a tank being disabled by a single
row of mines is 17%. The probability that it successfully gets through one row of mines is
0.83, and so the probability that it gets through all 5 rows is
P(5 successes) = (0.83)^5 = 0.3939
Hence the overall probability of it being disabled is
P(not 5 successes) = 1 − 0.3939 = 0.6061

3.7 Dependent Events

The above has mostly concerned independent trials, and hence events, with one special case
of dependence (mutual exclusivity).
More general forms of dependence exist, and probabilities of multiple events when those
events are dependent can often be neatly displayed by a Tree Diagram (at end of chapter).
Example
A bomb disposal unit is receiving simulators from three different factories in the proportions
50%, 30% and 20%. If the percentages of defective output from these factories are 3%, 4%
and 5% respectively, find the proportion of defective simulators received.

We obtain the leaf probabilities by multiplying the probabilities on the branches leading
up to it. We can sum the leaf probabilities as shown because the leaves are all mutually
exclusive.
Conditional Probability
If we have two events A and B then the conditional probability of A happening given that
B happens is denoted
P (A|B)
If A and B are independent then P (A|B) = P (A) while if they are mutually exclusive then
P (A|B) = 0.
Example (continued)
The probabilities on the second set of branches of the tree diagram are all conditional
probabilities
P (defective|factory 1) = 0.03
P (defective|factory 2) = 0.04
P (defective|factory 3) = 0.05

Formula for Conditional Probability


The formal definition is that

P(A|B) = P(A AND B) / P(B)

From this it immediately follows that

P(A AND B) = P(A|B) × P(B)

which is the formula which we implicitly use in the tree diagram.
Example (continued)
We can use both of the above versions of the formula to calculate the probability that a
simulator which is found to be defective came from factory 3.
From the second version of the formula, or directly from the tree,

P(factory 3 AND defective) = P(defective AND factory 3)
                           = P(defective|factory 3) × P(factory 3)
                           = 0.05 × 0.2
                           = 0.010

Hence the probability that a simulator came from factory 3, given that it is defective, is

P(factory 3|defective) = P(factory 3 AND defective) / P(defective)
                       = 0.010 / 0.037
                       = 0.27027
Hence, of the defective items, 27% of them come from factory 3.
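The whole tree-diagram calculation, including the posterior just derived, can be sketched as:

```python
# Tree-diagram arithmetic for the simulator example: leaf probabilities,
# total probability of a defective, and P(factory 3 | defective) from the
# conditional probability formula.
share = {1: 0.50, 2: 0.30, 3: 0.20}    # P(factory i)
p_def = {1: 0.03, 2: 0.04, 3: 0.05}    # P(defective | factory i)

leaf = {i: share[i] * p_def[i] for i in share}  # P(factory i AND defective)
p_defective = sum(leaf.values())                # leaves are mutually exclusive
posterior = leaf[3] / p_defective               # P(factory 3 | defective)

print(round(p_defective, 3))  # 0.037
print(round(posterior, 5))    # 0.27027
```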


3.8 Further Examples

Example 1
Three LAW operators A, B and C, are to fire at a tank target. The probability of A achieving
a hit with an individual round is known to be 0.85, while for the less well-trained B and C
the probabilities are 0.7 and 0.45 respectively.
1. If each operator were to fire one round calculate the probability that
(a) all three will hit the target
(b) all three will miss the target
(c) exactly two hits are obtained
(d) at least one hit is obtained
assuming that the three operators perform independently of one another.
2. If each operator were to fire two rounds, calculate the average number of hits obtained
in total.
Solution
1. Under the assumption of independence,
(a)
P(three hits) = P(A hit and B hit and C hit)
              = P(A hit) × P(B hit) × P(C hit)
              = 0.85 × 0.70 × 0.45
              = 0.27

(b)
P(three misses) = P(A miss and B miss and C miss)
                = P(A miss) × P(B miss) × P(C miss)
                = (1 − 0.85) × (1 − 0.70) × (1 − 0.45)
                = 0.15 × 0.30 × 0.55
                = 0.025

(c)
P(exactly two hits) = P[(A hit and B hit and C miss)
                        or (A hit and B miss and C hit)
                        or (A miss and B hit and C hit)]
                    = (0.85 × 0.70 × 0.55) + (0.85 × 0.30 × 0.45) + (0.15 × 0.70 × 0.45)
                    = 0.49

(d) We can save some effort here by using

P(at least one hit) = 1 − P(three misses)
                    = 1 − 0.025
                    = 0.975
2. The average number of hits obtained when each operator fires two rounds is

(0.85 × 2) + (0.70 × 2) + (0.45 × 2) = 4.0
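As a cross-check, the eight hit/miss outcomes can be enumerated directly; a sketch (hit probabilities from the text):

```python
# Brute-force check of part 1: enumerate all eight hit/miss combinations
# for the three independent operators.
from itertools import product

p_hit = [0.85, 0.70, 0.45]          # operators A, B, C
dist = {k: 0.0 for k in range(4)}   # number of hits -> probability
for outcome in product([0, 1], repeat=3):  # 1 = hit, 0 = miss
    prob = 1.0
    for hit, ph in zip(outcome, p_hit):
        prob *= ph if hit else 1 - ph
    dist[sum(outcome)] += prob

print(round(dist[3], 5))      # 0.26775 (text rounds to 0.27)
print(round(dist[0], 5))      # 0.02475 (0.025)
print(round(dist[2], 5))      # 0.48925 (0.49)
print(round(1 - dist[0], 5))  # 0.97525 (0.975)
```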


4 The Binomial Distribution

4.1 Application

The binomial is a commonly occurring discrete distribution. It arises when a trial can have only one of two possible outcomes, which we may term success and failure, and it gives the distribution of the number of successes when the trial is repeated a certain number of times. This follows directly from section 3.6.
Suppose we have n independent trials, each with success probability p and failure probability q = 1 − p. Let X be a Random Variable describing the number of successes in the n trials.
E.g. toss ten coins and count the number of heads.
Then we say that the random variable X follows the binomial distribution with parameters n (number of trials) and p (success probability). For short, we write

X ~ Bi(n, p)

4.2 Derivation of the Probability Function

The shape of the distribution can be obtained from the basic laws of probability. We can
derive a function that gives each of the probabilities in the probability distribution, and this
is usually referred to as the probability function.
From section 3.6 we have

P(X = 0) = q^n
P(X = n) = p^n

and that

P(X = r) = c p^r q^(n−r)

where c is the number of different ways in which r successes and n − r failures can be ordered. Clearly this depends on both n and r.
Consider some values of r:

r = 0 : clearly c = 1
r = n : clearly c = 1
r = 1 : the possible sequences are sff...f, fsf...f, ..., ff...fs, so clearly c = n

and so also c = n when r = n − 1.

In general, c is in fact nCr = C(n, r) = "n choose r", where

C(n, r) = n! / ((n − r)! r!) = n(n − 1)...(n − r + 1) / (r(r − 1)...3·2·1)

is the number of ways of choosing r objects out of n.

Hence the probability function for the binomial distribution is given by

P(X = r) = C(n, r) p^r q^(n−r)    for r = 0, 1, ..., n

where X ~ Bi(n, p).


Taking particular cases of this general formula:

P(0) = Probability of 0 successes = q^n
P(1) = Probability of 1 success = n p q^(n−1)
P(2) = Probability of 2 successes = (n(n−1)/(1·2)) p^2 q^(n−2) = (n(n−1)/2) p^2 q^(n−2)
P(n−2) = Probability of n−2 successes = (n(n−1)(n−2)...3/(1·2·3·...·(n−2))) p^(n−2) q^2 = (n(n−1)/2) p^(n−2) q^2
P(n−1) = Probability of n−1 successes = (n(n−1)(n−2)...2/(1·2·3·...·(n−1))) p^(n−1) q = n p^(n−1) q
P(n) = Probability of n successes = (n(n−1)(n−2)...1/(1·2·3·...·n)) p^n = p^n

Note that the coefficients of the terms are symmetrical at each end of the distribution, but the shape of the distribution would only be symmetrical if p = q = 1/2 (e.g. tossing a coin). However, it tends to become more symmetrical (whatever the value of p) as n becomes large.
Note that these probabilities can be thought of as population equivalents of the sample
relative frequencies in section 2.5. We can similarly draw a rod graph of the probabilities.
Example
A salvo of 6 rounds is fired. The probability of a hit with a single round is assumed to be
0.6, independently of the other rounds. Find the probabilities of obtaining 0, 1, 2, 3, 4, 5
and 6 hits.
If X is a random variable describing the number of hits in a single salvo then X ~ Bi(6, 0.6). Therefore the probabilities of 0, 1, 2, 3, 4, 5, 6 hits are given by substituting n = 6, p = 0.6, q = 0.4 in the formula. Hence we obtain:

Probability of zero hits (all 6 rounds miss) = (0.4)^6 = 0.0041
Probability of exactly one round hitting = 6 × (0.6) × (0.4)^5 = 0.0369
Probability of exactly 2 rounds hitting = (6·5)/(1·2) × (0.6)^2 × (0.4)^4 = 0.1382
Probability of exactly 3 rounds hitting = (6·5·4)/(1·2·3) × (0.6)^3 × (0.4)^3 = 0.2765
Probability of exactly 4 rounds hitting = (6·5·4·3)/(1·2·3·4) × (0.6)^4 × (0.4)^2 = 0.3110
Probability of exactly 5 rounds hitting = (6·5·4·3·2)/(1·2·3·4·5) × (0.6)^5 × (0.4) = 0.1865
Probability of all 6 rounds hitting = (6·5·4·3·2·1)/(1·2·3·4·5·6) × (0.6)^6 = (0.6)^6 = 0.0467
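The probability function is straightforward to sketch in code; exact arithmetic gives 0.186624 for five hits, which the text shows as 0.1865:

```python
# The binomial probability function, reproducing the salvo example
# (n = 6, p = 0.6).
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bi(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

salvo = [binom_pmf(r, 6, 0.6) for r in range(7)]
for r, pr in enumerate(salvo):
    print(r, round(pr, 4))
```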


[Rod graph of the probabilities P(r): Binomial, n = 6, p = 0.6]

4.3 Summation of Terms

Just as in the sample case in section 2.5, we can calculate cumulative probabilities. The set
of these is often called the cumulative distribution function.
Hence the value of the cumulative distribution function at r is just the probability of r or fewer successes. It is given by the sum of the first (r + 1) terms, i.e. up to and including the term containing p^r.
Note that this is not the same as the probability of fewer than r successes, which does not include the p^r term.
The (cumulative) distribution function is therefore given by

P(X ≤ r) = Σ_{i=0}^{r} P(X = i)    for r = 0, 1, ..., n

Similarly

P(X < r) = Σ_{i=0}^{r−1} P(X = i) = P(X ≤ r − 1)

The probability of at least r successes is the sum of the last (n − r + 1) terms, from the term containing p^r onwards.

If the required probability involves summing more than half the terms, the arithmetic can be shortened by using the fact that all the terms add up to unity:

P(X ≤ r) + P(X > r) = 1
P(X < r) + P(X ≥ r) = 1

Thus, sum the remaining terms instead and subtract from 1.
In particular

P(X ≥ 1) = 1 − P(X < 1) = 1 − P(X = 0) = 1 − q^n

or in other words

P(at least one success) = 1 − P(no successes) = 1 − q^n
Example
The probability function and cumulative distribution function from the example above where X ~ Bi(6, 0.6) can be tabulated as
Number of    Probability    Cumulative
hits (r)     P(X = r)       Probability P(X ≤ r)
   0          0.0041         0.0041
   1          0.0369         0.0410
   2          0.1382         0.1792
   3          0.2765         0.4557
   4          0.3110         0.7667
   5          0.1865         0.9532
   6          0.0467         1.0000

4.4 Parameters of the Binomial Distribution

The binomial distribution is completely specified by the number of trials n and the success probability p, remembering that the n trials must be independent.
A random variable and its distribution can be used to describe a population. In the binomial case the population mean and variance are given by

population mean μ = np
population variance σ² = np(1 − p)

so that

population standard deviation σ = √(np(1 − p))

Note: the population mean μ is also called the expectation or expected value of X. This is a misnomer really, since we do not expect X to equal μ.
Example
For X ~ Bi(6, 0.6), the population mean is the average number of hits

μ = np = 6 × 0.6 = 3.6

The variance is

σ² = np(1 − p) = 6 × 0.6 × 0.4 = 1.44
Estimating the Parameters of the Binomial from a Sample
All of the above has assumed known n and p. We should know n, but will only know p in special cases (dice, cards etc). However, if we have results from N (> n, hopefully) trials, it is natural to estimate p by

p̂ = (number of successes) / N

where the hat (ˆ) is standard statistical terminology meaning "estimate of".
Estimates of the mean and standard deviation of X, if desired, may then be derived from p̂ using:

μ̂ = n p̂        σ̂ = √(n p̂ (1 − p̂))

See sections 6.6 and 12.
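These estimators can be sketched in code; the frequency data used here are taken from the worked salvo example that follows:

```python
# Estimating p from observed trial results, then deriving the mean and
# standard deviation estimates for X ~ Bi(n, p).
from math import sqrt

n = 6                                                 # rounds per salvo
freq = {0: 1, 1: 1, 2: 5, 3: 10, 4: 13, 5: 7, 6: 3}  # hits -> number of salvoes

N = n * sum(freq.values())                  # total rounds fired: 240
hits = sum(r * f for r, f in freq.items())  # total hits: 146
p_hat = hits / N

mu_hat = n * p_hat
sigma_hat = sqrt(n * p_hat * (1 - p_hat))
print(round(p_hat, 4), round(mu_hat, 2), round(sigma_hat, 3))  # 0.6083 3.65 1.196
```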


Example
Consider again the case of firing salvoes of n = 6 rounds, but this time without assuming
that p is known.
We have the following data after 40 such salvoes have been fired:
Number of hits   0   1   2   3   4   5   6   TOTAL
Frequency        1   1   5  10  13   7   3     40

The total number of shells fired is N = 40 × 6 = 240.

The total number of hits is

(0 × 1) + (1 × 1) + (2 × 5) + (3 × 10) + (4 × 13) + (5 × 7) + (6 × 3)
= 0 + 1 + 10 + 30 + 52 + 35 + 18 = 146


Hence

p̂ = 146/240 = 0.6083

Therefore estimates of the mean and standard deviation (of the number of hits in a salvo) are

μ̂ = n p̂ = 3.65        σ̂ = √(n p̂ (1 − p̂)) = 1.196

4.5 Further Examples

Example 1
A reconnaissance party carries five signal flares in order to summon help. Experience has
shown that each flare has a probability of 0.90 of functioning correctly. Calculate the
probability that;
1. at least one flare will function correctly
2. exactly two flares will function correctly
3. at most, two flares will fail to function correctly.
Solution
Let the random variable X denote the number of flares which function correctly. Then X may assume one of the values X = 0, 1, 2, 3, 4, 5 and follows a binomial distribution with

p = P(flare functions correctly) = 0.90

and so

q = P(flare fails to function correctly) = 1 − p = 0.10

where the number of independent trials is n = 5. Hence we write X ~ Bi(5, 0.9).
1. We require

P(at least one flare functions) = P(X ≥ 1) = 1 − P(X = 0)

where

P(X = 0) = C(5, 0) p^0 q^5 = (0.10)^5 = 0.00001

so that

P(X ≥ 1) = 1 − 0.00001 = 0.99999


2. We require

P(exactly two flares function) = P(X = 2) = C(5, 2) p^2 q^3 = (10)(0.9)^2(0.1)^3 = 0.0081

3. We require

P(at most two flares fail to function) = P(X ≥ 3) = 1 − P(X ≤ 2)

Now

P(X = 1) = C(5, 1) p^1 q^4 = (5)(0.9)(0.1)^4 = 0.00045

and so

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
         = 0.00001 + 0.00045 + 0.0081 = 0.00856

Hence

P(X ≥ 3) = 1 − 0.00856 = 0.99144
Example 2
A factory which produces toasters claims that at least 72% of the total production will
function exactly according to specification and that defectives occur within the production
only as a consequence of random variations in the production process.
To test this claim a random sample of 10 toasters is taken from the production line on each
of 100 working days and the number of toasters which conform to the specifications (within
each sample) is recorded;
7 8 8 7 7 8 6 7 7 9
9 7 8 4 5 6 7 9 5 9
7 8 7 7 9 8 6 6 7 8
7 7 6 8 8 7 7 6 7 6
9 7 9 8 7 8 6 8 9 9
7 7 7 9 7 8 5 8 7 8
9 6 8 6 8 9 7 8 7 8
8 7 7 7 9 7 8 7 8 8
5 9 7 6 8 8 7 9 7 9
8 7 6 8 5 8 4 5 9 7

Is the factory's claim believable?

Solution
The frequency and relative frequency distributions for the number of non-defective toasters
in each of the samples of 10 toasters are:
Non-defective toasters   3      4      5      6      7      8      9      10
Frequency                0      2      6      12     35     28     17     0
Relative Frequency       0.00   0.02   0.06   0.12   0.35   0.28   0.17   0.00

Total number of non-defective toasters = 4 × 2 + 5 × 6 + ... = 732

Total number of toasters inspected = 1000

Hence, the sample estimate of the proportion of non-defective toasters within the total production of the factory is

p̂ = 732/1000 = 0.732
p̂ is the sample estimate of the probability of a single toaster, selected at random from the total production, being non-defective.
This suggests that the true proportion of non-defectives is probably over 72% as claimed,
though of course we cannot be certain. We see how to assess this properly in section 12.4.
We now consider whether the defectives are indeed occurring randomly, or whether there is
some pattern. For example, if there are more defectives produced at some times of day or
week, or if the operating characteristics of the production line vary over time, then we will
get bunching of the defectives, so that more samples will contain either a very large or a
very small number of defectives. Hence the distribution will not look binomial, because the
assumption of independent trials with the same success probability will be violated.
We assume that the non-defective toasters occur at random throughout the total production and let the random variable X denote the number of non-defective toasters occurring within a random sample of 10 toasters. Hence, X follows the binomial distribution with 10 trials and an estimated probability (of being non-defective) of p̂ = 0.732.
Then the estimated proportions of samples which will contain various numbers of non-defectives are

P(X = 4) = C(10, 4) (0.732)^4 (0.268)^6 = 0.0223
P(X = 5) = C(10, 5) (0.732)^5 (0.268)^5 = 0.0732
P(X = 6) = C(10, 6) (0.732)^6 (0.268)^4 = 0.1667
P(X = 7) = C(10, 7) (0.732)^7 (0.268)^3 = 0.2601
P(X = 8) = C(10, 8) (0.732)^8 (0.268)^2 = 0.2664
P(X = 9) = C(10, 9) (0.732)^9 (0.268)^1 = 0.1617
P(X = 10) = 0.732^10 = 0.0442

and hence by subtraction

P(X ≤ 3) = 1 − sum of the above = 0.0054
Hence, multiplying the above probabilities by 100 (the number of days) we obtain the following table of Observed and Expected frequencies under the assumption that the manufacturer's claim is correct:

Non-defective toasters   0-3    4      5      6      7      8      9      10
Observed Frequency       0      2      6      12     35     28     17     0
Expected Frequency       0.54   2.23   7.32   16.67  26.01  26.64  16.17  4.42
The numbers are reasonably close except for 6 and 7, suggesting that the distribution
probably is binomial, so that defectives are probably random and there is no bunching of
defectives. In section 13.1 we describe a way of assessing the comparison of Observed and
Expected frequencies under the assumption of a hypothesis, and return to this example in
section 13.5.
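The Expected frequencies in the table above can be regenerated directly; a sketch:

```python
# Expected daily frequencies for 100 days of Bi(10, 0.732) samples, with
# counts 0-3 pooled as in the table.
from math import comb

n, p_hat, days = 10, 0.732, 100

def pmf(r):
    return comb(n, r) * p_hat**r * (1 - p_hat) ** (n - r)

expected = {"0-3": days * sum(pmf(r) for r in range(4))}
for r in range(4, 11):
    expected[r] = days * pmf(r)

observed = {"0-3": 0, 4: 2, 5: 6, 6: 12, 7: 35, 8: 28, 9: 17, 10: 0}
for key in observed:
    print(key, observed[key], round(expected[key], 2))
```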
Example 3
A certain mission requires a minimum of 5 helicopters to be operable throughout. In a
mission of this type it is estimated that each helicopter has an 80% chance of survival. How
many helicopters should be sent on the mission to have a 90% chance of success?
Solution
Let X be the number of helicopters operational throughout the mission, so that X ~ Bi(n, p) where p = 0.8 and n is to be determined. We must find n such that

P(X ≥ 5) ≥ 0.9

If n = 6:
P(6) = (0.8)^6                            = 0.262
P(5) = 6 × (0.2)(0.8)^5                   = 0.393
Therefore P(≥ 5 successes)                = 0.655

If n = 7:
P(7) = (0.8)^7                            = 0.210
P(6) = 7 × (0.2)(0.8)^6                   = 0.367
P(5) = (7·6)/(1·2) × (0.2)^2(0.8)^5       = 0.275
Therefore P(≥ 5 successes)                = 0.852

If n = 8:
P(8) = (0.8)^8                            = 0.168
P(7) = 8 × (0.2)(0.8)^7                   = 0.336
P(6) = (8·7)/(1·2) × (0.2)^2(0.8)^6       = 0.294
P(5) = (8·7·6)/(1·2·3) × (0.2)^3(0.8)^5   = 0.147
Therefore P(≥ 5 successes)                = 0.945

Hence n should be at least 8.
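The trial-and-error search can be automated; a sketch (exact arithmetic gives 0.9437 at n = 8; the text's 0.945 comes from summing individually rounded terms):

```python
# Find the smallest n with P(X >= 5) >= 0.9 for X ~ Bi(n, 0.8).
from math import comb

def p_at_least(k, n, p):
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))

n = 5
while p_at_least(5, n, 0.8) < 0.9:
    n += 1
print(n, round(p_at_least(5, n, 0.8), 4))  # 8 0.9437
```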



5 The Poisson Distribution

5.1 Application

The Poisson distribution arises when events occur at random, often over time or in space.
It models the number of events which occur in a given time period, or over a given spatial
area.
Example
1. Prussian officers kicked to death by horses
2. Car crashes on a stretch of road.
Random in time means that knowing when the last event occurred says nothing about
when the next will occur. (Similarly for a spatial process).
The only parameter is m, the mean number of events occurring (e.g. in a specified time
period).
Example
We might have
1. m = 1.7 officers/year
2. m = 3.2 crashes/month or, equivalently, m = 38.4 crashes/year
Although few processes are truly completely random, the Poisson has been shown to be a
good approximation to reality in many cases.

5.2 Probability Function

If X is a random variable following the Poisson distribution with mean m, we write

X ~ Po(m)

and the probability function is given by

P(X = r) = (m^r / r!) e^(−m)    for r = 0, 1, 2, ...

(you have to take this on trust, I'm afraid). Hence for example

P(X = 0) = (m^0 / 0!) e^(−m) = e^(−m)    (0! = 1)
P(X = 1) = (m^1 / 1!) e^(−m) = m e^(−m)
P(X = 2) = (m^2 / 2!) e^(−m) = (m^2 / 2) e^(−m)

and so on.
Note that we have the convenient formula

P(X = r) = (m / r) × P(X = r − 1)

[Rod graph of the probabilities P(r): Poisson, m = 0.5]

Unlike the binomial, there is no fixed upper limit to the number of events that can occur. However

P(X = r) ≈ 0    for r >> m

Hence although the distribution may be represented in rod graph form, there is no limit to the number of columns, but the probabilities will decrease to negligible values as r increases.
The shape of the distribution is very asymmetrical for small m but becomes symmetrical as
m increases.
Example
If X ~ Po(3) then for example

P(X = 4) = (3^4 / 4!) e^(−3) = 0.168031
P(X = 5) = (3^5 / 5!) e^(−3) = 0.100819

The latter could also have been derived as

P(X = 5) = (3/5) × P(X = 4) = 0.100819


5.3 Summation of Terms

The remarks made about the summing of terms for the binomial apply to all discrete
distributions. For the Poisson distribution it is always preferable to obtain the probability
of at least r occurrences by subtracting the probability of fewer than r occurrences from
1, since this avoids the approximation of neglecting the terms in the infinite tail of the
distribution. For example
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−m)

In general then we use

P(X ≤ r) = Σ_{i=0}^{r} P(X = i)

P(X > r) = Σ_{i=r+1}^{∞} P(X = i) = 1 − P(X ≤ r)    (avoids infinite sum)

Example
The average number of defectives in a box of ammunition is known to be 1/2. We calculate the probability of finding boxes with 0, 1, 2 defectives by substitution of m = 1/2 in the formula above, so that

P(X = 0) = e^(−0.5)          = 0.606530
P(X = 1) = 0.5 × above       = 0.303265
P(X = 2) = (0.5/2) × above   = 0.075816
                      Total:   0.985611

P(X ≥ 3) = 1 − 0.985611 = 0.014389
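The recurrence P(X = r) = (m/r) P(X = r − 1) makes such sums cheap to compute; a sketch using the ammunition-box figures (the text's final digits differ slightly because it truncates rather than rounds):

```python
# Poisson probabilities built with the recurrence P(X = r) = (m/r) P(X = r-1),
# here for m = 0.5.
from math import exp

def poisson_probs(m, r_max):
    """[P(X = 0), ..., P(X = r_max)] for X ~ Po(m), via the recurrence."""
    out = [exp(-m)]                 # P(X = 0) = e^(-m)
    for r in range(1, r_max + 1):
        out.append(out[-1] * m / r)
    return out

probs = poisson_probs(0.5, 2)
p_3_or_more = 1 - sum(probs)
print([round(p, 6) for p in probs], round(p_3_or_more, 6))
```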

5.4 Parameters of the Poisson Distribution

The mean m completely specifies the distribution:

population mean μ = m
population variance σ² = m
population standard deviation σ = √m

Estimating the Parameters of the Poisson from a Sample
In reality, of course, m is usually unknown. However, if N independent trials, all from a Po(m) distribution, have results r1, ..., rN, we can estimate m by

m̂ = (1/N) Σ_{i=1}^{N} r_i


This applies similarly even if the lengths of the time periods differ.
Example
If we have the information:
3 Prussians killed in 1879
5 Prussians killed in 1883-85
8 Prussians killed in 1890-93
m̂ = (3 + 5 + 8)/(1 + 3 + 4) = 2 per year

(assuming that the mean is constant over time).


Example
Suppose a sample of 500 boxes of ammunition have been examined and a count made of the
number of defectives in each box to give the following result:
Number of defectives   Frequency   Defectives × Frequency
        0                 309                 0
        1                 142               142
        2                  40                80
        3                   8                24
        4                   1                 4
        5                   0                 0
      Total               500               250

Total number of defectives is

(0 × 309) + (1 × 142) + (2 × 40) + (3 × 8) + (4 × 1)
= 0 + 142 + 80 + 24 + 4
= 250.
Total number of boxes is 500, so estimated mean number of defectives per box is

m̂ = 250/500 = 0.5.

5.5 Additive Property of the Poisson Distribution

If occurrences from one source have a Poisson distribution with mean m1 , and occurrences
from a different but independent source have a Poisson distribution with mean m2 , then it
can be shown that occurrences from either source have a Poisson distribution with mean
m1 + m2 .
If X1 ~ Po(m1) and X2 ~ Po(m2), then X1 + X2 ~ Po(m1 + m2).
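A quick numerical check of the additive property (a sketch; it convolves the two probability functions term by term):

```python
# Convolving the Po(2) and Po(3) probability functions recovers Po(5):
# P(X1 + X2 = r) = sum over k of P(X1 = k) P(X2 = r - k).
from math import exp, factorial

def po(m, r):
    return m**r * exp(-m) / factorial(r)

for r in range(6):
    convolved = sum(po(2, k) * po(3, r - k) for k in range(r + 1))
    assert abs(convolved - po(5, r)) < 1e-12
print("Po(2) + Po(3) matches Po(5) for r = 0..5")
```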

Example
This additive property of the Poisson distribution is useful in the study of accident statistics
which, in many situations, may be well described by a Poisson model. For example, if at a
traffic black spot there are on average 2 serious accidents per year, the probability P (r) of
obtaining r = 0, 1, 2, 3 etc accidents in any one year is given by the formula
P(r) = 2^r e^(−2) / r! = 2^r × 0.1353 / r!

Similarly if, at another black spot, there are on average 3 accidents per year the corresponding probability P (r) of r accidents in a year is given by
P(r) = 3^r e^(−3) / r! = 3^r × 0.0498 / r!

The additive property of the Poisson distribution now tells us that the probability of obtaining r accidents in a year from the two sites combined, is simply

P(r) = (2 + 3)^r e^(−(2+3)) / r! = 5^r × 0.0067 / r!

In other words

Black spot 1 accidents ~ Po(2)
Black spot 2 accidents ~ Po(3)
Accidents at either    ~ Po(5)

5.6 Binomial and Poisson

In fact a binomial distribution with a large number of trials n and small success probability p gives almost identical probabilities to a Poisson with mean m = np.
This is useful when
1. calculating by hand, or even by computer if n is huge.
2. n and p are both unknown but m is known.
A rule of thumb is that the approximation is reasonable for n > 10 and p < 0.1.
Example
Consider a box of 450 rounds of ammunition where the probability of a round being defective
is 0.004. Hence the number of defectives in a box follows a binomial distribution with
n = 450 and p = 0.004. Hence the probability of exactly r defectives is
P(r) = [450 × 449 × ... × (450 − r + 1)] / [r × (r − 1) × ... × 1] × (0.004)^r × (0.996)^(450−r)

However, since the average number of defectives is

np = 450 × 0.004 = 1.8

this means we can approximate this formula with that for the Poisson with mean 1.8:

P(r) = (1.8^r / r!) e^(−1.8)

Note that this is considerably easier to calculate, especially if using the formula to obtain P(r) from P(r − 1). The results are very similar:
 r    Poisson    Binomial
 0    0.165299   0.164703
 1    0.297538   0.297657
 2    0.267784   0.268369
 3    0.160671   0.160950
 4    0.072302   0.072233
 5    0.026029   0.025876
 6    0.007809   0.007707
 7    0.002008   0.001963
 8    0.000452   0.000437
 9    0.000090   0.000086
10    0.000016   0.000015
11    0.000003   0.000002
12    0.000000   0.000000
13    0.000000   0.000000
14    0.000000   0.000000
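The two columns of the table can be reproduced as follows (a sketch):

```python
# Bi(450, 0.004) against its Poisson approximation Po(1.8).
from math import comb, exp, factorial

n, p = 450, 0.004
m = n * p  # 1.8

def binom(r):
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson(r):
    return m**r * exp(-m) / factorial(r)

print(" r   Poisson    Binomial")
for r in range(8):
    print(f"{r:2d}   {poisson(r):.6f}   {binom(r):.6f}")
```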


5.7 Further Examples

Example 1
On average a repair section are asked to repair 2.5 items in a day. However, they can only
handle 4 items per day.
On those days in which they get 5 or more items, they have to send them elsewhere at a fixed cost of £500, regardless of the number of items sent on that day.
How often are they likely to get 5 or more items for repair - and hence, what is their total
cost penalty likely to be in a working year of 310 days?
Solution
If we can assume that repairs arrive at random then the number of repairs in a day can be described by a Poisson distribution with m = 2.5.
Hence

P(5 or more) = 1 − (P(0) + P(1) + P(2) + P(3) + P(4))

where P(r) = (2.5)^r e^(−2.5) / r!, so that

P(0) = e^(−2.5)                        = 0.0821
P(1) = 2.5 e^(−2.5)                    = 0.2052
P(2) = (2.5)^2 e^(−2.5) / 2            = 0.2565
P(3) = (2.5)^3 e^(−2.5) / (3 × 2)      = 0.2138
P(4) = (2.5)^4 e^(−2.5) / (4 × 3 × 2)  = 0.1336

Hence

P(0) + P(1) + P(2) + P(3) + P(4) = 0.8912

and so

P(5 or more) = 1 − 0.8912 = 0.1088

Therefore

Frequency of 5 or more in 310 days = 310 × 0.1088 = 33.728

Expected cost = 33.728 × £500 = £16864


Example 2
As above, but the cost penalty is £300 for each item sent for repair which cannot be handled. Therefore a £300 penalty is incurred if 5 items are received, £600 if 6 are received, etc.
Now

P(5) = (2.5)^5 e^(−2.5) / (5 × 4 × 3 × 2) = 0.066801

and so on, so that (with more decimal places now required for accuracy)

Items      5          6          7          8          9          10         11+
Cost (£)   300        600        900        1200       1500       1800       2100
Prob.      0.066801   0.027834   0.009941   0.003106   0.000863   0.000216   0.0000616

The table is simplified, with the probabilities of more than 10 repairs lumped together. If
we work out the probabilities for more than 10 repairs separately, we find that the average
cost per day is
£300 × 0.066801 + ... + £1800 × 0.000216 + £2100 × 0.000049 + £2400 × 0.000010 + ... = £51.2315
so that the average cost per year is
£51.2315 × 310 = £15881.77
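The full expected-cost sum, without lumping the tail, can be sketched as (the £ amounts follow the text):

```python
# Each item beyond the capacity of 4 costs £300; repairs arrive as Po(2.5).
# The infinite tail is truncated once the terms are negligible.
from math import exp, factorial

def po(m, r):
    return m**r * exp(-m) / factorial(r)

m, capacity, cost_per_item, days = 2.5, 4, 300, 310

daily_cost = sum((r - capacity) * cost_per_item * po(m, r) for r in range(5, 40))
print(round(daily_cost, 2), round(daily_cost * days, 2))  # ≈ £51.23/day, ≈ £15882/year
```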
Example 3
A manufacturer of tent canvas knows from experience that, on average, there is one flaw within every 10m² of material produced. He receives an order from the army for 100m² of canvas, to be delivered in five 20m² rolls. Each roll is worth £100 if it is flawless, £80 if it contains between one and three flaws inclusive, and will be rejected by the army if it contains more than three flaws.
Calculate the mean number of rolls of canvas which are rejected and hence the expected
cost of the order to the army.
Solution
Let X denote a random variable describing the number of flaws which occur within one 20m² roll of canvas. Under the assumption that the flaws will occur at random intervals within the material, X has a Poisson distribution. The average number of flaws which are expected to occur in 10m² of canvas is 1, so that the average number of flaws within 20m² of canvas is m = 2. Hence

X ~ Po(2)

Therefore

P(X = 0) = (2^0 / 0!) e^(−2) = 0.1353
P(X = 1) = (2^1 / 1!) e^(−2) = 0.2707
P(X = 2) = (2^2 / 2!) e^(−2) = 0.2707
P(X = 3) = (2^3 / 3!) e^(−2) = 0.1804

Hence the probability that a 20m² roll contains between one and three flaws is given by

P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 2(0.2707) + 0.1804 = 0.7218

Finally, the probability that a 20m² roll contains more than three flaws is therefore

P(X > 3) = 1 − P(X ≤ 3) = 1 − (0.1353 + 0.7218) = 0.1429
Hence we have prices and costs of

£100       with probability 0.1353
£80        with probability 0.7218
rejected   with probability 0.1429

Let k denote the total number of 20m² rolls which, on average, the manufacturer must produce in order to satisfy the contract, where k will include those rolls which will be rejected. Then

(0.1353 + 0.7218) × k = 5

giving

k = 5/0.8571 = 5.8336

Hence, on average, 5.8336 rolls of canvas will need to be produced and 0.8336 rolls will be rejected by the army.

Thus the expected cost of the contract to the army is

(0.1353 × 5.8336 × £100) + (0.7218 × 5.8336 × £80) = £415.78
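The contract arithmetic as a sketch (exact values differ from the text in the last decimal place because the text rounds intermediate steps):

```python
# Flaw counts per 20 m^2 roll follow Po(2); prices are £100 (flawless),
# £80 (1-3 flaws), rejected otherwise.
from math import exp, factorial

def po(m, r):
    return m**r * exp(-m) / factorial(r)

p_flawless = po(2, 0)
p_minor = sum(po(2, r) for r in (1, 2, 3))
p_accept = p_flawless + p_minor

k = 5 / p_accept                                    # rolls produced on average
expected_cost = k * (p_flawless * 100 + p_minor * 80)
print(round(k, 4), round(expected_cost, 2))  # about 5.8335 rolls, about £415.79
```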
Example 4
Experience has shown that 99% of air-burst HE fragmentation rounds will detonate correctly, the remainder detonating early and possibly hazarding our own troops advancing in the open. For a particular engagement, 500 air-burst fragmentation rounds will be fired in support of an infantry dismounted attack.
Calculate the probability that,
1. all 500 rounds will detonate correctly.
2. greater than 495 rounds will detonate correctly.


Solution
The probability that an individual round will detonate correctly is 0.99, hence the probability that an individual round will detonate early, i.e. will fail to detonate correctly, is (1 − 0.99) = 0.01.
The number of rounds fired or, equivalently, the number of independent trials is n = 500.
Let X denote the number of rounds which detonate correctly; then X has a binomial distribution with p = 0.99, q = 0.01 and n = 500. Hence X ~ Bi(500, 0.99).
Alternatively, let Y denote the number of rounds which detonate incorrectly (early). Then Y also has a binomial distribution with p = 0.01, q = 0.99 and n = 500. Since in this case p is small (p < 0.1) and n is large (n > 10), a Poisson approximation to the binomial is appropriate, with Poisson mean
m = n × p = 500 × 0.01 = 5
that is, on average 5 of the 500 rounds will detonate early. Hence Y ~ Bi(500, 0.01) can be approximated by Y ~ Po(5).
Here we give both answers:
1. The probability that all 500 rounds will detonate correctly is clearly the same as the probability that none of the 500 rounds will detonate early. Thus,
Binomial: P(Y = 0) = 0.99^500 = 0.006570
Poisson: P(Y = 0) = 5⁰e⁻⁵/0! = e⁻⁵ = 0.006738
2. The probability that greater than 495 rounds will detonate correctly is the probability that fewer than 5 rounds will detonate early:
P(Y < 5) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)
Hence, using the Binomial
P(Y = 1) = 500 × 0.01 × 0.99^499 = 0.033184
P(Y = 2) = (500 × 499)/2 × 0.01² × 0.99^498 = 0.083631
P(Y = 3) = (500 × 499 × 498)/(3 × 2) × 0.01³ × 0.99^497 = 0.140230
P(Y = 4) = (500 × 499 × 498 × 497)/(4 × 3 × 2) × 0.01⁴ × 0.99^496 = 0.175995

and using the Poisson
P(Y = 1) = 5¹e⁻⁵/1! = 0.033690
P(Y = 2) = 5²e⁻⁵/2! = 0.084224
P(Y = 3) = 5³e⁻⁵/3! = 0.140374
P(Y = 4) = 5⁴e⁻⁵/4! = 0.175467
So, the probability that fewer than 5 rounds will detonate early is
Binomial: P (Y < 5) = 0.439611
Poisson: P (Y < 5) = 0.440493
For larger n the results will be even closer.
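The closeness of the exact binomial answer and its Poisson approximation can be checked numerically; a short sketch using only the standard library (variable names are ours):

```python
from math import comb, exp, factorial

n, p = 500, 0.01  # early-detonation trials
m = n * p         # Poisson mean, m = 5

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return m**k * exp(-m) / factorial(k)

binom_lt5 = sum(binom_pmf(k) for k in range(5))      # exact P(Y < 5)
poisson_lt5 = sum(poisson_pmf(k) for k in range(5))  # Poisson approximation
```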


6 The Normal Distribution

6.1 Application

Binomial and Poisson are discrete distributions - they can only take certain (whole number)
values. They are used to model and describe counts. Many distributions are continuous,
used to model and describe measurements. The most important is the Normal or Gaussian
distribution.
It can be shown mathematically that when a considerable number of independent factors
(more often than not unknown) contribute in either direction to the error (or deviation from
the mean) then a normal distribution will result.
In other words measurements can be well-described (modelled) by the normal distribution if
they can be regarded as a sum of the effects of many independent factors. This is a common
real life situation.
Example
Heights (of people of the same age, sex and nationality) can be modelled by a normal distribution because an individual's height can be thought of as the sum of very many genetic and environmental factors (they are not all independent, but there are so many of them that this has little effect on the validity of the model).
This result, the Central Limit Theorem, in effect makes statistics practicable by allowing
the normal distribution to be used in a huge number of different areas.
A corollary of this is that, even if individual measurements are not drawn from a normal
distribution, a sum or a mean of a large number of them will follow a normal distribution.
Hence tests and confidence limits (see section 1.6) for means can use the normal distribution
in many situations.

6.2

The Form of the Normal Distribution

A continuous measurement can take infinitely many values, so we can't define P(X = r) as in the discrete case.
Instead we define the relative frequency density or probability density function (p.d.f.), often denoted f(x). This is like a smoothed histogram for an infinitely large sample.
In addition the area under the curve f(x) is defined to be unity, so we can view a p.d.f. as one unit of "probability paint" spread over the range of values. However, we omit the formula for the normal p.d.f. since we never actually need it.
The normal distribution has two parameters:
μ = mean
σ² = variance
(hence σ = standard deviation).

[Figure: normal p.d.f.s f(x) plotted over −10 ≤ x ≤ 10 for (mean, variance) = (0, 1), (0, 9), (−2, 4) and (2, 2.25).]

If the random variable X follows a normal distribution with mean μ and variance σ² then we write
X ~ N(μ, σ²)
The normal p.d.f. is the so-called "bell curve". Note that it extends from −∞ to ∞, although the probabilities are very low beyond μ ± 3σ.
The peak (i.e. the mode) is at the mean, where x = μ, and the distribution is symmetric about the mean, so that this is also the median.

6.3 Calculating Probabilities from the Normal

We often require the probability that an observation from the normal random variable X lies between two specified values a₁, a₂, assuming that the parameters μ, σ² are known. This is just the area under the curve between a₁ and a₂, since the total area under the curve is unity.
Hence the area under the curve between a₁ and a₂ gives
P(a₁ ≤ X ≤ a₂)
There is no closed-form solution to this, so we need to use statistical tables.

[Figure: standard normal p.d.f.; the shaded area under the curve between −1 and 2 gives P(−1 < X < 2) for N(0, 1).]

Tables give answers only for the standard normal, with mean zero and variance 1. A standard normal random variable is often denoted Z, so that Z ~ N(0, 1).
Probabilities for the standard normal are given in section 15.1, table 1. These give Q(z), which is the area between 0 and z, in terms of z. Hence this gives the probability that an observation from a normal distribution with mean 0 and variance 1 lies between 0 and z.
Note: We are using the standard notation that capital letters X, Z represent random variables while lower-case x, z, a₁, a₂ represent numbers. Often an observation from X is called x, an observation from Z is called z, and so on.


Probabilities for the standard normal distribution
The tables give
Q(z) = P(0 ≤ Z ≤ z)
for the random variable
Z ~ N(0, 1)
and the number
z ≥ 0
If the probability required is not of the form P(0 ≤ Z ≤ z) for positive z then we use symmetry and the fact that the total probability is 1. Symmetry means that the area between −z and 0 is also Q(z), while the total probability means that the area to the right of z must be 0.5 − Q(z).
Hence
P(−z ≤ Z ≤ 0) = P(0 ≤ Z ≤ z) = Q(z)
P(Z ≥ z) = 0.5 − P(0 ≤ Z ≤ z) = 0.5 − Q(z)
P(Z ≤ −z) = P(Z ≥ z) = 0.5 − Q(z)

It is easily seen from a diagram that to obtain the probability of the variable lying between
two values of z, the relevant values of Q should be added if the values of z straddle the
mean 0, or subtracted if they fall on the same side.
Examples
If Z ~ N(0, 1) then
P(Z ≥ 1.5) = 0.5 − 0.4332 = 0.0668
and from this we can immediately say that
P(Z ≤ 1.5) = 1 − 0.0668 = 0.9332
or
P(Z ≤ 1.5) = 0.5 + 0.4332 = 0.9332
By symmetry we can also say that
P(Z ≤ −1.5) = P(Z ≥ 1.5) = 0.0668
P(Z ≥ −1.5) = P(Z ≤ 1.5) = 0.9332
Hence other examples are
P(Z ≤ 0.89) = 0.5 + 0.3133 = 0.8133
P(Z ≥ 2.43) = 0.5 − 0.4925 = 0.0075
P(Z ≤ 0.1) = 0.5 + 0.0398 = 0.5398
P(Z ≥ 1.77) = 0.5 − 0.4616 = 0.0384

[Figure: standard normal p.d.f. with the areas under the curve marked: Q(z) between 0 and ±z, and 0.5 − Q(z) in each tail.]

Probabilities of being within an interval work similarly:
P(0 ≤ Z ≤ 1.5) = 0.4332
P(−1.5 ≤ Z ≤ 1.5) = 0.4332 + 0.4332 = 0.8664
P(−0.5 ≤ Z ≤ 1.5) = 0.1915 + 0.4332 = 0.6247
P(−1.5 ≤ Z ≤ 0.5) = 0.4332 + 0.1915 = 0.6247
P(0.5 ≤ Z ≤ 1.5) = 0.4332 − 0.1915 = 0.2417
P(−1.5 ≤ Z ≤ −0.5) = 0.4332 − 0.1915 = 0.2417
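In place of tables, Q(z) can be computed from the error function available in any standard library; a minimal sketch (the helper names are ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f., P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def Q(z):
    """Table value Q(z) = P(0 <= Z <= z) for z >= 0."""
    return phi(z) - 0.5

# e.g. P(-1.5 <= Z <= 1.5) = Q(1.5) + Q(1.5)
p_interval = phi(1.5) - phi(-1.5)
```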

Probabilities for any normal distribution
If
X ~ N(μ, σ²)
then
Z = (X − μ)/σ ~ N(0, 1)
Hence we first convert a question about X into one about Z by making the above substitution.

If X ~ N(μ, σ²) and x ≥ μ then
P(μ ≤ X ≤ x) = P(0 ≤ (X − μ)/σ ≤ (x − μ)/σ)
= P(0 ≤ Z ≤ (x − μ)/σ)   where Z ~ N(0, 1)
= Q((x − μ)/σ)
= Q(z)   for z = (x − μ)/σ
Hence any question about a value x from a N(μ, σ²) distribution can be converted to one about z = (x − μ)/σ from a N(0, 1) distribution. Note that z is then the number of standard deviations that x lies away from its mean.


Example
If X ~ N(20, 5²) then
P(X ≥ 22) = P((X − 20)/5 ≥ (22 − 20)/5)
= P(Z ≥ 0.4)
= 0.5 − 0.1554
= 0.3446
P(13.2 ≤ X ≤ 21.5) = P((13.2 − 20)/5 ≤ (X − 20)/5 ≤ (21.5 − 20)/5)
= P(−1.36 ≤ Z ≤ 0.3)
= 0.4131 + 0.1179
= 0.5310
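This standardisation step can be wrapped in a small function; a sketch under our own naming:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_upper = 1.0 - normal_cdf(22, 20, 5)                          # P(X >= 22)
p_between = normal_cdf(21.5, 20, 5) - normal_cdf(13.2, 20, 5)  # P(13.2 <= X <= 21.5)
```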
Two Important Cases
From tables:
Q(1.960) = 0.475
so
2 × Q(1.960) = 0.95
Hence 95% of the total probability is within 1.96 standard deviations of the mean.
Similarly
Q(1.645) = 0.45, so 0.5 + Q(1.645) = 0.95
Hence 95% of the total probability is below 1.645 standard deviations above the mean.


[Figures: standard normal p.d.f.s showing 95% of the probability between −1.96 and 1.96, with 2.5% in each tail, and 95% of the probability below 1.645, with 5% in the upper tail.]

6.4 Sums and Differences of Normal Random Variables

Some results needed later on.


(1) If the random variables X₁ and X₂ are independent with
X₁ ~ N(μ₁, σ₁²)
X₂ ~ N(μ₂, σ₂²)
then
X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²)
X₁ − X₂ ~ N(μ₁ − μ₂, σ₁² + σ₂²)
Note that in both cases the variances are added.
(2) Similar results apply to sums and differences of any number of independent normal random variables.
In particular, if X₁, ..., Xₙ are all independently N(μ, σ²) then
X₁ + ... + Xₙ ~ N(μ + ... + μ, σ² + ... + σ²)
i.e.
~ N(nμ, nσ²)
(3) A useful result is that if
Y ~ N(m, v)
then for any known constant c
cY ~ N(cm, c²v)
(4) Hence, using both (2) and (3),
X̄ = (1/n)(X₁ + ... + Xₙ) ~ N((1/n) × nμ, (1/n²) × nσ²) = N(μ, σ²/n)

6.5 Distribution of the Sample Mean

If we have a sample of n independent observations from the N(μ, σ²) distribution, then the sample mean x̄ is also an observation from a random variable, denoted X̄.
From section 6.4 point (4):
X̄ ~ N(μ, σ²/n)
so that the sample mean X̄ is normally distributed with:
mean μ
variance σ²/n
The standard deviation of X̄ is therefore
√(σ²/n) = σ/√n
which is often called the standard error of the mean.


By the Central Limit Theorem (section 6.1), if n is large (≥ 30, say) then the above result is approximately true even if the individual observations are not from a normal distribution.
Example 1
Suppose we have 50 observations from a distribution with mean 3.5 and variance 20. If the underlying distribution is normal then
X̄ ~ N(3.5, 0.4)
Even if the underlying distribution is not normal, this is still approximately true:
X̄ ≈ N(3.5, 0.4)
Example 2
Suppose that the time for a customer in a queue to be served is normally distributed with mean ten minutes and standard deviation two minutes, so that X ~ N(10, 2²). Hence, for example,
P(X > 10.2) = P(Z > (10.2 − 10)/2) = P(Z > 0.1) = 0.5 − 0.0398 = 0.4602
If there are 40 customers then
X̄ ~ N(10, 2²/40) = N(10, 0.1)
so that
P(X̄ > 10.2) = P(Z > (10.2 − 10)/√0.1) = P(Z > 0.63) = 0.5 − 0.2357 = 0.2643

Hence the chance of an individual customer taking longer than 10.2 minutes is 46%, but the
chance of the mean time of all 40 customers exceeding 10.2 minutes is only 26.4%.
Furthermore, if we can't be sure that the distribution of individual times really follows a
normal distribution then the probability for an individual customer will almost certainly be
wrong, but the probability for the mean will still be approximately correct.
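The contrast between the individual customer and the mean of 40 customers can be computed directly from the standard error; a sketch using the error function (helper names are ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 10.0, 2.0, 40
p_single = 1.0 - phi((10.2 - mu) / sigma)            # X ~ N(10, 2^2)
p_mean = 1.0 - phi((10.2 - mu) / (sigma / sqrt(n)))  # Xbar ~ N(10, 2^2/40)
```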
Example 3
Here we demonstrate (pictures overleaf) the above formula using simulated data from a
normal and a non-normal distribution.
In each case the first histogram shows 1000 observations from the given distribution, so that
the histogram is approximately the same shape as the p.d.f.
The subsequent histograms show what happens when you have 1000 sample means rather
than 1000 individual observations. For example, in the pictures with n = 10, 1000 samples,
each of 10 observations, have been taken. The histogram is then formed from these 1000
sample means.
Hence when in practice we have a single sample of 10 observations, this is in effect a single
value from the histogram shown for n = 10.
Reminder: A sample is a set of observations, not a single observation.
Summary
1. If we have n independent observations x₁, ..., xₙ from a normally distributed population, then
x₁, ..., xₙ are n values from X ~ N(μ, σ²)
If we calculate the observed sample mean x̄ then
x̄ is one value from X̄ ~ N(μ, σ²/n)
2. If we similarly have n independent observations from a population whose distributional form is unknown, then
x₁, ..., xₙ are n values from X ~ (mean μ, variance σ²)
If we calculate the observed sample mean x̄ then, provided n is large (say n ≥ 30),
x̄ is one value from X̄ ≈ N(μ, σ²/n)
where here "≈" means "is approximately distributed as".


The histograms illustrate the distribution of X̄ for n = 1, 2, 5, 10, where X ~ N(100, 10). Each picture is a histogram of 1000 (simulated) observed values of x̄.
Note how the variance of the sample mean decreases as the sample size increases.

[Figure: histograms of 1000 simulated values of x̄ from the normal distribution with mean 100, variance 10, for n = 1, 2, 5, 10.]

The histograms illustrate the distribution of X̄ for n = 1, 3, 10, 30, where X has an exponential distribution with mean 5 (and variance 25). Each picture is a histogram of 1000 (simulated) observed values of x̄.
As before, the spread of values, and hence the variance of the sample mean X̄, is clearly lower for larger n. The histograms of the sample mean also become more symmetrical as n increases. In fact, the distribution of X̄ does not just become symmetrical as n increases, it becomes normal.


[Figure: histograms of 1000 simulated values of x̄ from the exponential distribution with mean 5, variance 25, for n = 1, 3, 10, 30.]
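The simulation behind these histograms is easy to reproduce; a sketch using only the standard library (the function name and seed are ours):

```python
import random
import statistics

random.seed(1)  # reproducible runs

def sample_means(n, reps=1000, mean=5.0):
    """reps sample means, each of n draws from an exponential with the given mean."""
    return [statistics.fmean(random.expovariate(1.0 / mean) for _ in range(n))
            for _ in range(reps)]

# As n grows, the means stay centred on 5 but their variance shrinks like 25/n.
for n in (1, 3, 10, 30):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 2), round(statistics.variance(means), 2))
```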

6.6 Normal Approximations

Binomial
If the number n of trials in the binomial is large then the calculation of the probability of at
least r successes may involve the summation of a large number of terms, and the factorial
terms can be very cumbersome.
However, as n becomes large, the binomial distribution tends towards the normal, unless
the value of p is very close to 0 or 1, and a much quicker (though approximate) answer can
be obtained from tables of the normal distribution.
If X ~ Bi(n, p) then, from section 4.4,
μ = np
σ² = np(1 − p)
For n large and p not too large or small, a good approximation is
X ≈ N(np, np(1 − p))
Rule of thumb: the approximation is reasonable if
1/(n + 1) < p < n/(n + 1)
and
np(1 − p) ≥ 10
Note the difference between this and the Poisson approximation to the binomial. The normal
is a good approximation when n is large and p is neither large nor small, while the Poisson
is a good approximation when n is large and p is small.
Poisson
If X ~ Po(m) then, from section 5.4,
μ = m
σ² = m
For large m, a good approximation is
X ≈ N(m, m)
Rule of thumb: OK if m > 40.
Neither of these approximations should be used outside the μ ± 3σ limits.


Example
An unbiased coin is tossed 100 times. Find the probability of getting 60 or more heads.
The number of heads observed will follow a binomial distribution with n = 100 and p = 1/2 (and q = 1/2):
X ~ Bi(100, 0.5)
and we require
P(X ≥ 60).
Using the normal approximation:
μ = np = 100 × 0.5 = 50
σ² = np(1 − p) = 50 × 0.5 = 25
(NB: this clearly satisfies the rules of thumb), so
X ≈ N(50, 25)
However, we are converting discrete to continuous. Hence the probability piled up at x = 60 is being spread out over the range 59.5–60.5, so we want to find P(X ≥ 59.5) in order to include all of this probability. This is a continuity correction. Hence
P(X ≥ 60)   (binomial)
= P(X ≥ 59.5)   (normal)
= P((X − 50)/5 ≥ (59.5 − 50)/5)
= P(Z ≥ 1.9)
= 0.5 − Q(1.9)
= 0.5 − 0.4713
= 0.0287
NB: the exact answer is 0.02845.
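Both the exact binomial tail and the continuity-corrected normal approximation can be computed in a few lines (variable names are illustrative):

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
# Exact binomial tail P(X >= 60)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(60, n + 1))

# Normal approximation N(np, np(1-p)) with continuity correction at 59.5
mu, var = n * p, n * p * (1 - p)
z = (59.5 - mu) / sqrt(var)
approx = 0.5 * (1.0 - erf(z / sqrt(2.0)))  # P(Z >= 1.9)
```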


6.7 Further Examples

Example 1
It is known from past experience that the lifetime of a front tyre on a staff car follows a
normal distribution with a mean lifetime of 25000 miles and a standard deviation of 3000
miles, while the lifetime of a rear tyre follows a normal distribution with a mean lifetime of
32000 miles and a standard deviation of 4000 miles.
Calculate the probability that a staff car, selected at random, is still running on the original
tyres after 30000 miles of use.
Solution
Let the random variable X denote the lifetime of a front tyre, then
X ~ N(25000, 3000²)
Similarly, let the random variable Y denote the lifetime of a rear tyre, so that
Y ~ N(32000, 4000²)
For a front tyre,
P(X > 30000) = P((X − 25000)/3000 > (30000 − 25000)/3000)
= P(Z > 1.667)   where Z ~ N(0, 1)
= 0.5 − Q(1.667)
= 0.5 − 0.4525
= 0.0475
and, for a rear tyre,
P(Y > 30000) = P((Y − 32000)/4000 > (30000 − 32000)/4000)
= P(Z > −0.500)   where Z ~ N(0, 1)
= 0.5 + Q(0.500)
= 0.5 + 0.1915
= 0.6915

Hence, assuming that the tyres' lifetimes are independent, the probability that the car is still running on the original tyres after 30000 miles' use is the probability that two front tyres and two rear tyres all survive 30000 miles' use:
(0.0475)² × (0.6915)² = 0.001079
However, the independence assumption seems rather dubious (given that the tyres will all have been on exactly the same roads), so this is only approximate.
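Under the same (dubious) independence assumption, the calculation can be sketched as follows (helper names are ours):

```python
from math import erf, sqrt

def survival(x, mu, sigma):
    """P(lifetime > x) for a N(mu, sigma^2) lifetime."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

p_front = survival(30000, 25000, 3000)  # one front tyre lasts 30000 miles
p_rear = survival(30000, 32000, 4000)   # one rear tyre lasts 30000 miles
p_all_four = p_front**2 * p_rear**2     # independence assumed
```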


Example 2
A mortar is deployed against a long convoy which is travelling along a road of width 20
metres, perpendicular to the line of fire and at a range of 3 km (measured to the centre of
the road).
At this range, it is known that the standard deviation of the fall of shot in range is 40
metres; the fall of shot having a normal distribution with a mean point of impact of 3000
metres.
1. Calculate the probability of a hit on the road by the first round fired, assuming that
the mortar has been bedded-in and adjusted.
2. The first round is observed to land 50 metres short of the road centre and the mortar
crew adjust the range setting to add 50 metres. If the initial setting was correct,
calculate the probability of:
(a) the first round falling short of the road centre by 50 metres or more
(b) the second round hitting the road after the adjustment has been made.
Solution
1. Let the random variable X₁ denote the point of impact of the first round fired from the mortar; then X₁ ~ N(3000, 40²).
P(2990 ≤ X₁ ≤ 3010) = P((2990 − 3000)/40 ≤ (X₁ − 3000)/40 ≤ (3010 − 3000)/40)
= P(−0.25 ≤ Z ≤ 0.25)   where Z ~ N(0, 1)
= 2 Q(0.25)   by symmetry of N(0, 1)
= 2 × 0.0987
= 0.1974

2(a). Defining the random variable X₁ as in (1),
P(X₁ ≤ 2950) = P((X₁ − 3000)/40 ≤ (2950 − 3000)/40)
= P(Z ≤ −1.25)   where Z ~ N(0, 1)
= 0.5 − Q(1.25)   by symmetry of N(0, 1)
= 0.5 − 0.3944
= 0.1056


(b). Let the random variable X₂ denote the range of the second round fired after the adjustment has been made; then X₂ ~ N(3050, 40²), assuming that the standard deviation remains unchanged.
Then,
P(2990 ≤ X₂ ≤ 3010) = P((2990 − 3050)/40 ≤ (X₂ − 3050)/40 ≤ (3010 − 3050)/40)
= P(−1.5 ≤ Z ≤ −1.0)   where Z ~ N(0, 1)
= Q(1.5) − Q(1.0)   by symmetry of N(0, 1)
= 0.4332 − 0.3413
= 0.0919

Comment:
This demonstrates the perils of trying to correct for something without being sure that there
is anything wrong. By incorrectly concluding that the point of aim was wrong, even though
the observed fall of shot was not particularly unlikely, the chance of hitting the road has
been halved.
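The halving of the hit probability is easy to verify numerically; a sketch (function names are ours):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_hit_road(mean, sd=40.0, lo=2990.0, hi=3010.0):
    """P(lo <= X <= hi) for fall of shot X ~ N(mean, sd^2)."""
    return phi((hi - mean) / sd) - phi((lo - mean) / sd)

p_before = p_hit_road(3000.0)  # correctly set mortar
p_after = p_hit_road(3050.0)   # after the unnecessary +50 m adjustment
```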


7 Sampling and Sampling Distributions

7.1 Introductory Example

It is believed (hypothesised) that the fall of shot from a gun for a particular setup is normally distributed about a mean range of 10000 m with a standard deviation of 100 m, so that the fall of shot is described by the random variable X where
X ~ N(10000, 100²)
We wish to test this belief (hypothesis) by firing 1 round. The result is 9750 m.
Now if our hypothesis is correct then the chance of the shell falling this far short, or even further short, is
P(X ≤ 9750) = P(Z ≤ (9750 − 10000)/100)
= P(Z ≤ −2.5)
= 0.5 − Q(2.5)
= 0.0062

So if our belief were correct then the chance of falling at least this far short is 0.62%. This
suggests that our belief is probably wrong!
It could be wrong in any or all of 3 ways:
1. The mean is not 10000m
2. The standard deviation is not 100m
3. The distribution is not normal
This used a sample of size 1 to test a hypothesis about a population (i.e. fall of shot of all
rounds from the gun).
Clearly it is better to fire several rounds! We could then compare the sample mean x̄ to the hypothesised population mean, often denoted μ₀ (in this case 10000 m). This is common sense, but there are also good theoretical reasons for taking as large a sample as possible, as we shall discuss below.
Note the important distinction between μ, the unknown true population mean, and μ₀, the hypothesised population mean.


7.2 Some Definitions

1. Population
Totality of all possible readings when a quantity is repeatedly measured or counted.
E.g. weights of all possible shells from a production line.
2. Sample
A set of measurements taken from this population. Often denoted x1 , . . . , xn .
E.g. the weights of 20 shells.
3. Random sample
A sample where each individual measurement in the population is equally likely to be
picked, so that the sample is a fair representation of the population.
E.g. don't pick 20 successive shells, pick from a whole day's production.
4. Population parameter
A fixed (but usually unknown) numerical characteristic of a population.
E.g. μ, σ²; m; n, p.
E.g. true (overall) mean weight of shells; true variance of weight of shells.
5. Sample statistic
A numerical characteristic of a sample, often used to estimate the corresponding population parameter. These vary from sample to sample, and so are (observations from)
random variables.
E.g. mean x̄ and variance S² of the sample of 20 shells.
6. Sampling distribution
The distribution of a sample statistic.
E.g. distributions of the mean and the variance of samples of 20 shells.
7. Mean of a population and expected value of a random variable
The mean of a random variable X is also called its expected value and denoted E(X).
Hence when we use a random variable to describe a population, this is the population
mean.
E.g. if X ~ N(μ, σ²) then E(X) = μ.
8. Variance of a random variable and a population
Similarly, the variance of a random variable, and hence of the population it is describing, is denoted Var(X).
E.g. if X ~ N(μ, σ²) then Var(X) = σ².

7.3 Distribution of the Sample Mean

Previously we used a sample of size 1 to test a hypothesis. Now we consider a sample of size n. We find the distribution of the random variable X̄, the sample mean, assuming the hypothesis is correct. Then we compare the observed sample mean x̄ to this distribution.
The (sampling) distribution of the sample mean is the relative frequency distribution of an infinitely large number of sample means drawn from the same population. From section 6.4 we have that, for n observations from a normally distributed population with mean μ, variance σ², the distribution of the sample mean X̄ is
X̄ ~ N(μ, σ²/n)
so that, equivalently,
(X̄ − μ)/(σ/√n) ~ N(0, 1)
In addition, thanks to the Central Limit Theorem (section 6.1), even if the observations are not from a normal distribution, the sample mean is (approximately) normally distributed for large n (say, n ≥ 30).
Example
Having identified the probability distribution of sample means, we can now return to our original experimental sample and see where its mean, x̄, lies in relation to this distribution.
If, in the example, we observe x̄ = 9950 m from n = 25 observations, then if the hypothesis is true, 9950 m is an observation from
X̄ ~ N(10000, 100²/25) = N(10000, 20²)
Hence the chance of the sample mean falling at least this far short is
P(X̄ ≤ 9950) = P(Z ≤ (9950 − 10000)/20)
= P(Z ≤ −2.5)
= 0.0062 (again)


Hence a sample mean of 9950m or less is just as (un)likely as a single observation of 9750m
or less.
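Both tail probabilities come from the same standardisation; a sketch (variable names are ours):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu0, sigma = 10000.0, 100.0
p_single_round = phi((9750.0 - mu0) / sigma)           # P(X <= 9750), one round
n, xbar = 25, 9950.0
p_sample_mean = phi((xbar - mu0) / (sigma / sqrt(n)))  # P(Xbar <= 9950), n = 25
```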
Why large samples?
The properties of the distribution of sample means listed above give us several very good reasons for trying to choose as large a sample size as possible. These are:
1. Larger samples are more likely to reflect the population accurately. Unusual observations are averaged out, so that the sample mean x̄ will tend to be closer to the true population mean μ. Since Var(X̄) = σ²/n, this means X̄ varies less about μ as n gets larger.
2. Hence also if the true population mean is not what we think, this is easier to spot if n is large and Var(X̄) is small.
3. For non-normal data the distribution of X̄ gets more normal-like as n increases. Hence we can use normal tables.

4. Just as the sample mean x̄ is likely to be closer to the true population mean μ if n is large, similarly the sample variance S² is likely to be closer to the true population variance σ² if n is large.
Important note: Up until section 10.1 we will assume
1. Either the distribution from which the observations are drawn is known to be normal, or the sample size is large (n ≥ 30) so that the distribution of X̄ is (approximately) normal anyway.
2. Either the variance σ² is known, or the sample size is large (n ≥ 30) so that S² is sufficiently close to σ² that we need not worry about the difference.
We relax these assumptions from section 10.1 onwards.

7.4 Example: Process Control

This is an example of cases where statistical testing is vital in order to avoid making changes
when they may be, not just unnecessary, but in fact counterproductive. Experience in many
industries has shown that, if we modify a process without strong evidence that something
is wrong, we usually make things worse because we are compensating for problems which
did not actually exist.
A production line makes components which should be 10.40 cm long. Hence if the components being produced seem to have a mean length which is not 10.40 cm then this needs to
be detected, so the machines can be reset to correct this.
A random sample of n = 30 components is taken from a week's production, giving a sample mean of x̄ = 10.51 cm and sample standard deviation S = 0.64 cm. Is the production line "in control", i.e. is it making components of the right mean length?
If the production line is in control then the true mean is 10.40, so our initial hypothesis is that μ = 10.40. If this hypothesis is correct then we have 30 observations from X ~ N(10.40, σ²), so that the sample mean x̄ = 10.51 is a single value drawn from the random variable
X̄ ~ N(10.40, σ²/30)
This assumes that the individual observations come from a normal distribution, which is believable, but the sample size is 30, so that X̄ will be at least approximately normal anyway. Similarly, we don't know the value of σ² and hence have to estimate it with S² = 0.64², which is reasonable again because n ≥ 30. Hence if the production line is in control then x̄ = 10.51 is a single observation from
X̄ ~ N(10.40, 0.64²/30)

and so, equivalently,
z_obs = (10.51 − 10.40)/(0.64/√30) = 0.94
is an observation from (approximately) Z ~ N(0, 1). Now,
P(Z > z_obs) = P(Z > 0.94) = 0.5 − 0.3264 = 0.1736
Hence, if the true mean length really is 10.40 cm then the chance of an observed sample
mean length of 10.51 cm or more is 17.36%. This is quite likely, suggesting that, even if the
production process is in control, the observed excess of 0.11 cm over the target length is not
too surprising.
a) Hence, although it might seem at first that the components are oversized, there is in fact
no evidence that this is the case, and the observed excess of 0.11 cm could easily be due to
chance variation. If the value of 0.11 cm had been taken at face value then the machines
would have been reset to make slightly shorter components, thus correcting for a problem
which probably did not exist, and hence in fact causing a problem.
b) It can be argued that, since the machines need resetting if they are making components
which are either too large or too small, then a sample mean length of 0.11 cm less than
the target value of 10.40 cm is in a sense equivalent to the observed value of 0.11 cm more.
Hence our measure of how surprising the results are should be
P(Z > 0.94) + P(Z < −0.94) = 2 × 0.1736 = 0.3472
Hence the case for the process being out of control looks even less convincing.
c) Although there is no evidence that the mean length is wrong, the standard deviation is
rather large, suggesting that it is in fact the variability in component length which needs
correcting. We return to this in section 11.1.
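The one-sided and two-sided calculations for this process-control check can be sketched in a few lines (variable names are ours):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu0, xbar, s, n = 10.40, 10.51, 0.64, 30
z_obs = (xbar - mu0) / (s / sqrt(n))
p_one_sided = 1.0 - phi(z_obs)   # P(Z > z_obs)
p_two_sided = 2.0 * p_one_sided  # deviations in either direction count
```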

7.5 Two Sample Case

Often we wish to compare two samples, to see if it is reasonable to believe that they came
from populations with the same mean.
For example, we may have a number, nA , of measurements of fuel consumption for scout
cars of type A and a number, nB , of measurements of scout cars of type B. We would like to
have some idea if, overall, the two types of scout car have similar fuel consumptions. To do
this we must consider the difference of the two sample means and determine the probability
of getting a difference as large or larger than this observed difference, if the samples both
come from populations with the same mean values.
To compare two sample means, in order to decide whether the respective population means are equal, we need the distribution of X̄_A − X̄_B. If the true population means are μ_A and μ_B and the true population variances σ_A² and σ_B², then we can use the results in section 6.4 to show that the distribution of X̄_A − X̄_B has the properties:
The mean is
E(X̄_A − X̄_B) = μ_A − μ_B

The variance is
Var(X̄_A − X̄_B) = σ_A²/n_A + σ_B²/n_B
so that the standard deviation is
√(σ_A²/n_A + σ_B²/n_B)
Hence
X̄_A − X̄_B ~ N(μ_A − μ_B, σ_A²/n_A + σ_B²/n_B)
so that, equivalently,
(X̄_A − X̄_B − (μ_A − μ_B)) / √(σ_A²/n_A + σ_B²/n_B) ~ N(0, 1)
As with the one-sample case, this is exactly true if both samples are drawn from normally distributed populations, while it is approximately true if this is not so but the sample sizes are large, i.e. n_A and n_B are both at least 30.
Example
Returning to the original example, suppose we now have two guns and wish to see whether their mean ranges are the same. Gun A gives x̄_A = 9950 m from n_A = 25 observations and gun B gives x̄_B = 10025 m from n_B = 20 observations. We continue to assume that the observed ranges are independent observations from normal distributions with standard deviation 100 m.
If the true population means are the same then
X̄_A − X̄_B ~ N(0, 100²/25 + 100²/20)
so that
z_obs = ((9950 − 10025) − 0) / √(100²/25 + 100²/20) = −75/30 = −2.5
is an observation from N(0, 1). From tables
P(Z ≤ −2.5) = 0.0062
So if the mean ranges are really the same, the chance of getting what we did (or something even less likely) is only 0.62%. This is very unlikely, suggesting that in fact the means are not the same. That for gun B appears to be longer.
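The two-sample z statistic can be checked in a few lines (variable names are ours):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

xbar_a, n_a = 9950.0, 25
xbar_b, n_b = 10025.0, 20
sigma = 100.0
se_diff = sqrt(sigma**2 / n_a + sigma**2 / n_b)  # std. dev. of Xbar_A - Xbar_B
z_obs = ((xbar_a - xbar_b) - 0.0) / se_diff      # hypothesised difference is 0
p = phi(z_obs)                                   # P(Z <= z_obs)
```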


7.6 Further Examples

Example
Extensive firings of a certain type of rocket have established a (population) mean range of 2150 m. A sample of 40 rockets fired after a year's storage gave a sample mean range of 2084 m with a sample standard deviation of 160 m. We wish to see if storage has decreased the mean range.
The hypothesis is that
μ = 2150 m
and we have observed
x̄ = 2084 m
S = 160 m
from n = 40 observations.
If the hypothesis is true then the observed x̄ = 2084 m is an observation from X̄, where
E(X̄) = 2150 m
Var(X̄) = 160²/40, so the standard deviation is 160/√40 = 25.3 m
i.e. if the hypothesis is true then x̄ = 2084 is an observation from
X̄ ~ N(2150, 25.3²)
Hence
z_obs = (2084 − 2150)/25.3 = −2.61
is an observation from Z ~ N(0, 1).


Note that the above is only approximately true. We are using the fact that n ≥ 30 to justify assuming that X̄ is approximately normally distributed and that the unknown population variance can be estimated reasonably well by the sample variance S² = 160².
From tables
P(Z ≤ −2.61) = 0.5 − Q(2.61) = 0.0045
So if the mean range is really 2150 m, the chance of getting what we did (or something even less likely) is only 0.45%.
This is such an unlikely event that we would take it as evidence that the mean range of the rockets has decreased and is no longer 2150 m. We might be tempted to go further and suggest this change in range is due to storage, but of course it could also be due to other factors.


8 Significance Tests

8.1 Purpose

Single sample case


Significance tests are used to decide whether it is reasonable to believe that a sample could
have come from a population with a specified value of a particular parameter (e.g. the
mean). The verdict is based upon the nearness, or otherwise, of the observed value of the
estimator of the parameter to the specified parameter value.
Two sample case
Similarly, tests are used to decide whether it is reasonable to believe that two samples come
from populations with the same value of a particular parameter (e.g. the mean).
This section covers significance tests for the means of populations when those populations
are normally distributed with known variance. In such cases the tests are exactly correct.
However, even if the population distribution and/or variance is not known these tests are
approximately correct if the sample size is large, thanks to the Central Limit Theorem
(section 7.3).
The methodology developed here can be carried over to small samples, and to an examination of other parameters besides the mean, simply by considering the relevant sampling
distribution (see section 10.1 onwards).
In the examples given in the previous section we have already used samples to form verdicts about the populations from which they were taken, i.e. we have already performed
significance tests. The stages of using a significance test to reach a verdict can be stated
explicitly as below.

8.2 Method

This is described in terms of the one-sample case, but the same ideas apply to two samples.
1. Question: Does the sample come from a population with a specified parameter value?
The value is usually given a 0 subscript, e.g. μ₀.
2. Assume that it does. This is the null hypothesis, H0 .
3. Use data to estimate the parameter. This observed value is an estimate. The estimate
is then regarded as an observation from a random variable, the estimator.
4. Use H0 to find the sampling distribution of the estimator. Hence find the chance of
observing what we did, or something even more extreme. This probability is often
called the p-value.
5. If this is small, either
(a) our sample was very unusual (atypical of its population)
or

(b) the null hypothesis is false.


6. The significance level is the probability, specified beforehand, such that we reject
H0 if the p-value falls below it. This is also called the size of the test, and denoted α. It is usually taken to be 0.05 (5%) or 0.01 (1%).
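The six stages above can be sketched as a short function. This is an illustrative sketch (the function name and the use of Python's standard-library `statistics.NormalDist` are my choices, not from the text), checked here against the section 7.4 component example (x̄ = 10.51, S = 0.64, n = 30, μ₀ = 10.40):

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0, following steps 1-6 above."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))          # step 3: the observed statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))  # step 4: this result or worse
    return z_obs, p_value, p_value < alpha            # step 6: reject H0 if p < alpha

# The component example of section 7.4: p is well above 0.05, so H0 is not rejected.
z, p, reject = one_sample_z_test(xbar=10.51, mu0=10.40, sigma=0.64, n=30)
```

Here `sigma` would be the sample standard deviation S when, as in that example, the population value is unknown but n is large.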

8.3 Two Tailed and One Tailed Tests

The null hypothesis H0 must be something which it makes sense to believe in the absence of evidence to the contrary. This is usually something like "no change" or "no difference", and can be thought of as a presumption of "innocent until proven guilty".
The alternative hypothesis H1 is what we believe if we reject H0. Normally this is just the negation of H0, "there is a change/difference", but sometimes we may wish to specify the direction of change.
The most common case is to test
H0 : No change v H1 : Change
Hence one-sample tests are often of the form
H0 : μ = μ₀ versus H1 : μ ≠ μ₀
where μ₀ is the hypothesized mean. We reject H0 if x̄ is much larger or much smaller than μ₀, in other words if zobs is either large positive or large negative.
This is a two-tailed test.
Exactly the same applies to two-sample tests, except that the hypotheses are
H0 : μA = μB versus H1 : μA ≠ μB
and we compare x̄A − x̄B to zero.
However, tests can also be of the form
H0 : No improvement
or H0 : No deterioration

v H1 : Improvement
v H1 : Deterioration

This leads to hypotheses of the form


H0 : 0
versus H1 : > 0

(or A B )
(or A > B )

In this case, we only reject H0 if the observed value x̄ is larger than μ₀ (or x̄A − x̄B is larger than zero), i.e. if zobs is large positive. Note that, therefore, if zobs is negative then we need go no further since H0 will definitely not be rejected.

This is a one-tailed test. Note that, for the purposes of executing the test, if we have H1 : μ > μ₀ then a null hypothesis of H0 : μ ≤ μ₀ is in effect the same as one of H0 : μ = μ₀.
Similarly, for
H0 : μ ≥ μ₀ (or μA ≥ μB)
versus H1 : μ < μ₀ (or μA < μB)

we only reject H0 if the observed value x̄ is smaller than μ₀ (or x̄A − x̄B is smaller than zero), i.e. if zobs is large negative. Therefore, if zobs is positive then we need go no further since H0 will definitely not be rejected.
In the one-tailed case with H1 : μ > μ₀ the p-value is just the probability of obtaining a value greater than or equal to zobs from a standard normal random variable Z.
In the one-tailed case with H1 : μ < μ₀ it is similarly the probability of obtaining a value less than or equal to zobs from a standard normal random variable Z.
However, in the two-tailed case we must allow for the fact that if we obtain a positive
zobs then the probability of obtaining the value we did or something even more extreme
includes both values greater than zobs and values less than zobs . Hence, by symmetry, the
p-value is twice the probability of obtaining a value greater than or equal to zobs from a
standard normal random variable Z. (Similarly if zobs is negative it is twice the probability
of obtaining a value less than or equal to zobs ). This was done in the example in section 7.4.
The initial example in sections 7.1 and 7.3 is two-tailed since we would have rejected the null
hypothesis if the apparent range had been greater than 1000m as well as less than 1000m.
Hence the p-value should be doubled from 0.0062 to 0.0124, though this does not change
our conclusions as it is still very small. In the example in section 7.6 we would only reject
H0 if the mean range seemed to be smaller than the hypothesised value of 2150m, making
this a one-tailed test. Hence the p-value need not be doubled.
Important Note: The hypotheses, and hence whether the test is one or two tailed, are a
function of the question, not the data. Ideally, they should be framed before collecting or
seeing the data.
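The three p-value cases above can be written out directly (a sketch; the function and argument names are mine). The two-sided branch doubles the tail probability exactly as described:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def p_value(z_obs, alternative="two-sided"):
    """p-value for an observed z statistic under the three kinds of H1."""
    if alternative == "greater":        # H1: mu > mu0 -> upper tail
        return 1 - Z.cdf(z_obs)
    if alternative == "less":           # H1: mu < mu0 -> lower tail
        return Z.cdf(z_obs)
    return 2 * (1 - Z.cdf(abs(z_obs)))  # two-sided: double the tail beyond |z_obs|

# Assuming the statistic in the sections 7.1/7.3 example was -2.50 (which gives
# the quoted one-tailed 0.0062), the two-sided p-value doubles to about 0.0124.
print(round(p_value(-2.50, "less"), 4), round(p_value(-2.50), 4))
```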

8.4 Critical Values

We do not usually calculate the exact p-value. If we have specified a 1% one-tailed test then
from section 15.1 table 1 we have that
P(Z > 2.33) = 0.5 − 0.4901 ≈ 0.01
This means that if zobs > 2.33 then the p-value must be less than 0.01. Hence we will reject
H0 if and only if zobs exceeds 2.33, so that 2.33 is the critical value for a 1% one-tailed
test, often denoted zcrit .
Similarly
P(Z > 1.96) = 0.5 − 0.4750 = 0.025
so that P(Z > 1.96) + P(Z < −1.96) = 0.05

Hence for a 5% two-tailed test we reject H0 if and only if zobs exceeds 1.96 in magnitude.
For tests of significance at the 5% or 1% level using the normal distribution, we therefore
use the critical values given in the following table (the fourth column is explained in section
9.4):
Significance   Critical Value (zcrit)   Critical Value (zcrit)   Confidence
level          two-tailed test          one-tailed test          level
5%             1.96                     1.64                     95%
1%             2.58                     2.33                     99%

Section 15.1 table 2 gives critical values for some more significance levels.
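The critical values in the table come from inverting the standard normal CDF, so they can be reproduced without tables (a sketch using the standard library; the function name is mine):

```python
from statistics import NormalDist

def z_crit(alpha, tails=2):
    """Critical value leaving total probability alpha in the rejection region."""
    tail = alpha / 2 if tails == 2 else alpha   # split alpha between two tails if needed
    return NormalDist().inv_cdf(1 - tail)

# Reproduces the table above: 1.96/1.64 at 5% and 2.58/2.33 at 1% (to 2 d.p.)
for alpha in (0.05, 0.01):
    print(alpha, round(z_crit(alpha, tails=2), 2), round(z_crit(alpha, tails=1), 2))
```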
The purist view is that we should specify our significance level beforehand and reject (or
not) H0 on the basis of that critical value alone. If we have to choose between two courses
of action, depending on whether we reject H0 or not, then this is the method to follow. We
pick α according to how convincing we want the evidence against H0 to be before we will
reject it.
However, in practice people often do not specify α and instead use critical values to assess
where the p-value lies, then translate this back into English, indicating the weight of evidence
against H0 .
Recall that the p-value is the probability of observing what we actually did, or something
even less compatible with H0 , on the assumption that H0 is true. A common interpretation
of a p-value of p is then something like:
p > 0.1   : no evidence against H0
p < 0.1   : slight evidence against H0
p < 0.05  : fairly strong evidence against H0
p < 0.01  : strong evidence against H0
p < 0.001 : very strong evidence against H0

8.5 One-Sample Test of a Normal Mean (Z Test)

Observed mean x̄ from a sample of size n. Can we assume that the true population mean μ is equal to some hypothesised value μ₀, or not?
H0 : μ = μ₀ versus H1 : μ ≠ μ₀
(Or a similar one-tailed test).
The following procedure is valid only if either
1. Both of the following are true:
(a) The observed data are n independent observations from a normal distribution
(i.e. they come from a population that is normally distributed).

(b) The variance of this normal distribution, σ², is known.


In this case the test is exactly correct.
2. At least one of the above is not true, but the sample size is large, with n ≥ 30 being a common definition of "large". In this case the test is approximately correct.
If one of the above criteria is satisfied then the procedure is:
1. Calculate the test statistic
zobs = (x̄ − μ₀)/(σ/√n)
Here σ is the true standard deviation, if known; otherwise substitute the sample standard deviation S.
2. If H0 is true then this is an observation from N(0, 1). Hence compare the observed
value zobs to the appropriate critical value from section 15.1 table 2.
3. For a 2-tailed test, reject H0 if
|zobs| ≥ zcrit
4. For a 1-tailed test, reject H0 if
zobs ≥ zcrit
(or if zobs ≤ −zcrit, depending on which way round the hypotheses are).
This is often referred to as a one-sample z test.
Example
A factory producing 7.62mm ammunition has to undergo an annual quality control test
conducted by Ordnance Board inspectors. The nominal mass of the bullet is specified to be
9.33g. A sample of 100 rounds selected at random from the production line gave a mean
bullet mass of 9.28g and a standard deviation of 0.15g.
Would the inspector conclude that the production process is producing rounds of the correct
mean weight, or not?
Let the random variable X denote the mass of a bullet drawn at random from the production
line of the factory; in this instance, the actual distribution of X is unknown.
We initially assume that the production process is working properly, i.e. it is producing
rounds which have a bullet mass distributed about the specified value of 9.33g, hence we assume that the population mean bullet mass μ is equal to 9.33g.
Hence we have
H0 : μ = 9.33 v H1 : μ ≠ 9.33
The sample size n = 100 rounds is sufficiently large to assume that the sample standard
deviation S = 0.15g provides a reasonably good approximation to σ, the standard deviation

of the population. Similarly, the masses will probably be normally distributed, but with n
so large the sample mean will be approximately normally distributed anyway.
Hence if our null hypothesis is true then our observed x̄ is an observation from a random variable X̄, which is (at least approximately) normal with mean 9.33g and standard deviation 0.15/√100. Therefore
zobs = (x̄ − 9.33)/(0.15/√100) = (9.28 − 9.33)/0.015 = −3.33
where zobs is an observation from Z ~ N(0, 1).

Hence the sample mean value x̄ = 9.28g lies 3.33 standard deviations to the left of the
expected mean value, assuming the null hypothesis to be correct.
There are two (essentially equivalent) ways to interpret this result, either we calculate the
p-value exactly or we just find the region within which it lies.
a) From tables, the probability of obtaining a value of x̄ as small as (or smaller than) 9.28g is therefore
P(X̄ ≤ 9.28) = P(Z ≤ −3.33) = P(Z ≥ 3.33)  by symmetry of N(0, 1)
            = 0.5 − Q(3.33)
            = 0.5 − 0.4996
            = 0.0004

Thus, we should expect this result to occur only four times out of every ten thousand samples
of size n = 100 drawn from this population, if the population mean really is = 9.33.
However, our alternative hypothesis was two-sided so that Z ≥ 3.33 is just as unlikely, so
we double the probability to obtain a p-value of 0.0008.
Clearly, since this result is so unlikely it provides evidence to suggest that the population
mean weight of the bullets is not 9.33g, and we would therefore reject H0 . It seems that (on
average) underweight rounds are being produced.
b) More simply, we compare −3.33 to table 2a in section 15.1 and see that it exceeds (in magnitude) even the 0.1% critical value. Hence the p-value is less than 0.001 and so, using the translation in section 8.4, we can say that there is very strong evidence against H0 (p < 0.001). Hence there is very strong evidence that μ is not 9.33, and since zobs is negative we conclude that in fact μ is less than 9.33.
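Both interpretations (a) and (b) follow from the same statistic; a quick numerical check of the bullet-mass example (a sketch using the standard library):

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n = 9.28, 9.33, 0.15, 100

z_obs = (xbar - mu0) / (s / sqrt(n))       # -3.33: 3.33 standard errors below mu0
p_two = 2 * NormalDist().cdf(-abs(z_obs))  # two-tailed p-value, about 0.0009

print(round(z_obs, 2), p_two < 0.001)      # very strong evidence against H0
```

The exact p-value differs slightly from the 0.0008 obtained above because 4-figure tables round the tail probability.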

8.6 The Meaning of Significance

If H0 is not rejected, this doesn't mean that H0 is true, it just means that our data do not give us sufficient evidence to reject it. Hence it is always better to say "do not reject H0" rather than "accept H0", even though it sounds clumsy.

If H0 is rejected, we often say that x̄ is significantly different from the hypothesised value μ₀.
However, note that "significant" is being used in a specific technical sense, meaning that (roughly speaking) there is a significant chance that H0 is false.
Hence statistical significance is not necessarily the same as practical importance of the difference between the true and the hypothesised mean. However, we can usually rejig the
question so that it is.
Example
Past experience shows that the fuel consumption of our staff cars is 20 mpg. We are testing
an engine upgrade to see if we can improve this. The obvious hypotheses are
H0 : μ ≤ 20 versus H1 : μ > 20
However, if we have 100 observations then it is possible that a sample mean of 20.2 could
lead to a significant result. We would have evidence that the true mean of the modified
engines is greater than 20, but our best guess is that it is in fact only 20.2. Embarking on
expensive upgrades of the engine just for this might not seem very clever, especially since a
better upgrade might come along soon.
However, all we need to do is specify beforehand how much of an improvement is necessary
before we will consider switching. If we require the upgraded engines to have at least 22
mpg then we just test
H0 : μ ≤ 22 versus H1 : μ > 22
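To see the point numerically, suppose the sample standard deviation were S = 0.9 mpg (a purely hypothetical figure, not from the text). A mean of 20.2 from 100 cars is then statistically significant against μ₀ = 20 but gives no evidence at all against μ₀ = 22:

```python
from math import sqrt

# S = 0.9 mpg is an assumed, illustrative value; it does not appear in the text.
xbar, n, s = 20.2, 100, 0.9
se = s / sqrt(n)

z_vs_20 = (xbar - 20) / se   # about 2.22 > 1.64: statistically significant at 5%
z_vs_22 = (xbar - 22) / se   # large and negative: no evidence of improvement past 22

print(round(z_vs_20, 2), round(z_vs_22, 1))
```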

8.7 Two-Sample Test of Normal Means (Z Test)

Observed sample means x̄A, x̄B from samples of size nA, nB. Can we assume that the two
population means are the same?
H0 : μA = μB versus H1 : μA ≠ μB (so μA − μB = 0)

(Or a similar one-tailed test).


The following procedure is valid only if either
1. Both of the following are true:
(a) Both observed data sets consist of independent observations from normally distributed populations.
(b) The variances of these normal distributions, σA² and σB², are known.
In this case the test is exactly correct.

2. At least one of the above is not true, but the sample sizes are large, with nA ≥ 30 and nB ≥ 30 being a common definition of "large". In this case the test is approximately correct.
1. Calculate the test statistic
zobs = (x̄A − x̄B − 0)/√(σA²/nA + σB²/nB)
Here σA and σB are the true standard deviations for populations A and B, if known; otherwise substitute the sample standard deviations SA and SB.
2. If H0 is true then this is an observation from N(0, 1). Hence compare the observed
value zobs to the appropriate critical value from section 15.1 table 2.
3. For a 2-tailed test, reject H0 if
|zobs| ≥ zcrit
4. For a 1-tailed test, reject H0 if
zobs ≥ zcrit
(or if zobs ≤ −zcrit, depending on which way round the hypotheses are).
This is often referred to as a two-sample z test.
Example
Two types of body armour are being trialled by firing bullets at a given velocity and measuring the residual velocity, that is the velocity of the bullet after going through the armour.
Stronger armour will have lower mean residual velocity. The results are as follows.

                 Mean (m/s)   Standard Deviation   Sample Size
Existing Armour  311.4        43.6                 30
New Armour       298.1        37.5                 40

Is there evidence at the 5% level that the new armour is stronger than the old?
We only want to know if the mean residual velocity for the new armour is lower than for
the existing armour, so the hypotheses are
H0 : μN ≥ μE v H1 : μN < μE
The sample sizes are fairly large, so we can assume approximate normality of the sample
means and estimate the population variances with the sample variances. Hence
zobs = (298.1 − 311.4)/√(37.5²/40 + 43.6²/30) = −1.34

The critical value for a 5% 1-tailed test is zcrit = 1.64, so clearly |zobs | < zcrit and hence we
do not reject H0 . On the basis of these data there is no reason to conclude that the new
armour is any stronger than the existing armour.
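The body-armour calculation can be packaged as a reusable sketch of the two-sample procedure (the function name is my own, not from the text):

```python
from math import sqrt

def two_sample_z(xbar_a, s_a, n_a, xbar_b, s_b, n_b):
    """z statistic for H0: mu_A = mu_B, using sample variances as estimates."""
    se = sqrt(s_a**2 / n_a + s_b**2 / n_b)   # standard error of the difference
    return (xbar_a - xbar_b) / se

# New armour minus existing armour: z is about -1.34, and |z| < 1.64,
# so H0 is not rejected at the 5% level, as found above.
z = two_sample_z(298.1, 37.5, 40, 311.4, 43.6, 30)
print(round(z, 2))
```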

8.8 Types of Error

A hypothesis test is a decision rule, and we can err in either of two ways:
1. Reject H0 when it is in fact true, a Type I Error. The probability (α) of this is precisely the chosen significance level (e.g. 5%, 1%). This follows from the way the tests are designed.
2. Accept H0 when it is false, a Type II Error. The probability (β) of this depends on both H1 and the true value of μ.
Note that in significance tests we therefore control the Type I Error probability by our choice of significance level. However, we can never know exactly what the Type II Error probability is because it depends on the unknown true value of the parameter.
For a given sample size, decreasing α will always increase β and vice-versa. In other words, if we are very concerned about incorrectly rejecting H0 and so pick a small significance level, this makes it more likely, if H0 is really false, that we will incorrectly fail to detect this. However, for any value of α, increasing n will decrease β.
A plot of the probability of a Type II Error against the possible true parameter values
is sometimes called the Operating Characteristic (OC) curve. This curve is used to display
the consequences of adopting any particular decision rule.
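For the one-tailed z test of H0: μ ≤ μ₀ versus H1: μ > μ₀ with known σ, the Type II error probability has a closed form: H0 is accepted when X̄ < μ₀ + zcrit σ/√n, so β(μ) = Φ(zcrit − (μ − μ₀)/(σ/√n)). A sketch of points on an OC curve (the numerical values of μ₀, σ and n below are illustrative only, not from the text):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def beta(mu_true, mu0, sigma, n, alpha=0.05):
    """P(Type II error) of the one-tailed z test H0: mu <= mu0 vs H1: mu > mu0."""
    z_crit = Z.inv_cdf(1 - alpha)
    # Accept H0 when xbar < mu0 + z_crit * sigma / sqrt(n); find that probability
    return Z.cdf(z_crit - (mu_true - mu0) / (sigma / sqrt(n)))

# Illustrative OC-curve points: beta falls as the true mean moves away from mu0,
# and (for fixed alpha) increasing n also reduces beta.
for mu in (20.0, 20.5, 21.0):
    print(mu, round(beta(mu, mu0=20.0, sigma=2.0, n=25), 3))
```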

8.9 Further Examples

Example 1
A mathematical model has been developed to predict the range of a rifle-launched grenade, given the initial velocity v and the angle of departure θ.
According to the theory, the nominal range of the grenade with parameters v = 70 m/s and θ = 30° is 358.5m.
A trial was set up to compare this theoretical result with that obtained in practice. A
sample of 60 rounds was fired, which gave a sample mean range of x̄ = 363.1m, and a
sample standard deviation S = 31.8m.
Is there evidence at the 5% significance level to suggest that the theoretical mean range and the true mean range μR are different?
Solution
null hypothesis        H0 : μR = 358.5
alternative hypothesis H1 : μR ≠ 358.5

We are interested in detecting a difference from the theoretical value, not specifically an increase (or a decrease), hence a two-tailed test of significance at the 5% level is appropriate.


Since the sample size n = 60 is large (> 30), we can assume that the sample standard deviation S is a reasonable approximation to the population standard deviation σ. Also, since n is large, it is appropriate to assume that the distribution of sample means (of size n = 60) drawn from this population is approximately normal, with a hypothesised mean of μ₀ = 358.5m, and standard deviation
S/√n = 31.8/√60 = 4.1054
Then
zobs = (x̄ − μ₀)/(S/√n) = (363.1 − 358.5)/4.1054 = 1.12
where zobs is an observation from Z ~ N(0, 1).


Hence the sample mean range obtained by the trials lies 1.12 standard deviations away from
the theoretical mean range value.
From tables, the critical value of z for a two-tailed test at a significance level of 5% is zcrit = 1.96.
Since |zobs| < zcrit there is insufficient evidence to reject the null hypothesis H0, therefore we have no evidence to contradict the theoretical range value of 358.5 metres.
Example 2
A trial is being arranged which will require two 105mm guns in similar states of wear. Two
weapons are selected and a sample of 40 rounds are fired from each weapon; the muzzle
velocities are recorded for each round fired. The sample statistics are given in the following
table:
Weapon   Sample Mean       Sample Standard   Sample
         Muzzle Velocity   Deviation         Size
Gun A    700.5             22.4              40
Gun B    715.7             18.3              40

Is there any evidence of a difference between the mean muzzle velocities of the two weapons,
at the 1% significance level?
Solution
null hypothesis        H0 : μA − μB = 0
alternative hypothesis H1 : μA − μB ≠ 0
Since we are interested in detecting a difference between the mean muzzle velocities of the
two weapons, a two-tailed test of significance is appropriate.
Since both sample sizes are large, nA = nB = 40, the sample standard deviations can be assumed to be reasonable approximations to the population standard deviations σA and σB respectively.

For the Gun A muzzle velocity measurements, the distribution of sample means X̄A of size nA = 40 drawn from the population can be assumed to be approximately normal with mean μA and standard deviation
SA/√nA = 22.4/√40 = 3.5418 m/s
Similarly, for the Gun B muzzle velocity measurements, the distribution of sample means X̄B of size nB = 40 drawn from the population can be assumed to be approximately normal with mean μB and standard deviation
SB/√nB = 18.3/√40 = 2.8935 m/s
We are testing the difference between the population means on the evidence supplied by two samples, thus the random variable of interest is (X̄A − X̄B). Under the assumptions made above, this will be at least approximately normally distributed with a mean of (μA − μB), which is equal to zero under the null hypothesis. Its standard deviation is
√(SA²/nA + SB²/nB) = √(3.5418² + 2.8935²) = 4.5735 m/s

Hence under H0
X̄A − X̄B ~ N(0, 4.5735²)
Now, the observed difference in means is
x̄A − x̄B = 700.5 − 715.7 = −15.2 m/s
Hence
zobs = (x̄A − x̄B − 0)/√(SA²/nA + SB²/nB)
     = (700.5 − 715.7)/√(22.4²/40 + 18.3²/40)
     = −15.2/4.5735
     = −3.3235

Hence the difference between the sample mean muzzle velocities of the two weapons lies
3.3235 standard deviations away from the hypothesised difference, if the null hypothesis is
indeed true.
From tables, the critical value for a two-tailed test of significance at a level of 1% is zcrit
= 2.58. Since |zobs | > zcrit we therefore reject the null hypothesis H0 at the 1% level.
Hence there is strong evidence to suggest that the two weapons have different mean muzzle
velocities and are therefore unsuitable for the trial.
Example 3
The take-off distances of two aircraft A and B were recorded 50 and 70 times respectively.
The sample means and standard deviations of these measurements are summarised in the
following table:

            Mean    Standard Deviation   Sample Size
Aircraft A  251.6   33.4                 50
Aircraft B  228.1   34.1                 70

Is there any evidence at the 1% significance level of a difference between the take-off distances
of the two aircraft?
Solution
Here we have hypotheses
H0 : μA = μB v H1 : μA ≠ μB
The sample sizes are large, so that the assumption of normality and the use of sample
variances in place of population variances will not be too inaccurate.
Hence the test statistic is
zobs = (251.6 − 228.1)/√(33.4²/50 + 34.1²/70) = 3.767
Since this is greater than the critical value of zcrit = 2.58 for 1% significance (two-tailed)
there is strong evidence to suggest a difference in the take-off distances of the two aircraft,
with the distance for A being longer.

9 Confidence Intervals

9.1 Introduction

Decision rules are useful for deciding between two courses of action. However, it is often more useful to know whereabouts the true mean μ might lie, given the sample mean x̄. A range of values where μ might plausibly lie is a Confidence Interval. The end points are referred to as confidence limits.
Typically we construct a 95% Confidence Interval (CI) and say that we are 95% confident that the true mean μ lies within the interval. The precise meaning of the interval is as follows:
If we draw many samples and construct a 95% CI for each, approximately 95% of these CIs will contain μ.
Similarly, if we wanted to be more confident of including the true mean, we could construct
a 99% CI, which will therefore be wider.
The procedure for constructing confidence intervals is very like a hypothesis test. Essentially, a 95% CI is the set of possible (hypothesised) values of μ which would not be rejected by a 5% test.

9.2 Single Sample Case

Example
To determine a practical average fuel consumption for the Warrior APC across country, a
trial was performed using thirty vehicles each equipped with an accurate measuring device.
The fuel consumption was recorded for each of these over the same cross-country course.
The 30 data values obtained gave a sample mean fuel consumption of x = 1.97 mpg and a
sample standard deviation of S = 0.51 mpg.
Derive the 95% confidence interval for the population mean fuel consumption for Warrior
APCs (of similar age and wear as those which were employed in the trial, and in similar
conditions to the course used).

Since the sample size n = 30 is fairly large, the distribution of sample means X̄ can be assumed to be normal with a mean of μ, the unknown true population mean fuel consumption, and standard deviation σ/√n. In other words
X̄ ~ N(μ , σ²/n)

The observed sample mean x̄ = 1.97 is an observation from this distribution. As a further consequence of taking a large sample, the sample standard deviation S provides a reasonable approximation for the population standard deviation σ.

Therefore
Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)
and from normal tables we have zcrit = 1.96, that is
P(−1.96 ≤ Z ≤ 1.96) = 0.95
Thus,
P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 0.95
rearranging gives
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 0.95
This is an expression for the random variable X̄, but it seems natural to substitute in the observed sample mean x̄ (and observed sample standard deviation S if necessary) to give the 95% confidence limits, or 95% confidence interval:
x̄ − 1.96 S/√n = 1.97 − 1.96 × 0.51/√30 = 1.7875 mpg
x̄ + 1.96 S/√n = 1.97 + 1.96 × 0.51/√30 = 2.1525 mpg

Hence a 95% confidence interval is (1.7875, 2.1525) mpg. We are 95% confident that the
true mean fuel consumption is between about 1.79 and 2.15 mpg.
Two-sided Intervals: General Formula
Suppose we have a sample of n observations with mean x̄. As in section 8.5, the formula below is only appropriate where either the population is normally distributed with known variance, or n is sufficiently large that we can assume normality of X̄ and that S is a good estimate of σ.
Then the CI for μ is
x̄ − zcrit σ/√n ≤ μ ≤ x̄ + zcrit σ/√n

where
1. If σ is unknown then replace σ by the estimate S.
2. zcrit is the critical value from normal tables, so that for example zcrit = 1.96 gives a 95% CI and zcrit = 2.58 gives a 99% CI.
Hence for example a 95% CI for μ is
x̄ ± 1.96 σ/√n = (x̄ − 1.96 σ/√n , x̄ + 1.96 σ/√n)
(usually using S for σ).
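The two-sided formula translates directly into code (a sketch; the function name is mine). It reproduces the Warrior limits derived above:

```python
from math import sqrt

def mean_ci(xbar, s, n, z_crit=1.96):
    """Two-sided CI for mu: xbar +/- z_crit * s / sqrt(n)."""
    half_width = z_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = mean_ci(1.97, 0.51, 30)     # 95% CI from the Warrior fuel-consumption example
print(round(lo, 4), round(hi, 4))
```

For a 99% CI, pass `z_crit=2.58` instead.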

Example
From a sample of 60 mortar rounds fired from the same setup, the mean range is found to
be 350m with standard deviation 42m. Determine the 95% confidence limits for the mean
of the population, i.e. the mean range for rounds fired with this setup.
Since n = 60 is large, the sample mean can be assumed to be at least approximately normally distributed, and we can estimate the unknown σ by S = 42. Thus the approximate 95% confidence limits are given by
x̄ ± zcrit S/√n
which is
350 ± 1.96 × 42/√60 = 350 ± 10.6275 = (339.37, 360.63)
Hence we can be 95% confident that the true mean range is between 339.37m and 360.63m.
One-sided Intervals: General Formula
Confidence intervals are usually two-sided as in the previous paragraph. However one-sided intervals are used when interest is restricted to just the highest or just the lowest value of the parameter being estimated. With one-sided intervals zcrit is taken from table 2b, as with a one-tailed significance test.
In general a one-sided CI takes the form
(−∞ , x̄ + zcrit σ/√n)
if we require an upper limit only, or
(x̄ − zcrit σ/√n , ∞)
for a lower limit only. For example, for the 95% confidence level, we take the value zcrit = 1.64.
Example
Forty-six light bulbs of a particular make are tested and their lifetimes recorded, with the
following results:
n = 46 , x = 1070 hours , S = 245 hours
From the data determine the one-sided lower 95% confidence limit for the mean lifetime of
this type of bulb.
Since the sample is large we can use the normal distribution as usual and estimate σ by S = 245. Since we require a one-sided confidence interval, the appropriate value of zcrit is 1.64.
Hence the lower confidence limit is
1070 − 1.64 × 245/√46 = 1070 − 59.24 = 1010.76
46
Hence we can be 95% confident that the mean lifetime of this type of light bulb is at least
1010.76 hours.
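The same arithmetic for the one-sided lower limit, using zcrit = 1.64 (a quick check, not part of the original text):

```python
from math import sqrt

n, xbar, s = 46, 1070.0, 245.0

lower = xbar - 1.64 * s / sqrt(n)   # one-sided lower 95% confidence limit
print(round(lower, 2))              # about 1010.76 hours
```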

9.3 Two Sample Case

Example
The results of a cross-country fuel consumption trial on the FV432 APC and the Warrior
APC are given in the following table:
APC      Sample   Sample   Sample Standard
         Size     Mean     Deviation
FV432    45       2.74     0.62
Warrior  30       1.97     0.51

Find the 99% confidence interval for the difference between the mean fuel consumptions of
the two vehicles cross-country.
FV432: Since the sample size nF = 45 is large, it can be assumed that the distribution of sample means X̄F is
X̄F ~ N(μF , σF²/nF)
Warrior: Since the sample size nW = 30 is large, it can be assumed that the distribution of sample means X̄W is
X̄W ~ N(μW , σW²/nW)
As the sample size is at least 30 in each case, the sample variances can be taken as reasonable approximations to the population variances. We require a 99% confidence interval for the difference between the mean fuel consumptions for the FV432 and Warrior, that is for μF − μW. Hence we consider the random variable (X̄F − X̄W) which, by the assumptions made above, is distributed as
X̄F − X̄W ~ N(μF − μW , σF²/nF + σW²/nW)

From normal tables we have for a 99% confidence interval that zcrit = 2.58, that is
P(−2.58 ≤ Z ≤ 2.58) = 0.99
where Z ~ N(0, 1), i.e.
Z = ((X̄F − X̄W) − (μF − μW))/√(σF²/nF + σW²/nW) ~ N(0, 1)
Thus,
P(−2.58 ≤ ((X̄F − X̄W) − (μF − μW))/√(σF²/nF + σW²/nW) ≤ 2.58) = 0.99

rearranging gives
P(X̄F − X̄W − 2.58 √(σF²/nF + σW²/nW) ≤ μF − μW ≤ X̄F − X̄W + 2.58 √(σF²/nF + σW²/nW)) = 0.99
This is an expression for the random variable X̄F − X̄W, but again it seems natural to substitute in the observed difference in sample means x̄F − x̄W (and the observed sample variances SF² and SW² if necessary) to give the 99% confidence interval.
Plugging in the observed standard deviations to obtain the estimated standard deviation of X̄F − X̄W we obtain
√(SF²/nF + SW²/nW) = √(0.62²/45 + 0.51²/30) = 0.131195 mpg

Hence the 99% confidence limits are
(x̄F − x̄W) − 2.58 √(SF²/nF + SW²/nW) = (2.74 − 1.97) − (2.58)(0.131195) = 0.4315 mpg
and
(x̄F − x̄W) + 2.58 √(SF²/nF + SW²/nW) = (2.74 − 1.97) + (2.58)(0.131195) = 1.1085 mpg
Hence a 99% confidence interval for the difference in mean fuel consumption between the two vehicles is (0.4315, 1.1085) mpg, with the FV432 having the larger mean.
General Formula
Suppose we have two samples, of sizes nA and nB, with means x̄A and x̄B. As in section 8.7, the formula below is only appropriate where either both samples come from populations which are normally distributed with known variances, or both sample sizes are sufficiently large that we can assume normality of the sample means and that the sample variances are good estimates of the population variances.
The CI for the difference in population means μA − μB is
x̄A − x̄B ± zcrit √(σA²/nA + σB²/nB)
where σA², σB² are the variances of the two populations from which the samples were drawn, or if unknown their estimates SA² and SB².
The modification for a one-sided interval is the same as in the one-sample case.
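The two-sample formula can be sketched as a small function (the function name is mine), checked against the FV432/Warrior interval above:

```python
from math import sqrt

def diff_ci(xbar_a, s_a, n_a, xbar_b, s_b, n_b, z_crit):
    """CI for mu_A - mu_B: (xbar_a - xbar_b) +/- z_crit * sqrt(sA^2/nA + sB^2/nB)."""
    half_width = z_crit * sqrt(s_a**2 / n_a + s_b**2 / n_b)
    diff = xbar_a - xbar_b
    return diff - half_width, diff + half_width

# 99% CI for FV432 minus Warrior mean fuel consumption:
lo, hi = diff_ci(2.74, 0.62, 45, 1.97, 0.51, 30, z_crit=2.58)
print(round(lo, 4), round(hi, 4))
```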
Example
Two sets of soldiers each train for and perform a navigation exercise. One group trained
using Virtual Reality (VR) equipment while the other used standard methods (maps, books).
The response measured was the time (in minutes) for each soldier to complete the route,
and the results were:
VR : nV = 42, x̄V = 23.4 mins , SV = 5.7 mins
Standard : nS = 37, x̄S = 25.9 mins , SS = 6.8 mins

It is quite likely that the individual times will not follow a normal distribution (and a histogram would illustrate this) since it is easier to make a mistake and lose a lot of time than it is to save a lot of time. This would lead to a skewed distribution with a much longer tail to the right than to the left. However, the sample sizes are both sufficiently large that X̄V and X̄S should both be approximately normal and the sample standard deviations should be reasonable approximations of the population ones. Hence an approximate 95% CI for the difference (Standard minus Virtual Reality) is
25.9 − 23.4 ± 1.96 √(5.7²/42 + 6.8²/37)
≈ 2.5 ± 2.79
≈ (−0.29 , 5.29)
Hence we are 95% confident that the true difference in means is somewhere between Standard taking 5.29 minutes longer and VR taking 0.29 minutes longer. Note that zero is within this interval. In fact, from section 9.4, this means that a 5% test of
H0 : μV − μS = 0 v H1 : μV − μS ≠ 0
would not reject H0 .

9.4 Intervals and Tests

A two-sided 95% CI and a two-tailed 5% test are equivalent in the following sense:
For a 5% test of H0 : μ = μ₀ v H1 : μ ≠ μ₀
H0 is rejected if μ₀ is outside the 95% CI for μ.
H0 is not rejected if μ₀ is inside the 95% CI for μ.
Similarly for 99% CI and 1% test etc, and similarly for 1-tailed test and 1-sided CI of
appropriate sizes.
Example
We return to the example in section 7.4 where a random sample of n = 30 components was taken from a week's production, giving a sample mean length of x̄ = 10.51 cm with a sample standard deviation of S = 0.64 cm. Making the usual assumptions, a 95% CI for the true mean length is
10.51 ± 1.96 × 0.64/√30 = 10.51 ± 0.23 = (10.28 , 10.74)
Note that the target length of 10.40 cm is inside this interval, which agrees with the
fact that the p-value in section 7.4 exceeded 0.05, so that a 5% test would fail to reject
H0 : μ = 10.40.

9.5  Further Examples
Example
From two samples obtained from different types of scout car the following fuel consumption
figures were obtained:
Type A : nA = 36, x̄A = 7.5 mpg , SA = 0.6 mpg
Type B : nB = 64, x̄B = 6.1 mpg , SB = 0.5 mpg
Find the 99% confidence limits for the difference between the means of the populations from
which these samples were taken, (i.e. the 99% confidence limits on the possible difference
in mpg of the two types of scout car).
Solution
Since both samples are fairly large the distribution for the difference of sample means will be
at least approximately normal. The confidence limits are determined as above with σA , σB
approximated by the sample standard deviations 0.6, 0.5 respectively and zcrit taken from
tables to be 2.58. Thus the 99% limits for the difference between the fuel consumption of
Type A and Type B scout cars are:
7.5 − 6.1 ± 2.58 √(0.6²/36 + 0.5²/64) = 1.4 ± 0.304 mpg.
Hence we can be 99% confident that the difference is between 1.096 and 1.704 mpg, with
Type A having the higher mpg.


10  Significance Tests and Confidence Intervals: Small Samples

10.1  Introduction

The previous section examined how one might draw conclusions about the mean of a
population from the analysis of a large set of measurements, i.e. a large sample. In practice,
however, one often has relatively few measurements on which to base a decision. We
therefore must also consider the small sample case.
The tests and CIs in sections 7.1 to 9.5 are exactly correct if the data are observations from
a normal distribution with known population variance. They are approximately correct
if the data are observations from a non-normal distribution and/or the population variance
is unknown, but only if n is large. In this context "large" depends on how non-normal the
distribution of the data appears to be, but n ≥ 30 is a common rule of thumb.
The tests and CIs in this section are exactly correct if the data are observations from a
normal distribution with unknown variance.

10.2  Student's t-Distribution

We perform tests and construct confidence intervals almost exactly as before, but replace
critical values from the normal distribution (table 2) with those from Student's t-distribution
(table 3). Note that this is valid for n large or small, but it is only for small n where it
makes much difference.
The t distribution is like a N(0, 1) with fatter tails (see picture overleaf), with the exact
shape depending on the degrees of freedom, ν. Roughly, ν = n minus a small whole number
(at least 1). See section 10.8 for more explanation of degrees of freedom.
The t distribution is tabulated in section 15.1 table 3. A t with ν degrees of freedom is
sometimes denoted tν.
Note that the t distribution becomes more normal-like as n (and so ν) increases, and in fact
t∞ is precisely the normal distribution.
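This convergence can be seen directly from the critical values. A short sketch (assuming the scipy library is available; it is not part of these notes):

```python
from scipy.stats import norm, t

# Two-tailed 5% critical values: t.ppf(0.975, df) approaches the
# normal value 1.96 as the degrees of freedom increase.
for df in (4, 9, 29, 100, 1000):
    print(df, round(t.ppf(0.975, df), 3))
print("normal:", round(norm.ppf(0.975), 3))  # 1.96
```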

10.3  One Sample Tests

The following procedure applies to the case when the data can be assumed to come from a
normal distribution but the true population variance is unknown.
The procedure is almost exactly the same as that for the one-sample z test in section 8.5,
with the exceptions noted below.

[Figure: Student's t distribution with degrees of freedom 1, 3, 9 and 29, plotted over the range −4 to 4.]

Observed mean x̄ from a sample of size n. Can we assume that the true population mean μ
is equal to some hypothesised value μ0, or not?

H0 : μ = μ0
H1 : μ ≠ μ0

(the ideas are similar for one-tailed tests).
Calculate the test statistic

tobs = (x̄ − μ0) / (S/√n)

where S is the sample standard deviation.


Compare this to critical values of t (often denoted tcrit) with

ν = n − 1

degrees of freedom.
In other words, if H0 is true then tobs is an observation from a t distribution, so we compare
it to the appropriate critical values to see if it is surprisingly large in magnitude.
The above is for a two-tailed test, while in the one-tailed case we make exactly the same
changes as for the one-tailed z test.

Example
Five rounds fired from a gun fall at ranges of 12560, 12490, 12550, 12500 and 12520 metres.
Are these results consistent with a range table prediction of 12500 metres?
We have hypotheses:
H0 : μ = 12500
H1 : μ ≠ 12500
so that this is a single-sample, two-tailed test. Now, from the data above we have
n = 5 ,  Σx = 62620 ,  Σx² = 784256600

so that

x̄ = 62620/5 = 12524

and

S = √[ (1/(n−1)) ( Σxᵢ² − n x̄² ) ]
  = √[ (784256600 − 5 × 12524²)/4 ]
  = 30.5

Hence the test statistic is

tobs = (12524 − 12500) / (30.5/√5) = 1.76

Compare to t with ν = 5 − 1 = 4 degrees of freedom. For a 2-tailed 5% test,

tcrit = 2.78

Hence |tobs| < tcrit, so there is no evidence to reject H0. Therefore any apparent deviation
from the range table figure could easily be due to random variation.
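In practice such a test would usually be run in software. A sketch using scipy (an assumption; any statistics package offers the same test) reproduces tobs and also gives the exact two-tailed p-value:

```python
from scipy.stats import ttest_1samp

ranges = [12560, 12490, 12550, 12500, 12520]  # fall of shot, metres
res = ttest_1samp(ranges, popmean=12500)       # two-tailed by default
print(round(res.statistic, 2))                 # 1.76, as in the hand calculation
print(round(res.pvalue, 3))                    # well above 0.05, so do not reject H0
```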

10.4  Two Sample Tests

Two samples of sizes nA and nB give mean values x̄A, x̄B respectively: could they reasonably
both have come from populations with the same mean?

H0 : μA = μB
H1 : μA ≠ μB

(again, same idea for one-tailed tests).
This divides into two distinct situations.


Paired Samples
Sometimes nA = nB and there is a natural pairing between the observations in the two
samples (for example, before and after values for the same individual).
In such cases calculate the difference for each pair. Then to test for a difference in means
perform a one-sample test of H0 : μd = 0 where μd is the true mean difference.
Note that the observed differences must satisfy the assumptions required by a one-sample
test in section 10.3.
Non-Paired Samples
This is the more usual case. The following procedure only works if both of the following are
true:
1. Both observed data sets consist of independent observations from normal distributions.
2. The variances of these normal distributions are unknown but can be assumed to be
equal.
A common rule of thumb is that it is reasonable to assume equality of the population
variances if

1/2 ≤ SA²/SB² ≤ 2
If this is so, calculate the pooled estimate of the common variance σ²,

Sp² = [ (nA − 1)SA² + (nB − 1)SB² ] / (nA + nB − 2)

Note that this is a weighted average of the two sample variances, and is just the simple
average if nA = nB.
Then find

tobs = (x̄A − x̄B) / √( Sp²/nA + Sp²/nB ) = (x̄A − x̄B) / ( Sp √(1/nA + 1/nB) )

Compare to critical values of t with

ν = nA + nB − 2
degrees of freedom.
If the population variances cannot be assumed to be (roughly) equal then we must resort
to a rather unpleasant approximation, which will not be detailed here.
Example: Paired t test
A trial compares two types of ATGW, with 10 soldiers firing each once. The data are
analysed and a figure of merit, varying from 0 to 10 depending on the miss distance from
the centre of the target, is awarded to each ATGW. Calculate whether there is any evidence
at the 5% significance level of a difference in performance between the two types.
The figures are:

Soldier :  1  2  3  4  5  6  7  8  9  10
ATGW A  :  7  3  5  9  7  5  4  3  2   3
ATGW B  :  9  6  4  9  9  6  8  7  2   2

There is a natural pairing between observations (soldier number).


Hence calculate differences (B − A) and perform a one-sample test on these, with

H0 : μd = 0
H1 : μd ≠ 0

where μd is the true mean difference between A and B.
Note that, for this test to work, we need to assume that the observed differences are
independent observations from a normal distribution. It is not clear whether the figure of
merit will follow a normal distribution, since it is not the result of many factors operating
additively and independently (section 6.1). However, we only need to be able to assume
normality for the distribution of the differences, rather than the individual figure of merit
values, and the very fact that they are differences means that the distribution should be
fairly symmetrical. It is symmetry that is the most important part of the normality
assumption, so we should therefore not go too far astray.
Form the differences (B − A):

2, 3, −1, 0, 2, 1, 4, 4, 0, −1
giving

x̄ = 1.4
S = 1.897
n = 10 (pairs)

and hence

tobs = (1.4 − 0) / (1.897/√10) = 2.33

The 5% critical value for ν = 9 is tcrit = 2.26, so reject H0 since tobs exceeds this. There
is evidence of a difference, with B doing better. However, since tobs only just exceeds tcrit,
and the assumption of normality is a little dubious, it can be argued that this conclusion
should be toned down. More data are certainly needed.
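The same paired analysis can be reproduced with scipy's paired t test (a sketch, assuming scipy is available):

```python
from scipy.stats import ttest_rel

atgw_a = [7, 3, 5, 9, 7, 5, 4, 3, 2, 3]
atgw_b = [9, 6, 4, 9, 9, 6, 8, 7, 2, 2]

# Paired t test: equivalent to a one-sample test on the differences B - A.
res = ttest_rel(atgw_b, atgw_a)
print(round(res.statistic, 2))  # 2.33, as in the hand calculation
print(round(res.pvalue, 3))     # just below 0.05, so H0 is (only just) rejected
```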
Example: Unpaired 2-sample t test
The following distances in metres were recorded for the take-off of two aircraft:
Aircraft A :  263 232 268 258 225 273
Aircraft B :  215 210 225 200 235 245 240 238 235

Assuming that the distances are normally distributed, is there any evidence of a difference
(at 5% significance level) between the take-off distances of the two aircraft?

Hence the hypotheses are

H0 : μA = μB
H1 : μA ≠ μB

From the data

x̄A = 253.2 , SA = 19.874 , nA = 6
x̄B = 227.0 , SB = 15.443 , nB = 9

Hence the variance ratio is

19.874²/15.443² = 1.6562

so that pooling is appropriate. Then
Sp² = (5 × 19.874² + 8 × 15.443²)/13 = 298.6795

so that Sp = 17.2823. Then

tobs = (253.2 − 227.0) / ( 17.2823 √(1/6 + 1/9) ) = 2.87

From the t-table for 5%, two-tailed, with degrees of freedom ν = 6 + 9 − 2 = 13, we have
tcrit = 2.16. Since tobs exceeds tcrit , we conclude that there is evidence of a difference between
the take-off distances of the aircraft, with aircraft A having the longer take-off distance.
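As a cross-check, the pooled two-sample test can be run in scipy (a sketch; `equal_var=True` requests the pooled-variance version used above):

```python
from scipy.stats import ttest_ind

aircraft_a = [263, 232, 268, 258, 225, 273]
aircraft_b = [215, 210, 225, 200, 235, 245, 240, 238, 235]

# Unpaired two-sample t test with pooled variance (equal_var=True).
res = ttest_ind(aircraft_a, aircraft_b, equal_var=True)
print(round(res.statistic, 2))  # 2.87, as in the hand calculation
print(round(res.pvalue, 3))     # below 0.05, so reject H0 at the 5% level
```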


10.5  Confidence Intervals

All previous comments in sections 10.3 and 10.4 on when tests are appropriate apply equally
to confidence intervals.
One Sample Case
The basic procedure is exactly as in section 9.2, we simply use critical values from the t
distribution instead of the normal.
Hence a two-sided confidence interval for μ is

x̄ ± tcrit S/√n

where tcrit is the critical value from the t distribution with ν = n − 1 degrees of freedom.
Example
Three tanks are fitted with new tracks and each is driven under trial conditions until either
of its tracks fails. The distances until failure are 6000, 7000 and 8000 miles. Hence
x̄ = 7000
S² = (1/2)(6000² + 7000² + 8000² − 3 × 7000²) = 1000000
S = 1000

If the track failures are due to wear rather than catastrophic failure then the distances
should be approximately normally distributed, so a 95% CI is

7000 ± 4.30 × 1000/√3
= 7000 ± 2483
= (4517 , 9483)
For very small sample sizes, CIs are often very wide indeed!
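A sketch of this interval in scipy (an assumption; note that software uses the exact critical value 4.3027 where the tables round to 4.30, so the limits differ very slightly):

```python
import math
import statistics
from scipy.stats import t

miles = [6000, 7000, 8000]      # distances until track failure
n = len(miles)
xbar = statistics.mean(miles)   # 7000
s = statistics.stdev(miles)     # 1000

# Two-sided 95% CI for the mean, t distribution with n - 1 = 2 df.
lo, hi = t.interval(0.95, n - 1, loc=xbar, scale=s / math.sqrt(n))
print(round(lo), round(hi))     # close to (4517, 9483) from the hand calculation
```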
Two Sample Case
Just as with tests, this breaks down into paired and unpaired cases.
For paired data use the above one-sample procedure on the differences.
For unpaired data calculate the pooled variance Sp², if appropriate, and the CI is

x̄A − x̄B ± tcrit Sp √(1/nA + 1/nB)

Again, the critical value comes from t with ν = nA + nB − 2.


Example: Paired t interval
From the paired t test example in section 10.4 we have
x̄ = 1.4 , S = 1.897 , n = 10

so that the 95% CI for the difference (B − A) is

1.4 ± 2.26 × 1.897/√10
= 1.4 ± 1.356
= (0.044 , 2.756)

Note that zero is just outside the 95% interval, as expected since previously H0 was just
rejected at the 5% level.
Example: Unpaired 2-sample t interval
For the take-off data in section 10.4, calculate a 95% CI for the difference (A-B) in mean
take-off lengths.
We have

x̄A = 253.2 , x̄B = 227.0 , Sp = 17.2823 , ν = 13

Hence the CI is

253.2 − 227.0 ± 2.16 × 17.2823 √(1/6 + 1/9)
= 26.2 ± 19.6746
= (6.525 , 45.875)

Hence we are 95% confident that the difference in means is between about 6.5 and 45.9
metres, with A having the larger take-off distance. Note that this interval does not include
zero, as expected since the test rejected H0 (see section 9.4).


10.6  Further Examples

Example 1: One-sample t test


As part of an internal Quality Control test at a particular factory, 10 rounds of small arms
ammunition are selected at random from each day's production and are fired on a testing
range which is equipped to record the muzzle velocity of each round. On one such day, the
muzzle velocities recorded (metres/second) were:
810 830 845 805 837
793 809 790 824 840
The mean muzzle velocity is required to be at least 838 m/s, and it is known that when
production was started this was being achieved. Test at the 1% level of significance whether
there is any evidence that rounds produced at the factory are now failing to achieve this.
Solution
From the data above, the sample mean muzzle velocity is x̄ = 818.3 m/s, with a sample
standard deviation S = 19.653 m/s.
Assuming that the population of muzzle velocities of this type of ammunition follows a
normal distribution about a mean of μ with a standard deviation σ, then

null hypothesis        H0 : μ ≥ 838 m/s
alternative hypothesis H1 : μ < 838 m/s
so we have a one-tailed test at the 1% level of significance.
Note that, since this is an existing process which we know was working properly in the past,
the null hypothesis is that it is still working properly. It makes sense to behave as though
the process is fine unless we have evidence to the contrary (as in section 7.4).
Since the population standard deviation σ is unknown, we use the test statistic

tobs = (x̄ − μ0) / (S/√n)
     = (818.3 − 838) / (19.653/√10)
     = −3.17

which is an observation from Student's t-distribution with (n − 1) = 9 degrees of freedom,
under the assumption that the null hypothesis H0 is true.
For a one-tailed test with a 1% level of significance, the critical value of t9 from tables is
tcrit = −2.82. Since tobs < −2.82 we reject the null hypothesis H0. Therefore there is sufficient
evidence from the sample data to suggest that the mean muzzle velocity of the ammunition
produced is lower than the specified value 838 m/s.
An equivalent way to assess this is to calculate the one-sided 99% confidence interval, with
an upper limit only. The CI is

818.3 + 2.82 × 19.653/√10 = 835.8

so that the interval is (0 , 835.8) and we are 99% confident that the true mean velocity is
no more than 835.8 m/s. The hypothesised (target) value of (at least) 838 is outwith this
interval and hence 838 is not a credible value of μ.
Sometimes even after a one-tailed test we prefer a two-sided confidence interval. For example
here the 95% two-sided confidence interval gives

818.3 ± 2.26 × 19.653/√10 = (804.3 , 832.3)

so that we are 95% confident that the true mean muzzle velocity is within this range.
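A sketch of this one-tailed test in scipy (assumed available; `alternative='less'` selects the lower-tail alternative H1 : μ < 838):

```python
from scipy.stats import ttest_1samp

mv = [810, 830, 845, 805, 837, 793, 809, 790, 824, 840]  # muzzle velocities, m/s

# One-tailed one-sample t test of H0: mu >= 838 against H1: mu < 838.
res = ttest_1samp(mv, popmean=838, alternative='less')
print(round(res.statistic, 2))  # -3.17, as in the hand calculation
print(round(res.pvalue, 4))     # below 0.01, so reject H0 at the 1% level
```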
Example 2: Two-sample (unpaired) t test
Two machines are set up to produce bullets for AR15 5.56 × 45 ammunition, the nominal
diameter of the bullet being 5.56 mm. A sample of 10 rounds is selected at random from
the production of each machine and the diameter of each bullet is measured. The sample
data obtained are as follows:
Machine A
Machine B

5.50 5.54 5.56 5.49 5.60 5.44 5.49 5.58 5.56 5.55
5.59 5.55 5.63 5.61 5.52 5.58 5.56 5.60 5.51 5.52

Does this data set provide sufficient evidence, at the 1% significance level, to conclude
that Machine A is producing bullets with a different mean diameter to those produced by
Machine B?
Solution
The hypotheses are
H0 : μA = μB   v   H1 : μA ≠ μB
We have
nA = 10 , x̄A = 5.5310 , SA = 0.049318
nB = 10 , x̄B = 5.5670 , SB = 0.041647
This is a two-sample test since there is no pairing in the observations, and we need to be
able to assume that both sets of observations are from normally distributed populations.
This seems reasonable, but we would look at similar past data to check.
The ratio of the sample variances is

0.049318²/0.041647² = 1.4023

so that pooling variances is fine. Then

Sp² = (9 × 0.049318² + 9 × 0.041647²)/18 = 0.00208333
Sp = 0.0456435

Hence

tobs = (5.5310 − 5.5670) / ( 0.0456435 √(1/10 + 1/10) ) = −1.76

which is compared to t with 10 + 10 − 2 = 18 degrees of freedom. Hence tcrit = 2.85 for a
1% two-tailed test, and since |tobs| < tcrit we do not reject H0.
We could construct a 99% CI for the difference (A − B) as

5.5310 − 5.5670 ± 2.85 × 0.0456435 √(1/10 + 1/10) = (−0.0942 , 0.0222)

which includes zero, as expected.


Example 3: Paired t test
During the Warrior gunnery conversion programme for a particular infantry battalion, ten
soldiers were tested on the Desk Top Trainer (DTT) with computer generated commands,
followed by the same exercise repeated on the Turret Trainer (TT) with a Commander
present to issue the instructions.
The scores obtained (out of a maximum 100) are given below.

Soldier :  1  2  3  4  5  6  7  8  9  10
DTT     : 79 82 91 87 63 95 71 78 86  90
TT      : 77 80 93 87 55 90 68 76 86  89

Assuming that scores are normally distributed, test at the 1% level whether there is any
evidence to suggest a difference in the gunners' performance between Turret Trainer and
Desk Top Trainer.
Solution
Since the scores are paired for each soldier, we use the differences X = DTT − TT:

Soldier :  1  2  3  4  5  6  7  8  9  10
X       : +2 +2 −2  0 +8 +5 +3 +2  0  +1

The sample mean of the differences is x̄ = 2.1, with a standard deviation S = 2.807 and a
sample size n = 10.
We start by assuming that there is no difference between the population mean score for the
DTT and the population mean score for the TT, and we are interested in detecting any
difference between the DTT and TT scores, so this is a two-tailed test of significance at the
1% level,
H0 : μd = 0
H1 : μd ≠ 0

where μd is the population mean for the differences X.


The test statistic is

tobs = (x̄ − 0) / (S/√n)

which, if H0 is true, is an observation from Student's t-distribution with n − 1 = 9
degrees of freedom.
Thus,

tobs = (2.1 − 0.0) / (2.807/√10) = 2.366

From the tables, the critical value of t9 at the 1% level of significance is tcrit = 3.25. Since
tobs < tcrit , we do not reject H0 . There is no evidence (at the 1% level) to suggest that the
gunners score differently on the Turret Trainer than they do on the Desk Top Trainer.
A 99% two-sided CI is given by

2.1 ± 3.25 × 2.807/√10 = (−0.785 , 4.985)
which includes zero.
Example 4: One-sample t CI
Five measurements of the diameter of a sphere (in cm) gave the sample
5.33 5.37 5.36 5.32 5.38
Find the 95% confidence interval for the diameter.
Solution
By calculation

n = 5 , x̄ = 5.352 , S = 0.0259

Hence, using tcrit = 2.78 for 5 − 1 = 4 degrees of freedom, a 95% CI for the true mean is

5.352 ± 2.78 × 0.0259/√5
= 5.352 ± 0.032
= (5.320 , 5.384)

This requires the assumption that the observations come from a normal distribution, and in
the absence of past information on similar problems it is difficult to assess whether this is
valid. Hence one can argue that the above is only very approximate, and should be regarded
as a lower bound on the true level of uncertainty. In other words, the CI probably should
be wider than that above, but how much wider we don't know. We should be at least as
uncertain about the true diameter as the above CI indicates.


10.7  When to Use Which Test

The following table summarises when the z and t tests and confidence intervals are
appropriate. Note the slight extra complication with the two-sample t test and intervals,
noted in section 10.4.
Distribution   σ²        n      Use   Accuracy
normal         known     any    z     exact
normal         unknown   any    t     exact
normal         unknown   large  z     approximate
non-normal     any       large  z     approximate
non-normal     any       small  -     -

10.8  An Explanation of Degrees of Freedom

Suppose you are asked to write 3 numbers with no restrictions imposed upon them. You
have complete freedom of choice with regard to all 3 numbers; there are 3 degrees of freedom.
Suppose now you are asked to write 3 numbers with the restriction that their sum must be
some particular value, say 20. You cannot now choose all 3 numbers freely because as soon
as the choice of the first 2 is made the third number is fixed. Your choices are governed by
the relation X1 + X2 + X3 = 20. In this situation there are only 2 degrees of freedom. The
total number of variables is 3 but the number of restrictions upon them is 1 and thus the
number of free variables is 3 − 1 = 2.
Suppose now you are asked to write 5 numbers such that their sum is 30 and also such that
the sum of the first 2 is 20. Although there are 5 variables you do not have freedom of choice
with regard to all 5. As soon as you choose the first number the second is determined by the
relation X2 = 20 − X1. Also as soon as you select X3 and X4, the last is determined by the
relation X5 = 30 − 20 − X3 − X4. The degrees of freedom are thus found by subtracting the
number of independent restrictions placed on the variables from the total number of variables,
5 − 2 = 3 in this case.
If we have a sample of size n, its variance is calculated from all n measurements via the sum

Σⁿᵢ₌₁ (xᵢ − x̄)²

However, since x̄, the mean of the sample, has already been determined, only n − 1 of these
measurements are independent. Hence the above sum is divided by n − 1 rather than n to
obtain S², and consequently the degrees of freedom associated with a single-sample t-test is
ν = n − 1.
Similarly, in the two sample case the calculation of Sp² involves two quantities, the means x̄A
and x̄B, and hence places two restrictions on the nA + nB measurements. Hence the degrees
of freedom in this case are ν = nA + nB − 2.
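The loss of one degree of freedom in the one-sample case can be seen numerically: once x̄ is fixed, the residuals must sum to zero, so the last one is determined by the others. A small sketch (using the range-table data from section 10.3):

```python
import statistics

x = [12560, 12490, 12550, 12500, 12520]
xbar = statistics.mean(x)
residuals = [xi - xbar for xi in x]

# The residuals always sum to zero, so only n - 1 of them are free.
print(sum(residuals))  # 0
```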

11  Tests and Confidence Intervals for Variances

11.1  Introduction

The variance (or standard deviation) gives the best measure of the spread of a population,
and can therefore be thought of as a measure of consistency.
The material in this section only applies if the data are observations from a normal
distribution. The central limit theorem does not apply to variances, so that even for a large
sample size the assumption of normality is needed.
One Sample Case
To check whether the variance of a population is compatible with some designated value,
the sample variance is compared with the given value. Alternatively, confidence limits may
be derived to provide a range of values in which the population variance is likely to be.
Two Sample Case
To check whether two populations have the same variance, samples would be taken from
each population and the two sample variances compared. Such a test would indicate for
example whether some modification has produced a change in consistency.
Example
Two different types of automatic cutting machine are set to cut strips of metal to a nominal length of 30cm. However, random effects and engineering limitations mean that each
machine cuts strips which have a distribution of lengths. Although both distributions may
be centred around the nominal 30cm value we would obviously prefer the machine which
gave rise to the smaller spread of lengths about this value, i.e. the distribution with the
smaller variance if this can be identified. In this context therefore we are trying to determine
whether one machine has a better precision than the other, and can use a two-sample test
to compare the variances. Further, if strips need to be between 29cm and 31cm to be usable
then clearly we want μ = 30cm (or very close) and σ² small, preferably σ < 0.5cm. This
requires a one-sample test, or confidence interval.

11.2  Fisher's F-Distribution

The basic ideas are just the same as when dealing with means, except that we usually
have to look at ratios of variances (rather than differences in means), and we use the F
distribution rather than the normal or t.
The comparison of two sample variances is made by forming the variance ratio:

S1²/S2²
If the population variances are equal then this follows Fishers F-distribution, provided that
the two populations being sampled follow normal distributions.
The F distribution is tabulated in section 15.1. Table 6 is for one-tailed tests and one-sided
CIs while table 7 is for two-tailed tests and two-sided CIs.

Values of Fcrit are given for significance levels of 5% and 1% and for degrees of freedom
ν1 = n1 − 1 and ν2 = n2 − 1, where n1, n2 are the sizes of the samples with sample variances
S1², S2² respectively. These critical values are used in tests and CIs for both one and two
sample cases.

11.3  Two Sample Tests

Two tailed tests

Two samples of size n1 and n2 give sample variances S1² and S2² respectively. Could they
reasonably both have come from populations having the same variance? Hence we test

H0 : σ1² = σ2²   v   H1 : σ1² ≠ σ2²
This procedure is only valid if both populations can be assumed to be normally distributed.
Procedure
1. Calculate

Fobs = S1²/S2²   where S1² > S2²

Note that here the suffix 1 is used for the larger of the two values of S² and the suffix 2
for the smaller, so that the tabulated values of Fcrit are always greater than unity (since
the choice of numerator and denominator is clearly arbitrary, we might as well choose the
larger sample variance to be the numerator and hence simplify the tables). Compare Fobs
with the value Fcrit selected from the F-distribution on table 7 with degrees of freedom

ν1 = n1 − 1 = numerator degrees of freedom
ν2 = n2 − 1 = denominator degrees of freedom

2. If Fobs > Fcrit then there is evidence that the two population variances are different;
otherwise there is insufficient evidence to conclude that there is any difference in variance
between the two populations.
Note: the rule of thumb for deciding whether you can pool the variances for a two-sample
t test is just an ad hoc version of this test.
Example
For the cutting machines in section 11.1, samples are taken from machines A and B giving
nA = 15 , SA = 0.63 cm , nB = 25 , SB = 0.86 cm
Assuming lengths to be normally distributed, is there any evidence of a difference in
consistency between the two machines?
We test

H0 : σA² = σB²   v   H1 : σA² ≠ σB²

and since SB > SA the test statistic is

Fobs = SB²/SA² = 0.86²/0.63² = 1.86

The degrees of freedom are

ν1 = 25 − 1 = 24
ν2 = 15 − 1 = 14

so that the critical values from table 7 are

5% : 2.79 , 1% : 3.96
Hence Fobs < Fcrit at the 5% level so we conclude that there is no evidence against H0 , and
that therefore there is no evidence that the machines have different consistencies.
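A sketch of this two-tailed F test in scipy (assumed available; for a two-tailed 5% test the critical value is the upper 2.5% point of the F distribution):

```python
from scipy.stats import f

s_a, n_a = 0.63, 15   # machine A
s_b, n_b = 0.86, 25   # machine B

f_obs = s_b**2 / s_a**2                  # larger sample variance on top
f_crit = f.ppf(0.975, n_b - 1, n_a - 1)  # upper 2.5% point, (24, 14) df
print(round(f_obs, 2))   # 1.86
print(round(f_crit, 2))  # about 2.79, as in table 7
print(f_obs < f_crit)    # True: no evidence against H0
```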
One tailed tests
As above, but this time choose the suffices 1 and 2 so that the hypotheses are

H0 : σ1² ≤ σ2²   v   H1 : σ1² > σ2²

(again, the choice of which to label as sample 1 is arbitrary so we might as well do it the
way that makes the tables simpler). Then calculate

Fobs = S1²/S2²

and compare to Fcrit selected from the F-distribution on table 6 with degrees of freedom

ν1 = n1 − 1 = numerator degrees of freedom
ν2 = n2 − 1 = denominator degrees of freedom

Note that H1 is such that we only reject H0 if Fobs is much bigger than 1, i.e. if it is in the
right-hand tail of the distribution. Hence, similarly to a one-tailed test of means in section
8.3, if Fobs < 1 then we need go no further since H0 will definitely not be rejected.
Example (continued)
If the question had been "is there any evidence that machine B is less consistent than
machine A" then the hypotheses become

H0 : σB² ≤ σA²   v   H1 : σB² > σA²

Here H1 is that B has the larger population variance, so the test statistic is B over A:

Fobs = SB²/SA² = 0.86²/0.63² = 1.86

with ν1 = 24, ν2 = 14 again, but from table 6 the critical values are

5% : 2.35 , 1% : 3.43
Hence again Fobs < Fcrit so there is no evidence against H0 .
Example (continued)
If the question had been "is there any evidence that machine B is more consistent than
machine A" then the hypotheses become

H0 : σB² ≥ σA²   v   H1 : σB² < σA²

This time H1 is that A has the larger population variance, so the test statistic is A over B:

Fobs = SA²/SB² = 0.63²/0.86² = 0.54

Since Fobs < 1 we can see that it is in the wrong tail of the distribution. Hence there is no
evidence against H0 .

11.4  One Sample Tests

A sample of size n, assumed to come from a normally distributed population, gives a sample
variance S². Could it reasonably have been taken from a population with designated variance
σ0²?

H0 : σ² = σ0²   v   H1 : σ² ≠ σ0²

This uses exactly the same procedure as in the two-sample case above. The hypothesised
variance σ0² is treated exactly like the sample variances in the two-sample case, but with
∞ degrees of freedom.
Unlike the case for means, where we may want them to be either large, within a certain
range, or small, we nearly always want variances to be small. If a process currently has
variance σ0² and we make a change to try to improve the consistency, that is decrease the
variance, we would test

H0 : σ² ≥ σ0²   v   H1 : σ² < σ0²

However, if we were changing the process for some other reason (to change the mean or
reduce the cost) but wanted to check whether this had reduced the consistency, i.e. increased
the variance, we would test

H0 : σ² ≤ σ0²   v   H1 : σ² > σ0²
The differences in performing a one-tailed test are exactly as in the two-sample case.
Example
The weights of shells from a production line are normally distributed with variance 10 g².
The machinery is upgraded and it is hoped that this will reduce the variance. A random
sample of 25 shells is taken from the production line, giving a sample variance of 5 g². Test
at the 5% level whether there is any evidence of a decrease in variance.
Here we are looking for a decrease in variance, so the hypotheses are

H0 : σ² ≥ 10   v   H1 : σ² < 10

Then

Fobs = σ0²/S² = 10/5 = 2

and Fcrit = 1.73 for 5% with ν1 = ∞ , ν2 = 25 − 1 = 24.


Hence we reject H0 and conclude that the upgrade does seem to have reduced the variance.
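The ∞-degrees-of-freedom critical value can be checked numerically: F with ν1 = ∞ and ν2 = k is equivalent to k divided by the lower 5% point of the χ² distribution with k degrees of freedom. A sketch (assuming scipy):

```python
from scipy.stats import chi2

n, s2, sigma0_sq = 25, 5.0, 10.0

f_obs = sigma0_sq / s2                    # = 2
# F(infinity, n-1) equivalence: Fcrit = (n-1)/chi2_{0.05, n-1}
f_crit = (n - 1) / chi2.ppf(0.05, n - 1)  # about 1.73, as in the tables
print(f_obs, round(f_crit, 2))
print(f_obs > f_crit)                     # True: reject H0
```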

11.5  One Sample Confidence Limits

Given a sample of size n with sample variance S², find, at some specified confidence level,
confidence limits for the population variance σ². This is only valid if the population can be
assumed to be normally distributed.
Procedure
Confidence intervals for variances, unlike those for means, are asymmetrical, and usually
extend much further above the observed S² than below it. A confidence interval for the
standard deviation can be obtained by taking square roots in the obvious way.
The upper confidence limit σU² is given by

σU² = FU S²

where FU is read from the F-distribution (table 7) at the appropriate confidence level, using
degrees of freedom ν1 = ∞, ν2 = n − 1.
The lower confidence limit σL² is given by

σL² = S²/FL

where FL is read from the same table and confidence level but using reversed degrees of
freedom ν1 = n − 1, ν2 = ∞.
With variances one-sided CIs are quite commonly of interest, because we are often only
interested in an upper limit. In this case only one calculation is needed, using table 6.
Example
A sample of 13 propellant grains gave an estimated standard deviation for burning time of
0.0137 seconds. Find the upper one-sided 95% confidence limit for the variance and hence
for the standard deviation of propellant grains of this type (in other words, how bad could
they be as regards consistency?).
Here n = 13, S² = 0.0137². From the F-distribution table at the 5% level (one tailed), with
ν1 = ∞, ν2 = 12 : FU = 2.30.
Hence the one-sided 95% upper limit on the variance σU² is given by

σU² = 2.30 × 0.0137² = 0.000431687

Hence that for the standard deviation is √0.000431687 = 0.0208, so that the one-sided CI
for the standard deviation is

(0 , 0.0208) seconds

Hence we are 95% confident that the population standard deviation is no larger than 0.0208
seconds.
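The tabulated factor FU = 2.30 and the resulting limit can be reproduced via the equivalent χ² form (a sketch, assuming scipy):

```python
from scipy.stats import chi2

n, s = 13, 0.0137   # sample size and estimated standard deviation (seconds)

# F_U with nu1 = infinity, nu2 = n-1 equals (n-1)/chi2_{0.05, n-1}.
f_u = (n - 1) / chi2.ppf(0.05, n - 1)
sigma_u = (f_u * s**2) ** 0.5
print(round(f_u, 2))      # 2.3, matching table 6
print(round(sigma_u, 4))  # 0.0208 seconds
```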


11.6  Further Examples

Example 1
According to specification, the consistency in range for the 81mm UK Mortar, given by the
standard deviation, is 40.47 metres at two-thirds of maximum range. Consistency is defined
to be the distribution of the fall of shot about the mean point of impact (mpi).
It is thought that the range consistency of a particular mortar of this type may have degraded
as a consequence of excessive wear, and to test this, a sample of 20 rounds were fired from
the mortar after bedding-in and adjustment.
The range co-ordinates (in metres) of the sample were:
3766 3670 3598 3720 3620
3620 3685 3710 3648 3765
3785 3700 3617 3750 3633
3595 3665 3800 3650 3773

Does this sample provide evidence at the 5% level to suggest that there has been a
degradation in the range consistency of the weapon? From past data we can assume that the
fall of shot follows a normal distribution.
If a degradation in range consistency is shown to have occurred, it has been decided that,
rather than scrap the weapon, it will continue in service with a revised figure for the
consistency standard deviation at two-thirds of maximum range. Calculate the 99% upper
confidence limit for the range consistency standard deviation, based on the sample data
given above.
Solution
We wish to test
H₀: σ² ≤ 40.47²
H₁: σ² > 40.47²
This is one-tailed since we are only interested in detecting evidence of an increase in the
variance.
From the data, S = 66.75 and so
F_obs = S²/σ₀² = 66.75²/40.47² = 2.72

From table 6, for a one-tailed test at the 5% level with numerator d.f. ν₁ = 20 − 1 = 19 and
denominator d.f. ν₂ = ∞, the critical value is F_crit = 1.59 (by interpolation).
Since F_obs > 1.59, there is sufficient evidence to reject H₀ in favour of H₁, and we conclude
that there has been a degradation in the range consistency of the mortar, probably as a
consequence of age and wear.
The 99% upper confidence limit σ_U² is given by
σ_U² = 2.49 × 66.75² = 11094.35
since F_U = 2.49 for ν₁ = ∞ and ν₂ = 19, one-tailed 1%. Hence
σ_U = 105.33 m
Hence we are 99% confident that the standard deviation is no more than 105.33m.
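The sample calculations for this example can be verified with a short Python sketch; the table values (1.59 and 2.49 from table 6) are still read from the tables rather than computed.

```python
import math
import statistics

# Range coordinates (metres) of the 20-round sample.
ranges = [3766, 3620, 3785, 3595, 3670, 3685, 3700, 3665,
          3598, 3710, 3617, 3800, 3720, 3648, 3750, 3650,
          3620, 3765, 3633, 3773]

sigma0 = 40.47                 # specified consistency (standard deviation)
S = statistics.stdev(ranges)   # sample standard deviation, n - 1 divisor

# One-tailed test of H0: sigma^2 <= sigma0^2.
F_obs = S**2 / sigma0**2

# 99% upper confidence limit, with F_U = 2.49 from table 6
# (one-tailed 1%, nu1 = inf, nu2 = 19).
F_U = 2.49
sd_upper = math.sqrt(F_U * S**2)

print(f"S = {S:.2f}, F_obs = {F_obs:.2f}, upper limit = {sd_upper:.2f} m")
```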
Example 2
A trial was performed to compare the consistency, in range and line, of the 51mm Assault
Mortar when fired from differing base-land. The consistency of the weapon is defined to be
the distribution of the fall of shot of a series about the mean point of impact (mpi), and is
measured by the standard deviation.
It is assumed from past experience that the distributions in range (Y) and line (X) follow
independent normal distributions, Y ~ N(μ_y, σ_y²) and X ~ N(μ_x, σ_x²), respectively.
The results obtained from part of the trial are as follows:
Meadowland (M)
Y_M  584  598  602  566  610  574  565  612  615  591
X_M  +4   +9   −7   −12  +7   +11  −9   −2   +8   −11
Sand Dune (D)
Y_D  579  566  545  591  583  600  608  551  575
X_D  +9   −6   +1   +10  −5   −11  −3   +7   −9
In both cases, the mortar was bedded-in and adjusted for a nominal range of 600 metres.
The X co-ordinate was measured as a distance in metres from a fixed line.
Is there any evidence to suggest a difference between the range consistency for the weapon on
the two types of base-land? Similarly, is there any evidence to suggest a difference between
the line consistency for the weapon on the two types of base-land?
Solution
The sample statistics are,

Meadow:  ȳ_M = 591.70 m   S_yM = 18.8033 m   n_M = 10
         x̄_M = −0.200 m   S_xM = 9.0040 m
Dune:    ȳ_D = 577.56 m   S_yD = 21.0602 m   n_D = 9
         x̄_D = −0.780 m   S_xD = 7.8863 m

Range Consistency (Y)

Since S_yD² > S_yM² we have

F_obs = 21.0602²/18.8033² = 1.2545

From the tables of the F-distribution, for a two-tailed test at the 5% level of significance
with 8 and 9 degrees of freedom, F_crit = 4.10. Since F_obs < 4.10 we conclude that there is
no evidence of a difference between the range consistency of the two base types.

Line Consistency (X)

Since S_xM > S_xD we have

F_obs = 9.004²/7.8863² = 1.3035

From the tables of the F-distribution, for a two-tailed test at the 5% level of significance
with 9 and 8 degrees of freedom, F_crit = 4.36. Since F_obs < 4.36 we conclude that there is
no evidence of a difference between the line consistency of the two base types.

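Both variance-ratio statistics can be checked with a Python sketch. The X values are entered with the signs implied by the quoted sample means, and `statistics.variance` uses the usual n − 1 divisor.

```python
import statistics

# Fall-of-shot samples from the two base types.
y_meadow = [584, 598, 602, 566, 610, 574, 565, 612, 615, 591]
y_dune   = [579, 566, 545, 591, 583, 600, 608, 551, 575]
x_meadow = [4, 9, -7, -12, 7, 11, -9, -2, 8, -11]
x_dune   = [9, -6, 1, 10, -5, -11, -3, 7, -9]

def f_ratio(a, b):
    """Ratio of the two sample variances, larger over smaller."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

F_range = f_ratio(y_dune, y_meadow)   # compare range consistency
F_line = f_ratio(x_meadow, x_dune)    # compare line consistency

print(f"F (range) = {F_range:.4f}, F (line) = {F_line:.4f}")
```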
Example 3
Two surveyors make repeated measurements of the same angle. Their results only differ in
the figure of seconds of angle, the values of which are:
A  40  29  37  41  38  35
B  38  37  40  33  32  39  40  34

Do these results give evidence of a difference between A and B in terms of consistency? You
can assume that the measurements are normally distributed.
Solution
We wish to test
H₀: σ_A² = σ_B²  v  H₁: σ_A² ≠ σ_B²
For Surveyor A, S_A² = 18.7; for Surveyor B, S_B² = 10.3.
Hence using suffix 1 for the larger S²:
S₁² = 18.7, n₁ = 6, ν₁ = 5
and suffix 2 for the smaller S²:
S₂² = 10.3, n₂ = 8, ν₂ = 7
Therefore
F_obs = 18.7/10.3 = 1.8
From table 7 (two-tailed) with ν₁ = 5, ν₂ = 7, for 5%, we have F_crit = 5.29. Since F_obs does
not exceed F_crit we conclude that there is no evidence of a difference between A and B.

As an aside, if the question had been "is there any evidence that B is more consistent than
A?" then we would have used the hypotheses
H₀: σ_A² ≤ σ_B²  v  H₁: σ_A² > σ_B²
In this case we specifically look for evidence that B is more consistent, and hence has a
lower variance. The one-tailed value from table 6 is F_crit = 3.97, but F_obs is still lower than
this so again we have no evidence against H₀.
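A minimal Python sketch of the two-sample variance ratio for this example:

```python
import statistics

# Seconds-of-angle readings for the two surveyors.
A = [40, 29, 37, 41, 38, 35]
B = [38, 37, 40, 33, 32, 39, 40, 34]

var_A = statistics.variance(A)   # sample variance, n - 1 divisor
var_B = statistics.variance(B)

# Two-tailed F test: larger sample variance over smaller.
F_obs = max(var_A, var_B) / min(var_A, var_B)

print(f"S_A^2 = {var_A:.1f}, S_B^2 = {var_B:.1f}, F_obs = {F_obs:.1f}")
```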


Example 4
A production line makes components which should be 6.30 cm long, so we require that
μ = 6.30. Additionally, it is required that at least 90% of components are between 6.27 and
6.33 cm long for the process to be in control.
If the lengths can be assumed to be normally distributed, and a random sample of 20 items
gives a standard deviation of S = 0.026, is there any evidence that the process is out of
control?
Solution
From table 2, 90% of items are within 1.64 standard deviations of the mean, so the
requirement to be in control is

1.64 σ ≤ 0.03

σ ≤ 0.03/1.64 = 0.018

Hence, the process is in control if σ ≤ 0.018. Therefore σ₀ = 0.018 and we test

H₀: σ² ≤ 0.018²  v  H₁: σ² > 0.018²

Hence

F_obs = S²/σ₀² = 0.026²/0.018² = 2.086

From table 6 (one-tailed), with ν₁ = 19 and ν₂ = ∞,

F_crit = 1.59 (5%) ,  F_crit = 1.91 (1%)

(by interpolation). Hence F_obs > F_crit for a 1% test, so we conclude that there is strong
evidence against H₀, i.e. that the process is out of control.
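The derivation of σ₀ and the test statistic can be sketched in Python (the critical values are still read from tables 2 and 6):

```python
# Requirement: 90% of lengths within +/- 0.03 cm of the mean, so
# 1.64 * sigma <= 0.03 (1.64 is the two-tailed 10% value from table 2).
sigma0 = 0.03 / 1.64       # approx 0.018, as in the text
S = 0.026                  # sample standard deviation, n = 20

F_obs = S**2 / 0.018**2    # using the rounded sigma0 = 0.018

print(f"sigma0 = {sigma0:.4f}, F_obs = {F_obs:.3f}")
```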


12  Tests and Intervals for Binomial Proportions

12.1  Introduction

Suppose r successes are observed in n repeated independent trials (the term "success" being
used to describe the attribute of interest). If the sample of size n may be regarded as being
drawn from a population having proportion p of successes then p̂ = r/n gives the best
estimate of p. The parameter p may of course equally well be interpreted as the probability
of a success in an individual trial.
We can use the estimate p̂ to produce a CI for the true success probability p, or to test a
hypothesised value p₀.

12.2  Large Sample Case

For large n we can use the normal approximation to do this. If X ~ Bi(n, p) then, from
section 6.6,
X ≈ N(np, np(1 − p))
Viewed as a random variable, p̂ is just X/n so that, by the results in section 6.4, its variance
is 1/n² times that of X. Hence
p̂ = X/n ≈ N(p, p(1 − p)/n)
This is appropriate whenever the normal approximation is valid (section 6.6).

Confidence Intervals
To construct a CI we need to estimate p with p̂ in the variance above. Therefore, if n trials
produce r successes, then p̂ = r/n and a 95% CI for p is

p̂ ± 1.96 √( p̂(1 − p̂)/n )

Example
If a medical treatment has a 70% success rate in a trial, calculate a 95% confidence interval
for p for sample sizes of 50, 100 and 1000.
Since p̂ = 0.7 in each case, the CI is

0.7 ± 1.96 √( 0.7 × 0.3/n )

for n = 50, 100, 1000. This gives:

n = 50:   (0.573 , 0.827)
n = 100:  (0.610 , 0.790)
n = 1000: (0.672 , 0.728)

Note that n = 50 is the smallest sample for which this approximation is valid.
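The large-sample interval is easy to compute directly; a minimal Python sketch reproducing the three intervals above:

```python
import math

def binomial_ci(p_hat, n, z=1.96):
    """Large-sample 95% CI for a binomial proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)

for n in (50, 100, 1000):
    lo, hi = binomial_ci(0.7, n)
    print(f"n = {n}: ({lo:.3f} , {hi:.3f})")
```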
Tests
Similarly a test of H₀: p = p₀ would use the test statistic

z_obs = (p̂ − p₀) / √( p₀(1 − p₀)/n )

and compare to normal tables in the usual way. (Note that this uses p₀ in the variance,
since the test statistic is calculated under the assumption that H₀ is true.)

12.3  Small Sample Case

If the normal approximation is not appropriate then we calculate the p-value directly using
the binomial distribution. (See next section).

12.4  Examples

Example 1
A new type of explosive charge is being tested, but they are so expensive that only six such
charges are available for test detonations. In a test under extreme conditions, five destroy
their target and one does not. We will only consider the charge for future use if the chance
of target destruction is greater than 0.4. Test at the 5% level.
In this case the burden of proof must be with the manufacturers of the charges, who need
to convince us that their product conforms to specification. Hence the hypotheses are
H0 : p 0.4
H1 : p > 0.4
The sample estimate is 5 successes out of 6, so clearly the normal approximation is not
appropriate.
Assuming p = 0.4, the chance of obtaining 5 or more successes is given by the binomial
distribution with n = 6, p = 0.4, q = 0.6:
chance of 6 successes:  p⁶ = 0.0041
chance of 5 successes:  np⁵q = 0.0369
Hence the chance of 5 or 6 successes is 0.0041 + 0.0369 = 0.0410.


When testing at the 5% level, we are saying that we will reject H0 if the chance of observing
what we did, or something even less compatible with H0 , is less than 0.05.
The above shows that this probability (the p-value) is 0.0410. Since this is less than 0.05,
we therefore reject H0 and conclude that the test result is not consistent with a value as
low as 0.4, so that the charge appears to be performing to specification.

If instead we had had n = 100 charges with r = 50 successes, we could work out the
probability directly as above, but it is much easier to use the normal approximation as in
section 12.2. Since p̂ = 0.5 and p₀ = 0.4, the test statistic is

z_obs = (0.5 − 0.4) / √( 0.4 × 0.6/100 ) = 2.04124

Comparing to critical values from normal tables, zobs > 1.64 so that this is significant at the
5% level (one-tailed) and hence there is fairly strong evidence against H0 .
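Both the exact binomial p-value and the large-sample z statistic can be computed directly; a Python sketch:

```python
import math

# Exact binomial p-value: P(5 or 6 successes) with n = 6, p = 0.4.
n, p = 6, 0.4
p_value = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in (5, 6))

# Large-sample version: n = 100, r = 50, testing p0 = 0.4.
p_hat, p0, n_large = 0.5, 0.4, 100
z_obs = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_large)

print(f"exact p-value = {p_value:.4f}, z_obs = {z_obs:.5f}")
```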
Example 2
We return to example 2 in section 4.5. A sample of n = 1000 gave an estimate of p̂ = 0.732.
We can construct a 95% confidence interval as

0.732 ± 1.96 √( 0.732 × 0.268/1000 ) = (0.70455 , 0.75945)

The factory claimed that at least 72% of its production works correctly, but the CI contains
values below 0.72, suggesting that one would not be entirely convinced by this claim.
You might argue that a better approach here would be to test H₀: p ≥ 0.72 v H₁: p < 0.72,
but this seems rather dubious since the claim that p ≥ 0.72 is simply an assertion, rather
than something which it makes sense to believe in the absence of evidence to the contrary.
Hence it is not really a valid choice for H₀. A more sensible test would use H₀: p ≤ 0.72
instead, placing the burden of proof with the company, as in the previous example.
A common approach in process control is that for a new product the null hypothesis is that
the product does not conform to the required specifications, whereas for a well-established
product the null hypothesis is that new items from the production line do still conform to
the required specifications.


13  Comparison of Observed and Predicted Frequencies

13.1  Introduction

This commonly arises in 2 ways:


1. We wish to check our assumptions. For example, do data seem to come from a
particular distribution?
2. We wish to see if two factors are associated, by testing the null hypothesis that they
are not.
Comparison between the observed results and those predicted on the basis of some hypothesis is made by comparing frequencies. Calculate expected or predicted frequencies, by
assuming that the hypothesis is correct. Compare these to observed frequencies by calculating a measure of discrepancy between the two. Reject the null hypothesis if this measure is
sufficiently large. (Recall section 4.5 example 2).
The Measure of Discrepancy
Suppose that we have n observations, split into c classes.
The observed frequencies for each class are O1 , O2 , . . . , Oc , so that
O1 + O2 + . . . + Oc = n
On the basis of our hypothesis, we find predicted/expected frequencies for each class, E1 , E2 , . . . , Ec ,
where
E1 + E2 + . . . + Ec = n
Then the discrepancy (lack of fit) statistic is

X² = (O₁ − E₁)²/E₁ + ... + (O_c − E_c)²/E_c
This will be large if the hypothesis is false.

13.2  The χ² Distribution

The discrepancy measure X² follows, to a good approximation, the χ² distribution tabulated
in section 15.1, table 5. This table gives values for significance levels between 10% and
0.1%. The value depends also on the degrees of freedom ν, which represents the number of
independent frequency comparisons made.
The effect of constraining the predicted frequencies to add up to n is to reduce the value
of ν to c − 1. It is further reduced if, for example, use is made of estimates from the
sample (e.g. the sample mean) in calculating the E values.

If H₀ is true then the calculated X² is an observation from the χ² distribution with degrees
of freedom
ν = c − number of times sample data were used in calculating the E_i values
  = c − 1 − number of parameters estimated
Values of degrees of freedom for various typical problems will be found in the examples.
This is only an approximate application of the χ² distribution. The approximation improves
as predicted frequencies become larger, and it becomes unsatisfactory for values of E_i less
than about 5. If this occurs, sometimes such small frequency classes can be combined with
neighbouring ones until a sufficiently high value is reached. If this cannot be done, an exact
distribution, requiring special tables, must be used for the test of goodness of fit.
Rule of thumb: the approximation is reasonable provided all E_i are ≥ 1 and 80% of them
are ≥ 5. (Some people argue for the more stringent criterion of all E_i ≥ 5.)
These tests are best illustrated by examples.

13.3  Goodness of Fit Tests

Example from section 5.4


We have 500 ammo boxes containing a total of 250 defectives, as shown in the table, columns (a) and (b).
Are defectives occurring randomly?
If they are, then the observed frequencies should follow the Poisson distribution, so we test
H₀: data come from a Poisson distribution
H₁: they don't
First calculate the sample mean as x̄ = 250/500 = 0.5 and use this to estimate the mean of
the distribution, so that m̂ = 0.5. Using this as the mean of the Poisson distribution, derive
the probabilities for 0, 1, 2, ... defectives from the formula for the Poisson distribution and
multiply by n = 500 to give the predicted frequencies of column (c) below.
P(0) = e^−0.5 = 0.6065
P(1) = 0.5 e^−0.5 = 0.3033
P(2) = (0.5²/2!) e^−0.5 = 0.0758
P(≥3) = 1 − P(0) − P(1) − P(2) = 0.0144
Note that we must combine the last three classes to give a total predicted frequency of at
least 5 (since if we had separate classes for 4 and 5 or more then this would not be the
case).


Number of     Observed      Predicted       (O − E)²/E
Defectives    Frequency O   Frequency E
                            (Poisson)
(a)           (b)           (c)             (d)

0             309           303.265         0.108
1             142           151.633         0.612
2             40            37.908          0.115
3, 4, 5+      8 + 1 + 0 = 9 7.200           0.450

Total         500           500             1.285

The number of degrees of freedom is given by c − 1 − 1:

 4   number of frequency comparisons made
−1   agreement of total frequency
−1   use of estimated mean
 2

From the χ² distribution table, with ν = 2, 5% level of significance, χ²_crit = 5.991. The
observed X² calculated in the last column has value 1.285. Since X² < χ²_crit there is
insufficient evidence to cast any doubt on the hypothesis that the data follow a Poisson
distribution and hence the hypothesis stands.
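A Python sketch of this goodness-of-fit computation. Using unrounded Poisson probabilities gives X² ≈ 1.29 rather than the 1.285 obtained from the rounded values in the text.

```python
import math

observed = [309, 142, 40, 9]          # 0, 1, 2, and "3 or more" defectives
n = 500
m = 0.5                                # estimated Poisson mean, 250/500

# Poisson probabilities for 0, 1, 2, then the tail lumped together.
probs = [math.exp(-m) * m**k / math.factorial(k) for k in range(3)]
probs.append(1 - sum(probs))
expected = [n * pr for pr in probs]

X2 = sum((o - e)**2 / e for o, e in zip(observed, expected))
print(f"X^2 = {X2:.3f}")               # compare to chi-squared with nu = 2
```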

13.4  Contingency Table Test of Association

A sample of 10000 persons is examined for lung cancer and it is found that 100 of these
have the disease. Each individual is also classified as a smoker or non-smoker. The results
are tabulated below:
(Observed)      Cancer    OK      Total
Smokers         90        6910    7000
Non-smokers     10        2990    3000
Total           100       9900    10000

Do these figures indicate that there is any association between smoking and lung cancer?
Now, if the factors are not associated then we can calculate expected frequencies E using
P(A and B) = P(A) × P(B)
where A and B are the row and column category. Hence
P(Smoker AND Cancer) = (7000/10000) × (100/10000)
so the expected number of such people is the above × 10000, i.e.
E = (7000 × 100)/10000 = 70
Similarly for smoker and OK,
E = (7000 × 9900)/10000 = 6930
In general
E = (row total × column total)/overall total

(Expected)      Cancer    OK      Total
Smokers         70        6930    7000
Non-smokers     30        2970    3000
Total           100       9900    10000

Then
X² = 20²/70 + 20²/30 + 20²/6930 + 20²/2970 = 19.24

From the formula for degrees of freedom in section 13.2, if we have r rows and c columns
then
ν = rc − 1 − (r − 1) − (c − 1) = (r − 1)(c − 1)
Hence for tests of this type
degrees of freedom = (no. of rows − 1) × (no. of columns − 1)
Here
degrees of freedom = (2 − 1) × (2 − 1) = 1
From table 5 with ν = 1:
0.1% critical value   10.830
1% critical value      6.635
5% critical value      3.841

Observed X² is significant even at the 0.1% level. Hence reject H₀: there is strong evidence of
an association. More smokers have cancer than would be expected under a hypothesis of no
association.
Note: For 2×2 tables like this there is in fact a slight modification to the standard procedure
which improves the approximation.
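The expected frequencies and X² for the 2×2 table can be computed generically from the row and column totals; a Python sketch (without the small-table modification mentioned above):

```python
# Observed 2x2 table: rows = smokers/non-smokers, cols = cancer/OK.
observed = [[90, 6910],
            [10, 2990]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# E = row total * column total / overall total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

X2 = sum((observed[i][j] - expected[i][j])**2 / expected[i][j]
         for i in range(2) for j in range(2))

print(f"X^2 = {X2:.2f}")   # compare to chi-squared with 1 d.f.
```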


13.5  Further Examples

Example
We return to example 2 in section 4.5. The observed and expected frequencies calculated
there can be compared using the method in section 13.3. We need to combine the 0–5
categories into one for the χ² approximation to be valid, and hence obtain

X² = (8 − 10.09)²/10.09 + (12 − 16.67)²/16.67 + ... + (0 − 4.42)²/4.42
   = 9.3805

There are 6 categories and an estimate of p was used to calculate the expected frequencies,
so the χ² has 6 − 1 − 1 = 4 degrees of freedom. Hence if
H₀: data come from a binomial distribution
is correct then 9.3805 is an observation from a χ² distribution with 4 degrees of freedom.
From table 5 the 5% critical value is 9.488 so that we almost, but not quite, reject H₀.
Hence we would be fairly equivocal about whether the data really do follow a binomial
distribution.


14  Correlation and Regression

14.1  Introduction

In many cases we wish to know whether two variables (e.g. height and weight) are related
to each other, based on a sample of data
(x_i , y_i) ,  i = 1, 2, 3, ..., n
In other words there are n individuals giving n pairs of observations.
There are many ways to assess this, depending on the types of variables and the type of
relationship. Here we introduce the two simplest ways.

14.2  Correlation

This section deals only with the Pearson (product moment) correlation coefficient. Other
correlation coefficients (Spearman, Kendall) also exist.
The sample (Pearson) correlation coefficient r_xy (or just r), is given by

r_xy = Σ(x_i − x̄)(y_i − ȳ) / √( Σ(x_i − x̄)² Σ(y_i − ȳ)² ) = S_xy / √( S_xx S_yy )

(see AMOR formula book). This is standardised so that it is always between -1 (perfect negative relationship) and +1 (perfect positive relationship). A value close to zero corresponds
to no (apparent) relationship.
A positive correlation indicates that large values of y tend to go with large values of x and
small values of y with small values of x. Similarly a negative correlation indicates that large
values of y tend to go with small values of x and small values of y with large values of x.
Note that the correlation coefficient only assesses a linear relationship, and can be meaningless if the variables are related non-linearly. For example, if the relationship is strong but
quadratic then the correlation coefficient will be close to zero.
Testing the significance of the correlation coefficient
The population correlation coefficient is usually denoted ρ, and it is natural to use the
sample correlation coefficient r to test whether there is any evidence of a (linear) relationship
between the variables. This translates into the hypotheses
H₀: ρ = 0  v  H₁: ρ ≠ 0
Table 4 gives critical values for 5% and 1% tests. For the degrees of freedom ν = n − 2, find
the critical value r_crit from the table and reject H₀ if
|r| > r_crit
Obviously if r > r_crit we conclude that there is evidence of a positive correlation and if
r < −r_crit we conclude that there is evidence of a negative correlation.

Example
To determine the relationship between normal stress and the shear resistance of a substance,
a shear box experiment was performed, giving the following results:
Normal stress x (psi)       11    13    15    17    19    21
Shear resistance y (psi)    15.2  17.7  19.3  21.5  23.9  25.4

In order to use the formulae we calculate

Σx_i = 96 ,  Σy_i = 123.0 ,  Σx_i y_i = 2039.8 ,  Σx_i² = 1606 ,  Σy_i² = 2595.44

so that
x̄ = 16.0 ,  ȳ = 20.5
and hence

r_xy = (2039.8 − 6 × 16.0 × 20.5) / √( (1606 − 6 × 16.0²)(2595.44 − 6 × 20.5²) ) = 0.998

The critical value for a 1% test with ν = 6 − 2 = 4 is r_crit = 0.92 so that r > r_crit and so
there is strong evidence of a (positive) correlation between x and y, even from such a small
sample.
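A Python sketch of the correlation computation for this data set:

```python
import math

x = [11, 13, 15, 17, 19, 21]
y = [15.2, 17.7, 19.3, 21.5, 23.9, 25.4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and products.
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
Sxx = sum(xi**2 for xi in x) - n * x_bar**2
Syy = sum(yi**2 for yi in y) - n * y_bar**2

r = Sxy / math.sqrt(Sxx * Syy)
print(f"r = {r:.3f}")   # compare |r| to table 4 with nu = n - 2 = 4
```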
Important cautionary note
However, it is important to realise that correlation is not the same as causation. Just because
y and x are correlated, it does not mean that x causes y, or vice versa. It is often the
case that an apparent relationship is the result of y and x both being caused by another
variable which we have not measured.
To take a deliberately silly example, we might measure many behavioural variables on a
sample of people and calculate the correlations between them. If two of these variables are
"frequency of playing basketball" and "frequency of hitting your head on door frames" then
we would probably find a positive correlation between them. We might then conclude either
that hitting your head makes you want to play basketball or that playing basketball makes
you careless so that you keep hitting your head. Of course, tall people are much more likely
both to play basketball and to hit their heads, so that it is likely to be this third variable
which induces the correlation between the other two.
When the correlation between smoking and lung cancer was first noticed the cigarette companies made this point, that correlation is not the same as causation. However, in that case
the medical evidence demonstrating a causal link was soon found.


14.3  Regression

This tries to expand on the ideas above, not just measuring whether there is a relationship,
but trying to model it explicitly.
In general we have a response variable y and an explanatory variable x, and we try to
model the relationship between them. However, we need to allow for the fact that there will
be random variation, so that two observations of y at the same value of x will not produce
the same results.
Fitting a simple linear regression line
The usual way to determine the underlying trend in a set of noisy data (x_i, y_i) is to draw
the "best" line through the experimental points. If one suspects a linear dependency of y on
x then one would try to draw the best straight line
y = α + βx
through the points and hence calculate the values of the slope β and intercept α which
characterise this line.
Unfortunately, the best line for one person is not necessarily the best line for another
unless both adopt the same convention for defining "best". In general in data-fitting the
best fit is to choose the line, the regression line, which minimises the sum of the squares
of the vertical distances from the observed points to the fitted line.
We now draw the distinction between α and β, which are population parameters defining
the "true" line, and their estimates from the sample, α̂ and β̂.
Hence we choose α̂ and β̂ to minimise the quantity

Σᵢ (y_i − y_model)²

where, for a hypothesised straight line relationship, y_model = α̂ + β̂x_i.
The expressions for these estimates in terms of the n experimental points (x_i, y_i) are given
by

β̂ = (Σx_i y_i − n x̄ȳ) / (Σx_i² − n x̄²) = S_xy / S_xx
α̂ = ȳ − β̂x̄

where the summations in each case are over all n observations (see formula book).
The fitted line

ŷ = α̂ + β̂x

represents our estimate of the mean value of y at any given value of x.
Note that these formulae will fit the best straight line even if the relationship is not a
straight line at all, so it is important to check this. If the dependency of y on x is thought
to be a quadratic function then we can instead fit
y_model = α + βx + γx²
In this case minimisation of the sum of the squares of the distances from the observed
points to the line again defines estimates of the three parameters α, β, γ in terms of the set
(x_i, y_i), i = 1, 2, ..., n. These expressions are somewhat more complicated than in the case
of a straight line model.
Example (continued)
In this test the value of x is specified by the experimenter, while y can be viewed as a
response to x. We assume there is a relationship of the form
y = α + βx
between these variables. (Normally there is not necessarily any physical interpretation of
the intercept and slope terms but here we can say that α is the cohesion of the substance
and β = tan φ where φ is the angle of friction).
Now,

β̂ = (2039.8 − 6 × 16.0 × 20.5) / (1606 − 6 × 16.0²) = 1.02571
α̂ = 20.5 − 1.02571 × 16.0 = 4.08864

Hence the fitted line is

ŷ = 4.08864 + 1.02571x

Therefore, for any given stress value x, our estimate of the mean shear resistance is given
by the above expression.
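A Python sketch of the least-squares fit. Keeping full precision for the slope gives an intercept of 4.08857, which differs in the fourth decimal place from the 4.08864 obtained from the rounded slope in the text.

```python
x = [11, 13, 15, 17, 19, 21]
y = [15.2, 17.7, 19.3, 21.5, 23.9, 25.4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates: beta = Sxy/Sxx, alpha = y_bar - beta * x_bar.
beta = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) \
       / (sum(xi**2 for xi in x) - n * x_bar**2)
alpha = y_bar - beta * x_bar

print(f"fitted line: y = {alpha:.5f} + {beta:.5f} x")
```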
Confidence limits
Since the values of α̂ and β̂ are only estimates based on a sample of all possible measurements
which could have been made, the regression line based on these estimates is itself only an
estimate of the mean value of y at any value of x, and not necessarily the true value. The
question therefore arises: what confidence can we have in our estimate of the true equation
of the line?
We have already seen that the variance of an estimator of a mean enables us to form
confidence limits for the mean of a population. In a similar way we can define mathematically
the variances of the estimators α̂, β̂ and ŷ, and thus place confidence intervals on these
quantities.
The actual formulae for these variances and confidence intervals are rather involved and are
given in the formula book. They are based on the assumption that the variation in y values
for any particular value of x can be described by a normal distribution with mean zero and
variance σ².
Example (continued)

The formulae in the book give the following estimates of the variances of ŷ, α̂ and β̂:

ŷ: 0.073429;   α̂: 0.280778;   β̂: 0.001049

Then 95% confidence intervals for α and β can be created using the t distribution critical
values with n − 2 degrees of freedom (ν = 4 and t_crit = 2.78 in this case):

α: (2.6155 , 5.5617)
β: (0.9357 , 1.1158)

Note that if the confidence interval for β includes the value β = 0 then we would conclude
that there was no evidence that we should include this second parameter in our model
y = α + βx. Consequently this could be reduced to y = α, or in other words y does not
depend on x.
Confidence limits for the mean value of y at a specified value x₀ of x are given by

4.0886 + 1.0257x₀ ± 2.78 √( 0.073429 (1/6 + (x₀ − 16)²/70) )

We can similarly define a tolerance interval which is in effect confidence limits for a single
new observation at a specified value x₀ of x:

4.0886 + 1.0257x₀ ± 2.78 √( 0.073429 (1 + 1/6 + (x₀ − 16)²/70) )

Often this is the main thing we want to know, i.e. if we take a new observation of y at
x = x₀, in what range of values do we think it will fall?


15  Reference Material

15.1  Tables

1. Table of normal distribution


2. Table of normal critical values
3. Table of t critical values
4. Table of critical values of correlation coefficient
5. Table of critical values of χ² distribution
6. Table of one-tailed critical values of F distribution
7. Table of two-tailed critical values of F distribution

15.2  Tutorial exercises

Exercises 1: Simple and conditional probability


Exercises 2: Binomial and Poisson probability
Exercises 3: Normal probability
Exercises 4: Large sample tests and confidence intervals
Exercises 5: Small sample tests and confidence intervals
Exercises 6: Consistency, goodness of fit and regression


TABLE 1

Areas Under the Standard Normal Distribution, N(0,1)

The figures in the table are the area Q which represents
the probability of finding a value between 0 and z.
For other normal distributions, N(μ, σ²), convert the
scale value x to z using z = (x − μ)/σ. Q is then the
probability of finding a value between μ and x.
z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0    0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1    0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2    0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3    0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4    0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5    0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224

0.6    0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7    0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8    0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9    0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0    0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621

1.1    0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2    0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3    0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4    0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5    0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441

1.6    0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7    0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8    0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9    0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0    0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817

2.1    0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2    0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3    0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4    0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5    0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952

2.6    0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7    0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8    0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9    0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0    0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

3.1    0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2    0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3    0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4    0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998

TABLE 2

Critical Points of the Standard Normal Distribution

These tables give the critical values of z which determine the tail or tails of the distribution
which contain the percentage α of the population.

2a. Two tails

α (%) =   10    5     2     1     0.5   0.2   0.1
z_crit =  1.64  1.96  2.33  2.58  2.81  3.09  3.29

2b. One tail

α (%) =   10    5     2     1     0.5   0.2   0.1
z_crit =  1.28  1.64  2.05  2.33  2.58  2.87  3.09

TABLE 3

Critical Points of the t Distribution

One tail (%) =     5      2.5    1      0.5    0.1      0.05
Two tails (%) =    10     5      2      1      0.2      0.1

ν = 1     6.31   12.71  31.82  63.66  318.29  636.58
    2     2.92    4.30   6.96   9.92   22.33   31.60
    3     2.35    3.18   4.54   5.84   10.21   12.92
    4     2.13    2.78   3.75   4.60    7.17    8.61
    5     2.02    2.57   3.36   4.03    5.89    6.87

    6     1.94    2.45   3.14   3.71    5.21    5.96
    7     1.89    2.36   3.00   3.50    4.79    5.41
    8     1.86    2.31   2.90   3.36    4.50    5.04
    9     1.83    2.26   2.82   3.25    4.30    4.78
    10    1.81    2.23   2.76   3.17    4.14    4.59

    12    1.78    2.18   2.68   3.05    3.93    4.32
    15    1.75    2.13   2.60   2.95    3.73    4.07
    20    1.72    2.09   2.53   2.85    3.55    3.85
    24    1.71    2.06   2.49   2.80    3.47    3.75
    30    1.70    2.04   2.46   2.75    3.39    3.65

    40    1.68    2.02   2.42   2.70    3.31    3.55
    50    1.68    2.01   2.40   2.68    3.26    3.50
    70    1.67    1.99   2.38   2.65    3.21    3.43
    100   1.66    1.98   2.36   2.63    3.17    3.39
    ∞     1.64    1.96   2.33   2.58    3.09    3.29

Notes:
For a one sample test, ν = n − 1
For a two sample test, ν = n₁ + n₂ − 2
When n is large, this distribution approaches N(0,1).

TABLE 4

Critical Points of the Correlation Coefficient

ν       α = 5%   α = 1%
1       1.00     1.00
2       0.95     0.99
3       0.88     0.96
4       0.81     0.92
5       0.75     0.87
6       0.71     0.83
7       0.67     0.80
8       0.63     0.77
9       0.60     0.74
10      0.58     0.71
12      0.53     0.66
14      0.50     0.62
16      0.47     0.59
18      0.44     0.56
20      0.42     0.54
22      0.40     0.52
24      0.39     0.50
26      0.37     0.48
28      0.36     0.46
30      0.35     0.45
40      0.30     0.39
50      0.27     0.35
60      0.25     0.33
70      0.23     0.30
80      0.22     0.28
90      0.21     0.27
100     0.20     0.25
200     0.14     0.18
500     0.09     0.12
1000    0.06     0.08

Note. The degrees of freedom are ν = number of data points − 2.

TABLE 5: CRITICAL VALUES FOR χ² DISTRIBUTION

α (%) =  10      5       2.5     1       0.1

ν = 1    2.706   3.841   5.024   6.635   10.83
    2    4.605   5.991   7.378   9.210   13.82
    3    6.251   7.815   9.348   11.35   16.27
    4    7.779   9.488   11.14   13.28   18.47
    5    9.236   11.07   12.83   15.09   20.52
    6    10.64   12.59   14.45   16.81   22.46
    7    12.02   14.07   16.01   18.48   24.32
    8    13.36   15.51   17.53   20.09   26.12
    9    14.68   16.92   19.02   21.67   27.88
    10   15.99   18.31   20.48   23.21   29.59
    11   17.28   19.68   21.92   24.72   31.26
    12   18.55   21.03   23.34   26.22   32.91
    13   19.81   22.36   24.74   27.69   34.53
    14   21.06   23.68   26.12   29.14   36.12
    15   22.31   25.00   27.49   30.58   37.70
    16   23.54   26.30   28.85   32.00   39.25
    17   24.77   27.59   30.19   33.41   40.79
    18   25.99   28.87   31.53   34.81   42.31
    19   27.20   30.14   32.85   36.19   43.82
    20   28.41   31.41   34.17   37.57   45.31
    21   29.62   32.67   35.48   38.93   46.80
    22   30.81   33.92   36.78   40.29   48.27
    23   32.01   35.17   38.08   41.64   49.73
    24   33.20   36.42   39.36   42.98   51.18
    25   34.38   37.65   40.65   44.31   52.62
    26   35.56   38.89   41.92   45.64   54.05
    27   36.74   40.11   43.19   46.96   55.48
    28   37.92   41.34   44.46   48.28   56.89
    29   39.09   42.56   45.72   49.59   58.30
    30   40.26   43.77   46.98   50.89   59.70
    40   51.81   55.76   59.34   63.69   73.40
    50   63.17   67.50   71.42   76.15   86.66
    60   74.40   79.08   83.30   88.38   99.61
    70   85.53   90.53   95.02   100.4   112.3
    80   96.58   101.9   106.6   112.3   124.8
    90   107.6   113.1   118.1   124.1   137.2
    100  118.5   124.3   129.6   135.8   149.4

TABLE 6: ONE-TAILED CRITICAL VALUES FOR THE F DISTRIBUTION

ν1 (numerator degrees of freedom) across; ν2 (denominator degrees of freedom) down.
Upper entry: 5% point. Lower entry: 1% point.

ν2     ν1= 1      2      3      4      5      6      7      8      9     10     12     15     20     24      ∞
 1  5%  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  243.9  246.0  248.0  249.1  254.3
    1%   4052   5000   5403   5625   5764   5859   5928   5982   6022   6056   6106   6157   6209   6235   6366
 2  5%  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.41  19.43  19.45  19.45  19.50
    1%  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.42  99.43  99.45  99.46  99.50
 3  5%  10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.74   8.70   8.66   8.64   8.53
    1%  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  27.05  26.87  26.69  26.60  26.13
 4  5%   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.91   5.86   5.80   5.77   5.63
    1%  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.37  14.20  14.02  13.93  13.46
 5  5%   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.68   4.62   4.56   4.53   4.36
    1%  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.89   9.72   9.55   9.47   9.02
 6  5%   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.00   3.94   3.87   3.84   3.67
    1%  13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.72   7.56   7.40   7.31   6.88
 7  5%   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.57   3.51   3.44   3.41   3.23
    1%  12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.47   6.31   6.16   6.07   5.65
 8  5%   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.28   3.22   3.15   3.12   2.93
    1%  11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.67   5.52   5.36   5.28   4.86
 9  5%   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.07   3.01   2.94   2.90   2.71
    1%  10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   5.11   4.96   4.81   4.73   4.31
10  5%   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.91   2.85   2.77   2.74   2.54
    1%  10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.71   4.56   4.41   4.33   3.91
11  5%   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85   2.79   2.72   2.65   2.61   2.40
    1%   9.65   7.21   6.22   5.67   5.32   5.07   4.89   4.74   4.63   4.54   4.40   4.25   4.10   4.02   3.60
12  5%   4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75   2.69   2.62   2.54   2.51   2.30
    1%   9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.16   4.01   3.86   3.78   3.36
13  5%   4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67   2.60   2.53   2.46   2.42   2.21
    1%   9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   3.96   3.82   3.66   3.59   3.17
14  5%   4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60   2.53   2.46   2.39   2.35   2.13
    1%   8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.80   3.66   3.51   3.43   3.00
15  5%   4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.48   2.40   2.33   2.29   2.07
    1%   8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.67   3.52   3.37   3.29   2.87
16  5%   4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49   2.42   2.35   2.28   2.24   2.01
    1%   8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.55   3.41   3.26   3.18   2.75
17  5%   4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45   2.38   2.31   2.23   2.19   1.96
    1%   8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.46   3.31   3.16   3.08   2.65
18  5%   4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41   2.34   2.27   2.19   2.15   1.92
    1%   8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.37   3.23   3.08   3.00   2.57
19  5%   4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38   2.31   2.23   2.16   2.11   1.88
    1%   8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.30   3.15   3.00   2.92   2.49
20  5%   4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.28   2.20   2.12   2.08   1.84
    1%   8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.23   3.09   2.94   2.86   2.42
21  5%   4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32   2.25   2.18   2.10   2.05   1.81
    1%   8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.17   3.03   2.88   2.80   2.36
22  5%   4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30   2.23   2.15   2.07   2.03   1.78
    1%   7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   3.12   2.98   2.83   2.75   2.31
23  5%   4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27   2.20   2.13   2.05   2.01   1.76
    1%   7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   3.07   2.93   2.78   2.70   2.26
24  5%   4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25   2.18   2.11   2.03   1.98   1.73
    1%   7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   3.03   2.89   2.74   2.66   2.21
 ∞  5%   3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83   1.75   1.67   1.57   1.52   1.00
    1%   6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41   2.32   2.18   2.04   1.88   1.79   1.00

TABLE 7: TWO-TAILED CRITICAL VALUES FOR THE F DISTRIBUTION

ν1 (numerator degrees of freedom) across; ν2 (denominator degrees of freedom) down.
Upper entry: 5% point (two-tailed). Lower entry: 1% point (two-tailed).

ν2     ν1= 1      2      3      4      5      6      7      8      9     10     12     15     20     24      ∞
 1  5%  647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7  963.3  968.6  976.7  984.9  993.1  997.2   1018
    1%  16211  20000  21615  22500  23056  23437  23715  23925  24091  24224  24426  24630  24836  24940  25465
 2  5%  38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.39  39.40  39.41  39.43  39.45  39.46  39.50
    1%  198.5  199.0  199.2  199.2  199.3  199.3  199.4  199.4  199.4  199.4  199.4  199.4  199.4  199.5  199.5
 3  5%  17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.47  14.42  14.34  14.25  14.17  14.12  13.90
    1%  55.55  49.80  47.47  46.19  45.39  44.84  44.43  44.13  43.88  43.69  43.39  43.08  42.78  42.62  41.83
 4  5%  12.22  10.65   9.98   9.60   9.36   9.20   9.07   8.98   8.90   8.84   8.75   8.66   8.56   8.51   8.26
    1%  31.33  26.28  24.26  23.15  22.46  21.97  21.62  21.35  21.14  20.97  20.70  20.44  20.17  20.03  19.32
 5  5%  10.01   8.43   7.76   7.39   7.15   6.98   6.85   6.76   6.68   6.62   6.52   6.43   6.33   6.28   6.02
    1%  22.78  18.31  16.53  15.56  14.94  14.51  14.20  13.96  13.77  13.62  13.38  13.15  12.90  12.78  12.14
 6  5%   8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60   5.52   5.46   5.37   5.27   5.17   5.12   4.85
    1%  18.63  14.54  12.92  12.03  11.46  11.07  10.79  10.57  10.39  10.25  10.03   9.81   9.59   9.47   8.88
 7  5%   8.07   6.54   5.89   5.52   5.29   5.12   4.99   4.90   4.82   4.76   4.67   4.57   4.47   4.41   4.14
    1%  16.24  12.40  10.88  10.05   9.52   9.16   8.89   8.68   8.51   8.38   8.18   7.97   7.75   7.64   7.08
 8  5%   7.57   6.06   5.42   5.05   4.82   4.65   4.53   4.43   4.36   4.30   4.20   4.10   4.00   3.95   3.67
    1%  14.69  11.04   9.60   8.81   8.30   7.95   7.69   7.50   7.34   7.21   7.01   6.81   6.61   6.50   5.95
 9  5%   7.21   5.71   5.08   4.72   4.48   4.32   4.20   4.10   4.03   3.96   3.87   3.77   3.67   3.61   3.33
    1%  13.61  10.11   8.72   7.96   7.47   7.13   6.88   6.69   6.54   6.42   6.23   6.03   5.83   5.73   5.19
10  5%   6.94   5.46   4.83   4.47   4.24   4.07   3.95   3.85   3.78   3.72   3.62   3.52   3.42   3.37   3.08
    1%  12.83   9.43   8.08   7.34   6.87   6.54   6.30   6.12   5.97   5.85   5.66   5.47   5.27   5.17   4.64
11  5%   6.72   5.26   4.63   4.28   4.04   3.88   3.76   3.66   3.59   3.53   3.43   3.33   3.23   3.17   2.88
    1%  12.23   8.91   7.60   6.88   6.42   6.10   5.86   5.68   5.54   5.42   5.24   5.05   4.86   4.76   4.23
12  5%   6.55   5.10   4.47   4.12   3.89   3.73   3.61   3.51   3.44   3.37   3.28   3.18   3.07   3.02   2.72
    1%  11.75   8.51   7.23   6.52   6.07   5.76   5.52   5.35   5.20   5.09   4.91   4.72   4.53   4.43   3.90
13  5%   6.41   4.97   4.35   4.00   3.77   3.60   3.48   3.39   3.31   3.25   3.15   3.05   2.95   2.89   2.60
    1%  11.37   8.19   6.93   6.23   5.79   5.48   5.25   5.08   4.94   4.82   4.64   4.46   4.27   4.17   3.65
14  5%   6.30   4.86   4.24   3.89   3.66   3.50   3.38   3.29   3.21   3.15   3.05   2.95   2.84   2.79   2.49
    1%  11.06   7.92   6.68   6.00   5.56   5.26   5.03   4.86   4.72   4.60   4.43   4.25   4.06   3.96   3.44
15  5%   6.20   4.77   4.15   3.80   3.58   3.41   3.29   3.20   3.12   3.06   2.96   2.86   2.76   2.70   2.40
    1%  10.80   7.70   6.48   5.80   5.37   5.07   4.85   4.67   4.54   4.42   4.25   4.07   3.88   3.79   3.26
16  5%   6.12   4.69   4.08   3.73   3.50   3.34   3.22   3.12   3.05   2.99   2.89   2.79   2.68   2.63   2.32
    1%  10.58   7.51   6.30   5.64   5.21   4.91   4.69   4.52   4.38   4.27   4.10   3.92   3.73   3.64   3.11
17  5%   6.04   4.62   4.01   3.66   3.44   3.28   3.16   3.06   2.98   2.92   2.82   2.72   2.62   2.56   2.25
    1%  10.38   7.35   6.16   5.50   5.07   4.78   4.56   4.39   4.25   4.14   3.97   3.79   3.61   3.51   2.98
18  5%   5.98   4.56   3.95   3.61   3.38   3.22   3.10   3.01   2.93   2.87   2.77   2.67   2.56   2.50   2.19
    1%  10.22   7.21   6.03   5.37   4.96   4.66   4.44   4.28   4.14   4.03   3.86   3.68   3.50   3.40   2.87
19  5%   5.92   4.51   3.90   3.56   3.33   3.17   3.05   2.96   2.88   2.82   2.72   2.62   2.51   2.45   2.13
    1%  10.07   7.09   5.92   5.27   4.85   4.56   4.34   4.18   4.04   3.93   3.76   3.59   3.40   3.31   2.78
20  5%   5.87   4.46   3.86   3.51   3.29   3.13   3.01   2.91   2.84   2.77   2.68   2.57   2.46   2.41   2.09
    1%   9.94   6.99   5.82   5.17   4.76   4.47   4.26   4.09   3.96   3.85   3.68   3.50   3.32   3.22   2.69
21  5%   5.83   4.42   3.82   3.48   3.25   3.09   2.97   2.87   2.80   2.73   2.64   2.53   2.42   2.37   2.04
    1%   9.83   6.89   5.73   5.09   4.68   4.39   4.18   4.01   3.88   3.77   3.60   3.43   3.24   3.15   2.61
22  5%   5.79   4.38   3.78   3.44   3.22   3.05   2.93   2.84   2.76   2.70   2.60   2.50   2.39   2.33   2.00
    1%   9.73   6.81   5.65   5.02   4.61   4.32   4.11   3.94   3.81   3.70   3.54   3.36   3.18   3.08   2.55
23  5%   5.75   4.35   3.75   3.41   3.18   3.02   2.90   2.81   2.73   2.67   2.57   2.47   2.36   2.30   1.97
    1%   9.63   6.73   5.58   4.95   4.54   4.26   4.05   3.88   3.75   3.64   3.47   3.30   3.12   3.02   2.48
24  5%   5.72   4.32   3.72   3.38   3.15   2.99   2.87   2.78   2.70   2.64   2.54   2.44   2.33   2.27   1.94
    1%   9.55   6.66   5.52   4.89   4.49   4.20   3.99   3.83   3.69   3.59   3.42   3.25   3.06   2.97   2.43
 ∞  5%   5.02   3.69   3.12   2.79   2.57   2.41   2.29   2.19   2.11   2.05   1.94   1.83   1.71   1.64   1.00
    1%   7.88   5.30   4.28   3.72   3.35   3.09   2.90   2.74   2.62   2.52   2.36   2.19   2.00   1.90   1.00
STATISTICS - EXAMPLES 1
1. In an operation the probability that a helicopter fails to return from a sortie is 5%.
What is the chance that it survives 30 sorties?
2. Under a given set of tactical conditions a tank has an 85% chance of seeing an enemy
tank for long enough to engage it. It then has a 90% chance of hitting with its first
shot, and if it hits an 80% chance of killing it. Under these conditions, what is the
chance of the enemy tank surviving?
3. You are validating the results of a tank ammunition trial. The hit probability of a
tank gun with a single round is 1/3, independently of all other shots, when firing at a
target at 2000 metres. If 5 rounds are fired without correction, find the chance of:
(a) Missing with the first 2 and hitting with the remainder.
(b) Making 4 hits and 1 miss.
(c) Making at least 2 hits.
4. Three soldiers each fire once at a target. Their probabilities of hitting are 0.2, 0.3 and
0.4, independently of each other. Find the probabilities of obtaining each of exactly
zero, one, two or three hits. Check that these probabilities sum to 1 (why should
they?).
5. An assault upon an enemy airfield is being planned. The assault team will need to be
parachuted to within 3 miles of the target, after which, to reach the airfield, they will
need to successfully cross a minefield followed by the airfield defences.
It is estimated that there is a 65% chance of remaining undetected during the paradrop,
and a 75% chance of successfully crossing the minefield. The chance of getting through
the airfield defences is 80% if the paradrop was undetected but only 20% if it was
detected.
Use a tree diagram to calculate the probability that the assault team succeeds in
reaching the airfield.
6. A screening test for a disease has the following characteristics:
If a person has the disease then there is a 95% chance that the test proves positive,
i.e. that it correctly detects the disease.
If a person does not have the disease then there is a 5% chance that the test result is
(incorrectly) positive.
If 1% of the population have the disease and a person selected at random tests positive,
find the probability that they really have the disease.

GREEN
STATISTICS - EXAMPLES 1 - SOLUTIONS
1. Chance of surviving one sortie is 0.95.
Chance of surviving n sorties is therefore 0.95^n.
Chance of surviving 30 sorties is 0.95^30 = 0.21464.
2. Chance of engaging is 0.85.
Chance of engaging and hitting is 0.85 × 0.9.
Chance of engaging, hitting and killing is 0.85 × 0.9 × 0.8 = 0.612.
Hence chance of survival is 1 − 0.612 = 0.388.

3. The chance of a hit is 1/3, of a miss 2/3.
(a) Chance of 2 misses followed by 3 hits is
(2/3)^2 × (1/3)^3 = 4/243 = 0.016
(b) Chance of 4 hits and 1 miss in any order is
5 × (1/3)^4 × (2/3) = 10/243 = 0.041
(the factor of 5 is because the miss could occur on any of the 5 firings).
(c) Chance of zero hits is (2/3)^5 = 32/243.
Chance of exactly 1 hit is 5 × (1/3) × (2/3)^4 = 80/243
(the factor of 5 is because the hit could occur on any of the firings).
Hence chance of at least 2 hits is 1 − 32/243 − 80/243 = 131/243 = 0.539.
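These binomial counts are easy to check; a minimal Python sketch, using the 1/3 hit probability and 5 rounds from the question:

```python
from math import comb

p = 1 / 3  # hit probability per round
n = 5      # rounds fired

def pmf(k):
    # P(exactly k hits) for a Bi(n, p) distribution
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_at_least_2 = 1 - pmf(0) - pmf(1)
print(round(p_at_least_2, 3))  # 0.539
```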

4.
P(0) = (1 − 0.2) × (1 − 0.3) × (1 − 0.4) = 0.8 × 0.7 × 0.6 = 0.336
P(1) = 0.8 × 0.7 × 0.4 + 0.8 × 0.3 × 0.6 + 0.2 × 0.7 × 0.6 = 0.452
P(2) = 0.8 × 0.3 × 0.4 + 0.2 × 0.7 × 0.4 + 0.2 × 0.3 × 0.6 = 0.188
P(3) = 0.2 × 0.3 × 0.4 = 0.024

and P(0 or 1 or 2 or 3) = 1 since the events are mutually exclusive and exhaustive.

5. We'll leave you to draw the tree diagram yourself, but in algebraic terms the problem
can be solved as follows.
Break it down into succeeding undetected and succeeding detected:
P(succeed undetected) = 0.65 × 0.75 × 0.8 = 0.3900
P(succeed detected) = 0.35 × 0.75 × 0.2 = 0.0525
Since these are mutually exclusive, the chance of success is 0.3900 + 0.0525 = 0.4425.
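The tree-diagram calculation amounts to summing the probability of each path that ends in success; a quick sketch in Python (figures from the question):

```python
# Success requires: survive the paradrop phase, cross the minefield,
# then get through the defences (whose chance depends on detection).
p_undetected = 0.65
p_cross = 0.75  # minefield crossing, taken as independent of detection
p_through = {"undetected": 0.8, "detected": 0.2}

p_success = (p_undetected * p_cross * p_through["undetected"]
             + (1 - p_undetected) * p_cross * p_through["detected"])
print(round(p_success, 4))  # 0.4425
```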
6. Again we leave you to draw the tree diagram yourself and give the answer algebraically.
We have
P(test positive | have disease) = 0.95
P(test positive | don't have disease) = 0.05
P(have disease) = 0.01
Hence
P(have disease and test positive) = 0.01 × 0.95 = 0.0095
P(don't have disease and test positive) = 0.99 × 0.05 = 0.0495
These are mutually exclusive, so
P(test positive) = 0.0095 + 0.0495 = 0.0590
Therefore there is a probability of 0.0590 that a person selected at random will test
positive. Of these, a proportion
0.0095 / 0.0590 = 0.1610
actually have the disease. This is in fact using the formula for conditional probability
P(have disease | test positive) = P(have disease and test positive) / P(test positive)
= 0.0095 / 0.0590 = 0.1610
Hence, if a person tests positive there is still only a 16.1% chance that they actually
have the disease. This is a fairly typical result, in that the vast majority of people
do not have the disease so that, unless the test is incredibly accurate, the majority of
positive results will in fact be false positives.
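The same Bayes' theorem calculation takes only a few lines of Python (figures from the question):

```python
p_disease = 0.01             # prevalence in the population
p_pos_given_disease = 0.95   # chance the test detects a real case
p_pos_given_healthy = 0.05   # false positive rate

# Total probability of testing positive
p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)

# Bayes' theorem: P(disease | positive test)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.161
```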

STATISTICS - EXAMPLES 2
1. From a stock of old ammunition with 10% defectives, five rounds are fired. Find the
probabilities of each of 0, 1, 2, 3, 4 or 5 defective rounds in the sample.
2. If the proportion of defective items in a large production run is 8%, what is the
probability that a sample of 30 items will include fewer than 3 defectives?
3. In a large consignment of SAA the average number of dud rounds in a box is one.
Find the chance that a box selected at random contains more than 3 duds.
4. A man test firing a rifle at a figure target at 1000 metres range can expect to get 1
hit in 20. Calculate the chance of 2 or more hits when 10 rounds are fired.
5. In 1944 the question arose of whether V1 flying bombs falling in London were aimed
at pinpoint targets, or just aimed roughly in the direction of London. London was
divided into 576 areas each of 1/4 km², and the number of areas each with 0, 1, 2 etc
hits recorded. A total of 535 bombs fell in the 576 areas, with the distribution

Number of hits (x)             0    1   2   3  4  5  6
Number of areas with x hits  229  211  93  35  7  1  0

If the bombs were falling randomly then these data should look like 576 observations
from a Poisson distribution with mean 535/576.
Calculate the expected frequencies using the Poisson distribution and use these to
decide whether the bombs appear to have fallen randomly or not.

GREEN
STATISTICS - EXAMPLES 2 - SOLUTIONS
1. Assuming independence of the rounds, this is binomial with n = 5, p = 0.1, i.e.
Bi(5, 0.1), so that
P(0) = 0.9^5 = 0.5905
P(1) = 5 × 0.1 × 0.9^4 = 0.3281
P(2) = (5 × 4)/2 × 0.1^2 × 0.9^3 = 0.0729
P(3) = (5 × 4)/2 × 0.1^3 × 0.9^2 = 0.0081
P(4) = 5 × 0.1^4 × 0.9 = 0.0005
P(5) = 0.1^5 = 0.00001
2. The number of defectives follows a binomial distribution with 30 trials and success
probability 0.08, that is a Bi(30, 0.08) distribution. Hence

P(0)     0.92^30                               ≈ 0.0820
P(1)     30 × 0.08 × 0.92^29                   ≈ 0.2138
P(2)     (30 × 29)/(1 × 2) × 0.08^2 × 0.92^28  ≈ 0.2696
Total                                          ≈ 0.5654

Hence the chance of fewer than 3 defectives is 0.5654.


3. If duds occur randomly then the number per box follows a Poisson distribution with
mean 1. Hence
P(0) = e^−1 = 0.3679
P(1) = 1 × e^−1 = 0.3679
P(2) = (1/2!) e^−1 = 0.1839
P(3) = (1/3!) e^−1 = 0.0613
Chance of three or fewer = 0.3679 + 0.3679 + 0.1839 + 0.0613 = 0.9810.
Chance of more than three = 1 − 0.9810 = 0.0190.
4. Since 1/20 = 0.05, the number of hits when 10 rounds are fired follows a binomial
distribution with 10 trials and success probability 0.05, that is a Bi(10, 0.05) distribution.
Hence

P(0)   0.95^10             ≈ 0.5987
P(1)   10 × 0.05 × 0.95^9  ≈ 0.3151

Chance of 0 or 1 hit = P(0) + P(1) = 0.9138.
Chance of 2 or more hits is therefore 1 − 0.9138 = 0.0862.
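The Poisson tail in solution 3 can be verified with nothing more than the standard library:

```python
from math import exp, factorial

mean = 1.0  # average number of duds per box

def poisson_pmf(k):
    # P(exactly k duds) for a Poisson(mean) distribution
    return mean**k * exp(-mean) / factorial(k)

p_three_or_fewer = sum(poisson_pmf(k) for k in range(4))
p_more_than_three = 1 - p_three_or_fewer
print(round(p_more_than_three, 4))  # 0.019
```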
5. If the bombs are falling randomly then these data should look like 576 observations
from a Poisson distribution, and the only thing we have to go on as to the mean of
this distribution is the observed data, so we estimate the mean m using
m = 535/576 = 0.928819
Hence we calculate the chance of 0, 1, 2, 3, 4, 5 hits
P(0) = e^−0.928819 = 0.3950
P(1) = 0.928819 × e^−0.928819 = 0.3669
P(2) = (0.928819^2 / 2!) e^−0.928819 = 0.1704
P(3) = (0.928819^3 / 3!) e^−0.928819 = 0.0528
P(4) = (0.928819^4 / 4!) e^−0.928819 = 0.0122
P(5) = (0.928819^5 / 5!) e^−0.928819 = 0.0023
P(≥ 6) = 1 − sum of above = 0.0004
To compare observed to expected frequencies, multiply the above proportions by 576,
the number of observations.
Hence, if the bombs are falling randomly, we would expect the number of areas with
zero hits to be
576 × 0.3950 = 227.52
and so on, giving

Number of hits (x)                     0       1      2      3     4     5     6
Observed no. of areas with x hits     229     211     93     35     7     1     0
Expected no. of areas with x hits  227.52  211.34  98.15  30.39  7.06  1.31  0.23

The numbers are all very close, suggesting that there is no evidence that the V1 bombs
are being aimed, at least at the scale which we are considering. (See section 13.3 for
a formal approach to testing this).
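The expected-frequency calculation above can be reproduced directly (observed counts from the question):

```python
from math import exp, factorial

bombs, areas = 535, 576
m = bombs / areas  # estimated Poisson mean, ≈ 0.9288

# P(0), ..., P(5) hits, plus P(6 or more) as the remainder
probs = [m**k * exp(-m) / factorial(k) for k in range(6)]
probs.append(1 - sum(probs))
expected = [areas * p for p in probs]

observed = [229, 211, 93, 35, 7, 1, 0]
for x, (o, e) in enumerate(zip(observed, expected)):
    print(x, o, round(e, 2))
```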

STATISTICS - EXAMPLES 3
1. For a normal distribution with mean 25 and standard deviation 5, find the probability
of a single observation falling within each of the following intervals:
(a) 17.0 to 19.0
(b) above 35.0
(c) 22.0 to 30.0
2. Limit gauges are used to reject all components of length greater than 1.51cm or less
than 1.49cm. A machine produces such components with the length normally distributed with mean 1.503cm and standard deviation 0.004cm. What proportion of
components will be rejected?
3. For a normal distribution with mean 10 and standard deviation 4, find
(a) The value k such that the chance of an observation exceeding k is 5%.
(b) A value k, exceeding 12, such that the chance of an observation falling in the
interval between 12 and k is 20%.
4. Of a large group of men selected for a clothing trial, 5% are under 60 inches in height
and 40% between 60 inches and 65 inches. Assuming heights are normally distributed,
find the mean and the standard deviation of heights.
5. You are calculating theoretical hit probabilities prior to a tank gun trial. The standard deviation of the distribution of hits about the mpi on a target is governed by
the variability of a number of independent errors. If the horizontal and vertical errors
(standard deviations) in mils are distributed in the manner shown below at 1000m, find
the overall horizontal and vertical error standard deviations. Hence find the chance of
a hit on a target 2 metres (horizontal) by 3 metres (vertical) when the mpi is at the
centre of the bottom of the target.
Note 1: 1 mil subtends 1 metre at 1000 metres, so you can take a mil as being equivalent
to a metre in this case (this is not quite correct, but we use it for simplicity's sake).
Note 2: A standard result (section 6.4) tells us that when we have multiple independent
sources of variation then the overall variance is simply the sum of the individual
variances.
Note 3: You can assume that horizontal error and vertical error are independent.
The following table gives the standard deviations associated with the various sources
of error:

Horizontal  Sight-mechanical  0.5     Vertical  Sight-mechanical  0.5
            Barrel bend       0.1               Droop             0.2
            Cross wind        0.5               Rangefinding      0.2
            Ballistic         0.2               Ballistic         0.3
            Laying            0.1               Laying            0.1

6. A ballistics trial to be carried out with a modified tank gun requires muzzle velocities
between 960 m/s and 970 m/s. Two types of round (A and B) seem to be the most
promising. Round A has a mean muzzle velocity of 963 m/s with a standard deviation
of 6 m/s while round B has a mean muzzle velocity of 968 m/s with a standard
deviation of 5 m/s, when fired from this gun.
(a) Which type of round should be selected for the trial in order to minimise the
number of wasted firings?
(b) After firing 5 of the rounds selected in (a), what is the probability that:
i. none of them have achieved a muzzle velocity in the required range?
ii. at least 4 of them have achieved a muzzle velocity in the required range?

GREEN
STATISTICS - EXAMPLES 3 - SOLUTIONS
1. Now, μ = 25, σ = 5 so X ~ N(25, 5²).
(a) We want
P(17 < X < 19) = P((17 − 25)/5 < Z < (19 − 25)/5) = P(−1.6 < Z < −1.2)
Since, from tables,
P(−1.6 < Z < 0) = 0.4452 and P(−1.2 < Z < 0) = 0.3849
the chance of being inside the interval is the difference between the two values
= 0.4452 − 0.3849 = 0.0603
(b) Similarly
P(X > 35) = P(Z > (35 − 25)/5) = P(Z > 2.0) = 0.5 − 0.4772 = 0.0228
(c) Finally
P(22 < X < 30) = P((22 − 25)/5 < Z < (30 − 25)/5) = P(−0.6 < Z < 1.0)
Now,
P(−0.6 < Z < 0) = 0.2257 and P(0 < Z < 1.0) = 0.3413
so that the chance of being inside the interval is the sum of the two values
0.2257 + 0.3413 = 0.5670
2. The length is described by the random variable X ~ N(1.503, 0.004²). The proportion
of components which are the correct size is
P(1.49 < X < 1.51) = P((1.49 − 1.503)/0.004 < Z < (1.51 − 1.503)/0.004)
= P(−3.25 < Z < 1.75)
= Q(1.75) + Q(3.25)
= 0.4599 + 0.4994 = 0.9593
so that 1 − 0.9593 = 0.0407 or 4.07% of them will be rejected.
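With Python's statistics.NormalDist the rejection proportion can be computed without table look-ups (and hence without the rounding they introduce):

```python
from statistics import NormalDist

length = NormalDist(mu=1.503, sigma=0.004)

# Proportion within the gauge limits 1.49 cm to 1.51 cm
p_ok = length.cdf(1.51) - length.cdf(1.49)
p_reject = 1 - p_ok
print(round(p_reject, 4))  # ≈ 0.0406 (tables give 0.0407)
```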
3. Here X ~ N(10, 4²).
(a) We need k such that
P(X > k) = P(Z > (k − 10)/4) = 0.05
From tables,
Q(1.645) ≈ 0.4500 (by interpolation)
so that
P(Z > 1.645) = 0.5 − 0.4500 = 0.05
so that we equate
(k − 10)/4 = 1.645 ⇒ k = 10 + 4 × 1.645 = 16.58
(b) We need k such that
P(12 < X < k) = P((12 − 10)/4 < Z < (k − 10)/4) = 0.2
From tables,
Q((12 − 10)/4) = Q(0.5) = 0.1915
so that
Q((k − 10)/4) = 0.1915 + 0.2 = 0.3915
From tables,
Q(1.235) ≈ 0.3915 (by interpolation)
so that we equate
(k − 10)/4 = 1.235 ⇒ k = 10 + 4 × 1.235 = 14.94
4. We know X ~ N(μ, σ²) and we have
P(X < 60) = 0.05 and P(60 < X < 65) = 0.40
so that
P(X < 65) = 0.45
First translate these to statements about Z, putting
a1 = (60 − μ)/σ   and   a2 = (65 − μ)/σ
Note that these will both be negative because they are to the left of the mean. Hence,
putting
Q1 = P(a1 < Z < 0)   and   Q2 = P(a2 < Z < 0)
from tables we have
0.5 − Q1 = 0.05 ⇒ Q1 = 0.45 ⇒ a1 = −1.645
Q1 − Q2 = 0.40 ⇒ Q2 = 0.05 ⇒ a2 = −0.125
(both by interpolation) so that
P(−1.645 < Z < −0.125) = 0.40
Then we just equate the pairs of expressions:
−1.645 = (60 − μ)/σ   and   −0.125 = (65 − μ)/σ
Solve for μ and σ (obtain σ first by subtraction):
−1.645σ = 60 − μ   and   −0.125σ = 65 − μ
Hence
−1.645σ − (−0.125σ) = (60 − μ) − (65 − μ)
Therefore 1.52σ = 5, and so
σ = 5/1.52 = 3.29 inches
so that
μ = 60 + 1.645σ = 65.41 inches
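The table interpolation can be replaced by the inverse normal CDF; a sketch of the same calculation:

```python
from statistics import NormalDist

z = NormalDist()          # standard normal
z05 = z.inv_cdf(0.05)     # ≈ -1.645, since P(X < 60) = 0.05
z45 = z.inv_cdf(0.45)     # ≈ -0.126, since P(X < 65) = 0.45

# Solve (60 - mu)/sigma = z05 and (65 - mu)/sigma = z45
sigma = (65 - 60) / (z45 - z05)
mu = 60 - sigma * z05
print(round(sigma, 2), round(mu, 2))  # 3.29 65.41
```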


5. Using the result that the overall variance is the sum of the individual variances:
σH² = 0.5² + 0.1² + 0.5² + 0.2² + 0.1² = 0.56 ⇒ σH = 0.7483
σV² = 0.5² + 0.2² + 0.2² + 0.3² + 0.1² = 0.43 ⇒ σV = 0.6557
At range 1000m, the error in metres is the same as that in mils.
Horizontally: we want P(−1 ≤ X ≤ 1) when X ~ N(0, 0.7483²). Hence
P(−1 ≤ X ≤ 1) = P((−1 − 0)/0.7483 < Z < (1 − 0)/0.7483)
= P(−1.34 < Z < 1.34) = 0.4099 + 0.4099 = 0.8198
Vertically: we want P(0 ≤ Y ≤ 3) when Y ~ N(0, 0.6557²). Hence
P(0 ≤ Y ≤ 3) = P((0 − 0)/0.6557 < Z < (3 − 0)/0.6557)
= P(0 < Z < 4.575) ≈ 0.5
By independence, probability of hitting target = 0.8198 × 0.5 = 0.4099
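A numerical check of solution 5, with the standard deviations taken from the question's error table:

```python
from math import sqrt
from statistics import NormalDist

# Horizontal and vertical error standard deviations (mils ≈ metres at 1000 m)
horizontal = [0.5, 0.1, 0.5, 0.2, 0.1]
vertical = [0.5, 0.2, 0.2, 0.3, 0.1]

# Independent error sources: variances add
sd_h = sqrt(sum(e**2 for e in horizontal))  # ≈ 0.748
sd_v = sqrt(sum(e**2 for e in vertical))    # ≈ 0.656

p_h = NormalDist(0, sd_h).cdf(1) - NormalDist(0, sd_h).cdf(-1)
p_v = NormalDist(0, sd_v).cdf(3) - NormalDist(0, sd_v).cdf(0)
p_hit = p_h * p_v
print(round(p_hit, 3))  # ≈ 0.409
```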

6. (a) We want to pick the round type with the larger value of P (960 MV 970)
Round A
P (960 MV 970) = P

970 963
960 963
Z
6
6

where Z N(0, 1)
= P (0.5 Z 1.17)
= 0.1915 + 0.3790
= 0.5705
Round B
P (960 MV 970) = P

970 968
960 968
Z
5
5

where Z N(0, 1)
= P (1.6 Z 0.4)
= 0.4452 + 0.1554
= 0.6006
Hence, the type B round should be selected for the trial.
(b) The number of type B rounds achieving a muzzle velocity in the range from
960m/s to 970m/s will follow a binomial distribution with n = 5 and p = 0.6006.
i. Hence X Bi(5, 0.6006).
P (X = 0) = (1 p)n
= (1 0.6006)5
= 0.0102
ii. Again X Bi(5, 0.6006).
P (X 4) = P (X = 4) + P (X = 5)
P (X = 4) =
=
=
P (X = 5) =
=
=
Therefore P (X 4) =
=

5
4

p4 (1 p)

5(0.6006)4 (1 0.6006)
0.2598
p5
(0.6006)5
0.0781
0.2598 + 0.0781
0.3379

STATISTICS - EXAMPLES 4
1. Your trials establishment has a machine shop. The standard deviation of the lengths of
components (required to modify some trials vehicles) produced by a machine process
is well established as 0.002cm. A random sample of 5 measured lengths from a day's
production are 2.1045, 2.1050, 2.1060, 2.1035, 2.1055. Between what limits may it
be stated with 95% confidence that the true mean length will be located? (You may
assume lengths are normally distributed).
2. Thirty six rounds test-fired from a mortar at the same target achieve a sample mean
range of 421m with a sample standard deviation of 42m. Is this evidence against the
range table figure of 400m? State any assumptions you make.
How would the answer change if the question had been "is there any evidence that the
true mean range exceeds the range table figure?"
3. Sixty measurements of the speed of a hovercraft operating under specified trial conditions have a sample mean value of 40.2 knots and a sample standard deviation of 5.7
knots. What are the 95% confidence limits for the true mean speed of this hovercraft
under these conditions? State any assumptions you make.
4. Six rounds from gun A and 5 rounds from gun B are test fired under the same conditions and fall at the following distances in metres from a datum point. The standard
deviations in range for guns of types A and B are well established and have the values
of 30m and 42m respectively. On the assumption that range is normally distributed,
test at the 5% level whether there is significant evidence of a difference in mean ranges
between the two guns.
Gun A
Gun B

50 72 69 84 75 102
125 84 180 170 43

Also construct a 95% confidence interval for the difference in means.


5. A trial is conducted to determine whether a cheaper alternative tyre for a scout car is
as hard-wearing as the tyre currently specified.
Fifty tyres of each type are used in similar terrain until a pre-specified level of wear is
reached. The mileage taken by each tyre to reach this level is recorded.
Assuming the recorded mileages to be approximately normally distributed, use the
results given below to test, at the 5% level, whether there is any evidence of a difference
between the mean resilience of the two types of tyre.
Existing tyre:

Mean
Standard deviation

= 6204 miles
= 510 miles

Cheaper tyre:

Mean
= 6101 miles
Standard deviation = 552 miles

Comment on how you might have improved the design of the trial.

GREEN
STATISTICS - EXAMPLES 4 - SOLUTIONS
1. Here x̄ = 2.1049 and we know that length is normally distributed with σ = 0.002.
Hence exact 95% confidence limits for the population mean are
2.1049 ± 1.96 × 0.002/√5 = 2.1049 ± 0.00175 = (2.10315, 2.10665) cm
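A quick check of the interval (σ is known, so the normal critical value is exact):

```python
from math import sqrt
from statistics import NormalDist

lengths = [2.1045, 2.1050, 2.1060, 2.1035, 2.1055]
sigma = 0.002                        # known process standard deviation
xbar = sum(lengths) / len(lengths)   # 2.1049

z = NormalDist().inv_cdf(0.975)      # ≈ 1.96 for a 95% interval
half_width = z * sigma / sqrt(len(lengths))
print(round(xbar - half_width, 5), round(xbar + half_width, 5))
```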
2. Since n = 36 is quite large we can assume that the sample mean comes from a
distribution which is approximately normally distributed and that the sample variance
S² is a reasonably good estimate of the population variance σ². Hence the z test will
be approximately correct.
We have x̄ = 421, S² = 42², n = 36.
The standard deviation of X̄, the standard error of the mean, is estimated to be
√(S²/n) = S/√n = 42/6 = 7
We test hypotheses
H0: μ = 400 v H1: μ ≠ 400
Since x̄ = 421 our observed test statistic is
zobs = (421 − 400)/7 = 3
If H0 is true then this value should look like an observation from a standard normal
distribution.
Now, from tables, for a two-tailed test the critical value for a 1% significance test is
2.58. Since the observed test statistic zobs exceeds this 1% critical value (in magnitude)
we would reject H0 with a 1% test. Hence there is strong evidence that the range table
figure should not be 400m, but somewhat higher. In fact the 0.5% critical value is
2.81, so that the result is significant even at the 0.5% level.
Note: Assuming that the data are normally distributed, which is pretty reasonable in
this case, we obtain the exact answer using the t distribution rather than the normal.
For the t35 distribution the 1% critical value is 2.724, so in this case it does not affect
our conclusions.
If the question was "is there any evidence that the true mean range exceeds the range
table figure?" then the hypotheses become
H0: μ ≤ 400 v H1: μ > 400
and this is a one-tailed test. The only change is that now we use the one-tailed critical
values, for example zcrit = 2.33 for a 1% test. This time zobs exceeds even the 0.2%
critical value of 2.87, so we would reject H0 for any significance level greater than
or equal to 0.2%. Hence clearly we still have strong evidence against H0 and the
conclusion remains unchanged.
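The test statistic, and its approximate two-sided p-value under the normal approximation:

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n, mu0 = 421, 42, 36, 400
se = s / sqrt(n)               # standard error of the mean = 7
z_obs = (xbar - mu0) / se      # = 3.0

# Two-tailed p-value
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
print(z_obs, round(p_value, 4))  # 3.0 0.0027
```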

3. Since n = 60 is quite large we can assume that the sample mean comes from a
distribution which is approximately normally distributed and that the sample variance
S² is a reasonably good estimate of the population variance σ².
Hence approximate 95% confidence limits for the population mean are
x̄ ± 1.96 S/√n
where zcrit = 1.96 is the critical value for a (two-sided) 95% CI (or a two-tailed 5%
test).
Since
x̄ = 40.2, S = 5.7, n = 60
the approximate 95% confidence limits are
40.2 ± 1.96 × 5.7/√60 = 40.2 ± 1.4 = (38.8, 41.6) knots
Note: If we can assume that the data are normally distributed, we obtain the exact
answer using the t distribution rather than the normal. This simply entails replacing
1.96 with 2.00, the value for the t59 distribution. Even if we do not assume that the
data are normally distributed, it can still be argued that this will probably give a
slightly better approximation in most cases.
4. Calculate the sample mean ranges for each gun: x̄A = 75.3, x̄B = 120.4.
We are given the true standard deviations: σA = 30, σB = 42. Also nA = 6, nB = 5.
We test
H0: μA − μB = 0 v H1: μA − μB ≠ 0
Since the standard deviations are given, and the data are observations from a normal
distribution, the test statistic is
zobs = (x̄A − x̄B) / √(σA²/nA + σB²/nB) = (75.3 − 120.4) / √(30²/6 + 42²/5) = −2.01
For 5%, two tails the critical value is 1.96. The test statistic exceeds this in magnitude,
so the difference is significant at the 5% level. We conclude there is some evidence for
a difference in range between the two guns, with gun B being longer ranged than gun
A.
The confidence interval follows almost immediately, as
75.3 − 120.4 ± 1.96 × √(30²/6 + 42²/5) = −45.1 ± 43.95 = (−89.05, −1.15)

5. We require a 2-sample test of


H0 : A = B v H1 : A 6= B
Since nA = nB = 50 are both fairly large we can use the two-sample z test with the
sample variances used to estimate the population variances.
xA xB
zobs = r 2
SA
S2
+ nBB
nA
6204 6101
= q 2
2
510
+ 552
50
50
= 0.969
For a 2-tailed test at the 5% level, the critical value from the normal distribution is
1.96. Since the test statistic is smaller than the critical value, the difference between
the sample mileages is not significant at the 5% level.
Hence there is insufficient evidence to conclude that the cheaper tyre is inferior. Note
that this does not necessarily mean that the cheaper tyre is as good as the existing
one, but simply that we have insufficient evidence to show otherwise. Perhaps a more
appropriate test to perform here would be to compare the results produced by the
cheaper tyre with a minimum acceptable standard.
In addition, it would have been better to have put, say, one tyre of each type on each
axle (randomising the left and right allocation), and then performed a paired test.
This would have been more likely to detect a difference, if one exists.
Note: If we can assume that the data are normally distributed, we obtain a more
accurate answer with the t distribution rather than the normal. We would use the
two-sample t test, using the pooled sample variance (the sample variances are close
enough to allow this). However, since the total sample size is 100 there is very little
difference between the tests.
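The test statistic above can be verified with a few lines of Python, a sketch using the quoted summary figures:

```python
import math

# Two-sample z test, with variances estimated from the (large) samples
xbar_A, S_A, n_A = 6204, 510, 50
xbar_B, S_B, n_B = 6101, 552, 50
z_obs = (xbar_A - xbar_B) / math.sqrt(S_A**2 / n_A + S_B**2 / n_B)
```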

STATISTICS - EXAMPLES 5
1. Six rounds test fired from a gun at the same target fall at ranges of 10125, 10240,
10200, 10280, 10320, 10180m. Assuming that range is normally distributed, find if
there is evidence at the 1% level that the true mean range of the gun differs from the
range table figure of 10285m. Also construct a 95% confidence interval for the true
mean range.
2. Sixteen tank gunners use a simulator and then perform a field trial. Their scores on
the simulator have a sample mean value of 74 with a sample standard deviation of 19.2,
while in the field their scores have a sample mean value of 67 with a sample standard
deviation of 17.8. Past experience shows that both field and simulator readings can
be assumed to follow a normal distribution. Is there any evidence from these figures
that gunners in general achieve a lower standard in the field? Suggest a better way to
assess this.
3. Ten soldiers test fire a rifle. They then refire it, after it has been modified, a week
later. Their scores (higher is better), in the same order, are:
1st week: 67 24 57 55 63 54 56 68 33 43
2nd week: 70 38 58 58 56 67 68 77 42 38
Each of these scores is based on the average miss distance for 50 shots.
Is there any evidence of an improvement? How will the test be affected if the scores
are not shown in the same order each time?
4. Extensive firings of the in-service weapon show that in a 3 month period gun barrels
wear on average by 0.1mm. It has been suggested that gun barrels may be treated by
a new process to resist barrel wear. In order to test this hypothesis thirty new guns
have been treated by this new process and after 3 months of in-service test firings
their barrel wear has been recorded. It is found that the average wear for these thirty
guns is 0.13mm, with an estimated standard deviation of 0.055mm. It is your job to
make sense of these numbers. Can you do so?
5. Looking through scores obtained by 4 ATGW operators using a simulator together
with their corresponding scores during a field trial you see the results given in the
table below:
Operator          1   2   3   4
Simulator score  60  40  52  25
Field score      52  37  50  24
Looking at the differences of these scores you find a mean difference of 3.5 and an
estimated standard deviation (of the differences) of 3.11. This gives an observed t
value for the paired test of 2.25. Finding a set of statistics tables you look up the
critical t value for a one-sided test with ν = 3 at the 5% level of significance as 2.35.
You conclude therefore that there is no evidence that operators in general have a lower
score in the field.
Major Thruster, however, looks at his set of tables of results and makes the following
comments:

"To hell with statistics, every single one of those chaps got a lower score in the
field than on the simulator. That is certainly a significant result!"
Moreover, after looking through his Statistics tables he spots that if you had taken a
10% level of significance the critical t value is 1.638, in which case even your method
of attacking the problem would have given a significant result.
(If you search for the value of t at 10% level of significance for a one tail test in your
tables you will appreciate that Major Thruster has access to a more detailed set of
tables than yours).
Discuss the validity of your approach and that of Major Thruster.

GREEN
STATISTICS - EXAMPLES 5 - SOLUTIONS
1. We test
H₀: μ = 10285 v H₁: μ ≠ 10285

using a sample of size n = 6 where x̄ = 10224.2 and S = 70.6. We are told that the
data are normally distributed, so an exact test uses

t_obs = (x̄ − μ₀)/(S/√n) = (10224.2 − 10285)/(70.6/√6) = −2.11
From tables the 1%, two tail, critical value for t5 is 4.03. The test statistic does not
exceed this in magnitude so that the result is not significant at the 1% level. Hence
we can conclude that there is no evidence that this gun has a range different to the
range table value.
The 95% CI is

10224.2 ± 2.57 × 70.6/√6 = 10224.2 ± 74.07 = (10150.1, 10298.3)

using the 5%, two tail, critical value for t₅ of t_crit = 2.57. Note that the value 10285
is within the interval, so that a 5% test would also have led to non-rejection of H₀.
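The whole calculation can be checked from the raw ranges with a short Python sketch (the critical value 2.57 still comes from t tables):

```python
import math

# One-sample t statistic and 95% CI from the six observed ranges
ranges = [10125, 10240, 10200, 10280, 10320, 10180]
n = len(ranges)
xbar = sum(ranges) / n
S = math.sqrt(sum((r - xbar) ** 2 for r in ranges) / (n - 1))
se = S / math.sqrt(n)
t_obs = (xbar - 10285) / se
ci = (xbar - 2.57 * se, xbar + 2.57 * se)  # 2.57 = 5% two-tail t5 value
```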
2. Simulator scores: x̄_A = 74, S_A = 19.2
Field scores: x̄_B = 67, S_B = 17.8
with n_A = n_B = 16 and we test hypotheses

H₀: μ_A − μ_B = 0 v H₁: μ_A − μ_B > 0
Note that the question specifically tells us that the alternative hypothesis is that the
gunners do worse in the field, rather than differently.
Samples are fairly small and we can assume normally distributed observations so that
we can use a two-sample t test.
The two variances are 368.64 and 316.84 so that pooling is clearly reasonable, with

S_p² = (15 × 19.2² + 15 × 17.8²)/(16 + 16 − 2) = 342.74

so that S_p = 18.5132. Hence the test statistic is

t_obs = (74 − 67)/(18.5132 × √(1/16 + 1/16)) = (74 − 67)/6.5454 = 1.0695

The critical value (5%, one tailed, t₃₀) is 1.70. The observed test statistic is less than
this so we do not reject H0 at the 5% significance level. Hence we can conclude that
there is no evidence that gunners in general achieve a lower standard in the field.
Clearly we should have recorded each gunner's simulator and field scores and then
used a paired test on the differences. By allowing for inter-gunner variation in this
way we would increase our ability to detect any difference between simulator and field
results (see answer to question 3 for an illustration of this).
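A quick numerical check of the pooled calculation, a Python sketch using the quoted summary statistics:

```python
import math

# Pooled two-sample t statistic from the quoted summary statistics
xbar_A, S_A, n_A = 74, 19.2, 16   # simulator
xbar_B, S_B, n_B = 67, 17.8, 16   # field
Sp2 = ((n_A - 1) * S_A**2 + (n_B - 1) * S_B**2) / (n_A + n_B - 2)
t_obs = (xbar_A - xbar_B) / math.sqrt(Sp2 * (1 / n_A + 1 / n_B))
```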

3. Is there evidence that the mean of the differences is greater than zero? Since each
column in the table is a pair of values for the same soldier, we can use a paired t test.
This requires the assumption that the differences follow a normal distribution. We
cannot verify this with such a small sample, but the distribution of the differences
should at least be symmetrical, and the scores are based on 50 shots, so that the
normality assumption is probably OK.
These differences (week 2 minus week 1) are:
x = 3, 14, 1, 3, −7, 13, 12, 9, 9, −5
On the assumption that the scores on the two weeks are from the same population
the mean difference should be 0, so we test
H₀: μ_d ≤ 0 v H₁: μ_d > 0
where μ_d is the true mean difference. The sample mean and variance of the above are

x̄_d = 5.2 , S_d² = 54.8444 , S_d = 7.4057

Hence the standard error of the mean (estimated standard deviation of means of
samples of size 10) is

S_d/√n = 7.4057/√10 = 2.3419
Hence the test statistic is

t_obs = (x̄_d − 0)/(S_d/√n) = (5.2 − 0)/2.3419 = 2.220

The significance level was not specified in the question. From tables the critical value
for t9 (5%, one tailed) is 1.83, but that for 1% is 2.82, so our result is significant at 5%
but not at 1%. Hence there is fairly strong evidence that the modified mean is higher.
Is there evidence that the mean of the differences is greater than zero when order
is not considered? This time there is no pairing in the table, so we have to use an
unpaired 2-sample t test.
We test
H₀: μ₂ ≤ μ₁ v H₁: μ₂ > μ₁
1st firing: x̄₁ = 52, S₁ = 14.46, n₁ = 10
2nd firing: x̄₂ = 57.2, S₂ = 13.90, n₂ = 10
Pooled variance

S_p² = ((n₁ − 1)S₁² + (n₂ − 1)S₂²)/(n₁ + n₂ − 2) = 201.2

so the test statistic is

t_obs = (x̄₂ − x̄₁)/√(S_p²(1/n₁ + 1/n₂)) = (57.2 − 52.0)/√(201.2 × (1/10 + 1/10)) = 5.2/6.34 = 0.82

The critical value for t18 (5%, one tailed) is 1.73, so there is no evidence of a difference
between the means.
Note that the numerator is the same as above, but the denominator is much larger here.
This is because it includes the inter-soldier variability, which we are not interested in.
The paired test in effect removes this and allows us to concentrate on the difference
between weeks.
By throwing away the information on who fired which shot we have decreased our
ability to spot a difference between the two sets of firings.
In addition, the assumption of normality is less secure in this case, since it uses the
actual scores rather than the differences.
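The contrast between the paired and unpaired denominators can be seen directly by computing both statistics from the raw scores. A sketch using only the Python standard library:

```python
import math
from statistics import mean, stdev

week1 = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
week2 = [70, 38, 58, 58, 56, 67, 68, 77, 42, 38]

# Paired t: work with the within-soldier differences
d = [b - a for a, b in zip(week1, week2)]
t_paired = mean(d) / (stdev(d) / math.sqrt(len(d)))

# Unpaired t: pool the two sample variances and ignore the pairing
n1, n2 = len(week1), len(week2)
Sp2 = ((n1 - 1) * stdev(week1) ** 2 + (n2 - 1) * stdev(week2) ** 2) / (n1 + n2 - 2)
t_unpaired = (mean(week2) - mean(week1)) / math.sqrt(Sp2 * (1 / n1 + 1 / n2))
```

The numerators agree, but the unpaired denominator absorbs the inter-soldier variability, dropping the statistic from about 2.22 to about 0.82.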
4. The null hypothesis in this case is that the mean wear of untreated guns and treated
guns is the same, while the alternative hypothesis should be that the mean wear of
treated guns is less than the mean wear of untreated guns:

H₀: μ_treated ≥ 0.1 v H₁: μ_treated < 0.1
This is because we are only interested in whether the treatment reduces wear. Since
the mean wear of our sample is actually higher than that for untreated guns clearly
the test statistic will be in the wrong tail. Hence, without further calculations, we
can say that we do not reject the null hypothesis.
If, however, the alternative hypothesis was that there was a difference in the two mean
values of wear then we use a two-tailed test, which at the 5% level provides a significant
result, showing that the treated guns have greater wear. However, the question clearly
indicates that we should be using a one-tailed test, so that this would be a post-hoc
redefinition of the test.
The main point of this question, however, is to note that we must compare like with
like, so the above analysis is only valid if the value of 0.1 for guns not treated by the
process also refers to new guns, or wear does not depend on age.
Almost certainly, neither of these criteria is correct and therefore the whole test is
invalid. Whoever obtained the data should be told the error of his ways and the whole
exercise repeated under proper experimental conditions.
5. There are several points which must be made about this question.
Why do you use a one-sided test? The answer is probably because you looked at the
data and then decided which sort of test to do. This of course is using your data
to fit your answer and one should have decided upon the test prior to accumulating
the data. In fact, since a priori there is no reason to believe that field scores may be
lower than simulator scores one should use a two-sided test which, at the 5% level of
significance, has a critical t value of 3.18.
Major Thruster, whether he knows it or not, is actually performing another kind of
test on these data - the so-called sign test. Assuming no difference overall in the two
scores, the probability of all four people scoring lower in the field is (1/2)⁴, or 6.25%.
This is by definition not significant at the 5% level. It is usually true to say that if
you have the actual measurements, rather than just whether each is bigger or smaller,
one should use the measurement values. The axiom is: the more information you
have, the better your result.
Major Thruster's attempt to change the significance level is a misuse of statistical
methodology. He is using the data to determine his significance test rather than using
the significance test to test the data. In any case, using a 10% level of significance
means that one is willing to accept a 1 in 10 chance of being wrong. That is to say,
when there is in fact no difference between the true mean values, 1 in 10 times you
will decide that in fact there is a difference. This is likely to be too high a risk for
most situations.
Finally, why are you using a t test? Is it valid to assume that the observed differences
come from a normal distribution? Ideally we would have past data from similar trials
to verify this, or otherwise. If we cannot assume normality of the differences then the
use of the sign test above is in fact correct, since this does not require any assumption
of normality.
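Both your paired t statistic and Major Thruster's implicit sign test can be reproduced in a few lines (a Python sketch):

```python
import math
from statistics import mean, stdev

sim = [60, 40, 52, 25]
field = [52, 37, 50, 24]
d = [s - f for s, f in zip(sim, field)]

# Paired t statistic, as in the original analysis
t_obs = mean(d) / (stdev(d) / math.sqrt(len(d)))

# Sign test: P(all four lower in the field) under H0 of no difference
p_sign = 0.5 ** len(d)
```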

STATISTICS - EXAMPLES 6
1. A trial was conducted to compare the consistency in range of two types of artillery
HE round (X and Y ). A single gun fired 15 rounds of type X, followed by 13 rounds
of type Y , both types from the same gun. While both types of round had a similar
mean range, the standard deviation in range was estimated to be 162m for round X
and 195m for round Y .
Test the hypothesis that the consistencies in range of rounds X and Y , as measured
by their variances, are the same at the 5% significance level.
Comment on the design of the trial and state any assumptions you make.
2. According to a standard testing procedure on the modulus of elasticity of rubber
specimens, it has been established that the standard deviation of measurements of
this modulus is 18.0 units.
A sample of 20 measurements are taken on a given specimen and the sample standard
deviation found to be 23.2 units.
(a) Test at the 5% level whether the variance of the procedure is being maintained.
(b) Construct a 95% confidence interval for the true variance of the population from
which the sample was drawn.
State any assumptions you make.
3. A manufacturer has specified that, based on past experience, the standard deviation
in the weight of a small arms round is no more than 0.08g. Before conducting a trial
with this type of round, a sample of 16 are chosen from the test batch and weighed.
The standard deviation in weight for this sample is found to be 0.12g.
Perform a 1% test of the hypothesis that the test batch conforms to the manufacturer's
specification, stating any assumptions you make.
4. In a toxicity test for a new drug, 10 rats are given the drug and 2 die. It is known
that under the same conditions the standard existing drug would kill 50% of the rats
it was given to. Is there any evidence that the new drug has improved things (i.e.
decreased the kill rate)?
More experiments are performed and, out of 100 rats, 37 die. Find a 99% confidence
interval for the proportion of rats killed by the new drug.
5. A marksman fires five shots at a target and counts the number of bullseyes. After a
series of 100 such sets of 5 shots the results are as follows:
Number of bullseyes   0   1   2   3   4   5
Frequency             6  31  36  15   8   4

Assuming that the probability p of a bullseye remains constant, test whether the above
results are consistent with a binomial distribution. (If not, this suggests that the shots
within each set of 5 are not independent).
6. In a survey, the following information was extracted on the number of officers of each
rank owning various breeds of dog:
                 Lieutenant  Captain  Major  Lt. Col.  Total
Golden Labrador          46       61     57        36    200
Black Labrador           73       78     59        30    240
Spaniel                  21       11     14         4     50
Total                   140      150    130        70    490

Is there any evidence, at the 1% level, of an association between rank and breed?
7. When designing a helicopter cockpit it is important for it to be of the right size to
cope with pilots of different sizes. An important measurement is the buttock-knee
length, and it is of interest to see if we can predict a pilot's buttock-knee length from
his height, since if we can do this accurately then we can use existing information
on the heights of all pilots to calculate buttock-knee length for all of them, without
having to measure it directly.
The height and buttock-knee length of a random sample of 144 aircrew are measured.
Taking the buttock-knee length as y, the response variable, and heights as x, the
explanatory variable, we have

Σy = 8,840.1 ; Σy² = 543,819 ; Σx = 25,235.8 ; Σx² = 4,425,549 ; Σxy = 1,551,020

with n = 144.

(a) Find the correlation coefficient between y and x


(b) Find the regression line of y on x
(A plot of the data shows that a straight-line relationship is reasonable, so that these
are both appropriate).

GREEN
STATISTICS - EXAMPLES 6 - SOLUTIONS
1. We have sample standard deviations SX = 162, SY = 195, with sample sizes nX = 15
and nY = 13.
Assuming that ranges are normally distributed, the F-test can be used to compare the
variances.
Null hypothesis: H₀: σ_X² = σ_Y²
Alternative hypothesis: H₁: σ_X² ≠ σ_Y²

Test statistic is

F_obs = S_Y²/S_X² = 195²/162² = 1.45

(S_Y² is the numerator since S_Y² > S_X²). The critical value from the F-distribution with
12 and 14 df at the 5% significance level (2-tailed) is:

F_crit = 3.05
Since the test statistic Fobs is less extreme than the critical value, we conclude that
there is insufficient evidence (at the 5% significance level) to conclude that the consistency of round X is different from that of round Y .
The trial design would have been better if the order of firings had been randomised.
By firing all of the X rounds and then all of the Y rounds some systematic error
may have been introduced. For example, wind speed or direction might have changed
substantially. The use of only one gun also leaves us open to the possibility of it being
atypical of guns of its type. We are not told whether the gun was warmed up by some
preliminary rounds being fired prior to any measurements being made, but this would
be desirable.
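The variance-ratio statistic is a one-liner to check (a Python sketch; the critical value 3.05 still comes from F tables):

```python
# Variance-ratio (F) statistic, larger sample variance in the numerator
S_X, S_Y = 162, 195
F_obs = S_Y**2 / S_X**2
F_crit = 3.05  # tabulated F(12, 14), 5% two-tailed, quoted above
significant = F_obs > F_crit
```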
2. The following assumes that elasticity measurements are normally distributed.
(a) We test

H₀: σ² = 18.0² v H₁: σ² ≠ 18.0²

Here S² = 23.2² is the larger, so

F_obs = 23.2²/18.0² = 1.6612

The critical value for F with 19 and ∞ degrees of freedom (5%, 2-tailed) is

F_crit = 1.71
Since Fobs < Fcrit , we conclude that there is no evidence of a difference.

(b) The critical values are

F_U = 2.13 (ν₁ = ∞, ν₂ = 19)
F_L = 1.71 (ν₁ = 19, ν₂ = ∞)

so that the CI for the variance is

(23.2²/1.71 , 23.2² × 2.13) = (314.76, 1146.45)

and so the CI for the standard deviation is

(17.74, 33.86)

Note that 18.0 is within this interval.
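The interval arithmetic can be checked as follows (a Python sketch; F_L and F_U are the tabulated values quoted above):

```python
import math

# 95% CI for a variance via the tabulated F limits
S2 = 23.2 ** 2
F_L, F_U = 1.71, 2.13  # tabulated values quoted in the solution above
var_ci = (S2 / F_L, S2 * F_U)
sd_ci = tuple(math.sqrt(v) for v in var_ci)
```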
3. Specified σ₀² = 0.08², observed S² = 0.12².
Assuming the weight of rounds to be normally distributed, the F-test can be used to
compare the observed variance with the specified one.
Null hypothesis: H₀: σ² ≤ 0.08²
Alternative hypothesis: H₁: σ² > 0.08²

Now, S² is the numerator as it is larger than the hypothesised σ₀², so the test statistic
is

F_obs = 0.12²/0.08² = 2.25

The critical value from the F-distribution with 15 and ∞ df at the 1% significance
level (1-tailed) is:

F_crit = 2.04
Since the test statistic is more extreme than the critical value, the result is significant
at the 1% level. The evidence suggests that it is very unlikely that the test batch
conforms to the manufacturer's specification.
4. We have

H₀: p ≥ 0.5 v H₁: p < 0.5

Hence the p-value is

P(X ≤ 2) when X ~ Bi(10, 0.5)

= 0.5¹⁰ + 10 × 0.5¹ × 0.5⁹ + 45 × 0.5² × 0.5⁸
= 56 × 0.5¹⁰
= 0.0547

In other words, if H₀ is really true then there was only a 5.47% chance of observing
what we did (or something even less compatible with H₀). Hence by definition this
is not quite significant at the 5% level. You would probably say that there is slight
evidence against H₀, and recommend collecting more data.
With the extra data, we use the estimated value

p̂ = 37/100 = 0.37

so that the 99% CI (based on the normal approximation) is

0.37 ± 2.58 × √(0.37 × 0.63/100) = 0.37 ± 0.124564 = (0.2454, 0.4946)
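Both the exact binomial p-value and the approximate CI can be checked with the standard library (a Python sketch):

```python
import math

# Exact binomial p-value: P(X <= 2) for X ~ Bi(10, 0.5)
p_value = sum(math.comb(10, k) for k in range(3)) * 0.5 ** 10

# Approximate 99% CI for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)
p_hat, n, z = 0.37, 100, 2.58
half = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half, p_hat + half)
```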
5. Clearly 5 × 100 = 500 rounds were fired, with

1 × 31 + 2 × 36 + 3 × 15 + 4 × 8 + 5 × 4 = 200

hits. Hence we estimate p by

p̂ = 200/500 = 0.4

Hence, under

H₀: the data come from a binomial distribution

the data are 100 observations from the random variable X, where

X ~ Bi(5, 0.4)
Using the usual binomial calculations to obtain the Probability line, then multiplying
by the number of observations (100) to obtain the Expected numbers, we find
Number of bullseyes        0        1        2        3        4       5
Observed                   6       31       36       15        8       4
Probability           0.0778   0.2592   0.3456   0.2304   0.0768  0.0102
Expected               7.776   25.920   34.560   23.040    7.680   1.024
(O − E)²/E           0.40563  0.99562  0.06000  2.80563       1.24812

Note that the two right-hand columns are amalgamated so that E > 5.
This gives X² = 5.515 and the χ² distribution has 5 − 1 − 1 = 3 degrees of freedom (five
classes, one parameter estimated) so that even the 10% critical value is χ²_crit = 6.25.
Hence even a 10% test would not reject H0 , and so there is no evidence against the
hypothesis that the observations come from a binomial distribution.
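The goodness-of-fit calculation can be reproduced as follows (a Python sketch; the last two classes are amalgamated exactly as in the table above):

```python
import math

observed = [6, 31, 36, 15, 8, 4]
n_sets, n_shots, p = 100, 5, 0.4

# Binomial expected counts under H0
expected = [n_sets * math.comb(n_shots, k) * p**k * (1 - p) ** (n_shots - k)
            for k in range(n_shots + 1)]

# Amalgamate the last two classes so every expected count exceeds 5
obs = observed[:4] + [observed[4] + observed[5]]
exp = expected[:4] + [expected[4] + expected[5]]
X2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
```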
6. Under the null hypothesis of no association, we calculate the expected numbers,
for each cell of the table, as

E = (row total × column total)/(overall total)

so that for example

E for Major, Black Labrador = (130 × 240)/490 = 63.67

and so on. This gives the table

                 Lieutenant  Captain  Major  Lt. Col.
Golden Labrador       57.14    61.23  53.06     28.57
Black Labrador        68.57    73.47  63.67     34.29
Spaniel               14.29    15.31  13.26      7.14

Hence the test statistic is

X² = (46 − 57.14)²/57.14 + ... + (4 − 7.14)²/7.14 = 11.63

The degrees of freedom are (4 − 1)(3 − 1) = 6, so that the 1% critical value is
χ²_crit = 16.81. Hence at the 1% level there is no evidence against the null hypothesis. There
is no evidence for an association between rank and breed.
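The expected counts and the X² statistic can be generated directly from the observed table (a Python sketch):

```python
# Chi-squared test of association for the rank x breed table
table = [
    [46, 61, 57, 36],   # Golden Labrador
    [73, 78, 59, 30],   # Black Labrador
    [21, 11, 14, 4],    # Spaniel
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

X2 = sum((table[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
         / (row_totals[i] * col_totals[j] / grand)
         for i in range(3) for j in range(4))
df = (len(table) - 1) * (len(table[0]) - 1)
```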
7. The sample means are

ȳ = 61.3896 ; x̄ = 175.249

and from AMOR formula book section 17.10, we have

S_yy = 543819 − 144 × 61.3896² = 1128.88
S_xx = 4425549 − 144 × 175.249² = 3001.00
S_xy = 1551020 − 144 × 61.3896 × 175.249 = 1803.63
(a) Hence the correlation coefficient is

r_xy = 1803.63/√(1128.88 × 3001.00) = 0.980

Hence, not surprisingly, there is a very strong positive correlation between the
two.
(b) The regression line is given by

β̂ = S_xy/S_xx = 1803.63/3001.00 = 0.601
α̂ = ȳ − β̂ x̄ = 61.3896 − 0.601 × 175.249 = −43.94

so that the estimated mean buttock-knee length of pilots of height x₀ cm is given
by

ŷ = −43.94 + 0.601 x₀
We will not give details of the calculations, but the material in the formula book
can also be used to give confidence and tolerance intervals
x0 = 170 ; y = 58.236 ; (58.094, 58.377) , (57.112, 59.359)
x0 = 175 ; y = 61.240 ; (61.147, 61.333) , (60.122, 62.358)
x0 = 180 ; y = 64.245 ; (64.111, 64.379) , (63.122, 65.367)
The first interval is a 95% CI for the mean buttock-knee length for that height,
while the second wider one is a 95% tolerance interval for the buttock-knee length
of an individual pilot of that height.
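The correlation and regression coefficients can be recomputed from the raw sums (a Python sketch; tiny discrepancies from the quoted 0.601 and −43.94 arise because the solution rounds the means before substituting):

```python
import math

# Correlation and least-squares line from the summary statistics
n = 144
sum_y, sum_y2 = 8840.1, 543819
sum_x, sum_x2 = 25235.8, 4425549
sum_xy = 1551020

Syy = sum_y2 - sum_y**2 / n
Sxx = sum_x2 - sum_x**2 / n
Sxy = sum_xy - sum_x * sum_y / n

r = Sxy / math.sqrt(Sxx * Syy)
beta = Sxy / Sxx                      # slope
alpha = sum_y / n - beta * sum_x / n  # intercept
```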