Contents

1 Dealing with Uncertainty
   1.1
   1.2 Variation
   1.3 Example
   1.4 Distributions
   1.5
   1.6 Confidence
   1.7 Significance
   1.8

2
   2.1 What is Statistics?
   2.2 Collecting Data
   2.3
   2.4 Types of Data
   2.5
   2.6
   2.7
   2.8 Grouped Data
   2.9

3 Introduction to Probability
   3.1
   3.2 Probability Distributions
   3.3 Independent Events
   3.4
   3.5 General Events
   3.6
   3.7 Dependent Events
   3.8 Further Examples

4
   4.1 Application
   4.2
   4.3 Summation of Terms
   4.4
   4.5 Further Examples

5
   5.1 Application
   5.2 Probability Function
   5.3 Summation of Terms
   5.4
   5.5
   5.6
   5.7 Further Examples

6
   6.1 Application
   6.2
   6.3
   6.4
   6.5
   6.6 Normal Approximations
   6.7 Further Examples

7
   7.1 Introductory Example
   7.2 Some Definitions
   7.3
   7.4
   7.5
   7.6 Further Examples

8 Significance Tests
   8.1 Purpose
   8.2 Method
   8.3
   8.4 Critical Values
   8.5
   8.6
   8.7
   8.8 Types of Error
   8.9 Further Examples

9 Confidence Intervals
   9.1 Introduction
   9.2
   9.3
   9.4
   9.5 Further Examples

10
   10.1 Introduction
1 Dealing with Uncertainty
1.1
The raw material of statistics is data in the form of figures. These figures may be measurements such as the burning time of rockets of a given type or the ranges of shots fired from
a gun under specified conditions; or they may be counts, such as the number of hits on a
target when successive salvoes of, say, six rounds are fired.
1.2
Variation
The important thing about these data is that the variety of figures arises from an unintended
variation in the quantity represented. In the examples quoted the burning times should be
nominally identical and the number of hits should be the same for each salvo, but, in
practice, this is never the case.
This variation occurs almost without exception in both natural and man-made processes
and activities. However hard we try, we cannot produce two identical articles, or repeat a
trial and get an identical result. We are inclined to treat this variation as a nuisance and
ignore it, but this can be misleading; false conclusions and decisions can easily be made if
we pretend that every result can be expressed in a single firm figure.
1.3
Example
As a simple example, suppose the average petrol consumption of a certain type of scout
car operated by recce regiments over the past two years is 7.8 mpg. Recently however a
somewhat expensive and time-consuming modification to reduce petrol consumption has
been suggested. An exploratory trial has been conducted by modifying six scout cars and
the following mpg figures for these six vehicles over a two month period obtained:
8.7 9.1 7.9 10.2 7.0 8.1
It is your task to decide, on the basis of this evidence, whether it is worth modifying all
scout cars.
Note that only one of the six mpg figures is below the unmodified average value of 7.8
mpg and the mean value for these six modified cars of 8.5 mpg is almost 10% up on the
old figure. On the basis of this average figure therefore you might naively be tempted
to believe that the modification improves petrol consumption by about 10%. However if
one also looks at the car to car variation in these figures then a statistical analysis reveals
that quite frequently and purely by chance one could have obtained similar or even better
results from six unmodified cars. Thus, one would require more evidence than this before
authorising similar modifications to all scout cars, irrespective of any arguments about the
costs involved.
1.4
Distributions
The statistician recognises and accepts this variation in measurements. He or she has well-tried techniques for condensing the data and assessing the shape and size of this variation
by using the idea of distributions.
A quantity that can take different values is called a variable.
A quantity that varies in such a way that we cannot predict with certainty what its next
value will be is called a random variable. A random variable possesses a distribution
which then describes how likely each possible value is.
1.5
In practice, it is found that these distributions follow well-defined patterns. Whereas the
outcome of a single measurement or trial is unpredictable, the outcome of a large number
of repeated events will give a predictable result in the form of a distribution. For example,
toss a coin once and the result cannot be foretold; toss it 10000 times and the proportion of
heads will be very nearly equal to 1/2 (provided the coin is unbiased). This is known as the
Law of Large Numbers. It is this fact that makes statistics tick! This is why statistics,
correctly used, can be such a powerful and effective tool.
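The coin-tossing claim is easy to check empirically. The following short simulation (not part of the original notes) tosses a fair coin 10000 times; the seed is fixed only so that the run is reproducible:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

n_tosses = 10_000
heads = sum(1 for _ in range(n_tosses) if random.random() < 0.5)
proportion = heads / n_tosses

# By the Law of Large Numbers the proportion should be very close to 1/2;
# the typical deviation for 10000 tosses is of order 0.005.
print(proportion)
```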
Fortunately, the patterns that these distributions follow usually have convenient mathematical forms which the statistician can manipulate without difficulty. There are two bonuses
from this. Firstly, you do not have to do 10000 or so repeat trials to establish the distribution; useful information can be derived from 2 or 3, but of course the more the better.
Secondly, all the hard work can be done once and for all and the characteristics of the popular distributions set out in tables; a large number of these exist. The Normal, Binomial,
Poisson, F and χ² (chi-squared) distributions are those most widely used.
Given some representative data, the statistician, having identified the distribution, can then
use its properties to form some sound conclusions; for example in section 1.3, how frequently
one might obtain similar or even better results from unmodified scout cars. Predictions can
be made in the light of these conclusions.
The strength of the statistician's conclusions lies in the fact that (s)he has taken the variation
into account. The price that has to be paid for this is that no conclusion is certain; it will
always be associated with some probability. The answer that you give depends on the risk
you are prepared to take that it might be wrong. This is an essential and inescapable
element in prediction and decision making.
1.6
Confidence
For example, suppose that the average (mean) burning time of a type of rocket is to be
determined by a trial. The answer would be quoted, not as a single figure, but as lying
between two limits. These are known as confidence limits and there would be some stated
risk, say 5%, or 1 in 20, that the true mean burning time falls outside them; which is the
same as saying that there is 95% confidence that it falls within them. If greater confidence
(lower risk) were required, then the limits would have to be wider, unless further evidence
is obtained by extending the trials.
1.7
Significance
The same principle applies, for instance to the problem of deciding whether to adopt one
type of tank track or another on the evidence of trials. Statistical evidence may show that
one type of track lasts significantly longer than another; by this we mean that the difference
between the trial results is too large to be attributed to chance variation. There will be a
small stated risk that the observed difference could occur just by chance, and the decision
must be made in the light of this risk. This risk, known as the significance level, is often
chosen as 5% or 1%.
In the example of section 1.3, results similar to, or better than, those obtained from the trial
could have been obtained purely by chance from unmodified cars about 9% of the time.
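The notes do not show the calculation behind this 9% figure; one plausible reconstruction (an assumption, not the authors' stated method) is a one-sample t-test of the six mpg values against the old mean of 7.8, which can be sketched in Python using only the standard library:

```python
from math import sqrt
from statistics import mean, stdev

mpg = [8.7, 9.1, 7.9, 10.2, 7.0, 8.1]  # the six modified scout cars
old_mean = 7.8

# t statistic for testing whether the true mean exceeds 7.8
t = (mean(mpg) - old_mean) / (stdev(mpg) / sqrt(len(mpg)))

# With 5 degrees of freedom the one-sided 5% critical value is 2.015, so
# t ≈ 1.56 is not significant at the 5% level; the corresponding one-sided
# p-value is about 0.09, consistent with the "about 9%" quoted in the text.
print(t)
```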
1.8
Because variation cannot be avoided and is often embarrassingly large, statistics finds a wide
application in the field of management. The manager does not have to become a statistician,
but ought to know enough about the subject to appreciate its relevance to his/her problems
and decisions; he or she needs to have some confidence in the validity of the arguments and
processes used, and be able to interpret the answers that the statistician gives. The manager
must know when to call in professional statistical advice both when planning investigations
and when analysing the results.
2
2.1
What is Statistics?
2.2
Collecting Data
2.3
Next we need to present the data in an appropriate format so that we can extract relevant
information, usually by way of pictures and numerical summaries.
1. Find out how and why the data were collected and what is being measured/counted.
2. Obtain any background information.
3. Assess the reliability of the data and how much information they really contain concerning what we are interested in.
The point here is that if the designing and collecting are done by different people from those
doing the organising, analysing and interpreting, then they must talk to each other. This is obvious,
but often not done.
1. Explore the data, using pictures and summary statistics.
2. Use appropriate formal statistical methods to draw conclusions.
3. Communicate results clearly.
4. Keep brain engaged at all times.
Sometimes an exploratory, graphical investigation is all that is possible, or desirable. However, in many cases this will fail to uncover all of the information within the data. Statistical
analysis helps us to make the best possible use of our data.
2.4
Types of Data
2.5
Example 1 (discrete)
The numbers of track failures on each of 100 APCs in a year were:
3, 1, 0, 0, 2, 1, 1, 3, 1, 0, 2, 2, 2, 0, 1, 4, 1, 1, 2, 2, 1, 3, 3, 0, 1, 1, 2, 2, 3, 1, 5, 1, 0, 1, 0, 2,
3, 3, 0, 0, 1, 1, 1, 3, 4, 1, 4, 0, 3, 2, 2, 2, 1, 6, 1, 0, 0, 2, 1, 1, 1, 2, 3, 3, 0, 1, 0, 0, 1, 2, 1, 1,
1, 2, 1, 1, 2, 1, 2, 1, 3, 2, 3, 3, 0, 0, 1, 0, 0, 0, 4, 7, 1, 0, 1, 2, 4, 4, 3, 2.
[Figure: bar chart of the number of APCs against the number of track failures]
Track failures   Frequency   Relative Frequency   Cumulative Relative Frequency
0                21          0.21                 0.21
1                34          0.34                 0.55
2                21          0.21                 0.76
3                15          0.15                 0.91
4                 6          0.06                 0.97
5                 1          0.01                 0.98
6                 1          0.01                 0.99
7                 1          0.01                 1.00
[Figure: cumulative relative frequency plotted against track failures]
This data set is slightly skewed: the tail is slightly longer to the right than to the left.
Example 2 (continuous)
The detonation heights in metres of 50 airburst fuses were:
10.2 9.5 7.7 8.6 5.7 6.8 11.4 8.2 7.8 9.8 7.6 8.4 6.9 0.0 12.1 8.4 8.7 5.6 7.7 11.1 14.0 7.7 8.9
6.5 1.5 8.7 8.2 6.4 9.1 8.0 8.0 4.8 10.4 11.6 2.7 9.1 7.9 10.1 4.5 8.7 9.3 9.7 8.1 7.6 10.2 5.1 9.0
3.5 7.5 9.9
As for discrete data, we can create a table, plot a histogram of frequencies or relative
frequencies and plot a cumulative relative frequency graph.
However, this time it is less obvious how to group the data for display, either pictorially or
in a table. We must choose classes (ranges of values), preferably all of the same width. This
also applies to discrete data when the range of values observed is large.
Class Interval   Midpoint   Frequency
0 - 0.99          0.5        1
1 - 1.99          1.5        1
2 - 2.99          2.5        1
3 - 3.99          3.5        1
4 - 4.99          4.5        2
5 - 5.99          5.5        3
6 - 6.99          6.5        4
7 - 7.99          7.5        8
8 - 8.99          8.5       12
9 - 9.99          9.5        8
10 - 10.99       10.5        4
11 - 11.99       11.5        3
12 - 12.99       12.5        1
13 - 13.99       13.5        0
14 - 14.99       14.5        1
[Figure: histogram of the number of fuses against detonation height]
Note that in the histogram the vertical bars are centred at the midpoint of the intervals.
In the cumulative frequency curve (sometimes referred to in books as an ogive, for no good
reason), however, we plot the cumulative frequencies against the upper end of the intervals.
This is because, for example, the value 0.82 for the class 9 - 9.99 means precisely that 82% of
observations are at most 9.99.
2.6
Pictures, especially histograms, show the overall distribution of the sample data, but it is
often useful to summarise this numerically. Hence we usually display the data both with
pictures and with sample (summary) statistics.
[Figure: cumulative relative frequency plotted against detonation height]
Notation:
n : the sample size
x1 , x2 , . . . , xn : the values of the observations
Three measures of location are:

a) The sample mean: the average of the observations,

   x̄ = (1/n)(x1 + x2 + x3 + ... + xn) = (1/n) Σ xi

b) The sample median: the middle value when the observations are arranged in order of size.

c) The sample mode: the most frequently occurring value.

In most cases the mean is the most useful measure, though the median may be preferable
for very skewed distributions.
APC data:
x̄ = (1/100)(3 + 1 + ... + 2) = 163/100 = 1.63
median = 1
mode = 1
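These values can be checked with Python's statistics module, reconstructing the 100 observations from the frequency counts in section 2.5 (21 zeros, 34 ones, and so on):

```python
from statistics import mean, median, mode

# frequency table of track failures from section 2.5
freqs = {0: 21, 1: 34, 2: 21, 3: 15, 4: 6, 5: 1, 6: 1, 7: 1}
data = [r for r, f in freqs.items() for _ in range(f)]  # expand to 100 values

print(mean(data), median(data), mode(data))  # 1.63 1.0 1
```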
Detonation height data:
x̄ = (1/50)(10.2 + 9.5 + ... + 10.2) = 398.9/50 = 7.978
median = 8.2
modal class = 8 - 8.99
2.7
A measure of location is rarely enough by itself. Two samples can have a similar mean but
still be very different if one is more spread out than the other.
Three measures of the spread or dispersion of a sample are:
a) The sample range: the largest value minus the smallest.
b) The sample interquartile range: the lower and upper quartiles are defined like the
median, except that they are the one-quarter and three-quarters points, rather than the
halfway point. The interquartile range is the difference between them.
If the observations are ordered from smallest to largest then the quartiles are the values of
the (n+1)/4 th and the 3(n+1)/4 th observations. E.g. if n = 7 then the lower quartile is the value
of the 2nd (i.e. 2nd smallest) observation and the upper quartile is the value of the 6th
observation.
c) The sample variance: this is defined as

S² = (1/(n-1)) [ (x1 - x̄)² + ... + (xn - x̄)² ]
   = (1/(n-1)) Σ (xi - x̄)²
   = (1/(n-1)) [ Σ xi² - (1/n)(Σ xi)² ]
In other words we square the deviation of each observation from the mean, then average
these squared deviations. The variance is therefore the average squared deviation from the
mean. (We mention the reason for using n - 1 rather than n later.)
The larger the spread of points the larger the variance is.
Note that all we need to calculate x̄ and S² are the sums Σ xi and Σ xi².
The individual data values are not needed.
The sample standard deviation S is the square root of the sample variance. This is often
more interpretable as it is measured in the same units as the observations themselves.
In most calculators the standard deviation is calculated using the button marked σ(n-1), or
something similar. However, if you were to find a variance without the in-built calculator
button you would always use one of the last two versions of the formula to do it.
In practice we nearly always use the variance and/or standard deviation to summarise
dispersion. The range is too sensitive to one or two unusual values and the interquartile
range is too awkward to deal with mathematically.
APC data:
S² = (1/99) [ (3 - 1.63)² + (1 - 1.63)² + ... + (2 - 1.63)² ]

calculated using Σ x = 163 and Σ x² = 459 so that

S² = (1/99) (459 - 100 × 1.63²) = 1.952626
S = 1.397364
range = 0 to 7 = 7
The interquartile range is not very useful here.
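The APC variance and standard deviation can be verified with the statistics module, which also uses the n - 1 divisor:

```python
from statistics import stdev, variance

# frequency table of track failures from section 2.5
freqs = {0: 21, 1: 34, 2: 21, 3: 15, 4: 6, 5: 1, 6: 1, 7: 1}
data = [r for r, f in freqs.items() for _ in range(f)]

S2 = variance(data)  # sample variance with divisor n - 1 = 99
S = stdev(data)      # square root of the sample variance
print(S2, S)         # approximately 1.952626 and 1.397364
```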
Detonation height data:
S² = (1/49) [ (10.2 - 7.978)² + (9.5 - 7.978)² + ... + (9.9 - 7.978)² ]

calculated using Σ x = 398.9 and Σ x² = 3514.23 so that

S² = (1/49) (3514.23 - 50 × 7.978²) = 6.771547
S = 2.60222
range = 0 to 14 = 14
lower quartile = 6.875
upper quartile = 9.45
interquartile range = 9.45 - 6.875 = 2.575
It is often very informative to summarise graphically the range, quartiles and median for
several samples at once with a boxplot. To illustrate this the figure shows a boxplot first
for the raw detonation heights, then for the data with 4 added to all the values, then for
the data with all values multiplied by 1.5.
[Figure: boxplots of the raw detonation heights, the heights with 4 added, and the heights multiplied by 1.5]
2.8
Grouped Data
Sometimes we may wish to calculate the above summary statistics directly from frequency
tables like those in section 2.5. This may either be because we only have the grouped
(tabulated) data, or to save time.
APC data:
In this case the grouped data are the same as the raw data, so we get exactly the same
answers:

x̄ = (1/100)(0 × 21 + 1 × 34 + 2 × 21 + 3 × 15 + 4 × 6 + 5 × 1 + 6 × 1 + 7 × 1) = 1.63

S² = (1/99) [ (0 - 1.63)² × 21 + (1 - 1.63)² × 34 + ... + (7 - 1.63)² × 1 ] = 1.952626

Detonation height data:

Using the class midpoints and frequencies,

x̄ = (1/50)(0.5 × 1 + 1.5 × 1 + ... + 14.5 × 1) = 401/50 = 8.02

S² = (1/49) [ (0.5 - 8.02)² × 1 + ... + (14.5 - 8.02)² × 1 ] = 7.11184
S ≈ 2.6668
Similarly we can approximate the median and the quartiles by the midpoints of the classes
in which they fall:
approx. median = 8.5
modal class = 8 - 8.99
approx. interquartile range = 9.5 - 6.5 = 3
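The grouped-data formulas amount to weighting each value (or class midpoint) by its frequency; a sketch for the APC data:

```python
# frequency table of track failures (value -> frequency)
freqs = {0: 21, 1: 34, 2: 21, 3: 15, 4: 6, 5: 1, 6: 1, 7: 1}

n = sum(freqs.values())                          # 100 observations
xbar = sum(x * f for x, f in freqs.items()) / n  # grouped mean
S2 = sum(f * (x - xbar) ** 2 for x, f in freqs.items()) / (n - 1)  # grouped variance

print(xbar, S2)  # 1.63 and approximately 1.952626
```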
2.9
In statistical terminology, a population is the whole set of things about which conclusions
are to be drawn. Sometimes this is very large but finite, such as the British electorate, but
often it is in effect infinite, such as all shells of a particular type which have been or will be
produced.
The mean of the population is usually denoted μ and the variance of the population
is usually denoted σ², so that the standard deviation of the population is σ. These
have similar interpretations to the sample versions.
When we inspect or analyse the data we must allow for the fact that they are only a sample
from the overall population that we are interested in.
Example (from section 2.1):
We collect data from tyres fitted to 20 lorries over a period of 1 month. However, if we had
picked a different 20 lorries or a different month then our results would have been different.
3 Introduction to Probability

3.1
A trial or experiment is just something with more than one possible outcome.
E.g. Toss a coin; Roll a die; Roll ten dice and count how many 6s there are.
An event is simply something that may or may not happen when we perform a trial or
experiment.
E.g. It's a Head; It's a 1; There are fewer than two 6s.
The probability of an event occurring is measured by the relative frequency of this event
in an arbitrarily large number of trials, i.e. the relative frequency in the population.
For example, when it is stated that the probability of hitting a target with a single round
is 0.65 (or 65%), what this actually means is that if an enormously large number of rounds
were fired under the same conditions then 65% would hit the target.
Probability is therefore measured on a scale from 0 (impossible) to 1 (certain). Some
examples of familiar probabilities are:
1. Probability of a head on tossing a coin = 1/2 (50%)

2. Probability of rolling a six with a die = 1/6 (16.67%)

3. Probability of drawing a particular card (say the ace of spades) from a pack = 1/52 (1.92%)

3.2 Probability Distributions
Probability can be interpreted in terms of relative frequency in more complicated cases too.
A probability distribution describes the probabilities of all possible outcomes of a trial,
and can be regarded as the set of relative frequencies in an infinitely large number of
trials.
For example the relative frequency of getting at least 3 hits when a very large number of
salvoes are fired can be equated to the probability of getting at least 3 hits with a single
salvo.
In this case the relative frequency distribution, describing how many salvoes would produce
no hits, how many would produce exactly one hit etc, may equally well be regarded as a
probability distribution and interpreted accordingly.
A probability distribution can be as simple as that for tossing a coin:
P(Head) = 1/2;   P(Tail) = 1/2
3.3
Independent Events
Two events A and B are independent if the chance of each occurring is unaffected by
whether the other occurs or not. This is the case if and only if
P(A AND B) = P(A) × P(B)
Similarly, a set of n events are independent if and only if
P(event 1 AND event 2 AND ... AND event n) = P(event 1) × P(event 2) × ... × P(event n)
Example
The probability of a card drawn at random being black is 1/2 and the probability that it is a
10 is 1/13. Clearly the probability of a card being black is not related to its value, and hence
the probability that it is both black and a 10 is the product of the two probabilities:

P(Black AND 10) = (1/13) × (1/2) = 1/26
Example
If on a field telephone system the probability that any hand set is working is 99%, whilst
independently the probability of the switchboard operating correctly is 97%, then the probability of one outpost being able to contact another outpost is given by
0.99 × 0.97 × 0.99 = 0.9507
3.4
Two events A and B are mutually exclusive if they cannot both happen, so that
P (A AND B) = 0
Hence the two events are (very) dependent.
It follows from this that
P (A OR B) = P (A) + P (B)
Similarly, if n events are all mutually exclusive then
P(event 1 OR event 2 OR . . . OR event n)
= P(event 1) + P(event 2) +. . .+ P(event n)
Example
A card cannot both be a king (probability 1/13) and an ace (probability 1/13), so these two
events are mutually exclusive. Hence the probability of drawing either a king or an ace
from the pack is the sum of the two probabilities;

P(Ace OR King) = 1/13 + 1/13 = 2/13
Example
Consider a tank which may be conveniently divided into three target areas A, B and C. If
the probabilities of hitting and thereby killing the tank with a single round, aimed at the
centre point, are 8%, 30% and 14% respectively for the three areas A, B and C then the
total probability of achieving a kill is 52%.
Definition
A set of events is mutually exclusive and exhaustive if in addition they are the only possible
events, i.e. precisely one of the events must occur. If so, then
P(event 1) +. . .+ P(event n) = 1.
In other words P(event 1), P(event 2), . . ., P(event n) give a probability distribution.
3.5
General Events
3.6
Let p be the probability of success and q the probability of failure. It is assumed that a
trial must have one or other as the outcome. Thus the events are mutually exclusive and
exhaustive and so the total probability must be unity. Hence by this rule the probability of
either success or failure is p + q = 1, so that q = 1 - p. Success and failure are interpreted
as occasion demands. The term success is often applied to the event of interest, such as
finding a defective item.
Suppose that we repeat a trial of this type n times independently. Then
P(n successes) = p × p × ... × p = p^n
and similarly
P(0 successes) = q × q × ... × q = q^n
Similarly
P(r successes then n - r failures) = p × p × ... × p × q × q × ... × q = p^r q^(n-r)
This is true for any specified ordering of r successes and n - r failures, in other words for any
other particular sequence of successes and failures which finishes with a total of r successes
and n - r failures. Such sequences would be mutually exclusive.
Now consider the whole sequence of trials, with
overall success = at least one occurrence (success)
overall failure = no occurrences (all failures)
Then
P(overall failure) = P(0) = q × q × ... × q = q^n
and so
P(overall success) = 1 - P(overall failure) = 1 - q^n = 1 - (1 - p)^n
Example
Four rounds are fired simultaneously at a target, each independently having a chance of 0.4
of hitting. One hit is enough to kill the target. Then
P(kill) = 1 - P(all miss) = 1 - (1 - 0.4)^4 = 0.8704
For some repeated trials, the opposite of no occurrences is just one occurrence. This
happens when one occurrence precludes further trials. The procedure above will still apply
and gives the probability of just one occurrence.
Example
It is required to find the probability of a tank being disabled by a minefield consisting of 5
rows of mines when it is estimated that the probability of a tank being disabled by a single
row of mines is 17%. The probability that it successfully gets through one row of mines is
0.83, and so the probability that it gets through all 5 rows is
P(5 successes) = (0.83)^5 = 0.3939
Hence the overall probability of it being disabled is
P(not 5 successes) = 1 - 0.3939 = 0.6061
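Both this example and the previous one are instances of the general 1 - (1 - p)^n rule, which can be sketched as a small helper:

```python
def p_at_least_one(p_single: float, n: int) -> float:
    """Probability of at least one occurrence in n independent trials,
    each with probability p_single of the event occurring."""
    return 1 - (1 - p_single) ** n

p_disabled = p_at_least_one(0.17, 5)  # tank disabled by any of 5 rows of mines
p_kill = p_at_least_one(0.4, 4)       # target killed by any of 4 rounds
print(p_disabled, p_kill)             # approximately 0.6061 and 0.8704
```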
3.7
Dependent Events
The above has mostly concerned independent trials, and hence events, with one special case
of dependence (mutual exclusivity).
More general forms of dependence exist, and probabilities of multiple events when those
events are dependent can often be neatly displayed by a Tree Diagram (at end of chapter).
Example
A bomb disposal unit is receiving simulators from three different factories in the proportions
50%, 30% and 20%. If the percentages of defective output from these factories are 3%, 4%
and 5% respectively, find the proportion of defective simulators received.
We obtain the leaf probabilities by multiplying the probabilities on the branches leading
up to it. We can sum the leaf probabilities as shown because the leaves are all mutually
exclusive:

P(defective) = 0.50 × 0.03 + 0.30 × 0.04 + 0.20 × 0.05 = 0.037
Conditional Probability
If we have two events A and B then the conditional probability of A happening given that
B happens is denoted

P(A | B)

If A and B are independent then P(A | B) = P(A), while if they are mutually exclusive then
P(A | B) = 0.

Example (continued)

The probabilities on the second set of branches of the tree diagram are all conditional
probabilities:

P(defective | factory 1) = 0.03
P(defective | factory 2) = 0.04
P(defective | factory 3) = 0.05

In general

P(A | B) = P(A AND B) / P(B)

From this it immediately follows that

P(A AND B) = P(A | B) × P(B)

Hence the probability that a simulator came from factory 3, given that it is defective, is

P(factory 3 | defective) = P(factory 3 AND defective) / P(defective)
                         = (0.20 × 0.05) / 0.037
                         = 0.27027
Hence, of the defective items, 27% of them come from factory 3.
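The tree-diagram computation is the law of total probability followed by Bayes' rule; in Python:

```python
# factory proportions and defect rates from the example
proportion = {1: 0.50, 2: 0.30, 3: 0.20}
p_defective_given = {1: 0.03, 2: 0.04, 3: 0.05}

# total probability of a defective simulator (sum over the leaves)
p_defective = sum(proportion[f] * p_defective_given[f] for f in proportion)

# Bayes' rule: P(factory 3 | defective)
p_factory3_given_defective = proportion[3] * p_defective_given[3] / p_defective

print(p_defective, p_factory3_given_defective)  # 0.037 and approximately 0.27027
```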
3.8
Further Examples
Example 1
Three LAW operators A, B and C, are to fire at a tank target. The probability of A achieving
a hit with an individual round is known to be 0.85, while for the less welltrained B and C
the probabilities are 0.7 and 0.45 respectively.
1. If each operator were to fire one round calculate the probability that
(a) all three will hit the target
(b) all three will miss the target
(c) exactly two hits are obtained
(d) at least one hit is obtained
assuming that the three operators perform independently of one another.
2. If each operator were to fire two rounds, calculate the average number of hits obtained
in total.
Solution
1. Under the assumption of independence,

(a)
P(three hits) = P(A hits) × P(B hits) × P(C hits)
              = 0.85 × 0.70 × 0.45
              = 0.26775

(b)
P(three misses) = P(A misses) × P(B misses) × P(C misses)
                = (1 - 0.85) × (1 - 0.70) × (1 - 0.45)
                = 0.15 × 0.30 × 0.55
                = 0.02475

(c)
P(exactly two hits) = P[(A hit and B hit and C miss)
                        or (A hit and B miss and C hit)
                        or (A miss and B hit and C hit)]
                    = (0.85 × 0.70 × 0.55) + (0.85 × 0.30 × 0.45) + (0.15 × 0.70 × 0.45)
                    = 0.49

(d)
P(at least one hit) = 1 - P(three misses) = 1 - 0.02475 = 0.97525
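The three-operator calculation can be checked by enumerating all 2³ hit/miss outcomes; a sketch:

```python
from itertools import product

p_hit = {"A": 0.85, "B": 0.70, "C": 0.45}

# probability of each possible number of hits, by enumerating all 8 outcomes
prob_hits = {k: 0.0 for k in range(4)}
for outcome in product([True, False], repeat=3):
    p = 1.0
    for op, hit in zip("ABC", outcome):
        p *= p_hit[op] if hit else 1 - p_hit[op]
    prob_hits[sum(outcome)] += p

# exactly two hits comes out as 0.48925, which the text rounds to 0.49
print(prob_hits)
```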
4
4.1
Application
The binomial is a commonly occurring discrete distribution. It arises when a trial can have
only one of two possible outcomes which we may term success and failure, and it gives
the distribution of the number of successes when the trial is repeated a certain number of
times. This directly follows from section 3.6.
Suppose we have n independent trials, each with success probability p and failure probability
q = 1 - p. Let X be a Random Variable describing the number of successes in the n
trials.
E.g. toss ten coins and count the number of heads.
Then we say that the random variable X follows the binomial distribution with parameters
n (number of trials) and p (success probability). For short, we write
X ~ Bi(n, p)
4.2
The shape of the distribution can be obtained from the basic laws of probability. We can
derive a function that gives each of the probabilities in the probability distribution, and this
is usually referred to as the probability function.
From section 3.6 we have
P(X = 0) = q^n
P(X = n) = p^n

and that

P(X = r) = c p^r q^(n-r)

where c is the number of different ways in which r successes and n - r failures can be ordered.
Clearly this depends on both n and r.
Consider some values of r:

r = 0 : clearly c = 1
r = n : clearly c = 1
r = 1 : the possible orderings are

    sff...f
    fsf...f
    ...
    ff...fs

so clearly c = n, and so also c = n when r = n - 1.
In general, c is the binomial coefficient nCr = (n choose r), where

(n choose r) = n! / ((n - r)! r!) = n(n-1)...(n-r+1) / (r(r-1)...3 × 2 × 1)

Hence the probability function is

P(X = r) = (n choose r) p^r q^(n-r)    for r = 0, 1, ..., n
Note that the coefficients of the terms are symmetrical at each end of the distribution, but
the shape of the distribution would only be symmetrical if p = q = 1/2 (e.g. tossing a coin).
However, it tends to become more symmetrical (whatever the value of p) as n becomes large.
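The counting factor c is available directly as Python's math.comb; for instance, with n = 6:

```python
from math import comb, factorial

n = 6
coeffs = [comb(n, r) for r in range(n + 1)]
print(coeffs)  # [1, 6, 15, 20, 15, 6, 1] -- symmetric at each end

# comb agrees with the factorial formula n! / ((n - r)! r!)
r = 2
assert comb(n, r) == factorial(n) // (factorial(n - r) * factorial(r))
```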
Note that these probabilities can be thought of as population equivalents of the sample
relative frequencies in section 2.5. We can similarly draw a rod graph of the probabilities.
Example
A salvo of 6 rounds is fired. The probability of a hit with a single round is assumed to be
0.6, independently of the other rounds. Find the probabilities of obtaining 0, 1, 2, 3, 4, 5
and 6 hits .
Hence if X is a random variable describing the number of hits in a single salvo then X ~
Bi(6, 0.6). Therefore the probabilities of 0, 1, 2, 3, 4, 5, 6 hits are given by substituting
n = 6, p = 0.6, q = 0.4 in the formula. Hence we obtain:

Probability of 0 hits = (0.4)^6            = 0.0041
Probability of 1 hit  = 6 (0.6)(0.4)^5     = 0.0369
Probability of 2 hits = 15 (0.6)^2 (0.4)^4 = 0.1382
Probability of 3 hits = 20 (0.6)^3 (0.4)^3 = 0.2765
Probability of 4 hits = 15 (0.6)^4 (0.4)^2 = 0.3110
Probability of 5 hits = 6 (0.6)^5 (0.4)    = 0.1865
Probability of 6 hits = (0.6)^6            = 0.0467
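These seven probabilities follow directly from the probability function; a sketch using math.comb:

```python
from math import comb

n, p = 6, 0.6
q = 1 - p

# binomial probability function P(X = r) for r = 0, ..., 6
pmf = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
total = sum(pmf)  # the probabilities of all possible outcomes add up to 1
```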
[Figure: rod graph of the probabilities P(r) for r = 0, ..., 6]
4.3
Summation of Terms
Just as in the sample case in section 2.5, we can calculate cumulative probabilities. The set
of these is often called the cumulative distribution function.
Hence the value of the cumulative distribution function at r is just the probability of r or
fewer successes. It is given by the sum of the first (r + 1) terms, i.e. up to and including
the term containing p^r.

Note that this is not the same as the probability of fewer than r successes, which does not
include the p^r term.
The (cumulative) distribution function is therefore given by
P(X ≤ r) = Σ_{i=0}^{r} P(X = i)    for r = 0, 1, ..., n

Similarly

P(X < r) = Σ_{i=0}^{r-1} P(X = i) = P(X ≤ r - 1)
The probability of at least r successes is the sum of the last (n - r + 1) terms, from the
term containing p^r onwards.
If the required probability involves summing more than half the terms, the arithmetic can
be shortened by using the fact that all the terms add up to unity:
P(X ≤ r) + P(X > r) = 1
P(X < r) + P(X ≥ r) = 1
Thus, sum the remaining terms instead and subtract from 1.
In particular
P(X ≥ 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - q^n
or in other words
P(at least one success) = 1 - P(no successes) = 1 - q^n
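These cumulative sums are easy to compute; a sketch for the Bi(6, 0.6) salvo example used in this chapter:

```python
from itertools import accumulate
from math import comb

n, p = 6, 0.6
pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
cdf = list(accumulate(pmf))  # P(X <= r) for r = 0, ..., 6

p_at_most_3 = cdf[3]       # approximately 0.4557
p_at_least_4 = 1 - cdf[3]  # complement shortcut: P(X >= 4) = 1 - P(X <= 3)
```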
Example
The probability function and cumulative distribution function from the example above where
X ~ Bi(6, 0.6) can be tabulated as:

Number of hits (r)   Probability P(X = r)   Cumulative Probability P(X ≤ r)
0                    0.0041                 0.0041
1                    0.0369                 0.0410
2                    0.1382                 0.1792
3                    0.2765                 0.4557
4                    0.3110                 0.7667
5                    0.1865                 0.9532
6                    0.0467                 1.0000

4.4
The binomial distribution is completely specified by the number of trials n and the success
probability p, remembering that the n trials must be independent.
A random variable and its distribution can be used to describe a population. In the binomial
case the population mean and variance are given by
population mean μ = np
population variance σ² = np(1 - p)

so that

population standard deviation σ = √( np(1 - p) )
Note: the population mean μ is also called the expectation or expected value of X. This
is a misnomer really, since we do not expect X to equal μ.
Example
For X ~ Bi(6, 0.6), the population mean is the average number of hits

μ = np = 6 × 0.6 = 3.6

The variance is

σ² = np(1 − p) = 6 × 0.6 × 0.4 = 1.44
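The table and the mean and variance above can be reproduced with a short Python sketch (helper name ours):

```python
from math import comb, sqrt

def binomial_pmf(n: int, p: float, r: int) -> float:
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 6, 0.6
cum = 0.0
print(" r   P(X=r)   P(X<=r)")
for r in range(n + 1):
    prob = binomial_pmf(n, p, r)
    cum += prob
    print(f"{r:2d}   {prob:.4f}   {cum:.4f}")

mean = n * p               # population mean, 3.6
var = n * p * (1 - p)      # population variance, 1.44
print(mean, var, sqrt(var))
```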
Estimating the Parameters of the Binomial from a Sample
All of the above has assumed known n and p. We should know n, but will only know p in
special cases (dice, cards etc). However, if we have results from N (> n, hopefully) trials,
it is natural to estimate p by

p̂ = (#successes) / N

where the ˆ ("hat") is standard statistical notation meaning "estimate of".
Estimates of the mean and standard deviation of X, if desired, may then be derived from p̂
using:

μ̂ = n p̂
σ̂ = √(n p̂ (1 − p̂))
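These estimates are straightforward to compute; a minimal Python sketch (function name ours), using for illustration the notes' own figures of 146 hits in 240 rounds fired in salvos of n = 6:

```python
from math import sqrt

def estimate_binomial(successes: int, N: int, n: int):
    p_hat = successes / N                      # p-hat = #successes / N
    mu_hat = n * p_hat                         # estimated mean of X
    sigma_hat = sqrt(n * p_hat * (1 - p_hat))  # estimated standard deviation
    return p_hat, mu_hat, sigma_hat

p_hat, mu_hat, sigma_hat = estimate_binomial(146, 240, 6)
print(round(p_hat, 4), round(mu_hat, 2), round(sigma_hat, 3))
```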
Example
Suppose 40 salvos of n = 6 rounds give the following numbers of hits:

Number of hits (r)   0   1   2   3   4   5   6   TOTAL
Frequency            1   1   5  10  13   7   3     40
Hence

p̂ = 146/240 = 0.6083

Therefore estimates of the mean and standard deviation (of the number of hits in a salvo)
are

μ̂ = n p̂ = 3.65
σ̂ = √(n p̂ (1 − p̂)) = 1.196

4.5
Further Examples
Example 1
A reconnaissance party carries five signal flares in order to summon help. Experience has
shown that each flare has a probability of 0.90 of functioning correctly. Calculate the
probability that:
1. at least one flare will function correctly
2. exactly two flares will function correctly
3. at most two flares will fail to function correctly.
Solution
Let the random variable X denote the number of flares which function correctly. Then X
may assume one of the values 0, 1, 2, 3, 4, 5 and follows a binomial distribution with

p = P(flare functions correctly) = 0.90

and so

q = P(flare fails to function correctly) = 1 − p = 0.10

where the number of independent trials is n = 5. Hence we write X ~ Bi(5, 0.9).
1. We require

P(at least one flare functions) = P(X ≥ 1) = 1 − P(X = 0)

where

P(X = 0) = C(5, 0) p^0 q^5 = (0.10)^5 = 0.00001

so that

P(X ≥ 1) = 1 − 0.00001 = 0.99999
2. We require

P(exactly two flares function) = P(X = 2) = C(5, 2) p^2 q^3 = (10)(0.9)^2(0.1)^3 = 0.0081
3. We require

P(at most two flares fail to function) = P(X ≥ 3) = 1 − P(X ≤ 2)

Now

P(X = 1) = C(5, 1) p^1 q^4 = (5)(0.9)(0.1)^4 = 0.00045

and so

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.00001 + 0.00045 + 0.0081 = 0.00856

Hence

P(X ≥ 3) = 1 − 0.00856 = 0.99144
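The three flare probabilities can be checked with a few lines of Python (helper name ours):

```python
from math import comb

def pmf(n: int, p: float, r: int) -> float:
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 5, 0.9                                 # X = number of flares that function
p_at_least_one = 1 - pmf(n, p, 0)             # part 1
p_exactly_two = pmf(n, p, 2)                  # part 2
# part 3: at most two failures is the same as at least three successes
p_at_most_two_fail = 1 - sum(pmf(n, p, r) for r in range(3))
print(p_at_least_one, p_exactly_two, p_at_most_two_fail)
```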
Example 2
A factory which produces toasters claims that at least 72% of the total production will
function exactly according to specification and that defectives occur within the production
only as a consequence of random variations in the production process.
To test this claim a random sample of 10 toasters is taken from the production line on each
of 100 working days and the number of toasters which conform to the specifications (within
each sample) is recorded:
7 8 8 7 7 8 6 7 7 9
9 7 8 4 5 6 7 9 5 9
7 8 7 7 9 8 6 6 7 8
7 7 6 8 8 7 7 6 7 6
9 7 9 8 7 8 6 8 9 9
7 7 7 9 7 8 5 8 7 8
9 6 8 6 8 9 7 8 7 8
8 7 7 7 9 7 8 7 8 8
5 9 7 6 8 8 7 9 7 9
8 7 6 8 5 8 4 5 9 7
Solution
The frequency and relative frequency distributions for the number of nondefective toasters
in each of the samples of 10 toasters are:

Nondefective toasters    3     4     5     6     7     8     9    10
Frequency                0     2     6    12    35    28    17     0
Relative Frequency     0.00  0.02  0.06  0.12  0.35  0.28  0.17  0.00
The sample mean number of nondefective toasters is 732/100 = 7.32, so the proportion
conforming is estimated as p̂ = 7.32/10 = 0.732. Fitting a binomial distribution Bi(10, 0.732)
gives

P(X = 4) = C(10, 4)(0.732)^4(0.268)^6 = 0.0223
P(X = 5) = C(10, 5)(0.732)^5(0.268)^5 = 0.0732
P(X = 6) = C(10, 6)(0.732)^6(0.268)^4 = 0.1667
P(X = 7) = C(10, 7)(0.732)^7(0.268)^3 = 0.2601
P(X = 8) = C(10, 8)(0.732)^8(0.268)^2 = 0.2664
P(X = 9) = C(10, 9)(0.732)^9(0.268)^1 = 0.1617
If n = 6

P(6) = (0.8)^6                        = 0.262
P(5) = (6/1)(0.2)(0.8)^5              = 0.393
Therefore P(≥ 5 successes)            = 0.655

If n = 7

P(7) = (0.8)^7                        = 0.210
P(6) = (7/1)(0.2)(0.8)^6              = 0.367
P(5) = (7/1)(6/2)(0.2)^2(0.8)^5       = 0.275
Therefore P(≥ 5 successes)            = 0.852

If n = 8

P(8) = (0.8)^8                        = 0.168
P(7) = (8/1)(0.2)(0.8)^7              = 0.336
P(6) = (8/1)(7/2)(0.2)^2(0.8)^6       = 0.294
P(5) = (8/1)(7/2)(6/3)(0.2)^3(0.8)^5  = 0.147
Therefore P(≥ 5 successes)            = 0.945
5 The Poisson Distribution
5.1 Application
The Poisson distribution arises when events occur at random, often over time or in space.
It models the number of events which occur in a given time period, or over a given spatial
area.
Example
1. Prussian officers kicked to death by horses
2. Car crashes on a stretch of road.
"Random in time" means that knowing when the last event occurred says nothing about
when the next will occur. (Similarly for a spatial process.)
The only parameter is m, the mean number of events occurring (e.g. in a specified time
period).
Example
We might have
1. m = 1.7 officers/year
2. m = 3.2 crashes/month or, equivalently, m = 38.4 crashes/year
Although few processes are truly completely random, the Poisson has been shown to be a
good approximation to reality in many cases.
5.2
Probability Function
If X follows a Poisson distribution with mean m, written X ~ Po(m), then

P(X = r) = m^r e^(−m) / r!,    r = 0, 1, 2, . . .

Hence P(X = 0) = e^(−m) (using 0! = 1), P(X = 1) = m e^(−m), P(X = 2) = (m^2/2) e^(−m),
and so on.
Note that we have the convenient formula

P(X = r) = (m/r) P(X = r − 1)
[Rod graph of the Poisson probability function P(r) for m = 0.5.]
Unlike the binomial, there is no fixed upper limit to the number of events that can occur.
However

P(X = r) ≈ 0   for r ≫ m

Hence although the distribution may be represented in rod graph form, there is no limit to
the number of columns, but the probabilities decrease to negligible values as r increases.
The shape of the distribution is very asymmetrical for small m but becomes symmetrical as
m increases.
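The recursive formula makes whole tables of Poisson probabilities cheap to build; a minimal sketch (function name ours):

```python
from math import exp

def poisson_pmf_table(m: float, rmax: int) -> list[float]:
    # P(X = 0) = e^{-m}; then P(X = r) = (m / r) * P(X = r - 1)
    probs = [exp(-m)]
    for r in range(1, rmax + 1):
        probs.append(probs[-1] * m / r)
    return probs

probs = poisson_pmf_table(3.0, 10)
print(round(probs[4], 6), round(probs[5], 6))
```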
Example
If X ~ Po(3) then, for example,

P(X = 4) = 3^4 e^(−3) / 4! = 0.168031
P(X = 5) = 3^5 e^(−3) / 5! = 0.100819

or, using the recursive formula,

P(X = 5) = (3/5) P(X = 4) = 0.100819
5.3
Summation of Terms
The remarks made about the summing of terms for the binomial apply to all discrete
distributions. For the Poisson distribution it is always preferable to obtain the probability
of at least r occurrences by subtracting the probability of fewer than r occurrences from
1, since this avoids the approximation of neglecting the terms in the infinite tail of the
distribution. For example
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−m)
In general then we use

P(X ≤ r) = Σ_{i=0}^{r} P(X = i)

P(X > r) = Σ_{i=r+1}^{∞} P(X = i) = 1 − P(X ≤ r)
Example
The average number of defectives in a box of ammunition is known to be 1/2. We calculate
the probability of finding boxes with 0, 1, 2 defectives by substituting m = 1/2 in the
formula above, so that

P(X = 0) = e^(−0.5)            = 0.606530
P(X = 1) = 0.5 × above         = 0.303265
P(X = 2) = (0.5/2) × above     = 0.075816
                       Total:    0.985611

P(X ≥ 3) = 1 − 0.985611 = 0.014389
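The same calculation in Python, again using the recursion (variable names ours):

```python
from math import exp

m = 0.5
p = [exp(-m)]                 # P(X = 0) = e^{-0.5}
for r in (1, 2):
    p.append(p[-1] * m / r)   # P(r) = (m / r) P(r - 1)
p_three_or_more = 1 - sum(p)  # subtract from 1 rather than sum the infinite tail
print([round(x, 6) for x in p], round(p_three_or_more, 6))
```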
5.4
If we observe counts r_1, . . . , r_N of events in N separate time periods, the mean number
of events per period is estimated by

m̂ = (1/N) Σ_{i=1}^{N} r_i
This applies similarly even if the lengths of the time periods differ.
Example
If we have the information:
3 Prussians killed in 1879
5 Prussians killed in 1883–85
8 Prussians killed in 1890–93
then

m̂ = (3 + 5 + 8) / (1 + 3 + 4) = 2 per year
For example, given counts from 500 observed periods:

Number of events (r)   Frequency   r × Frequency
        0                 309            0
        1                 142          142
        2                  40           80
        3                   8           24
        4                   1            4
        5                   0            0
      Total               500          250

the estimate is

m̂ = 250/500 = 0.5

5.5
If occurrences from one source have a Poisson distribution with mean m1 , and occurrences
from a different but independent source have a Poisson distribution with mean m2 , then it
can be shown that occurrences from either source have a Poisson distribution with mean
m1 + m2 .
If X_1 ~ Po(m_1) and X_2 ~ Po(m_2) independently, then X_1 + X_2 ~ Po(m_1 + m_2).
Example
This additive property of the Poisson distribution is useful in the study of accident statistics
which, in many situations, may be well described by a Poisson model. For example, if at a
traffic black spot there are on average 2 serious accidents per year, the probability P (r) of
obtaining r = 0, 1, 2, 3 etc accidents in any one year is given by the formula
P(r) = 2^r e^(−2) / r! = 2^r × 0.1353 / r!
Similarly if, at another black spot, there are on average 3 accidents per year, the corresponding probability P(r) of r accidents in a year is given by
P(r) = 3^r e^(−3) / r! = 3^r × 0.0498 / r!
The additive property of the Poisson distribution now tells us that the probability of obtaining r accidents in a year from the two sites combined, is simply
P(r) = (2 + 3)^r e^(−(2+3)) / r! = 5^r × 0.0067 / r!
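The additive property can be verified numerically by summing over the ways r total accidents can split between the two sites (function name ours):

```python
from math import exp, factorial

def po(m: float, r: int) -> float:
    return m**r * exp(-m) / factorial(r)

# P(r accidents in total) = sum over splits (i at site 1, r - i at site 2)
for r in range(8):
    combined = sum(po(2, i) * po(3, r - i) for i in range(r + 1))
    assert abs(combined - po(5, r)) < 1e-12
print("convolution of Po(2) and Po(3) matches Po(5)")
```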
In other words

Black spot 1 accidents  ~ Po(2)
Black spot 2 accidents  ~ Po(3)
Accidents at either     ~ Po(5)

5.6
In fact a binomial distribution with a large number of trials n and a small success probability
p gives almost identical probabilities to a Poisson with mean m = np.
This is useful when
1. calculating by hand, or even by computer if n is huge.
2. n and p are both unknown but m is known.
A rule of thumb is that the approximation is reasonable for n > 10 and p < 0.1.
Example
Consider a box of 450 rounds of ammunition where the probability of a round being defective
is 0.004. Hence the number of defectives in a box follows a binomial distribution with
n = 450 and p = 0.004, and the probability of exactly r defectives is

P(r) = C(450, r) (0.004)^r (0.996)^(450−r)

Since n is large and p is small, with m = np = 450 × 0.004 = 1.8, we can approximate this
formula with that for the Poisson with mean 1.8:

P(r) = (1.8^r / r!) e^(−1.8)

Note that this is considerably easier to calculate, especially if using the recursive formula
to obtain P(r) from P(r − 1). The results are very similar:
 r    Poisson    Binomial
 0    0.165299   0.164703
 1    0.297538   0.297657
 2    0.267784   0.268369
 3    0.160671   0.160950
 4    0.072302   0.072233
 5    0.026029   0.025876
 6    0.007809   0.007707
 7    0.002008   0.001963
 8    0.000452   0.000437
 9    0.000090   0.000086
10    0.000016   0.000015
11    0.000003   0.000002
12    0.000000   0.000000
13    0.000000   0.000000
14    0.000000   0.000000
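A comparison table like this can be generated directly; a short sketch using Python's standard math module:

```python
from math import comb, exp, factorial

n, p = 450, 0.004
m = n * p                                   # Poisson mean 1.8
print(" r   Poisson   Binomial")
for r in range(15):
    poisson = m**r * exp(-m) / factorial(r)
    binomial = comb(n, r) * p**r * (1 - p)**(n - r)
    print(f"{r:2d}  {poisson:.6f}  {binomial:.6f}")
```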
5.7
Further Examples
Example 1
On average a repair section are asked to repair 2.5 items in a day. However, they can only
handle 4 items per day.
On those days in which they get 5 or more items, they have to send them elsewhere at a
fixed cost of 500, regardless of the number of items sent on that day.
How often are they likely to get 5 or more items for repair, and hence what is their total
cost penalty likely to be in a working year of 310 days?
Solution
If we can assume that repairs arrive at random, then the number of repairs in a day can
be described by a Poisson distribution with m = 2.5.
Hence
P(5 or more) = 1 − (P(0) + P(1) + P(2) + P(3) + P(4))

where

P(r) = (2.5)^r e^(−2.5) / r!
so that

P(0) = e^(−2.5)                        = 0.0821
P(1) = 2.5 e^(−2.5)                    = 0.2052
P(2) = (2.5)^2 e^(−2.5) / 2            = 0.2565
P(3) = (2.5)^3 e^(−2.5) / (3 × 2)      = 0.2138
P(4) = (2.5)^4 e^(−2.5) / (4 × 3 × 2)  = 0.1336
Hence

P(0) + P(1) + P(2) + P(3) + P(4) = 0.8912

and so

P(5 or more) = 1 − 0.8912 = 0.1088

Therefore

Expected frequency of 5 or more in 310 days = 310 × 0.1088 = 33.728

expected cost = 33.728 × 500 = 16864
Example 2
As above, but the cost penalty is 300 for each item sent for repair which cannot be handled.
Therefore a 300 penalty is incurred if 5 items are received, 600 if 6 are received, etc.
Now
P (5) =
(2.5)5 e2.5
= 0.066801
5432
and so on, so that (with more decimal places now required for accuracy)
Defectives      5        6        7        8        9       10      ≥ 11
Cost          300      600      900     1200     1500     1800     2100
Prob.    0.066801 0.027834 0.009941 0.003106 0.000863 0.000216 0.0000616
The table is simplified, with the probabilities of more than 10 repairs lumped together in
the final column. If we work out the probabilities for more than 10 repairs separately, we
find that the average cost per day is

300 × 0.066801 + . . . + 1800 × 0.000216 + 2100 × 0.000049 + 2400 × 0.000010 + . . . = 51.2315
so that the average cost per year is
51.2315 × 310 = 15881.77
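This expected-cost calculation is easy to carry to as many terms as needed in code; a sketch (names ours, with the cutoff chosen so the neglected tail is negligible):

```python
from math import exp

m, penalty, capacity, cutoff = 2.5, 300, 4, 30

probs = [exp(-m)]
for r in range(1, cutoff + 1):
    probs.append(probs[-1] * m / r)       # Poisson recursion

# 300 is charged for each item beyond the 4 that can be handled in-house
daily_cost = sum(penalty * (r - capacity) * probs[r]
                 for r in range(capacity + 1, cutoff + 1))
print(round(daily_cost, 4), round(daily_cost * 310, 2))
```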
Example 3
A manufacturer of tent canvas knows from experience that, on average, there is one flaw
within every 10m2 of material produced. He receives an order from the army for 100m2 of
canvas, to be delivered in five 20m² rolls. Each roll is worth 100 if it is flawless, 80
if it contains between one and three flaws inclusive, and will be rejected by the army if it
contains more than three flaws.
Calculate the mean number of rolls of canvas which are rejected and hence the expected
cost of the order to the army.
Solution
Let X denote a random variable describing the number of flaws which occur within one
20m2 roll of canvas. Under the assumption that the flaws will occur at random intervals
within the material, X has a Poisson distribution. The average number of flaws which are
expected to occur in 10m2 of canvas is 1, so that the average number of flaws within 20m2
of canvas is m = 2. Hence
X ~ Po(2)
Therefore

P(X = 0) = 2^0 e^(−2) / 0! = 0.1353
P(X = 1) = 2^1 e^(−2) / 1! = 0.2707
P(X = 2) = 2^2 e^(−2) / 2! = 0.2707
P(X = 3) = 2^3 e^(−2) / 3! = 0.1804
Hence the probability that a 20m² roll contains between one and three flaws is given by

P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 2(0.2707) + 0.1804 = 0.7218

Finally, the probability that a 20m² roll contains more than three flaws is therefore

P(X > 3) = 1 − P(X ≤ 3) = 1 − (0.1353 + 0.7218) = 0.1429
Hence we have prices and probabilities of

Value          100      80     rejected
Probability   0.1353  0.7218    0.1429
Let k denote the total number of 20m² rolls which, on average, the manufacturer must
produce in order to satisfy the contract, where k will include those rolls which will be
rejected. Then

(0.1353 + 0.7218) × k = 5

giving

k = 5 / 0.8571 = 5.8336

Hence, on average, 5.8336 rolls of canvas will need to be produced and 0.8336 rolls will be
rejected by the army.
Example 4
Five hundred rounds are fired, each detonating correctly with probability 0.99. Find the
probability that (1) all 500 rounds detonate correctly, and (2) more than 495 rounds
detonate correctly.
Solution
The probability that an individual round will detonate correctly is 0.99, hence the probability
that an individual round will detonate early, i.e. will fail to detonate correctly, is (10.99) =
0.01.
The number of rounds fired or, equivalently, the number of independent trials is n = 500.
Let X denote the number of rounds which detonate correctly; then X has a binomial distribution with p = 0.99, q = 0.01 and n = 500. Hence X ~ Bi(500, 0.99).
Alternatively, let Y denote the number of rounds which detonate incorrectly (early). Then
Y also has a binomial distribution with p = 0.01, q = 0.99 and n = 500. Since in this case,
p is small (p < 0.1) and n is large (n > 10) then a Poisson approximation to the binomial
is appropriate with Poisson mean
m = n × p = 500 × 0.01 = 5

that is, on average 5 of the 500 rounds will detonate early. Hence Y ~ Bi(500, 0.01) can be
approximated by Y ~ Po(5).
Here we give both answers:
1. The probability that all 500 rounds will detonate correctly is clearly the same as the
probability that none of the 500 rounds will detonate early. Thus,

Binomial: P(Y = 0) = (0.99)^500 = 0.006570
Poisson:  P(Y = 0) = 5^0 e^(−5) / 0! = e^(−5) = 0.006738
2. The probability that greater than 495 rounds will detonate correctly is the probability
that fewer than 5 rounds will detonate early:

P(Y < 5) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)

Hence, using the binomial,

P(Y = 1) = 500 × 0.01 × 0.99^499 = 0.033184
P(Y = 2) = (500 × 499 / 2) × 0.01^2 × 0.99^498 = 0.083631
P(Y = 3) = (500 × 499 × 498 / (3 × 2)) × 0.01^3 × 0.99^497 = 0.140230
P(Y = 4) = (500 × 499 × 498 × 497 / (4 × 3 × 2)) × 0.01^4 × 0.99^496 = 0.175996

and, using the Poisson,

P(Y = 1) = 5^1 e^(−5) / 1! = 0.033690
P(Y = 2) = 5^2 e^(−5) / 2! = 0.084224
P(Y = 3) = 5^3 e^(−5) / 3! = 0.140374
P(Y = 4) = 5^4 e^(−5) / 4! = 0.175467

So, the probability that fewer than 5 rounds will detonate early is

Binomial: P(Y < 5) = 0.439611
Poisson:  P(Y < 5) = 0.440493

For larger n the results will be even closer.
6 The Normal Distribution
6.1 Application
The binomial and Poisson are discrete distributions: they can only take certain (whole number)
values, and they are used to model and describe counts. Many distributions are continuous,
used to model and describe measurements. The most important is the Normal or Gaussian
distribution.
It can be shown mathematically that when a considerable number of independent factors
(more often than not unknown) contribute in either direction to the error (or deviation from
the mean) then a normal distribution will result.
In other words, measurements can be well described (modelled) by the normal distribution if
they can be regarded as a sum of the effects of many independent factors. This is a common
real-life situation.
Example
Heights (of people of the same age, sex and nationality) can be modelled by a normal
distribution because an individual's height can be thought of as the sum of very many
genetic and environmental factors (they are not all independent, but there are so many of
them that this has little effect on the validity of the model).
This result, the Central Limit Theorem, in effect makes statistics practicable by allowing
the normal distribution to be used in a huge number of different areas.
A corollary of this is that, even if individual measurements are not drawn from a normal
distribution, a sum or a mean of a large number of them will follow a normal distribution.
Hence tests and confidence limits (see section 1.6) for means can use the normal distribution
in many situations.
6.2
If X is normally distributed with mean μ and variance σ² (σ > 0), its probability density
function is

f(x) = (1 / (σ √(2π))) e^(−(x−μ)² / (2σ²)),    −∞ < x < ∞

[Plots of the normal p.d.f. f(x) for: mean 0, variance 1; mean 0, variance 9; mean 2,
variance 4; mean 2, variance 2.25.]
If the random variable X follows a normal distribution with mean μ and variance σ² then
we write

X ~ N(μ, σ²)

The normal p.d.f. is the so-called "bell curve". Note that it extends from −∞ to ∞, although
the probabilities are very low beyond μ ± 3σ.
The peak (i.e. the mode) is at the mean, where x = μ, and the distribution is symmetric
about the mean, so that this is also the median.
6.3
We often require the probability that an observation from the normal random variable X
lies between two specified values a_1, a_2, assuming that the parameters μ, σ² are known. This
is just the area under the curve between a_1 and a_2, since the total area under the curve is
unity.
Hence the area under the curve between a_1 and a_2 gives

P(a_1 ≤ X ≤ a_2)
There is no closed-form solution for this area, so we need to use statistical tables.
[Plot of the standard normal p.d.f. f(x).]
Tables give answers only for the standard normal with mean zero and variance 1. A
standard normal random variable is often denoted Z, so that Z N(0, 1).
Probabilities for the standard normal are given in section 15.1, table 1. These give Q(z),
the area under the curve between 0 and z. Hence this is the probability that an
observation from a normal distribution with mean 0 and variance 1 lies between 0 and z.
Note: We are using the standard notation that capital letters X, Z represent random
variables while lower-case x, z, a_1, a_2 represent numbers. Often an observation from X is
called x, an observation from Z is called z, and so on.
P(0 ≤ Z ≤ z) = Q(z)
P(Z ≥ z) = 0.5 − P(0 ≤ Z ≤ z) = 0.5 − Q(z)
It is easily seen from a diagram that to obtain the probability of the variable lying between
two values of z, the relevant values of Q should be added if the values of z straddle the
mean 0, or subtracted if they fall on the same side.
Examples
If Z ~ N(0, 1) then

P(Z ≥ 1.5) = 0.5 − 0.4332 = 0.0668

and from this we can immediately say that

P(Z ≤ 1.5) = 1 − 0.0668 = 0.9332

or

P(Z ≤ 1.5) = 0.5 + 0.4332 = 0.9332

[Diagram: areas Q(z) and 0.5 − Q(z) under the standard normal p.d.f.]

Similarly, using Q(0.5) = 0.1915 and Q(1.5) = 0.4332:

P(0 ≤ Z ≤ 1.5)      = 0.4332
P(−1.5 ≤ Z ≤ 1.5)   = 0.4332 + 0.4332 = 0.8664
P(−0.5 ≤ Z ≤ 1.5)   = 0.1915 + 0.4332 = 0.6247
P(−1.5 ≤ Z ≤ 0.5)   = 0.4332 + 0.1915 = 0.6247
P(0.5 ≤ Z ≤ 1.5)    = 0.4332 − 0.1915 = 0.2417
P(−1.5 ≤ Z ≤ −0.5)  = 0.4332 − 0.1915 = 0.2417
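Values of Q(z) need not come from tables; Python's math.erf gives them directly, since Q(z) = Φ(z) − 0.5 = erf(z/√2)/2 (the function name here is ours):

```python
from math import erf, sqrt

def Q(z: float) -> float:
    # Q(z) = P(0 <= Z <= z) for Z ~ N(0, 1)
    return 0.5 * erf(z / sqrt(2))

print(round(Q(0.5), 4), round(Q(1.5), 4))   # table values 0.1915, 0.4332
print(round(0.5 - Q(1.5), 4))               # P(Z >= 1.5)
print(round(Q(1.5) - Q(0.5), 4))            # P(0.5 <= Z <= 1.5)
```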
If X ~ N(μ, σ²), then standardizing gives

Z = (X − μ) / σ ~ N(0, 1)

Hence we first convert a question about X into one about Z by making the above substitution:

P(μ ≤ X ≤ x) = P(0 ≤ (X − μ)/σ ≤ (x − μ)/σ)
             = P(0 ≤ Z ≤ (x − μ)/σ)      where Z ~ N(0, 1)
             = Q((x − μ)/σ)
             = Q(z)   for z = (x − μ)/σ

Hence any question about a value x from a N(μ, σ²) distribution can be converted to one
about z = (x − μ)/σ from a N(0, 1) distribution. Note that z is then the number of standard
deviations by which x differs from the mean μ.
Example
Suppose X ~ N(20, 5²). Then

P(X ≥ 22) = P((X − 20)/5 ≥ (22 − 20)/5)
          = P(Z ≥ 0.4)
          = 0.5 − 0.1554
          = 0.3446

P(13.2 ≤ X ≤ 21.5) = P((13.2 − 20)/5 ≤ (X − 20)/5 ≤ (21.5 − 20)/5)
                   = P(−1.36 ≤ Z ≤ 0.3)
                   = 0.4131 + 0.1179
                   = 0.5310
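The same standardization in code, reusing erf for the table lookups (function names ours):

```python
from math import erf, sqrt

def Q(z: float) -> float:
    # Q(z) = P(0 <= Z <= z) for the standard normal
    return 0.5 * erf(z / sqrt(2))

mu, sigma = 20.0, 5.0

# P(X >= 22): standardize, then use the upper-tail identity
z = (22 - mu) / sigma
print(round(0.5 - Q(z), 4))

# P(13.2 <= X <= 21.5): difference of two areas straddling the mean
z_lo = (13.2 - mu) / sigma
z_hi = (21.5 - mu) / sigma
print(round(Q(-z_lo) + Q(z_hi), 4))
```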
Two Important Cases
From tables:

Q(1.960) = 0.475

so

2 × Q(1.960) = 0.95

Hence 95% of the total probability is within 1.96 standard deviations of the mean.
Similarly
Q(1.645) = 0.45
Hence 95% of the total probability is below 1.645 standard deviations above the mean.
53
0.2
f(z)
0.3
0.4
0.5
Standard normal
0.1
95%
0.0
2.5%
4
2
2.5%
0
0.2
f(z)
0.3
0.4
0.5
Standard normal
0.1
95%
0.0
5%
4
2
54
6.4
6.5
If we have a sample of n independent observations from the N(μ, σ²) distribution, then the
sample mean X̄ is also normally distributed, with mean μ and variance σ²/n:

X̄ ~ N(μ, σ²/n)

The standard deviation of X̄ is therefore σ/√n.
X̄ ~ N(10, 0.1)

so that

P(X̄ > 10.2) = P(Z > (10.2 − 10)/√0.1) = P(Z > 0.63) = 0.264

Hence the chance of an individual customer taking longer than 10.2 minutes is 46%, but the
chance of the mean time of all 40 customers exceeding 10.2 minutes is only 26.4%.
Furthermore, if we can't be sure that the distribution of individual times really follows a
normal distribution, then the probability for an individual customer will almost certainly be
wrong, but the probability for the mean will still be approximately correct.
Example 3
Here we demonstrate (pictures overleaf) the above formula using simulated data from a
normal and a non-normal distribution.
In each case the first histogram shows 1000 observations from the given distribution, so that
the histogram is approximately the same shape as the p.d.f.
The subsequent histograms show what happens when you have 1000 sample means rather
than 1000 individual observations. For example, in the pictures with n = 10, 1000 samples,
each of 10 observations, have been taken. The histogram is then formed from these 1000
sample means.
Hence when in practice we have a single sample of 10 observations, this is in effect a single
value from the histogram shown for n = 10.
Reminder: A sample is a set of observations, not a single observation.
Summary
1. If we have n independent observations x_1, . . . , x_n from a normally distributed population,
then

x_1, . . . , x_n are n values from X ~ N(μ, σ²)

If we calculate the observed sample mean x̄, then

x̄ is one value from X̄ ~ N(μ, σ²/n)
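The simulation described above can be sketched in a few lines; here, as in the second set of pictures, X is exponential with mean 5 (variance 25), so the variance of the sample means should be close to 25/n:

```python
import random
import statistics

random.seed(1)
MU = 5.0   # exponential with mean 5 (and variance 25)

def sample_mean(n: int) -> float:
    return statistics.fmean(random.expovariate(1 / MU) for _ in range(n))

for n in (1, 3, 10, 30):
    means = [sample_mean(n) for _ in range(1000)]
    # mean of the sample means stays near 5; their variance shrinks like 25/n
    print(n, round(statistics.mean(means), 2), round(statistics.variance(means), 2))
```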
[Histograms of 1000 simulated values of the sample mean x̄ for samples of increasing size n
(panels n = 1, n = 2, . . .) from a normal distribution centred near 100.]
The histograms illustrate the distribution of X̄ for n = 1, 3, 10, 30, where X has an
exponential distribution with mean 5 (and variance 25). Each picture is a histogram of 1000
(simulated) observed values of x̄.
As before, the spread of the values, and hence the variance of the sample mean X̄, is clearly
lower for larger n. The histograms of the sample mean also become more symmetrical as n
increases. In fact, the distribution of X̄ does not just become symmetrical as n increases,
it becomes normal.
[Histograms of 1000 simulated values of x̄ for n = 1, 3, 10, 30, where X is exponential with
mean 5.]
6.6
Normal Approximations
Binomial
If the number n of trials in the binomial is large then the calculation of the probability of at
least r successes may involve the summation of a large number of terms, and the factorial
terms can be very cumbersome.
However, as n becomes large, the binomial distribution tends towards the normal, unless
the value of p is very close to 0 or 1, and a much quicker (though approximate) answer can
be obtained from tables of the normal distribution.
If X ~ Bi(n, p) then, from section 4.4,

μ = np
σ² = np(1 − p)
For n large and p not too large or small, a good approximation is

X ~ N(np, np(1 − p))

Rule of thumb: the approximation is reasonable if

1/(n + 1) < p < n/(n + 1)   and   np(1 − p) ≥ 10
Note the difference between this and the Poisson approximation to the binomial. The normal
is a good approximation when n is large and p is neither large nor small, while the Poisson
is a good approximation when n is large and p is small.
Poisson
If X ~ Po(m) then, from section 5.4,

μ = m
σ² = m

For large m, a good approximation is

X ~ N(m, m)

Rule of thumb: OK if m > 40.
Neither of these approximations should be used outside the μ ± 3σ limits.
Example
An unbiased coin is tossed 100 times. Find the probability of getting 60 or more heads.
The number of heads observed will follow a binomial distribution with n = 100 and p = 1/2
(and q = 1/2):

X ~ Bi(100, 0.5)

and we require

P(X ≥ 60)
Using the normal approximation:

μ = np = 100 × 0.5 = 50
σ² = np(1 − p) = 50 × 0.5 = 25

(NB: this clearly satisfies the rules of thumb), so

X ~ N(50, 25)
However, we are converting a discrete distribution to a continuous one. The probability
piled up at x = 60 is therefore spread out over the range 59.5 to 60.5, so we want to find
P(X ≥ 59.5) in order to include all of this probability. This is a continuity correction. Hence

P(X ≥ 60)         (binomial)
= P(X ≥ 59.5)     (normal)
= P((X − 50)/5 ≥ (59.5 − 50)/5)
= P(Z ≥ 1.9)
= 0.5 − Q(1.9)
= 0.5 − 0.4713
= 0.0287
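Both the exact binomial tail and the normal approximation with continuity correction can be computed directly (using math.erf for the normal c.d.f.):

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
exact = sum(comb(n, r) for r in range(60, n + 1)) * p**n   # p = q = 1/2

mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (59.5 - mu) / sigma                  # continuity correction
approx = 0.5 - 0.5 * erf(z / sqrt(2))    # P(Z >= z)
print(round(exact, 4), round(approx, 4))
```

The exact tail is about 0.0284, so the corrected normal approximation (0.0287) is close.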
6.7
Further Examples
Example 1
It is known from past experience that the lifetime of a front tyre on a staff car follows a
normal distribution with a mean lifetime of 25000 miles and a standard deviation of 3000
miles, while the lifetime of a rear tyre follows a normal distribution with a mean lifetime of
32000 miles and a standard deviation of 4000 miles.
Calculate the probability that a staff car, selected at random, is still running on the original
tyres after 30000 miles of use.
Solution
Let the random variable X denote the lifetime of a front tyre, then
X ~ N(25000, 3000²)
Similarly, let the random variable Y denote the lifetime of a rear tyre, so that
Y ~ N(32000, 4000²)
For a front tyre,

P(X > 30000) = P((X − 25000)/3000 > (30000 − 25000)/3000)
             = P(Z > 1.667)      where Z ~ N(0, 1)
             = 0.5 − Q(1.667)
             = 0.5 − 0.4525
             = 0.0475
Similarly, for a rear tyre,

P(Y > 30000) = P((Y − 32000)/4000 > (30000 − 32000)/4000)
             = P(Z > −0.5)
             = 0.5 + Q(0.5)
             = 0.5 + 0.1915
             = 0.6915
Hence, assuming that the tyres' lifetimes are independent, the probability that the car is
still running on the original tyres after 30000 miles' use is the probability that two front
tyres and two rear tyres each survive 30000 miles of use:

(0.0475)² × (0.6915)² = 0.001079

However, the independence assumption seems rather dubious (given that the tyres will all
have been on exactly the same roads), so this is only approximate.
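The whole calculation can be done in code with exact normal probabilities rather than 3-figure table values, so the final answer differs slightly from the table-based 0.001079 (function name ours):

```python
from math import erf, sqrt

def survival(mean: float, sd: float, x: float) -> float:
    # P(lifetime > x) for a normally distributed lifetime
    z = (x - mean) / sd
    return 0.5 - 0.5 * erf(z / sqrt(2))

p_front = survival(25000, 3000, 30000)   # about 0.0478
p_rear = survival(32000, 4000, 30000)    # about 0.6915
p_all_four = p_front**2 * p_rear**2      # two front and two rear tyres survive
print(round(p_front, 4), round(p_rear, 4), round(p_all_four, 6))
```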
Example 2
A mortar is deployed against a long convoy which is travelling along a road of width 20
metres, perpendicular to the line of fire and at a range of 3 km (measured to the centre of
the road).
At this range, it is known that the standard deviation of the fall of shot in range is 40
metres; the fall of shot having a normal distribution with a mean point of impact of 3000
metres.
1. Calculate the probability of a hit on the road by the first round fired, assuming that
the mortar has been bedded-in and adjusted.
2. The first round is observed to land 50 metres short of the road centre and the mortar
crew adjust the range setting to add 50 metres. If the initial setting was correct,
calculate the probability of:
(a) the first round falling short of the road centre by 50 metres or more
(b) the second round hitting the road after the adjustment has been made.
Solution
1. Let the random variable X_1 denote the point of impact of the first round fired from the
mortar; then X_1 ~ N(3000, 40²). A hit on the road requires 2990 ≤ X_1 ≤ 3010, so

P(2990 ≤ X_1 ≤ 3010) = P((2990 − 3000)/40 ≤ (X_1 − 3000)/40 ≤ (3010 − 3000)/40)
                     = P(−0.25 ≤ Z ≤ 0.25)      where Z ~ N(0, 1)
                     = 2 × Q(0.25)               by symmetry of N(0, 1)
                     = 2 × 0.0987
                     = 0.1974
(a). We require the probability that the first round falls 50 metres or more short of the
road centre:

P(X_1 ≤ 2950) = P((X_1 − 3000)/40 ≤ (2950 − 3000)/40)
              = P(Z ≤ −1.25)      where Z ~ N(0, 1)
              = 0.5 − Q(1.25)     by symmetry of N(0, 1)
              = 0.5 − 0.3944
              = 0.1056
(b). Let the random variable X_2 denote the range of the second round fired after the
adjustment has been made; then X_2 ~ N(3050, 40²), assuming that the standard deviation
remains unchanged. Then

P(2990 ≤ X_2 ≤ 3010) = P((2990 − 3050)/40 ≤ (X_2 − 3050)/40 ≤ (3010 − 3050)/40)
                     = P(−1.5 ≤ Z ≤ −1.0)      where Z ~ N(0, 1)
                     = Q(1.5) − Q(1.0)          by symmetry of N(0, 1)
                     = 0.4332 − 0.3413
                     = 0.0919
Comment:
This demonstrates the perils of trying to correct for something without being sure that there
is anything wrong. By incorrectly concluding that the point of aim was wrong, even though
the observed fall of shot was not particularly unlikely, the chance of hitting the road has
been halved.
7 Hypothesis Testing
7.1
Introductory Example
It is believed (hypothesized) that the fall of shot from a gun for a particular setup is normally
distributed about a mean range of 10000m with a standard deviation of 100m, so that the
fall of shot is described by the random variable X where
X ~ N(10000, 100²)
We wish to test this belief (hypothesis) by firing 1 round. The result is 9750m.
Now if our hypothesis is correct then the chance of the shell falling this far short, or even
further short, is
P(X ≤ 9750) = P(Z ≤ (9750 − 10000)/100)
            = P(Z ≤ −2.5)
            = 0.5 − Q(2.5)
            = 0.0062

So if our belief were correct, then the chance of falling at least this far short is 0.62%. This
suggests that our belief is probably wrong!
It could be wrong in any or all of 3 ways:
1. The mean is not 10000m
2. The standard deviation is not 100m
3. The distribution is not normal
This used a sample of size 1 to test a hypothesis about a population (i.e. fall of shot of all
rounds from the gun).
Clearly it is better to fire several rounds! We could then compare the sample mean x̄ to
the hypothesized population mean, often denoted μ_0 (in this case 10000m). This is common
sense but there are also good theoretical reasons for taking as large a sample as possible, as
we shall discuss below.
Note the important distinction between μ, the unknown true population mean, and μ_0, the
hypothesized population mean.
7.2
Some Definitions
1. Population
Totality of all possible readings when a quantity is repeatedly measured or counted.
E.g. weights of all possible shells from a production line.
2. Sample
A set of measurements taken from this population. Often denoted x1 , . . . , xn .
E.g. the weights of 20 shells.
3. Random sample
A sample where each individual measurement in the population is equally likely to be
picked, so that the sample is a fair representation of the population.
E.g. don't pick 20 successive shells; pick from the whole day's production.
4. Population parameter
A fixed (but usually unknown) numerical characteristic of a population.
E.g. μ, σ²; m; n, p.
E.g. true (overall) mean weight of shells; true variance of weight of shells.
5. Sample statistic
A numerical characteristic of a sample, often used to estimate the corresponding population parameter. These vary from sample to sample, and so are (observations from)
random variables.
E.g. mean x̄ and variance S² of the sample of 20 shells.
6. Sampling distribution
The distribution of a sample statistic.
E.g. distributions of the mean and the variance of samples of 20 shells.
7. Mean of a population and expected value of a random variable
The mean of a random variable X is also called its expected value and denoted E(X).
Hence when we use a random variable to describe a population, this is the population
mean.
E.g. if X ~ N(μ, σ²) then E(X) = μ.
8. Variance of a random variable and a population
Similarly, the variance of a random variable, and hence of the population it is describing, is denoted Var(X).
E.g. if X ~ N(μ, σ²) then Var(X) = σ².
7.3
From section 6.4 we have that, for n observations from a normally distributed population
with mean μ and variance σ², the distribution of the sample mean X̄ is

X̄ ~ N(μ, σ²/n)

so that, equivalently,

(X̄ − μ) / (σ/√n) ~ N(0, 1)

In addition, thanks to the Central Limit Theorem (section 6.1), even if the observations
are not from a normal distribution, the sample mean is (approximately) normally distributed
for large n (say, n ≥ 30).
Example
Having identified the probability distribution of sample means, we can now return to our
original experimental sample and see where its mean, x, lies in relation to this distribution.
If, in the example, we observe x̄ = 9950m from n = 25 observations, then if the hypothesis
is true, 9950m is an observation from

X̄ ~ N(10000, 100²/25) = N(10000, 20²)

Hence the chance of the sample mean falling at least this far short is

P(X̄ ≤ 9950) = P(Z ≤ (9950 − 10000)/20)
             = P(Z ≤ −2.5)
             = 0.0062   (again)

Hence a sample mean of 9950m or less is just as (un)likely as a single observation of 9750m
or less.
Why large samples?
The properties of the distribution of sample means listed above give us several very good
reasons for trying to choose as large a sample size as possible. These are:
1. Larger samples are more likely to reflect the population accurately. Unusual observations
are averaged out, so that the sample mean x̄ will tend to be closer to the true
population mean μ. Since Var(X̄) = σ²/n, this means X̄ varies less about μ as n gets
larger.
2. Hence also, if the true population mean is not what we think, this is easier to spot
if n is large and Var(X̄) is small.
3. For non-normal data the distribution of X̄ gets more normal-like as n increases. Hence
we can use normal tables.
4. Just as the sample mean x̄ is likely to be closer to the true population mean μ if n
is large, similarly the sample variance S² is likely to be closer to the true population
variance σ² if n is large.
Important note: Up until section 10.1 we will assume
1. Either the distribution from which the observations are drawn is known to be normal,
or the sample size is large (n ≥ 30), so that the distribution of X̄ is (approximately)
normal anyway.
2. Either the variance σ² is known, or the sample size is large (n ≥ 30), so that S² is
sufficiently close to σ² that we need not worry about the difference.
We relax these assumptions from section 10.1 onwards.
7.4
This is an example of a case where statistical testing is vital in order to avoid making changes that may be not just unnecessary but in fact counterproductive. Experience in many industries has shown that, if we modify a process without strong evidence that something is wrong, we usually make things worse because we are compensating for problems which did not actually exist.
A production line makes components which should be 10.40 cm long. Hence if the components being produced seem to have a mean length which is not 10.40 cm then this needs to be detected, so the machines can be reset to correct this.
A random sample of n = 30 components is taken from a week's production, giving a sample mean of x̄ = 10.51 cm and sample standard deviation S = 0.64 cm. Is the production line in control, i.e. is it making components of the right mean length?
If the production line is in control then the true mean is 10.40, so our initial hypothesis is that μ = 10.40. If this hypothesis is correct then we have 30 observations from X ~ N(10.40, σ²), so that the sample mean x̄ = 10.51 is a single value drawn from the random variable

X̄ ~ N(10.40, σ²/30)

This assumes that the individual observations come from a normal distribution, which is believable, but the sample size is 30 so that X̄ will be at least approximately normal anyway. Similarly, we don't know the value of σ² and hence have to estimate it with S² = 0.64², which is reasonable again because n ≥ 30. Hence if the production line is in control then x̄ = 10.51 is a single observation from

X̄ ~ N(10.40, 0.64²/30)

Hence

zobs = (10.51 − 10.40) / (0.64/√30) = 0.94
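This one-sample calculation is easy to check numerically; a minimal sketch in Python using only the standard library (the doubling that gives a two-tailed p-value is explained in section 8.3):

```python
from math import sqrt
from statistics import NormalDist

# Production-line example: H0: mu = 10.40 cm,
# observed x-bar = 10.51 cm, S = 0.64 cm, n = 30.
mu0, xbar, s, n = 10.40, 10.51, 0.64, 30

z_obs = (xbar - mu0) / (s / sqrt(n))

# Two-tailed p-value: chance of a z at least this extreme in either direction
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(round(z_obs, 2))    # 0.94
print(round(p_value, 2))  # 0.35
```

A p-value of about 0.35 is large, so a sample mean of 10.51 cm gives no real evidence that the line is out of control.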
7.5
Often we wish to compare two samples, to see if it is reasonable to believe that they came
from populations with the same mean.
For example, we may have a number, nA , of measurements of fuel consumption for scout
cars of type A and a number, nB , of measurements of scout cars of type B. We would like to
have some idea if, overall, the two types of scout car have similar fuel consumptions. To do
this we must consider the difference of the two sample means and determine the probability
of getting a difference as large or larger than this observed difference, if the samples both
come from populations with the same mean values.
To compare two sample means, in order to decide whether the respective population means are equal, we need the distribution of X̄A − X̄B. If the true population means are μA and μB and the true population variances σA² and σB², then we can use the results in section 6.4 to show that the distribution of X̄A − X̄B has the properties:

The mean is

E(X̄A − X̄B) = μA − μB

The variance is

Var(X̄A − X̄B) = σA²/nA + σB²/nB

so that the standard deviation is

√(σA²/nA + σB²/nB)

Hence

X̄A − X̄B ~ N(μA − μB, σA²/nA + σB²/nB)

so that, equivalently,

(X̄A − X̄B − (μA − μB)) / √(σA²/nA + σB²/nB) ~ N(0, 1)
As with the one-sample case, this is exactly true if both samples are drawn from normally distributed populations, while it is approximately true if this is not so but the sample sizes are large, i.e. nA and nB are both at least 30.
Example
Returning to the original example, suppose we now have two guns and wish to see whether their mean ranges are the same. Gun A gives x̄A = 9950m from nA = 25 observations and gun B gives x̄B = 10025m from nB = 20 observations. We continue to assume that the observed ranges are independent observations from normal distributions with standard deviation 100m.
If the true population means are the same then

X̄A − X̄B ~ N(0, 100²/25 + 100²/20)

so that

zobs = ((9950 − 10025) − 0) / √(100²/25 + 100²/20)
     = −75/30
     = −2.5
7.6
Further Examples
Example
Extensive firings of a certain type of rocket have established a (population) mean range of 2150m. A sample of 40 rockets fired after a year's storage gave a sample mean range of 2084m with a sample standard deviation of 160m. We wish to see if storage has decreased the mean range.
The hypothesis is that

μ = 2150m

and we have observed

x̄ = 2084m    S = 160m

from n = 40 observations.
If the hypothesis is true then the observed x̄ = 2084m is an observation from X̄, where

E(X̄) = 2150m
Var(X̄) = 160²/40    i.e. std. dev. = 160/√40 = 25.3m

i.e. if the hypothesis is true then x̄ = 2084 is an observation from

X̄ ~ N(2150, 25.3²)

Hence

zobs = (2084 − 2150)/25.3 = −2.61
8
Significance Tests
8.1
Purpose
8.2
Method
This is described in terms of the one-sample case, but the same ideas apply to two samples.
1. Question: Does the sample come from a population with a specified parameter value? The value is usually given a 0 subscript, e.g. μ0.
2. Assume that it does. This is the null hypothesis, H0.
3. Use data to estimate the parameter. This observed value is an estimate. The estimate is then regarded as an observation from a random variable, the estimator.
4. Use H0 to find the sampling distribution of the estimator. Hence find the chance of observing what we did, or something even more extreme. This probability is often called the p-value.
5. If this is small, either
(a) our sample was very unusual (atypical of its population), or
(b) H0 is false.
The smaller this probability, the less plausible (a) becomes, and so the stronger the evidence against H0.
8.3
The null hypothesis H0 must be something which it makes sense to believe in the absence of evidence to the contrary. This is usually something like "no change" or "no difference", and can be thought of as a presumption of "innocent until proven guilty".
The alternative hypothesis H1 is what we believe if we reject H0. Normally this is just the negation of H0 ("there is a change/difference") but sometimes we may wish to specify the direction of change.
The most common case is to test
H0 : No change v H1 : Change
Hence one-sample tests are often of the form

H0: μ = μ0    versus    H1: μ ≠ μ0

where μ0 is the hypothesised mean. We reject H0 if x̄ is much larger or much smaller than μ0, in other words if zobs is either large positive or large negative.
This is a two-tailed test.
Exactly the same applies to two-sample tests, except that the hypotheses are

H0: μA = μB    versus    H1: μA ≠ μB

and we compare x̄A − x̄B to zero.
However, tests can also be of the form

H0: No improvement    v    H1: Improvement
or
H0: No deterioration    v    H1: Deterioration

For example, with "improvement" meaning an increase, these are of the form

H0: μ ≤ μ0    versus    H1: μ > μ0    (or H0: μA ≤ μB versus H1: μA > μB)

In this case, we only reject H0 if the observed value x̄ is larger than μ0 (or x̄A − x̄B is larger than zero), i.e. if zobs is large positive. Note that, therefore, if zobs is negative then we need go no further since H0 will definitely not be rejected.
This is a one-tailed test. Note that, for the purposes of executing the test, if we have H1: μ > μ0 then a null hypothesis of H0: μ ≤ μ0 is in effect the same as one of H0: μ = μ0.
Similarly, for

H0: μ ≥ μ0    versus    H1: μ < μ0    (or H0: μA ≥ μB versus H1: μA < μB)

we only reject H0 if the observed value x̄ is smaller than μ0 (or x̄A − x̄B is smaller than zero), i.e. if zobs is large negative. Therefore, if zobs is positive then we need go no further since H0 will definitely not be rejected.
In the one-tailed case with H1: μ > μ0, the p-value is just the probability of obtaining a value greater than or equal to zobs from a standard normal random variable Z.
In the one-tailed case with H1: μ < μ0, it is similarly the probability of obtaining a value less than or equal to zobs from a standard normal random variable Z.
However, in the two-tailed case we must allow for the fact that if we obtain a positive zobs then "the probability of obtaining the value we did or something even more extreme" includes both values greater than zobs and values less than −zobs. Hence, by symmetry, the p-value is twice the probability of obtaining a value greater than or equal to zobs from a standard normal random variable Z. (Similarly, if zobs is negative it is twice the probability of obtaining a value less than or equal to zobs.) This was done in the example in section 7.4.
The initial example in sections 7.1 and 7.3 is two-tailed, since we would have rejected the null hypothesis if the apparent range had been greater than 10000m as well as less than 10000m. Hence the p-value should be doubled from 0.0062 to 0.0124, though this does not change our conclusions as it is still very small. In the example in section 7.6 we would only reject H0 if the mean range seemed to be smaller than the hypothesised value of 2150m, making this a one-tailed test. Hence the p-value need not be doubled.
Important Note: The hypotheses, and hence whether the test is one- or two-tailed, are a function of the question, not the data. Ideally, they should be framed before collecting or seeing the data.
8.4
Critical Values
We do not usually calculate the exact p-value. If we have specified a 1% one-tailed test then from section 15.1 table 1 we have that

P(Z > 2.33) = 0.5 − 0.4901 ≈ 0.01

This means that if zobs > 2.33 then the p-value must be less than 0.01. Hence we will reject H0 if and only if zobs exceeds 2.33, so that 2.33 is the critical value for a 1% one-tailed test, often denoted zcrit.
Similarly

P(Z > 1.96) = 0.5 − 0.4750 = 0.025
⇒ P(Z > 1.96) + P(Z < −1.96) = 0.05
Hence for a 5% two-tailed test we reject H0 if and only if zobs exceeds 1.96 in magnitude.
For tests of significance at the 5% or 1% level using the normal distribution, we therefore use the critical values given in the following table (the fourth column is explained in section 9.4):

Significance level    zcrit (one-tailed)    zcrit (two-tailed)    Confidence level
5%                    1.64                  1.96                  95%
1%                    2.33                  2.58                  99%
Section 15.1 table 2 gives critical values for some more significance levels.
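These table values can also be recovered from the inverse normal CDF; a minimal sketch in Python (the helper name z_crit is ours, and the printed values agree with the table above to two decimal places):

```python
from statistics import NormalDist

def z_crit(alpha, two_tailed):
    """Critical value for a z test at significance level alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(z_crit(0.05, False), 2))  # 1.64
print(round(z_crit(0.05, True), 2))   # 1.96
print(round(z_crit(0.01, False), 2))  # 2.33
print(round(z_crit(0.01, True), 2))   # 2.58
```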
The purist view is that we should specify our significance level α beforehand and reject (or not) H0 on the basis of that critical value alone. If we have to choose between two courses of action, depending on whether we reject H0 or not, then this is the method to follow. We pick α according to how convincing we want the evidence against H0 to be before we will reject it.
However, in practice people often do not specify α and instead use critical values to assess where the p-value lies, then translate this back into English, indicating the weight of evidence against H0.
Recall that the p-value is the probability of observing what we actually did, or something even less compatible with H0, on the assumption that H0 is true. A common interpretation of a p-value of p is then something like:

p > 0.10     little or no evidence against H0
p < 0.10     slight evidence against H0
p < 0.05     some evidence against H0
p < 0.01     strong evidence against H0
p < 0.001    very strong evidence against H0
8.5
Observed mean x̄ from a sample of size n. Can we assume that the true population mean is equal to some hypothesised value μ0, or not?

H0: μ = μ0    versus    H1: μ ≠ μ0

(Or a similar one-tailed test.)
The following procedure is valid only if either
1. Both of the following are true:
(a) The observed data are n independent observations from a normal distribution (i.e. they come from a population that is normally distributed).
(b) The population variance σ² is known.
2. At least one of the above is not true, but the sample size is large, with n ≥ 30 being a common definition of large. In this case the test is approximately correct.
1. Calculate the test statistic.

zobs = (x̄ − μ0) / (σ/√n)

Here σ is the true standard deviation, if known, otherwise substitute the sample standard deviation S.
2. If H0 is true then this is an observation from N(0, 1). Hence compare the observed value zobs to the appropriate critical value from section 15.1 table 2.
3. For a 2-tailed test, reject H0 if

|zobs| ≥ zcrit

4. For a 1-tailed test, reject H0 if

zobs ≥ zcrit

(or if zobs ≤ −zcrit, depending on which way round the hypotheses are).
This is often referred to as a one-sample z test.
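The whole procedure can be wrapped in a small function; a sketch in Python (our own helper, not part of any standard test suite), applied here to the rocket-range data of section 7.6 as a one-tailed test:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sd, n, tails=2):
    """z statistic and p-value for H0: mu = mu0.
    sd is sigma if known, otherwise S (acceptable for large n)."""
    z = (xbar - mu0) / (sd / sqrt(n))
    if tails == 2:
        p = 2 * (1 - NormalDist().cdf(abs(z)))
    else:
        # one-tailed p-value, taking the observed z to lie in the direction of H1
        p = 1 - NormalDist().cdf(abs(z))
    return z, p

# Section 7.6: x-bar = 2084, mu0 = 2150, S = 160, n = 40, H1: mu < 2150
z, p = one_sample_z(2084, 2150, 160, 40, tails=1)
print(round(z, 2))  # -2.61
```

Here p is about 0.005, comfortably inside the 1% one-tailed rejection region.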
Example
A factory producing 7.62mm ammunition has to undergo an annual quality control test
conducted by Ordnance Board inspectors. The nominal mass of the bullet is specified to be
9.33g. A sample of 100 rounds selected at random from the production line gave a mean
bullet mass of 9.28g and a standard deviation of 0.15g.
Would the inspector conclude that the production process is producing rounds of the correct
mean weight, or not?
Let the random variable X denote the mass of a bullet drawn at random from the production
line of the factory; in this instance, the actual distribution of X is unknown.
We initially assume that the production process is working properly, i.e. it is producing rounds which have a bullet mass distributed about the specified value of 9.33g; hence we assume that the population mean bullet mass μ is equal to 9.33g.
Hence we have

H0: μ = 9.33    v    H1: μ ≠ 9.33
The sample size n = 100 rounds is sufficiently large to assume that the sample standard deviation S = 0.15g provides a reasonably good approximation to σ, the standard deviation of the population. Similarly, the masses will probably be normally distributed, but with n so large the sample mean will be approximately normally distributed anyway.
Hence if our null hypothesis is true then our observed x̄ is an observation from a random variable X̄ which is (at least approximately) normal with mean 9.33g and standard deviation 0.15/√100. Therefore

zobs = (x̄ − 9.33) / (0.15/√100)
     = (9.28 − 9.33) / 0.015
     = −3.33
Hence the sample mean value x̄ = 9.28g lies 3.33 standard deviations to the left of the expected mean value, assuming the null hypothesis to be correct.
There are two (essentially equivalent) ways to interpret this result: either we calculate the p-value exactly, or we just find the region within which it lies.
a) From tables, the probability of obtaining a value of x̄ as small as (or smaller than) 9.28g is therefore

P(X̄ ≤ 9.28) = P(Z ≤ −3.33)
            = P(Z ≥ 3.33)      by symmetry of N(0, 1)
            = 0.5 − Q(3.33)
            = 0.5 − 0.4996
            = 0.0004
Thus, we should expect this result to occur only four times out of every ten thousand samples of size n = 100 drawn from this population, if the population mean really is μ = 9.33.
However, our alternative hypothesis was two-sided, so that Z ≥ 3.33 is just as unlikely, and we double the probability to obtain a p-value of 0.0008.
Clearly, since this result is so unlikely, it provides evidence to suggest that the population mean weight of the bullets is not 9.33g, and we would therefore reject H0. It seems that (on average) underweight rounds are being produced.
b) More simply, we compare −3.33 to table 2a in section 15.1 and see that it exceeds (in magnitude) even the 0.1% critical value. Hence the p-value is less than 0.001 and so, using the translation in section 8.4, we can say that there is very strong evidence against H0 (p < 0.001). Hence there is very strong evidence that μ is not 9.33, and since zobs is negative we conclude that in fact μ is less than 9.33.
8.6
If H0 is not rejected, this doesn't mean that H0 is true; it just means that our data do not give us sufficient evidence to reject it. Hence it is always better to say "do not reject H0" rather than "accept H0", even though it sounds clumsy.
If H0 is rejected, we often say that x̄ is "significantly different" from the hypothesised value μ0.
However, note that "significant" is being used in a specific technical sense, meaning that (roughly speaking) there is a significant chance that H0 is false.
Hence statistical significance is not necessarily the same as practical importance of the difference between the true and the hypothesised mean. However, we can usually rejig the question so that it is.
Example
Past experience shows that the fuel consumption of our staff cars is 20 mpg. We are testing
an engine upgrade to see if we can improve this. The obvious hypotheses are
H0: μ ≤ 20    versus    H1: μ > 20
However, if we have 100 observations then it is possible that a sample mean of 20.2 could
lead to a significant result. We would have evidence that the true mean of the modified
engines is greater than 20, but our best guess is that it is in fact only 20.2. Embarking on
expensive upgrades of the engine just for this might not seem very clever, especially since a
better upgrade might come along soon.
However, all we need to do is specify beforehand how much of an improvement is necessary
before we will consider switching. If we require the upgraded engines to have at least 22
mpg then we just test
H0: μ ≤ 22    versus    H1: μ > 22
8.7
Observed means x̄A and x̄B from samples of sizes nA and nB. Can we assume that the two population means are equal?

H0: μA = μB (so μA − μB = 0)    versus    H1: μA ≠ μB

(Or a similar one-tailed test.)
The following procedure is valid only if either
1. Both samples consist of independent observations from normally distributed populations whose variances are known, or
2. At least one of the above is not true, but the sample sizes are large, with nA ≥ 30 and nB ≥ 30 being a common definition of large. In this case the test is approximately correct.
1. Calculate the test statistic.

zobs = ((x̄A − x̄B) − 0) / √(σA²/nA + σB²/nB)

Here σA and σB are the true standard deviations for populations A and B, if known, otherwise substitute the sample standard deviations SA and SB.
2. If H0 is true then this is an observation from N(0, 1). Hence compare the observed value zobs to the appropriate critical value from section 15.1 table 2.
3. For a 2-tailed test, reject H0 if

|zobs| ≥ zcrit

4. For a 1-tailed test, reject H0 if

zobs ≥ zcrit

(or if zobs ≤ −zcrit, depending on which way round the hypotheses are).
This is often referred to as a two-sample z test.
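A corresponding sketch in Python for the two-sample statistic (the helper name is ours). It is applied to the body-armour example that follows; note that the summary table there lists only the means, so pairing each standard deviation and sample size with its sample follows the worked formula and is our assumption:

```python
from math import sqrt

def two_sample_z(xbar_a, sd_a, n_a, xbar_b, sd_b, n_b):
    """z statistic for H0: mu_A = mu_B.
    sds are the population values if known, else the sample ones for large n."""
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (xbar_a - xbar_b) / se

# New armour (298.1 m/s) minus existing armour (311.4 m/s):
z = two_sample_z(298.1, 37.5, 40, 311.4, 43.6, 30)
print(round(z, 2))  # -1.34
```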
Example
Two types of body armour are being trialled by firing bullets at a given velocity and measuring the residual velocity, that is, the velocity of the bullet after going through the armour. Stronger armour will have lower mean residual velocity. The results are as follows.

                   Mean (m/s)
Existing Armour    311.4
New Armour         298.1
Is there evidence at the 5% level that the new armour is stronger than the old?
We only want to know if the mean residual velocity for the new armour is lower than for
the existing armour, so the hypotheses are
H0: μN ≥ μE    v    H1: μN < μE
The sample sizes are fairly large, so we can assume approximate normality of the sample
means and estimate the population variances with the sample variances. Hence
zobs = (298.1 − 311.4) / √(37.5²/40 + 43.6²/30)
     = −1.34
The critical value for a 5% 1-tailed test is zcrit = 1.64, so clearly |zobs| < zcrit and hence we do not reject H0. On the basis of these data there is no reason to conclude that the new armour is any stronger than the existing armour.
8.8
Types of Error
A hypothesis test is a decision rule, and we can err in either of two ways:
1. Reject H0 when it is in fact true, a Type I Error. The probability (α) of this is precisely the chosen significance level (e.g. 5%, 1%). This follows from the way the tests are designed.
2. Accept H0 when it is false, a Type II Error. The probability (β) of this depends on both H1 and the true value of μ.
Note that in significance tests we therefore control the Type I Error probability by our choice of significance level. However, we can never know exactly what the Type II Error probability is because it depends on the unknown true value of the parameter.
For a given sample size, decreasing α will always increase β, and vice versa. In other words, if we are very concerned about incorrectly rejecting H0 and so pick a small significance level, this makes it more likely, if H0 is really false, that we will incorrectly fail to detect this. However, for any value of α, increasing n will decrease β.
A plot of the probability of a Type II Error against the possible true parameter values
is sometimes called the Operating Characteristic (OC) curve. This curve is used to display
the consequences of adopting any particular decision rule.
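As a sketch of how an OC curve can be traced out, the following Python computes β for a lower-tailed z test at various assumed true means; the numbers (μ0 = 10000, σ = 100, n = 25) are borrowed from the gun example purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_true, mu0=10000.0, sigma=100.0, n=25, alpha=0.05):
    """beta = P(fail to reject H0 | true mean = mu_true) for H1: mu < mu0.
    Plotting beta against mu_true gives the OC curve."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean falls at or below this cutoff
    cutoff = mu0 - NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist().cdf((cutoff - mu_true) / se)

print(round(type_ii_error(9950), 3))  # moderate shift: beta about 0.2
print(round(type_ii_error(9900), 3))  # large shift: beta near 0
```

Re-running with a larger n shows β falling for every assumed true mean, illustrating the point above that increasing n decreases β for any fixed α.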
8.9
Further Examples
Example 1
A mathematical model has been developed to predict the range of a rifle-launched grenade, given the initial velocity v and the angle of departure θ.
According to the theory, the nominal range of the grenade with parameters v = 70 m/s and θ = 30° is 358.5m.
A trial was set up to compare this theoretical result with that obtained in practice. A sample of 60 rounds was fired, which gave a sample mean range of x̄ = 363.1m and a sample standard deviation S = 31.8m.
Is there evidence at the 5% significance level to suggest that the theoretical mean range and the true mean range μR are different?
Solution
null hypothesis        H0: μR = 358.5
alternative hypothesis H1: μR ≠ 358.5
We are interested in detecting a difference from the theoretical value, and not an increase (or decrease), hence a two-tailed test of significance at the 5% level is appropriate.
Since the sample size n = 60 is large (> 30), we can assume that the sample standard deviation S is a reasonable approximation to the population standard deviation σ. Also, since n is large, it is appropriate to assume that the distribution of sample means (of size n = 60) drawn from this population is approximately normal, with a hypothesised mean of μ0 = 358.5m and standard deviation

S/√n = 31.8/√60 = 4.1054

Then

zobs = (x̄ − μ0)/(S/√n) = (363.1 − 358.5)/4.1054 = 1.12

Since |zobs| = 1.12 is less than the 5% two-tailed critical value of 1.96, we do not reject H0: there is no evidence at the 5% level of a difference between the theoretical and true mean ranges.
Example 2
The muzzle velocities of rounds fired from two guns, A and B, were measured, with the following results:

         Sample Mean        Sample Standard    Sample
         Muzzle Velocity    Deviation          Size
Gun A    700.5              22.4               40
Gun B    715.7              18.3               40
Is there any evidence of a difference between the mean muzzle velocities of the two weapons,
at the 1% significance level?
Solution
null hypothesis        H0: μA − μB = 0
alternative hypothesis H1: μA − μB ≠ 0
Since we are interested in detecting a difference between the mean muzzle velocities of the
two weapons, a twotailed test of significance is appropriate.
Since both sample sizes are large (nA = nB = 40), the sample standard deviations can be assumed to be reasonable approximations to the population standard deviations σA and σB respectively.
For the Gun A muzzle velocity measurements, the distribution of sample means X̄A of size nA = 40 drawn from the population can be assumed to be approximately normal with mean μA and standard deviation

SA/√nA = 22.4/√40 = 3.5418 m/s

Similarly, for the Gun B muzzle velocity measurements, the distribution of sample means X̄B of size nB = 40 drawn from the population can be assumed to be approximately normal with mean μB and standard deviation

SB/√nB = 18.3/√40 = 2.8935 m/s
We are testing the difference between the population means on the evidence supplied by two samples, thus the random variable of interest is X̄A − X̄B. Under the assumptions made above, this will be at least approximately normally distributed with a mean of (μA − μB), which is equal to zero under the null hypothesis. Its standard deviation is

√(SA²/nA + SB²/nB) = √(3.5418² + 2.8935²) = 4.5735 m/s

Hence under H0

X̄A − X̄B ~ N(0, 4.5735²)
zobs = ((x̄A − x̄B) − 0) / √(SA²/nA + SB²/nB)
     = (700.5 − 715.7) / √(22.4²/40 + 18.3²/40)
     = −15.2/4.5735
     = −3.3235
Hence the difference between the sample mean muzzle velocities of the two weapons lies
3.3235 standard deviations away from the hypothesised difference, if the null hypothesis is
indeed true.
From tables, the critical value for a two-tailed test of significance at the 1% level is zcrit = 2.58. Since |zobs| > zcrit, we therefore reject the null hypothesis H0 at the 1% level.
Hence there is strong evidence to suggest that the two weapons have different mean muzzle
velocities and are therefore unsuitable for the trial.
Example 3
The takeoff distances of two aircraft A and B were recorded 50 and 70 times respectively.
The sample means and standard deviations of these measurements are summarised in the
following table:
             Mean     Standard Deviation    Sample Size
Aircraft A   251.6    33.4                  50
Aircraft B   228.1    34.1                  70
Is there any evidence at the 1% significance level of a difference between the takeoff distances
of the two aircraft?
Solution
Here we have hypotheses
H0: μA = μB    v    H1: μA ≠ μB
The sample sizes are large, so that the assumption of normality and the use of sample
variances in place of population variances will not be too inaccurate.
Hence the test statistic is

zobs = (251.6 − 228.1) / √(33.4²/50 + 34.1²/70) = 3.767
Since this is greater than the critical value of zcrit = 2.58 for 1% significance (twotailed)
there is strong evidence to suggest a difference in the takeoff distances of the two aircraft,
with the distance for A being longer.
9
Confidence Intervals
9.1
Introduction
Decision rules are useful for deciding between two courses of action. However, it is often more useful to know whereabouts the true mean μ might lie, given the sample mean x̄. A range of values where μ might plausibly lie is a Confidence Interval. The end points are referred to as confidence limits.
Typically we construct a 95% Confidence Interval (CI) and say that we are 95% confident that the true mean μ lies within the interval. The precise meaning of the interval is as follows:
If we draw many samples and construct a 95% CI for each, approximately 95% of these CIs will contain μ.
Similarly, if we wanted to be more confident of including the true mean, we could construct
a 99% CI, which will therefore be wider.
The procedure for constructing confidence intervals is very like a hypothesis test. Essentially, a 95% CI is the set of possible (hypothesised) values of μ which would not be rejected by a 5% test.
9.2
Example
To determine a practical average fuel consumption for the Warrior APC across country, a trial was performed using thirty vehicles, each equipped with an accurate measuring device. The fuel consumption was recorded for each of these over the same cross-country course. The 30 data values obtained gave a sample mean fuel consumption of x̄ = 1.97 mpg and a sample standard deviation of S = 0.51 mpg.
Derive the 95% confidence interval for the population mean fuel consumption for Warrior
APCs (of similar age and wear as those which were employed in the trial, and in similar
conditions to the course used).
Since the sample size n = 30 is fairly large, the distribution of sample means X̄ can be assumed to be normal with a mean of μ, the unknown true population mean fuel consumption, and standard deviation σ/√n. In other words

X̄ ~ N(μ, σ²/n)

The observed sample mean x̄ = 1.97 is an observation from this distribution. As a further consequence of taking a large sample, the sample standard deviation S provides a reasonable approximation for the population standard deviation σ.
Therefore

Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)

so that

P(−1.96 ≤ Z ≤ 1.96) = 0.95

Rearranging gives

P(X̄ − 1.96σ/√n ≤ μ ≤ X̄ + 1.96σ/√n) = 0.95
This is an expression for the random variable X̄, but it seems natural to substitute in the observed sample mean x̄ (and observed sample standard deviation S if necessary) to give the 95% confidence limits, or 95% confidence interval:

x̄ − 1.96 S/√n = 1.97 − 1.96 × 0.51/√30 = 1.7875 mpg
x̄ + 1.96 S/√n = 1.97 + 1.96 × 0.51/√30 = 2.1525 mpg

Hence a 95% confidence interval is (1.7875, 2.1525) mpg. We are 95% confident that the true mean fuel consumption is between about 1.79 and 2.15 mpg.
Two-sided Intervals: General Formula
Suppose we have a sample of n observations with mean x̄. As in section 8.5, the formula below is only appropriate where either the population is normally distributed with known variance, or n is sufficiently large that we can assume normality of X̄ and that S is a good estimate of σ.
Then the CI for μ is

x̄ − zcrit σ/√n ≤ μ ≤ x̄ + zcrit σ/√n

where
1. If σ is unknown then replace σ by the estimate S.
2. zcrit is the critical value from normal tables, so that for example zcrit = 1.96 gives a 95% CI and zcrit = 2.58 gives a 99% CI.
Hence for example a 95% CI for μ is

x̄ ± 1.96σ/√n = (x̄ − 1.96σ/√n, x̄ + 1.96σ/√n)

(usually using S for σ).
Example
From a sample of 60 mortar rounds fired from the same setup, the mean range is found to
be 350m with standard deviation 42m. Determine the 95% confidence limits for the mean
of the population, i.e. the mean range for rounds fired with this setup.
Since n = 60 is large, the sample mean can be assumed to be at least approximately normally distributed, and we can estimate the unknown σ by S = 42. Thus the approximate 95% confidence limits are given by

x̄ ± zcrit S/√n

which is

350 ± 1.96 × 42/√60 = 350 ± 10.6275 = (339.37, 360.63)
Hence we can be 95% confident that the true mean range is between 339.37m and 360.63m.
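The interval can be computed directly; a minimal sketch in Python reproducing the mortar-round figures (with S standing in for σ, as in the text):

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(xbar, sd, n, confidence=0.95):
    """Two-sided z confidence interval for the population mean."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z_crit * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = mean_ci(350, 42, 60)
print(round(lo, 2), round(hi, 2))  # 339.37 360.63
```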
One-sided Intervals: General Formula
Confidence intervals are usually two-sided as in the previous paragraph. However, one-sided intervals are used when interest is restricted to just the highest or just the lowest value of the parameter being estimated. With one-sided intervals zcrit is taken from table 2b, as with a one-tailed significance test.
In general a one-sided CI takes the form

(−∞, x̄ + zcrit σ/√n)    for an upper limit only, or
(x̄ − zcrit σ/√n, ∞)    for a lower limit only.

For example, for the 95% confidence level, we take the value zcrit = 1.64.
Example
Forty-six light bulbs of a particular make are tested and their lifetimes recorded, with the following results:

n = 46,  x̄ = 1070 hours,  S = 245 hours

From the data determine the one-sided lower 95% confidence limit for the mean lifetime of this type of bulb.
Since the sample is large we can use the normal distribution as usual and estimate σ by S = 245. Since we require a one-sided confidence interval, the appropriate value of zcrit is 1.64.
Hence the lower confidence limit is

1070 − 1.64 × 245/√46 = 1070 − 59.24 = 1010.76

Hence we can be 95% confident that the mean lifetime of this type of light bulb is at least 1010.76 hours.
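The one-sided limit follows the same pattern with the one-tailed critical value; a sketch in Python (using the exact critical value 1.645 rather than the rounded 1.64, so the result differs slightly from the hand calculation above):

```python
from math import sqrt
from statistics import NormalDist

def mean_lower_limit(xbar, sd, n, confidence=0.95):
    """One-sided lower z confidence limit for the population mean."""
    z_crit = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return xbar - z_crit * sd / sqrt(n)

print(round(mean_lower_limit(1070, 245, 46), 1))
```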
9.3
Example
The results of a cross-country fuel consumption trial on the FV432 APC and the Warrior APC are given in the following table:

APC        Sample Size    Sample Mean    Sample Standard Deviation
FV432      45             2.74           0.62
Warrior    30             1.97           0.51
Find the 99% confidence interval for the difference between the mean fuel consumptions of the two vehicles cross-country.
FV432: Since the sample size nF = 45 is large, it can be assumed that the distribution of sample means X̄F is

X̄F ~ N(μF, σF²/nF)

Warrior: Since the sample size nW = 30 is large, it can be assumed that the distribution of sample means X̄W is

X̄W ~ N(μW, σW²/nW)

As the sample size is at least 30 in each case, the sample variances can be taken as reasonable approximations to the population variances. We require a 99% confidence interval for the difference between the mean fuel consumptions for the FV432 and Warrior, that is for μF − μW. Hence we consider the random variable X̄F − X̄W which, by the assumptions made above, is distributed as

X̄F − X̄W ~ N(μF − μW, σF²/nF + σW²/nW)
From normal tables we have for a 99% confidence interval that zcrit = 2.58, that is

P(−2.58 ≤ Z ≤ 2.58) = 0.99

where Z ~ N(0, 1), i.e.

Z = ((X̄F − X̄W) − (μF − μW)) / √(σF²/nF + σW²/nW) ~ N(0, 1)

Thus,

P(−2.58 ≤ ((X̄F − X̄W) − (μF − μW)) / √(σF²/nF + σW²/nW) ≤ 2.58) = 0.99
Rearranging gives

P(X̄F − X̄W − 2.58√(σF²/nF + σW²/nW) ≤ μF − μW ≤ X̄F − X̄W + 2.58√(σF²/nF + σW²/nW)) = 0.99

This is an expression for the random variable X̄F − X̄W, but again it seems natural to substitute in the observed difference in sample means x̄F − x̄W (and the observed sample variances SF² and SW² if necessary) to give the 99% confidence interval.
Plugging in the observed standard deviations to obtain the estimated standard deviation of X̄F − X̄W we obtain

√(SF²/nF + SW²/nW) = √(0.62²/45 + 0.51²/30) = 0.131195 mpg

Hence the confidence limits are

(x̄F − x̄W) − 2.58√(SF²/nF + SW²/nW) = (2.74 − 1.97) − (2.58)(0.131195) = 0.4315 mpg

and

(x̄F − x̄W) + 2.58√(SF²/nF + SW²/nW) = (2.74 − 1.97) + (2.58)(0.131195) = 1.1085 mpg
Hence a 99% confidence interval for the difference in mean fuel consumption between the
two vehicles is (0.4315, 1.1085) mpg, with the FV432 having the larger mean.
General Formula
Suppose we have two samples, of sizes nA and nB, with means x̄A and x̄B. As in section 8.7, the formula below is only appropriate where either both samples come from populations which are normally distributed with known variances, or both sample sizes are sufficiently large that we can assume normality of the sample means and that the sample variances are good estimates of the population variances.
The CI for the difference in population means μA − μB is

x̄A − x̄B ± zcrit √(σA²/nA + σB²/nB)

where σA², σB² are the variances of the two populations from which the samples were drawn, or if unknown their estimates SA² and SB².
The modification for a one-sided interval is the same as in the one-sample case.
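The two-sample interval can be sketched the same way in Python; with the exact critical value 2.576 rather than the rounded 2.58, the FV432/Warrior limits come out very close to the hand calculation above:

```python
from math import sqrt
from statistics import NormalDist

def diff_ci(xbar_a, sd_a, n_a, xbar_b, sd_b, n_b, confidence=0.99):
    """Two-sided z confidence interval for mu_A - mu_B."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    diff = xbar_a - xbar_b
    return diff - z_crit * se, diff + z_crit * se

# FV432 minus Warrior:
lo, hi = diff_ci(2.74, 0.62, 45, 1.97, 0.51, 30)
print(round(lo, 3), round(hi, 3))
```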
Example
Two sets of soldiers each train for and perform a navigation exercise. One group trained
using Virtual Reality (VR) equipment while the other used standard methods (maps, books).
The response measured was the time (in minutes) for each soldier to complete the route,
and the results were:
VR :       nV = 42,  x̄V = 23.4 mins,  SV = 5.7 mins
Standard : nS = 37,  x̄S = 25.9 mins,  SS = 6.8 mins
It is quite likely that the individual times will not follow a normal distribution (and a
histogram would illustrate this) since it is easier to make a mistake and lose a lot of time
than it is to save a lot of time. This would lead to a skewed distribution with a much
longer tail to the right than to the left. However, the sample sizes are both sufficiently large that X̄V and X̄S should both be approximately normal and the sample standard deviations should be reasonable approximations of the population ones. Hence an approximate 95% CI for the difference (Standard minus Virtual Reality) is

25.9 − 23.4 ± 1.96√(5.7²/42 + 6.8²/37)
= 2.5 ± 2.79
= (−0.29, 5.29)
Hence we are 95% confident that the true difference in means is somewhere between Standard
taking 5.29 minutes longer and VR taking 0.29 minutes longer. Note that zero is within
this interval. In fact, from section 9.4, this means that a 5% test of
H0: μV − μS = 0    v    H1: μV − μS ≠ 0
would not reject H0 .
9.4
A two-sided 95% CI and a two-tailed 5% test are equivalent in the following sense:
For a 5% test of H0: μ = μ0 v H1: μ ≠ μ0,
H0 is rejected if μ0 is outside the 95% CI for μ;
H0 is not rejected if μ0 is inside the 95% CI for μ.
Similarly for a 99% CI and 1% test etc., and similarly for one-tailed tests and one-sided CIs of appropriate sizes.
Example
We return to the example in section 7.4 where a random sample of n = 30 components was taken from a week's production, giving a sample mean length of x̄ = 10.51 cm with a sample standard deviation of S = 0.64 cm. Making the usual assumptions, a 95% CI for the true mean length is
mean length is
10.51 ± 1.96 × 0.64/√30 = 10.51 ± 0.23 = (10.28 , 10.74)
Note that the target length of 10.40 cm is inside this interval, which agrees with the fact that the p-value in section 7.4 exceeded 0.05, so that a 5% test would fail to reject H0 : μ = 10.40.
9.5
Further Examples
Example
From two samples obtained from different types of scout car the following fuel consumption
figures were obtained:
Type A : nA = 36 , x̄A = 7.5 mpg , SA = 0.6 mpg
Type B : nB = 64 , x̄B = 6.1 mpg , SB = 0.5 mpg
Find the 99% confidence limits for the difference between the means of the populations from which these samples were taken (i.e. the 99% confidence limits on the possible difference in mpg of the two types of scout car).
Solution
Since both samples are fairly large the distribution for the difference of sample means will be at least approximately normal. The confidence limits are determined as above with σA , σB approximated by the sample standard deviations 0.6, 0.5 respectively and zcrit taken from tables to be 2.58. Thus the 99% limits for the difference between the fuel consumption of Type A and Type B scout cars are:
(7.5 − 6.1) ± 2.58 √(0.6²/36 + 0.5²/64) = 1.4 ± 0.304 mpg.
Hence we can be 99% confident that the difference is between 1.096 and 1.704 mpg, with
Type A having the higher mpg.
10
10.1
The previous section examined how one might draw conclusions about the mean of a population from the analysis of a large set of measurements, i.e. a large sample. In practice,
however, one often has relatively few measurements on which to base a decision. We therefore must also consider the small sample case.
The tests and CIs in sections 7.1 to 9.5 are exactly correct if the data are observations from a normal distribution with known population variance. They are approximately correct if the data are observations from a non-normal distribution and/or the population variance is unknown, but only if n is large. In this context 'large' depends on how non-normal the distribution of the data appears to be, but n ≥ 30 is a common rule of thumb.
The tests and CIs in this section are exactly correct if the data are observations from a
normal distribution with unknown variance.
10.2
Student's t-Distribution
We perform tests and construct confidence intervals almost exactly as before, but replace critical values from the normal distribution (table 2) with those from Student's t-distribution (table 3). Note that this is valid for n large or small, but it is only for small n that it makes much difference.
The t distribution is like a N(0, 1) with fatter tails (see picture overleaf), with the exact shape depending on the degrees of freedom, ν. Roughly,
ν = n − (at least 1)
See section 10.8 for more explanation of degrees of freedom.
The t distribution is tabulated in section 15.1, table 3. A t with ν degrees of freedom is sometimes denoted tν.
Note that the t distribution becomes more normal-like as n (and so ν) increases, and in fact t∞ is precisely the normal distribution.
10.3
The following procedure applies to the case when the data can be assumed to come from a
normal distribution but the true population variance is unknown.
The procedure is almost exactly the same as that for the one-sample z-test in section 8.5, with the exceptions noted below.
[Figure: density f(x) of the t distribution compared with N(0, 1), showing the fatter tails of the t.]
Observed mean x̄ from a sample of size n. Can we assume that the true population mean is equal to some hypothesised value μ0 , or not?
H0 : μ = μ0
H1 : μ ≠ μ0
(the ideas are similar for one-tailed tests).
Calculate the test statistic
tobs = (x̄ − μ0) / (S/√n)
Example
Five rounds fired from a gun fall at ranges of 12560, 12490, 12550, 12500 and 12520 metres.
Are these results consistent with a range table prediction of 12500 metres?
We have hypotheses:
H0 : μ = 12500
H1 : μ ≠ 12500
so that this is a single-sample, two-tailed test. Now, from the data above we have
n = 5 , Σx = 62620 , Σx² = 784256600
so that
x̄ = 62620/5 = 12524
S = √( (1/(n − 1)) (Σ xᵢ² − n x̄²) ) = √( (1/4)(784256600 − 5 × 12524²) ) = 30.5
Hence the test statistic is
tobs = (12524 − 12500)/(30.5/√5) = 1.76
The 5% critical value for ν = n − 1 = 4 is tcrit = 2.78. Since |tobs| < tcrit there is no evidence against H0 , so the results are consistent with the range table prediction.
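The mean, standard deviation and test statistic above can be checked with a few lines of Python (a sketch only; the notes themselves use tables, and the critical-value comparison is still done by hand):

```python
import math
import statistics

ranges = [12560, 12490, 12550, 12500, 12520]  # metres
mu0 = 12500                                    # range table prediction

n = len(ranges)
xbar = statistics.mean(ranges)   # 12524
s = statistics.stdev(ranges)     # sample SD with n - 1 divisor, about 30.5
t_obs = (xbar - mu0) / (s / math.sqrt(n))
print(round(t_obs, 2))  # 1.76; compare with t_crit = 2.78 for nu = 4
```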
10.4
Two samples of sizes nA and nB give mean values xA , xB respectively: could they reasonably
both have come from populations with the same mean?
H0 : μA = μB
H1 : μA ≠ μB
(again, same idea for one-tailed tests).
This divides into two distinct situations.
Paired Samples
Sometimes nA = nB and there is a natural pairing between the observations in the two
samples (for example, before and after values for the same individual).
In such cases calculate the difference for each pair. Then to test for a difference in means perform a one-sample test of H0 : μd = 0, where μd is the true mean difference.
Note that the observed differences must satisfy the assumptions required by a one-sample test in section 10.3.
NonPaired Samples
This is the more usual case. The following procedure only works if both of the following are
true:
1. Both observed data sets consist of independent observations from normal distributions.
2. The variances of these normal distributions are unknown but can be assumed to be
equal.
A common rule of thumb is that it is reasonable to assume equality of the population variances if
1/2 ≤ SA²/SB² ≤ 2
If this is so, calculate the pooled estimate of the common variance σ²,
Sp² = ( (nA − 1) SA² + (nB − 1) SB² ) / (nA + nB − 2)
Note that this is a weighted average of the two variances, and is just the simple average if nA = nB.
Then find
tobs = (x̄A − x̄B) / √( Sp² (1/nA + 1/nB) ) = (x̄A − x̄B) / ( Sp √(1/nA + 1/nB) )
Soldier    1  2  3  4  5  6  7  8  9  10
ATGW A     7  3  5  9  7  5  4  3  2  3
ATGW B     9  6  4  9  9  6  8  7  2  2
The differences for the pairs (B − A) are 2, 3, −1, 0, 2, 1, 4, 4, 0, −1, giving x̄d = 1.4 and Sd = 1.897, and hence
tobs = (1.4 − 0)/(1.897/√10) = 2.33
The 5% critical value for ν = 9 is tcrit = 2.26, so reject H0 since tobs exceeds this. There is evidence of a difference, with B doing better. However, since tobs only just exceeds tcrit , and the assumption of normality is a little dubious, it can be argued that this conclusion should be toned down. More data are certainly needed.
Example: Unpaired 2-sample t test
The following distances in metres were recorded for the takeoff of two aircraft:
Aircraft A
Aircraft B
Assuming that the distances are normally distributed, is there any evidence of a difference
(at 5% significance level) between the takeoff distances of the two aircraft?
Using the sample summaries nA = 6, x̄A = 253.2, SA = 19.874 and nB = 9, x̄B = 227.0, SB = 15.443, the pooled variance is
Sp² = (5 × 19.874² + 8 × 15.443²)/13 = 298.6795 , Sp = 17.2823
so that
tobs = (253.2 − 227.0) / ( 17.2823 √(1/6 + 1/9) ) = 2.87
From the t-table for 5%, two-tailed, with degrees of freedom ν = 6 + 9 − 2 = 13, we have tcrit = 2.16. Since tobs exceeds tcrit , we conclude that there is evidence of a difference between the takeoff distances of the aircraft, with aircraft A having the longer takeoff distance.
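The pooled calculation can be sketched in Python directly from the summary statistics (illustrative only; the text's value 2.87 comes from rounding Sp first, so full-precision arithmetic gives about 2.88):

```python
import math

# Summary statistics for the aircraft takeoff distances (metres)
n_a, xbar_a, s_a = 6, 253.2, 19.874
n_b, xbar_b, s_b = 9, 227.0, 15.443

# Pooled variance: weighted average of the two sample variances
sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
sp = math.sqrt(sp2)
t_obs = (xbar_a - xbar_b) / (sp * math.sqrt(1 / n_a + 1 / n_b))
print(round(sp, 2), round(t_obs, 2))  # sp about 17.28, t_obs about 2.88
```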
10.5
Confidence Intervals
All previous comments in sections 10.3 and 10.4 on when tests are appropriate apply equally
to confidence intervals.
One Sample Case
The basic procedure is exactly as in section 9.2, we simply use critical values from the t
distribution instead of the normal.
Hence a two-sided confidence interval for μ is
x̄ ± tcrit S/√n
where tcrit is the critical value from the t distribution with ν = n − 1 degrees of freedom.
Example
Three tanks are fitted with new tracks and each is driven under trial conditions until either
of its tracks fails. The distances until failure are 6000, 7000 and 8000 miles. Hence
x̄ = 7000
S² = (1/2)(6000² + 7000² + 8000² − 3 × 7000²) = 1000000
S = 1000
If the track failures are due to wear rather than catastrophic failure then the distances should be approximately normally distributed, so a 95% CI is
7000 ± 4.30 × 1000/√3 = 7000 ± 2483 = (4517 , 9483)
For very small sample sizes, CIs are often very wide indeed!
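A short Python check of this interval (a sketch; the critical value 4.30 is taken from the t-table quoted above, not computed):

```python
import math
import statistics

miles = [6000, 7000, 8000]   # distances until track failure
n = len(miles)
xbar = statistics.mean(miles)   # 7000
s = statistics.stdev(miles)     # 1000 (n - 1 divisor)
t_crit = 4.30                   # 95% two-sided, nu = 2, from the t-table
half_width = t_crit * s / math.sqrt(n)
print(round(xbar - half_width), round(xbar + half_width))  # 4517 9483
```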
Two Sample Case
Just as with tests, this breaks down into paired and unpaired cases.
For paired data use the above onesample procedure on the differences.
For unpaired data calculate the pooled variance Sp² , if appropriate, and the CI is
(x̄A − x̄B) ± tcrit Sp √(1/nA + 1/nB)
For the aircraft example above, with tcrit = 2.16 for ν = 13, the 95% CI is
(253.2 − 227.0) ± 2.16 × 17.2823 √(1/6 + 1/9) = 26.2 ± 19.6746 = (6.525 , 45.875)
Hence we are 95% confident that the difference in means is between about 6.5 and 45.9
metres, with A having the larger takeoff distance. Note that this interval does not include
zero, as expected since the test rejected H0 (see section 9.4).
10.6
Further Examples
tobs = (x̄ − μ0)/(S/√n) = (818.3 − 838)/(19.653/√10) = −3.17
so that the interval is (0,835.8) and we are 99% confident that the true mean velocity is
no more than 835.8 m/s. The hypothesised (target) value of (at least) 838 is outwith this
interval and hence 838 is not a credible value of .
Sometimes even after a one-tailed test we prefer a two-sided confidence interval. For example, here the 95% two-sided confidence interval gives
818.3 ± 2.26 × 19.653/√10 = (804.3 , 832.3)
so that we are 95% confident that the true mean muzzle velocity is within this range.
Example 2: Two-sample (unpaired) t test
Two machines are set up to produce bullets for AR-15 5.56 × 45 ammunition, the nominal diameter of the bullet being 5.56 mm. A sample of 10 rounds is selected at random from the production of each machine and the diameter of each bullet is measured. The sample data obtained are as follows:
Machine A : 5.50 5.54 5.56 5.49 5.60 5.44 5.49 5.58 5.56 5.55
Machine B : 5.59 5.55 5.63 5.61 5.52 5.58 5.56 5.60 5.51 5.52
Does this data set provide sufficient evidence, at the 1% significance level, to conclude
that Machine A is producing bullets with a different mean diameter to those produced by
Machine B?
Solution
The hypotheses are
H0 : μA = μB v H1 : μA ≠ μB
We have
nA = 10 , x̄A = 5.5310 , SA = 0.049318
nB = 10 , x̄B = 5.5670 , SB = 0.041647
This is a twosample test since there is no pairing in the observations, and we need to be
able to assume that both sets of observations are from normally distributed populations.
This seems reasonable, but we would look at similar past data to check.
The ratio of the sample variances is
0.049318²/0.041647² = 1.4023
so that pooling variances is fine. Then
Sp² = (9 × 0.049318² + 9 × 0.041647²)/18 = 0.00208333
Sp = 0.0456435
Hence
tobs = (5.5310 − 5.5670) / ( 0.0456435 √(1/10 + 1/10) ) = −1.76
From the t-table, the 1% two-tailed critical value with ν = 18 is tcrit = 2.88, so there is insufficient evidence at the 1% level of a difference in mean diameter. A 99% CI for the difference is
(5.5310 − 5.5670) ± 2.88 × 0.0456435 √(1/10 + 1/10) = (−0.0942 , 0.0222)
which includes zero, as expected.
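Working from the raw diameters, the whole test can be sketched in standard-library Python (variable names are ours; the sign of tobs simply reflects taking A minus B):

```python
import math
import statistics

a = [5.50, 5.54, 5.56, 5.49, 5.60, 5.44, 5.49, 5.58, 5.56, 5.55]
b = [5.59, 5.55, 5.63, 5.61, 5.52, 5.58, 5.56, 5.60, 5.51, 5.52]

na, nb = len(a), len(b)
xa, xb = statistics.mean(a), statistics.mean(b)
va, vb = statistics.variance(a), statistics.variance(b)  # n - 1 divisors

# Rule of thumb before pooling: ratio of variances should be at most 2
ratio = max(va, vb) / min(va, vb)

sp = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
t_obs = (xa - xb) / (sp * math.sqrt(1 / na + 1 / nb))
print(round(ratio, 2), round(t_obs, 2))  # 1.4 -1.76
```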
Soldier  1  2  3  4  5  6  7  8  9  10
DTT      79 82 91 87 63 95 71 78 86 90
TT       77 80 93 87 55 90 68 76 86 89
Assuming that scores are normally distributed, test at the 1% level whether there is any evidence to suggest a difference in the gunners' performance between Turret Trainer and Desk Top Trainer.
Solution
Since the scores are paired for each soldier, we use the differences X = DTT − TT:
Soldier  1  2  3  4  5  6  7  8  9  10
X        +2 +2 −2 0  +8 +5 +3 +2 0  +1
The sample mean of the differences is x̄ = 2.1, with a standard deviation S = 2.807 and a sample size n = 10.
We start by assuming that there is no difference between the population mean score for the DTT and the population mean score for the TT, and we are interested in detecting any difference between the DTT and TT scores, so this is a two-tailed test of significance at the 1% level,
H0 : μd = 0
H1 : μd ≠ 0
tobs = (x̄ − 0)/(S/√n) = (2.1 − 0.0)/(2.807/√10) = 2.366
From the tables, the critical value of t9 at the 1% level of significance is tcrit = 3.25. Since
tobs < tcrit , we do not reject H0 . There is no evidence (at the 1% level) to suggest that the
gunners score differently on the Turret Trainer than they do on the Desk Top Trainer.
A 99% two-sided CI is given by
2.1 ± 3.25 × 2.807/√10 = (−0.785 , 4.985)
which includes zero.
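A paired test reduces entirely to a one-sample test on the differences, which the following Python sketch (illustrative only) makes explicit:

```python
import math
import statistics

# Differences X = DTT - TT for the ten soldiers
x = [2, 2, -2, 0, 8, 5, 3, 2, 0, 1]

n = len(x)
xbar = statistics.mean(x)   # 2.1
s = statistics.stdev(x)     # about 2.807
t_obs = (xbar - 0) / (s / math.sqrt(n))
print(round(t_obs, 3))  # 2.366; compare with t_crit = 3.25 for nu = 9 at 1%
```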
Example 4: Onesample t CI
Five measurements of the diameter of a sphere (in cm) gave the sample
5.33 5.37 5.36 5.32 5.38
Find the 95% confidence interval for the diameter.
Solution
By calculation
n=5
x = 5.352
S = 0.0259
Hence, using tcrit = 2.78 for ν = 5 − 1 = 4 degrees of freedom, a 95% CI for the true mean is
5.352 ± 2.78 × 0.0259/√5 = 5.352 ± 0.032 = (5.320 , 5.384)
This requires the assumption that the observations come from a normal distribution, and in the absence of past information on similar problems it is difficult to assess whether this is valid. Hence one can argue that the above is only very approximate, and should be regarded as a lower bound on the true level of uncertainty. In other words, the CI should probably be wider than the one above, but how much wider we don't know. We should be at least as uncertain about the true diameter as the above CI indicates.
10.7
The following table summarises when the z and t tests and confidence intervals are appropriate. Note the slight extra complication with the two-sample t test and intervals, noted in section 10.4.
Distribution   σ²       n      Use   Accuracy
normal         known    any    z     exact
normal         unknown  any    t     exact
normal         unknown  large  z     approximate
non-normal     any      large  z     approximate
non-normal     any      small  -     -

10.8
Suppose you are asked to write 3 numbers with no restrictions imposed upon them. You have complete freedom of choice with regard to all 3 numbers; there are 3 degrees of freedom.
Suppose now you are asked to write 3 numbers with the restriction that their sum must be some particular value, say 20. You cannot now choose all 3 numbers freely because as soon as the choice of the first 2 is made the third number is fixed. Your choices are governed by the relation X1 + X2 + X3 = 20. In this situation there are only 2 degrees of freedom. The total number of variables is 3 but the number of restrictions upon them is 1 and thus the number of free variables is 3 − 1 = 2.
Suppose now you are asked to write 5 numbers such that their sum is 30 and also such that the sum of the first 2 is 20. Although there are 5 variables you do not have freedom of choice with regard to all 5. As soon as you choose the first number the second is determined by the relation X2 = 20 − X1. Also, as soon as you select X3 and X4, the last is determined by the relation X5 = 30 − 20 − X3 − X4. The degrees of freedom are thus found by subtracting the number of independent restrictions placed on the variables from the total number of variables, 5 − 2 = 3 in this case.
If we have a sample of size n, its variance is calculated from all n measurements via the sum
Σᵢ₌₁ⁿ (xᵢ − x̄)²
However, since x̄, the mean of the sample, has already been determined, only n − 1 of these measurements are independent. Hence the above sum is divided by n − 1 rather than n to obtain S², and consequently the degrees of freedom associated with a single-sample t-test is ν = n − 1.
Similarly, in the two-sample case the calculation of Sp² involves two quantities, the means x̄A and x̄B , and hence places two restrictions on the nA + nB measurements. Hence the degrees of freedom in this case are ν = nA + nB − 2.
11
11.1
The variance (or standard deviation) gives the best measure of the spread of a population,
and can therefore be thought of as a measure of consistency.
The material in this section only applies if the data are observations from a normal distribution. The central limit theorem does not apply to variances, so that even for a large
sample size the assumption of normality is needed.
One Sample Case
To check whether the variance of a population is compatible with some designated value,
the sample variance is compared with the given value. Alternatively, confidence limits may
be derived to provide a range of values in which the population variance is likely to be.
Two Sample Case
To check whether two populations have the same variance, samples would be taken from
each population and the two sample variances compared. Such a test would indicate for
example whether some modification has produced a change in consistency.
Example
Two different types of automatic cutting machine are set to cut strips of metal to a nominal length of 30cm. However, random effects and engineering limitations mean that each
machine cuts strips which have a distribution of lengths. Although both distributions may
be centred around the nominal 30cm value we would obviously prefer the machine which
gave rise to the smaller spread of lengths about this value, i.e. the distribution with the
smaller variance if this can be identified. In this context therefore we are trying to determine
whether one machine has better precision than the other, and can use a two-sample test to compare the variances. Further, if strips need to be between 29 cm and 31 cm to be usable then clearly we want μ = 30 cm (or very close) and σ² small, preferably σ < 0.5 cm. This requires a one-sample test, or confidence interval.
11.2
Fisher's F-Distribution
The basic ideas are just the same as when dealing with means, except that we usually
have to look at ratios of variances (rather than differences in means), and we use the F
distribution rather than the normal or t.
The comparison of two sample variances is made by forming the variance ratio
S1²/S2²
If the population variances are equal then this ratio follows Fisher's F-distribution, provided that the two populations being sampled follow normal distributions.
The F distribution is tabulated in section 15.1. Table 6 is for one-tailed tests and one-sided CIs while table 7 is for two-tailed tests and two-sided CIs.
Values of Fcrit are given for significance levels of 5% and 1% and for degrees of freedom ν1 = n1 − 1 and ν2 = n2 − 1, where n1 , n2 are the sizes of the samples with sample variances S1² , S2² respectively. These critical values are used in tests and CIs for both one- and two-sample cases.
11.3
Calculate the test statistic
Fobs = S1²/S2²  where S1² > S2²
Note that here the suffix 1 is used for the larger of the two values of S² and the suffix 2 for the smaller, so that the tabulated values of Fcrit are always greater than unity (since the choice of numerator and denominator is clearly arbitrary, we might as well choose the larger sample variance to be the numerator and hence simplify the tables). Compare Fobs with the value Fcrit selected from the F-distribution on table 7 with degrees of freedom
ν1 = n1 − 1 , ν2 = n2 − 1
If Fobs > Fcrit then there is evidence that the two population variances are different; otherwise there is insufficient evidence to conclude that there is any difference in variance between the two populations.
Note: the rule of thumb for deciding whether you can pool the variances for a two-sample t test is just an ad hoc version of this test.
Example
For the cutting machines in section 11.1, samples are taken from machines A and B giving
nA = 15 , SA = 0.63 cm , nB = 25 , SB = 0.86 cm
Assuming lengths to be normally distributed, is there any evidence of a difference in consistency between the two machines?
We test
H0 : σA² = σB² v H1 : σA² ≠ σB²
and since SB > SA the test statistic is
Fobs = SB²/SA² = 0.86²/0.63² = 1.86
For a one-tailed test, calculate
Fobs = S1²/S2²
where S1² is the sample variance that H1 claims to be the larger, and compare to Fcrit selected from the F-distribution on table 6 with degrees of freedom
ν1 = n1 − 1 , ν2 = n2 − 1
Note that H1 is such that we only reject H0 if Fobs is much bigger than 1, i.e. if it is in the right-hand tail of the distribution. Hence, similarly to a one-tailed test of means in section 8.3, if Fobs < 1 then we need go no further since H0 will definitely not be rejected.
Example (continued)
If the question had been 'is there any evidence that machine B is less consistent than machine A' then the hypotheses become
H0 : σB² ≤ σA² v H1 : σB² > σA²
Here H1 is that B has the larger population variance, so the test statistic is B over A:
Fobs = SB²/SA² = 0.86²/0.63² = 1.86
with ν1 = 24, ν2 = 14 again, but from table 6 the critical values are
5% : 2.35 , 1% : 3.43
Hence again Fobs < Fcrit so there is no evidence against H0 .
Example (continued)
If the question had been 'is there any evidence that machine B is more consistent than machine A' then the hypotheses become
H0 : σB² ≥ σA² v H1 : σB² < σA²
This time H1 is that A has the larger population variance, so the test statistic is A over B:
Fobs = SA²/SB² = 0.63²/0.86² = 0.54
Since Fobs < 1 we can see that it is in the wrong tail of the distribution. Hence there is no
evidence against H0 .
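The larger-over-smaller convention for the two-tailed test is easy to capture in a small helper; the following Python sketch (our own illustration, not part of the notes) returns the statistic together with the degrees of freedom to look up:

```python
# Variance-ratio (F) statistic for the two-tailed test, larger variance on top
def f_ratio(s1, n1, s2, n2):
    """Return (F_obs, nu1, nu2) with the larger sample variance as numerator."""
    v1, v2 = s1**2, s2**2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# Cutting machines example: B (n=25, S=0.86 cm), A (n=15, S=0.63 cm)
f_obs, nu1, nu2 = f_ratio(0.86, 25, 0.63, 15)
print(round(f_obs, 2), nu1, nu2)  # 1.86 24 14
```

The returned Fobs is then compared with Fcrit from table 7 in the usual way.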
11.4
A sample of size n, assumed to come from a normally distributed population, gives a sample variance S². Could it reasonably have been taken from a population with designated variance σ0²?
H0 : σ² = σ0² v H1 : σ² ≠ σ0²
This uses exactly the same procedure as in the two-sample case above. The hypothesised variance σ0² is treated exactly like the sample variances in the two-sample case, but with ν = ∞ degrees of freedom.
Unlike the case for means, where we may want them to be either large, within a certain range, or small, we nearly always want variances to be small. If a process currently has variance σ0² and we make a change to try to improve the consistency, that is, decrease the variance, we would test
H0 : σ² ≥ σ0² v H1 : σ² < σ0²
However, if we were changing the process for some other reason (to change the mean or reduce the cost) but wanted to check whether this had reduced the consistency, i.e. increased the variance, we would test
H0 : σ² ≤ σ0² v H1 : σ² > σ0²
The differences in performing a onetailed test are exactly as in the twosample case.
Example
The weights of shells from a production line are normally distributed with variance 10 g². The machinery is upgraded and it is hoped that this will reduce the variance. A random sample of 25 shells is taken from the production line, giving a sample variance of 5 g². Test at the 5% level whether there is any evidence of a decrease in variance.
Here we are looking for a decrease in variance, so the hypotheses are
H0 : σ² ≥ 10 v H1 : σ² < 10
Then
Fobs = σ0²/S² = 10/5 = 2
11.5
Given a sample of size n with sample variance S², find, at some specified confidence level, confidence limits for the population variance σ². This is only valid if the population can be assumed to be normally distributed.
Procedure
Confidence intervals for variances, unlike those for means, are asymmetrical, and usually
extend much further above the observed S 2 than below it. A confidence interval for the
standard deviation can be obtained by taking square roots in the obvious way.
The upper confidence limit σU² is given by
σU² = FU S²
where FU is read from the F-distribution (table 7) at the appropriate confidence level, using degrees of freedom ν1 = ∞ , ν2 = n − 1.
The lower confidence limit σL² is given by
σL² = S²/FL
where FL is read from the same table and confidence level but using reversed degrees of freedom ν1 = n − 1 , ν2 = ∞.
With variances, one-sided CIs are quite commonly of interest, because we are often only interested in an upper limit. In this case only one calculation is needed, using table 6.
Example
A sample of 13 propellant grains gave an estimated standard deviation for burning time of
0.0137 seconds. Find the upper onesided 95% confidence limit for the variance and hence
for the standard deviation of propellant grains of this type (in other words, how bad could
they be as regards consistency?).
Here n = 13, S² = 0.0137². From the F-distribution table at the 5% level (one-tailed), with
ν1 = ∞ , ν2 = 12 : FU = 2.30
Hence the one-sided 95% upper limit on the variance, σU² , is given by
σU² = 2.30 × 0.0137² = 0.000431687
Hence that for the standard deviation is √0.000431687 = 0.0208, so that the one-sided CI for the standard deviation is
(0 , 0.0208) seconds
Hence we are 95% confident that the population standard deviation is no larger than 0.0208 seconds.
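The limit is a one-line calculation once FU has been read from the table, as this Python sketch shows (FU = 2.30 is the tabulated value quoted above, not computed):

```python
import math

# Propellant grains: n = 13, S = 0.0137 s; F_U from table 6 with nu1 = inf, nu2 = 12
s = 0.0137
f_upper = 2.30             # tabulated value quoted in the text
var_upper = f_upper * s**2
sd_upper = math.sqrt(var_upper)
print(round(sd_upper, 4))  # 0.0208 seconds
```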
11.6
Further Examples
Example 1
According to specification, the consistency in range for the 81 mm UK Mortar, given by the standard deviation, is 40.47 metres at two-thirds of maximum range. Consistency is defined to be the distribution of the fall of shot about the mean point of impact (mpi).
It is thought that the range consistency of a particular mortar of this type may have degraded as a consequence of excessive wear, and to test this, a sample of 20 rounds was fired from the mortar after bedding-in and adjustment.
The range coordinates (in metres) of the sample were:
3766 3620 3785 3595 3670
3685 3700 3665 3598 3710
3617 3800 3720 3648 3750
3650 3620 3765 3633 3773
Does this sample provide evidence at the 5% level to suggest that there has been a degradation in the range consistency of the weapon? From past data we can assume that the fall of shot follows a normal distribution.
If a degradation in range consistency is shown to have occurred, it has been decided that, rather than scrap the weapon, it will continue in service with a revised figure for the consistency standard deviation at two-thirds of maximum range. Calculate the 99% upper confidence limit for the range consistency standard deviation, based on the sample data given above.
Solution
We wish to test
H0 : σ² ≤ 40.47² v H1 : σ² > 40.47²
This is one-tailed since we are only interested in detecting evidence of an increase in the variance.
From the data, S = 66.75 and so
Fobs = S²/σ0² = 66.75²/40.47² = 2.72
From table 6, for a one-tailed test at the 5% level with numerator d.f. ν1 = 20 − 1 = 19 and denominator d.f. ν2 = ∞, the critical value is Fcrit = 1.59 (by interpolation).
Since Fobs > 1.59, there is sufficient evidence to reject H0 in favour of H1 , and we conclude
that there has been a degradation in the range consistency of the mortar, probably as a
consequence of age and wear.
The 99% upper confidence limit σU² is given by
σU² = 2.49 × 66.75² = 11094.35
so that the revised consistency standard deviation is √11094.35 = 105.3 metres.
Example 2
For two base types, M and D, the following consistency data were recorded:
nM = 10 : SyM = 18.8033 m , SxM = 9.004 m
nD = 9 : SyD = 21.0602 m , SxD = 7.8863 m
For the range (y) consistency,
Fobs = 21.0602²/18.8033² = 1.2545
From the tables of the F-distribution, for a two-tailed test at the 5% level of significance with 8 and 9 degrees of freedom, Fcrit = 4.10. Since Fobs < 4.10 we conclude that there is no evidence of a difference between the range consistency of the two base types.
For the line (x) consistency,
Fobs = 9.004²/7.8863² = 1.3035
From the tables of the F-distribution, for a two-tailed test at the 5% level of significance with 9 and 8 degrees of freedom, Fcrit = 4.36. Since Fobs < 4.36 we conclude that there is no evidence of a difference between the line consistency of the two base types.
Example 3
Two surveyors make repeated measurements of the same angle. Their results only differ in the figure of seconds of angle, the values of which are:
A : 40 29 37 41 38 35
B : 38 37 40 33 32 39 40 34
Do these results give evidence of a difference between A and B in terms of consistency? You
can assume that the measurements are normally distributed.
Solution
We test
H0 : σA² = σB² v H1 : σA² ≠ σB²
For Surveyor A, SA² = 18.7; for Surveyor B, SB² = 10.3.
Hence, using suffix 1 for the larger S²:
S1² = 18.7 , n1 = 6 , ν1 = 5
and suffix 2 for the smaller S²:
S2² = 10.3 , n2 = 8 , ν2 = 7
Therefore
Fobs = 18.7/10.3 = 1.8
From table 7 (two-tailed) with ν1 = 5, ν2 = 7, for 5%, we have Fcrit = 5.29. Since Fobs does not exceed Fcrit we conclude that there is no evidence of a difference between A and B.
As an aside, if the question had been 'is there any evidence that B is more consistent than A' then we would have used the hypotheses
H0 : σA² ≤ σB² v H1 : σA² > σB²
In this case we specifically look for evidence that B is more consistent, and hence has a lower variance. The one-tailed value from table 6 is Fcrit = 3.97, but Fobs is still lower than this so again we have no evidence against H0 .
Example 4
A production line makes components which should be 6.30 cm long, so we require that μ = 6.30. Additionally, it is required that at least 90% of components are between 6.27 and 6.33 cm long for the process to be in control.
If the lengths can be assumed to be normally distributed, and a random sample of 20 items gives a standard deviation of S = 0.026, is there any evidence that the process is out of control?
Solution
From table 2, 90% of items are within 1.64 standard deviations of the mean, so the requirement for the process to be in control is
1.64 σ ≤ 0.03 , i.e. σ ≤ 0.03/1.64 = 0.018
We therefore test H0 : σ ≤ 0.018 v H1 : σ > 0.018, with
Fobs = S²/σ0² = 0.026²/0.018² = 2.086
As in Example 1, the one-tailed 5% critical value with ν1 = 19, ν2 = ∞ is Fcrit = 1.59, so Fobs > Fcrit and there is evidence that the process is out of control.
12
12.1
Suppose r successes are observed in n repeated independent trials (the term 'success' being used to describe the attribute of interest). If the sample of size n may be regarded as being drawn from a population having proportion p of successes then p̂ = r/n gives the best estimate of p. The parameter p may of course equally well be interpreted as the probability of a success in an individual trial.
We can use the estimate p̂ to produce a CI for the true success probability p, or to test a hypothesised value p0 .
12.2
For large n we can use the normal approximation to do this. If X ~ Bi(n, p) then, from section 6.6,
X ≈ N(np, np(1 − p))
Viewed as a random variable, p̂ is just X/n so that, by the results in section 6.4, its variance is 1/n² times that of X. Hence
p̂ = X/n ≈ N( p , p(1 − p)/n )
so that an approximate CI for p is
p̂ ± zcrit √( p̂(1 − p̂)/n )
Example
If a medical treatment has a 70% success rate in a trial, calculate a 95% confidence interval
for p for sample sizes of 50, 100 and 1000.
Since p̂ = 0.7 in each case, the CI is
0.7 ± 1.96 √(0.7 × 0.3/n)
Note that n = 50 is the smallest sample for which this approximation is valid.
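The three intervals can be produced in one short loop; this Python sketch (illustrative only) shows how the interval narrows as n grows:

```python
import math

p_hat, z = 0.7, 1.96
for n in (50, 100, 1000):
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Interval narrows like 1/sqrt(n): roughly +-0.127, +-0.090, +-0.028
    print(n, round(p_hat - half, 3), round(p_hat + half, 3))
```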
Tests
Similarly a test of H0 : p = p0 would use the test statistic
zobs = (p̂ − p0) / √( p0(1 − p0)/n )
and compare to normal tables in the usual way. (Note that this uses p0 in the variance, since the test statistic is calculated under the assumption that H0 is true.)
12.3
If the normal approximation is not appropriate then we calculate the p-value directly using the binomial distribution (see the next section).
12.4
Examples
Example 1
A new type of explosive charge is being tested, but they are so expensive that only six such
charges are available for test detonations. In a test under extreme conditions, five destroy
their target and one does not. We will only consider the charge for future use if the chance
of target destruction is greater than 0.4. Test at the 5% level.
In this case the burden of proof must be with the manufacturers of the charges, who need
to convince us that their product conforms to specification. Hence the hypotheses are
H0 : p ≤ 0.4
H1 : p > 0.4
The sample estimate is 5 successes out of 6, so clearly the normal approximation is not appropriate.
Assuming p = 0.4, the chance of obtaining 5 or more successes is given by the binomial distribution with n = 6, p = 0.4, q = 0.6:
chance of 6 successes : p⁶ = 0.0041
chance of 5 successes : 6p⁵q = 0.0369
Hence the p-value is 0.0041 + 0.0369 = 0.041 < 0.05, so there is evidence at the 5% level that p > 0.4.
If instead we had had n = 100 charges with r = 50 successes, we could work out the probability directly as above, but it is much easier to use the normal approximation as in section 12.2. Since p̂ = 0.5 and p0 = 0.4, the test statistic is
zobs = (0.5 − 0.4) / √(0.4 × 0.6/100) = 2.04124
Comparing to critical values from normal tables, zobs > 1.64, so that this is significant at the 5% level (one-tailed) and hence there is fairly strong evidence against H0 .
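Both routes, the exact binomial tail and the normal approximation, are a few lines of Python (a sketch, using only the standard library):

```python
import math

# Exact binomial tail: P(X >= 5) when n = 6, p = 0.4
n, p = 6, 0.4
p_value = sum(math.comb(n, r) * p**r * (1 - p)**(n - r)
              for r in range(5, n + 1))
print(round(p_value, 3))  # 0.041, so significant at the 5% level

# Normal approximation for the larger trial: r = 50 successes in n = 100
p_hat, p0, n2 = 0.5, 0.4, 100
z_obs = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n2)
print(round(z_obs, 2))  # 2.04
```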
Example 2
We return to example 2 in section 4.5. A sample of n = 1000 gave an estimate of p̂ = 0.732. We can construct a 95% confidence interval as
0.732 ± 1.96 √(0.732 × 0.268/1000) = (0.70455 , 0.75945)
The factory claimed that at least 72% of its production works correctly, but the CI contains
values below 0.72, suggesting that one would not be entirely convinced by this claim.
You might argue that a better approach here would be to test H0 : p ≥ 0.72 v H1 : p < 0.72, but this seems rather dubious since the claim that p ≥ 0.72 is simply an assertion, rather than something which it makes sense to believe in the absence of evidence to the contrary. Hence it is not really a valid choice for H0 . A more sensible test would use H0 : p ≤ 0.72 instead, placing the burden of proof with the company, as in the previous example.
A common approach in process control is that for a new product the null hypothesis is that the product does not conform to the required specifications, whereas for a well-established product the null hypothesis is that new items from the production line do still conform to the required specifications.
13
13.1
X² = (O1 − E1)²/E1 + … + (Oc − Ec)²/Ec
13.2
The χ² Distribution
If H0 is true then the calculated X² is an observation from the χ² distribution with degrees of freedom
ν = c − (number of times sample data were used in calculating the Ei values)
  = c − 1 − (number of parameters estimated)
Values of degrees of freedom for various typical problems will be found in the examples.
This is only an approximate application of the 2 distribution. The approximation improves
as predicted frequencies become larger, and it becomes unsatisfactory for values of Ei less
than about 5. If this occurs, sometimes such small frequency classes can be combined with
neighbouring ones until a sufficiently high value is reached. If this cannot be done, an exact
distribution, requiring special tables, must be used for the test of goodness of fit.
Rule of thumb: the approximation is reasonable provided all Ei are ≥ 1 and 80% of them
are ≥ 5. (Some people argue for the more stringent criterion of all Ei ≥ 5.)
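The statistic from section 13.1 is a one-liner in Python; the observed and expected counts below are made-up illustrative numbers, not data from the notes.

```python
def chi_square_statistic(observed, expected):
    """X^2 = sum of (O_i - E_i)^2 / E_i over the c classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative example: three classes with expected counts 50, 30, 20
print(round(chi_square_statistic([48, 35, 17], [50, 30, 20]), 3))  # 1.363
```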
These tests are best illustrated by examples.
13.3
The observed counts 8, 1 and 0 for the classes 3, 4 and "5 or more" are combined into a
single class of 9, so that all expected frequencies are large enough:

x           O     E         (O − E)²/E
0           309   303.265   0.108
1           142   151.633   0.612
2           40    37.908    0.115
3 or more   9     7.200     0.450
Total       500             X² = 1.285
13.4
A sample of 10000 persons is examined for lung cancer and it is found that 100 of these
have the disease. Each individual is also classified as a smoker or nonsmoker. The results
are tabulated below:
(Observed)    Cancer   OK     Total
Smokers       90       6910   7000
Nonsmokers    10       2990   3000
Total         100      9900   10000
Do these figures indicate that there is any association between smoking and lung cancer?
Now, if the factors are not associated then we can calculate expected frequencies E using

P(A and B) = P(A) × P(B)

where A and B are the row and column category. Hence

P(Smoker and Cancer) = (7000/10000) × (100/10000)

so that

E(Smoker and Cancer) = (7000 × 100)/10000 = 70

and similarly

E(Smoker and OK) = (7000 × 9900)/10000 = 6930
(Expected)    Cancer   OK     Total
Smokers       70       6930   7000
Nonsmokers    30       2970   3000
Total         100      9900   10000

Then

X² = 20²/70 + 20²/30 + 20²/6930 + 20²/2970 = 19.24
From the formula for degrees of freedom in section 13.2, if we have r rows and c columns
then
ν = rc − 1 − (r − 1) − (c − 1) = (r − 1)(c − 1)
Hence for tests of this type
degrees of freedom = (no. of rows − 1) × (no. of columns − 1)
Here
degrees of freedom = (2 − 1) × (2 − 1) = 1
From table 5 with ν = 1:

0.1% critical value   10.830
1% critical value     6.635
5% critical value     3.841
Observed X² is significant even at the 0.1% level. Hence reject H0: there is strong evidence of
an association. More smokers have cancer than would be expected under a hypothesis of no
association.
Note: For 2 × 2 tables like this there is in fact a slight modification to the standard procedure
(Yates' continuity correction) which improves the approximation.
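The whole calculation (expected frequencies under independence, then X²) can be sketched in a few lines of Python:

```python
# Rows: smokers, nonsmokers; columns: cancer, OK
table = [[90, 6910], [10, 2990]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)

x2 = 0.0
for i, row in enumerate(table):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total  # expected count under independence
        x2 += (o - e) ** 2 / e
print(round(x2, 2))  # 19.24
```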
13.5
Further Examples
Example
We return to example 2 in section 4.5. The observed and expected frequencies calculated
there can be compared using the method in section 13.3. We need to combine the categories
for 5 or more into one for the χ² approximation to be valid, and hence obtain

X² = (8 − 10.09)²/10.09 + (12 − 16.67)²/16.67 + ... + (0 − 4.42)²/4.42 = 9.3805
There are 6 categories and an estimate of p was used to calculate the expected frequencies,
so the χ² has 6 − 1 − 1 = 4 degrees of freedom. Hence if
H0 : data come from a binomial distribution
is correct then 9.3805 is an observation from a χ² distribution with 4 degrees of freedom.
From table 5 the 5% critical value is 9.488 so that we almost, but not quite, reject H0 .
Hence we would be fairly equivocal about whether the data really do follow a binomial
distribution.
14
14.1
Introduction
In many cases we wish to know whether two variables (e.g. height and weight) are related
to each other, based on a sample of data
(xi , yi) i = 1, 2, 3, . . . , n
In other words there are n individuals giving n pairs of observations.
There are many ways to assess this, depending on the types of variables and the type of
relationship. Here we introduce the two simplest ways.
14.2
Correlation
This section deals only with the Pearson (product moment) correlation coefficient. Other
correlation coefficients (Spearman, Kendall) also exist.
The sample (Pearson) correlation coefficient rxy (or just r) is given by

rxy = Σ(xi − x̄)(yi − ȳ) / √( Σ(xi − x̄)² × Σ(yi − ȳ)² ) = Sxy / √(Sxx Syy)
(see AMOR formula book). This is standardised so that it is always between −1 (perfect
negative relationship) and +1 (perfect positive relationship). A value close to zero corresponds
to no (apparent) relationship.
A positive correlation indicates that large values of y tend to go with large values of x and
small values of y with small values of x. Similarly a negative correlation indicates that large
values of y tend to go with small values of x and small values of y with large values of x.
Note that the correlation coefficient only assesses a linear relationship, and can be meaningless
if the variables are related non-linearly. For example, if the relationship is strong but
quadratic then the correlation coefficient can still be close to zero.
Testing the significance of the correlation coefficient
The population correlation coefficient is usually denoted ρ, and it is natural to use the
sample correlation coefficient r to test whether there is any evidence of a (linear) relationship
between the variables. This translates into the hypotheses

H0 : ρ = 0 v H1 : ρ ≠ 0

Table 4 gives critical values for 5% and 1% tests. For the degrees of freedom ν = n − 2, find
the critical value rcrit from the table and reject H0 if

|r| > rcrit

Obviously if r > rcrit we conclude that there is evidence of a positive correlation and if
r < −rcrit we conclude that there is evidence of a negative correlation.
Example
To determine the relationship between normal stress and the shear resistance of a substance,
a shear box experiment was performed, giving the following results:
Normal stress x (psi)      11    13    15    17    19    21
Shear resistance y (psi)   15.2  17.7  19.3  21.5  23.9  25.4
Σxi = 96 ,  Σyi = 123.0 ,  Σxi yi = 2039.8 ,  Σxi² = 1606 ,  Σyi² = 2595.44

so that

x̄ = 16.0 ,  ȳ = 20.5
and hence

rxy = (2039.8 − 6 × 16.0 × 20.5) / √( (1606 − 6 × 16.0²) (2595.44 − 6 × 20.5²) ) = 0.998
The critical value for a 1% test with ν = 6 − 2 = 4 is rcrit = 0.92, so that |r| > rcrit and so
there is strong evidence of a (positive) correlation between x and y, even from such a small
sample.
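The arithmetic can be reproduced directly from the data:

```python
import math

x = [11, 13, 15, 17, 19, 21]
y = [15.2, 17.7, 19.3, 21.5, 23.9, 25.4]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
sxx = sum(a * a for a in x) - n * xbar ** 2
syy = sum(b * b for b in y) - n * ybar ** 2
r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.998
```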
Important cautionary note
However, it is important to realise that correlation is not the same as causation. Just because
y and x are correlated, it does not mean that x causes y, or vice versa. It is often the
case that an apparent relationship is the result of y and x both being caused by another
variable which we have not measured.
To take a deliberately silly example, we might measure many behavioural variables on a
sample of people and calculate the correlations between them. If two of these variables are
'frequency of playing basketball' and 'frequency of hitting your head on door frames' then
we would probably find a positive correlation between them. We might then conclude either
that hitting your head makes you want to play basketball or that playing basketball makes
you careless so that you keep hitting your head. Of course, tall people are much more likely
both to play basketball and to hit their heads, so that it is likely to be this third variable
which induces the correlation between the other two.
When the correlation between smoking and lung cancer was first noticed, the cigarette
companies made exactly this point: that correlation is not the same as causation. However,
in that case the medical evidence demonstrating a causal link was soon found.
14.3
Regression
This tries to expand on the ideas above, not just measuring whether there is a relationship,
but trying to model it explicitly.
In general we have a response variable y and an explanatory variable x, and we try to
model the relationship between them. However, we need to allow for the fact that there will
be random variation, so that two observations of y at the same value of x will not produce
the same results.
Fitting a simple linear regression line
The usual way to determine the underlying trend in a set of noisy data (xi , yi) is to draw
the best line through the experimental points. If one suspects a linear dependency of y on
x then one would try to draw the best straight line.
y = α + βx

through the points and hence calculate the values of the slope β and intercept α which
characterise this line.
Unfortunately, the 'best' line for one person is not necessarily the 'best' line for another
unless both adopt the same convention for defining 'best'. In general in data-fitting the
convention is to choose the line, the regression line, which minimises the sum of the squares
of the vertical distances from the observed points to the fitted line.
We now draw the distinction between α and β, which are population parameters defining
the true underlying line, and their estimates α̂ and β̂, which are the values minimising

Σi (yi − ymodel,i)²

where, for a hypothesised straight line relationship, ymodel = α + βx.
The expressions for these estimates in terms of the n experimental points (xi , yi) are given
by

β̂ = (Σxi yi − n x̄ȳ) / (Σxi² − n x̄²) = Sxy / Sxx

α̂ = ȳ − β̂ x̄
where the summations in each case are over all n observations (see formula book).
The fitted line

ŷ = α̂ + β̂ x

represents our estimate of the mean value of y at any given value of x.
Note that these formulae will fit the best straight line even if the relationship is not a
straight line at all, so it is important to check this. If the dependency of y on x is thought
to be a quadratic function then we can instead fit
ymodel = α + βx + γx²
In this case minimisation of the sum of the squares of the distances from the observed
points to the curve again defines estimates of the three parameters α, β, γ in terms of the set
(xi , yi), i = 1, 2, . . . , n. These expressions are somewhat more complicated than in the case
of a straight line model.
Example (continued)
In this test the value of x is specified by the experimenter, while y can be viewed as a
response to x. We assume there is a relationship of the form

y = α + βx

between these variables. (Normally there is not necessarily any physical interpretation of
the intercept and slope terms, but here we can say that α is the cohesion of the substance
and β = tan φ, where φ is the angle of friction.)
Now,

β̂ = (2039.8 − 6 × 16.0 × 20.5) / (1606 − 6 × 16.0²) = 1.02571

α̂ = 20.5 − 1.02571 × 16.0 = 4.0886

so the fitted line is ŷ = 4.0886 + 1.0257x.
Therefore, for any given stress value x, our estimate of the mean shear resistance is given
by the above expression.
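The least squares estimates for this example can be checked with a few lines of Python:

```python
x = [11, 13, 15, 17, 19, 21]
y = [15.2, 17.7, 19.3, 21.5, 23.9, 25.4]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# beta_hat = Sxy / Sxx,  alpha_hat = ybar - beta_hat * xbar
beta_hat = (sum(a * b for a, b in zip(x, y)) - n * xbar * ybar) / \
           (sum(a * a for a in x) - n * xbar ** 2)
alpha_hat = ybar - beta_hat * xbar
print(round(alpha_hat, 4), round(beta_hat, 5))  # 4.0886 1.02571
```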
Confidence limits
Since the values of α̂ and β̂ are only estimates based on a sample of all possible measurements
which could have been made, the regression line based on these estimates is itself only an
estimate of the mean value of y at any value of x, and not necessarily the true value. The
question therefore arises: what confidence can we have in our estimate of the true equation
of the line?
We have already seen that the variance of an estimator of a mean enables us to form confidence
limits for the mean of a population. In a similar way we can define mathematically the
variances of the estimators α̂, β̂ and ŷ, and thus place confidence intervals on these quantities.
The actual formulae for these variances and confidence intervals are rather involved and are
given in the formula book. They are based on the assumption that the variation in y values
for any particular value of x can be described by a normal distribution with mean zero and
variance σ².
Example (continued)
The formulae in the book give the following estimates of the variances of y, α̂ and β̂:

y : 0.073429;   α̂ : 0.280778;   β̂ : 0.001049
Then 95% confidence intervals for α and β can be created using the t distribution critical
values with n − 2 degrees of freedom (ν = 4 and tcrit = 2.78 in this case):

α : (2.6155, 5.5617)
β : (0.9357, 1.1158)
Note that if the confidence interval for β includes the value β = 0 then we would conclude
that there is no evidence that we should include this second parameter in our model
y = α + βx. Consequently this could be reduced to y = α, or in other words y does not
depend on x.
Confidence limits for the mean value of y at a specified value x0 of x are given by

4.0886 + 1.0257x0 ± 2.78 √( 0.073429 ( 1/6 + (x0 − 16)²/70 ) )
We can similarly define a tolerance interval, which is in effect confidence limits for a single
new observation at a specified value x0 of x:

4.0886 + 1.0257x0 ± 2.78 √( 0.073429 ( 1 + 1/6 + (x0 − 16)²/70 ) )
Often this is the main thing we want to know, i.e. if we take a new observation of y at
x = x0 , in what range of values do we think it will fall?
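For example, the confidence limits for the mean of y at a given x0 can be evaluated as follows, using the fitted values and variance estimate from above:

```python
import math

def mean_y_ci(x0):
    """95% CI for the mean of y at x0 (t_crit = 2.78 for nu = 4)."""
    fit = 4.0886 + 1.0257 * x0
    half = 2.78 * math.sqrt(0.073429 * (1 / 6 + (x0 - 16) ** 2 / 70))
    return fit - half, fit + half

lo, hi = mean_y_ci(16)   # the interval is narrowest at x0 = xbar = 16
print(round(lo, 2), round(hi, 2))  # 20.19 20.81
```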
15 Reference Material

15.1 Tables

15.2 Tutorial exercises
TABLE 1

Tabulated values are the probability P(0 ≤ Z ≤ z) for a standard normal variable Z; rows
give z to one decimal place, columns the second decimal.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2   0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3   0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4   0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
TABLE 2

[Critical values zcrit of the standard normal distribution for one- and two-tailed significance
levels α(%); the body of this table was not recovered. The same values appear in the ν = ∞
row of TABLE 3.]
TABLE 3

Critical values of the t distribution.

One tail (%)    5      2.5    1      0.5    0.1     0.05
Two tails (%)   10     5      2      1      0.2     0.1

ν = 1           6.31   12.71  31.82  63.66  318.29  636.58
2               2.92   4.30   6.96   9.92   22.33   31.60
3               2.35   3.18   4.54   5.84   10.21   12.92
4               2.13   2.78   3.75   4.60   7.17    8.61
5               2.02   2.57   3.36   4.03   5.89    6.87
6               1.94   2.45   3.14   3.71   5.21    5.96
7               1.89   2.36   3.00   3.50   4.79    5.41
8               1.86   2.31   2.90   3.36   4.50    5.04
9               1.83   2.26   2.82   3.25   4.30    4.78
10              1.81   2.23   2.76   3.17   4.14    4.59
12              1.78   2.18   2.68   3.05   3.93    4.32
15              1.75   2.13   2.60   2.95   3.73    4.07
20              1.72   2.09   2.53   2.85   3.55    3.85
24              1.71   2.06   2.49   2.80   3.47    3.75
30              1.70   2.04   2.46   2.75   3.39    3.65
40              1.68   2.02   2.42   2.70   3.31    3.55
50              1.68   2.01   2.40   2.68   3.26    3.50
70              1.67   1.99   2.38   2.65   3.21    3.43
100             1.66   1.98   2.36   2.63   3.17    3.39
∞               1.64   1.96   2.33   2.58   3.09    3.29
TABLE 4

Critical values of the correlation coefficient r for 5% and 1% tests of H0 : ρ = 0.

ν      α = 5%   α = 1%
1      1.00     1.00
2      0.95     0.99
3      0.88     0.96
4      0.81     0.92
5      0.75     0.87
6      0.71     0.83
7      0.67     0.80
8      0.63     0.77
9      0.60     0.74
10     0.58     0.71
12     0.53     0.66
14     0.50     0.62
16     0.47     0.59
18     0.44     0.56
20     0.42     0.54
22     0.40     0.52
24     0.39     0.50
26     0.37     0.48
28     0.36     0.46
30     0.35     0.45
40     0.30     0.39
50     0.27     0.35
60     0.25     0.33
70     0.23     0.30
80     0.22     0.28
90     0.21     0.27
100    0.20     0.25
200    0.14     0.18
500    0.09     0.12
1000   0.06     0.08
TABLE 5

Critical values of the χ² distribution.

ν      10%     5%      2.5%    1%      0.1%
1      2.706   3.841   5.024   6.635   10.83
2      4.605   5.991   7.378   9.210   13.82
3      6.251   7.815   9.348   11.35   16.27
4      7.779   9.488   11.14   13.28   18.47
5      9.236   11.07   12.83   15.09   20.52
6      10.64   12.59   14.45   16.81   22.46
7      12.02   14.07   16.01   18.48   24.32
8      13.36   15.51   17.53   20.09   26.12
9      14.68   16.92   19.02   21.67   27.88
10     15.99   18.31   20.48   23.21   29.59
11     17.28   19.68   21.92   24.72   31.26
12     18.55   21.03   23.34   26.22   32.91
13     19.81   22.36   24.74   27.69   34.53
14     21.06   23.68   26.12   29.14   36.12
15     22.31   25.00   27.49   30.58   37.70
16     23.54   26.30   28.85   32.00   39.25
17     24.77   27.59   30.19   33.41   40.79
18     25.99   28.87   31.53   34.81   42.31
19     27.20   30.14   32.85   36.19   43.82
20     28.41   31.41   34.17   37.57   45.31
21     29.62   32.67   35.48   38.93   46.80
22     30.81   33.92   36.78   40.29   48.27
23     32.01   35.17   38.08   41.64   49.73
24     33.20   36.42   39.36   42.98   51.18
25     34.38   37.65   40.65   44.31   52.62
26     35.56   38.89   41.92   45.64   54.05
27     36.74   40.11   43.19   46.96   55.48
28     37.92   41.34   44.46   48.28   56.89
29     39.09   42.56   45.72   49.59   58.30
30     40.26   43.77   46.98   50.89   59.70
40     51.81   55.76   59.34   63.69   73.40
50     63.17   67.50   71.42   76.15   86.66
60     74.40   79.08   83.30   88.38   99.61
70     85.53   90.53   95.02   100.4   112.3
80     96.58   101.9   106.6   112.3   124.8
90     107.6   113.1   118.1   124.1   137.2
100    118.5   124.3   129.6   135.8   149.4
TABLE 6

Critical values of the F distribution: upper 5% and 1% points. Columns give ν1, rows give ν2.

ν1 =        1      2      3      4      5      6      7      8      9      10     12     15     20     24     ∞

ν2 = 1  5%  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  243.9  246.0  248.0  249.1  254.3
        1%  4052   5000   5403   5625   5764   5859   5928   5982   6022   6056   6106   6157   6209   6235   6366
2       5%  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.41  19.43  19.45  19.45  19.50
        1%  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.42  99.43  99.45  99.46  99.50
3       5%  10.13  9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.74   8.70   8.66   8.64   8.53
        1%  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  27.05  26.87  26.69  26.60  26.13
4       5%  7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.91   5.86   5.80   5.77   5.63
        1%  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.37  14.20  14.02  13.93  13.46
5       5%  6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.68   4.62   4.56   4.53   4.36
        1%  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05  9.89   9.72   9.55   9.47   9.02
6       5%  5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.00   3.94   3.87   3.84   3.67
        1%  13.75  10.92  9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.72   7.56   7.40   7.31   6.88
7       5%  5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.57   3.51   3.44   3.41   3.23
        1%  12.25  9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.47   6.31   6.16   6.07   5.65
8       5%  5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.28   3.22   3.15   3.12   2.93
        1%  11.26  8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.67   5.52   5.36   5.28   4.86
9       5%  5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.07   3.01   2.94   2.90   2.71
        1%  10.56  8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   5.11   4.96   4.81   4.73   4.31
10      5%  4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.91   2.85   2.77   2.74   2.54
        1%  10.04  7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.71   4.56   4.41   4.33   3.91
11      5%  4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85   2.79   2.72   2.65   2.61   2.40
        1%  9.65   7.21   6.22   5.67   5.32   5.07   4.89   4.74   4.63   4.54   4.40   4.25   4.10   4.02   3.60
12      5%  4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75   2.69   2.62   2.54   2.51   2.30
        1%  9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.16   4.01   3.86   3.78   3.36
13      5%  4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67   2.60   2.53   2.46   2.42   2.21
        1%  9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   3.96   3.82   3.66   3.59   3.17
14      5%  4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60   2.53   2.46   2.39   2.35   2.13
        1%  8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.80   3.66   3.51   3.43   3.00
15      5%  4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.48   2.40   2.33   2.29   2.07
        1%  8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.67   3.52   3.37   3.29   2.87
16      5%  4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49   2.42   2.35   2.28   2.24   2.01
        1%  8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.55   3.41   3.26   3.18   2.75
17      5%  4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45   2.38   2.31   2.23   2.19   1.96
        1%  8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.46   3.31   3.16   3.08   2.65
18      5%  4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41   2.34   2.27   2.19   2.15   1.92
        1%  8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.37   3.23   3.08   3.00   2.57
19      5%  4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38   2.31   2.23   2.16   2.11   1.88
        1%  8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.30   3.15   3.00   2.92   2.49
20      5%  4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.28   2.20   2.12   2.08   1.84
        1%  8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.23   3.09   2.94   2.86   2.42
21      5%  4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32   2.25   2.18   2.10   2.05   1.81
        1%  8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.17   3.03   2.88   2.80   2.36
22      5%  4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30   2.23   2.15   2.07   2.03   1.78
        1%  7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   3.12   2.98   2.83   2.75   2.31
23      5%  4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27   2.20   2.13   2.05   2.01   1.76
        1%  7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   3.07   2.93   2.78   2.70   2.26
24      5%  4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25   2.18   2.11   2.03   1.98   1.73
        1%  7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   3.03   2.89   2.74   2.66   2.21
∞       5%  3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83   1.75   1.67   1.57   1.52   1.00
        1%  6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41   2.32   2.18   2.04   1.88   1.79   1.00
TABLE 7

Critical values of the F distribution for two-tailed 5% and 1% tests (i.e. the upper 2.5%
and 0.5% points). Columns give ν1, rows give ν2.

ν1 =        1      2      3      4      5      6      7      8      9      10     12     15     20     24     ∞

ν2 = 1  5%  647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7  963.3  968.6  976.7  984.9  993.1  997.2  1018
        1%  16211  20000  21615  22500  23056  23437  23715  23925  24091  24224  24426  24630  24836  24940  25465
2       5%  38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.39  39.40  39.41  39.43  39.45  39.46  39.50
        1%  198.5  199.0  199.2  199.2  199.3  199.3  199.4  199.4  199.4  199.4  199.4  199.4  199.4  199.5  199.5
3       5%  17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.47  14.42  14.34  14.25  14.17  14.12  13.90
        1%  55.55  49.80  47.47  46.19  45.39  44.84  44.43  44.13  43.88  43.69  43.39  43.08  42.78  42.62  41.83
4       5%  12.22  10.65  9.98   9.60   9.36   9.20   9.07   8.98   8.90   8.84   8.75   8.66   8.56   8.51   8.26
        1%  31.33  26.28  24.26  23.15  22.46  21.97  21.62  21.35  21.14  20.97  20.70  20.44  20.17  20.03  19.32
5       5%  10.01  8.43   7.76   7.39   7.15   6.98   6.85   6.76   6.68   6.62   6.52   6.43   6.33   6.28   6.02
        1%  22.78  18.31  16.53  15.56  14.94  14.51  14.20  13.96  13.77  13.62  13.38  13.15  12.90  12.78  12.14
6       5%  8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60   5.52   5.46   5.37   5.27   5.17   5.12   4.85
        1%  18.63  14.54  12.92  12.03  11.46  11.07  10.79  10.57  10.39  10.25  10.03  9.81   9.59   9.47   8.88
7       5%  8.07   6.54   5.89   5.52   5.29   5.12   4.99   4.90   4.82   4.76   4.67   4.57   4.47   4.41   4.14
        1%  16.24  12.40  10.88  10.05  9.52   9.16   8.89   8.68   8.51   8.38   8.18   7.97   7.75   7.64   7.08
8       5%  7.57   6.06   5.42   5.05   4.82   4.65   4.53   4.43   4.36   4.30   4.20   4.10   4.00   3.95   3.67
        1%  14.69  11.04  9.60   8.81   8.30   7.95   7.69   7.50   7.34   7.21   7.01   6.81   6.61   6.50   5.95
9       5%  7.21   5.71   5.08   4.72   4.48   4.32   4.20   4.10   4.03   3.96   3.87   3.77   3.67   3.61   3.33
        1%  13.61  10.11  8.72   7.96   7.47   7.13   6.88   6.69   6.54   6.42   6.23   6.03   5.83   5.73   5.19
10      5%  6.94   5.46   4.83   4.47   4.24   4.07   3.95   3.85   3.78   3.72   3.62   3.52   3.42   3.37   3.08
        1%  12.83  9.43   8.08   7.34   6.87   6.54   6.30   6.12   5.97   5.85   5.66   5.47   5.27   5.17   4.64
11      5%  6.72   5.26   4.63   4.28   4.04   3.88   3.76   3.66   3.59   3.53   3.43   3.33   3.23   3.17   2.88
        1%  12.23  8.91   7.60   6.88   6.42   6.10   5.86   5.68   5.54   5.42   5.24   5.05   4.86   4.76   4.23
12      5%  6.55   5.10   4.47   4.12   3.89   3.73   3.61   3.51   3.44   3.37   3.28   3.18   3.07   3.02   2.72
        1%  11.75  8.51   7.23   6.52   6.07   5.76   5.52   5.35   5.20   5.09   4.91   4.72   4.53   4.43   3.90
13      5%  6.41   4.97   4.35   4.00   3.77   3.60   3.48   3.39   3.31   3.25   3.15   3.05   2.95   2.89   2.60
        1%  11.37  8.19   6.93   6.23   5.79   5.48   5.25   5.08   4.94   4.82   4.64   4.46   4.27   4.17   3.65
14      5%  6.30   4.86   4.24   3.89   3.66   3.50   3.38   3.29   3.21   3.15   3.05   2.95   2.84   2.79   2.49
        1%  11.06  7.92   6.68   6.00   5.56   5.26   5.03   4.86   4.72   4.60   4.43   4.25   4.06   3.96   3.44
15      5%  6.20   4.77   4.15   3.80   3.58   3.41   3.29   3.20   3.12   3.06   2.96   2.86   2.76   2.70   2.40
        1%  10.80  7.70   6.48   5.80   5.37   5.07   4.85   4.67   4.54   4.42   4.25   4.07   3.88   3.79   3.26
16      5%  6.12   4.69   4.08   3.73   3.50   3.34   3.22   3.12   3.05   2.99   2.89   2.79   2.68   2.63   2.32
        1%  10.58  7.51   6.30   5.64   5.21   4.91   4.69   4.52   4.38   4.27   4.10   3.92   3.73   3.64   3.11
17      5%  6.04   4.62   4.01   3.66   3.44   3.28   3.16   3.06   2.98   2.92   2.82   2.72   2.62   2.56   2.25
        1%  10.38  7.35   6.16   5.50   5.07   4.78   4.56   4.39   4.25   4.14   3.97   3.79   3.61   3.51   2.98
18      5%  5.98   4.56   3.95   3.61   3.38   3.22   3.10   3.01   2.93   2.87   2.77   2.67   2.56   2.50   2.19
        1%  10.22  7.21   6.03   5.37   4.96   4.66   4.44   4.28   4.14   4.03   3.86   3.68   3.50   3.40   2.87
19      5%  5.92   4.51   3.90   3.56   3.33   3.17   3.05   2.96   2.88   2.82   2.72   2.62   2.51   2.45   2.13
        1%  10.07  7.09   5.92   5.27   4.85   4.56   4.34   4.18   4.04   3.93   3.76   3.59   3.40   3.31   2.78
20      5%  5.87   4.46   3.86   3.51   3.29   3.13   3.01   2.91   2.84   2.77   2.68   2.57   2.46   2.41   2.09
        1%  9.94   6.99   5.82   5.17   4.76   4.47   4.26   4.09   3.96   3.85   3.68   3.50   3.32   3.22   2.69
21      5%  5.83   4.42   3.82   3.48   3.25   3.09   2.97   2.87   2.80   2.73   2.64   2.53   2.42   2.37   2.04
        1%  9.83   6.89   5.73   5.09   4.68   4.39   4.18   4.01   3.88   3.77   3.60   3.43   3.24   3.15   2.61
22      5%  5.79   4.38   3.78   3.44   3.22   3.05   2.93   2.84   2.76   2.70   2.60   2.50   2.39   2.33   2.00
        1%  9.73   6.81   5.65   5.02   4.61   4.32   4.11   3.94   3.81   3.70   3.54   3.36   3.18   3.08   2.55
23      5%  5.75   4.35   3.75   3.41   3.18   3.02   2.90   2.81   2.73   2.67   2.57   2.47   2.36   2.30   1.97
        1%  9.63   6.73   5.58   4.95   4.54   4.26   4.05   3.88   3.75   3.64   3.47   3.30   3.12   3.02   2.48
24      5%  5.72   4.32   3.72   3.38   3.15   2.99   2.87   2.78   2.70   2.64   2.54   2.44   2.33   2.27   1.94
        1%  9.55   6.66   5.52   4.89   4.49   4.20   3.99   3.83   3.69   3.59   3.42   3.25   3.06   2.97   2.43
∞       5%  5.02   3.69   3.12   2.79   2.57   2.41   2.29   2.19   2.11   2.05   1.94   1.83   1.71   1.64   1.00
        1%  7.88   5.30   4.28   3.72   3.35   3.09   2.90   2.74   2.62   2.52   2.36   2.19   2.00   1.90   1.00
STATISTICS - EXAMPLES 1
1. In an operation the probability that a helicopter fails to return from a sortie is 5%.
What is the chance that it survives 30 sorties?
2. Under a given set of tactical conditions a tank has an 85% chance of seeing an enemy
tank for long enough to engage it. It then has a 90% chance of hitting with its first
shot, and if it hits an 80% chance of killing it. Under these conditions, what is the
chance of the enemy tank surviving?
3. You are validating the results of a tank ammunition trial. The hit probability of a
tank gun with a single round is 1/3, independently of all other shots, when firing at a
target at 2000 metres. If 5 rounds are fired without correction, find the chance of:
(a) Missing with the first 2 and hitting with the remainder.
(b) Making 4 hits and 1 miss.
(c) Making at least 2 hits.
4. Three soldiers each fire once at a target. Their probabilities of hitting are 0.2, 0.3 and
0.4, independently of each other. Find the probabilities of obtaining each of exactly
zero, one, two or three hits. Check that these probabilities sum to 1 (why should
they?).
5. An assault upon an enemy airfield is being planned. The assault team will need to be
parachuted to within 3 miles of the target, after which, to reach the airfield, they will
need to successfully cross a minefield followed by the airfield defences.
It is estimated that there is a 65% chance of remaining undetected during the paradrop,
and a 75% chance of successfully crossing the minefield. The chance of getting through
the airfield defences is 80% if the paradrop was undetected but only 20% if it was
detected.
Use a tree diagram to calculate the probability that the assault team succeeds in
reaching the airfield.
6. A screening test for a disease has the following characteristics:
If a person has the disease then there is a 95% chance that the test proves positive,
i.e. that it correctly detects the disease.
If a person does not have the disease then there is a 5% chance that the test result is
(incorrectly) positive.
If 1% of the population have the disease and a person selected at random tests positive,
find the probability that they really have the disease.
GREEN
STATISTICS - EXAMPLES 1 - SOLUTIONS
1. Chance of surviving one sortie is 0.95
Chance of surviving n sorties is therefore 0.95n
Chance of surviving 30 sorties is 0.9530 = 0.21464
2. Chance of engaging is 0.85.
Chance of engaging and hitting is 0.85 × 0.9 = 0.765.
Chance of engaging, hitting and killing is 0.85 × 0.9 × 0.8 = 0.612.
Hence the chance of the enemy tank surviving is 1 − 0.612 = 0.388.

3. (a) Chance of missing with the first 2 and hitting with the remainder is

(2/3)² × (1/3)³ = 4/243 = 0.016
(b) Chance of 4 hits and 1 miss in any order is

5 × (1/3)⁴ × (2/3) = 10/243 = 0.041

(the factor of 5 is because the miss could occur on any of the 5 firings).
(c) Chance of zero hits is

(2/3)⁵ = 32/243

and chance of exactly one hit is

5 × (1/3) × (2/3)⁴ = 80/243

(the factor of 5 is because the hit could occur on any of the firings).

Hence chance of at least 2 hits is

1 − 32/243 − 80/243 = 131/243 = 0.539
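Exact fractions make these binomial calculations easy to check:

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 3)                    # hit probability per round
p0 = (1 - p) ** 5                     # no hits: (2/3)^5 = 32/243
p1 = comb(5, 1) * p * (1 - p) ** 4    # exactly one hit: 80/243
at_least_2 = 1 - p0 - p1
print(at_least_2)  # 131/243
```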
4.
P(0) = 0.8 × 0.7 × 0.6 = 0.336
P(1) = 0.2 × 0.7 × 0.6 + 0.8 × 0.3 × 0.6 + 0.8 × 0.7 × 0.4 = 0.452
P(2) = 0.2 × 0.3 × 0.6 + 0.2 × 0.7 × 0.4 + 0.8 × 0.3 × 0.4 = 0.188
P(3) = 0.2 × 0.3 × 0.4 = 0.024
These sum to 1 because the four outcomes are mutually exclusive and exhaustive: exactly
one of them must occur.
5. We'll leave you to draw the tree diagram yourself, but in algebraic terms the problem
can be solved as follows.
Break it down into succeeding undetected and succeeding detected:
P(succeed and undetected) = 0.65 × 0.75 × 0.8 = 0.3900
P(succeed and detected) = 0.35 × 0.75 × 0.2 = 0.0525
Since these are mutually exclusive, the chance of success is 0.3900 + 0.0525 = 0.4425.
6. Again we leave you to draw the tree diagram yourself and give the answer algebraically.
We have
P(test positive | have disease) = 0.95
P(test positive | don't have disease) = 0.05
P(have disease) = 0.01
Hence
P(have disease and test positive) = 0.01 × 0.95 = 0.0095
P(don't have disease and test positive) = 0.99 × 0.05 = 0.0495
These are mutually exclusive, so
P (test positive) = 0.0095 + 0.0495 = 0.0590
Therefore there is a probability of 0.0590 that a person selected at random will test
positively. Of these, a proportion
0.0095/0.0590 = 0.1610
actually have the disease. This is in fact using the formula for conditional probability

P(have disease | test positive) = P(have disease and test positive) / P(test positive)
                                = 0.0095/0.0590 = 0.1610

Hence, if a person tests positive there is still only a 16.1% chance that they actually
have the disease. This is a fairly typical result, in that the vast majority of people
do not have the disease so that, unless the test is incredibly accurate, the majority of
positive results will in fact be false positives.
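The same calculation in Python (Bayes' theorem applied to the screening test):

```python
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Total probability of a positive test, then Bayes' theorem
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(round(p_pos, 4), round(p_disease_given_pos, 3))  # 0.059 0.161
```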
STATISTICS - EXAMPLES 2
1. From a stock of old ammunition with 10% defectives, five rounds are fired. Find the
probabilities of each of 0, 1, 2, 3, 4 or 5 defective rounds in the sample.
2. If the proportion of defective items in a large production run is 8%, what is the
probability that a sample of 30 items will include fewer than 3 defectives?
3. In a large consignment of SAA the average number of dud rounds in a box is one.
Find the chance that a box selected at random contains more than 3 duds.
4. A man test firing a rifle at a figure target at 1000 metres range can expect to get 1
hit in 20. Calculate the chance of 2 or more hits when 10 rounds are fired.
5. In 1944 the question arose of whether V1 flying bombs falling in London were aimed
at pinpoint targets, or just aimed roughly in the direction of London. London was
divided into 576 areas each of ¼ km², and the number of areas each with 0, 1, 2 etc
hits recorded. A total of 535 bombs fell in the 576 areas, with the distribution
Number of hits (x)            0    1    2   3   4  5  6
Number of areas with x hits   229  211  93  35  7  1  0
If the bombs were falling randomly then these data should look like 576 observations
from a Poisson distribution with mean 535/576.
Calculate the expected frequencies using the Poisson distribution and use these to
decide whether the bombs appear to have fallen randomly or not.
STATISTICS - EXAMPLES 2 - SOLUTIONS
1. Assuming independence of the rounds, this is binomial with n = 5, p = 0.1, i.e.
Bi(5, 0.1), so that

P(0) = 0.9⁵ = 0.5905
P(1) = 5 × 0.1 × 0.9⁴ = 0.3281
P(2) = (5 × 4)/2 × 0.1² × 0.9³ = 0.0729
P(3) = (5 × 4)/2 × 0.1³ × 0.9² = 0.0081
P(4) = 5 × 0.1⁴ × 0.9 = 0.0005
P(5) = 0.1⁵ = 0.00001
2. The number of defectives follows a binomial distribution with 30 trials and success
probability 0.08, that is a Bi(30, 0.08) distribution. Hence

P(0) = 0.92³⁰ = 0.0820
P(1) = 30 × 0.08 × 0.92²⁹ = 0.2138
P(2) = (30 × 29)/(1 × 2) × 0.08² × 0.92²⁸ = 0.2696

Total: P(fewer than 3 defectives) = 0.5654
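A general binomial probability function reproduces these figures:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k successes) for a Bi(n, p) distribution."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(fewer than 3 defectives) for Bi(30, 0.08)
p_lt3 = sum(binom_pmf(k, 30, 0.08) for k in range(3))
print(round(p_lt3, 4))  # 0.5654
```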
4. The number of hits is Bi(10, 0.05), so

P(0) = 0.95¹⁰ = 0.5987
P(1) = 10 × 0.05 × 0.95⁹ = 0.3151
P(0) + P(1) = 0.9138

and hence the chance of 2 or more hits is 1 − 0.9138 = 0.0862.
5. The mean is 535/576 = 0.928819, and the expected frequencies are 576 times the
corresponding Poisson probabilities:

x                 0       1       2      3      4     5     6
Observed          229     211     93     35     7     1     0
Expected          227.52  211.34  98.15  30.39  7.06  1.31  0.23

The numbers are all very close, suggesting that there is no evidence that the V1 bombs
are being aimed, at least at the scale which we are considering. (See section 13.3 for
a formal approach to testing this.)
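The expected frequencies can be regenerated from the Poisson formula; small differences in the last decimal place are rounding effects, and the notes' final class absorbs the remaining tail probability.

```python
import math

n_areas, n_bombs = 576, 535
lam = n_bombs / n_areas   # 0.928819...

# Expected number of areas with exactly x hits, x = 0..6
expected = [n_areas * math.exp(-lam) * lam ** x / math.factorial(x) for x in range(7)]
print([round(e, 2) for e in expected])
```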
STATISTICS - EXAMPLES 3
1. For a normal distribution with mean 25 and standard deviation 5, find the probability
of a single observation falling within each of the following intervals:
(a) 17.0 to 19.0
(b) above 35.0
(c) 22.0 to 30.0
2. Limit gauges are used to reject all components of length greater than 1.51cm or less
than 1.49cm. A machine produces such components with the length normally distributed with mean 1.503cm and standard deviation 0.004cm. What proportion of
components will be rejected?
3. For a normal distribution with mean 10 and standard deviation 4, find
(a) The value k such that the chance of an observation exceeding k is 5%.
(b) A value k, exceeding 12, such that the chance of an observation falling in the
interval between 12 and k is 20%.
4. Of a large group of men selected for a clothing trial, 5% are under 60 inches in height
and 40% between 60 inches and 65 inches. Assuming heights are normally distributed,
find the mean and the standard deviation of heights.
5. You are calculating theoretical hit probabilities prior to a tank gun trial. The standard deviation of the distribution of hits about the mpi (mean point of impact) on a target is governed by
the variability of a number of independent errors. If the horizontal and vertical errors
(standard deviations) in mils are distributed in the manner shown below at 1000m, find
the overall horizontal and vertical error standard deviations. Hence find the chance of
a hit on a target 2 metres (horizontal) by 3 metres (vertical) when the mpi is at the
centre of the bottom of the target.
Note 1: 1 mil subtends 1 metre at 1000 metres, so you can take a mil as being equivalent
to a metre in this case (this is not quite correct, but we use it for simplicity's sake).
Note 2: A standard result (section 6.4) tells us that when we have multiple independent
sources of variation then the overall variance is simply the sum of the individual
variances.
Note 3: You can assume that horizontal error and vertical error are independent.
The following table gives the standard deviations (in mils) associated with the various
sources of error:

Source of error      Horizontal   Vertical
Sight (mechanical)   0.5          0.5
Barrel bend          0.1          -
Droop                -            0.2
Cross wind           0.5          -
Rangefinding         -            0.2
Ballistic            0.2          0.3
Laying               0.1          0.1
6. A ballistics trial to be carried out with a modified tank gun requires muzzle velocities
between 960 m/s and 970 m/s. Two types of round (A and B) seem to be the most
promising. Round A has a mean muzzle velocity of 963 m/s with a standard deviation
of 6 m/s while round B has a mean muzzle velocity of 968 m/s with a standard
deviation of 5 m/s, when fired from this gun.
(a) Which type of round should be selected for the trial in order to minimise the
number of wasted firings?
(b) After firing 5 of the rounds selected in (a), what is the probability that:
i. none of them have achieved a muzzle velocity in the required range?
ii. at least 4 of them have achieved a muzzle velocity in the required range?
STATISTICS - EXAMPLES 3 - SOLUTIONS
1. Now, μ = 25, σ = 5, so X ~ N(25, 5^2).
(a) We want
P(17 < X < 19) = P((17 - 25)/5 < Z < (19 - 25)/5) = P(-1.6 < Z < -1.2)
From tables, P(0 < Z < 1.6) = 0.4452 and P(0 < Z < 1.2) = 0.3849, so the probability
is 0.4452 - 0.3849 = 0.0603.
(b) Next,
P(X > 35) = P(Z > (35 - 25)/5) = P(Z > 2.0) = 0.5 - 0.4772 = 0.0228
(c) Finally
P(22 < X < 30) = P((22 - 25)/5 < Z < (30 - 25)/5) = P(-0.6 < Z < 1.0)
Now,
P(-0.6 < Z < 0) = 0.2257 and P(0 < Z < 1.0) = 0.3413
so that the chance of being inside the interval is the sum of the two values:
0.2257 + 0.3413 = 0.5670
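All three parts can be checked with the error function from Python's standard library (phi and normal_prob are our own helper names; values differ from the tables only by rounding):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(lo, hi, mu, sigma):
    """P(lo < X < hi) for X ~ N(mu, sigma^2)."""
    return phi((hi - mu) / sigma) - phi((lo - mu) / sigma)

pa = normal_prob(17, 19, 25, 5)   # part (a), about 0.060
pb = 1 - phi((35 - 25) / 5)       # part (b), about 0.0228
pc = normal_prob(22, 30, 25, 5)   # part (c), about 0.567
```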
2. The length is described by the random variable X ~ N(1.503, 0.004^2). The proportion
of components which are the correct size is
P(1.49 < X < 1.51) = P((1.49 - 1.503)/0.004 < Z < (1.51 - 1.503)/0.004) = P(-3.25 < Z < 1.75)
From tables, P(-3.25 < Z < 0) = 0.4994 and P(0 < Z < 1.75) = 0.4599, so the proportion
of correct size is 0.4994 + 0.4599 = 0.9593, and the proportion rejected is
1 - 0.9593 = 0.0407, i.e. about 4%.
3. (a) We need k such that P(Z > (k - 10)/4) = 0.05.
From tables,
Q(1.645) ≈ 0.4500 (by interpolation)
so that
P(Z > 1.645) = 0.5 - 0.4500 = 0.05
so that we equate
(k - 10)/4 = 1.645, giving k = 10 + 4 × 1.645 = 16.58
(b) We need k such that
P((12 - 10)/4 < Z < (k - 10)/4) = 0.2
From tables,
Q((12 - 10)/4) = Q(0.5) = 0.1915
so that we require
Q((k - 10)/4) = 0.1915 + 0.2 = 0.3915
From tables,
Q(1.235) ≈ 0.3915 (by interpolation)
so that we equate
(k - 10)/4 = 1.235, giving k = 10 + 4 × 1.235 = 14.94
4. We know X ~ N(μ, σ^2) and we have
P(X < 60) = 0.05 and P(60 < X < 65) = 0.40
so that
P(X < 65) = 0.45
First translate these to statements about Z, putting
a1 = (60 - μ)/σ and a2 = (65 - μ)/σ
Note that these will both be negative because they are to the left of the mean. From
tables, P(Z < -1.645) = 0.05 and P(Z < -0.125) = 0.45, so
a1 = -1.645 and a2 = -0.125
that is,
-1.645σ = 60 - μ and -0.125σ = 65 - μ
Subtracting,
-1.645σ - (-0.125σ) = 60 - 65
Therefore 1.52σ = 5, and so
σ = 5/1.52 = 3.29 inches
and then μ = 65 + 0.125σ = 65 + 0.125 × 3.29 = 65.4 inches.
5. Adding the variances of the independent error sources (Note 2),
σH^2 = 0.5^2 + 0.1^2 + 0.5^2 + 0.2^2 + 0.1^2 = 0.56, so σH = 0.7483 mils
σV^2 = 0.5^2 + 0.2^2 + 0.2^2 + 0.3^2 + 0.1^2 = 0.43, so σV = 0.6557 mils
Taking 1 mil as 1 metre at this range (Note 1), a hit requires -1 < X < 1 (horizontal)
and 0 < Y < 3 (vertical), with the mpi at the origin. Hence
P(-1 < X < 1) = P((-1 - 0)/0.7483 < Z < (1 - 0)/0.7483) = P(-1.34 < Z < 1.34) = 2 × 0.4099 = 0.8198
P(0 < Y < 3) = P((0 - 0)/0.6557 < Z < (3 - 0)/0.6557) = P(0 < Z < 4.58) = 0.5000
so the chance of a hit is 0.8198 × 0.5000 = 0.41.
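The whole calculation can be sketched in Python, assuming the allocation of error sources to columns shown in the question's table:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Independent sources of error add in variance (Note 2)
horizontal = [0.5, 0.1, 0.5, 0.2, 0.1]  # sight, barrel bend, cross wind, ballistic, laying
vertical = [0.5, 0.2, 0.2, 0.3, 0.1]    # sight, droop, rangefinding, ballistic, laying

sd_h = sqrt(sum(s**2 for s in horizontal))  # about 0.748 mils
sd_v = sqrt(sum(s**2 for s in vertical))    # about 0.656 mils

# 2 m x 3 m target, mpi at the centre of the bottom edge, 1 mil taken as 1 m
p_h = phi(1 / sd_h) - phi(-1 / sd_h)
p_v = phi(3 / sd_v) - phi(0)
p_hit = p_h * p_v                           # about 0.41
```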
6. (a) We want to pick the round type with the larger value of P(960 ≤ MV ≤ 970).
Round A:
P(960 ≤ MV ≤ 970) = P((960 - 963)/6 ≤ Z ≤ (970 - 963)/6) where Z ~ N(0, 1)
= P(-0.5 ≤ Z ≤ 1.17)
= 0.1915 + 0.3790
= 0.5705
Round B:
P(960 ≤ MV ≤ 970) = P((960 - 968)/5 ≤ Z ≤ (970 - 968)/5) where Z ~ N(0, 1)
= P(-1.6 ≤ Z ≤ 0.4)
= 0.4452 + 0.1554
= 0.6006
Hence, the type B round should be selected for the trial.
(b) The number of type B rounds achieving a muzzle velocity in the range from
960 m/s to 970 m/s will follow a binomial distribution with n = 5 and p = 0.6006.
i. Hence X ~ Bi(5, 0.6006).
P(X = 0) = (1 - p)^n = (1 - 0.6006)^5 = 0.0102
ii. Again X ~ Bi(5, 0.6006).
P(X ≥ 4) = P(X = 4) + P(X = 5)
P(X = 4) = 5 p^4 (1 - p) = 5 × 0.6006^4 × (1 - 0.6006) = 0.2598
P(X = 5) = p^5 = 0.6006^5 = 0.0781
Therefore P(X ≥ 4) = 0.2598 + 0.0781 = 0.3379
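The two stages of this solution, a normal-distribution calculation feeding a binomial one, chain together naturally. A sketch (helper names are ours):

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def in_range(mu, sigma, lo=960.0, hi=970.0):
    """P(lo <= MV <= hi) for MV ~ N(mu, sigma^2)."""
    return phi((hi - mu) / sigma) - phi((lo - mu) / sigma)

p_a = in_range(963, 6)  # round A, about 0.570
p_b = in_range(968, 5)  # round B, about 0.601 -- the larger, so B is chosen
p = max(p_a, p_b)

# Five firings of round B: number in range ~ Bi(5, p)
p_none = (1 - p)**5                                   # about 0.010
p_at_least_4 = comb(5, 4) * p**4 * (1 - p) + p**5     # about 0.338
```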
STATISTICS - EXAMPLES 4
1. Your trials establishment has a machine shop. The standard deviation of the lengths of
components (required to modify some trials vehicles) produced by a machine process
is well established as 0.002cm. A random sample of 5 measured lengths from a day's
production are 2.1045, 2.1050, 2.1060, 2.1035, 2.1055. Between what limits may it
be stated with 95% confidence that the true mean length will be located? (You may
assume lengths are normally distributed).
2. Thirty-six rounds test-fired from a mortar at the same target achieve a sample mean
range of 421m with a sample standard deviation of 42m. Is this evidence against the
range table figure of 400m? State any assumptions you make.
How would the answer change if the question had been is there any evidence that the
true mean range exceeds the range table figure?
3. Sixty measurements of the speed of a hovercraft operating under specified trial conditions have a sample mean value of 40.2 knots and a sample standard deviation of 5.7
knots. What are the 95% confidence limits for the true mean speed of this hovercraft
under these conditions? State any assumptions you make.
4. Six rounds from gun A and 5 rounds from gun B are test fired under the same conditions and fall at the following distances in metres from a datum point. The standard
deviations in range for guns of types A and B are well established and have the values
of 30m and 42m respectively. On the assumption that range is normally distributed,
test at the 5% level whether there is significant evidence of a difference in mean ranges
between the two guns.
Gun A: 50 72 69 84 75 102
Gun B: 125 84 180 170 43
Mean = 6204 miles, Standard deviation = 510 miles
Cheaper tyre: Mean = 6101 miles, Standard deviation = 552 miles
Comment on how you might have improved the design of the trial.
STATISTICS - EXAMPLES 4 - SOLUTIONS
1. Here x̄ = 2.1049 and we know that length is normally distributed with σ = 0.002.
Hence exact 95% confidence limits for the population mean are
2.1049 ± 1.96 × 0.002/√5 = 2.1049 ± 0.00175 = (2.10315, 2.10665) cm
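A minimal sketch of this confidence interval in Python (standard library only):

```python
from math import sqrt
from statistics import mean

lengths = [2.1045, 2.1050, 2.1060, 2.1035, 2.1055]
sigma = 0.002                    # known process standard deviation
xbar = mean(lengths)             # 2.1049
half_width = 1.96 * sigma / sqrt(len(lengths))
ci = (xbar - half_width, xbar + half_width)   # about (2.1031, 2.1067) cm
```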
2. Since n = 36 is quite large we can assume that the sample mean comes from a
distribution which is approximately normally distributed and that the sample variance
S 2 is a reasonably good estimate of the population variance 2 . Hence the z test will
be approximately correct.
We have x̄ = 421, S^2 = 42^2, n = 36. The standard deviation of X̄, the standard error
of the mean, is estimated to be
√(S^2/n) = S/√n = 42/6 = 7
We test hypotheses
H0: μ = 400 v H1: μ ≠ 400
Since x̄ = 421 our observed test statistic is
z_obs = (421 - 400)/7 = 3
If H0 is true then this value should look like an observation from a standard normal
distribution.
Now, from tables, for a two-tailed test the critical value for a 1% significance test is
2.58. Since the observed test statistic z_obs exceeds this 1% critical value (in magnitude)
we would reject H0 with a 1% test. Hence there is strong evidence that the range table
figure should not be 400m, but somewhat higher. In fact the 0.5% critical value is
2.81, so that the result is significant even at the 0.5% level.
Note: Assuming that the data are normally distributed, which is pretty reasonable in
this case, we obtain the exact answer using the t distribution rather than the normal.
For the t35 distribution the 1% critical value is 2.724, so in this case it does not affect
our conclusions.
If the question was "is there any evidence that the true mean range exceeds the range
table figure?" then the hypotheses become
H0: μ ≤ 400 v H1: μ > 400
and this is a one-tailed test. The only change is that now we use the one-tailed critical
values, for example z_crit = 2.33 for a 1% test. This time z_obs exceeds even the 0.2%
critical value of 2.87, so we would reject H0 for any significance level greater than
or equal to 0.2%. Hence clearly we still have strong evidence against H0 and the
conclusion remains unchanged.
3. Since n = 60 is quite large we can assume that the sample mean comes from a
distribution which is approximately normally distributed and that the sample variance
S 2 is a reasonably good estimate of the population variance 2 .
Hence approximate 95% confidence limits for the population mean are
x̄ ± 1.96 S/√n
where z_crit = 1.96 is the critical value for a (two-sided) 95% CI (or a two-tailed 5%
test).
Since
x = 40.2, S = 5.7, n = 60
the approximate 95% confidence limits are
40.2 ± 1.96 × 5.7/√60 = 40.2 ± 1.4 = (38.8, 41.6) knots
Note: If we can assume that the data are normally distributed, we obtain the exact
answer using the t distribution rather than the normal. This simply entails replacing
1.96 with 2.00, the value for the t59 distribution. Even if we do not assume that the
data are normally distributed, it can still be argued that this will probably give a
slightly better approximation in most cases.
4. Calculate the sample mean ranges for each gun: x̄A = 75.3, x̄B = 120.4.
We are given the true standard deviations: σA = 30, σB = 42. Also nA = 6, nB = 5.
We test
H0: μA - μB = 0 v H1: μA - μB ≠ 0
Since the standard deviations are given, and the data are observations from a normal
distribution, the test statistic is
z_obs = (x̄A - x̄B)/√(σA^2/nA + σB^2/nB) = (75.3 - 120.4)/√(30^2/6 + 42^2/5) = -2.01
For 5%, two tails the critical value is 1.96. The test statistic exceeds this in magnitude,
so the difference is significant at the 5% level. We conclude there is some evidence for
a difference in range between the two guns, with gun B being longer ranged than gun
A.
The confidence interval follows almost immediately, as
75.3 - 120.4 ± 1.96 √(30^2/6 + 42^2/5) = -45.1 ± 43.95 = (-89.05, -1.15)
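A sketch of this two-sample z test with known standard deviations (variable names are ours):

```python
from math import sqrt
from statistics import mean

gun_a = [50, 72, 69, 84, 75, 102]
gun_b = [125, 84, 180, 170, 43]
sigma_a, sigma_b = 30.0, 42.0    # well-established values

se = sqrt(sigma_a**2 / len(gun_a) + sigma_b**2 / len(gun_b))
z_obs = (mean(gun_a) - mean(gun_b)) / se   # about -2.01
significant = abs(z_obs) > 1.96            # 5% two-tailed critical value
```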
STATISTICS - EXAMPLES 5
1. Six rounds test fired from a gun at the same target fall at ranges of 10125, 10240,
10200, 10280, 10320, 10180m. Assuming that range is normally distributed, find if
there is evidence at the 1% level that the true mean range of the gun differs from the
range table figure of 10285m. Also construct a 95% confidence interval for the true
mean range.
2. Sixteen tank gunners use a simulator and then perform a field trial. Their scores on
the simulator have a sample mean value of 74 with a sample standard deviation of 19.2,
while in the field their scores have a sample mean value of 67 with a sample standard
deviation of 17.8. Past experience shows that both field and simulator readings can
be assumed to follow a normal distribution. Is there any evidence from these figures
that gunners in general achieve a lower standard in the field? Suggest a better way to
assess this.
3. Ten soldiers test fire a rifle. They then refire it, after it has been modified, a week
later. Their scores (higher is better), in the same order, are:
1st week: 67 24 57 55 63 54 56 68 33 43
2nd week: 70 38 58 58 56 67 68 77 42 38
Each of these scores is based on the average miss distance for 50 shots.
Is there any evidence of an improvement? How will the test be affected if the scores
are not shown in the same order each time?
4. Extensive firings of the in-service weapon show that in a 3 month period gun barrels
wear on average by 0.1mm. It has been suggested that gun barrels may be treated by
a new process to resist barrel wear. In order to test this hypothesis thirty new guns
have been treated by this new process and after 3 months of in-service test firings
their barrel wear has been recorded. It is found that the average wear for these thirty
guns is 0.13mm, with an estimated standard deviation of 0.055mm. It is your job to
make sense of these numbers. Can you do so?
5. Looking through scores obtained by 4 ATGW operators using a simulator together
with their corresponding scores during a field trial you see the results given in the
table below:
Operator          1    2    3    4
Simulator score   60   40   52   25
Field score       52   37   50   24
Looking at the differences of these scores you find a mean difference of 3.5 and an
estimated standard deviation (of the differences) of 3.11. This gives an observed t
value for the paired test of 2.25. Finding a set of statistics tables you look up the
critical t value for a one-sided test with ν = 3 at the 5% level of significance as 2.35.
You conclude therefore that there is no evidence that operators in general have a lower
score in the field.
Major Thruster, however, looks at his set of tables of results and makes the following
comments:
"To hell with statistics, every single one of those chaps got a lower score in the
field than on the simulator. That is certainly a significant result!"
Moreover, after looking through his Statistics tables he spots that if you had taken a
10% level of significance the critical t value is 1.638, in which case even your method
of attacking the problem would have given a significant result.
(If you search for the value of t at 10% level of significance for a one tail test in your
tables you will appreciate that Major Thruster has access to a more detailed set of
tables than yours).
Discuss the validity of your approach and that of Major Thruster.
STATISTICS - EXAMPLES 5 - SOLUTIONS
1. We test
H0: μ = 10285 v H1: μ ≠ 10285
using a sample of size n = 6 where x̄ = 10224.2 and S = 70.6. We are told that the
data are normally distributed, so an exact test uses
t_obs = (x̄ - μ0)/(S/√n) = (10224.2 - 10285)/(70.6/√6) = -2.11
From tables the 1%, two tail, critical value for t5 is 4.03. The test statistic does not
exceed this in magnitude so that the result is not significant at the 1% level. Hence
we can conclude that there is no evidence that this gun has a range different to the
range table value.
The 95% CI is
10224.2 ± 2.57 × 70.6/√6 = 10224.2 ± 74.07 = (10150.1, 10298.3)
using the 5%, two tail, critical value for t5 of t_crit = 2.57. Note that the value 10285
is within the interval, so that a 5% test would also have led to non-rejection of H0.
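The same t statistic can be computed directly from the data (standard library only):

```python
from math import sqrt
from statistics import mean, stdev

ranges = [10125, 10240, 10200, 10280, 10320, 10180]
mu0 = 10285

xbar = mean(ranges)   # about 10224.2
s = stdev(ranges)     # sample standard deviation, about 70.6
t_obs = (xbar - mu0) / (s / sqrt(len(ranges)))   # about -2.11
```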
2. Simulator scores: x̄A = 74, SA = 19.2. Field scores: x̄B = 67, SB = 17.8, with
nA = nB = 16, and we test hypotheses
H0: μA - μB = 0 v H1: μA - μB > 0
Note that the question specifically tells us that the alternative hypothesis is that the
gunners do worse in the field, rather than differently.
Samples are fairly small and we can assume normally distributed observations so that
we can use a two-sample t test.
The two variances are 368.64 and 316.84 so that pooling is clearly reasonable, with
Sp^2 = (15 × 19.2^2 + 15 × 17.8^2)/(16 + 16 - 2) = 342.74
so that Sp = 18.5132. Hence the test statistic is
t_obs = (74 - 67)/(18.5132 × √(1/16 + 1/16)) = (74 - 67)/6.5454 = 1.0695
The critical value (5%, one-tailed, t30) is 1.70. The observed test statistic is less than
this so we do not reject H0 at the 5% significance level. Hence we can conclude that
there is no evidence that gunners in general achieve a lower standard in the field.
Clearly we should have recorded each gunner's simulator and field scores and then
used a paired test on the differences. By allowing for inter-gunner variation in this
way we would increase our ability to detect any difference between simulator and field
results (see answer to question 3 for an illustration of this).
3. Is there evidence that the mean of the differences is greater than zero? Since each
column in the table is a pair of values for the same soldier, we can use a paired t test.
This requires the assumption that the differences follow a normal distribution. We
cannot verify this with such a small sample, but the distribution of the differences
should at least be symmetrical, and the scores are based on 50 shots, so that the
normality assumption is probably OK.
These differences (week 2 minus week 1) are:
x = 3, 14, 1, 3, -7, 13, 12, 9, 9, -5
On the assumption that the scores on the two weeks are from the same population
the mean difference should be 0, so we test
H0: μd ≤ 0 v H1: μd > 0
where μd is the true mean difference. The sample mean and variance of the above are
x̄d = 5.2, Sd^2 = 54.8444, Sd = 7.4057
Hence the standard error of the mean (estimated standard deviation of means of
samples of size 10) is
Sd/√n = 7.4057/√10 = 2.3419
Hence the test statistic is
t_obs = (x̄d - 0)/(Sd/√n) = (5.2 - 0)/2.3419 = 2.220
The significance level was not specified in the question. From tables the critical value
for t9 (5%, one tailed) is 1.83, but that for 1% is 2.82, so our result is significant at 5%
but not at 1%. Hence there is fairly strong evidence that the modified mean is higher.
Is there evidence that the mean of the differences is greater than zero when order
is not considered? This time there is no pairing in the table, so we have to use an
unpaired 2-sample t test.
We test
H0: μ2 ≤ μ1 v H1: μ2 > μ1
1st firing: x̄1 = 52, S1 = 14.46, n1 = 10
2nd firing: x̄2 = 57.2, S2 = 13.90, n2 = 10
Pooled variance Sp^2 = (9 × 14.46^2 + 9 × 13.90^2)/(10 + 10 - 2) = 201.2, so the
test statistic is
t_obs = (x̄2 - x̄1)/√(Sp^2 (1/n1 + 1/n2)) = (57.2 - 52.0)/√(201.2 × (1/10 + 1/10)) = 5.2/6.34 = 0.82
The critical value for t18 (5%, one-tailed) is 1.73, so there is no evidence of a difference
between the means.
Note that the numerator is the same as above, but the denominator is much larger here.
This is because it includes the inter-soldier variability, which we are not interested in.
The paired test in effect removes this and allows us to concentrate on the difference
between weeks.
By throwing away the information on who fired which shot we have decreased our
ability to spot a difference between the two sets of firings.
In addition, the assumption of normality is less secure in this case, since it uses the
actual scores rather than the differences.
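The contrast between the paired and unpaired analyses can be seen directly (a sketch; t_paired and t_unpaired are our own names):

```python
from math import sqrt
from statistics import mean, stdev

week1 = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
week2 = [70, 38, 58, 58, 56, 67, 68, 77, 42, 38]
n = len(week1)

# Paired test: work with the per-soldier differences
diffs = [b - a for a, b in zip(week1, week2)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))   # about 2.22

# Unpaired test: the pooled variance includes inter-soldier spread
sp2 = ((n - 1) * stdev(week1)**2 + (n - 1) * stdev(week2)**2) / (2 * n - 2)
t_unpaired = (mean(week2) - mean(week1)) / sqrt(sp2 * (1 / n + 1 / n))   # about 0.82
```

The paired statistic is much larger for the same mean difference, which is exactly the point made above.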
4. The null hypothesis in this case is that the mean wear of untreated guns and treated
guns is the same, while the alternative hypothesis should be that the mean wear of
treated guns is less than the mean wear of untreated guns:
H0: μ_treated ≥ 0.1 v H1: μ_treated < 0.1
This is because we are only interested in whether the treatment reduces wear. Since
the mean wear of our sample is actually higher than that for untreated guns clearly
the test statistic will be in the wrong tail. Hence, without further calculations, we
can say that we do not reject the null hypothesis.
If, however, the alternative hypothesis was that there was a difference in the two mean
values of wear then we use a two-tailed test, which at the 5% level provides a significant
result, showing that the treated guns have greater wear. However, the question clearly
indicates that we should be using a one-tailed test, so that this would be a post-hoc
redefinition of the test.
The main point of this question, however, is to note that we must compare like with
like, so the above analysis is only valid if the value of 0.1 for guns not treated by the
process also refers to new guns, or wear does not depend on age.
Almost certainly, neither of these criteria are correct and therefore the whole test is
invalid. Whoever obtained the data should be told the error of his ways and the whole
exercise repeated under proper experimental conditions.
5. There are several points which must be made about this question.
Why do you use a one-sided test? The answer is probably because you looked at the
data and then decided which sort of test to do. This of course is using your data
to fit your answer, and one should have decided upon the test prior to accumulating
the data. In fact, since a priori there is no reason to believe that field scores may be
lower than simulator scores, one should use a two-sided test which, at the 5% level of
significance, has a critical t value of 3.18.
Major Thruster, whether he knows it or not, is actually performing another kind of
test on these data: the so-called sign test. Assuming no difference overall in the two
scores, the probability of all four people scoring lower in the field is (1/2)^4, or 6.25%.
This is by definition not significant at the 5% level. It is usually true to say that if
you have the actual measurements, rather than just "are they bigger or smaller", one
should always use the measurement values. The axiom is "the more information you
have the better your result".
Major Thruster's attempt to change the significance level is a misuse of statistical
methodology. He is using the data to determine his significance test rather than using
the significance test to test the data. In any case, using a 10% level of significance
means that one is willing to accept a 1 in 10 chance of being wrong. That is to say,
when there is in fact no difference between the true mean values, 1 time in 10 you
will decide that in fact there is a difference. This is likely to be too high a risk for
most situations.
Finally, why are you using a t test? Is it valid to assume that the observed differences
come from a normal distribution? Ideally we would have past data from similar trials
to verify this, or otherwise. If we cannot assume normality of the differences then the
use of the sign test above is in fact correct, since this does not require any assumption
of normality.
STATISTICS - EXAMPLES 6
1. A trial was conducted to compare the consistency in range of two types of artillery
HE round (X and Y ). A single gun fired 15 rounds of type X, followed by 13 rounds
of type Y , both types from the same gun. While both types of round had a similar
mean range, the standard deviation in range was estimated to be 162m for round X
and 195m for round Y .
Test the hypothesis that the consistency in range of rounds X and Y, as measured by
the variances, is the same, at the 5% significance level.
Comment on the design of the trial and state any assumptions you make.
2. According to a standard testing procedure on the modulus of elasticity of rubber
specimens, it has been established that the standard deviation of measurements of
this modulus is 18.0 units.
A sample of 20 measurements are taken on a given specimen and the sample standard
deviation found to be 23.2 units.
(a) Test at the 5% level whether the variance of the procedure is being maintained.
(b) Construct a 95% confidence interval for the true variance of the population from
which the sample was drawn.
State any assumptions you make.
3. A manufacturer has specified that, based on past experience, the standard deviation
in the weight of a small arms round is no more than 0.08g. Before conducting a trial
with this type of round, a sample of 16 are chosen from the test batch and weighed.
The standard deviation in weight for this sample is found to be 0.12g.
Perform a 1% test of the hypothesis that the test batch conforms to the manufacturer's
specification, stating any assumptions you make.
4. In a toxicity test for a new drug, 10 rats are given the drug and 2 die. It is known
that under the same conditions the standard existing drug would kill 50% of the rats
it was given to. Is there any evidence that the new drug has improved things (i.e.
decreased the kill rate)?
More experiments are performed and, out of 100 rats, 37 die. Find a 99% confidence
interval for the proportion of rats killed by the new drug.
5. A marksman fires five shots at a target and counts the number of bullseyes. After a
series of 100 such sets of 5 shots the results are as follows:
Number of bullseyes   0   1    2    3    4   5
Frequency             6   31   36   15   8   4
Assuming that the probability p of a bullseye remains constant, test whether the above
results are consistent with a binomial distribution. (If not, this suggests that the shots
within each set of 5 are not independent).
6. In a survey, the following information was extracted on the number of officers of each
rank owning various breeds of dog:

        Golden Labrador   Black Labrador   Spaniel   Total
Total   200               240              50        490
Is there any evidence, at the 1% level, of an association between rank and breed?
7. When designing a helicopter cockpit it is important for it to be of the right size to
cope with pilots of different sizes. An important measurement is the buttock-knee
length, and it is of interest to see if we can predict a pilot's buttock-knee length from
his height, since if we can do this accurately then we can use existing information
on the heights of all pilots to calculate buttock-knee length for all of them, without
having to measure it directly.
The height and buttock-knee length of a random sample of 144 aircrew are measured.
Taking the buttock-knee length as y, the response variable, and heights as x, the
explanatory variable, we have
Σy^2 = 543,819; Σx^2 = 4,425,549; Σxy = 1,551,020; Σy = 8,840.1; Σx = 25,235.8
with n = 144.
STATISTICS - EXAMPLES 6 - SOLUTIONS
1. We have sample standard deviations SX = 162, SY = 195, with sample sizes nX = 15
and nY = 13.
Assuming that ranges are normally distributed, the F-test can be used to compare the
variances.
Null hypothesis: H0: σX^2 = σY^2. Alternative hypothesis: H1: σX^2 ≠ σY^2.
The test statistic is
F_obs = SY^2/SX^2 = 195^2/162^2 = 1.45
(SY^2 is the numerator since SY^2 > SX^2). The critical value from the F-distribution
with 12 and 14 df at the 5% significance level (2-tailed) is
F_crit = 3.05
Since the test statistic F_obs is less extreme than the critical value, we conclude that
there is insufficient evidence (at the 5% significance level) that the consistency of
round X differs from that of round Y.
The trial design would have been better if the order of firings had been randomised.
By firing all of the X rounds and then all of the Y rounds some systematic error
may have been introduced. For example, wind speed or direction might have changed
substantially. The use of only one gun also leaves us open to the possibility of it being
atypical of guns of its type. We are not told whether the gun was warmed up by some
preliminary rounds being fired prior to any measurements being made, but this would
be desirable.
2. The following assumes that elasticity measurements are normally distributed.
(a) We test
H0: σ^2 = 18.0^2 v H1: σ^2 ≠ 18.0^2
Here S^2 = 23.2^2 is the larger, so
F_obs = 23.2^2/18.0^2 = 1.6612
The critical value for F with 19 and ∞ degrees of freedom (5%, 2-tailed) is
F_crit = 1.71
Since F_obs < F_crit, we conclude that there is no evidence of a difference.
3. Null hypothesis
H0: σ^2 ≤ 0.08^2
Alternative hypothesis
H1: σ^2 > 0.08^2
Now, S^2 is the numerator as it is larger than the hypothesised σ0^2, so the test statistic
is
F_obs = 0.12^2/0.08^2 = 2.25
The critical value from the F-distribution with 15 and ∞ df at the 1% significance
level (1-tailed) is
F_crit = 2.04
Since the test statistic is more extreme than the critical value, the result is significant
at the 1% level. The evidence suggests that it is very unlikely that the test batch
conforms to the manufacturer's specification.
4. We have
H0: p ≥ 0.5 v H1: p < 0.5
and X, the number of deaths out of 10, is Bi(10, 0.5) under H0. Hence the p-value is
P(X ≤ 2) = (1 + 10 + 45)/2^10 = 56/1024 = 0.0547
In other words, if H0 is really true then there was only a 5.47% chance of observing
what we did (or something even less compatible with H0). Hence by definition this
is not quite significant at the 5% level. You would probably say that there is slight
evidence against H0, and recommend collecting more data.
With the extra data, we use the estimated value
p̂ = 37/100 = 0.37
so that the 99% CI (based on the normal approximation) is
0.37 ± 2.58 × √(0.37 × 0.63/100) = 0.37 ± 0.1246 = (0.2454, 0.4946)
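Both the exact binomial p-value and the approximate interval can be checked numerically (standard library only):

```python
from math import comb, sqrt

# Exact one-sided p-value: 2 or fewer deaths out of 10 when p = 0.5
p_value = sum(comb(10, k) for k in range(3)) / 2**10   # 56/1024, about 0.0547

# Normal-approximation 99% CI after 37 deaths in 100
p_hat = 37 / 100
half_width = 2.58 * sqrt(p_hat * (1 - p_hat) / 100)
ci = (p_hat - half_width, p_hat + half_width)          # about (0.245, 0.495)
```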
5. Clearly 5 × 100 = 500 rounds were fired, with
1 × 31 + 2 × 36 + 3 × 15 + 4 × 8 + 5 × 4 = 200
hits. Hence we estimate p by
p̂ = 200/500 = 0.4
Hence, under
H0: data come from a binomial distribution
the data are 100 observations from the random variable X, where
X ~ Bi(5, 0.4)
Using the usual binomial calculations to obtain the Probability line, then multiplying
by the number of observations (100) to obtain the Expected numbers, we find

Number of bullseyes   0        1        2        3        4       5
Observed              6        31       36       15       8       4
Probability           0.0778   0.2592   0.3456   0.2304   0.0768  0.0102
Expected              7.776    25.920   34.560   23.040   7.680   1.024

Note that the two right-hand columns are amalgamated so that E > 5.
This gives X^2 = Σ(O - E)^2/E = 5.515, and the χ^2 distribution has 5 - 1 - 1 = 3
degrees of freedom (five classes, one parameter estimated), so that even the 10%
critical value is χ^2_crit = 6.252.
Hence even a 10% test would not reject H0 , and so there is no evidence against the
hypothesis that the observations come from a binomial distribution.
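The goodness-of-fit statistic can be reproduced as follows (a sketch; note the amalgamation of the last two classes):

```python
from math import comb

observed = [6, 31, 36, 15, 8, 4]
n_sets, n_shots = 100, 5
p_hat = sum(k * o for k, o in enumerate(observed)) / (n_sets * n_shots)   # 0.4

expected = [n_sets * comb(n_shots, k) * p_hat**k * (1 - p_hat)**(n_shots - k)
            for k in range(n_shots + 1)]

# Amalgamate the last two classes so that every expected count exceeds 5
obs = observed[:4] + [observed[4] + observed[5]]
exp = expected[:4] + [expected[4] + expected[5]]

x2 = sum((o - e)**2 / e for o, e in zip(obs, exp))   # about 5.51
```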
6. Under the null hypothesis of no association, we calculate the expected numbers as,
for each cell of the table,
E = (row total × column total)/(overall total)
so that for example
E for Major, Black Labrador = (130 × 240)/490 = 63.67
and so on. Summing (O - E)^2/E over all the cells gives
X^2 = (4 - 7.14)^2/7.14 + ... + (46 - 57.14)^2/57.14 = 11.63
The degrees of freedom are (4 - 1)(3 - 1) = 6, so that the 1% critical value is
χ^2_crit = 16.81. Hence at the 1% level there is no evidence against the null hypothesis:
there is no evidence for an association between rank and breed.
7. The sample means are
ȳ = 61.3896; x̄ = 175.249
and from AMOR formula book section 17.10, we have
Syy = 543819 - 144 × 61.3896^2 = 1128.88
Sxx = 4425549 - 144 × 175.249^2 = 3001.00
Sxy = 1551020 - 144 × 61.3896 × 175.249 = 1803.63
(a) Hence the correlation coefficient is
r_xy = Sxy/√(Syy × Sxx) = 1803.63/√(1128.88 × 3001.00) = 0.980
Hence, not surprisingly, there is a very strong positive correlation between the
two.
(b) The regression line is given by y = a + bx, where the slope is
b = Sxy/Sxx = 1803.63/3001.00 = 0.601