
CHAPTER 1

INTRODUCTION TO STATISTICAL ANALYSIS

Reading
Newbold 1.1, 1.3, parts of 1.2.
Anderson, Sweeney, and Williams Chapter 1
Wonnacott and Wonnacott Chapter 1
James T Mc Clave, P. George Benson Chapter 1

Introductory Comments
This Chapter sets the framework for the book. Read it carefully, because the ideas
introduced are the basis of this subject and of research methodology.

1. Random Sampling, Deductive and Inductive Statistics


Random Sampling
Only in exceptional circumstances is it possible to consider every member of the
population. In most cases only a sample of the population can be considered, and
the results obtained from this sample must be generalized to apply to the
population.
In order that these generalizations should be accurate the sample must be random,
that is, every possible sample has an equal chance of selection and the choice of a
member of the sample must not be influenced by previous selection; this is simple
random sampling.

Example 1
Suppose that a population consists of six measurements: 1, 2, 3, 4, 5, and 7. List
all possible different samples of two measurements that could be selected from
the population. Give the probability associated with each sample in a random
sample of n = 2 measurements selected from the population.

Solution
All possible samples are listed below

Sample    Measurements
1         1, 2
2         1, 3
3         1, 4
4         1, 5
5         1, 7
6         2, 3
7         2, 4
8         2, 5
9         2, 7
10        3, 4
11        3, 5
12        3, 7
13        4, 5
14        4, 7
15        5, 7

Now let us suppose that I draw a single sample of n = 2 measurements from the 15
possible samples of two measurements. The sample selected is called a random sample if
every sample has an equal probability (1/15) of being selected.
It is rather unlikely that we would ever achieve a truly random sample, because the
probabilities of selection will not always be exactly equal. But we do the best we can.
One of the simplest and most reliable ways to select a random sample of n measurements
from a population is to use a table of random numbers (see Appendix B). Random
number tables are constructed in such a way that, no matter where you start in the table
and no matter what direction you move, the digits occur randomly and with equal probability.
Thus, if we wished to choose a random sample of n = 10 measurements from a population
containing 100 measurements, we could label the measurements in the population from
0 to 99 (or 1 to 100). Then, referring to Appendix B and choosing a random starting
point, the next 10 two-digit numbers going across the page would indicate the labels of
the particular measurements to be included in the random sample. Similarly, by moving
up or down the page, we would also obtain a random sample.

Example 2
A small community consists of 850 families. We wish to obtain a random sample of 20
families to ascertain public acceptance of a wage and price freeze. Refer to Appendix B
to determine which families should be sampled.
Solution
Assuming that a list of all families in the community is available (such as a telephone
directory), we could label the families from 0 to 849 (or, equivalently, from 1 to 850).
Then, referring to the Appendix, we choose a starting point. Suppose we have decided to
start at line 1, column 4. Going down the page, we choose the first 20 three-digit
numbers between 000 and 849 from Table B. We have

511   791   099   671   152
584   045   783   301   568
754   750   059   498   701
258   266   105   469   160

These 20 numbers identify the 20 families that are to be included in our sample.
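In practice the same selection can be made with a computer instead of a random number table. The short Python sketch below is an added illustration, not part of the original text; the seed value is an arbitrary choice made only so the run is reproducible.

import random

random.seed(7)                        # arbitrary seed, for reproducibility only
families = range(850)                 # family labels 0 to 849, as in Example 2
sample = random.sample(families, 20)  # 20 distinct labels; every subset of 20 is equally likely
print(sorted(sample))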

Deductive and Inductive Statistics.


The reasoning that is used in statistics hinges on understanding two types of logic,
namely deductive and inductive logic. The type of logic that reasons from the particular
(sample) to the general (Population) is known as inductive logic, while the type that
reasons from the general to the particular is known as deductive logic.

Learning Objectives
After working through this chapter, you should be able to:

Explain what random sampling is

Explain the difference between a population and a sample

CHAPTER 2

METHODS OF ORGANISING AND PRESENTING DATA

Reading

Newbold Chapter 2
James T Mc Clave and P George Benson Chapter 2
Tailoka Frank P Chapter 3

Introductory Comments
This Chapter contains themes to do with the understanding of data. We construct graphical
representations of the data, which allow one to see its most important
characteristics easily. Most of the graphical representations are very tedious to construct
without the use of a computer. However, one understands much more if one tries a few
with pencil and paper.

Graphical Representations Of Data


Types of business data; methods of representation of qualitative data; cumulative
frequency distribution.

Types of business data. Although the number of business phenomena that can be
measured is almost limitless, business data can generally be classified as one of two
types: quantitative or qualitative.
Quantitative data are observations that are measured on a numerical scale. Examples of
quantitative business data are:
i.   The monthly unemployment percentage.
ii.  Last year's sales for selected firms.
iii. The number of women executives in an industry.

Qualitative data are data that are not measurable, in the sense that height is measured, or
countable, in the sense that people entering a store are counted. Many characteristics can be
classified only into one of a set of categories. Examples of qualitative business data are:

i)  The political party affiliations of fifty randomly selected business executives.
    Each executive would have one and only one political party affiliation.

ii) The brand of petrol last purchased by seventy-four randomly selected car owners.
    Again, each measurement would fall into one and only one category.

Notice that each of the examples has nonnumerical or qualitative measurements.

Graphical methods for describing qualitative data.


(a) The Bar Graph


For example, suppose a women's clothing store located in the downtown area of a
large city wants to open a branch in the suburbs. To obtain some information
about the geographical distribution of its present customers, the store manager
conducts a survey in which each customer is asked to identify her place of
residence with regard to the city's four quadrants: Northwest (NW), Northeast
(NE), Southwest (SW), or Southeast (SE). Out-of-town customers are excluded
from the survey. The responses of n = 30 randomly selected resident customers
might appear as in Table 1.1 (note that the symbol n is used here and throughout
this course to represent the sample size, i.e. the number of measurements in a
sample). You can see that each of the thirty measurements falls in one and only
one of the four possible categories representing the four quadrants of the city.
Table 1.1  Customer Residence Survey: n = 30

Customer   Residence      Customer   Residence      Customer   Residence
1          NW             11         NW             21         NE
2          SE             12         SE             22         NW
3          SE             13         SW             23         SW
4          NW             14         NW             24         SE
5          SW             15         SW             25         SW
6          NW             16         NE             26         NW
7          NE             17         NE             27         NW
8          SW             18         NW             28         SE
9          NW             19         NW             29         NE
10         SE             20         SW             30         SW
A natural and useful technique for summarizing qualitative data is to tabulate the
frequency or relative frequency of each category.
Definition:

The frequency for a category is the total number of measurements that fall in the
category. The frequency for a particular category, say category i, will be denoted by the
symbol f_i.
The relative frequency for a category is the frequency of that category divided by the
total number of measurements; that is, the relative frequency for category i is

Relative frequency = f_i / n

where n = total number of measurements in the sample and f_i = frequency for the ith category.


The frequency for a category is the total number of measurements in that category,
whereas the relative frequency for a category is the proportion of measurements in the
category. Table 1.2 shows the frequency and relative frequency for the customer
residences listed in Table 1.1. Note that the sum of the frequencies should always equal
the total number of measurements in the sample and the sum of the relative frequencies
should always equal 1 (except for rounding errors) as in Table 1.2.
Table 1.2

Category    Frequency    Relative Frequency
NE          5            5/30 = .167
NW          11           11/30 = .367
SE          6            6/30 = .200
SW          8            8/30 = .267
Total       30           1
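The tallies in Tables 1.1 and 1.2 can be checked with a few lines of Python. The sketch below is an added illustration, not part of the original text; it counts the 30 responses and prints each category's frequency and relative frequency.

from collections import Counter

residences = ["NW","SE","SE","NW","SW","NW","NE","SW","NW","SE",
              "NW","SE","SW","NW","SW","NE","NE","NW","NW","SW",
              "NE","NW","SW","SE","SW","NW","NW","SE","NE","SW"]
n = len(residences)                      # n = 30
freq = Counter(residences)               # frequency f_i for each category
for category in ("NE", "NW", "SE", "SW"):
    f = freq[category]
    print(category, f, round(f / n, 3))  # frequency and relative frequency, as in Table 1.2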

A common means of graphically presenting the frequencies or relative frequencies for
qualitative data is the bar chart. For this type of chart, the frequencies (or relative
frequencies) are represented by bars, one bar for each category.
The height of the bar for a given category is proportional to the category frequency (or
relative frequency). Usually the bars are placed in a vertical position with the base of the
bar on the horizontal axis of the graph. The order of the bars on the horizontal axis is
unimportant. Both a frequency bar chart and a relative frequency bar chart for the
customers' residences are shown in Figure 1.1.

Figure 1.1  (a) A frequency bar chart and (b) a relative frequency bar chart for the customer residences, with the residential quadrants NE, NW, SE and SW on the horizontal axis and frequency (or relative frequency) on the vertical axis.
The Pie Chart

The second method of describing qualitative data sets is the pie chart. This is
often used in newspaper and magazine articles to depict budgets and other
economic information. A complete circle (the pie) represents the total number of
measurements. This is partitioned into a number of slices, with one slice for each
category. For example, since a complete circle spans 360°, if the relative
frequency for a category is .30, the slice assigned to that category is 30% of 360°,
or (.30)(360°) = 108°.

Figure 1.2  The portion of a pie chart corresponding to a relative frequency of .3 (a slice of 108°).

Graphical Methods for Describing Quantitative Data.


The Frequency Histogram and Polygon.
The histogram (often called a frequency distribution) is the most popular graphical
technique for depicting quantitative data. To introduce the histogram we will use thirty
companies selected randomly from the 1980 Financial Magazine (the top 500 companies
in sales for calendar year 1979). The variable X we will be interested in is the earnings
per share (E/S) for these thirty companies. The earnings per share is computed by
dividing the year's net profit by the total number of shares of common stock outstanding.
This figure is of interest to the economic community because it reflects the economic
health of the company.
The earnings per share figures for the thirty companies are shown (to the nearest ngwee)
in Table 1.3.

Table 1.3

Company   E/S     Company   E/S     Company   E/S
1         1.85    11        2.80    21        2.75
2         3.42    12        3.46    22        6.58
3         9.11    13        8.32    23        3.54
4         1.96    14        4.62    24        4.65
5         6.48    15        3.27    25        0.75
6         5.72    16        1.35    26        2.01
7         1.72    17        3.28    27        5.36
8         8.56    18        3.75    28        4.40
9         0.72    19        5.23    29        6.49
10        6.28    20        2.92    30        1.12

How to Construct a Histogram

1. Arrange the data in increasing order, from smallest to largest measurement.

2. Divide the interval from the smallest to the largest measurement into between five
   and twenty equal sub-intervals, making sure that:
   a) Each measurement falls into one and only one measurement class.
   b) No measurement falls on a measurement class boundary.
   Use a small number of measurement classes if you have a small amount of
   data; use a larger number of classes for a large amount of data.

3. Compute the frequency (or relative frequency) of measurements in each
   measurement class.

4. Using a vertical axis of about three-fourths the length of the horizontal axis, plot
   each frequency (or relative frequency) as a rectangle over the corresponding
   measurement class.
Since the number of measurements, n = 30, is not large, we will use six classes to
span the distance between the smallest measurement, 0.72, and the largest
measurement, 9.11. This distance divided by 6 is equal to

(largest measurement - smallest measurement) / number of intervals = (9.11 - 0.72) / 6 ≈ 1.4

By locating the lower boundary of the first class interval at 0.715 (slightly below the
smallest measurement) and adding 1.4, we find the upper boundary to be 2.115. Adding
1.4 again, we find the upper boundary of the second class to be 3.515. Continuing this
process, we obtain the six class intervals shown in the table below. Note that each
boundary falls on a 0.005 value (one significant digit more than the measurements), which
guarantees that no measurement will fall on a class boundary.
The next step is to find the class frequencies and calculate the class relative frequencies.

Class    Measurement Class    Class Frequency    Class Relative Frequency
1        0.715 - 2.115        8                  8/30 = .267
2        2.115 - 3.515        7                  7/30 = .233
3        3.515 - 4.915        5                  5/30 = .167
4        4.915 - 6.315        4                  4/30 = .133
5        6.315 - 7.715        3                  3/30 = .100
6        7.715 - 9.115        3                  3/30 = .100
         Total                30                 1.00

Table 1.4
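As an added illustration (not in the original text), the following Python sketch reproduces Table 1.4 by forming the six class boundaries, starting at 0.715 with width 1.4, and counting how many of the thirty E/S values fall in each class.

eps = [1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
       2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
       2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12]
width, lower = 1.4, 0.715
bounds = [lower + i * width for i in range(7)]   # 0.715, 2.115, ..., 9.115
for lo, hi in zip(bounds, bounds[1:]):
    f = sum(lo < x < hi for x in eps)            # class frequency
    print(f"{lo:.3f}-{hi:.3f}", f, round(f / len(eps), 3))
# prints frequencies 8, 7, 5, 4, 3, 3 with their relative frequencies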

Definition
The class frequency for a given class, say class i, is equal to the total number of
measurements that fall in that class. The class frequency for class i is denoted by the
symbol f_i.

Definition
The class relative frequency for a given class, say class i, is equal to the class frequency
divided by the total number n of measurements, i.e.

Relative frequency for class i = f_i / n

(a) The frequency histogram and (b) the relative frequency histogram for the earnings per share data, drawn over the class boundaries 0.715, 2.115, 3.515, 4.915, 6.315, 7.715 and 9.115 on the horizontal axis.

Cumulative Frequency Distribution


It is often useful to know the number or the proportion of the total number of
measurements that are less than or equal to those contained in a particular class. These
quantities are called the class cumulative frequency and the class cumulative relative
frequency respectively.
For example, if the classes are numbered from the smallest to the largest values of x, 1, 2,
3, 4, . . . , then the cumulative frequency for the third class would equal the sum of the
class frequencies corresponding to classes 1, 2, and 3.
Cumulative frequency for class 3 = f_1 + f_2 + f_3

Similarly, cumulative relative frequency for class 3 = (f_1 + f_2 + f_3) / n, where n is the total
number of measurements in the sample.

Cumulative frequencies and cumulative relative frequencies for the earnings per share data:

Class No.   Measurement Class   Class Frequency   Cumulative Frequency   Class Relative Frequency   Cumulative Relative Frequency
1           0.715 - 2.115       8                 8                      8/30 = .267                8/30 = .267
2           2.115 - 3.515       7                 (8 + 7) = 15           7/30 = .233                15/30 = .500
3           3.515 - 4.915       5                 (15 + 5) = 20          5/30 = .167                20/30 = .667
4           4.915 - 6.315       4                 (20 + 4) = 24          4/30 = .133                24/30 = .800
5           6.315 - 7.715       3                 (24 + 3) = 27          3/30 = .100                27/30 = .900
6           7.715 - 9.115       3                 (27 + 3) = 30          3/30 = .100                30/30 = 1.00
            Total               30
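A short Python sketch (added for illustration, not part of the original text) shows how the cumulative columns follow directly from the six class frequencies.

from itertools import accumulate

freqs = [8, 7, 5, 4, 3, 3]                      # class frequencies from Table 1.4
n = sum(freqs)                                  # 30
for i, cum in enumerate(accumulate(freqs), start=1):
    print(f"class {i}: cumulative frequency {cum}, "
          f"cumulative relative frequency {cum / n:.3f}")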
Figure: the cumulative relative frequency distribution (ogive) for the earnings per share data, rising from 0 at 0.715 to 1.0 at 9.115, plotted against the class boundaries 0.715, 2.115, 3.515, 4.915, 6.315, 7.715 and 9.115.

Learning Objective

After working through this Chapter you should be able to:

Draw a pie chart and a bar chart, and construct frequency tables, relative
frequency tables, and histograms.

Interpret the diagrams. You will understand the importance of captions, axis
labels and graduation of axes.

CHAPTER 3


DESCRIPTIVE MEASURES

Reading
Newbold Chapter 2
Wonnacott and Wonnacott Chapter 2
Tailoka Frank P Chapter 4
James T McClave, Lawrence Lapin, and P George Benson Chapter 3

Introductory Comments

This Chapter contains themes which allow one to see the most important
characteristics of data easily. The idea is to find simple numbers, like the mean and variance,
which will summarize those characteristics.

3. Numerical Description of Data


The Mode: A Measure of Central Tendency
Definition.
The mode is the measurement that occurs with the greatest frequency in the data set.
Because it emphasizes data concentration, the mode has application in marketing
as well as in the description of large data sets collected by state and federal agencies.
Unless the data set is rather large, the mode may not be very meaningful. For
example, consider the earnings per share measurements for the thirty financial
companies we used in the previous chapter. If you were to re-examine these data,
you would find that none of the thirty measurements is duplicated in this sample.
Thus, strictly speaking, all thirty measurements are modes for this sample.
Obviously, this information is of no practical use for data description. We can
calculate a more meaningful mode by constructing a relative frequency histogram
for the data. The interval containing the most measurements is called the modal
class, and the mode is taken to be the midpoint of this class interval.

The modal class, the one corresponding to the interval 0.715 - 2.115, lies to the left side
of the distribution. The mode is the midpoint of this interval; that is,

Mode = (0.715 + 2.115) / 2 = 1.415

In the sense that the mode measures data concentration, it provides a measure of central
tendency of the data.

The Arithmetic Mean: A Measure of Central Tendency

The most popular and best understood measure of central tendency for a quantitative
data set is the arithmetic mean (or simply the mean):

Definition
The mean of a set of quantitative data is equal to the sum of the measurements divided by
the number of measurements contained in the data set. The mean of a sample is denoted
by x̄ (read "x bar"), and the formula for its calculation is

x̄ = (sum of the sample measurements) / n = (Σ x_i) / n

Example 1
Calculate the mean of the following five sample measurements: 5, 3, 8, 5, 6.

Solution
Using the definition of the sample mean and the shorthand notation, we find

x̄ = (Σ x_i) / 5 = (5 + 3 + 8 + 5 + 6) / 5 = 27/5 = 5.4

The mean of this sample is 5.4.
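For larger samples the same calculation is conveniently done by computer. The following Python lines are an added illustration (not part of the original text) and reproduce Example 1 both directly and with the standard library's statistics module.

import statistics

sample = [5, 3, 8, 5, 6]
xbar = sum(sample) / len(sample)      # 27 / 5 = 5.4
print(xbar, statistics.mean(sample))  # both print 5.4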


The sample mean will play an important role in accomplishing our objective of making
inferences about populations based on sample information. For this reason it is important
to use a different symbol when we want to discuss the mean of a population of
measurements, i.e. the mean of the entire set of measurements in which we are interested.
We use the Greek letter μ (mu) for the population mean.
The Median: Another measure of Central Tendency


The median of a data set is the number such that half the measurements fall below the
median and half fall above. The median is of most value in describing large data sets. If
the data set is characterized by a relative frequency histogram, the median is the point on
the x-axis such that half the area under the histogram lies above the median and half lies
below. For a small, or even a large but finite, number of measurements, there may be
many numbers that satisfy this property. For this reason, we will calculate the median of
a data set as follows.

Calculating a Median
1. If the number n of measurements in a data set is odd, the median is the middle
   number when the measurements are arranged in ascending (or descending) order.
2. If the number n of measurements is even, the median is the mean of the two
   middle measurements when the measurements are arranged in ascending (or
   descending) order.

Example 2
Consider the following sample of n = 7 measurements: 5, 7, 4, 5, 20, 6, 2.
a) Calculate the median of this sample.
b) Eliminate the last measurement (the 2) and calculate the median of the remaining
   n = 6 measurements.

Solution
a) The seven measurements in the sample are first arranged in ascending order:
   2, 4, 5, 5, 6, 7, 20
   Since the number of measurements is odd, the median is the middle measurement.
   Thus, the median of this sample is 5.

b) After removing the 2 from the set of measurements, we arrange the sample
   measurements in ascending order as follows:
   4, 5, 5, 6, 7, 20
   Now the number of measurements is even, and so we average the two middle
   measurements. The median is (5 + 6)/2 = 5.5.
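Both cases of the rule can be checked with the statistics module; the sketch below is an added illustration, not part of the original text.

import statistics

odd = [5, 7, 4, 5, 20, 6, 2]
even = [5, 7, 4, 5, 20, 6]            # the 2 removed
print(statistics.median(odd))         # 5    (middle value of 2, 4, 5, 5, 6, 7, 20)
print(statistics.median(even))        # 5.5  (mean of the two middle values 5 and 6)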


Comparing the Mean and the Median

1. If the median is less than the mean, the data set is skewed to the right.

   (Figure: a relative frequency distribution with rightward skewness; the median lies to the left of the mean on the measurement axis.)

   A commonly used measure of skewness is

   Skewness = (mean - mode) / standard deviation ≈ 3(mean - median) / standard deviation

2. The median will equal the mean when the data set is symmetric.

   (Figure: a symmetric distribution; the median and the mean coincide on the measurement axis.)

3. If the median is greater than the mean, the data set is skewed to the left.

   (Figure: a distribution with leftward skewness; the mean lies to the left of the median on the measurement axis.)

The Range: A Measure of Variability

Measures of Variation
Definition:
The range of a data set is equal to the largest measurement minus the smallest measurement.
When dealing with grouped data, there are two procedures which are commonly adopted for
determining the range.
1. Range = class mark of highest class - class mark of lowest class.
2. Range = upper class boundary of highest class - lower class boundary of lowest class.

Variance and Standard Deviation

The sample variance for a sample of n measurements is equal to the sum of the squared
distances from the mean divided by (n - 1). In symbols, using S² to represent the sample
variance,

S² = Σ (x_i - x̄)² / (n - 1),  the sum running from i = 1 to n.

The second step in finding a meaningful measure of data variability is to calculate the
standard deviation of the data set.
The sample standard deviation, S, is defined as the positive square root of the sample
variance, S². Thus,

S = √S² = √[ Σ (x_i - x̄)² / (n - 1) ]

The corresponding quantity, the population standard deviation, measures the variability of
the measurements in the population and is denoted by σ (sigma). The population
variance will therefore be denoted by σ².
Example 3

Calculate the standard deviation of the following sample: 2, 3, 3, 3, 4.

Solution
For this set of data, x̄ = 3. Then

S² = [ (2 - 3)² + (3 - 3)² + (3 - 3)² + (3 - 3)² + (4 - 3)² ] / (5 - 1) = 2/4 = 0.5

S = √0.5 ≈ 0.71

Shortcut Formula for the Sample Variance

S² = [ (sum of the squares of the sample measurements) - (sum of the sample measurements)² / n ] / (n - 1)
   = [ Σ x_i² - (Σ x_i)² / n ] / (n - 1)


Example 4

Use the shortcut formula to compute the variances of these two samples of five measurements
each.
Sample 1: 1, 2, 3, 4, 5
Sample 2: 2, 3, 3, 3, 4

Solution
We first work with sample 1. The quantities needed are

Σ x_i = 1 + 2 + 3 + 4 + 5 = 15  and  Σ x_i² = 1² + 2² + 3² + 4² + 5² = 1 + 4 + 9 + 16 + 25 = 55

so that

S² = [ Σ x_i² - (Σ x_i)²/5 ] / (5 - 1) = [ 55 - (15)²/5 ] / 4 = (55 - 45)/4 = 10/4 = 2.5

Similarly, for sample 2 we get

Σ x_i = 2 + 3 + 3 + 3 + 4 = 15  and  Σ x_i² = 2² + 3² + 3² + 3² + 4² = 4 + 9 + 9 + 9 + 16 = 47

Then the variance for sample 2 is

S² = [ 47 - (15)²/5 ] / 4 = (47 - 45)/4 = 2/4 = 0.5

Example 5
The earnings per share measurements for thirty companies selected randomly from the 1980
Financial/Daily Mail are listed here. Calculate the sample variance, S², and the standard
deviation, S, from these measurements.

1.85   5.72   2.80   1.35   2.75   2.01
3.42   1.72   3.46   3.28   6.58   5.36
9.11   8.56   8.32   3.75   3.54   4.40
1.96   0.72   4.62   5.23   4.65   6.49
6.48   6.28   3.27   2.92   0.75   1.12

Solution

The calculation of the sample variance, S², would be very tedious for this example if we
tried to use the formula

S² = Σ (x_i - x̄)² / (30 - 1)

because it would be necessary to compute all thirty squared distances from the mean.
However, for the shortcut formula we need only compute

Σ x_i = 1.85 + 3.42 + . . . + 1.12 = 122.47  and
Σ x_i² = (1.85)² + (3.42)² + . . . + (1.12)² = 657.5239

Then

S² = [ Σ x_i² - (Σ x_i)²/30 ] / (30 - 1) = [ 657.5239 - (122.47)²/30 ] / 29 = 5.4331

Notice that we retained four decimal places in the calculation of S² to reduce rounding
errors, even though the original data were accurate to only two decimal places.

The standard deviation is

S = √S² = √5.4331 ≈ 2.33
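The shortcut formula is easily checked by computer. The following Python sketch is an added illustration (not part of the original text); it verifies the two variances of Example 4, and the same function applied to the thirty earnings per share values reproduces the value S² = 5.4331 found above.

import statistics

def shortcut_variance(xs):
    # S^2 = [ sum of squares - (sum)^2 / n ] / (n - 1)
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

for sample in ([1, 2, 3, 4, 5], [2, 3, 3, 3, 4]):
    print(shortcut_variance(sample), statistics.variance(sample))
# prints 2.5 2.5 and then 0.5 0.5, matching Example 4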

Interpreting the Standard Deviation

If we are comparing the variability of two samples selected from a population, the sample
with the larger standard deviation is the more variable of the two. Thus, we know how to
interpret the standard deviation on a relative or comparative basis, but we have not
explained how it provides a measure of variability for a single sample.
One way to interpret the standard deviation as a measure of variability of a data set would
be to answer questions such as the following. How many measurements are within 1
standard deviation of the mean? How many measurements are within 2 standard
deviations of the mean? For a specific data set, we can answer these questions by counting
the number of measurements in each of the intervals. However, if we are interested in
obtaining a general answer to these questions, the problem is more difficult. There are
two guidelines to help answer the questions of how many measurements fall within 1, 2,
and 3 standard deviations of the mean. The first set, which applies to any sample, is
derived from a theorem proved by the Russian mathematician Chebyshev. The second
set, the Empirical Rule, is based on empirical evidence that has accumulated over time
and applies to samples that possess mound-shaped frequency distributions: those that are
approximately symmetric, with a clustering of measurements about the midpoint of the
distribution (the mean, median and mode should all be about the same) and that tail off
as we move away from the center of the histogram.

Aids to the Interpretation of a Standard Deviation

1. A rule (from Chebyshev's theorem) that applies to any sample of measurements,
   regardless of the shape of the frequency distribution:
   a. It is possible that none of the measurements will fall within 1 standard
      deviation of the mean (x̄ - S to x̄ + S).
   b. At least 3/4 of the measurements will fall within 2 standard deviations of the
      mean (x̄ - 2S to x̄ + 2S).
   c. At least 8/9 of the measurements will fall within 3 standard deviations of
      the mean (x̄ - 3S to x̄ + 3S).

2. A rule of thumb, called the Empirical Rule, that applies to samples with frequency
   distributions that are mound-shaped:
   a) Approximately 68% of the measurements will fall within 1 standard
      deviation of the mean (x̄ - S to x̄ + S).
   b) Approximately 95% of the measurements will fall within 2 standard
      deviations of the mean (x̄ - 2S to x̄ + 2S).
   c) Essentially all the measurements will fall within 3 standard deviations of
      the mean (x̄ - 3S to x̄ + 3S).

Example 6
Refer to the data for earnings per share for the thirty companies selected randomly from the
1980 Financial/Daily Mail, for which x̄ = 4.08 and S = 2.33. Calculate the fraction of the thirty
measurements that lie within the intervals x̄ ± S, x̄ ± 2S, and x̄ ± 3S, and compare the
results with the Chebyshev and Empirical rules.

Solution
(x̄ - S, x̄ + S) = (4.08 - 2.33, 4.08 + 2.33) = (1.75, 6.41)

A check of the measurements shows that 19 of the 30 measurements, i.e. approximately
63%, are within 1 standard deviation of the mean.

(x̄ - 2S, x̄ + 2S) = (4.08 - 4.66, 4.08 + 4.66) = (-0.58, 8.74)

contains 29 measurements, or approximately 97% of the n = 30 measurements. Finally,
the 3 standard deviation interval around x̄,

(x̄ - 3S, x̄ + 3S) = (4.08 - 6.99, 4.08 + 6.99) = (-2.91, 11.07),

contains all the measurements. These 1, 2 and 3 standard deviation percentages (63, 97,
and 100) agree fairly well with the approximations of 68%, 95% and 100% given by the
Empirical Rule for mound-shaped distributions.
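The interval counts in Example 6 can be reproduced with the sketch below, an added illustration that is not part of the original text.

import statistics

eps = [1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
       2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
       2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12]
xbar = statistics.mean(eps)           # about 4.08
s = statistics.stdev(eps)             # about 2.33
for k in (1, 2, 3):
    inside = sum(xbar - k * s <= x <= xbar + k * s for x in eps)
    print(f"within {k} SD: {inside} of {len(eps)} ({inside / len(eps):.0%})")
# prints 19 of 30 (63%), 29 of 30 (97%), 30 of 30 (100%)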

Example 7
The aids for interpreting the value of a standard deviation can be put to an immediate
practical use as a check on the calculation of the standard deviation. Suppose you have a
data set for which the smallest measurement is 20 and the largest is 80. You have
calculated the standard deviation of the data set to be S = 190.
How can you use the Chebyshev or Empirical rule to provide a rough check on your
calculated value of S?

Solution
The larger the number of measurements in a data set, the greater will be the tendency for
very large or very small measurements (extreme values) to appear in the data set. But
from the rules, you know that most of the measurements (approximately 95% if the
distribution is mound-shaped) will be within 2 standard deviations of the mean, and,
regardless of how many measurements are in the data set, almost all of them will fall within 3
standard deviations of the mean. Consequently, we would expect the range to be between
4 and 6 standard deviations, i.e. between 4S and 6S.

Range = largest measurement - smallest measurement = 80 - 20 = 60

(Figure: the relation between the range and the standard deviation; the interval from x̄ - 2S to x̄ + 2S spans approximately the range, so Range ≈ 4S.)

Then if we let the range equal 6S, we obtain

Range = 6S
60 = 6S
S = 10

Or, if we let the range equal 4S, we obtain a larger (and more conservative) value for S,
namely

Range = 4S
60 = 4S
S = 15

Now you can see that it does not make much difference whether you let the range equal
4S (which is more realistic for most data sets) or 6S (which is reasonable for large data
sets). It is clear that your calculated value, S = 190, is much too large, and you should check
your calculations.


Calculating a Mean and Standard Deviation from Grouped Data

If your data have been grouped in classes of equal width and arranged in a frequency
table, you can use the following formulas to calculate x̄, S², and S. Let

x_i = midpoint of the ith class,
f_i = frequency of the ith class,
K = number of classes, and n = Σ f_i.

Then, with each sum running over the K classes,

x̄ = ( Σ x_i f_i ) / n

S² = [ Σ x_i² f_i - ( Σ x_i f_i )² / n ] / (n - 1)

S = √S²

Example 8
Compute the mean and standard deviation for the earnings per share data using the
grouping shown in frequency Table 1.4.

Solution
The six class intervals, midpoints, and frequencies (from Table 1.4, earnings per share) are
shown in the accompanying table.

Class             Class Midpoint x_i    Class Frequency f_i
0.715 - 2.115     1.415                 8
2.115 - 3.515     2.815                 7
3.515 - 4.915     4.215                 5
4.915 - 6.315     5.615                 4
6.315 - 7.715     7.015                 3
7.715 - 9.115     8.415                 3
                                        n = Σ f_i = 30

x̄ = ( Σ x_i f_i ) / n = [ (1.415)(8) + (2.815)(7) + (4.215)(5) + . . . + (8.415)(3) ] / 30 = 120.85/30 ≈ 4.03

We found Σ x_i f_i = 120.85 when we calculated x̄, therefore

S² = [ Σ x_i² f_i - ( Σ x_i f_i )² / n ] / (n - 1)
   = [ (1.415)²(8) + (2.815)²(7) + . . . + (8.415)²(3) - (120.85)²/30 ] / (30 - 1)
   = (646.49875 - 486.82408) / 29
   ≈ 5.5060

S = √5.5060 ≈ 2.35

You will notice that the values of x̄, S², and S from the formulas for grouped data usually do
not agree with those obtained from the raw data (x̄ = 4.08 and S = 2.33). This is because
we have substituted the value of the class midpoint for each value of x in a class
interval. Only when every value of x in each class is equal to its respective class
midpoint will the formulas for grouped and for ungrouped data give exactly the same
answers for x̄, S², and S. Otherwise, the formulas for grouped data give only
approximations to these numerical descriptive measures.
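The grouped data computation of Example 8 can be verified as follows; this is an added Python illustration, not part of the original text, using the midpoints and frequencies of Table 1.4.

from math import sqrt

midpoints = [1.415, 2.815, 4.215, 5.615, 7.015, 8.415]
freqs = [8, 7, 5, 4, 3, 3]
n = sum(freqs)                                         # 30
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))  # 120.85
sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))
xbar = sum_xf / n
s2 = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
print(round(xbar, 2), round(s2, 4), round(sqrt(s2), 2))  # 4.03 5.506 2.35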

Measures of Relative Standing

Descriptive measures of the relationship of a measurement to the rest of the data are
called measures of relative standing.
One measure of relative standing of a particular measurement is its percentile ranking.

Definition
Let x_1, x_2, . . . , x_n be a set of n measurements arranged in increasing (or decreasing)
order. The pth percentile is a number x such that p% of the measurements fall below the
pth percentile and (100 - p)% fall above it.
For example, if oil company A reports that its yearly sales are in the 90th percentile of all
companies in the industry, the implication is that 90% of all oil companies have yearly
sales less than A's, and only 10% have yearly sales exceeding company A's.

(Figure: the relative frequency distribution of yearly sales, with 90% of the area to the left of company A's sales and 10% to the right.)
Another measure of relative standing in popular use is the Z-score. The Z-score makes
use of the mean and standard deviation of the data set in order to specify the location of a
measurement.

Definition
The sample Z-score for a measurement x is

Z = (x - x̄) / S

The population Z-score for a measurement x is

Z = (x - μ) / σ

The Z-score represents the distance between a given measurement x and the mean,
expressed in standard units.

Example 9
Suppose 200 steel workers are selected, and the annual income of each is determined.
The mean and standard deviation are x̄ = K14,000 and S = K2,000.
Suppose Chipo's annual income is K12,000. What is his sample Z-score?

(Figure: the annual incomes of the steel workers on a number line, running from x̄ - 3S = K8,000 through x̄ = K14,000 to x̄ + 3S = K20,000, with Chipo's income of K12,000 marked.)

Solution
Chipo's annual income lies below the mean income of the 200 steel workers.
We compute

Z = (x - x̄) / S = (12,000 - 14,000) / 2,000 = -1.0

which tells us that Chipo's annual income is 1.0 standard deviation below the sample
mean; in short, his sample Z-score is -1.0.
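A Z-score is a one-line computation; the short sketch below (added for illustration, not part of the original text) reproduces Example 9.

def z_score(x, mean, sd):
    # distance of x from the mean, in standard deviation units
    return (x - mean) / sd

print(z_score(12_000, 14_000, 2_000))   # -1.0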

Example 10
Suppose a female bank executive believes that her salary is low as a result of sex
discrimination. To try to substantiate her belief, she collects information on the salaries
of her counterparts in the banking business. She finds that their salaries have a mean of
K17,000 and a standard deviation of K1,000. Her salary is K13,500. Does this
information support her claim of sex discrimination?

Solution
The analysis might proceed as follows. First, we calculate the Z-score for the woman's
salary with respect to those of her male counterparts:

Z = (13,500 - 17,000) / 1,000 = -3.5

The implication is that the woman's salary is 3.5 standard deviations below the mean of
the male distribution. Furthermore, if a check of the male salary data shows that the
frequency distribution is mound-shaped, we can infer that very few salaries in this
distribution should have a Z-score less than -3, as shown in the figure.

(Figure: the male salary distribution, a mound-shaped relative frequency curve centred at K17,000; the woman's salary of K13,500 lies far in the left tail, at a Z-score of -3.5.)

Therefore, a Z-score of -3.5 represents either a measurement from a distribution different
from the male salary distribution or a very unusual (highly improbable) measurement for
the male salary distribution.
Well, which of the two situations do you think prevails? Do you think the woman's
salary is simply an unusually low one in the distribution of salaries, or do you think her
claim of salary discrimination is justified? Most people would probably conclude that
her salary does not come from the male salary distribution.

However, the careful investigator should require more information before inferring sex
discrimination as the cause. We would want to know more about the data collection
technique the woman used, and more about her competence at her job. Also, perhaps
other factors, like length of employment, should be considered in the analysis.

Learning Objectives
After working through this Chapter you should be able to:

Calculate the arithmetic mean, standard deviation, variance, median, and
quartiles for grouped or ungrouped data.

Explain the use of all the above measures.

Sample Examination Questions

1. (a) Briefly state, with reasons, the type of chart which would best convey the
       information for each of the following:
       (i)   Students at the University classified by programme of study.
       (ii)  Members of a professional association classified by age.
       (iii) Numbers of cars taxed for 2002, 2003 and 2004 in areas A, B and C of a city.

   (b) The weekly cost (K) of rented accommodation was recorded for 100
       students living in an area.

       Amount in Thousands of Kwacha     Frequency
       0 - 4                             3
       5 - 9                             17
       10 - 14                           24
       15 - 19                           31
       20 - 24                           19
       25 - 29                           6

       (i)   Draw a histogram.
       (ii)  Give the median and the interquartile range.
       (iii) Calculate the mean, mode, and standard deviation.
       (iv)  What conclusions can you draw from the data?

2. The data below are per capita per week numbers of cigarettes sold for 38 states in
   a country.

   19.20  26.82  19.24  27.18  25.96  30.14  29.27  21.10
   28.91  29.92  29.64  21.94  22.58  29.92  26.91  43.40
   30.18  23.86  28.56  24.75  24.32  24.78  22.17  20.96
   27.38  24.44  26.89  41.46  21.08  23.57  15.80  32.10
   24.44  29.04  31.34  29.60  23.12  17.08

   (a) Plot the data using an appropriate graphical method.
   (b) Give the mean, the median and the mode.
   (c) Assuming this is a normal distribution, and given a standard deviation of
       these figures of 4.387, what proportion of the states would you expect to have
       more than 20 cigarettes smoked per capita per week?
   (d) How does this compare with the actual situation as shown in the table
       above?

3. (a) Briefly state, with reasons, the type of chart which would best convey the
       information in each of the following:
       (i)   A country's total import of cigarettes by source.
       (ii)  Students in higher education classified by age.
       (iii) Number of students registered for secondary school in years 2001,
             2002 and 2003 for areas X, Y, and Z of a country.

   (b) The weekly cost (K000) of rented accommodation was recorded for 40
       students living in an area.

       35  56  33  30  31  55  29  27  21  32
       43  33  29  27  30  29  26  26  27  26
       35  32  28  27  31  27  33  24  27  28
       33  49  22  19  46  36  26  38  36  55

       (i)   Summarize the data in a frequency distribution table.
       (ii)  Calculate the mean and the standard deviation from your frequency
             table.
       (iii) Plot a histogram for these data. What is the value of the median?
       (iv)  What conclusions can you draw from these data?

4. (a) Given below is a sample of 25 observations; calculate:

       (i)    The range
       (ii)   The arithmetic mean
       (iii)  The median
       (iv)   The lower quartile
       (v)    The upper quartile
       (vi)   The quartile deviation
       (vii)  The mean deviation
       (viii) The standard deviation

       18  29  42  50  61  20  33  43
       54  63  10  21  35  46  56  67
       11  25  39  48  58  69  14

   (b) Explain the term measure of dispersion and state briefly the advantages and
       disadvantages of using the following measures of dispersion:
       (i)   Range
       (ii)  Mean deviation
       (iii) Standard deviation

5. A machine produces the following number of rejects in each successive period of
   five minutes.

   20  55  58  40  15  28  21  29  30  17
   84  58   7  40  41  67  28  19  26  26
   16  25  55  43  22  66  32  29  11  21
   26  42  57  73  27  66   7  23  17  35
   27  42  13  28  24  37  34  27  24  12

   (a) Construct a frequency distribution from these data, using seven class
       intervals of equal width.

   (b) Using the frequency distribution, calculate:
       (i)  the mean
       (ii) the standard deviation

   (c) Briefly explain the meaning of your calculated measures.

CHAPTER 4

PROBABILITY
Reading
Newbold Chapter 3
Tailoka Frank P Chapter 8
Wonnacott and Wonnacott Chapter 3

Introductory Comments
Probability is more abstract than other parts of this subject, and solving the problems may
be difficult. The concepts are very important for statistics because it is the rules of
probability that allow one to reason about uncertainty. Independence and conditional
probability are important to understand clearly for the purpose of statistical investigation.

4. Elementary Probability

Counting techniques; introduction of the probability concept; events and event
relationships; probability trees, conditional probability and statistical independence.

Counting techniques: In calculating probabilities, it is essential to be able to work
out n(S) and n(E) as straightforwardly as possible. Permutations and
combinations are very helpful here. We begin with the following basic principle.

Fundamental principle of counting. If two operations A and B are carried out, and
there are m different ways of carrying out A and k different ways of carrying out
B, then the combined operation A and B may be carried out in m × k different ways.
Example 1
Suppose a license plate contains two distinct letters followed by three digits, with
the first digit not zero. How many different license plates can be printed?

The first letter can be printed in 26 different ways and the second letter in 25 different ways
(since the letter printed first cannot be chosen for the second letter); the first digit can be
printed in 9 ways and each of the other two digits in 10 ways. Hence

26 · 25 · 9 · 10 · 10 = 585,000

different plates can be printed.

Example 2.
A toy manufacturer makes a wooden toy in two parts, the top part may be coloured red,
white or blue and the bottom part brown, orange, yellow or green. How many differently
coloured toys can be produced?

A red top part may be combined with a bottom part of any of the four possible colours.
Similarly, either a white or a blue top part may be combined with each of the four
different coloured bottom parts. Hence the number of differently coloured toys is

3 × 4 = 12

Permutations: An arrangement of a set of n objects in a given order is called a
permutation of the objects (taken all at a time). An arrangement of any r ≤ n of these
objects in a given order is called an r-permutation, or a permutation of the objects taken r
at a time.
at a time.

Example 3
Consider the set of letters a, b, c and d. Then
i)   bdca, dcba and acdb are permutations of the 4 letters (taken all at a time);
ii)  bad, adb and bca are permutations of the 4 letters taken 3 at a time;
iii) ad, ca, da and bd are permutations of the 4 letters taken 2 at a time.


Example 4
The telephone switchboard in a company requires two operators whose chairs
(positions) are side by side. When the telephone operators go to lunch, two of the four
secretaries take their places. If we make a distinction between the two operators'
positions, in how many ways can the four secretaries fill them?
We can answer this question by determining the number of possible permutations of 4
things taken 2 at a time. There are 4 secretaries, A, B, C and D, who can fill the first position.
Once this position has been filled, there are only 3 secretaries left to fill the second position.

(Figure: a tree diagram counting the permutations, with 4 ways to fill the first position and, for each of these, 3 ways to fill the second position.)

The tree diagram on the page illustrates that there are 4 · 3 = 12 possible permutations of
four things taken two at a time. Suppose that n is the number of distinct objects from
which an ordered arrangement is to be derived, and r is the number of objects in the
arrangement. The number of possible ordered arrangements is the number of
permutations of n things taken r at a time. This is written symbolically as P(n, r) in
general, or nPr.

P(n, r) = n(n - 1)(n - 2) . . . (n - r + 1)      (1)

If we multiply the right-hand side of equation (1) by (n - r)!/(n - r)!, which is equivalent to
multiplying by 1, we obtain

P(n, r) = n(n - 1)(n - 2) . . . (n - r + 1)(n - r)! / (n - r)! = n! / (n - r)!
Example 5
i)  In a stock room, 5 adjacent bins are available for storing 5 different items. The
    stock of each item can be stored satisfactorily in any bin. In how many ways can
    we assign the 5 items to the 5 bins?

    We get the answer by evaluating P(5, 5), which is

    P(5, 5) = 5!/(5 - 5)! = 5 · 4 · 3 · 2 · 1 = 120

ii) Suppose that there are 6 different parts to be stocked, but only 4 bins are
    available.

    To find the number of possible arrangements, we need to determine the number of
    permutations of 6 things taken 4 at a time, which is

    P(6, 4) = 6!/(6 - 4)! = (6 · 5 · 4 · 3 · 2 · 1)/2! = 360

Example 6
How many permutations are there of 3 objects, say a, b and c?

There are P(3, 3) = 3!/(3 - 3)! = 3! = 1 · 2 · 3 = 6 such permutations.
These are abc, acb, bac, bca, cab, cba.
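Python's math module provides these counts directly; the following lines are an added illustration, not part of the original text.

import math

# math.perm(n, r) computes P(n, r) = n! / (n - r)!
print(math.perm(5, 5))   # 120  (5 items in 5 bins)
print(math.perm(6, 4))   # 360  (6 parts, 4 bins)
print(math.perm(3, 3))   # 6    (orderings of a, b, c)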

Permutations with Repetitions

The number of permutations of n objects of which n_1 are alike, n_2 are alike of another
kind, . . . , n_r are alike of a further kind, is given by

n! / (n_1! n_2! . . . n_r!),   where n = n_1 + n_2 + . . . + n_r

Example 7
Find the number of permutations of the letters of the word ACCOUNTANTS.
The total number of letters in ACCOUNTANTS is 11, of which there are two A's, two
C's, two N's and two T's. So the required number of permutations is

11! / (2! 2! 2! 2!) = 2,494,800.
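The same count can be obtained programmatically; the sketch below is an added illustration that tallies the repeated letters of ACCOUNTANTS and applies the formula above.

from collections import Counter
from math import factorial

word = "ACCOUNTANTS"
counts = Counter(word)                 # A:2, C:2, O:1, U:1, N:2, T:2, S:1
perms = factorial(len(word))           # 11!
for c in counts.values():
    perms //= factorial(c)             # divide by n_i! for each repeated letter
print(perms)                           # 2494800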

Combinations
A combination is an arrangement of objects without regard to order.

Example 8
The combinations of the letters a, b, c, d taken 3 at a time are
{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, or simply
abc, abd, acd, bcd. Observe that the following permutations all correspond to the same
combination: abc, acb, bac, bca, cab, cba. That is, each denotes the same set {a, b, c}.

The number of combinations of n objects taken r at a time will be denoted by
C(n, r) or nCr.

Example 9
We determine the number of combinations of the four letters a, b, c, d taken 3 at a time.
Note that each combination consisting of three letters determines 3! = 6 permutations of
the letters in the combination.

Combination        Permutations
abc                abc, acb, bac, bca, cab, cba
abd                abd, adb, bad, bda, dab, dba
acd                acd, adc, cad, cda, dac, dca
bcd                bcd, bdc, cbd, cdb, dbc, dcb

Thus the number of combinations multiplied by 3! equals the number of permutations:

C(4, 3) · 3! = P(4, 3),  or  C(4, 3) = P(4, 3)/3!

Now P(4, 3) = 4 · 3 · 2 = 24 and 3! = 6; hence C(4, 3) = 4, as noted above.

In general,

C(n, r) = P(n, r)/r! = n! / (r!(n - r)!)

Example 10

A perfume manufacturer who makes 10 fragrances wants to prepare a gift package
containing 6 fragrances. How many combinations of fragrances are available?
The answer is

C(10, 6) = 10! / (6!(10 - 6)!) = (10 · 9 · 8 · 7 · 6!) / (6! · 4 · 3 · 2 · 1) = 210
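math.comb evaluates C(n, r) directly; the following lines (added for illustration, not part of the original text) confirm Examples 9 and 10.

import math

# math.comb(n, r) computes C(n, r) = n! / (r!(n - r)!)
print(math.comb(4, 3))    # 4    (the letters a, b, c, d taken 3 at a time)
print(math.comb(10, 6))   # 210  (gift packages of 6 fragrances out of 10)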

Tree Diagrams
A tree diagram is a device used to enumerate all the possible outcomes of a sequence of
experiments where each experiment can occur in a finite number of ways. The
construction of tree diagrams is illustrated in the following examples.
Example 11
Find the product A × B × C, where
A = {1, 2}, B = {a, b, c} and C = {3, 4}. The tree diagram yields the following outcomes:
(1, a, 3), (1, a, 4), (1, b, 3), (1, b, 4), (1, c, 3), (1, c, 4),
(2, a, 3), (2, a, 4), (2, b, 3), (2, b, 4), (2, c, 3), (2, c, 4)

Observe that the tree is constructed from left to right, and that the number of branches at
each point corresponds to the number of possible outcomes of the next experiment.
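The outcomes traced by such a tree can be enumerated with itertools.product; the sketch below is an added illustration, not part of the original text, and lists the 12 triples of Example 11.

from itertools import product

A, B, C = (1, 2), ("a", "b", "c"), (3, 4)
outcomes = list(product(A, B, C))   # every path through the tree
print(len(outcomes))                # 12
print(outcomes[:3])                 # [(1, 'a', 3), (1, 'a', 4), (1, 'b', 3)]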

Example 12
Mumba and Ened are to play a tennis tournament. The first person to win two games in a
row, or who wins a total of three games, wins the tournament.

(Figure: a tree diagram of the possible sequences of game winners, branching at each game on whether Mumba (M) or Ened (E) wins.)

Observe that there are 10 end points, which correspond to the 10 possible outcomes of the
tournament:

MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE

The path from the beginning of the tree to an end point indicates who won which game
in the tournament.

Basics of Probability

Given a sample space S, we need to assign to each event that can be obtained from S a
number, called the probability of the event. This number will indicate the relative
likelihood of the various events.
For an experiment whose n outcomes are equally likely, the probability of an event E
consisting of m of those outcomes can be found from the following basic probability
principle: the probability that event E occurs, written P(E), is

P(E) = m / n      (1)

This same result can also be given in terms of the cardinal number of a set, where n(E)
represents the number of elements in a finite set E. With the same assumptions given
above,

P(E) = n(E) / n(S)      (2)

Example 1
Suppose a fair coin is tossed twice. The sample space is S = {HH, HT, TH, TT}.
Set S contains 4 outcomes, all of which are equally likely. (This makes n = 4 in
formula (1) above.) Find the probability of each of the following outcomes.
a) E = {HT, TH}
   Event E contains two elements, so
   P(E) = 2/4 = 1/2
   By this result, exactly one head and one tail will show up 1/2 of the time when a fair
   coin is tossed twice.
b) Two heads
   Let event F = {HH} be the event that two heads are observed when a fair coin is
   tossed twice. Event F contains one element, so
   P(F) = 1/4
c) Three heads
   A fair coin tossed twice can never show three heads. If G is this event, then G = ∅,
   and P(G) = 0/4 = 0.
   The event is impossible.
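The probabilities in Example 1 can be confirmed by enumerating the sample space; the following Python sketch is an added illustration, not part of the original text.

from itertools import product
from fractions import Fraction

S = list(product("HT", repeat=2))                   # HH, HT, TH, TT, all equally likely
def prob(event):
    # P(E) = n(E) / n(S)
    return Fraction(sum(event(o) for o in S), len(S))

print(prob(lambda o: o.count("H") == 1))            # 1/2  (one head, one tail)
print(prob(lambda o: o.count("H") == 2))            # 1/4  (two heads)
print(prob(lambda o: o.count("H") == 3))            # 0    (impossible)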


Example 2
If a single playing card is drawn at random from an ordinary 52-card bridge deck,
find the probability of each of the following events.
a) An ace is drawn.
   There are four aces in the deck, out of 52 cards, so
   P(ace) = 4/52 = 1/13
b) A face card is drawn.
   Since there are 12 face cards,
   P(face card) = 12/52 = 3/13
c) A spade is drawn.
   The deck contains 13 spades, so
   P(spade) = 13/52 = 1/4
d) A spade or a heart is drawn.
   Besides the 13 spades, the deck contains 13 hearts, so
   P(spade or heart) = 26/52 = 1/2


Example 3
The manager of a department store has decided to make a study of the size of purchases
made by people coming into the store. To begin, he chooses a day that seems fairly
typical and gathers the following data. (Purchases have been rounded to the nearest
kwacha, with sales tax ignored.)

Amount of purchase           Number of customers    Probability (relative frequency)
K0 and under K2250           160                    0.280
K2250 and under K11250       84                     0.147
K11250 and under K13500      50                     0.088
K13500 and under K20250      136                    0.239
K20250 and under K22500      77                     0.135
K22500 and over              63                     0.111
Total                        570                    1.000

Probability Distributions

In Example 3 the outcomes were various purchase amounts, and a probability was
assigned to each outcome. By this process, a probability distribution can be set up; that is,
to each possible outcome of an experiment a number, called the probability of that
outcome, is assigned.

Example 4
Set up a probability distribution for the number of heads observed when a fair coin is
tossed twice.

Number of heads     Probability
0                   1/4
1                   2/4
2                   1/4
Total               1

The probability distribution that was set up suggests the following properties of
probability.
Let S = {S_1, S_2, S_3, . . . , S_n} be the sample space obtained from the union of n distinct
simple events S_1, S_2, S_3, . . . , S_n with associated probabilities P_1, P_2, P_3, . . . , P_n.
Then

1. 0 ≤ P_1 ≤ 1, 0 ≤ P_2 ≤ 1, . . . , 0 ≤ P_n ≤ 1
   (all probabilities are between 0 and 1 inclusive);

2. P_1 + P_2 + P_3 + . . . + P_n = 1
   (the sum of all probabilities for a sample space is 1);

3. P(S) = 1;

4. P(∅) = 0.

Addition Principle
Suppose E = {S_1, S_2, . . . , S_n}, where S_1, S_2, . . . , S_n are distinct simple events. Then
P(E) = P(S_1) + P(S_2) + . . . + P(S_n)

Example 5
Refer to the previous example and find the probability that a customer spends at least
K11,250 but less than K20,250.
This event is the union of two simple events: spending K11,250 to under K13,500 and
spending K13,500 to under K20,250. The probability of spending at least K11,250 but
less than K20,250 can thus be found by the addition principle. Let this event be A. Then
P(A) = P(spending K11,250 to under K13,500) + P(spending K13,500 to under K20,250)
     = 0.088 + 0.239 = 0.327

Addition Rule for Mutually Exclusive Events

For mutually exclusive events E and F,

P(E ∪ F) = P(E) + P(F)

Example 6
Use the probability distribution of Example 4 to find the probability of getting at least
one head when a fair coin is tossed twice.
Event E, 'at least one head', is the union of the mutually exclusive events 'two heads'
and 'one head and one tail'.
P(E) = P(two heads) + P(one head and one tail) = 1/4 + 2/4 = 3/4

Complement: P(E') = 1 - P(E) and P(E) = 1 - P(E')

In a particular experiment, P(E) = 3/8. Find P(E').

P(E') = 1 - P(E) = 1 - 3/8 = 5/8

Example 7
In Example 3 above, find the probability that a customer spends less than K22,500. Let E
be the event that a customer spends less than K22,500.
P(E) = 0.280 + 0.147 + 0.088 + 0.239 + 0.135 = 0.889
Alternatively, E' is the event that a customer spends K22,500 or more. From the table,
P(E') = 0.111, and P(E) = 1 - P(E') = 1 - 0.111 = 0.889.

Odds
The odds in favor of an event E are defined as the ratio of P(E) to P(E'), i.e. P(E)/P(E').

Example 8
Suppose the weather forecaster says that the probability of rain tomorrow is 2/5. Find
the odds in favor of rain tomorrow.

Let E be the event 'rain tomorrow'. Then E' is the event 'no rain tomorrow'. Since
P(E) = 2/5, we have P(E') = 3/5. By the definition of odds, the odds in favor of rain are

(2/5) / (3/5) = 2/3, written 2 to 3 or 2:3.

In general, if the odds favoring event E are m to n, then

P(E) = m / (m + n)  and  P(E') = n / (m + n)

Example 9
The odds that a particular bid will be the low bid are 8 to 13. Find the probability that the
bid will be the low bid.

Solution
Odds of 8 to 13 show 8 favorable chances out of 8 + 13 = 21 chances altogether.

P(bid will be the low bid) = 8 / (8 + 13) = 8/21

There is a 13/21 chance that the bid will not be the low bid.

Extended Addition Principle

For any two events E and F from a sample space S,
P(E ∪ F) = P(E) + P(F) - P(E ∩ F)

Example 10

If a single card is drawn from an ordinary deck, find the probability that it will be red or a
face card.
Let R and F represent the events 'red' and 'face card' respectively. Then

P(R) = 26/52,  P(F) = 12/52,  and  P(R ∩ F) = 6/52

(there are six red face cards in a deck). By the extended addition principle,

P(R ∪ F) = P(R) + P(F) - P(R ∩ F) = 26/52 + 12/52 - 6/52 = 32/52 = 8/13

Example 11
Suppose two fair dice are rolled. Find each of the following probabilities.

a) The first die shows a 2, or the sum is 6.
   The sample space consists of the 36 equally likely outcomes:

   (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
   (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
   (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
   (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
   (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
   (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

   Let A be the event 'the first die shows a 2' and B the event 'the sum is 6'. Then

   P(A) = 6/36,  P(B) = 5/36,  P(A ∩ B) = 1/36

   By the extended addition principle,

   P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 6/36 + 5/36 - 1/36 = 10/36 = 5/18

b) The sum is 5, or the second die is 4.

   P(sum is 5) = 4/36,  P(second die is 4) = 6/36,  P(sum is 5 and second die is 4) = 1/36

   P(sum is 5 or second die is 4) = 4/36 + 6/36 - 1/36 = 9/36 = 1/4
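Both parts of Example 11 can be checked by enumerating the 36 outcomes; the sketch below is an added illustration, not part of the original text.

from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))           # the 36 equally likely outcomes
def prob(event):
    return Fraction(sum(event(d1, d2) for d1, d2 in S), len(S))

# a) first die shows 2, or the sum is 6
print(prob(lambda d1, d2: d1 == 2 or d1 + d2 == 6))    # 5/18
# b) the sum is 5, or the second die is 4
print(prob(lambda d1, d2: d1 + d2 == 5 or d2 == 4))    # 1/4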

Often we are interested in how certain events are related to the occurrence of
other events. In particular, we may be interested in the probability of the
occurrence of an event given that another related event has occurred. Such
probabilities are referred to as conditional probabilities.
The conditional probability of event E given event F, written P(E|F), is

P(E|F) = P(E ∩ F) / P(F),   provided P(F) ≠ 0


Example 11

The training manager for a large stockbrokerage firm has noticed that
some of the firm's brokers use the firm's research advice, while other
brokers tend to go with their own feelings about which stocks will go up. To
see if the research department is better than just the feelings of the brokers,
the manager conducted a survey of 100 brokers, with results as shown in
the following table.

                        Picked stocks       Didn't pick stocks
                        that went up        that went up          Total
Used research           30                  15                    45
Didn't use research     30                  25                    55
Totals                  60                  40                    100

Letting A represent the event 'picked stocks that went up', and letting B represent the
event 'used research', we can find the following probabilities.

P(A) = 60/100 = 0.6
P(A') = 40/100 = 0.4
P(B) = 45/100 = 0.45
P(B') = 55/100 = 0.55

Suppose we want to find the probability that a broker using research will pick stocks that
go up. From the table above, of the 45 brokers who use research, 30 picked stocks that
went up, so

P(broker who uses research picks stocks that go up) = 30/45 ≈ 0.667

This is a different number from the probability that a broker picks stocks that go up, 0.6,
since we have additional information (the broker uses research) which reduces the
sample space. In other words, we found the probability that a broker picks stocks that go
up, A, given the additional information that the broker uses research, B. This is called the
conditional probability of event A given that event B has occurred, written P(A|B). In
the example above,

P(A|B) = P(A ∩ B) / P(B) = (30/100) / (45/100) = 30/45 ≈ 0.667
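The reduced sample space interpretation is easy to mirror in code; the short sketch below is an added illustration (the variable names are ours, not the text's) using the counts from the table.

used_research = 45
used_research_and_up = 30
picked_up_total = 60
n = 100

p_a = picked_up_total / n                            # P(A): picked stocks that went up
p_a_given_b = used_research_and_up / used_research   # P(A|B), reduced sample space
print(p_a, round(p_a_given_b, 3))                    # 0.6 0.667
# same answer from the ratio form P(A|B) = P(A and B) / P(B)
print(round((used_research_and_up / n) / (used_research / n), 3))   # 0.667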
Product Rule: For any events E and F,
P(E ∩ F) = P(F) · P(E|F)

Example 12

A class is 2/5 women and 3/5 men. Of the women, 25% are business majors. Find the
probability that a student chosen at random is a woman business major.

Solution
Let B and W represent the events 'business major' and 'woman', respectively. We want
to find P(B ∩ W). By the product rule,
P(B ∩ W) = P(W) · P(B|W)
Using the given information, P(W) = 2/5 = 0.4 and P(B|W) = 0.25.

Thus P(B ∩ W) = 0.4 × 0.25 = 0.10

Example 13

Suppose an investment firm is interested in the following events:

A = 'Common stock in XYZ Corporation gains 10% next year'
B = 'Gross National Product gains 10% next year'

The firm has assigned the following probabilities on the basis of available information:

P(A|B) = 0.8,  P(B) = 0.3

That is, the investment company believes the probability is 0.8 that the XYZ common
stock will gain 10% in the next year assuming that the GNP gains 10% in the same time
period. In addition, the company believes the probability is only 0.3 that the GNP will
gain 10% in the next year. Use the formula for the probability of an intersection to
calculate the probability that both XYZ common stock and the GNP gain 10% in the
next year.

Solution

We want to calculate P(A ∩ B). The formula gives

P(A ∩ B) = P(B) · P(A|B) = (0.3)(0.8) = 0.24

Thus, the probability, according to this investment firm, is 0.24 that both XYZ common
stock and the GNP will gain 10% in the next year.
In the previous section we showed that the probability of an event A may be substantially
altered by the assumption that an event B has occurred. However, this will not always
be the case. In some instances the assumption that event B has occurred will not alter the
probability of event A at all. When this is true, we call events A and B independent.

Events A and B are independent if the assumption that B has occurred does
not alter the probability that A occurs, i.e.
P(A|B) = P(A)

When events A and B are independent it will also be true that

P(B|A) = P(B)

Events that are not independent are said to be dependent.

Example 14

The probability that interest rates will rise has been assessed as 0.8. If they do rise, the
probability that the stock market index will drop is estimated to be 0.9. If the interest
rates do not rise, the probability that the stock market index will still drop is estimated as
0.4. What is the probability that the stock market index will drop?
Solution
P(A) = P(interest rates rise) = 0.8
P(B) = P(stock market index drops) = ?
Then the probability of A', the complement of A (interest rates do not rise), is
P(A') = 1 − 0.8 = 0.2.

P(B/A) = P(stock market index drops / interest rates rise) = 0.9
P(B/A') = P(stock market index drops / interest rates do not rise) = 0.4

By the multiplication rule,

P(B and A) = P(A) P(B/A) = 0.8 × 0.9 = 0.72 and
P(B and A') = P(A') P(B/A') = 0.2 × 0.4 = 0.08

Hence P(B) = P(B and A) + P(B and A') = 0.72 + 0.08 = 0.80.
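Example 14 is an application of the rule of total probability, P(B) = P(A)P(B/A) + P(A')P(B/A'). A short illustrative Python sketch using the figures given above:

```python
# Rule of total probability for Example 14 (interest rates and the stock index)
p_A = 0.8              # P(interest rates rise)
p_B_given_A = 0.9      # P(index drops / rates rise)
p_B_given_notA = 0.4   # P(index drops / rates do not rise)

p_notA = 1 - p_A
p_B = p_A * p_B_given_A + p_notA * p_B_given_notA
print(p_B)             # 0.8
```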

Example 15
Suppose we toss a fair die. Let B be the event that a number less than or equal to 4 is observed
and A the event that an even number is observed. Are events A and B independent?

P(B) = 4/6 = 2/3, since B = {1, 2, 3, 4}

P(A) = 3/6 = 1/2, since A = {2, 4, 6}

P(A ∩ B) = 2/6 = 1/3, where A ∩ B = {2, 4}

Now, given that A has occurred,

P(B/A) = P(A ∩ B)/P(A) = (1/3)/(1/2) = 2/3 = P(B)

Similarly,

P(A/B) = P(A ∩ B)/P(B) = (1/3)/(2/3) = 1/2 = P(A)

Therefore the events A and B are independent.


If events A and B are independent, the probability of the intersection of A and B equals the
product of the probabilities of A and B, i.e.,

P(A ∩ B) = P(A) P(B).

In the die-toss experiment,

P(A ∩ B) = P(A) · P(B) = (1/2)(2/3) = 1/3


Bayes Theorem

A posteriori Probabilities
Suppose three machines, A, B, and C, produce similar engine components. Machine A
produces 45 percent of the total components, machine B produces 30 percent, and
Machine C, 25 percent. For the usual production schedule, 6 percent of the components
produced by machine A do not meet established specifications; for machines B and
C, the corresponding figures are 4 percent and 3 percent. One component is selected at
random from the total output and is found to be defective. What is the probability that
the component selected was produced by machine A?

The answer to this question is found by calculating the probability after the outcomes of
the experiment have been observed. Such probabilities are called a posteriori
probabilities, as opposed to a priori probabilities, which give the likelihood
that an event will occur.

[Venn diagram: the sample space is partitioned into the events A, B and C, and the event D
cuts across them, forming A ∩ D, B ∩ D and C ∩ D.]

D is the event that a defective component is produced by machine A, machine B or
machine C.


The three mutually exclusive events A, B and C form a partition of the sample space S:
apart from being mutually exclusive, their union is precisely S.
The event D may be expressed as:

1.  D = (A ∩ D) ∪ (B ∩ D) ∪ (C ∩ D)

2.  The event that a component is defective and is produced by machine A is given
    by A ∩ D.

Thus, the posterior probability that a defective component selected was produced by
machine A is given by

P(A/D) = n(A ∩ D)/n(D) = P(A ∩ D)/P(D)
       = P(A ∩ D) / [P(A ∩ D) + P(B ∩ D) + P(C ∩ D)]    (1)

Next, using the product rule, we may express

P(A ∩ D) = P(A) P(D/A)
P(B ∩ D) = P(B) P(D/B), and
P(C ∩ D) = P(C) P(D/C)

so that (1) may be expressed in the form

P(A/D) = P(A) P(D/A) / [P(A) P(D/A) + P(B) P(D/B) + P(C) P(D/C)]    (2)

which is a special case of a result known as Bayes' Theorem.


Observe that the expression on the right of (2) involves the probabilities P(A), P(B), P(C)
and the conditional probabilities P(D/A), P(D/B), and P(D/C), all of which may be
calculated in the usual fashion. In fact, by displaying these quantities on a tree diagram,
we obtain Figure 1.0. We may compute the required probability by substituting the
relevant quantities into (2), or we may make use of the following device:

P(A/D) = (product of the probabilities along the limb through A) /
         (sum of the products of the probabilities along each limb terminating at D)

Figure 1.0  Tree diagram for the machine example

Machine (Step 1)   Outcome (Step 2)   Probability
P(A) = 0.45        P(D/A) = 0.06      P(A ∩ D) = P(A)·P(D/A) = 0.027
                   P(D'/A) = 0.94     P(A ∩ D') = P(A)·P(D'/A) = 0.423
P(B) = 0.30        P(D/B) = 0.04      P(B ∩ D) = P(B)·P(D/B) = 0.012
                   P(D'/B) = 0.96     P(B ∩ D') = P(B)·P(D'/B) = 0.288
P(C) = 0.25        P(D/C) = 0.03      P(C ∩ D) = P(C)·P(D/C) = 0.0075
                   P(D'/C) = 0.97     P(C ∩ D') = P(C)·P(D'/C) = 0.2425

In either case, we obtain

P(A/D) = (0.45)(0.06) / [(0.45)(0.06) + (0.3)(0.04) + (0.25)(0.03)]
       = 0.027 / (0.027 + 0.012 + 0.0075)
       = 0.027 / 0.0465
       ≈ 0.581

Before looking at any further examples, let us state the general form of Bayes' Theorem.
Let A1, A2, . . ., An be a partition of a sample space S and let E be an event of the
experiment such that P(E) ≠ 0. Then the posterior probability P(Ai/E), 1 ≤ i ≤ n, is
given by

P(Ai/E) = P(Ai) P(E/Ai) / [P(A1) P(E/A1) + P(A2) P(E/A2) + . . . + P(An) P(E/An)]    (3)
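Equation (3) translates directly into a few lines of code. The sketch below is an illustrative Python implementation (the function name and argument order are assumptions, not part of the text); it reproduces the machine example of this section.

```python
def posterior(priors, likelihoods):
    """Bayes' Theorem, equation (3): posterior P(Ai/E) for a partition A1..An.

    priors      -- list of P(Ai)
    likelihoods -- list of P(E/Ai), in the same order
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(Ai ∩ E)
    p_e = sum(joint)                                       # P(E), by total probability
    return [j / p_e for j in joint]

# Machine example: P(A)=0.45, P(B)=0.30, P(C)=0.25; P(D/A)=0.06, P(D/B)=0.04, P(D/C)=0.03
post = posterior([0.45, 0.30, 0.25], [0.06, 0.04, 0.03])
print([round(p, 3) for p in post])   # [0.581, 0.258, 0.161]
```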

Problems

1)  In a certain city, 40 percent of the people consider themselves Movement for
    Multiparty Democracy (MMD), 35 percent consider themselves to be United Party
    for National Development (UPND) and 25 percent consider themselves to be
    independents (I). During a particular election, 45 percent of the MMDs voted, 40
    percent of the UPND voted and 60 percent of the independents voted. Suppose a
    person is randomly selected:
    a)  Find the probability that the person voted.
    b)  If the person voted, find the probability that the voter is
        i)   MMD
        ii)  UPND
        iii) Independent.

2)  Three girls, Chanda, Mumba and Chileshe, pack okra in a factory. From the batch
    allotted to them Chanda packs 55%, Mumba 30% and Chileshe 15%. The
    probability that Chanda breaks some okra in a packet is 0.7, and the respective
    probabilities for Mumba and Chileshe are 0.2 and 0.1. What is the probability
    that a packet with broken okra found by the checker was packed by
    a)  Chanda?
    b)  Mumba?
    c)  Chileshe?

3)  A publisher sends advertising material for an accounting text to 80% of all
    professors teaching the appropriate accounting courses. Thirty percent of the
    professors who received this material adopted the book, as did 10% of the
    professors who did not receive the material. What is the probability that a
    professor who adopts the book has received the advertising material?

Solutions

1.  Let V be the event that the person voted.

    P(MMD) = 0.40,  P(UPND) = 0.35,  P(I) = 0.25
    P(V/MMD) = 0.45,  P(V/UPND) = 0.40,  P(V/I) = 0.60

    a)  P(V) = P(MMD)·P(V/MMD) + P(UPND)·P(V/UPND) + P(I)·P(V/I)
             = 0.40(0.45) + 0.35(0.40) + 0.25(0.60)
             = 0.18 + 0.14 + 0.15 = 0.47

    b)  i)   P(MMD/V) = P(MMD)·P(V/MMD)/P(V) = 0.18/0.47 ≈ 0.383
        ii)  P(UPND/V) = P(UPND)·P(V/UPND)/P(V) = 0.14/0.47 ≈ 0.298
        iii) P(I/V) = P(I)·P(V/I)/P(V) = 0.15/0.47 ≈ 0.319


2.  Let D, M and H denote the events that a packet was packed by Chanda, Mumba and
    Chileshe respectively, and let B be the event that a packet contains broken okra.

    P(D) = 0.55,  P(M) = 0.30,  P(H) = 0.15
    P(B/D) = 0.7,  P(B/M) = 0.2,  P(B/H) = 0.1

    P(B) = P(D)·P(B/D) + P(M)·P(B/M) + P(H)·P(B/H)
         = 0.55(0.7) + 0.30(0.2) + 0.15(0.1)
         = 0.385 + 0.06 + 0.015 = 0.46

    a)  P(D/B) = P(D)·P(B/D)/P(B) = 0.385/0.46 ≈ 0.837
    b)  P(M/B) = P(M)·P(B/M)/P(B) = 0.06/0.46 ≈ 0.1304
    c)  P(H/B) = P(H)·P(B/H)/P(B) = 0.015/0.46 ≈ 0.0326

3.  Let R be the event that the professor received the material, and A the event that the
    professor adopted the book.
    P(R) = 0.8,  P(R') = 0.2
    P(A/R) = 0.30,  P(A/R') = 0.10

    P(R/A) = P(R ∩ A)/P(A) = P(R)·P(A/R) / [P(R)·P(A/R) + P(R')·P(A/R')]
           = 0.8(0.30) / [0.8(0.30) + 0.2(0.10)]
           = 0.24 / (0.24 + 0.02) = 0.24/0.26 ≈ 0.923

Learning Objectives
After working through this Chapter, you should be able to

List the rules of probability.

Explain conditional probability, independent events, mutually exclusive events.

Apply the Bayes Theorem to find conditional probabilities

Define combinations, permutation and be able to apply such results to problems.


CHAPTER 5
PROBABILITY DISTRIBUTION

Reading
Newbold Chapters 4 (not 4.4) and only 5.5 in Chapter 5
Wonnacott and Wonnacott Chapter 4
Tailoka Frank P Chapter 9

Introductory Comments
This Chapter introduces three useful standard distributions: two for counts (discrete
probability distributions) and one continuous probability distribution. These are so
often used that everyone should be familiar with them. We need to know the mean, the
variance and how to find simple probabilities.

5.0

Discrete Random Variables

A random variable may be defined roughly as a variable that takes on different
numerical values because of chance. Random variables are classified as either
discrete or continuous. A discrete random variable is one that can take on only a
finite or countable number of distinct values. For example, the number of people
entering a shop is countable (the values are 0, 1, 2, etc.), and the outcomes on one
roll of a fair die are limited to 1, 2, 3, 4, 5 and 6.

A random variable is said to be continuous in a given range if the variable can
assume any value in that interval. A continuous variable can be
measured to any degree of accuracy by using smaller and smaller units of
measurement. Examples of continuous variables include weight, length,
velocity, distance, time, and temperature. While discrete variables can be
counted, continuous variables can only be measured to some degree of accuracy.

A probability distribution of a discrete random variable x whose value at x is
f(x) possesses the following properties:


1.  f(x) ≥ 0 for all real values of x

2.  Σx f(x) = 1

Property 1 simply states that probabilities are greater than or equal to zero. The
second property states that the sum of the probabilities in a probability
distribution is equal to 1. The notation Σx f(x)
means the sum of the values of f(x) over all the values that x takes on. We will

ordinarily use the term probability distribution to refer to both discrete and
continuous variables; other terms are sometimes used to refer to probability
distributions (also called probability functions).
Probability distributions of discrete random variables are often referred to as
probability mass functions or simply mass functions because the probabilities are
massed at distinct points, for example along the x axis.
Probability distributions of continuous random variables are referred to as
probability density functions or density functions.

5.1

Cumulative Distribution Functions


Given a random variable X, the value of the cumulative distribution function at
x, denoted F(x), is the probability that X takes on a value less than or equal to
x. Hence

F(x) = P(X ≤ x)    (1)

In the case of a discrete random variable, it is clear that

F(c) = Σ_{x ≤ c} f(x)    (2)

The symbol Σ_{x ≤ c} f(x) means the sum of the values of f(x) for all values of x
less than or equal to c.


Example 1
Shoprite is interested in diversifying its product line into the soft goods market.
Mr Phiri, Vice President in charge of mergers and acquisitions, is negotiating the
acquisition of Quicksave, a discount shop. To determine the price Shoprite
would have to pay per share for Quicksave, he sets up the probability distribution
for the stock price shown in the table below.
Probability distribution and cumulative distribution for the price of Quicksave
common stock.

Price of Quicksave common stock x   Probability f(x)   Cumulative probability F(x)
K74 250                             0.08               0.08
K76 500                             0.15               0.23
K78 750                             0.53               0.76
K81 000                             0.20               0.96
K83 250                             0.04               1.00

The probability that the price would be K78 750 or less is

P(x ≤ K78 750) = F(K78 750) = 0.08 + 0.15 + 0.53 = 0.76
Similarly, P(x ≤ K76 500) = F(K76 500) = 0.23


A graph of the cumulative distribution function is a step function; that is, the values
change in discrete steps at the indicated values of the random variable x.

[Figure: graph of the cumulative distribution F(x) of the price of Quicksave common
stock, a step function rising from 0 to 1.00 at the prices K74 250, K76 500, K78 750,
K81 000 and K83 250.]

5.2

Probability Distribution of Discrete Random Variables


We will discuss the binomial and Poisson probability distributions of discrete
random variables.

The expected value (mean) of a discrete random variable x is

μ = E(x) = Σ_{all x} x P(x)

The variance of a discrete random variable x is

σ² = E[(x − μ)²] = Σ_{all x} (x − μ)² P(x)

In general, if g(x) is any function of the discrete random variable x, then

E[g(x)] = Σ_{all x} g(x) P(X = x)

For example,

E(20x) = Σ 20x P(X = x)
E(x²) = Σ x² P(X = x)
E(X − 5) = Σ (x − 5) P(X = x)

Example 2
The random variable X has the following distribution for x = 1, 2, 3, 4.

x          1      2      3      4
P(X = x)   0.02   0.35   0.53   0.10

Calculate:

a)  E(x)
b)  E(5x − 3)
c)  E(X²)
d)  6E(x) + 8
e)  E(5x² + 2)

Solution
a)  E(x) = Σ x P(X = x)
         = 1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)
         = 0.02 + 0.70 + 1.59 + 0.40
         = 2.71


b)  E(5x − 3) = 5E(x) − 3
              = 5[1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)] − 3
              = 5(2.71) − 3
              = 13.55 − 3
              = 10.55

c)  E(X²) = Σ x² P(X = x)
          = 1²(0.02) + 2²(0.35) + 3²(0.53) + 4²(0.10)
          = 0.02 + 1.4 + 4.77 + 1.6
          = 7.79

d)  6E(x) + 8 = 6 Σ x P(X = x) + 8
              = 6(2.71) + 8 = 16.26 + 8
              = 24.26

e)  E(5x² + 2) = 5E(x²) + 2
               = 5 Σ x² P(X = x) + 2
               = 5(7.79) + 2
               = 40.95

In general, the following results hold when X is a discrete random variable.

1)  E(a) = a, where a is any constant.
2)  E(aX) = aE(X), where a is any constant.
3)  E(aX + b) = aE(X) + b, where a and b are any constants.
4)  E[f1(X) + f2(X)] = E[f1(X)] + E[f2(X)], where f1 and f2 are functions of X.


Variance, Var (x)

As for the variance, the following results are useful.


1)  Var(a) = 0, where a is any constant.

2)  Var(aX) = a² Var(X), where a is any constant.

3)  Var(aX + b) = a² Var(X), where a and b are any constants.

Example 3
For the data in Example 2, calculate the following:

a)  Var(5x − 3)

b)  Var(4x)

c)  Var(3x + 2)

Solution
a)  Var(5x − 3) = 25 Var(x)

    We will need Var(x) = E(x²) − [E(x)]²

    E(X) = Σ x P(X = x) = 2.71
    E(X²) = Σ x² P(X = x) = 7.79

    Var(x) = E(X²) − [E(X)]² = 7.79 − (2.71)² = 0.4459

    Therefore Var(5x − 3) = 25 Var(x) = 25(0.4459) = 11.1475

b)  Var(4x) = 16 Var(x) = 16(0.4459) = 7.1344

c)  Var(3x + 2) = 9 Var(x) = 9(0.4459) = 4.0131
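The expectation and variance rules above can be checked with a short script. A minimal Python sketch using the distribution of Example 2 (variable names are illustrative):

```python
# Discrete distribution of Example 2: P(X=1)=0.02, P(X=2)=0.35, P(X=3)=0.53, P(X=4)=0.10
dist = {1: 0.02, 2: 0.35, 3: 0.53, 4: 0.10}

mean = sum(x * p for x, p in dist.items())        # E(X)  = 2.71
mean_sq = sum(x**2 * p for x, p in dist.items())  # E(X²) = 7.79
var = mean_sq - mean**2                           # Var(X) = 0.4459

# Linear-transformation rules: E(aX + b) = aE(X) + b, Var(aX + b) = a²Var(X)
print(round(5 * mean - 3, 4))   # E(5X − 3)  = 10.55
print(round(25 * var, 4))       # Var(5X − 3) = 11.1475
```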

Example 4
A risky investment involves paying K300 000 that will return K2 700 000 (for a net
profit of K2 400 000) with probability 0.3 or K0.00 (for a net loss of K300 000) with
probability 0.7. What is your expected net profit from this investment?
Solution
x       2,400,000   −300,000
P(x)    0.3         0.7

(Note that a loss is treated as a negative profit.)

Then E(x) = Σ x P(x) = 2,400,000(0.3) + (−300,000)(0.7) = 720,000 − 210,000 = 510,000
Your expected net profit on an investment of this kind is K510,000. If you were to make
a very large number of such investments, some would result in a net profit of K2,400,000 and
others would result in a net loss of K300,000. However, in the long run, your average
net profit per investment would be K510,000.
5.3

The Binomial Distribution


The Binomial distribution, in which there are two possible outcomes on
each experimental trial, is undoubtedly the most widely applied
probability distribution of a discrete random variable. It has been used to
describe a large variety of processes in business and the social sciences as
well as other areas. The Bernoulli process, named after James Bernoulli (1654-1705),
gives rise to the Binomial distribution.
The Bernoulli process has the following characteristics.
a)

On each trial, there are two mutually exclusive possible outcomes, which
are referred to as success and failure. In somewhat different language, the
sample space of possible outcomes on each experimental trial is S =
{failure, success}.

b)  The probability of a success will be denoted by p; p remains constant
    from trial to trial. The probability of a failure will be denoted by q, and q is
    always equal to 1 − p.


c)

The trials are independent. That is, the outcomes on any given trial or
sequence of trials does not affect the outcomes on subsequent trials.

Suppose we toss a coin 3 times; then we may treat each toss as one Bernoulli trial.
The possible outcomes on any particular trial are a head and a tail. Assume that
the appearance of a head is a success. Equally, we may choose to refer to the
appearance of a defective item in a production process as a success, or, if a series of
births is treated as a Bernoulli process, the appearance of a female (or male) may be
classified as a success.
Consider the experiment of tossing a fair coin three times. The possible sequences of
outcomes are
HTH, HHH, HHT, THH, TTT, THT, TTH, HTT
Since the probabilities of a success and a failure on a given trial are, respectively, p
and q, the probability of the outcome {HTH}, for instance, is pqp = p²q, where p is
the probability of observing a head and q is the probability of observing a tail.

Outcome   Probability
HTH       pqp = p²q
HHH       ppp = p³
HHT       ppq = p²q
THH       qpp = p²q
THT       qpq = pq²
TTT       qqq = q³
TTH       qqp = pq²
HTT       pqq = pq²

We can obtain the number of such sequences from the formula for the number of
combinations of n objects taken x at a time. Thus the number of possible sequences in
which two heads can occur is C(3, 2).

Thus C(n, x) = n! / [x!(n − x)!], so

C(3, 2) = 3!/(2!1!) = 3

These are the events {HTH}, {HHT}, {THH}


Therefore the probability of exactly 2 heads is P(x = 2) = C(3, 2) q p².

In the case of the fair coin, we assign a probability of 1/2 to p and 1/2 to q. Hence

P(x = 2) = C(3, 2)(1/2)(1/2)² = 3/8.

This result may be generalized to obtain the probability of (exactly) x successes in n
trials of a Bernoulli process. Let us assume n − x failures occurred followed by x
successes, in that order. We may then represent this sequence as

q q q . . . q    p p p . . . p
(n − x failures)  (x successes)

The probability of this particular sequence is q^(n−x) p^x. The number of possible sequences
of n trials resulting in exactly x successes is C(n, x).
Therefore, the probability of obtaining x successes in n trials of a Bernoulli process is
given by

f(x) = C(n, x) q^(n−x) p^x   for x = 0, 1, 2, . . ., n

If we denote by X the random variable number of successes in these n trials, then

f(x) = P(X = x)

The fact that this is a probability distribution is verified by noting the following
conditions.
1)

f ( x) 0 for all real numbers of x

75

2)

f ( x) 1
x

Therefore, the term binomial probability distribution, or simply binomial distribution, is
usually used to refer to the probability distribution resulting from a Bernoulli process.
In problems where the assumptions of a Bernoulli process are met, we can obtain the
probabilities of zero, one, or more successes in n trials from the respective terms of the
binomial expansion of (q + p)^n, where q and p denote the probabilities of failure and
success on a single trial and n is the number of trials.

Example 5
The tossing of a fair coin 3 times was used earlier as an example of a Bernoulli process.
Compute the probabilities of all possible numbers of heads; this establishes a
particular binomial distribution.

Solution
This problem is an application of the binomial distribution with p = 1/2 and n = 3. Letting x
represent the random variable number of heads, the probability distribution is as
follows:

x (number of heads)   P(x)
0                     C(3, 0)(1/2)⁰(1/2)³ = 1/8
1                     C(3, 1)(1/2)¹(1/2)² = 3/8
2                     C(3, 2)(1/2)²(1/2)¹ = 3/8
3                     C(3, 3)(1/2)³(1/2)⁰ = 1/8

Example 6
A machine that produces stampings for car engines is not working properly and is
producing 15% defectives. The defective and nondefective stampings proceed from the
machine in a random manner. If 4 stampings are randomly collected, find the probability
that 2 of them are defective.

Solution
Let p = 0.15 be the probability that a single stamping will be defective and let X equal the
number of defectives in n = 4 trials. Then

q = 1 − p = 1 − 0.15 = 0.85 and

P(x) = C(4, x) p^x q^(4−x) = [4! / (x!(4 − x)!)] (0.15)^x (0.85)^(4−x),   x = 0, 1, 2, 3, 4

Therefore, to find the probability of x = 2 defectives in a sample of n = 4, substitute x = 2 into
the formula for P(x) to obtain

P(2) = [4! / (2!(4 − 2)!)] (0.15)²(0.85)² = 6(0.01625625) = 0.0975375 ≈ 0.0975

The mean, variance and standard deviation of a binomial random variable are given by:

Mean:                 μ = np
Variance:             σ² = npq
Standard deviation:   σ = √(npq)

To calculate the values of μ and σ in Example 6, substitute n = 4 and p = 0.15 into the
formulas:

μ = np = 4(0.15) = 0.60
σ = √(npq) = √((4)(0.15)(0.85)) = √0.51 ≈ 0.714

Example 7
Payani Serenje owns 5 stocks. The probability that each stock will rise in price is 0.6.
What is the probability that three out of the five stocks will rise in price?

Solution
n = 5, p = 0.6, q = 1 − p = 0.4

Let X be the number of stocks that rise in price. Then

P(X = 3) = C(5, 3)(0.6)³(0.4)²
         = [5!/(3!2!)](0.216)(0.16)
         = 10(0.216)(0.16)
         = 0.3456
         ≈ 0.346

From the tables, with n = 5 and p = 0.6,

P(3) = P(X ≤ 3) − P(X ≤ 2) = 0.663 − 0.317 = 0.346
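The binomial probabilities in Examples 6 and 7 can be reproduced with math.comb. A small illustrative Python sketch (the function name is an assumption):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binom_pmf(3, 5, 0.6), 4))    # Example 7: 0.3456
print(round(binom_pmf(2, 4, 0.15), 4))   # Example 6: 0.0975

# Mean and standard deviation of a binomial variable (Example 6)
n, p = 4, 0.15
mu = n * p                        # 0.60
sigma = (n * p * (1 - p)) ** 0.5  # ≈ 0.714
```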

5.4

The Poisson Distribution


The Poisson distribution is named after the French physicist and mathematician
Siméon-Denis Poisson, who studied it in the early 1800s. The Poisson distribution is a discrete
probability distribution with the following formula:

P(x) = λ^x e^(−λ) / x!,   for x = 0, 1, 2, . . .

where P(x) is the probability that a variable with a Poisson distribution equals x,
λ is the mean or expected value of the Poisson distribution, and e is
approximately 2.718 and is the base of the natural logarithms.
One reason why the Poisson distribution is important in statistics is that it can be
used as an approximation to the binomial distribution. If n (the number of trials)
is large and P (the probability of success) is small, the probability can be
approximated by the Poisson distribution with λ = np. Experience indicates
that the approximation is adequate for most practical purposes if n is at least 20
and P is no greater than 0.05.

The Poisson distribution has been used to describe the probability function of such
situations as:
1)  Product demand
2)  Demand for service
3)  Number of telephone calls that come through a switchboard
4)  Number of death claims per day received by an insurance company
5)  Number of breakdowns of an electronic computer per month

All of the preceding have two elements in common:

1)  The given occurrence can be described in terms of a discrete random variable,
    which takes on the values 0, 1, 2, and so forth.

2)  There is some rate that characterizes the process producing the outcome. The rate
    is the number of occurrences per interval of time or space.


For instance, product demand can be characterized by the number of units purchased in a
specified period. Product demand may be viewed as a process that produces random
occurrences in continuous time.
The characteristics of a Poisson distribution are as follows:

1)  The experiment consists of counting the number of times a particular event occurs
    during a given unit of time, or in a given area or volume (or any other unit of
    measurement).

2)  The probability that an event occurs in a given unit of time, area, or volume is
    independent of the number that occur in other units.

Note that the most important difference between the Binomial and the Poisson
distributions is that in the Binomial distribution we find the probability of a number of
successes in n trials, whereas for the Poisson distribution we find the probability of a
number of successes per unit of time (or space).
Example 7
Suppose the random variable X, the number of a company's absent employees on
Tuesdays, has (approximately) a Poisson probability distribution. Assuming that the
average number of Tuesday absentees is 3.4:
a)  Find the mean and standard deviation of x, the number of absent employees on
    Tuesday.

b)  Find the probability that exactly 3 employees are absent on a given Tuesday.

c)  Find the probability that at least two employees are absent on a Tuesday.

Solution
a)  The mean and variance of a Poisson distribution are both equal to λ. Thus for this
    example

    μ = λ = 3.4,   σ² = λ = 3.4

    Therefore the standard deviation is σ = √3.4 ≈ 1.84

b)  We want the probability that exactly three employees are absent on Tuesday. The
    probability distribution for x is

    P(x) = λ^x e^(−λ) / x!

    With λ = 3.4, x = 3, and e^(−3.4) = 0.033373 (from Table 2),

    P(3) = (3.4)³ e^(−3.4) / 3! = (3.4)³(0.033373)/6 ≈ 0.2186

c)  To find the probability that at least two employees are absent on Tuesday, we
    need to find

    P(X ≥ 2) = P(2) + P(3) + . . . = Σ_{x ≥ 2} P(x)

    Alternatively, we could use the complementary event:

    P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − [P(0) + P(1)]
             = 1 − [(3.4)⁰ e^(−3.4)/0! + (3.4)¹ e^(−3.4)/1!]
             = 1 − [0.033373 + (3.4)(0.033373)]
             = 1 − 0.1468412 = 0.8531588
             ≈ 0.8532
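The Poisson calculations of Example 7 can be verified directly from the formula P(x) = λ^x e^(−λ)/x!. A minimal illustrative Python sketch:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 3.4
print(round(poisson_pmf(3, lam), 4))                          # part (b): ≈ 0.2186
p_at_least_2 = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)  # complement of P(X ≤ 1)
print(round(p_at_least_2, 4))                                 # part (c): ≈ 0.8532
```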

Example 8

On Saturdays at Southdown, a small airport in Kalulushi, airplanes arrive at an average of


3 for the one hour period 13 00 hours to 14 00 hours. If these arrivals are distributed
according to the Poisson probability distribution, what are the probabilities that:

a)  Exactly zero airplanes will arrive between 13 00 hours and 14 00 hours next
    Saturday?

b)  Either one or two airplanes will arrive between 13 00 hours and 14 00 hours next
    Saturday?

c)  A total of exactly two airplanes will arrive between 13 00 hours and 14 00 hours
    during the next three Saturdays?


Solution
a)  λ = 3, and we let X be the number of arrivals during the specified time period.

    P(0) = 3⁰ e^(−3) / 0! = e^(−3) = 0.049787 ≈ 0.0498

    (From the table, we have 0.049787.)

b)  P(X = 1 or X = 2) = P(X = 1) + P(X = 2)
                      = 3¹ e^(−3)/1! + 3² e^(−3)/2!
                      = e^(−3)(3 + 9/2)
                      = (15/2)(0.049787)
                      ≈ 0.3734

c)  A total of exactly two arrivals in three Saturdays during the period 13 00 hours to
    14 00 hours can be obtained, for example, by having two arrivals on the first Saturday,
    none on the second, and none on the third during the specified one-hour period.
    The ways in which the event in question can occur are shown in the table below.

    Number of arrivals
    Saturday 1   Saturday 2   Saturday 3
    2            0            0
    0            2            0
    0            0            2
    1            1            0
    1            0            1
    0            1            1

    Number of ways of obtaining a total of exactly 2 arrivals in 3 Saturdays.


    3[P(X = 2)][P(X = 0)]² + 3[P(X = 1)]²[P(X = 0)]
    = 3 (3² e^(−3)/2!)(3⁰ e^(−3)/0!)² + 3 (3¹ e^(−3)/1!)²(3⁰ e^(−3)/0!)
    = (27/2) e^(−9) + 27 e^(−9)
    = (81/2) e^(−9)
    ≈ 40.5(0.000123)
    ≈ 0.0050

5.5

Continuous Random Variables


The probability distribution of continuous random variables is also important in
statistical theory. They are a theoretical representation of a continuous random
variable such as the time taken in minutes to do some work, or the mass in
grammes of a bag of salt.
The continuous random variable is specified by its probability density function,
which is written f (x) where f ( x) 0 throughout the range of values for which
x is defined. The probability density function ( p.d . f ) can be represented by a
curve, and the probabilities are given by the area under the curve.
For a continuous random variable x that assumes a value in the interval a ≤ x ≤ b,

P(a ≤ x ≤ b) = ∫_a^b f(x) dx,  assuming the integral exists.

Similar to the requirements for a discrete probability distribution, we require

f(x) ≥ 0  and  ∫ f(x) dx = 1.

If x is a continuous random variable with p.d.f. f(x), then

Var(x) = ∫ x² f(x) dx − μ²,  where μ = E(x) = ∫ x f(x) dx;

the standard deviation of x is often written as σ = √Var(x).


5.6

The Normal Distribution


The normal distribution plays a central role in statistical theory and
practice, particularly in the area of statistical inference.
An important characteristic of the normal distribution is that we need to
know only the mean and standard deviation to compute the entire distribution.
The normal probability density function is defined by the equation

f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))

The normal distribution is perfectly symmetric about its mean μ.
Computing the area over intervals under the normal probability distribution is a
difficult task. As a result, we will use the computed areas listed in Table 3.

Example 1
Suppose you have a normal random variable x with μ = 50 and σ = 15.
Find the probability that x will fall within the interval 30 ≤ x ≤ 70.

Solution
We compute the Z-score (or standard score) for the measurement x; the
standard score is defined by

Z = (Value − Mean)/Standard deviation = (x − μ)/σ

Thus, for x = 30,

Z = (30 − 50)/15 = −1.33

Because x = 30 lies to the left of the mean, the corresponding Z-score
is negative and of the same numerical value as the Z-score corresponding
to x = 70:

Z = (70 − 50)/15 = 20/15 = 1.33

[Figure: normal frequency function with μ = 50 and σ = 15, showing the area between
x = 30 and x = 70.]


To find the area corresponding to a Z-score of 1.33, we first locate the
value 1.3 in the left-hand column. Since this column lists Z values to one decimal
place only, we refer to the top row of the table to get the second decimal place,
0.03. Finally, we locate the number where the row labeled Z = 1.3 and the
column labeled 0.03 meet. This number represents the area between the mean
and the measurement that has a Z-score of 1.33:
A = 0.4082
That is, the probability that x will fall between 50 and 70 is 0.4082. By symmetry, the
same area lies between 30 and 50, so the required probability is 2(0.4082) = 0.8164.
Example 2
Use Table 1 to determine the area to the right of the Z-score 1.64 for the standard normal
distribution, i.e., find P(Z > 1.64).
Solution

[Figure: standard normal distribution, μ = 0 and σ = 1, with the area to the right of
Z = 1.64 shaded.]

The probability that a normal random variable will be more than 1.64 standard deviations
to the right of its mean is indicated in the figure above. Because the normal distribution
is symmetric, half of the total probability (0.5) lies to the right of the mean and half to the
left. Therefore, the desired probability is P(Z > 1.64) = 0.5 − A,
where A is the area between 0 and Z = 1.64 as shown in the figure.
Referring to Table 1, the area A corresponding to Z = 1.64 is 0.4495, so
P(Z > 1.64) = 0.5 − A = 0.5 − 0.4495 = 0.0505.
Example 3
Find the probability that the value of the standard normal variable will be between −1.23
and +1.14.
Solution
Table 1 shows that the area under the standard normal curve between 0 and 1.23 is
0.3907, so the area between −1.23 and 0 must also be 0.3907. Table 1 shows that the
area between 0 and 1.14 is 0.3729. Thus, the area between −1.23 and +1.14 equals
0.3907 + 0.3729 = 0.7636, which means that the probability we want equals 0.7636.

[Figure: standard normal curve with the area between −1.23 and +1.14 shaded.]

Example 4
Find the probability that the value of the standard normal variable will be between 0.43
and 1.55.


Solution

[Figure: standard normal curve with the area between 0.43 and 1.55 shaded.]

From Table 1, the area between 0 and 1.55 is 0.4394 and that between 0 and 0.43 is
0.1664. Therefore the area between 0.43 and 1.55 is 0.4394 − 0.1664 = 0.2730.
The Normal Distribution As An Approximation To The Binomial Distribution
If n (the number of trials) is large and p (the probability of success) is not too close to 0 or 1,
the probability distribution of the number of successes occurring in n Bernoulli trials can be
approximated by a normal distribution. Experience indicates that the approximation is fairly
accurate as long as np ≥ 5 when p ≤ 1/2, and n(1 − p) ≥ 5 when p > 1/2.

Example 5

1
. A firm has 100
2
such machines and whether one is down, is statistically independent of whether another is
not down. What is the probability that at least 60 machines will be down?
The probability that a machine will be down for repairs next week is

Solution
The number of machines down for repair has a binomial distribution with mean equal to
1 1
100 or 50. Because of the continuity correction, the probability that the
2 2
number down for repairs is 60 or more can be approximated by the probability that the
value of a normal variable with mean equal to 50 and standard deviation equal to 5
exceeds 59.50. The value of the standard normal variable corresponding to 59.50 is (5950) 5, or 1.9. Table 3 shows that the area under the standard normal curve between

87

zero is 1.9 is 0.4713, so the area to the right of 1.9 must equal 0.5000 0.4713 = 0.0287.
This is the probability that at least 60 machines will be down for repair.
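A quick check of the normal approximation with the continuity correction, again using statistics.NormalDist (illustrative sketch only):

```python
from statistics import NormalDist

# Example 5: X ~ Binomial(n=100, p=0.5); approximate P(X ≥ 60)
n, p = 100, 0.5
mu = n * p                          # 50
sigma = (n * p * (1 - p)) ** 0.5    # 5

# Continuity correction: P(X ≥ 60) ≈ P(normal variable > 59.5)
z = (59.5 - mu) / sigma             # 1.9
print(round(1 - NormalDist().cdf(z), 4))   # ≈ 0.0287
```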

Learning Objectives
After working through this Chapter, you should be able to:

Give the formal definition of a random variable, and distinguish between a


random variable and the values it takes.

Explain the difference between continuous and discrete random variables.

Discuss such distributions as Binomial, Poisson, and Normal and calculate


probabilities of events for such random variables.

Find the mean and the variance of the binomial, Poisson and Normal distributions.


Sample Examination Questions

1.  a)

It is estimated that 75% of a grapefruit crop is good, the other 25% have
rotten centers that cannot be detected unless the grapefruit is cut open.
The grapefruit are sold in sacks of 6. Let r be the number of good
grapefruit in the sack.
i)

Make a histogram of the probability distribution of r.

ii)

What is the probability of getting no more than one bad grapefruit


in a sack?

iii)

What is the probability of getting at least one grapefruit in a sack?

iv)

What is the expected number of good grapefruit in a sack?

v)

What is the standard deviation of the r probability distribution?

b)

Let x have a normal distribution with μ = 10 and σ = 2. Find the
probability that an x value selected at random from the distribution is
between 11 and 14.

2.  a)  In a lottery, you pay K12 500 to choose a number (integer) between 0 and
        9999, inclusive. If the number is drawn, you win K12 500,000. What is
        your expected gain (or loss) per play?

b)

A large hotel knows that on average 2% of its customers require a special


diet for medical reasons. It is hosting a conference for 500 people.
i)

Which probability distribution would you suggest for calculating


the exact probability that no customer at the conference will
require a special diet? Calculate this probability.

ii)

Which probability distribution do you suggest is an approximation


to this and why? Calculate an approximate probability that no
customers require a special diet.

iii)

Compare your answers to (i) and (ii).

iv)

From past records the hotel knows that 0.2% of its customers will
require medical attention while staying in the hotel. Calculate the
exact and approximate probability that no customer out the 500
will require medical attention while attending the conference. Is
this approximation better or worse that the approximation used in
(ii)? Why?


3.  a)  The table below shows the probabilities for the number of complaints
        received each day by a newspaper agency from customers not receiving a
        paper.

        No. of complaints   8     9     10    11    12
        Probability         .35   .42   .18   .03   .02

        i)   Find the mean and standard deviation of the number of complaints.

        ii)  The agency states the cost (in kwachas) of daily complaints to be C
             = 600 + 300x, where x is the number of complaints. Find the mean
             and standard deviation of the cost of daily complaints.

    b)  A writer has prepared to submit six articles for publication. The probability
        of any article being accepted is 0.20. Assuming independence, find the
        probability that the writer will have
i)

exactly one article accepted.

ii)

At least two articles accepted

iii)

No more than three articles accepted

iv)

At most two articles accepted.

4.  a)  A Toyota dealer wishes to know how many citations to order for the
        coming month. Estimated demand is normally distributed, with a standard
        deviation of 20 and a mean of 120.
        i)   What is the probability that he will need more than 160?
        ii)  What is the probability that he will need less than 90?

    b)  A client wishes to know what price he might be able to get for a business
        property. The realtor estimates that a sale price for that property of K600
        million would be exceeded no more than 5% of the time. A price of at least
        K420 million should be obtained at least 90% of the time. Assuming the
        distribution of sales prices to be normal, answer the following questions:
        i)   What are μ and σ for this distribution?
        ii)  What is the probability of a sale price greater than K540 million, less
             than K640 million, and between K540 million and K600 million?


5.  a)  Which of the following are continuous variables, and which are discrete
        variables?
i)

Number of traffic fatalities per year in the town of Livingstone.

ii)  Distance a ball travels after being kicked by a soccer player.

iii)

Time required to drive from home to campus on any given day.

iv)

Number of cars in Kitwe on any given day.

v)

Your weight before breakfast each morning.

    b)  The ABCD Mother-in-law sociologists say that 80% of married women
        claim that their husbands' mothers are the biggest bones of contention in
        their marriages (sex and money are lower-rated areas of contention).
        Suppose that five married women are having lunch together one
        afternoon. What is the probability that:
i)

All of them dislike their mother-in-law

ii)

None of them dislike her mother-in-law?

iii)

At least four of them dislike their mother-in-law?

iv)

No more than three of them dislike their mother-in law.

c)

The Mulenga Café has found that about 6% of the parties who make
reservations don't show up. If 90 party reservations have been made, how
many can be expected to show up? Find the standard deviation of this
distribution.

6.  a)  The mean and standard deviation on an examination are 85 and 15,
        respectively. Find the scores in standard units of students receiving
        grades of
        i)   65
        ii)  89

    b)  Determine the probabilities
        i)   P(Z > 2.12)
        ii)  P(−1.6 < Z < 1.13)
        where Z is assumed to be normal with mean 0 and variance 1.


    c)  What is the probability of obtaining at least 1280 heads if a coin is tossed
        2500 times and heads and tails are equally likely?

d)

The side effects of a certain drug cause discomfort to only a few patients.
The probability that any individual will suffer from the side effects is
0.005. If the drug is given to 35 000 patients, what is the probability that
three (3) will suffer side effects.

7.  a)  The customer service center in a large Lusaka department store has
        determined that the amount of time spent with a customer with a
        complaint is normally distributed with a mean of 9.3 minutes and a
        standard deviation of 2.5 minutes. What is the probability that for a
        randomly chosen customer with a complaint the amount of time spent
        resolving the complaint will be:

i)

less than 10 minutes?

ii)

more than 5 minutes

iii)

between 8 and 15 minutes.

    b)  A car rental company has determined that the probability a car will need
        service work in any given month is 0.25. The company has 850 cars.
        i)   What is the probability that more than 150 cars will require service
             work in a particular month?
        ii)  What is the probability that fewer than 180 cars will need service
             work in a given month? (Give reasons for the method used to
             calculate the probabilities in (i) and (ii).)

    c)  A contractor estimates the probabilities for the number of days required to
        complete a certain type of construction project as follows:

        Time (days)   1     2     3     4     5
        Probability   .04   .21   .34   .31   .10

i)

What is the probability that a randomly chosen project will take


less than 3 days to complete.

ii)

Find the expected time to complete a project.



iii)

Find the standard deviation of time required to complete a project.

        iv)  The contractor's project cost is made up of two parts: a fixed
             cost of K100,000,000 plus K10,000,000 for each day taken to
             complete the project. Find the standard deviation of total project
             costs.


CHAPTER 6
SAMPLING AND SAMPLING DISTRIBUTION

Reading

Newbold Chapter 6
Wonnacolt and Wonnacolt Chapter 6
Tailoka Frank P Chapter 10
James T Mc Clave and P George Benson Chapter 7

Introductory Comments
We now start on the work that defines the subject Statistics as a different and unique
subject. The idea of sampling and sampling distribution for a statistic like the mean must
be clearly understood by all users of statistics. This is not an easy Chapter to understand.

6.

Sampling Theory
Sampling and Sampling Distribution

6.1

Sampling
If we draw an object from a box, we have the choice of replacing or not replacing
the object in the box before we draw again. In the first case a particular object
can come up again and again, whereas in the second it can come up only once.
Sampling where each member of a population may be chosen more than once is
called sampling with replacement, while sampling where each member cannot be
chosen more than once is called sampling without replacement.


Random Samples. Random Numbers


Clearly the reliability of conclusions drawn concerning a population depends on
whether the sample is properly chosen so as to represent the population
sufficiently well, and one of the important problems of statistical inference is just
how to choose a sample.
The way to do this for a finite population is to make sure that each member
of the population has the same chance of being in the sample; such a sample is often
called a random sample. Random sampling can be accomplished for relatively
small populations by drawing lots or, equivalently, by using a table of random
numbers specially constructed for such purposes.
Because inference from sample to population cannot be certain we must use the
language of probability in any statement of conclusions.

6.2

Sampling Distributions

As we have seen, a sample statistic that is computed from X 1 , . . . , X n is a


function of these random variables and is therefore itself a random variable. The
probability distribution of a sample statistic is often called the sampling
distribution of the statistic.
Alternatively, we can consider all possible sample of size n that can be drawn
from the population, and for each sample we compute the statistic. In this manner
we obtain the distribution of the statistic, which is its sampling distribution.
For a sampling distribution, we can of course compute a mean, variance, standard
deviation, etc. The standard deviation is sometimes also called the standard error.

The Sample Mean


Let X1, X2, . . ., Xn denote the independent, identically distributed random
variables for a random sample of size n as described above. Then the mean of the
sample, or sample mean, is a random variable defined by

x̄ = (X1 + X2 + . . . + Xn)/n    (1)

If x1, x2, . . ., xn denote the values obtained in a particular sample of size n, then the mean
for that sample is denoted by

x̄ = (x1 + x2 + . . . + xn)/n    (2)

Sampling Distributions of Means


Let f (x) be the probability distribution of some given population from which we draw a
sample of size n. Then it is natural to look for the probability distribution of the sample
statistics x , which is called the sampling distribution for the sample mean, or the
sampling distribution of mean. The following theorems are important in this connection.

Theorem 6.1
The mean of the sampling distribution of means, denoted by μ_x̄, is given by

μ_x̄ = E(x̄) = μ    (3)

where μ is the mean of the population. Theorem 6.1 states that the expected value of
the sample mean is the population mean.

Theorem 6.2
If a population is infinite and the sampling is random, or if the population is finite and
sampling is with replacement, then the variance of the sampling distribution of means,
denoted by σ²_x̄, is given by

σ²_x̄ = E[(x̄ − μ)²] = σ²/n

Theorem 6.3
If the population is of size N, if sampling is without replacement, and if the sample size is
n ≤ N, then the previous equation is replaced by

σ²_x̄ = (σ²/n) · (N − n)/(N − 1)    (5)

while μ_x̄ = μ as in Theorem 6.1.
Note that Theorem 6.3 reduces to Theorem 6.2 as N → ∞.

Theorem 6.4
If the population from which samples are taken is normally distributed with mean μ and
variance σ², then the sample mean is normally distributed with mean μ and variance σ²/n.
Theorem 6.5
Suppose that the population from which samples are taken has a probability distribution
with mean μ and variance σ² that is not necessarily a normal distribution. Then the
standardized variable associated with x̄, given by

Z = (x̄ − μ)/(σ/√n)    (6)

is asymptotically normal, i.e.

lim_{n→∞} P(Z ≤ z) = (1/√(2π)) ∫_{−∞}^{z} e^(−u²/2) du    (7)

Theorem 6.5 is a consequence of the central limit theorem. It is assumed here that the
population is infinite or that sampling is with replacement. Otherwise, the above is
correct if we replace σ²/n in Theorem 6.5 by σ²_x̄ as given in Theorem 6.3.

Example 1.0
Five hundred ball bearings have a mean weight of 5.02 kg and a standard deviation of
0.30 kg. Find the probability that a random sample of 100 ball bearings chosen from this
group will have a combined weight of more than 510 kg.

Solution
For the sampling distribution of means, μ_x̄ = μ = 5.02 kg and

σ_x̄ = (σ/√n) √((N − n)/(N − 1)) = (0.30/√100) √((500 − 100)/(500 − 1)) ≈ 0.027

The combined weight will exceed 510 kg if the mean weight of the 100 bearings exceeds
5.10 kg. In standard units,

Z = (5.10 − 5.02)/0.027 = 2.96

The required probability is the area to the right of z = 2.96, as shown in Figure 6.1.

[Figure 6.1: standard normal curve with the area to the right of z = 2.96 shaded.]

The probability is 0.5 − 0.4985 = 0.0015. Therefore, there are only 3 chances in 2000 of
picking a sample of 100 ball bearings with a combined weight exceeding 510 kg.
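The calculation in Example 1.0 can be scripted. The sketch below is illustrative only; it applies Theorem 6.3 and the normal approximation, and its exact rounding differs slightly from the worked text, which rounds the standard error to 0.027 before standardizing.

```python
from math import sqrt
from statistics import NormalDist

# Example 1.0: N = 500 bearings, mean 5.02 kg, sigma 0.30 kg, sample of n = 100
N, n, mu, sigma = 500, 100, 5.02, 0.30

# Standard error of the mean with the finite population correction (Theorem 6.3)
se = (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))
z = (5.10 - mu) / se
prob = 1 - NormalDist().cdf(z)

print(round(se, 3), round(z, 2), round(prob, 4))   # 0.027, 2.98, ≈ 0.0015 (text uses z ≈ 2.96)
```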

Sampling Distribution of Proportions


Suppose that a population is infinite and binomially distributed, with p and
q = 1 − p being the respective probabilities that any given member exhibits or does not
exhibit a certain property. For example, the population may be all possible tosses of a
fair coin, in which the probability of the event heads is p = 1/2.
Consider all possible samples of size n drawn from this population, and for each sample
determine the statistic that is the proportion p̂ of successes. In the case of the coin, p̂
would be the proportion of heads turning up in n tosses. Then we obtain a sampling
distribution whose mean μ_p̂ and standard deviation σ_p̂ are given by

μ_p̂ = p,   σ_p̂ = √(pq/n) = √(p(1 − p)/n)    (8)

For large values of n (n ≥ 30) the sampling distribution is very nearly a normal
distribution, as seen from Theorem 6.5. For finite populations in which sampling is
without replacement, the expression for σ_p̂ given above is multiplied by the finite
population correction factor √((N − n)/(N − 1)), as in Theorem 6.3, with σ² = pq.

Example 2.0
A simple random sample of size 64 is selected from a population with p = 0.30.
(a)  What is the expected value of p̂?
(b)  What is the standard deviation of p̂?
(c)  Show the sampling distribution of p̂.
(d)  What does the sampling distribution of p̂ show?

Solution
(a)  The expected value of p̂ is E(p̂) = p = 0.30.
(b)  The standard deviation of p̂ is σ_p̂ = √(pq/n) = √((0.3)(0.7)/64) = √0.00328125 ≈ 0.0573.
(c)  The sampling distribution of p̂ is approximately normal with E(p̂) = 0.30 and σ_p̂ = 0.0573.
(d)  It gives the probability distribution of the sample proportion p̂.

Sampling Distribution of Differences and Sums


Suppose that we are given two populations. For each sample of size n1 drawn from the first
population, let us compute a statistic X1. This yields a sampling distribution for X1
whose mean and standard deviation we denote by μ_X1 and σ_X1, respectively. Similarly,
for each sample of size n2 drawn from the second population, let us compute a statistic
X2, whose mean and standard deviation are μ_X2 and σ_X2, respectively.
Taking all possible combinations of these samples from the two populations, we can
obtain a distribution of the differences X1 − X2, which is called the sampling distribution
of differences of the statistics. The mean and standard deviation of this sampling
distribution, denoted respectively by μ_{X1−X2} and σ_{X1−X2}, are given by

μ_{X1−X2} = μ_X1 − μ_X2,   σ_{X1−X2} = √(σ²_X1 + σ²_X2)    (9)

provided that the samples chosen do not in any way depend on each other, i.e., the
samples are independent (in other words, the random variables X1 and X2 are
independent).


Similarly, for the sample means from two populations, denoted by x̄1 and x̄2 respectively,
the sampling distribution of the differences of means is given, for infinite populations
with means and standard deviations μ1, σ1 and μ2, σ2 respectively, by

μ_{x̄1−x̄2} = μ_{x̄1} − μ_{x̄2} = μ1 − μ2    (10)

and

σ_{x̄1−x̄2} = √(σ²_{x̄1} + σ²_{x̄2}) = √(σ1²/n1 + σ2²/n2)    (11)

Using Theorems 6.1 and 6.2, this result also holds for finite populations if sampling is
done with replacement. The standardized variable

Z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)    (12)

in that case is very nearly normally distributed if n1 and n2 are large (n1, n2 ≥ 30).
Similar results can be obtained for finite populations in which sampling is without
replacement by using Theorems 6.1 and 6.3.
Example 3.0
In the age of rising housing costs, comparisons are often made between costs in different
areas of the country. In order to compare the average cost μ1 of a 3-bedroom, 2-bath
home in Kitwe to the average cost μ2 of a similar home in Lusaka, independent random
samples were taken of 190 housing costs in Kitwe and 120 housing costs in Lusaka.
Describe the sampling distribution of (x̄1 − x̄2), the difference in sample mean housing
costs in the two cities.
Solution
The mean of the sampling distribution of x̄1 − x̄2 is E(x̄1 − x̄2) = E(x̄1) − E(x̄2) = μ1 − μ2.
The variance of x̄1 − x̄2 is the sum of the variances of x̄1 and x̄2; thus

σ²_{x̄1−x̄2} = σ1²/n1 + σ2²/n2 = σ1²/190 + σ2²/120

where σ1² and σ2² represent the population variances of the costs of 3-bedroom, 2-bath
homes in Kitwe and Lusaka, respectively. The standard deviation of the sampling
distribution of x̄1 − x̄2 is

σ_{x̄1−x̄2} = √(σ1²/190 + σ2²/120)

Corresponding results can be obtained for the sampling distributions of differences of
proportions from two binomially distributed populations with parameters
p1, q1 and p2, q2. The mean and standard deviation of the difference are given by

μ_{p̂1−p̂2} = p1 − p2    (13)

σ_{p̂1−p̂2} = √(p1q1/n1 + p2q2/n2)    (14)

Example 4.0
It has been found that 2% of the tools produced by a certain machine are defective. What
is the probability that in a shipment of 400 such tools, 3% or more will prove defective?

Solution
μ_p̂ = p = 0.02,   σ_p̂ = √(pq/n) = √((0.02)(0.98)/400) = 0.14/20 = 0.007

In standard units, 0.03 corresponds to Z = (0.03 − 0.02)/0.007 = 1.43, so

P(p̂ ≥ 0.03) = P(Z ≥ 1.43) = 0.5000 − 0.4236 = 0.0764
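A minimal Python check of Example 4.0 using the sampling distribution of the proportion (illustrative sketch; the text's table value with z rounded to 1.43 is 0.0764):

```python
from math import sqrt
from statistics import NormalDist

# Example 4.0: p = 0.02 defective, shipment (sample) of n = 400 tools
p, n = 0.02, 400
se = sqrt(p * (1 - p) / n)                 # sigma of p-hat = 0.007
z = (0.03 - p) / se                        # ≈ 1.43
print(round(1 - NormalDist().cdf(z), 4))   # P(p-hat ≥ 0.03) ≈ 0.0766
```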


Learning Objectives
After working through this Chapter, you should be able to:

Give the formal definition of a random variable, and distinguish between a
random variable and the values it takes.

Explain the difference between continuous and discrete random variables.

Discuss such distributions as the Binomial, Poisson, and Normal and calculate
probabilities of events for such random variables.

Find the mean and the variance of the Binomial, Poisson and Normal distributions.

Define the sampling distribution of the sample mean, the sample proportion and
their differences.


CHAPTER 7
ESTIMATION

Reading
Newbold Chapter 7
Wonnacott and Wonnacott Chapter 7
Tailoka Frank P Chapter 10

Introductory Comments
We need to know how the mean of the population is related to the sample mean, and
what characteristics the sample mean must have. We need to know whether the sample
is likely to give us an estimate close to the population value. To tell us this, we use
confidence intervals.

7.

Estimation Theory

7.1

Unbiased Estimates and Efficient Estimates


A statistic is called an unbiased estimator of a population parameter if the mean or
expectation of the statistic is equal to the parameter. The corresponding value of
the statistic is then called an unbiased estimate of the parameter.
If the sampling distributions of two statistics have the same mean, the statistic with
the smaller variance is called a more efficient estimator of the mean. The
corresponding value of the efficient statistic is then called an efficient estimate.
Clearly one would in practice prefer to have estimators that are both efficient and
unbiased, but this is not always possible.

7.2

Point estimates and Interval Estimates


An estimate of a population parameter given by a single number is called a point
estimate of the parameter. An estimate of a population parameter given by two
numbers between which the parameter may be considered to lie is called an
interval estimate of the parameter.


Example 1.0
If we say that a distance is 34.5 km, we are giving a point estimate. If, on the
other hand, we say that the distance is 34.5 ± 0.04 km, i.e., the distance lies
between 34.46 and 34.54 km, we are giving an interval estimate.
A statement of the error or precision of an estimate is often called its reliability.

7.3

Confidential Interval Estimates of Population Parameters.


Let μ_S and σ_S be the mean and standard deviation (standard error) of the
sampling distribution of a statistic S. Then if the sampling distribution of S is
approximately normal (which we have seen is true for many statistics if the
sample size n ≥ 30), we can expect to find S lying in the intervals μ_S − σ_S to
μ_S + σ_S, μ_S − 2σ_S to μ_S + 2σ_S, or μ_S − 3σ_S to μ_S + 3σ_S about 68%, 95% and
99.7% of the time, respectively.
Equivalently, we can expect to find, or we can be confident of finding, μ_S in the
intervals S − σ_S to S + σ_S, S − 2σ_S to S + 2σ_S, or S − 3σ_S to S + 3σ_S about 68%,
95% and 99.7% of the time, respectively. Because of this, we call these respective
intervals the 68%, 95% and 99.7% confidence intervals for estimating μ_S (i.e., for
estimating the population parameter, in the case of an unbiased S). The end
numbers of these intervals (S ± σ_S, S ± 2σ_S, S ± 3σ_S) are then called the 68%, 95%
and 99.7% confidence limits.
Similarly, S ± 1.96σ_S and S ± 2.58σ_S are the 95% and 99% confidence limits for
μ_S. The percentage confidence is often called the confidence level. The
numbers 1.96, 2.58, etc., in the confidence limits are called critical values and are
denoted by Z_c. From confidence levels we can find critical values.

7.4

Confidence Intervals for Means


We shall see how to create confidence intervals for the mean of a population
in two different cases. The first case is when we have a large sample
size (n ≥ 30), and the second case is when we have a small sample
(n < 30) and the underlying population is normal.

Large samples (n ≥ 30)

If the statistic S is the sample mean x̄, then the 95% and 99% confidence limits
for estimation of the population mean μ are given by x̄ ± 1.96σ_x̄ and x̄ ± 2.58σ_x̄,
respectively.
More generally, the confidence limits are given by x̄ ± Z_c σ_x̄, where Z_c
depends on the particular level of confidence desired. The confidence limits for
the population mean are given by

x̄ ± Z_c σ/√n    (1)

in the case of sampling from an infinite population or if sampling is done with
replacement from a finite population, and by

x̄ ± Z_c (σ/√n) √((N − n)/(N − 1))    (2)

if sampling is done without replacement from a population of finite size N.

In general, the population standard deviation σ is unknown; to obtain the
above confidence limits, we use the sample estimate s in its place.
Example 2.0
Find a 95% confidence interval estimate of the mean height of the 1546 male
students at XYZ University by taking a sample of size 100. (Assume the mean of
the sample, x̄, is 67.45 cm and that the standard deviation of the sample, s, is
2.93 cm.)

The 95% confidence limits are x̄ ± 1.96 σ/√n.

Using x̄ = 67.45 cm and s = 2.93 cm as an estimate of σ, the confidence limits are

67.45 ± 1.96(2.93/√100),  or  67.45 ± 0.57 cm

Then the 95% confidence interval for the population mean μ is 66.88 to 68.02
cm, which can be denoted by 66.88 < μ < 68.02.
We can therefore say that the probability that the population mean height lies
between 66.88 and 68.02 cm is about 95%.
In symbols, we write
P(66.88 < μ < 68.02) = 0.95. This is equivalent to saying that we are 95%
confident that the population mean (true mean) lies between 66.88 and 68.02 cm.

7.5

Small Samples (n < 30) and Population Normal

In this case we use the t distribution to obtain confidence limits. For example, if
−t_0.025 and t_0.025 are the values of t for which 2.5% of the area lies in each tail of the
t distribution, then a 95% confidence interval for the standardized statistic is

−t_0.025 < (x̄ − μ)√n / s < t_0.025    (3)

from which we can see that μ can be estimated to lie in the interval

x̄ − t_0.025 s/√n < μ < x̄ + t_0.025 s/√n    (4)

with 95% confidence. In general, the confidence limits for population means are
given by

x̄ ± t_c s/√n    (5)

where the t_c values can be read from Table 2.


Example 3.0
The following data have been collected from a sample of nine items from
a normal population: 12, 9, 16, 20, 16, 23, 7, 8, and 10.
(a)  What is the point estimate of the population mean?
(b)  What is the point estimate of the population standard deviation?
(c)  What is the 90% confidence interval for the population mean?

Solution
(a)  The point estimate is x̄ = Σx/n = 121/9 = 13.444.

(b)  The point estimate of the population standard deviation is

     s = √[(Σx² − (Σx)²/n)/(n − 1)] = √[(1879 − (121)²/9)/8] ≈ 5.615

(c)  We have x̄ ± t_{0.05,8} s/√n = 13.444 ± 1.860(5.615/√9) = 13.444 ± 3.4813.
     Thus, the 90% confidence interval estimate of the population mean is
     9.9627 to 16.9253.
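The t-based interval of Example 3.0 can be reproduced with the standard library. The critical value 1.860 is taken from the t table quoted in the text (8 degrees of freedom, 5% in each tail); the rest is a minimal illustrative sketch.

```python
from statistics import mean, stdev

data = [12, 9, 16, 20, 16, 23, 7, 8, 10]
n = len(data)
x_bar = mean(data)     # ≈ 13.444
s = stdev(data)        # sample standard deviation ≈ 5.615

t_crit = 1.860         # t with 8 degrees of freedom, 5% in each tail (Table 2)
half_width = t_crit * s / n**0.5
print(round(x_bar - half_width, 3), round(x_bar + half_width, 3))
# ≈ 9.963, 16.926 (the text, rounding the mean to 13.444, gives 9.9627 to 16.9253)
```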

7.6  Confidence Intervals for Proportions


Suppose that the statistic S is the proportion of successes in a sample of size
n ≥ 30 drawn from a binomial population in which p is the proportion of
successes (i.e., the probability of success). Then the confidence limits for p are
given by p̂ ± Z_c σ_p̂, where p̂ denotes the proportion of successes in the sample of
size n. Using the value of σ_p̂ obtained in Chapter 6, we see that the confidence
limits for the population proportion are given by

p̂ ± Z_c √(pq/n) = p̂ ± Z_c √(p(1 − p)/n)    (6)

This is the case where sampling is from an infinite population or if sampling is


done with replacement from a finite population. Similarly, the confidence limits
are:
p̂ ± Z_c √(pq/n) √((N − n)/(N − 1))    (7)

when sampling is done without replacement from a finite population of size N.

Note that these results are obtained from (1) and (2) on replacing x̄ by p̂
and σ by √(pq). To compute the above confidence limits, we use the sample
estimate p̂ in place of p.


Example 4.0
A sample poll of 100 voters chosen at random from all voters in a given district
indicated that 55% of them were in favour of a particular candidate. Find the 99%
confidence limits for the proportion of all voters in favour of this candidate.

The 99% confidence limits for the population proportion p are

p̂ ± 2.58 σ_p̂ = p̂ ± 2.58 √(p̂(1 − p̂)/n) = 0.55 ± 2.58 √((0.55)(0.45)/100) = 0.55 ± 0.13
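Example 4.0's interval in one short illustrative sketch:

```python
from math import sqrt

# Example 4.0: p-hat = 0.55 from a sample of n = 100 voters, 99% confidence (Z_c = 2.58)
p_hat, n, z_c = 0.55, 100, 2.58

half_width = z_c * sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat - half_width, 2), round(p_hat + half_width, 2))   # ≈ 0.42, 0.68
```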

7.7

Confidence Intervals for Differences and Sums

If X1 and X2 are two sample statistics with approximately normal sampling
distributions, confidence limits for the difference of the population parameters
corresponding to X1 and X2 are given by

X1 − X2 ± Z_c √(σ²_X1 + σ²_X2)    (8)

while confidence limits for the sum of the population parameters are given by

X1 + X2 ± Z_c √(σ²_X1 + σ²_X2)    (9)

provided the samples are independent.

For example, confidence limits for the difference of two population means, in the
case where the populations are infinite and have known standard deviations
σ1, σ2, are given by

x̄1 − x̄2 ± Z_c σ_{x̄1−x̄2} = x̄1 − x̄2 ± Z_c √(σ1²/n1 + σ2²/n2)    (10)

where x̄1, n1 and x̄2, n2 are the respective means and sizes of the two samples
drawn from the populations.
Similarly, confidence limits for the difference of two population proportions,
where the populations are infinite, are given by

P1

P 2 Z c

P(1 p1 )
P (1 p2 )
2
n1
n2

(11)

When P1 and P2 , are sample proportions and n1 and n2 are sizes of the two
samples drawn from the populations.

Example 5.0
In a random sample of 400 adults and 600 teenagers who watched a certain television program, 100 adults and 300 teenagers indicated that they liked it. Construct the 99.7% confidence limits for the difference in proportions of all adults and all teenagers who watched the program and liked it.

Solution
Confidence limits for the difference in proportions of the two groups are given by (11), where subscripts 1 and 2 refer to teenagers and adults, respectively, and q̂1 = 1 − p̂1, q̂2 = 1 − p̂2. Here p̂1 = 300/600 = 0.50 and p̂2 = 100/400 = 0.25 are, respectively, the proportions of teenagers and adults who liked the program.

The 99.7% confidence limits are given by

(0.50 − 0.25) ± 3 √((0.50)(0.50)/600 + (0.25)(0.75)/400) = 0.25 ± 0.09        (12)

Therefore, we can be 99.7% confident that the true difference in proportions lies between 0.16 and 0.34.
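The same limits can be reproduced with a short Python sketch (illustrative only, not part of the original notes).

```python
import math

p1, n1 = 300 / 600, 600        # teenagers
p2, n2 = 100 / 400, 400        # adults
z = 3                          # 99.7% limits (three standard errors)
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(p1 - p2 - z * se, p1 - p2 + z * se)   # about 0.16 to 0.34
```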


7.8 Determining the Sample Size

From the previous work, there is a 1 − α probability that the value of the sample mean will provide a sampling error of zα/2 σx̄ or less. Because σx̄ = σ/√n, we can rewrite this statement to read: there is a 1 − α probability that the value of the sample mean will provide a sampling error of zα/2 σ/√n or less. Given values of zα/2 and σ, we can determine the sample size n needed to provide any specified sampling error. Let d be the maximum sampling error; then

n = (zα/2)² σ² / d²

This is the sample size which will provide a probability statement of 1 − α with sampling error d or less.

In most cases σ will be unknown. In practice one of the following procedures can be used.
(a) Use a pilot study to select a preliminary sample. The sample standard deviation from the preliminary sample can be used as the planning value for σ.
(b) Use the sample standard deviation from a previous sample of the same or similar units.
(c) Use judgment or a best guess for the value of σ, for example by applying the Empirical rule or Chebyshev's rule.

Example 6.0
How large a sample should one select to be 90% confident that the sampling error is 3 or less, assuming the population variance is 36?

Solution
We have d = 3, z0.05 = 1.65 and σ = 6. Hence

n = (1.65)²(6)² / (3)² = 10.89

In cases where the computed n is a fraction, we round up to the next integer value; hence the recommended sample size here is 11.

As for a proportion, n = z²α/2 p q / d². In practice, the planning value for the population proportion can be chosen in the same way as for the population mean. However, if none of these applies, use p = 0.5.

Example 7.0
In a survey, the planning value for the population proportion p is given as 0.45. How large a sample should be taken to be 95% confident that the sample proportion is within 0.04 of the population proportion?

Solution
We have d = 0.04, z0.025 = 1.96, p = 0.45 and q = 0.55. Hence

n = (1.96)²(0.45)(0.55) / (0.04)² = 594.2475

Hence, a sample size of 595 is recommended.

Example 8.0
How large a sample should be taken to be 90% confident that the sampling error of estimation of the population proportion is 0.02 or less? Assume past data are not available for developing a planning value for p.

Solution
We have z0.05 = 1.65, and we assume that p = 0.5, q = 0.5 and d = 0.02. Therefore

n = (1.65)²(0.5)(0.5) / (0.02)² = 1701.5625

The recommended sample size is 1702.
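A small Python helper (an illustrative sketch; the function names are my own, not from the text) reproduces the sample-size calculations in Examples 6.0 to 8.0.

```python
import math

def sample_size_mean(z, sigma, d):
    """n = z^2 * sigma^2 / d^2, rounded up to the next integer."""
    return math.ceil(z**2 * sigma**2 / d**2)

def sample_size_proportion(z, p, d):
    """n = z^2 * p * (1 - p) / d^2, rounded up to the next integer."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

print(sample_size_mean(1.65, 6, 3))              # Example 6.0
print(sample_size_proportion(1.96, 0.45, 0.04))  # Example 7.0 -> 595
print(sample_size_proportion(1.65, 0.50, 0.02))  # Example 8.0 -> 1702
```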

Learning Objectives

After working through this Chapter you should be able to:

Explain a point estimate and a confidence interval.

Find confidence intervals for proportions and for differences of proportions.

Find confidence intervals for means of normal populations, and for differences of means of two normal populations, both when the variance(s) are known and when they are unknown.

CHAPTER 8
HYPOTHESIS TESTING

Reading
Newbold Chapter 9
Wonnacott and Wonnacott Chapter 9
Tailoka Frank P Chapter 10

Introductory Comments
We often need to answer questions about a population, such as "Is the mean of the population less than 5?" or "Is there any difference between two means?" In statistics we try to answer these questions based on the information in samples. There is useful information in this Section of the subject for everyday life.
The theory of tests of hypothesis is necessarily linked to that for confidence intervals.
8.0 Tests of Hypothesis and Significance

8.1 Statistical Decisions
Very often in practice we are called upon to make decisions about
populations on the basis of sample information. Such decisions are called
statistical decisions. For example, we may wish to decide on the basis of sample
data whether a new serum is really effective in curing a disease, whether one
educational procedure is better than another, or whether a given coin is loaded.

8.2 Statistical Hypotheses
In attempting to reach decisions, it is useful to make assumptions or
guesses about the populations involved. Such assumptions, which may or may
not be true, are called Statistical hypotheses and in general are statements about
the probability distribution of the populations. For example, if we want to decide
whether a given coin is loaded, we formulate the hypothesis that the coin is fair,
i.e., p = 0.5, where p is the probability of heads. Similarly, if we want to decide
whether one procedure is better than another, we formulate the hypothesis that
there is no difference between the two procedures (i.e., any observed differences


are merely due to fluctuations in sampling from the same population). Such
hypotheses are often called null hypotheses, denoted by H o .
Any other hypothesis that differs from a given null hypothesis is called an
alternative hypothesis. For example, if the null hypothesis is p = 0.5, possible alternative hypotheses are p = 0.7, p ≠ 0.5, or p > 0.5. A hypothesis alternative to the null hypothesis is denoted by H1.
8.3 Type I and Type II Errors


If we reject a hypothesis when it happens to be true, we say that a Type I error
has been made. If, on the other hand, we accept a hypothesis when it should be
rejected, we say that a Type II error has been made. In either case, a wrong
decision or error in judgement has occurred.
In order for any tests of hypotheses or decision rules to be good, they must
be designed so as to minimize errors of decision. This is not a simple matter
since, for a given sample size, an attempt to decrease one type of error is
accompanied in general by an increase in the other type of error. In practice one
type of error may be more serious than the other, and so a compromise should be
reached in favour of a limitation of the more serious error. The only way to
reduce both types of errors is to increase the sample size, which may or may not
be possible.

8.4 Level of Significance
In testing a given hypothesis, the maximum probability with which we
should be willing to risk a type I error is called the level of significance of the
test. This probability is often specified before any samples are drawn so that
results obtained will not influence our decision.
In practice a level of significance of 0.05 or 0.01 is customary, although other
values are used. If for example a 0.05 or 5% level of significance is chosen in
designing a test of a hypothesis, then there are about 5 chances in 100 that we
would reject the hypothesis when it should be accepted; i.e., whenever the null
hypothesis is true, we are about 95% confident that we would make the right
decision. In such cases we say that the hypothesis has been rejected at a 0.05
level of significance, which means that we could be wrong with probability 0.05.

8.5 Tests Involving the Normal Distribution


To illustrate the ideas presented above, suppose that under a given hypothesis the sampling distribution of a statistic S is a normal distribution with mean μS and standard deviation σS. The distribution of the standardized variable Z = (S − μS)/σS is the standard normal distribution (mean 0, variance 1) shown in Figure 8.1, and extreme values of Z would lead to the rejection of the hypothesis.

[Figure 8.1: Standard normal curve. The critical regions, each of area 0.025, lie in the tails beyond Z = −1.96 and Z = 1.96; the central area between these values is 0.95.]

As indicated in the figure, we can be 95% confident that, if the hypothesis is true, the Z score of an actual sample statistic S will lie between −1.96 and 1.96 (since the area under the normal curve between these values is 0.95).
However, if on choosing a single sample at random we find that the Z score of its statistic lies outside the range −1.96 to 1.96, we would conclude that such an event could happen with a probability of only 0.05 (the total shaded area in the figure) if the given hypothesis were true. We would then say that this Z score differs significantly from what would be expected under the hypothesis, and we would be inclined to reject the hypothesis.
The total shaded area 0.05 is the level of significance of the test. It represents the probability of our being wrong in rejecting the hypothesis, i.e., the probability of making a Type I error. Therefore, we say that the hypothesis is rejected at a 0.05 level of significance, or that the Z score of the given sample statistic is significant at the 0.05 level of significance.
The set of Z scores outside the range −1.96 to 1.96 constitutes what is called the critical region, region of rejection of the hypothesis, or region of significance. The set of Z scores inside the range −1.96 to 1.96 is then called the region of acceptance of the hypothesis or the region of non-significance.
On the basis of the above remarks, we can formulate the following decision rule:


(a) Reject the hypothesis at a 0.05 level of significance if the Z score of the statistic S lies outside the range −1.96 to 1.96 (i.e., if either Z > 1.96 or Z < −1.96). This is equivalent to saying that the observed sample statistic is significant at the 0.05 level.

(b) Accept the hypothesis (or, if desired, make no decision at all) otherwise.

8.6 One-Tailed and Two-Tailed Tests


In the above test we displayed interest in extreme values of the statistic S or its corresponding Z score on both sides of the mean, i.e., in both tails of the distribution. For this reason such tests are called two-tailed tests or two-sided tests.
Often, however, we may be interested only in extreme values to one side of the mean, i.e., in one tail of the distribution, as for example when we are testing the hypothesis that one process is better than another (which is different from testing whether one process is better or worse than the other). Such tests are called one-tailed tests or one-sided tests. In such cases, the critical region is a region to one side of the distribution, with area equal to the level of significance.

8.7 P-Value: The P-value is the smallest value of α which will lead to the rejection of the null hypothesis.

8.8 Special Tests
For large samples, many statistics have approximately normal sampling distributions with mean μS and standard deviation σS. In such cases we can use the above results to formulate decision rules or tests of hypotheses and significance. The following special cases are just a few of the statistics of practical interest. In each case the results hold for infinite populations or for sampling with replacement. For sampling without replacement from finite populations, the results must be modified.
1. Population Means: Here S = X̄, the sample mean; μS = μX̄ = μ, the population mean; and σS = σX̄ = σ/√n, where σ is the population standard deviation and n is the sample size. The standardized variable is given by

Z = (X̄ − μ) / (σ/√n)        (1)

for n ≥ 30. For n < 30, we use

tc = (X̄ − μ) / (S/√n)

2. Population Proportions: Here S = P̂, the proportion of successes in a sample; μS = μP̂ = p, where p is the population proportion of successes and n is the sample size; and σS = σP̂ = √(pq/n), where q = 1 − p. The standardized variable is given by

Z = (P̂ − p) / √(pq/n)        (2)

In case P̂ = X/n, where X is the actual number of successes in a sample, (2) becomes

Z = (X − np) / √(npq)        (3)

3. Differences of Population Means: Let X̄1 and X̄2 be the sample means obtained in large samples of sizes n1 and n2 drawn from respective populations having means μ1 and μ2 and standard deviations σ1 and σ2. Consider the null hypothesis that there is no difference between the population means, i.e., μ1 = μ2. Under this hypothesis,

μ(X̄1 − X̄2) = 0,    σ(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2)        (4)

The standardized variable is given by

Z = (X̄1 − X̄2 − 0) / σ(X̄1 − X̄2)        (5)

4. Differences of Population Proportions: Let P̂1 and P̂2 be the sample proportions obtained in large samples of sizes n1 and n2 drawn from respective populations having proportions P1 and P2. Consider the null hypothesis that there is no difference between the population proportions, i.e., P1 = P2, and thus that the samples are really drawn from the same population. Under this hypothesis,

μ(P̂1 − P̂2) = 0,    σ(P̂1 − P̂2) = √(P(1 − P)(1/n1 + 1/n2))

where P = (n1P̂1 + n2P̂2)/(n1 + n2) is used as an estimate of the population proportion. By using the standardized variable

Z = (P̂1 − P̂2 − 0) / σ(P̂1 − P̂2)

we can observe differences at an appropriate level of significance and thereby test the null hypothesis.

Tests involving other statistics can similarly be designed.
Example 1.0
The mean lifetime of a sample of 100 fluorescent light bulbs produced by a company is computed to be 1570 hours with a standard deviation of 120 hours. If μ is the mean lifetime of all the bulbs produced by the company, test the hypothesis μ = 1600 hours. Use a significance level of 0.05 and find the P-value of the test.

Solution
1. H0: μ = 1600
   Ha: μ ≠ 1600

2. This is a two-tailed test, so the critical values are −1.96 and 1.96 (areas of 0.025 in each tail and 0.95 in the centre). We reject H0 if Zc is either greater than 1.96 or less than −1.96.

3. n = 100, X̄ = 1570, μ = 1600, S = 120, so

   Zc = (X̄ − μ)/(S/√n) = (1570 − 1600)/(120/√100) = −30/12 = −2.5

4. Since Zc = −2.5 < −1.96, we reject H0.

P-value = 2P(Z < −2.5) = 2(0.0062) = 0.0124.
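As an illustrative sketch (not from the original notes), the test statistic and P-value in Example 1.0 can be computed in Python with scipy.stats.

```python
from math import sqrt
from scipy.stats import norm

x_bar, mu0, s, n = 1570, 1600, 120, 100
z = (x_bar - mu0) / (s / sqrt(n))          # -2.5
p_value = 2 * norm.cdf(-abs(z))            # two-tailed P-value, about 0.0124
print(z, p_value)
```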
Example 2.0
Consider the following hypothesis test: H0: μ ≥ 12; Ha: μ < 12. A sample of 60 provides a sample mean of 8.58 and a sample standard deviation of 3.
(a) At α = 0.05, what is the critical value for Z? What is the rejection rule?
(b) Compute the value of the test statistic Z. What is your conclusion?
Solution
(a) This is a one-tailed test to the left. −Z0.05 = −1.65. Reject H0 when the calculated test statistic is less than −1.65.

(b) ZC = (x̄ − μ0)/(s/√n) = (8.58 − 12)/(3/√60) = −8.83.

Since ZC = −8.83 < −1.65, we reject H0 and conclude that we have sufficient evidence, based on this sample at the 5% level of significance, to say that the population mean is less than 12.

Example 3.0
Consider the following hypothesis test: H0: μ = 15; Ha: μ ≠ 15. Data from a sample of seven items are: 8, 10, 9, 11, 15, 9, 7.
(a) Compute the sample mean.
(b) Compute the sample standard deviation.
(c) With α = 0.05, what is the rejection rule?
(d) Compute the value of the test statistic t.
(e) What is your conclusion?

Solution
(a) The sample mean is x̄ = Σx/n = 69/7 = 9.857.

(b) The sample standard deviation is

s = √[(Σx² − (Σx)²/n)/(n − 1)] = √[(721 − (69)²/7)/6] = 2.6095

(c) This is a two-tailed test; we reject H0 if tc > t0.025,6 = 2.447 or tc < −t0.025,6 = −2.447.

(d) tc = (x̄ − μ0)/(s/√n) = (9.857 − 15)/(2.6095/√7) = −5.2144

(e) Since tc = −5.2144 < −2.447, we reject H0.
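A minimal Python sketch (illustrative only, not part of the original notes) reproduces Example 3.0 with scipy's one-sample t test.

```python
import numpy as np
from scipy import stats

data = np.array([8, 10, 9, 11, 15, 9, 7])
t_stat, p_value = stats.ttest_1samp(data, popmean=15)   # two-tailed test of H0: mu = 15
print(data.mean(), data.std(ddof=1))   # 9.857 and 2.6095
print(t_stat, p_value)                 # t about -5.21, P-value well below 0.05
```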

Example 4.0
Consider the following hypothesis test: H0: P = 0.35; Ha: P ≠ 0.35. A sample of 500 provides a sample proportion of P̂ = 0.255.
(a) At α = 0.01, what is the rejection rule?
(b) Compute the value of the test statistic Z.
(c) What is the P-value?
(d) What is your conclusion?

Solution
(a) This is a two-tailed test. Reject H0 if ZC > 2.58 or ZC < −2.58.

(b) Z = (P̂ − P)/√(pq/n) = (0.255 − 0.35)/√((0.35)(0.65)/500) = −4.45

(c) P-value = P(Z < −4.45) + P(Z > 4.45). Because of the symmetrical nature of the normal distribution, P-value = 2P(Z > 4.45) ≈ 0.

(d) Reject H0.
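The same proportion test can be sketched in Python (an illustration under the stated numbers, not part of the original notes).

```python
from math import sqrt
from scipy.stats import norm

p_hat, p0, n = 0.255, 0.35, 500
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # about -4.45
p_value = 2 * norm.sf(abs(z))                # essentially 0
print(z, p_value)
```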


Learning Objectives
After working through this chapter you should be able to:

Define and use the terminology of statistical testing.

Carry out statistical tests of all the types covered in this Chapter.

Calculate the P-value of the simpler tests.

Explain the way in which the rejection regions of tests follow from the
distributional results, taking into account the level and considerations of power.


Sample Examination Questions


1. A finite population consisting of the numbers 6, 7, 8, 10 and 11 can be converted into an infinite population if we take a random sample of size 2 by first drawing one element and then replacing it before drawing the second element.
(a)

Determine how many different samples of size 2 can be drawn from this
infinite population and list them.

(b)

Determine the means of the samples of part (a). What is the probability
assigned to each mean? Construct the sampling distribution to the mean for
random samples of size 2 drawn from this infinite population.

(c)

Calculate the mean and the standard deviation of the probability distribution
of part (b) and compare the value of the standard deviation with the
corresponding result obtained from the standard error of the mean formula.

2. (a) Explain briefly with examples:


(i)

population parameter

(ii)

sample statistics

(iii)

population

(b) Chisha is a cocktail hostess in a very exclusive private club. The Zambia Revenue Authority is auditing her tax return this year. Chisha claims that her average tip last year was K23,750. To support her claim, she sent the ZRA a random sample of 52 credit card receipts showing her bar tips. When ZRA got the receipts, they computed the sample average and found it to be x̄ = K26,250 with sample standard deviation S = K5,750. Do these receipts indicate that the average tip Chisha received last year was more than K23,750? Use a 1% level of significance. Also find the P-value.

3. (a) Briefly define each of the following terms:


(i)

Finite population correction factor

(ii)

Simple random sampling

(iii)

Standard error


(b) A government agency recently found that an artificial sweetener used in diet


soft drinks may have harmful side effects. Therefore, it sets the limit on the amount that each can may contain at 0.1 ounce. The manager of a local soft drink company, thinking that the mixing machine may not be staying within the tolerance limit, runs a test on 100 cans. The test shows the cans to have an average of 0.13 ounce of artificial sweetener. The population standard deviation is 0.06.
(i) Should the manager adjust the machine if α = 0.05?
(ii) If α = 0.02, should the manager adjust the machine?
(iii) Which value of α would you pick for this problem?
(iv) What if x̄ = 0.12 and α = 0.02?
(v) At what value of x̄ should he keep the machine as it is (α = 0.02)?

4. (a) Define each of the following:


(i)

The power of a test.

(ii) A Student's t test.

(b) The table below shows the annual salaries in millions of kwacha of
randomly selected faculty in public educational institutions and private
educational institutions.
Public:   80   90   100   110   85   75   65   85   72   74
Private:  86   95   105   115   92   74   64   92   73

(i)

Find a 90% confidence interval for the difference between


population mean annual salaries in the public and private
institutions.

(ii)

Test the null hypothesis that the mean salary for the private
institutions is K5, 000,000 more than in the public institutions
against the alternative that the mean for the private institutions is
more than K5, 000,000 greater.

(iii)

State carefully the assumptions you have made in arriving at the test
and confidence interval.


5. (a) Explain the following terms used in statistical hypothesis testing:


(i)

Rejection region

(ii)

Significance level of the test.

(b) A random sample of 25 engineers in company A produces a mean salary of
K90,000,000 with standard deviation of K15,000,000; and a random sample
of 86 engineers in company B produces a mean salary K110, 000,000 with a
standard deviation of K20, 000, 000.
(i)

Can we conclude that company B pays its engineers more than


company A? Use an 0.05 level of significance.

(ii)

What is the P-value for this test?

6. (a) Define each of the following:
(i)

The power of a test

(ii)

Rejecting a null hypothesis

(iii)

The Central Limit Theorem

(b) An Air Force base mess hall has received a shipment of 10,000 gallon-size cans of cherries. The supplier claims that the average amount of liquid is 0.25 gallon per can. A government inspector took a random sample of 100 cans and found the average liquid content to be 0.28 gallon per can with a standard deviation of 0.10.
(i)

Does this indicate that the suppliers claim is too low? (Use 95%
level of significance).

(ii)

Compute the P-value.

7. (a) A consumer group is testing camp stoves. To test the heating capacity of a stove, the group measures the time required to bring 2 litres of water from 10°C to boiling (at sea level).
Two competing models are under consideration. Thirty-six stoves of each model are tested and the following results are obtained:

Model 1:  mean time x̄1 = 11.4 min;  standard deviation S1 = 2.5 min
Model 2:  mean time x̄2 = 9.9 min;   standard deviation S2 = 3.0 min

Is there any difference between the performances of these two models? (Use a 5% level of significance.) Also find the P-value for the sample test statistic.
sample test statistic.
(b)

Define briefly the following terms:


(i)

Type I error

(ii)

Decision

(iii)

Type II error


CHAPTER 9

ANALYSIS OF VARIANCE
Reading

Newbold Chapter 15
Wonnacott and Wonnacott Chapter 10
Tailoka Frank P Chapter 13

Introductory Comments
Analysis of Variance (ANOVA) is a popular tool that needs some time and effort to
appreciate. The idea of analysis of variance is to investigate how variation in structured
data can be split into pieces associated with components of the structure. Here we cover
one-way and two-way cases. Both tests and confidence intervals are widely used in
applications.

Analysis Of Variance
Use of F-distribution: The F-distribution is used to test the hypothesis that the variance of
one normal population equals the variance of another normal population.
The second use of the F-distribution involves the analysis of variance techniques,
abbreviated ANOVA. Basically, analysis of variance uses sample information to
determine whether or not three or more treatments produce different results. A treatment
is a cause, or specific source, of variation in a set of data. Following are several cases to
expand on the meaning of a treatment.
Do different treatments of fertilizer affect yield? Do different grades of gasoline affect
performance? Do four different assembly methods result in different population means?

Assumptions Underlying the Use of the Analysis of Variance Test


Before we actually conduct a test using the ANOVA techniques, the assumption
underlying the test will be examined. If the following assumptions cannot be met,
another analysis of variance technique may be applied.


1.

The three or more populations of interest are normally distributed.

2.

These populations have equal standard deviations

3.

The samples we select from each of the populations are random and independent
that is they are not related.

Analysis Of Variance Procedure:


The ANOVA procedure can best be illustrated using an example. Suppose the manager of ABC resigned and three salespeople at the branch are being considered for the position. All three have about the same length of service, education and so on. In order to make a decision, it was suggested that their monthly sales be examined; these are shown in Table 1.0. The treatments in this problem are the salespeople.

Table 1.0  Monthly Sales of Appliances for Three Salespeople (K000)

Sample   Ms Banda   Mr Mwenya   Mr Chisenga
1        25         25          19
2        15         15          17
3        14         17          13
4        10         16          11
5        21         17          12
Mean     17         18          14.4

The ANOVA procedure calls for the same hypothesis procedure outlined in the lecture
notes of Estimation and hypothesis testing.

STEP 1   The null hypothesis H0 states that there is no significant difference among the mean sales of the three salespeople; that is, μ1 = μ2 = μ3. Ha states that at least one mean is different. As before, if H0 is rejected, Ha will be accepted.

STEP 2   The level of significance is selected. In our case we choose the 0.05 level.

STEP 3   The test statistic. The appropriate test statistic is the F-distribution. Underlying this procedure are several assumptions.
1) The data must be at least interval level.
2) The actual selection of the sales must be made using a probability-type procedure.
3) The distribution of the monthly sales for each of the populations is normal.
4) The variances of the three populations are equal, i.e. σ1² = σ2² = σ3².

F is the ratio of two variances:

F = (estimated population variance based on the variation between the sample means) / (estimated population variance based on the variation within the samples) = MSTR / MSE

The numerator has K − 1 degrees of freedom. The denominator has N − K degrees of freedom, where K is the number of treatments and N is the total number of observations.
STEP 4   The Decision Rule. As noted previously, the F-distribution and its accompanying curve are positively skewed and dependent on:
1) the number of treatments, K, and
2) the total number of observations, N.

For this problem we have K − 1 = 3 − 1 = 2 degrees of freedom in the numerator. There are 15 observations (three samples of five each). Therefore there are N − K = 15 − 3 = 12 degrees of freedom in the denominator.


In using the predetermined 0.05 level, the decision rule is to accept the null hypothesis
H o if the computed F value is less than or equal to 3.89; we reject H o if the computed F
value is greater than 3.89. The decision rule is shown diagrammatically.

[Figure: Distribution of F for K = 3 and N = 15, showing the region of acceptance to the left of the critical value 3.89 and a region of rejection of area 0.05 to its right on the F scale.]

STEP 5   Compute F and arrive at a decision. The first step is to set up an ANOVA table. It is merely a convenient form in which to record the sums of squares and other computations. The general format for a one-way analysis of variance problem is shown in Table 2.0.

Table 2.0  A General Format for the Analysis of Variance Table

Source of variation         (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Between treatments          SST                  K − 1                    MSTR = SST/(K − 1)
Error (within treatments)   SSE                  N − K                    MSE = SSE/(N − K)
Total                       SS Total

Formula for F:

F = MSTR / MSE,   where MSTR = SST/(K − 1) and MSE = SSE/(N − K)

where
MSTR is the mean square between treatments.
MSE is the mean square due to error. It is also referred to as the mean square within treatments.
SST is the abbreviation for the sum of squares treatment and is found by:

SST = Σ(T²/n) − (ΣX)²/N

SSE is the abbreviation for the sum of squares error, SSE = ΣX² − Σ(T²/n).

where:
n    is the number of observations for each respective treatment
T    is the treatment total
ΣX   is the sum of all the observations (sales)
ΣX²  is the sum of the squares of the individual observations (sales)
K    is the number of treatments (salespeople)
N    is the total number of observations

Compute SST:

SST = Σ(T²/n) − (ΣX)²/N
    = (85)²/5 + (90)²/5 + (72)²/5 − (247)²/15
    = 4101.8 − 4067.27
    = 34.53

Compute SSE:

SSE = ΣX² − Σ(T²/n)
    = (25)² + (15)² + ... + (12)² − [(85)²/5 + (90)²/5 + (72)²/5]
    = 4355 − 4101.8 = 253.2

Total variation (SS Total) is the sum of the between-treatments and the within-treatments variation, that is SS Total = SST + SSE = 34.53 + 253.2 = 287.73.

As a check:

SS Total = ΣX² − (ΣX)²/N = 4355 − (247)²/15 = 4355 − 4067.27 = 287.73

The three sums of squares and the calculations needed for F are transferred to ANOVA Table 3.0.

Table 3.0  ANOVA Table for the Store Manager Problem

Source of variation         (1) Sum of squares   (2) Degrees of freedom   Mean square = (1)/(2)
Between treatments          SST = 34.53          K − 1 = 3 − 1 = 2        MSTR = 34.53/2 = 17.265
Error (within treatments)   SSE = 253.2          N − K = 15 − 3 = 12      MSE = 253.2/12 = 21.1
Total                       SS Total = 287.73

Computing F:   F = MSTR/MSE = 17.265/21.1 = 0.818

The decision rule states that if the computed value of F is less than or equal to the critical value of 3.89, the null hypothesis is accepted. If the F value is greater than 3.89, H0 is rejected and Ha is accepted. Since 0.818 < 3.89, the null hypothesis is accepted at the 0.05 level. To put it another way, the differences in the mean monthly sales (K17,000, K18,000 and K14,400) are due to chance (sampling). From a practical standpoint, the levels of sales of the three salespeople being considered for store manager are the same. No decision with respect to the position can be made on the basis of monthly sales.
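For readers who want to check the arithmetic, the one-way ANOVA above can be reproduced with scipy (an illustrative sketch, not part of the original notes).

```python
from scipy import stats

banda    = [25, 15, 14, 10, 21]
mwenya   = [25, 15, 17, 16, 17]
chisenga = [19, 17, 13, 11, 12]

f_stat, p_value = stats.f_oneway(banda, mwenya, chisenga)
print(f_stat, p_value)   # F is about 0.82, well below the critical value 3.89
```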
Inferences about Treatment Means
Suppose in carrying out the ANOVA procedure we make the decision to reject the null hypothesis. This allows us to conclude that not all the treatment means are the same. Sometimes we may be satisfied with this conclusion, but in other instances we may want to know which treatment means differ. Let us consider the following example.
Four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. As a result of dropouts from the experimental groups (due to sickness, transfer, and so on), the number of students varied from group to group. Do the data shown below present sufficient evidence to indicate a difference in the mean achievement for the four teaching techniques? Use a 0.05 level of significance.
Teaching technique
         1     2     3     4
        65    75    59    94
        67    69    78    89
        73    83    67    80
        79    81    62    88
        81    72    83
        69    79    76
              90
Total  454   549   425   351

SS(total) = ΣXij² − (ΣXij)²/N = 139511 − (1779)²/23 = 139511 − 137601.78 = 1909.22

SST = Σ(Ti²/ni) − CM
    = (454)²/6 + (549)²/7 + (425)²/6 + (351)²/4 − 137601.78
    = 34352.67 + 43057.29 + 30104.17 + 30800.25 − 137601.78
    = 138314.38 − 137601.78 = 712.59

SSE = SS total − SST = 1909.22 − 712.59 = 1196.63

Table 4.0  ANOVA Table for the Teaching Techniques Example

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   712.59           3                    237.53        237.53/62.98 = 3.77
SSE                   1196.63          19                   62.98
SS Total              1909.22          22

Decision Rule: Reject H0 if the computed F value is greater than F0.05,3,19 = 3.13.
Since FC = 3.77 > 3.13, we reject H0.
Recall that in the store managers' data there was no difference in the treatment means. In that case further analysis of the treatment means is not warranted. However, in the foregoing example regarding mean achievement for the four teaching techniques, we found a difference in the treatment means. That is, the null hypothesis is rejected and the alternative hypothesis is accepted. If the achievements do differ, the question is: between which groups do the treatment means differ?
Several procedures are available to answer this question. Perhaps the simplest is through the use of confidence intervals. A confidence interval for the difference between two population means is found by:

(X̄1 − X̄2) ± t(α/2, N−K) √(MSE(1/n1 + 1/n2))

where:

X̄1   is the mean of the first treatment
X̄2   is the mean of the second treatment
t     is obtained from the t table; the degrees of freedom are equal to N − K
MSE   is the mean square error term obtained from the ANOVA table (SSE/(N − K))
n1    is the number of observations in the first treatment
n2    is the number of observations in the second treatment.

If the confidence interval includes 0, we conclude there is no difference in the pair of treatment means. However, if both end points of the confidence interval have the same sign, it indicates that the treatment means differ.
The 95% confidence interval for the difference between μ1 and μ2 is found by

(X̄1 − X̄2) ± t(α/2, N−K) √(MSE(1/n1 + 1/n2)) = (75.67 − 78.43) ± 2.093 √(62.98(1/6 + 1/7))
= −2.76 ± 9.24, i.e., from −12.00 to 6.48

where
X̄1 = 75.67, X̄2 = 78.43
t = 2.093 from Appendix A, Table A.6 (N − K = 19 degrees of freedom)
MSE = 62.98 from the ANOVA table
n1 = 6, n2 = 7

Since this interval includes 0, these two treatment means do not differ significantly.

Similarly, consider X̄1 = 75.67 and X̄4 = 87.75. We find that the 95 percent confidence interval ranges from −22.80 up to −1.36. Both end points are negative: we can conclude these treatment means differ significantly. That is, students subjected to teaching technique 4 have higher scores than those subjected to teaching technique 1.
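These pairwise intervals are easy to compute from the MSE, as in the following Python sketch (illustrative only; the helper function name is my own, not from the text).

```python
import math
from scipy import stats

mse, df_error = 62.98, 19
t_crit = stats.t.ppf(0.975, df_error)          # about 2.093

def pairwise_ci(mean_i, mean_j, n_i, n_j):
    """95% CI for the difference of two treatment means using the pooled MSE."""
    half = t_crit * math.sqrt(mse * (1 / n_i + 1 / n_j))
    diff = mean_i - mean_j
    return diff - half, diff + half

print(pairwise_ci(75.67, 78.43, 6, 7))   # includes 0: no significant difference
print(pairwise_ci(75.67, 87.75, 6, 4))   # both ends negative: technique 4 scores higher
```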


Caution
The investigation of differences in treatment means is a sequential process. The initial
step is to conduct the ANOVA test. Only if the null hypothesis that the treatment means
are equal is rejected should any analysis of the treatment means be attempted.
Two-Way Analysis of variance:
In the appliance sales example, we were unable to show that a difference exists among the mean sales of the three salespeople. In the computation of the F statistic, variation was considered as originating from two sources. First, variation within each of the treatments was considered. The variation either originated from the treatment or was considered random. There are other possible sources of variation, such as the training the salespeople had, the days of the week on which the sample data were obtained, etc. Two-way analysis of variance allows us to consider at least one other of these possibilities.
Example:
EUROAFRICA is expanding bus services from the Capital City into the heart of the
Copperbelt. There are four routes being considered from Kitwe to the other four towns.
The travel times in minutes along each of the four routes are given below.
Travel Time from Kitwe to Four Other Towns (minutes)

DAY         LUANSHYA   NDOLA   CHINGOLA   MUFULIRA
Monday      40         45      46         34
Tuesday     38         42      44         30
Wednesday   38         40      44         33
Thursday    37         43      42         40
Friday      41         41      40         32

At the 0.05 significance level, can it be concluded there is a difference among the four routes? Does it make a difference which day of the week it is?
If the null hypothesis is simply that the mean time is the same along the four routes, then this requires the one-way ANOVA approach. The variation that occurs because of differences in the days of the week is then considered random and is included in the MSE term, and the F ratio is reduced accordingly. If the variation due to the day of the week can be removed, the denominator of the F ratio will be reduced. In this case, the day of the week is called a blocking variable. Hence, we have variation due to treatments and due to blocks. The sum of squares due to blocks (SSB) is computed as follows:

SSB = Σ(B²/K) − (ΣX)²/N

where B refers to the block total, that is, the total for each row, and K refers to the number of items in each block.
The same format is used for the two-way ANOVA table as was used in the one-way ANOVA case. SST and SS Total are computed as before. SSE is obtained by subtraction (SSE = SS Total − SST − SSB). The table below shows the necessary calculations.

Calculations Needed for Two-Way ANOVA

Day              Luanshya   Ndola   Chingola   Mufulira   Row sum
Monday           40         45      46         34         165
Tuesday          38         42      44         30         154
Wednesday        38         40      44         33         155
Thursday         37         43      42         40         162
Friday           41         41      40         32         154
Column total     194        211     216        169        790
Sum of squares   7538       8919    9352       5769       31578
Sample size      5          5       5          5

Analogous to the ANOVA table for a one-way analysis, the two-way general format is:

Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Treatments            SST                  K − 1                    MSTR = SST/(K − 1)
Blocks                SSB                  n − 1                    MSB = SSB/(n − 1)
Error                 SSE                  (K − 1)(n − 1)           MSE = SSE/((K − 1)(n − 1))
Total                 SS Total

As before, to compute SST:

SST = Σ(T²/n) − (ΣX)²/N
    = (194)²/5 + (211)²/5 + (216)²/5 + (169)²/5 − (790)²/20
    = 31474.8 − 31205
    = 269.8

SSB is found by:

SSB = Σ(B²/K) − (ΣX)²/N
    = (165)²/4 + (154)²/4 + (155)²/4 + (162)²/4 + (154)²/4 − 31205
    = 31231.5 − 31205 = 26.5

The remaining sums of squares are:

SS Total = ΣX² − (ΣX)²/N = 31578 − (790)²/20 = 31578 − 31205 = 373

SSE = SS Total − SST − SSB = 373 − 269.8 − 26.5 = 76.7

The values for the various components of the ANOVA table are computed as follows:

Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Treatments            269.8                3                        89.933
Blocks                26.5                 4                        6.625
Error                 76.7                 12                       6.392
Total                 373                  19

There are two sets of hypotheses being tested:

1. H0: The treatment means are the same, μ1 = μ2 = μ3 = μ4.
   Ha: The treatment means are not all the same.

2. H0: The block means are the same, μ1 = μ2 = μ3 = μ4 = μ5.
   Ha: The block means are not all the same.

First we test the hypothesis concerning the treatment means. There are K − 1 = 4 − 1 = 3 degrees of freedom in the numerator and (n − 1)(K − 1) = (5 − 1)(4 − 1) = 12 degrees of freedom in the denominator. Using the 0.05 significance level, the critical value of F is 3.49. The null hypothesis that the mean times for the four routes are the same is rejected if the F ratio exceeds 3.49.

F = MSTR/MSE = 89.933/6.392 = 14.07

The null hypothesis is rejected and the alternative accepted. It is concluded that the mean travel time is not the same for all routes. EUROAFRICA will want to conduct some tests to determine which treatment means differ.
Next, we test whether the travel time is the same for different days of the week. The degrees of freedom in the numerator for blocks is n − 1 = 5 − 1 = 4. The degrees of freedom in the denominator is the same as before: (n − 1)(K − 1) = (5 − 1)(4 − 1) = 12. The null hypothesis that the block means are the same is rejected if the F ratio exceeds 3.26.

F = MSB/MSE = 6.625/6.392 = 1.04

The null hypothesis is accepted. The mean travel time is the same for the various days of the week.
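A short Python sketch (illustrative only, not part of the original notes) reproduces the two-way sums of squares for the travel-time data.

```python
import numpy as np

# rows = days (blocks), columns = routes (treatments)
times = np.array([
    [40, 45, 46, 34],
    [38, 42, 44, 30],
    [38, 40, 44, 33],
    [37, 43, 42, 40],
    [41, 41, 40, 32],
], dtype=float)

n_blocks, k_treatments = times.shape
grand = times.sum()
correction = grand**2 / times.size

ss_total = (times**2).sum() - correction                        # 373
sst = (times.sum(axis=0)**2 / n_blocks).sum() - correction      # 269.8 (treatments)
ssb = (times.sum(axis=1)**2 / k_treatments).sum() - correction  # 26.5  (blocks)
sse = ss_total - sst - ssb                                      # 76.7
print(ss_total, sst, ssb, sse)
```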
Problems
1)

Suppose that we want to compare the cholesterol contents of four competing diet
foods on the basis of the following data (in milligrams per package) which were
obtained for three 6-ounce packages of each of the diet foods.
Diet Food
      A       B       C       D
     3.6     3.1     3.2     3.5
     4.1     3.2     3.5     3.8
     4.0     3.9     3.5     3.8
n    nA = 3  nB = 3  nC = 3  nD = 3

The means of these four samples are ȲA = 3.9, ȲB = 3.4, ȲC = 3.4 and ȲD = 3.7.
We want to know whether the differences among them are significant or whether
they can be attributed to chance, using 5% level of significance.

2)

Of the three banks in Kitwe, customers are randomly selected from each bank and their waiting times (in minutes) before service are recorded.

Bank                      Waiting time (minutes)
ZNCB                      4.8   5.5   6.3
Standard Chartered Bank   6.9   8.5   5.3   4.3
Barclays Bank             7.1   3.5

Do these data indicate a significant difference among the mean waiting times of
these banks? Use the 0.05 significance level.


3. A wholesaler is interested in comparing the weight in grammes of tomatoes from Lusaka, Ndola and Kitwe.

Lusaka   Ndola   Kitwe
5.6      7.8     11.0
8.8      8.2     10.1
9.0      7.4      8.9
         8.2      9.3
                 10.0

a)

State the null and alternative hypothesis.

b)

Fill in an ANOVA Table

c)

What is the critical value of F, assuming a 0.01 level of significance?

d)

What decision should the wholesaler make?

4) Refer to problem 3. Let μA and μB respectively denote the mean weights in grammes of tomatoes from Lusaka and Ndola.
a) Find a 95 percent confidence interval for μA.
b) Find a 95 percent confidence interval for μB.
c) Find a 95 percent confidence interval for μA − μB.
d) What conclusion can you draw from the interval in (c)?

5) An experiment was conducted to compare the effect of four different chemicals, A, B, C and D, in producing water resistance in textiles. A strip of material was cut into pieces, and the pieces were randomly assigned to receive one of the four chemicals, A, B, C, or D. This process was replicated three times, thus producing a randomized block design. The design, with moisture-resistance measurements, is as shown in the accompanying diagram (low readings indicate low moisture penetration).
a) Do the data provide evidence of a difference in mean moisture penetration among fabrics treated with the four chemicals? Use the 0.05 significance level.


b)

Do the data provide evidence to indicate that blocking increased the


amount of information in the experiment?

c)

Find a 95% confidence interval for the difference in mean moisture


penetration for fabric treated by chemicals A and D.

d)

Interpret the interval.


Block 1:   C 9.9    A 10.1   B 11.4   D 12.1
Block 2:   D 13.4   B 12.9   A 12.2   C 12.3
Block 3:   B 12.7   D 12.9   C 11.4   A 11.9

ANSWERS

1) Diet Food:
             A       B       C       D
            3.6     3.1     3.2     3.5
            4.1     3.2     3.5     3.8
            4.0     3.9     3.5     3.8
Total ΣX   11.7    10.2    10.2    11.1
ΣX²        45.77   35.06   34.74   41.13

SS Total = ΣX² − (ΣX)²/N = 156.7 − (43.2)²/12 = 156.7 − 155.52 = 1.18

SST = Σ(T²/n) − (ΣX)²/N = [(11.7)² + (10.2)² + (10.2)² + (11.1)²]/3 − 155.52
    = (136.89 + 104.04 + 104.04 + 123.21)/3 − 155.52 = 156.06 − 155.52 = 0.54

SSE = SS Total − SST = 1.18 − 0.54 = 0.64

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   0.54             3                    0.18          2.25
SSE                   0.64             8                    0.08
SS Total              1.18             11

F0.05,3,8 = 4.07. Since 2.25 < 4.07, we accept H0.

2)
Bank                      Waiting times         n   ΣX      ΣX²
ZNCB                      4.8, 5.5, 6.3         3   16.6     92.98
Standard Chartered Bank   6.9, 8.5, 5.3, 4.3    4   25.0    166.44
Barclays Bank             7.1, 3.5              2   10.6     62.66

SS Total = ΣX² − (ΣX)²/N = 322.08 − (52.2)²/9 = 322.08 − 302.76 = 19.32

SST = (16.6)²/3 + (25)²/4 + (10.6)²/2 − 302.76
    = 91.853 + 156.25 + 56.18 − 302.76 = 304.283 − 302.76 = 1.523

SSE = SS Total − SST = 19.32 − 1.523 = 17.797

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   1.523            2                    0.7615        0.257
SSE                   17.797           6                    2.966
SS Total              19.32            8

F0.05,2,6 = 5.14. Since 0.257 < 5.14, we accept H0.

3. H0: μ1 = μ2 = μ3
   Ha: The means are not all equal.

Reject H0 if F is greater than F0.01,2,9 = 8.02.

SS Total = 928.59 − (104.3)²/12 = 928.59 − 906.54 = 22.05

SST = (23.4)²/3 + (31.6)²/4 + (49.3)²/5 − 906.54 = 11.718

SSE = 22.05 − 11.718 = 10.332

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   11.718           2                    5.859         5.10
SSE                   10.332           9                    1.148
SS Total              22.05            11

We cannot reject H0 since F = 5.10 < F0.01,2,9 = 8.02. The evidence does not suggest any differences in the weights of tomatoes.
4) a) For a single treatment mean, the interval is T̄i ± t(α/2, N−K) S/√ni, where S = √MSE = √1.148 = 1.071.

For Lusaka: 7.8 ± t0.025,9 (1.071/√3) = 7.8 ± 2.262(0.618) = (6.402, 9.198)

b) For Ndola: 7.9 ± 2.262(1.071/√4) = 7.9 ± 1.2 = (6.7, 9.1)

c) For the difference: (T̄1 − T̄2) ± t(α/2, N−K) S √(1/n1 + 1/n2)
   = (7.8 − 7.9) ± 2.262(1.071)√(1/3 + 1/4) = −0.1 ± 1.85 = (−1.95, 1.75)

d) This interval traps 0, which implies there is no significant difference between the two means.

5)
SSB = (43.5)²/4 + (50.8)²/4 + (48.9)²/4 − (143.2)²/12
    = 473.0625 + 645.16 + 597.8025 − 1708.85 = 7.175

SS Total = ΣX² − (ΣX)²/N = 1721.76 − (143.2)²/12 = 1721.76 − 1708.85 = 12.91

SST = (34.2)²/3 + (37.0)²/3 + (33.6)²/3 + (38.4)²/3 − 1708.85
    = 389.88 + 456.33 + 376.32 + 491.52 − 1708.85 = 5.2

SSE = SS Total − SST − SSB = 12.91 − 7.175 − 5.2 = 0.535

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   5.2              3                    1.7333        19.43
SSB                   7.175            2                    3.5875        40.22
SSE                   0.535            6                    0.0892
SS Total              12.91            11

a) H0: μA = μB = μC = μD; Ha: the treatment means are not all equal.
   Since F = 19.43 > F0.05,3,6 = 4.76, we reject H0.

b) H0: μ1 = μ2 = μ3; Ha: the block means are not all equal.
   Since F = 40.22 > F0.05,2,6 = 5.14, we reject H0.

c) (11.4 − 12.8) ± t0.025,6 √(0.0892(1/3 + 1/3)) = −1.4 ± 2.447(0.244) = −1.4 ± 0.597 = (−1.997, −0.803)


Learning Objectives

After working through this Chapter you should be able to:

Explain the purpose of analysis of variance

Carry out small examples of one way and two-way analysis of variance with a
hand calculator, presenting in an ANOVA table.

Carry out tests of hypothesis, and to write down confidence intervals as in this
Chapter.


Sample Examination Questions


1. a) A restaurant owner operates three restaurants within a city: one in a major shopping centre (A), one near the college campus (B), and one at the park area (C). The management has collected the following data on daily sales (in thousands of kwacha).
Day        A      B      C
Monday     10.5   8.4    5.9
Tuesday    8.4    9.3    7.1
Friday     12.6   11.4   6.7
Saturday   18.3   7.9    14.2
Sunday     10.8   6.3    13.7

(i) What type of experimental design is represented here?

(ii)

Construct an ANOVA summary table for this experiment.

(iii)

Is there evidence of a difference in mean sales among the


restaurants? (Use 0.05 ).

(iv)

Is there evidence (at 0.05 ) of a difference in the mean sales for


the five days.

(v)

Estimate the difference in mean sales between the restaurant created


at the shopping center and near the college campus. Use a 90%
confidence interval.

(vi)

State the assumptions required for the validity of the procedures used
in parts (ii) to (v).

(b) A major appliance dealer wishes to compare his mean television sales
during three different periods of the week. Beginning (Monday, Tuesday),
Middle (Wednesday, Thursday), and End (Friday, Saturday). His plan is to
select random samples of sales records from each period, and record the
number of television sets sold. What type of experimental design is this?


2.

(a)

What is a two-way ANOVA test?

(b)

A power plant, which uses water from the surrounding bay for cooling its
condensers, is required by the Environmental Protection Agency (EPA) to
determine whether discharging its heated water into the bay has a
detrimental effect on the flora (plant life) in the water. The EPA requests
that the power plant make its investigation at three strategically chosen
locations, called stations. Stations 1 and 2 are located near the plant's discharge tubes, while station 3 is further out in the bay. During one
randomly selected day in each of 4 months, a diver is sent down to each of
the stations, randomly samples a square meter area of the bottom, and
counts the number of blades of the different types of grasses present. The
results are as follows for one important grass type.

Month    Station 1   Station 2   Station 3
May      28          31          53
June     25          22          61
July     37          30          56
August   20          26          48

(i)

Is there sufficient evidence to indicate a difference among the mean


numbers of blades found per square meter per month for the three
stations? Use 0.05 .

(ii)

Is there sufficient evidence to indicate a difference among the mean


numbers of blades found per square meter for the 4 months? Use
0.05 .

(c) Place a 90% confidence interval on the difference in means between stations 1 and 3.


3.

(a)

An advertising firm is studying the effects of four different kinds of displays


of a product in a grocery store in three different sales areas in the city.
Within each sales area, four stores are selected, and each receives one of the
four displays. Over the duration of the experiment, the number of units of
the product sold is recorded. The data are shown in the table.
Display   Sales Area 1   Sales Area 2   Sales Area 3
1         120            76             95
2         114            60             102
3         140            85             122
4         102            80             85

(i)

Which model is appropriate for analyzing these data? Explain.

(ii)

Do the four displays result in different averages? Use 0.05 to


reject.

(b)

State the three assumptions of the error term in the analysis of variance
models. Which of the three assumptions is most critical in validating an
analysis of variance model fitted to a data set?

4. (a) What is an ANOVA test?

(b) A supermarket chain conducted a study to determine where to place its generic brand products in order to increase sales. Sales (in thousands of kwacha) for one week were as follows:
                  Store 1   Store 2   Store 3
High shelf        60        56        52
Eye-level shelf   53        58        56
Low shelf         55        55        59

Perform a two-way analysis of variance. Use 5% level of significance.


5.

(a)

Three of the currently most popular television shows produced the following
ratings (percentage of the television audience tuned into the show) over a
period of four weeks:

Week     Show A   Show B   Show C   Totals
1        34.7     28.4     23.8     86.9
2        38.1     32.2     20.7     91.0
3        35.1     32.4     25.8     93.3
4        30.4     28.2     28.9     87.5
Totals   138.3    121.2    99.2     358.7

(i) Is there evidence (at α = 0.01) that the mean ratings differ for the three shows?
(ii) Is there evidence (at α = 0.01) that the use of weeks as blocks is justified in this experiment?
(iii) Construct a 95% confidence interval for the difference in mean ratings between shows B and C.
(iv) State the assumptions necessary for the validity of the procedures used in (i) to (iii).

(b) Independent random samples of six assistant professors, four associate professors and five full professors were asked to estimate the amount of time outside the classroom spent on teaching responsibilities in the last week. Results, in hours, are shown in the accompanying table.
Assistant   8    13   12   16   10   12
Associate   16   13   16   9
Full        12   8    7    10   8

(i)

What type of experiment design is represented here.

(ii)

Set out the analysis of variance table.

(iii)

Test the null hypothesis that the three population times are equal.
Use 0.05 .


CHAPTER 10
TIME SERIES

Reading
Newbold Chapter 17
Tailoka Frank P Chapter 6
Plane and Oppermann 395

Introductory Comments

This Chapter follows from the one on index numbers and allows the understanding of some alternative ways of presenting the results. Index numbers play an important role in forecasting, and here models of forecasting are presented.

10.1

Introduction
Any variable that is measured overtime in sequential order is called a time series.
The primary characteristic of a time series is the assumption that the observations
have some form of dependence on time. Since this time dependence may take on
any number of possible patterns, the problem becomes one of identifying the most
important factors.
Business people, economists, and analysts of various kinds all look back at the
sequence of events that occurred over the past year or years in order to understand
what happened and thereby (they hope) to be in a better position to anticipate
what may happen in the future.
A leveling-off of long-term population growth, for example, may indicate to a particular firm that future market expansion may not be unlimited and that more careful attention should be paid to increasing the firm's market share. Even with a general slowdown in population growth, the gradual aging of the population may imply to another firm (one concentrating in consumer goods for older people) that its total market potential is growing substantially year after year. Other types of time-dependent patterns may exist as well. In looking at a time series of monthly or quarterly beer sales, for example, we may discover a regular seasonal pattern in which beer consumption peaks. Other regular periodic or seasonal variation can be observed in sales of college textbooks, and in the observance of such social customs as giving Christmas gifts and Valentine's Day flowers.
The task of time series analysis can therefore be thought of quite generally as a matter of identifying and isolating the various major time-dependent patterns in a given time series data array. Once accomplished, this analysis should enhance the user's ability to forecast variables of interest over the future.
The classical time-series model focuses on the decomposition of the time-dependent variable into four component parts: trend (T), cycle (C), seasonal variation (S), and residual or irregular variation (I). The model may be additive in its component parts,

Yt = Tt + Ct + St + It

or multiplicative in its component parts,

Yt = Tt · Ct · St · It

The movements of a time series may be classified as follows:


1.

A trend (also known as a secular trend) is a long-term relatively smooth pattern


or direction that the series exhibits. By definition, it has a duration of more than
one year. For example, data for beer sales show them to have an upward trend to
the right, whereas birth rates over the last few years seem to have a downward
trend to the right.

2.

A cycle is a wavelike or oscillatory pattern about a long-term trend that is


generally apparent over a number of years. By definition, it has a duration of
more than one year. Examples of cycles are well known business cycles that
record periods of economic recession and inflation, long-term product demand
cycles and cycles in the monetary and financial sectors.

3.

Residual or Irregular Variation is the random movement that a series exhibits


after the trend, cycle, and seasonal variation are removed. For example, daily
centimeters of rainfall in a particular urban setup during a given month is often
random in this sense. Notice that all time series exhibit random variation while
they may not have a trend, a cycle, or seasonal variation. Moreover, whether or
not a particular trend, cycle, or seasonal variation is present in a given time series
critically depends on the time period chosen for observation.

4. Seasonal variation: these are the oscillations which depend on the season of the year. Thus, employment is usually higher at harvest time at Nakambala Sugar Estate in Mazabuka, and rainfall will be higher at some times of the year than at others.
The motivation behind decomposing a time series is twofold. On the one hand,
we wish to see whether a particular component is present in a given time series
and to understand the extent to which it explains some of the movements in the
variable of interest. On the other hand, if we wish to forecast a particular
variable, we can usually improve our forecasting accuracy by first breaking it into
component parts, then forecasting each of these parts separately, and finally
combining the individual effects to produce the composite overall forecast.
Business Forecasting is concerned with estimating the future value of some
variable of interest. This may be done for the short-term or for the long-term, and
different forecasting models are more appropriate for one case than for the other.
Forecasting may be done in any of three possible ways. Using regression models,
using time series models, and using forecasting models especially created for a
specific purpose. Indeed, quantitative forecast models have even been designed
for cases in which historical databases are not available such as when a firm
wishes to forecast sales of a new product or the expected profitability or market
share for such a product.
Today, forecasters have developed a specialized terminology or jargon and many
forecasting models require a level of mathematical sophistication and the
availability of computers and specialized computer software that go far beyond
the scope of this book. As such, our objective in this course is to provide the
student with a basic understanding of the underlying issues about the use of
various types of forecasting models, rather than to provide a sophisticated level of
hands on experience.

10.2

Trend Analysis
The first component of a time series that we will consider is the long-term trend.
A trend can be linear or nonlinear and, indeed, can take on a whole host of other
functional forms such as polynomials and logarithmic trends, among others. We
shall begin by working through an example using a linear model.

Example
Annual sales for a pharmaceutical company have been recorded over the past 10
years; they are shown in Table 1.1. Calculate a linear trend of the data.


Table 1.1  Annual Sales Data for the Pharmaceutical Example

YEAR    SALES (in K millions)
1975    18.0
1976    19.4
1977    18.0
1978    19.9
1979    19.3
1980    21.1
1981    23.5
1982    23.2
1983    20.4
1984    24.4

How we measure time along the horizontal axis (it turns out) is irrelevant in time-series analysis. We can suit ourselves, picking whatever numbers serve to reduce the computational burden. A common practice is to measure the time periods consecutively (1, 2, 3, ...), and we shall do so here.

Table 1.2  Calculations for Example 1.1

YEAR    SALES Y   TIME X   X²    XY
1975    18.0      1        1     18.0
1976    19.4      2        4     38.8
1977    18.0      3        9     54.0
1978    19.9      4        16    79.6
1979    19.3      5        25    96.5
1980    21.1      6        36    126.6
1981    23.5      7        49    164.5
1982    23.2      8        64    185.6
1983    20.4      9        81    183.6
1984    24.4      10       100   244.0
TOTAL   207.2     55       385   1,191.2

Least Squares Method: The simplest method of fitting a linear trend is to use the least squares approach we discussed in the handout on Regression Analysis. In this method, the formulas for the slope and intercept are:

b = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n],    a = ȳ − b x̄

b = [1191.2 − (55)(207.2)/10] / [385 − (55)²/10] = 51.6/82.5 = 0.6255

a = 207.2/10 − 0.6255(55/10) = 17.28

and the trend equation can be written as:

Ŷ = 17.28 + 0.6255x

To forecast sales one year ahead (1985), we set x = 11:

Ŷ = 17.28 + 0.6255(11) = 24.1605 ≈ 24.16

Similarly, forecasting 2 years ahead would involve setting x equal to 12, and so on. Both confidence and prediction intervals can be constructed to give us a bound of confidence about our forecast. The caveat about forecasting outside the data range must be emphasized here, especially if forecasting for more than one time period is being contemplated.
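The slope, intercept and forecast above can be checked with numpy (an illustrative sketch, not part of the original notes).

```python
import numpy as np

sales = np.array([18.0, 19.4, 18.0, 19.9, 19.3, 21.1, 23.5, 23.2, 20.4, 24.4])
x = np.arange(1, 11)                         # time measured consecutively 1..10

slope, intercept = np.polyfit(x, sales, 1)   # least squares trend line
print(intercept, slope)                      # about 17.28 and 0.6255
print(intercept + slope * 11)                # one-year-ahead forecast, about 24.16
```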

Example 1.2
Among the more common functional forms used in trend analysis are the following three:

1. A linear model,
   y = β0 + β1x
   which is appropriate if the first differences are roughly equal (first differences are the differences between successive values in the time series).

2. A polynomial form,
   y = β0 + β1x + β2x²   (parabola)
   or
   y = β0 + β1x²   (parabola)
   which is appropriate if the differences between successive first differences (the second differences) are roughly equal.

3. A logarithmic or exponential trend,
   y = β0(β1)^x,   or   log y = log β0 + (log β1)x
   which is appropriate if neither a linear nor a polynomial form fits but there nonetheless appears to be a constant percentage rate of increase over time.

10.3 Moving Averages
An alternative approach to trend-cycle analysis is to use moving averages. In a sense, the moving average (MA) takes away the short-term seasonal and irregular variation, leaving a combined trend and cycle. Moving averages are widely used to remove seasonal variation, irregular variation (or noise, as it is also called), or both.
Example 1.2
Monthly sales figure for gasoline were recorded at all the gas stations in a
particular town, as shown in table 1.3. Calculate the three-month and five month
moving averages.


Table 1.3 Monthly Regional Gasoline Sales

Month    Gasoline Sales (1000s of kilograms)
1        37
2        70
3        45
4        26
5        60
6        45
7        31
8        79
9        24
10       61
11       25
12       44
Solution
A moving average is a simple arithmetic average computed over any number of time
periods. For a three-period moving average, we would take the first three months (1, 2,
and 3) and average them. Then we would move to the next grouping of months (2, 3 and 4)
and average them; and so on. In a similar fashion, we can compute 5-month moving
averages, as shown in Table 1.4, or averages over any other number of months.
Table 1.4 Calculations of Moving Averages for the Gasoline Sales Example

Month   Gasoline   3-month Moving   3-month Moving   5-month Moving   5-month Moving
        Sales      Total            Average          Total            Average
1       37         -                -                -                -
2       70         152              50.7             -                -
3       45         141              47.0             238              47.6
4       26         131              43.7             246              49.2
5       60         131              43.7             207              41.4
6       45         136              45.3             241              48.2
7       31         155              51.7             239              47.8
8       79         134              44.7             240              48.0
9       24         164              54.7             220              44.0
10      61         110              36.7             233              46.6
11      25         130              43.3             -                -
12      44         -                -                -                -

Notice that the longer the time period over which we average, the smoother the series
becomes; eventually it approaches a straight line. The moving average also reduces the
number of observation points: for the 3-month moving average we lose the first and last
months, and for the 5-month moving average we lose the first 2 and the last 2 months.
In general, if we set the period of the moving average exactly equal to the number of
seasonal variations that occur in a given time series, we exactly remove that seasonal
variation. For example, if we have quarterly observations and wish to remove the four
seasons, we choose a 4-period moving average. Here (and in general) when the number
of periods chosen is even numbered we must compute a centered moving average.
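As a minimal sketch (not part of the original text), the following Python reproduces the 3-month moving averages of Table 1.4 and shows how an even-period average would be centered; the function names are illustrative.

```python
# Minimal sketch: simple and centered moving averages for the gasoline data of Table 1.3.
def moving_average(series, k):
    """Return the k-period moving averages of the series."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

def centered_moving_average(series, k):
    """For an even k, average each pair of adjacent k-period moving averages."""
    ma = moving_average(series, k)
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

gasoline = [37, 70, 45, 26, 60, 45, 31, 79, 24, 61, 25, 44]
print([round(v, 1) for v in moving_average(gasoline, 3)])
# approx [50.7, 47.0, 43.7, 43.7, 45.3, 51.7, 44.7, 54.7, 36.7, 43.3], as in Table 1.4
```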

Example 1.3
Historical occupancy rates for a Kasaba resort hotel have been compiled by the
government tourism office; these are shown in Table 1.5. Calculate a 4-quarter centered
moving average.
Solution
To remove the seasonal variation, we need to compute a 4 period moving average.
This, however, would place the moving average exactly between the two quarters.
Consequently, we next take a 2 period moving average of all 4 period moving averages,
thereby centering the final moving average on a particular quarter. Our calculations
appear in Table 1.6.

Notice that we first calculated the 4-quarter moving totals and then centered the series by
averaging each pair of adjacent totals. For example, the moving total of the first four
quarters is 105, and the moving total of quarters II, III and IV of 1980 together with
quarter I of 1981 is 90. The centered moving average is therefore (105 + 90)/8 = 24.4.
The remaining centered moving averages are computed in a similar manner.


Table 1.5 Hotel Occupancy Rates

Year    Quarter   Hotel Occupancy
1980    I         40
        II        20
        III       30
        IV        15
1981    I         25
        II        15
        III       35
        IV        20
1982    I         35
        II        22
        III       32
        IV        18
1983    I         36
        II        16
        III       30
        IV        20
1984    I         37
        II        17
        III       32
        IV        18

Moving averages are specifically designed to remove seasonal and/or irregular variations.
As such, they can be thought of as serving three purposes. First, they are one of several
types of smoothing techniques that remove short-term variation and leave only a
combined trend-cycle. In other words, if we think of the classical multiplicative time
series model, we have

        Y = T · C · S · I

By dividing both sides by (S · I), we get

        Y / (S · I) = (T · C · S · I) / (S · I) = T · C = MA

That is, we are left with the moving average series, which is composed solely of the trend
and cycle.


Second, we can set the period of the moving average exactly equal to the number of
seasonal effects we wish to remove. In that sense, we have deseasonalized our time
series.
Table 1.6 Centered Moving Average Calculations for Hotel Occupancy

Year    Quarter   Occupancy   4-Quarter       Centered Moving
                              Moving Total    Average
1980    I         40          -               -
        II        20          -               -
        III       30          105             24.4
        IV        15          90              21.9
1981    I         25          85              21.9
        II        15          90              23.1
        III       35          95              25.0
        IV        20          105             27.1
1982    I         35          112             27.6
        II        22          109             27.0
        III       32          107             26.9
        IV        18          108             26.3
1983    I         36          102             25.3
        II        16          100             25.3
        III       30          102             25.6
        IV        20          103             25.9
1984    I         37          104             26.3
        II        17          106             26.3
        III       32          104             -
        IV        18          -               -

This is one of the simplest methods of forecasting but it is only appropriate for series with
no trend or seasonal effect. It is often used to predict the demand for a product in the
next time period so that sufficient stock can be kept to supply it. (This is called demand
forecasting.)
10.4  Irregular Variation
Irregular or random variation remains after the trend, cyclic and seasonal variation
have been removed. One way of removing it is through smoothing techniques,
such as the moving average we discussed in section 10.3. Another popular
technique is exponential smoothing, which we shall look at shortly.
By definition, irregular variation is unpredictable and random; it can only
sometimes be identified through examination of major external events that might
have influenced the time series, and it often tends to cancel out over time.

Although certain mathematical techniques (such as spectral analysis) address


themselves to irregular variation and movements in residual error terms, they are
beyond the scope of this course.

Exponential Smoothing: Exponential smoothing offers an alternative to moving
averages as a way of smoothing a time series. The extended exponential smoothing
equation is

        St = αYt + α(1 - α)Yt-1 + α(1 - α)²Yt-2 + ...

This formula states that the current period's smoothed value of the time series, St,
depends on all past values of the dependent variable, although these are weighted
progressively less the farther back they go. We set the smoothing constant α such that
0 < α < 1, which means that the successive weights α, α(1 - α), α(1 - α)², ..., get smaller
and smaller. There is a mathematical procedure for selecting the best or optimal value of
the smoothing constant, but it is beyond the level of this course. In fact, selecting small
values for α straightens out the time series more completely than selecting large values
of α does. By simple mathematical derivation, it can be shown that the extended
exponential smoothing equation just described reduces to a computationally simpler
form, called the basic exponential smoothing equation:
        St = αYt + (1 - α)St-1
or
        St = St-1 + α(Yt - St-1),        for 0 < α < 1        (1)

Note that St is the forecasted (smoothed) value and Yt is the actual value.


We begin the smoothing procedure by initially setting S1 = Y1 in the first period.
Successive values are individually computed as:

        S2 = αY2 + (1 - α)S1
        S3 = αY3 + (1 - α)S2

and so on.

Setting the smoothing constant to either of its extremes yields one of two cases. When
α = 0, then

        St = 0·Yt + (1 - 0)St-1 = St-1

Since we set S1 = Y1, it follows that St = Y1 for all t. Thus the smoothed values are simply
equal to the initial value of the time series. Setting α = 1, then

        St = 1·Yt + (1 - 1)St-1 = Yt

Thus the smoothed value of the series is just the most recent observation, and all earlier
observations are ignored. Such a series is called a random walk or a naïve forecasting
model. Here, the forecast value in any particular year is simply the previous year's
value.
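As a minimal sketch (not part of the original text), the following Python implements the basic recursion of equation (1) with S1 = Y1; the function name is illustrative, and the series and α = 0.2 anticipate the first worked example later in this section.

```python
# Minimal sketch of basic exponential smoothing, S_t = alpha*Y_t + (1 - alpha)*S_{t-1}.
def exponential_smoothing(y, alpha):
    s = [y[0]]                                 # initialise S_1 = Y_1
    for t in range(1, len(y)):
        s.append(alpha * y[t] + (1 - alpha) * s[-1])
    return s

series = [40, 35, 39, 44, 45, 43, 46]          # observed values from Worked Example 1 below
print([round(v, 2) for v in exponential_smoothing(series, 0.2)])
# approx [40, 39.0, 39.0, 40.0, 41.0, 41.4, 42.32]
```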
The layout for working out problems using equation (1) is as follows:

     Time Period     Actual Values     Forecasted Values
     (t)             (Yt)              (St)
     1               Y1                S1 = Y1
     2               Y2                S2 = S1 + α(Y2 - S1)
     3               Y3                S3 = S2 + α(Y3 - S2)
     .               .                 .
     .               .                 .
     t               Yt                St = St-1 + α(Yt - St-1)

The forecast of the next value, Yt+1, is the latest smoothed value St. This single value is
then used as the forecast value in all future periods.
Example 1.3
Consider the example used by Roger C. Pfaffenberger and James H. Patterson in their
book Statistical Methods (1987), page 899. Information on monthly sales of computer
software from Daltons Software, Inc., in Fort Worth, Texas, for 1986 is given in the table
below. Using values of α = 0.1 and α = 0.9, and a forecast of sales for January 1986 of
$2,100, forecast sales for February 1986 through January 1987. (The calculations shown
below use α = 0.1.)

     Month 1986     t      Yt
     January        1      $1,800
     February       2      2,000
     March          3      1,800
     April          4      3,000
     May            5      2,700
     June           6      1,900
     July           7      3,000
     August         8      2,600
     September      9      1,700
     October        10     1,200
     November       11     2,400
     December       12     1,500

α = 0.1

     Time Period    Actual Sales    α(Yt - St-1)    Forecast Sales
     t              Yt                              St
     1              $1,800          -30             2,100
     2              2,000           -7              2,070
     3              1,800           -26             2,063
     4              3,000           96              2,037
     5              2,700           57              2,133
     6              1,900           -29             2,190
     7              3,000           84              2,161
     8              2,600           36              2,245
     9              1,700           -58             2,281
     10             1,200           -102            2,223
     11             2,400           28              2,121
     12             1,500           -65             2,149
                                                    2,084   (forecast for January 1987)

A useful rule for finding α is given by the formula α = 2/(n + 1), where n is the
number of periods in the equivalent moving average. For example, for a 4-quarterly
moving average over one year, n = 4 and α = 0.4. The larger the value of n, of course,
and the smaller the value of α, the greater will be the smoothing effect.

Worked Examples

1.   Exponentially smooth the following observed series of values:

         40, 35, 39, 44, 45, 43, 46.

     The old forecast for the first observed value should be taken as 40, with α = 0.2.

         St = St-1 + α(Yt - St-1),    α = 0.2

     t     Yt     α(Yt - St-1)    St
     1     40     0               40
     2     35     -1              39
     3     39     0               39
     4     44     1               40
     5     45     1               41
     6     43     0.4             41.4
     7     46     0.92            42.32

2.   Exponentially smooth the following data. What is the new forecast for the
     production of aircraft in 1971? (Take α = 0.25.)

     Year              1960  1961  1962  1963  1964  1965  1966  1967  1968  1969  1970
     Production of
     New Aircraft       518   395   487   450   319   415   431   312   278   500   450

     Taking the old forecast for the first observed value as 518, with α = 0.25:

     t     Yt     α(Yt - St-1)    St
     1     518    0               518
     2     395    -31             487
     3     487    0               487
     4     450    -9              478
     5     319    -40             438
     6     415    -6              432
     7     431    -0.25           432
     8     312    -30             402
     9     278    -31             371
     10    500    32              403
     11    450    12              415
The new forecast for the production of aircraft in 1971 is 415.

Problems:
1.   The accompanying table shows earnings per share of a corporation over a
     period of 18 years.

     Year   Earnings     Year   Earnings     Year   Earnings
     1      3.63         7      7.01         13     3.54
     2      3.62         8      6.37         14     1.65
     3      3.66         9      5.82         15     2.15
     4      5.31         10     4.98         16     6.09
     5      6.14         11     3.43         17     5.95
     6      6.42         12     3.40         18     6.26

     (a)  Using smoothing constants α = 0.3, 0.5, 0.7 and 0.9, find forecasts based on
          simple exponential smoothing.
     (b)  Which of the forecasts would you choose to use?

2.   Manufacturer sales of women's footwear (m. pairs):

              1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
     1966     20.9          17.3          15.6          13.9
     1967     17.5          14.7          13.5          13.1
     1968     17.0          13.5          13.5          13.7

     Is there any evidence that manufacturers' sales of women's footwear are subject
     to seasonal variation? Predict manufacturers' sales during the first quarter of
     1969.
The basic updating formula is: new forecasted value = old forecasted value +
α(actual observation - old forecasted value). With α = 0.2:

     Period       Actual    Old        New
     Reference    Demand    Forecast   Forecast
     1            16        16         16.00
     2            20        16.00      16.80
     3            15        16.80      16.44
     4            19        16.44      16.95
     5            17        16.95      16.96
     6            21        16.96      17.77
     7            25        17.77      19.22

     y2 = 16 + 0.2(20 - 16) = 16 + 0.2(4) = 16.80
     y3 = 16.80 + 0.2(15 - 16.80) = 16.80 + 0.2(-1.8) = 16.44
     y4 = 16.44 + 0.2(19 - 16.44) = 16.44 + 0.2(2.56) = 16.952
     y5 = 16.95 + 0.2(17 - 16.95) = 16.95 + 0.2(0.05) = 16.96
     y6 = 16.96 + 0.2(21 - 16.96) = 16.96 + 0.2(4.04) = 17.77
     y7 = 17.77 + 0.2(25 - 17.77) = 17.77 + 0.2(7.23) = 19.22

     The new forecast of 19.22 is the forecast for period 8.

Learning Objectives
After working through this Chapter you should be able to:

Define the term time series

Discuss the appropriate model to use when forecasting: the least squares method, the
moving average method, and the exponential smoothing method

CHAPTER 11
INDEX NUMBERS

Reading
Newbold Chapter
Plane and Oppermann Chapter 16
Tailoka Frank P Chapter 5

Introductory Comments
This Chapter looks at index numbers, which are useful in describing the way in which
the economy changes from period to period, using prices, quantities, etc.
A device constructed by statisticians which attempts to explain the magnitude of
economic changes over time is called an index number.
An index number shows the rate of change of a variable from one specification to
another.
You will realize that the index of retail prices attempts to measure the change in the price
of a whole range of goods and services that we regularly buy. So you can see that it is
attempting to measure the cost of living, something that concerns us all. In times of
inflation, the retail price index is probably more important than at any other time in its
existence. In some developed countries, increases in pay and pensions are index-linked.
The consumer price index (CPI) is an indicator of what is happening to the prices
consumers are paying for the items they purchase. The CPI measures changes in price over a
period of time. It is often used as a measure of inflation.
However, to do this we need to know what an index is, how it is calculated and what its
limitations are. The primary function of price index is to compare prices in one year with
those in some other years. Technically prices in a given year are to be compared with
prices in the base year which are taken as standard. Conventionally P1 refers to the price
in the given year and P0 refers to the price in the base year.
A price index measures the change in the money value of a group of items over time. If
only one item, such as bread, is being considered, the comparison between years may be
made by the calculation of price relatives, i.e., the prices in the given year relative to the
base year.


        Price relative = (P1 / P0) × 100

e.g., if the price of a loaf was K1,100 in 1999 and K1,700 in 2000, the 2000 price relative
to 1999 was (1,700/1,100) × 100 = 154.5. The interpretation of a price index is straightforward:
the price index for 2000 is 154.5. This means that the 2000 price of a loaf of bread is
154.5 percent of the 1999 (base year) price of a loaf of bread.

Four Main Considerations to be Borne in Mind When Constructing an Index Number

i)    The purpose of the index number. Unless the purpose is clearly defined, the eventual
      usefulness of the final index will be suspect. In other words, it must be designed to
      show something in particular.

ii)   Selection of the items for inclusion. The main principles to be followed are that the
      items selected must be unambiguous, relevant to the purpose, and their values
      ascertainable. Since index numbers are concerned largely with making comparisons
      over given time periods, an item selected one year must be clearly identified (i.e. in
      terms of size, weight, capacity, quality, etc.) so that the same item can be selected the
      following year for comparison.

iii)  Selection of appropriate weights. Decisions must be made on the level of importance
      to attach to each change from one year to the next, or the relative importance of each
      item to the whole list.

iv)   Selection of a base year. Care must be exercised so that an abnormal year is not
      chosen in relation to the characteristic being measured. If an abnormally high year is
      chosen, all subsequent changes will be understated, whereas if an abnormally low
      year is chosen, all subsequent changes will be overstated in percentage terms.

If more than one item or commodity is to be considered to give an overall impression of
rising or falling prices, it becomes necessary to combine the prices of these items into
some form of weighted average or index number. The most commonly used form is
that calculated by the Laspeyres formula:

        I1 = ( Σ P1 q0 / Σ P0 q0 ) × 100

where   I1 = index number for the given year
        q0 = weight (quantity) applied to each price, taken from the base year
        Σ  = sum to be taken over all the items
        P0 = price in the base year
        P1 = price in the given year

Note that the above formula can be written as I1 = [ Σ W (P1/P0) / Σ W ] × 100, where we
let W = p0 q0, and W stands for the base-year weights.

Consider Question One (Tailoka Frank P, Questions and Answers, Business Mathematics
and Statistics, Chapter 5). I1 is then the index number for 1991, with 1990 as the base year.

            q0      p1      P0      P1q0       P0q0
     A      500     75      45      37 500     22 500
     B      35      90      50      3 150      1 750
     C      65      100     55      6 500      3 575
                                    47 150     27 825

        I1 = ( Σ p1 q0 / Σ p0 q0 ) × 100 = (47 150 / 27 825) × 100 = 169.5

It may now be stated that prices have risen by 69.5% overall from 1990 to 1991,
based on the evidence of these three commodities.
This index is a reasonable measure of the change in prices over a short period of,
say, two years, but if the given year is much further in time from the base year,
the weights used tend to become out of date as spending habits change and no
longer give a realistic comparison between the two years. This disadvantage may
be overcome by using a given-year weighted index as calculated by the Paasche
formula.


        I1 = ( Σ p1 q1 / Σ p0 q1 ) × 100

This index gives the change in the total value of the given year's consumption from
the value it would have had in the base year. The disadvantage of the Paasche
price index is that the quantities must be determined anew each year, thus adding to
the time and cost of data collection. Moreover, each year the index numbers for
previous years must be recomputed to reflect the effect of the new quantity
weights.

            p1      P0      q1      P1q1       P0q1
     A      75      45      800     60 000     36 000
     B      90      50      150     13 500     7 500
     C      100     55      80      8 000      4 400
                                    81 500     47 900

        I1 = (81 500 / 47 900) × 100 = 170
From this calculation prices may be said to have risen by 70% overall. However, this
formula is equally unrealistic in that it compares hypothetical past quantities with
current real quantities rather than vice versa. One suggested way out of the
dilemma is to calculate an average index number which is the geometric mean of
the Laspeyres and the Paasche index numbers; this is called Fisher's price index:

        IF = √( IL · IP ) = √[ ( Σ P1q0 / Σ P0q0 ) · ( Σ P1q1 / Σ P0q1 ) ] × 100

Fisher's price index has its own disadvantages: each year's index number is
calculated with new weights, the only comparisons that can be made are between
the given year and the base year, and successive years are not directly
comparable as they are with the Laspeyres formula. It is also a costly and
time-consuming operation to find new weights each year.
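As a minimal sketch (not part of the original text), the Python below computes the Laspeyres, Paasche and Fisher indices for the three commodities used in the worked examples above; the variable names are illustrative.

```python
# Minimal sketch: Laspeyres, Paasche and Fisher price indices for commodities A, B, C.
p0 = [45, 50, 55]        # base-year prices
p1 = [75, 90, 100]       # given-year prices
q0 = [500, 35, 65]       # base-year quantities
q1 = [800, 150, 80]      # given-year quantities

laspeyres = 100 * sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
paasche = 100 * sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
fisher = (laspeyres * paasche) ** 0.5    # geometric mean of the two indices

print(round(laspeyres, 1), round(paasche, 1), round(fisher, 1))   # approx 169.5 170.1 169.8
```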
2.  Changing the Base


The base of an index number series is changed by taking proportions as illustrated
below. Index A has 1971 as a base year and Index B has 1976 as a base year. To
convert Index A to Index B, each index A value was divided by 150. It can be
seen that the numbers for each year are in the same proportions for both Index A
and Index B.


                BASE CHANGE
     Year      Index A     Index B
     1971      100         66.7
     1972      110         73.3
     1973      120         80.0
     1974      130         86.7
     1975      140         93.3
     1976      150         100

3.  Chain Index Numbers


In a chain base index the base period progresses by one time period each time;
therefore each index number is interpreted relative to the previous period.
        Chain index = ( price or quantity at time n / price or quantity at time n - 1 ) × 100
Example: The table below shows the week-ending share price on the stock
exchange over a period of four weeks for a local company's shares:

     Week        1      2      3      4
     Price (K)   250    300    350    225

Calculate and interpret a chain base index using week 1 as the base.

     Index (wk 1) = 100
     Index (wk 2) = (Price wk 2 / Price wk 1) × 100 = (300/250) × 100 = 120.00 (to 2 d.p.)
     Index (wk 3) = (Price wk 3 / Price wk 2) × 100 = (350/300) × 100 = 116.67 (to 2 d.p.)
     Index (wk 4) = (Price wk 4 / Price wk 3) × 100 = (225/350) × 100 = 64.29 (to 2 d.p.)

At the end of the second week the share price had increased by 20% from the end
of the first week. By the end of the third week the share price had increased again
but at a slower rate (16.67%) when compared with week 2. In week 4 the price
had dipped with a 35.71% decrease from week 3.
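As a hedged sketch (not from the original text), the chain index calculation above can be reproduced as follows; the variable names are illustrative.

```python
# Minimal sketch: chain base index numbers for the weekly share prices above.
prices = [250, 300, 350, 225]
chain = [100 * prices[i] / prices[i - 1] for i in range(1, len(prices))]
print([round(c, 2) for c in chain])    # approx [120.0, 116.67, 64.29]
```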


4.  Splicing Overlapping Series of Index Numbers


Suppose index A has a base of 1972 and that in 1974 it becomes necessary to alter
the weights used; thus producing a new index, B, based on 1974. However, it is
not very meaningful to have an index series covering only three years such as A,
but continuity would be maintained if the new series B could be expressed in
terms of the series A.
The process is really one of taking proportions using a chain index and it is
illustrated using the data in Table 2.0. This procedure for linking two series
together is an operation called splicing.

Table 2.0

                  Index A     Index B
     Year         Σpq66       Σpq68
     1972         240
     1973         200
     1974         180         200
     1975                     180
     1976                     160

We first extend Index A (the Σpq66 series) forward by taking proportions from Index B,
which overlaps it in 1974. In Index B, 1975 stands to 1974 as 180 to 200, so for Index A

     I1975 = 180 × (180/200) = 162,   and similarly   I1976 = 180 × (160/200) = 144.

Extending Index B (the Σpq68 series) backwards in the same way, using the proportions
of Index A,

     I1973 = 200 × (200/180) = 222.22,   and lastly   I1972 = 200 × (240/180) = 266.67.

This is summarized in the table below:

     Year     Σpq66    Σpq68     Index A             Index B
                                 (base year 1972)    (base 1974)
     1972     240      266.67    100.0               133.34
     1973     200      222.22    83.3                111.11
     1974     180      200       75.0                100.00
     1975     162      180       67.5                90.00
     1976     144      160       60.0                80.00

The index series B came into being because the weights were changed in 1974. It
would of course be possible to change the weights every year and using the chain
index technique relate that year back to the original base Series A. This is the
method used in calculating the index of retail prices.
5.  Deflating Prices and Incomes


Indicators of inflation are raising prices and incomes. The question sometimes
asked is: by how much has real income increased in, for example the past two
years? It may be answered by deflating the income figures by dividing by the
retail price index. Prices of individual commodities may be deflated in the same
manner, thus showing the increased in real price. Whenever we remove the price
increase effect from a time series, we say we are deflating the series.
Table 3.0  Deflating Income

     Year    Income        Price Index    Real Income
     1974    K2,610,000    100            K2,610,000.00
     1976    K3,150,000    157            K2,006,369.43

Example:
Suppose that the income column in Table 3.0 shows the incomes of a sales
representative in 1974 and 1976, that the base year of the index of retail prices has
been taken as 1974, and that the value of the index for 1976 is 157. Real income may be
calculated by dividing actual income by the price index:

     1974 real income = K2,610,000 / 1.00 = K2,610,000.00
     1976 real income = K3,150,000 / 1.57 = K2,006,369.43

It may be said that the salesman's real income has decreased by K603,630.57
over the two years.

The purchasing power of a Kwacha is defined to be the reciprocal of the price index, with
the base year of the index being the year in which the Kwacha is said to have a purchasing
power of K1.00.

Example:
Assume that in 2004 you were getting K300,000 and the CPI was 114, and that in 2008
you were still getting K300,000 and the CPI was 150. Using 2004 as the base year, the
rebased CPI is (150/114) × 100 = 131.6. The purchasing power of a Kwacha is
(1/131.6) × 100 = 0.76, or 76 ngwee. The real income is 300,000/1.316 = K227,963.53,
so you have lost K72,036.47. However, considering the CPI, you should have been
getting 300,000 × 1.316 = K394,800.00.
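As a minimal sketch (not from the original text), the Python below reproduces the rebasing, purchasing-power and real-income steps of the example; the variable names are illustrative, and the exact figures differ slightly from the text because the text works with the rounded index 131.6.

```python
# Minimal sketch: rebasing the CPI, purchasing power and real income (2004 vs 2008).
cpi_2004, cpi_2008 = 114, 150
income = 300_000

cpi_rebased = 100 * cpi_2008 / cpi_2004       # about 131.6 with 2004 = 100
purchasing_power = 100 / cpi_rebased          # about 0.76 Kwacha, i.e. 76 ngwee
real_income = income / (cpi_rebased / 100)    # exactly 228,000; the text's 227,963.53 uses 131.6
required_income = income * cpi_rebased / 100  # income needed to keep pace with the CPI

print(round(cpi_rebased, 1), round(purchasing_power, 2), round(real_income, 2))
```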

Sample Examination Questions

1.   The following figures give the distribution of income percentages for an average
     family:

                           %
     Food                  45
     Fuel and light        15
     Clothing              05
     Rent                  20
     Other items           15

     Average prices (K'000) for three successive years were as follows:

              Food    Fuel and Light    Clothing    Rent    Other Items
     2003     180     40                95          50      65
     2004     200     45                80          55      80
     2005     215     42                95          60      80

     (i)   Calculate a cost of living index for the years 2004 and 2005, taking 2003 as
           a base year.
     (ii)  Comment briefly on the problem of the choice of items and weights when
           constructing an index number.

2.   (a)  What are the main considerations to be borne in mind when constructing an
          index number?

     (b)  The following table shows the total weekly expenditure on four commodities
          in July 2001 and July 2002, based on a representative sample of 1,000
          households:

          Commodities    Quantities Purchased (Kg)    Total Expenditure (K)
          July 2001:
          Butter         5 500                        2 500 000
          Potatoes       10 500                       600 000
          Apples         4 000                        800 000
          Meat           8 000                        9 500 000
                         28 000                       13 400 000
          July 2002:
          Butter         5 500                        3 400 000
          Potatoes       9 500                        900 000
          Apples         3 500                        850 000
          Meat           8 500                        1 250 000
                         27 000                       6 400 000

          You are required to compute a Paasche index showing the extent of the rise
          in prices of all four commodities.

     (c)  Explain briefly the major weakness of the Paasche index in this case and
          suggest an alternative.

3.   The following figures give the distribution of income percentages for an average
     family:

                           %
     Food                  25
     Fuel and light        20
     Clothing              25
     Rent                  10
     Other items           20

     Average prices for the successive years were as follows:

              Food    Fuel & light    Clothing    Rent    Other Items
     1999     180     35              100         45      65
     2000     195     34              90          45      75
     2001     210     30              95          50      75

     (a)  Calculate a cost of living index for the years 2000 and 2001, taking 1999 as
          a base year.
     (b)  Comment briefly on the problem of the choice of items and weights when
          constructing an index number.

4.   (a)  Define what is meant by a fixed base index number and a chain based index
          number, and explain the different ways in which these alternatives have to
          be interpreted.

     (b)  From the following data, calculate:
          i)   a Laspeyres price index for 2003;
          ii)  a Paasche quantity index for 2003;
          in each case using 2001 as the base year.

                       2001                             2003
          Commodity    Average price (K)    Quantity    Average price (K)    Quantity
          A            18 250               155         18 750               195
                       39 100               275         46 000               310
                       7 000                120         9 000                195
                       14 750               435         22 700               380
                       74 200               95          101 800              130

5.   (a)  What are the main considerations to be borne in mind when constructing an
          index number?

     (b)  The following table shows the total weekly expenditure on four commodities
          in July 1993 and July 2004, based on a representative sample of 1,000
          households.

          Commodities    Quantities Purchased (Kg)    Total Expenditure (K'000)
          July 1993:
          Butter         4 500                        1 680
          Potatoes       9 500                        510
          Apples         3 000                        600
          Meat           7 000                        7 200
                         24 000                       9 990
          July 2004:
          Butter         4 500                        4 200
          Potatoes       8 500                        4 200
          Apples         3 500                        1 500
          Meat           7 500                        19 500
                         24 000                       29 400

          You are required to compute a Laspeyres index showing the extent of the rise
          in prices of all four commodities.

     (c)  Explain briefly the major weakness of the Laspeyres index in this case and
          suggest an alternative.

Learning Objectives
After working through this Chapter you should be able to

Explain what an index number is.

Compute simple index numbers and interpret them.

Calculate the Paasche, Laspeyres and Fisher's index numbers.

Change an index from one base to another.


CHAPTER 12
REGRESSION ANALYSIS

Reading
Newbold Chapter 12, 13
Pfaffenberger Chapter 13, 14, 15
James T. McClave, Chapter 10, 11
P. George Benson
Wonnacott and Wonnacott Chapter 12, 15
Introductory Comments
We carry through the ideas of least squares fitting; using further assumptions that allow
confidence intervals and tests, the connection between regression and analysis of variance
becomes apparent. Correlations are very important for all work with many variables.

Regression analysis helps one determine the probable form of the relationship between
variables. The objective of this method of analysis is usually to predict or estimate the
value of one variable corresponding to a given value of another variable. The English
scientist Sir Francis Galton (1822-1911) first proposed the ideas of regression in
reports of his research in the area of heredity, first in sweet peas and later in human
stature. (Business Statistics, Third Edition, Daniel/Terrell, page 301.)
1.1  THE SIMPLE LINEAR REGRESSION MODEL


The typical regression problem is like most problems applied statistical inference.
We have available for analysis a sample of observations from some real or
hypothetical population. On the basis of our analysis of these data, we want to
reach decisions about the population from which we resume the sample was
drawn. In order to handle the analysis intelligently and interpret the results
properly, we must understand the nature of the population from which the sample
was drawn. We should know enough about the population to be able either to

178

construct mathematical model to represent it or to determine whether it fits some


established model reasonably well.
Most statistical models that are of practical value do not conform perfectly to the
real world.

A model that fits the situation at hand perfectly is usually too

complicated for practical use. On the other hand, an analysis that has forced the
sample data into a model that is not applicable is worthless. Fortunately we can
get useful results from a model that falls somewhere between these two extremes.
The type of relationship between the two variables X and Y that is of concern
here is a linear relationship. This implies that the relationship of interest has
something to do with a straight line. The measurements that are available for
analysis come in pairs (x1, y1), (x2, y2), ..., (xn, yn), where the measurements xi, yi
are taken on the same entity, called the unit of association.
Two variables X and Y are linearly related if their relationship can be expressed
by the following simple linear model:

        yi = α + βxi + ei        (1)

where yi is the value of the Y variable for a typical unit of association from the
population, xi is the value of the X variable for that same unit of association,
α and β are parameters called the regression constant and the regression
coefficient, respectively, and ei is a random variable with a mean of 0 and a
variance of σ². To understand the model of equation (1), we must consider the
assumptions underlying simple linear regression.
1.2  THE ASSUMPTIONS UNDERLYING SIMPLE LINEAR REGRESSION


As we have said, Simple Linear Regression analysis is concerned with the
relationship between two variables, X and Y. For reasons that will become
apparent, the variable X is called the independent variable, and Y is called the
dependent variable. In discussing the linear relationship between X and Y, given
in equation (1), we speak of the regression of Y on X. The following assumptions
underlie the simple linear regression model of equation (1).
1. Values of the independent variable X may be either fixed or random. That
is, we may select the values of X in advance (fixed), so that as we collect
the data, we control the values of X. Or we may obtain the values of X
without imposing any restrictions, in which case X is a random variable.
When the Xs are non-random, we refer to the regression model as the classic
regression model.
2. The variable X is measured without error. From a practical point of view, this
means that the magnitude of the measurement error in X is negligible.
3. For each value of X there is a subpopulation of Y values. For most of the
inferential procedures of estimation and hypothesis testing to be valid, these
subpopulations must be normally distributed.

To demonstrate inferential

procedures, we shall assume in the examples and exercises that follow that the
Y values are normally distributed.
4. The variances of the subpopulations of Y are all equal. The means of the
   subpopulations of Y all lie on the same straight line. This assumption is
   known as the assumption of linearity. It may be expressed symbolically as

        μY|x = α + βxi        (2)

   where μY|x is the mean of the subpopulation of Y values assumed to exist for
   xi, a particular value of X. When viewed geometrically, α and β represent
   the Y intercept and slope, respectively, of the line on which all the
   subpopulation means are assumed to lie.
5. The Y values are statistically independent. This means that in drawing the
sample, the values of Y chosen at one value of X in no way depend on the
values of Y chosen at another value of X.


We are now in a position to shed some more light on the term ei in the simple
linear model. Solving equation (1) for ei, we have

        ei = yi - (α + βxi)        (3)

Thus ei shows the amount by which yi deviates from the mean of the
subpopulation of Y values from which it is drawn, since, by equation (2),
μY|x = α + βxi. The subpopulations of Y values are assumed to be normally
distributed, with a variance equal to σ², the common variance of the
subpopulations of Y values. The ei's are independent, and their distribution
has a mean of 0.
1.3  OBTAINING THE SAMPLE REGRESSION EQUATION


The regression model of equation (1) is not an equation for a straight line. It is a
symbolic representation of a typical value of the dependent variable Y. Equation
(1), however, is an equation for a straight line. It is the line that describes the true
relationship between X and Y X .

The time position of this line is unknown

because and are unknown.

The objective of regression analysis is to

estimate and in order to make inferences about the true line of regression of
Y on X.
We can explain the procedures involved in regression analysis more easily by
means of a numerical illustration.
Example (1)
An operations analyst conducts a study to analyze the relationship between
production and manufacturing expenses in the electronics industry. A sample of
n = 10 firms, randomly selected from within the industry, yields the data in Table
(1). Manufacturing expenses is considered to be the dependent variable: it
changes as the volume of production varies. On the other hand, a change in
manufacturing expenses would not necessarily cause a change in volume of
production.
Table (1) Production (X) and manufacturing expenses (Y) for 10 selected firms

X (thousands of units)      40    42    48    55    65    79    88    100   120   140
Y (thousands of kwachas)    150   140   160   170   150   162   185   165   190   185

The Least Squares Method


The objective method that we use here to describe the relationship between the
variables is called the method of least squares. The line obtained by this method
is called the least-squares line.
We may write the equation for a straight line as:
        y = a + bx        (4)

Here a is the point at which the line crosses the Y axis and b is the amount by
which the line changes per unit change in x. We refer to a as the Y intercept and
b as the slope of the line. To draw a straight line for the sample data, then we
need only numerical values for a and b. Once we have these values, we can
substitute two different values of X into the equation and get corresponding
values of Y. If we plot the resulting coordinates x1 , y1 and x2 , y2 on the graph
and connect them, we have a straight line.
Figure (2) is a graph of a straight line. Here we see the geometric relationships
between the slope, the Y intercept, and a unit change in x.
We can find numerical values for a and b for any set of data such as that in the
present example by simultaneously solving the following two equations:

        ΣYi = na + bΣXi                    (5)
        ΣXiYi = aΣXi + bΣXi²               (6)

These equations, obtained by differential calculus, are called the normal
equations. Their solution yields the equation for the least-squares line
describing the relationship between X and Y. The equation is of the form

        ŷ = a + bx        (7)

where ŷ denotes the calculated value of Y for a given X, and a and b are
estimates of α and β, respectively.
Table (2) gives the values of ΣYi, ΣXi, ΣXiYi, ΣYi², ΣXi², and n, which are
needed to solve the equations. Substituting values from Table (2) into equations
(5) and (6) gives:
Figure (2) A linear regression equation illustrating the geometrical interpretation
of a (the y-intercept) and b (the slope).

Table (2) Intermediate computations for the normal equations, Example (1)

        xi      yi      xi²       xiyi       yi²
        40      150     1 600     6 000      22 500
        42      140     1 764     5 880      19 600
        48      160     2 304     7 680      25 600
        55      170     3 025     9 350      28 900
        65      150     4 225     9 750      22 500
        79      162     6 241     12 798     26 244
        88      185     7 744     16 280     34 225
        100     165     10 000    16 500     27 225
        120     190     14 400    22 800     36 100
        140     185     19 600    25 900     34 225
Total   777     1 657   70 903    132 938    277 119

        1 657 = 10a + 777b
        132 938 = 777a + 70 903b


We may solve these equations by any familiar method to get a = 134.79 and b = 0.3978.
The following formulas for a and b are usually computationally more convenient:

        b = [ΣXY - (ΣX)(ΣY)/n] / [ΣX² - (ΣX)²/n] = [nΣXY - (ΣX)(ΣY)] / [nΣX² - (ΣX)²]        (8)

        a = (ΣY - bΣX)/n = ȳ - bx̄        (9)

For the present example, we have

        b = [132 938 - (777)(1 657)/10] / [70 903 - (777)²/10] = 0.3978

        a = 165.7 - 0.3978(77.7) = 134.72

The two results for a do not agree exactly, due to rounding errors. The equation
for the least-squares line that describes the relationship between production and
manufacturing expenses is

        ŷ = 134.79 + 0.3978x

If we let x = 0, ŷ = 134.79; and if x = 100, ŷ = 174.57. These two points are
sufficient for plotting the line, as we have done in Figure (3). This line is the
sought-after best line for describing the relationship between the sample
values of X and Y. Before we say by what criterion we judge it to be best, let us
look at Figure (3). None of the points actually falls on the line that was drawn;
that is, the points deviate from the line. It is obvious that we cannot draw a straight
line that will pass through all the points, so some deviation of points from any
straight line is inevitable. The line drawn through the points, therefore, is best in
this sense: the sum of the squared deviations of the observed data points (yi) from
the least-squares line is smaller than the sum of the squared deviations of the data
points from any other line that can be drawn through the data points.
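As a minimal sketch (not part of the original text), the Python below reproduces the least-squares estimates for the production and expenses data of Table (1); the variable names are illustrative.

```python
# Minimal sketch: least-squares estimates a and b for Example (1), equations (8) and (9).
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]          # production (thousands of units)
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]   # expenses (thousands of kwachas)
n = len(x)

sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

b = sxy / sxx                           # about 0.3978
a = sum(y) / n - b * sum(x) / n         # about 134.79
print(round(b, 4), round(a, 2))

yhat_50 = a + b * 50                    # predicted expenses at x = 50, about 154.7 (text rounds to 155)
```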


Figure (3) Scatter diagram and least-squares line, ŷ = 134.79 + 0.3978x, for Example (1).

Suppose that we square the vertical distance from each observed point yi to the
least-squares line, and add these squared distances over all points. The total we
get will be smaller than the similarly computed total for any other line that we
could draw through the original points. This is why we call the line the least-squares
line.

1.4  Evaluating the Sample Regression Equation


After we have determined the regression equation, we must evaluate it to find out
whether it adequately describes the relationship between the two variables, and to
see whether we can use it effectively for prediction and estimation.


One method of evaluating the regression equation is to compare the scatter of the
points about the regression line with the scatter about ȳ, the mean of the sample
values of Y. Figure (4) shows the regression line and the relative magnitudes of
the scatter of the points about ȳ for Example (1). It shows the line representing ȳ
as a horizontal line, because, regardless of the value of X, ȳ remains constant. For
these data, the dispersion of the points about the regression line is much less than
the dispersion about the ȳ line, so it seems that the regression line provides a good
fit for the data.
We get the amount by which any observed value of Y, yi, deviates from ȳ as shown
in Figure (4).

Figure (4) Scatter diagram for Example (1) showing deviations about ȳ and the
regression line, ŷ = 134.79 + 0.3978x.

The difference yi - ȳ is called the total deviation. Consider, for example, the
ninth value of Y. You will find it in Table (1) to be y9 = 190. Since ȳ = 165.7,
the total deviation of this Y value is 190 - 165.7 = 24.3.


The vertical distance from the regression line to the ȳ line is given by ŷ - ȳ. This is
called the explained deviation. It shows the amount by which we reduce the total
deviation when we fit the regression line to the points. For example, for
y9 = 190, ŷ9 = 182.5, so the explained deviation is ŷ9 - ȳ = 182.5 - 165.7 = 16.8.
Finally, the vertical distance of the observed Y from the regression line, (yi - ŷ), is
called the unexplained deviation. It represents that portion of the total deviation
not explained or accounted for by the fitting of the regression line. In the case
of y9 = 190, there is an unexplained deviation of y9 - ŷ9 = 190 - 182.5 = 7.5.
Thus the total deviation for a particular yi is equal to the sum of the explained
and unexplained deviations. That is,

        (yi - ȳ)  =  (ŷi - ȳ)  +  (yi - ŷi)
        total deviation = explained deviation + unexplained deviation

In the case of y9 = 190, we have 24.3 = 16.8 + 7.5. We can perform similar
calculations for each yi.


If we square each of the deviations in the identity above and sum over all observations,
we get three sums of squared deviations. Their relationship may be expressed as
follows:

        Σ(yi - ȳ)²  =  Σ(ŷi - ȳ)²  +  Σ(yi - ŷi)²        (9)
        total sum of squares = explained sum of squares + unexplained sum of squares

Each of the terms in equation (9) is a measure of dispersion. The total sum of
squares measures the dispersion of the observed values of Y about their mean ȳ;
that is, this term is a measure of the total variation in the observed values of Y. It
is the numerator of the familiar formula for the sample variance.
The explained sum of squares measures the amount of the total variation in the
observed values of Y that is accounted for by the linear regression; it is also called
the sum of squares due to regression. The unexplained sum of squares measures the
dispersion of the observed Y values about the regression line (the deviations from
linearity) and is the quantity that we minimize when we find the least-squares line.
It is usually called the error sum of squares. We may write equation (9) in a more
compact form, as follows:

        SST = SSR + SSE        (10)

where   SST = total sum of squares
        SSR = sum of squares due to regression (explained sum of squares)
        SSE = error sum of squares (unexplained sum of squares)

We can compute the total sum of squares by the following formula:

        SST = Σ(yi - ȳ)² = Σy² - (Σy)²/n        (11)

We can compute the explained sum of squares by

        SSR = Σ(ŷi - ȳ)² = b²Σ(xi - x̄)² = b²[ΣXi² - (ΣXi)²/n]        (12)

We can get the unexplained sum of squares by subtraction; that is, SSE = SST - SSR.

From the data on production and manufacturing expenses, we may compute

        SST = 277,119 - (1,657)²/10 = 2,554.10

Alternatively, we may compute SST by squaring and summing the individual total
deviations (yi - ȳ). When we do this, we have

        SST = (-15.7)² + (-25.7)² + ... + (19.3)² = 246.49 + 660.49 + ... + 372.49 = 2,554.10

By equation (12), the explained sum of squares, or sum of squares due to
regression, is

        SSR = (0.3978)²[70,903 - (777)²/10] = 1,666.33

or we can get the explained sum of squares by squaring and summing the
explained deviations (ŷi - ȳ), to give

        SSR = (-15)² + (-14.2)² + ... + (24.8)² = 225.0 + 201.64 + ... + 615.04 = 1,666.44

The unexplained, or error, sum of squares, obtained by subtraction, is

        SSE = 2,554.10 - 1,666.33 = 887.77

As an alternative, we can compute SSE by squaring and summing the individual
unexplained deviations (yi - ŷi). Thus:

        SSE = (-0.7)² + (-11.5)² + ... + (-5.5)² = 0.49 + 132.25 + ... + 30.25 = 886.54

Note the slight discrepancy, due to rounding, in the results for SSR and SSE
computed by the two methods.
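As a minimal, self-contained sketch (not part of the original text), the Python below reproduces SST, SSR, SSE and the F ratio for Example (1); the small differences from the text's 1,666.33 and 887.77 come from the text using the rounded slope 0.3978.

```python
# Minimal sketch: sums of squares and the F ratio for Example (1).
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)        # total sum of squares, about 2,554.10
ssr = sum((yh - ybar) ** 2 for yh in yhat)     # explained (regression), about 1,666.5
sse = sst - ssr                                # unexplained (error), about 887.6

msr, mse = ssr / 1, sse / (n - 2)
f_ratio = msr / mse                            # about 15.0 with (1, 8) degrees of freedom
print(round(sst, 2), round(ssr, 2), round(sse, 2), round(f_ratio, 2))
```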


When the assumptions we gave in section 1.2 hold, we may use analysis of
variance to test for the presence of regression. In this process, the total sum of
squares, Σ(yi - ȳ)², is a measure of the total variability present in the data. The
explained sum of squares, Σ(ŷi - ȳ)², is a measure of the variability due to
linear regression. And the unexplained sum of squares, Σ(yi - ŷi)², is a measure
of the variability left unexplained after regression has been considered. This last
sum of squares is also called the deviations-from-regression or error sum of
squares. We can also subdivide the total degrees of freedom, n - 1, into two
components: 1 for regression and (n - 1) - 1 = n - 2 associated with the error
sum of squares. Dividing the sums of squares by their associated degrees of
freedom yields the corresponding mean squares. If there is no linear regression (that
is, if β = 0), and if the stated assumptions about the model apply, the ratio of the
regression mean square to the error mean square is distributed as F with 1 and
n - 2 degrees of freedom.
We can, therefore, test the null hypothesis that β = 0 using analysis of variance.
Table 3 shows the analysis-of-variance table that we can construct.

Table 3  ANOVA Table for Simple Linear Regression

Source of Variation                 SS     df       MS                    F
Linear regression                   SSR    1        MSR = SSR/1           MSR/MSE
Deviation from linearity (error)    SSE    n - 2    MSE = SSE/(n - 2)
Total                               SST    n - 1

Table 4  Analysis of Variance for Example (1)

Source        SS          df     MS           F
Regression    1,666.33    1      1,666.33     15.02
Error         887.77      8      110.97
Total         2,554.10    9

For the data on production and manufacturing expenses, let us test

        H0: there is no linear regression of Y on X (β = 0)   against
        H1: there is a linear regression of Y on X (β ≠ 0)

at the 0.01 level of significance. Table 4 shows the appropriate analysis of
variance. The computed value of F = 15.02 is significant at the 0.01 level, so
we may conclude that the data of this sample provide sufficient evidence of the
presence of regression. Since 15.02 > 11.26, the critical value of F with 1 and 8
degrees of freedom at the 0.01 level, we have, for this test, P < 0.01.
When we cannot reject H0: β = 0, we cannot be certain that X and Y are unrelated.
Aside from the fact that we may have committed a Type II error, we must be
aware that, although they are perhaps not linearly related, X and Y may have a
nonlinear relationship. Even when we can reject H0: β = 0, we cannot be certain
that the strongest form of relationship between X and Y is a linear one. The two
variables may be more strongly related in a nonlinear way, although a linear
model gives a satisfactory approximation to the true relationship. Of course, a
rejected null hypothesis that β = 0 may very well indicate that there is a true
linear relationship between X and Y.
An alternative way to evaluate the sample regression equation is to use b, the
slope of the sample line, as a basis for testing the null hypothesis of no regression.
When the assumptions in section 1.2 are met, a and b are unbiased point
estimators of α and β, respectively. When, under these assumptions, the
subpopulations of Y values are normally distributed, the sampling distributions of
a and b are each normal, with means and variances as follows:

        μa = α                                        (13)
        σa² = σ²y.x Σxi² / [n Σ(xi - x̄)²]             (14)
        μb = β                                        (15)
        σb² = σ²y.x / Σ(xi - x̄)²                      (16)

In equations (14) and (16), σ²y.x is the variance about the population regression line. We
also call σ²y.x the unexplained variance of the population. It is the common
variance σ² of the subpopulations of Y, as specified in the initial assumptions.
The definitional equation for this quantity, for a finite population of size N, is

        σ²y.x = Σ(yi - μy|x)² / N

When the assumptions are met, then, we can construct confidence intervals for, and
test hypotheses about, α and β in the usual way. In most cases, inferences
about α are not of great interest; the parameter β, however, is of great interest.
If β = 0, the regression line is horizontal, and an increase or decrease in X is not
associated with a change in Y.
In this situation, we conclude that X and Y are not linearly related. A positive β
indicates that, generally, Y tends to increase as X increases; in this situation,
there is a direct linear relationship between X and Y. A negative β indicates that
values of Y tend to decrease as values of X increase, and there is an inverse linear
relationship between X and Y. Figure 5 illustrates these three situations.


Figure 5 Scatter Diagrams Showing Different Types of Linear Relationships

(a) Direct linear relationship (b) Inverse Linear relationship (c) No linear relationship

We want to determine whether the sample data provide sufficient evidence to
indicate that β is different from 0. Suppose that we can reject the null hypothesis
that β = 0. Then we can conclude that β is not equal to 0, and therefore there is a
linear relationship between X and Y. Whether this suggested linear relationship is
presumed to be direct or inverse depends on the sign of b, the estimate of β.

The test statistic, when σ²y.x is known, is

        Z = (b - β0) / σb        (18)

In the usual case, σ²y.x is unknown and the test statistic is

        t = (b - β0) / Sb        (19)

where Sb is the estimator of σb. The associated degrees of freedom are n - 2, the
error degrees of freedom from the ANOVA table.
To find Sb, we must first estimate σ²y.x. An unbiased estimator of this is given by

        S²y.x = Σ(yi - ŷi)² / (n - 2)        (20)

An alternative, computational formula for S²y.x is

        S²y.x = { [Σyi² - (Σyi)²/n] - b[ΣXiYi - (ΣXi)(ΣYi)/n] } / (n - 2)        (21)

The estimator Sb² is then

        Sb² = S²y.x / Σ(xi - x̄)²        (22)

The following formula takes less work:

        Sb² = S²y.x / [Σxi² - (Σxi)²/n]        (23)

Let us now use the example of production and manufacturing expenses (Example
(1)) to show how to test the null hypothesis that β = 0. First we state the
hypotheses and significance level:

        H0: β = 0,    H1: β ≠ 0,    α = 0.05

We next obtain S²y.x, which from Table 4 is S²y.x = MSE = 110.97.

We may now compute

        Sb² = 110.97 / [70,903 - (777)²/10] = 0.0105    and    Sb = √0.0105 = 0.102

The figures in the denominator of Sb² come from Table (2). The test statistic is

        t = (0.3978 - 0) / 0.102 = 3.9

We reject H0, since 3.9 > 2.306, the upper critical value of t for a two-sided test
with 8 degrees of freedom and α = 0.05. Thus we conclude that β is not 0 and that there
is a linear relationship between X and Y. Since b is positive, we conclude that the
relationship is direct, not inverse. Since 3.9 > 3.3554, P < 2(0.005) = 0.01.
Note that the decision resulting from testing H0: β = 0 by means of the t test is
the same as that reached using analysis of variance. In fact, the value of t
computed from equation (19) is equal to the square root of the F computed in the
analysis of variance.
We can use equation (19) to test the null hypothesis that β is equal to some value
other than 0. The hypothesized value for β, β0, replaces 0 in the equation. All
other quantities, computations, degrees of freedom, and methods of determining
significance are the same as in the example.
Alternatively, we can test the null hypothesis that β = 0 by means of a
confidence interval for β. We use the general formula for a confidence interval:

        estimate ± (reliability factor)(standard error)

When we construct a confidence interval for β, the estimator is b. The reliability
factor is some value of Z or t (depending on whether or not σ²y.x is known), and
the standard error of the estimator is

        σb = √[ σ²y.x / Σ(x - x̄)² ]

When σ²y.x is unknown, we estimate σb by

        Sb = √[ S²y.x / Σ(x - x̄)² ]

Thus in most practical cases, the 100(1 - α)% confidence interval for β is given
by

        b ± t(α/2) Sb        (24)

If the confidence interval that we construct includes 0, we conclude that 0 is a
candidate for β, and therefore we cannot rule out the possibility that β is 0. This
conclusion corresponds to the statistical decision of failing to reject H0: β = 0.
If, on the other hand, the interval does not contain 0, we reject the null hypothesis
that β = 0 and conclude that X and Y are linearly related. The strength of this
conclusion is related to the confidence coefficient selected in constructing the
interval.
Let us construct a 95% confidence interval for β, using the data from Example
(1). Using expression (24), we have

        0.3978 ± 2.306(0.102) = 0.3978 ± 0.2352,    i.e.,    (0.1626, 0.6330)

We interpret this interval in the usual way.
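As a minimal, self-contained sketch (not part of the original text), the Python below reproduces the standard error of b, the t statistic and the 95% confidence interval for β; the variable names are illustrative, and small differences from the text's figures come from intermediate rounding in the text.

```python
# Minimal sketch: standard error of b, t statistic for H0: beta = 0, and 95% CI for beta.
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s2_yx = sse / (n - 2)                    # MSE, the estimate of the unexplained variance, about 110.9
sb = (s2_yx / sxx) ** 0.5                # standard error of b, about 0.10

t_stat = (b - 0) / sb                    # about 3.9
t_crit = 2.306                           # t value for 0.025 in each tail, 8 df (from tables)
ci = (b - t_crit * sb, b + t_crit * sb)  # roughly (0.16, 0.63)
print(round(sb, 3), round(t_stat, 2), [round(v, 4) for v in ci])
```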

1.5  Using the Sample Regression Equation


Once we have decided that the data at hand provide sufficient evidence to indicate
a linear relationship between X and Y, we can use the samples regression
equation. We can use it in two ways. First, we can use it to predict that value Y
is likely to assume for a given value of X. When the assumptions of section 1.2
197

are met, we can construct a prediction interval for Y. Second, we can use it to
estimate the mean of the subpopulation of Y values for a particular value of X.
Again, if the assumptions of section 1.2 are met, we can construct a confidence
interval for the mean.
Predicting Y for a Given Y
We get a point prediction of the value Y is likely to assume for a given X by
substituting a particular value of X, X p , into the sample regression equation and
solving for y . If the assumptions of section 1.3 are met, and if y2 x is unknown,
the 100 1 % prediction interval for Y is given by

y t 2 S y x

Xp x
1
1
n x x2

We can evaluate the denominator, Σ(x - x̄)², by means of the formula

        Σ(x - x̄)² = Σxi² - (Σxi)²/n

The degrees of freedom used in selecting t are n - 2.

In Example (1), we wish to predict the manufacturing expenses for a firm that
produces 50,000 units. Substituting 50 for x in the sample regression equation
gives

        ŷ = 134.79 + 0.3978(50) = 155 (approximately)

Using expression (25) and the data from Tables 4 and 2, we construct the
following 95% prediction interval:

        155 ± 2.306 √110.97 √[ 1 + 1/10 + (50 - 77.7)² / (70,903 - (777)²/10) ]
        155 ± 26
        (K129, K181)

Interpreting a prediction interval is like interpreting a confidence interval.

Estimating the Mean of Y for a Given X
To estimate the mean μy|x of a subpopulation of Y values for a certain value of
X, xp, we substitute xp into the sample regression equation and solve for ŷ.
The 100(1 - α)% confidence interval for μy|x, when σ²y.x is unknown and the
assumptions of section 1.2 are met, is given by

        ŷ ± t(α/2) Sy.x √[ 1/n + (xp - x̄)² / Σ(x - x̄)² ]        (26)

Suppose that, for the example of the production and manufacturing expenses, we
wish to estimate the mean of the subpopulation of Y values for firms that produce
50,000 units. We obtain the estimate as follows:

        ŷ = 134.79 + 0.3978(50) = 155

Using expression (26), we obtain the 95% confidence interval for μy|x:

        155 ± 2.306 √110.97 √[ 1/10 + (50 - 77.7)² / (70,903 - (777)²/10) ]
        155 ± 10
        (145, 165)

If we repeatedly drew samples of size 10 from the population, performed a
regression analysis, and constructed confidence intervals for μy|x for X = 50, 95%
of such intervals would include the true mean. Thus we are 95% confident that
the single interval constructed contains the true mean.
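As a minimal, self-contained sketch (not part of the original text), the Python below reproduces the 95% prediction interval for Y and the confidence interval for the mean of Y at xp = 50, following expressions (25) and (26); the variable names are illustrative.

```python
# Minimal sketch: 95% prediction interval and confidence interval for the mean at x_p = 50.
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar

s_yx = (sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5
t_crit = 2.306                                  # t value for 0.025 in each tail, 8 df
xp = 50
yhat = a + b * xp                               # about 154.7 (the text rounds to 155)

half_pred = t_crit * s_yx * (1 + 1 / n + (xp - xbar) ** 2 / sxx) ** 0.5   # about 26
half_mean = t_crit * s_yx * (1 / n + (xp - xbar) ** 2 / sxx) ** 0.5       # about 10
print(round(yhat, 1),
      (round(yhat - half_pred), round(yhat + half_pred)),   # roughly (128, 181)
      (round(yhat - half_mean), round(yhat + half_mean)))   # roughly (145, 165)
```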

Learning Objectives
After working through this chapter you should be able to:

Using the given formulas, compute a and b to fit the least-squares line.

Explain how to set confidence intervals and carry out tests about α and β from a
small collection of data.

Demonstrate how to set confidence intervals for μy|x, show how to calculate a
prediction interval, and explain the difference between the two.

Define the sample correlation coefficient, and link it to the appearance of scatter
diagrams.

Construct and use an analysis-of-variance table for a regression, including the F-test
for β = 0.

Compute the coefficient of correlation and the coefficient of determination.

Interpret the coefficient of correlation and the coefficient of determination.


READING LIST
1.  Statistics for Business and Economics, Debra Olson Oltman and James R. Lackritz (Thomson Information/Publishing Group).
2.  Statistics and Econometrics, Charles R. Frank Jr.
3.  Introduction to Statistical Analysis, Wilfrid J. Dixon and Frank J. Massey Jr.
4.  Questions and Answers, Tailoka Frank P.
5.  Statistics for Business and Economics: An Action Learning Approach, Marion Gross Sobol and Martin K. Starr (McGraw-Hill).
6.  Statistical Methods, Roger C. Pfaffenberger and James H. Patterson (Irwin).
7.  Elementary Business Statistics: The Modern Approach, sixth edition, John E. Freund, Frank J. Williams, Benjamin M. Perles (Prentice-Hall International, Inc.).
8.  Business Statistics: A Decision Making Approach, David F. Groebner and Patrick W. Shannon.

