
Statistics Module

Module Code: LV-STS


Module Writer:
Symon B. Chibaya
Mathematics and Statistics Lecturer, Natural Resources College

TABLE OF CONTENTS
Module Objectives
1. Basic Concepts of Statistics
1.1 Meaning of statistics
1.2 Using statistical data
1.3 Types of statistics
1.4 Sources of statistical data
1.5 Types of data variables
2. Summarizing Data
2.1 Frequency distribution
2.2 Graphical presentation
3. Describing Data
3.1 Measures of central tendency
3.2 Measures of dispersion
4. Probability Theory
4.1 Probability concepts
4.2 Discrete probability distributions: binomial distribution and Poisson distribution
4.3 Normal probability distribution
5. Sampling Methods and Sampling Distributions
6. Hypothesis Testing
6.1 Defining hypothesis
6.2 Setting up hypothesis
6.3 Test statistics
6.4 Analysis of Variance [ANOVA]
7. Test of Proportions (Chi-Square Testing)
8. Simple Linear Regression and Correlation Analysis
9. Appendix
Table 1 The Binomial Distribution
Table 2 F Distribution
Table 3 The Chi-Square Distribution
Table 4 Critical values for the PPMC
10. References

MODULE OBJECTIVES
By the end of the course students should be able to:
a) describe concepts used in statistics
b) present statistical data in different formats
c) apply statistical tools in problem solving.


TOPIC 1: BASIC CONCEPTS OF STATISTICS

OBJECTIVES
By the end of this topic, you should be able to:
(a) Define the term statistics.
(b) State the types of statistics.
(c) List the sources of statistical data.
(d) Describe the types of data variables.
DEFINITION OF STATISTICS
Statistics is the science of conducting studies to collect, organize, summarize, analyze
and draw conclusions from data.
Statistics is used to analyze the results of surveys and as a tool in scientific research to
make decisions based on controlled experiments. Other uses of statistics include
operations research, quality control, estimation and prediction.
TYPES OF STATISTICS
Statistics is divided into two types namely:
(a) descriptive statistics and
(b) inferential statistics
(a) DESCRIPTIVE STATISTICS
This consists of the collection, organization, summarization and presentation of data. A
population census is an example.
(b) INFERENTIAL STATISTICS
This consists of making inferences from samples to populations. It uses probability, that is,
the chance of an event occurring.

POPULATION
A population is a collection of all individuals, objects or measurements of interest. Most of
the time, because of expense, the size of the population, medical concerns, etc., it is not
possible to use the entire population for a statistical study; therefore, researchers use samples.
SAMPLE
A sample is a group of subjects (human or otherwise) selected from the population.
Example
The heights of all students at Natural Resources College constitute a population.
A sample from this population could be the heights of students in the nutrition 12 class.
DATA
These are the values that a variable can assume.
VARIABLES
A variable is a characteristic or attribute that can assume different values.
SOURCES OF STATISTICAL DATA
Sources of data can be internal or external.
INTERNAL DATA SOURCE
All types of organizations collect and keep data, which are therefore internal to the
organization.
ADVANTAGES OF INTERNAL DATA SOURCES
1. The data are cheaper to obtain.
2. The information is readily available, so it can be used much more quickly.
3. The data can be understood much more easily.
EXTERNAL SOURCES
The sources of statistical information

TYPES OF DATA VARIABLES


Variables can be classified as qualitative or quantitative.
Qualitative variables have non-numerical observations such as colour of hair, gender and
religious preference, although each possible non-numerical value may be associated
with a numerical frequency.
Quantitative variables are numerical and can be ranked. Some examples of
quantitative variables are age, height and weight. Quantitative variables can be further
classified into two groups: discrete and continuous.
Discrete variables can take whole-number values only, for example, the number of children
in a family. Continuous variables can take any value, whether a whole number or a decimal
number, for example, the height of students.

The classification of variables can be summarized as follows:
Data
    Qualitative
    Quantitative
        Discrete
        Continuous

TYPES OF DATA
(a) PRIMARY DATA
If data is collected for a specific purpose then it is known as primary data. For example:
a population census. Primary data is collected by the researcher himself or herself.
(b) SECONDARY DATA
Secondary data is data which has been collected for some purpose other than that for
which it is being used. For example, if a company has to keep records of when
employees are sick and you use this information to tabulate the number of days
employees had malaria in a given month, then this information would be classified as
secondary data. Secondary data is collected and possibly processed by people other
than the researcher in question. In other words, it is collected by others and
reused by the researcher.


TUTORIAL 1: BASIC CONCEPTS OF STATISTICS


1. Define the term statistics.
2. Distinguish between:
(a) Inferential statistics and descriptive statistics
(b) Sample and population
(c) Qualitative data variables and quantitative data variables
3. Give two advantages of internal sources of data.


TOPIC 2: SUMMARISING DATA


OBJECTIVES
By the end of this topic, you should be able to:
(a) Organize raw data into frequency distributions.
(b) Define the following terms; limits, boundaries and class width.
(c) Represent data in frequency distribution graphically using histograms, frequency
polygons and a cumulative frequency polygon
To describe situations, draw conclusions or make inferences about events, the
researcher/statistician must organize the data in some meaningful way. The most convenient
method of organizing data is to construct a frequency distribution.
FREQUENCY DISTRIBUTION
A frequency is the name given to the number of times a value occurs.
Example
Twenty-five army inductees were given a blood test to determine their blood type. The
data set is:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
You can summarise the data in the example above with a frequency table or frequency
distribution. A frequency distribution is the organization of raw data in table form, using
classes and frequencies.
The steps to be followed when constructing an ungrouped frequency distribution for the above
raw data are as follows:
1. Make a table as shown.
Class   Tally   Frequency
A
B
AB
O

2. Tally the data and place the results in the second column.
3. Count the tallies and place the results in the third column.
Class   Tally   Frequency
A               5
B               7
AB              4
O               9

Example
A survey taken in a restaurant shows the number of cups of coffee consumed with
each meal. Construct an ungrouped frequency distribution.
0 2 2 1 1 1
3 5 3 2 2 2
1 0 1 2 4 2
0 1 0 1 4 4
2 2 0 1 1 5
Solution
Class (cups)   Tally   Frequency
0                      5
1                      9
2                      9
3                      2
4                      3
5                      2
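The tallying procedure for an ungrouped frequency distribution can be sketched in Python. This is only an illustration: the variable names (`cups`, `frequency`) are not part of the module, and `collections.Counter` from the standard library stands in for the manual tally.

```python
from collections import Counter

# Cups of coffee consumed with each meal (the 30 survey values above).
cups = [
    0, 2, 2, 1, 1, 1,
    3, 5, 3, 2, 2, 2,
    1, 0, 1, 2, 4, 2,
    0, 1, 0, 1, 4, 4,
    2, 2, 0, 1, 1, 5,
]

# Counter tallies how many times each distinct value occurs.
frequency = Counter(cups)

print("Class  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
print("Total ", sum(frequency.values()))
```

Each distinct value becomes one class, and the frequencies must add up to the number of observations.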

GROUPED FREQUENCY DISTRIBUTION


The following rules should be observed when constructing a frequency distribution for grouped
data:
1. There should be between 5 and 20 classes.
2. The classes must be equal in width. The following rule may be used to
find the required class width:
$$W = \frac{L - S}{K}$$
where W = class width
L = the largest data value
S = the smallest data value
K = number of classes
3. The class width should be an odd number. This ensures that the mid-point of each class has the same place value as the data. The class mid-point
Xm is obtained by adding the lower and upper boundaries and dividing
by 2, or by adding the lower and upper limits and dividing by 2:
$$X_m = \frac{\text{lower limit} + \text{upper limit}}{2} \quad \text{or} \quad X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$$
The mid-point is the numerical location of the centre of the class and it is
necessary for graphing.
4. The classes must be mutually exclusive. Mutually exclusive classes have non-overlapping
class limits so that data cannot be placed into two classes.

For example, use the classes in table A, not those in table B:
Table A   Table B
10-20     10-20
21-31     20-30
32-42     30-40
43-53     40-50
If a person is 40 years old, into which class of table B should she or he be placed?
5. The classes must be continuous. Even if there are no values in a class, the class must be
included in the frequency distribution. In other words, there should be no gaps in the
frequency distribution. The only exception occurs when the class with a zero frequency
is the first or last class.
6. The classes must be exhaustive. There should be enough classes to accommodate all the
data.
Example
The following data represent the times (in seconds) for 50 runners in a race.
246 238 246 251 240
243 245 243 241 248
244 246 249 246 245
244 248 240 243 249
242 245 239 244 245
245 248 248 249 248
250 242 243 245 242
242 246 246 245 247
244 240 245 247 248
247 250 247 248 250
Construct a grouped frequency distribution for the data.
Solution
Procedure for constructing a grouped frequency distribution
Step 1: Determine the classes.
Suppose we want to have 5 classes. The largest value is 251 and the smallest is 238, so
$$W = \frac{L - S}{K} = \frac{251 - 238}{5} = 2.6 \approx 3$$
The width is rounded up if there is any decimal remainder when dividing. For example,
53 ÷ 4 = 13.25 is rounded up to 14, and 85 ÷ 6 = 14.167 is rounded up to 15.
Select a starting point for the lowest class limit. This can be the smallest data value
or any convenient number less than the smallest data value. In this case 237 is
used.
Add the width to the lowest class limit to get the lower limit of the next class.
Keep adding until there are 5 classes, as shown:
237, 240, 243, 246, 249
Subtract one unit from the lower limit of the second class to get the upper limit of
the first class. Then add the width to each upper limit to get all the upper limits, as
shown below:
239, 242, 245, 248, 251
So the five classes are:
237-239, 240-242, 243-245, 246-248, 249-251
Find the class boundaries by subtracting 0.5 from each lower class limit and
adding 0.5 to each upper class limit, i.e. 236.5-239.5, 239.5-242.5, 242.5-245.5, etc.
Find the mid-point of each class.
Step 2: Tally the data.
Step 3: Find the numerical frequency from the tallies.

Class interval   Class boundaries   Mid-point   Frequency
237-239          236.5-239.5        238         2
240-242          239.5-242.5        241         8
243-245          242.5-245.5        244         16
246-248          245.5-248.5        247         17
249-251          248.5-251.5        250         7
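The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names (`times`, `num_classes`, `classes`) are invented for the sketch, and the starting point is taken as one unit below the smallest value, which is just one convenient choice.

```python
import math

# Race times in seconds for the 50 runners in the example above.
times = [
    246, 238, 246, 251, 240, 243, 245, 243, 241, 248,
    244, 246, 249, 246, 245, 244, 248, 240, 243, 249,
    242, 245, 239, 244, 245, 245, 248, 248, 249, 248,
    250, 242, 243, 245, 242, 242, 246, 246, 245, 247,
    244, 240, 245, 247, 248, 247, 250, 247, 248, 250,
]
num_classes = 5

# Class width W = (L - S) / K, rounded up to a whole number.
width = math.ceil((max(times) - min(times)) / num_classes)

# Assumed starting point: one unit below the smallest data value.
start = min(times) - 1

classes = []
for k in range(num_classes):
    lower = start + k * width          # lower class limit
    upper = lower + width - 1          # upper class limit
    count = sum(1 for t in times if lower <= t <= upper)
    # Boundaries are the limits -/+ 0.5; the mid-point is the average of the limits.
    classes.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2, count))

for lower, upper, lb, ub, mid, count in classes:
    print(f"{lower}-{upper}  {lb}-{ub}  {mid:g}  {count}")
```

Because the classes are mutually exclusive and exhaustive, the frequencies printed in the last column should add up to 50.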

Example
The average quantitative GRE scores for the top 30 graduate schools of engineering are
listed below. Construct a frequency distribution with six classes.
767 770 761 760 771 768 776
763 760 747 766 754 771 771
780 750 746 764 769 759 753
756 766 758 770 762 746

Solution
$$W = \frac{L - S}{K} = \frac{780 - 746}{6} = 5.667 \approx 6$$
The smallest lower limit (starting point) is 745. The other lower limits are 751, 757, 763, 769 and 775. The upper limit of
the first class is 751-1=750. The other upper limits are 756, 762, 768, 774 and 780.

Class limits   Class boundaries   Frequency
745-750        744.5-750.5        4
751-756        750.5-756.5        7
757-762        756.5-762.5        7
763-768        762.5-768.5        11
769-774        768.5-774.5        2
775-780        774.5-780.5        1


CUMULATIVE FREQUENCY DISTRIBUTION


So far, we have discovered how to tabulate a frequency distribution. A further
way of presenting frequencies is to form cumulative frequencies. This
technique conveys a considerable degree of information and involves adding up the
number of times (frequency) that values less than or equal to a particular value occur.
Example
A survey was conducted in a certain town on people's weights in kilograms. The following
are the results:
Weight      Frequency
32.0-32.7   1
35.0-35.7   3
36.0-36.7   9
37.0-37.7   20
38.0-38.7   28
39.0-39.7   26
40.0-40.7   15
41.0-41.7   4
Calculate the cumulative frequency of the data.
Solution
Weight      Frequency   Cumulative frequency
32.0-32.7   1           0+1=1
35.0-35.7   3           1+3=4
36.0-36.7   9           4+9=13
37.0-37.7   20          13+20=33
38.0-38.7   28          33+28=61
39.0-39.7   26          61+26=87
40.0-40.7   15          87+15=102
41.0-41.7   4           102+4=106
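The running total in the solution above can be sketched in Python. The names `frequencies` and `cumulative` are illustrative only; `itertools.accumulate` from the standard library produces the running sums.

```python
from itertools import accumulate

# Frequencies of the eight weight classes, in class order.
frequencies = [1, 3, 9, 20, 28, 26, 15, 4]

# Each cumulative frequency is the previous cumulative frequency
# plus the frequency of the current class.
cumulative = list(accumulate(frequencies))

for f, cf in zip(frequencies, cumulative):
    print(f"{f:>3}  {cf:>4}")
```

The last cumulative frequency always equals the total number of observations, which gives a quick check on the arithmetic.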
RELATIVE FREQUENCY DISTRIBUTIONS
RELATIVE FREQUENCY
Relative frequencies are the frequencies divided by the total number of observations:
$$\text{Relative frequency} = \frac{\text{actual frequency}}{\text{total number of observations}}$$


Example
The following data represent the heights of students in cm.
Height (cm)   Frequency
Under 165     7
Under 170     11
Under 175     17
Under 180     20
Under 185     16
Under 190     9
Construct a relative frequency distribution.
Solution
The total number of observations is 80.
Height (cm)   Frequency   Relative frequency
Under 165     7           0.0875
Under 170     11          0.1375
Under 175     17          0.2125
Under 180     20          0.25
Under 185     16          0.2
Under 190     9           0.1125


CUMULATIVE RELATIVE FREQUENCY


We have seen how to calculate cumulative frequencies. Using the same logic, you can
obtain the cumulative relative frequency by adding the relative frequency of a particular
class to the cumulative relative frequency already arrived at for the previous class.
Example
Construct a cumulative relative frequency distribution for the above example.
Solution
Height (cm)   Frequency   Relative frequency   Cumulative relative frequency
Under 165     7           0.0875               0+0.0875=0.0875
Under 170     11          0.1375               0.0875+0.1375=0.225
Under 175     17          0.2125               0.225+0.2125=0.4375
Under 180     20          0.25                 0.4375+0.25=0.6875
Under 185     16          0.2                  0.6875+0.2=0.8875
Under 190     9           0.1125               0.8875+0.1125=1.000
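Relative and cumulative relative frequencies can be computed together, as in this short Python sketch. The variable names are illustrative, and results are rounded to four decimal places only for display.

```python
from itertools import accumulate

# Frequencies of the six height classes, in class order.
frequencies = [7, 11, 17, 20, 16, 9]
total = sum(frequencies)  # total number of observations

# Relative frequency = actual frequency / total number of observations.
relative = [f / total for f in frequencies]

# Cumulative relative frequency = running sum of the relative frequencies.
cumulative_relative = list(accumulate(relative))

for f, r, c in zip(frequencies, relative, cumulative_relative):
    print(f"{f:>3}  {r:.4f}  {c:.4f}")
```

The final cumulative relative frequency should equal 1, apart from floating-point rounding.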
The reasons for constructing a frequency distribution are:
1. To organize the data in a meaningful way.
2. To enable the reader to determine the nature or shape of the distribution.
3. To facilitate computational procedures for measures of average and spread.
4. To enable the researcher to draw charts and graphs for the presentation of the data.
5. To enable the reader to make comparisons among different data sets.

GRAPHICAL PRESENTATION
After the data have been organized into a frequency distribution, they can be
presented in graphical form. The purpose of graphs in statistics is to convey the
data to the viewers in pictorial form. It is easier for most people to comprehend
the meaning of data presented graphically than data presented numerically in tables or
frequency distributions.
The three most commonly used graphs are:
(a) histogram
(b) frequency polygon
(c) cumulative frequency graph or ogive (pronounced o-jive)
THE HISTOGRAM
The word histogram is derived from Greek: histos, "anything set upright", and gramma,
"drawing, record, writing".
The histogram is a graph that displays the data by using continuous vertical bars
(unless the frequency of class is 0) of various heights to represent the frequencies
of the classes.
Example
The annual exports of a group of small firms in Lilongwe are:
Exports (K millions)   Number of firms
2-4                    4
5-7                    12
8-10                   15
11-13                  8
14-16                  4
Construct a histogram to represent the data shown above.


Solution
Step 1: Construct a frequency distribution that has class boundaries.
Class   Class boundaries   Frequency
2-4     1.5-4.5            4
5-7     4.5-7.5            12
8-10    7.5-10.5           15
11-13   10.5-13.5          8
14-16   13.5-16.5          4

Step 2: Draw and label the x-axis and y-axis. The x-axis is always the horizontal
axis and the y-axis is always the vertical axis.
Step 3: Represent the frequency on the y-axis and the class boundaries on the x-axis.
Step 4: Using the frequencies as the heights, draw vertical bars for each class.
[Figure: histogram with frequency on the y-axis and class boundaries on the x-axis]

NOTE: The frequency in each class is represented by a rectangle. However, it is
important to realize that it is the area of the rectangle, not its height, that represents
the frequency. Because the classes here have equal widths, the heights are proportional
to the frequencies.
THE FREQUENCY POLYGON
The frequency polygon is a graph that displays data using lines that connect points plotted
for the frequencies at the mid-points of the classes. This is a very quick method of drawing
the shape of a frequency distribution.
The steps to be followed when drawing a frequency polygon are:
1. Draw a histogram.
2. Mark the mid-point at the top of each rectangle.
3. Join the mid-points with a ruler.
4. Extend the lines at each end of the histogram to the mid-points of the next highest and lowest classes,
which have a frequency of zero. NOTE: The lines are extended to the x-axis so that the area of
the polygon will equal that of the histogram it represents.
Example
Draw a frequency polygon for the data in the example above.
Solution
Class   Class boundaries   Frequency   Mid-point
2-4     1.5-4.5            4           3
5-7     4.5-7.5            12          6
8-10    7.5-10.5           15          9
11-13   10.5-13.5          8           12
14-16   13.5-16.5          4           15


Alternatively, the polygon can be drawn without a histogram:
Step 1: Find the mid-point of each class.
Step 2: Draw the x-axis to represent the class mid-points.
Step 3: Draw the y-axis to represent frequencies.
Step 4: Plot the frequency against the class mid-point, marking each point with a cross.
Step 5: Join the crosses in order, that is, the cross representing the first class should be joined
to the one representing the second class and so on.
Step 6: Include the two extreme points.
For instance, the class mid-points in this example are 3, 6, 9, 12 and 15, spaced 3 units apart.
The mid-point before 3 is 3-3=0 and the mid-point after 15 is 15+3=18.
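The points to plot for the frequency polygon, including the two extreme points with zero frequency, can be listed with a few lines of Python. This is a sketch only; the names `classes`, `midpoints` and `points` are illustrative and no plotting library is assumed.

```python
# (lower limit, upper limit, frequency) for each class in the exports example.
classes = [(2, 4, 4), (5, 7, 12), (8, 10, 15), (11, 13, 8), (14, 16, 4)]

# Mid-point = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

# Distance between consecutive mid-points (equal because class widths are equal).
step = midpoints[1] - midpoints[0]

# The polygon starts and ends on the x-axis, one step beyond the outer mid-points.
points = (
    [(midpoints[0] - step, 0)]
    + [(mid, freq) for mid, (_, _, freq) in zip(midpoints, classes)]
    + [(midpoints[-1] + step, 0)]
)

for x, y in points:
    print(f"mid-point {x:g}: frequency {y}")
```

Joining these points in order with straight lines gives the frequency polygon.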


CUMULATIVE FREQUENCY POLYGON


The cumulative frequency polygon, also called an ogive, is used to represent the cumulative
frequencies for the classes.
The cumulative frequencies are plotted against the upper boundaries of the
corresponding classes. In a cumulative frequency polygon, the points are
joined together by straight lines, whereas in a cumulative frequency curve a smooth
curve joins the points.
Example
The lengths of 50 fish caught from the pond were measured and the following are the results.
Length      20-22   23-25   26-28   29-31
Frequency   3       10      9       28
Construct a cumulative frequency polygon.
Solution
Class   Class boundaries   Frequency   Cumulative frequency
20-22   19.5-22.5          3           3
23-25   22.5-25.5          10          13
26-28   25.5-28.5          9           22
29-31   28.5-31.5          28          50


TUTORIAL 2: SUMMARISING DATA


1. 50 people were asked to record how many radio stations they listen to in a week. The
results are shown below:
No. of radio stations   No. of listeners
0-9                     2
10-19                   32
20-29                   10
30-39                   6
(a) What is this table called?
(b) Draw a histogram.
2. In a study of reaction times of dogs to a specific stimulus, an animal trainer obtained
the following data, given in seconds. Construct a histogram, frequency polygon and
ogive for the data.
Class limits   Frequency
2.3-2.9        10
3.0-3.6        12
3.7-4.3        6
4.4-5.0        8
5.1-5.7        4
5.8-6.4        2
3. The number of calories per serving for selected ready-to-eat cereals is listed here.
Construct a frequency distribution using seven classes. Draw a histogram, frequency
polygon and ogive for the data.
130 190 140 80 100 120 220 110 100
210 130 100 90 210 200 120 180 120
190 210 120 200 130 180 260 100 160
190 240 80 120 90 190 200 210 190 180
115 210 110 225 190 130


TOPIC 3: DESCRIBING DATA


OBJECTIVES
By the end of this topic, you should be able to;
(a) Calculate mean, median and mode.
(b) Explain advantages and disadvantages of mean, mode and median.
(c) Explain the difference between sample deviation and population deviation
(d) Describe data using measures of variations such as the range, variance and standard deviation.
Any set of measurements has two important properties, namely the central or typical value
and the spread about that value.
MEASURE OF CENTRAL TENDENCY
The main measures of the central tendency are:
Mean
Median
Mode
MEAN
The arithmetic mean of a set of values is the sum of the values divided by the
total number of values. The symbol $\bar{x}$ (read as "x-bar") represents the sample mean:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where n represents the total number of values in the sample.
For a population, the Greek letter $\mu$ (mu) is used for the mean:
$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$
where N represents the total number of values in the population.
If some values appear more than once, we may use the following formula, where f is the
frequency of each value x:
$$\bar{x} = \frac{\sum fx}{\sum f}$$


Example
Find the mean of the following data:
20, 26, 40, 36, 23, 42, 35, 24, 30.
Solution
$$\bar{x} = \frac{\sum x}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} = 30.7$$
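As a quick check of the arithmetic, the mean can be computed in Python. The sketch uses `statistics.mean` from the standard library; the variable names are illustrative.

```python
from statistics import mean

values = [20, 26, 40, 36, 23, 42, 35, 24, 30]

# Sum of the values divided by the number of values.
x_bar_manual = sum(values) / len(values)

# The standard-library function gives the same result.
x_bar = mean(values)

print(sum(values), len(values), round(x_bar, 1))
```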
Example
The table below shows the frequency distribution of the number of days on which 100
employees of a firm were late for work in a given month. Using the data, find the mean
number of days on which an employee is late in a month.
Number of days late   Number of employees
1                     32
2                     25
3                     18
4                     14
5                     11
Solution
x       f       fx
1       32      32
2       25      50
3       18      54
4       14      56
5       11      55
Total   100     247

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{247}{100} = 2.47$$
Therefore the mean number of days late is 2.47, or about 2.5 days.
MEAN FOR GROUPED DATA
Steps to be followed
1. Make a frequency distribution table with the columns shown below.
Class   Frequency (f)   Mid-point (Xm)   f.Xm
2. Find the mid-point of each class and place it in column 3.
3. Multiply the frequency by the mid-point for each class and place the product in column
4.
4. Find the sums of columns 2 and 4. In other words, find $\sum f$ and $\sum f x_m$.
5. Use the following formula to find the mean:
$$\bar{x} = \frac{\sum f x_m}{\sum f}$$


Example
The marks scored by 500 candidates in an examination in which the maximum mark was
50 were:
Mark range   Frequency
1-5          10
6-10         41
11-15        72
16-20        83
21-25        94
26-30        81
31-35        71
36-40        27
41-45        13
46-50        8
Calculate the mean mark for these candidates.
Solution
Mark range   Frequency (f)   Mid-point (xm)   f.xm
1-5          10              3                30
6-10         41              8                328
11-15        72              13               936
16-20        83              18               1494
21-25        94              23               2162
26-30        81              28               2268
31-35        71              33               2343
36-40        27              38               1026
41-45        13              43               559
46-50        8               48               384
Total        500                              11530

$$\bar{x} = \frac{\sum f x_m}{\sum f} = \frac{11530}{500} = 23.1$$
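The grouped-mean steps can be sketched in Python as follows. The names `classes`, `n` and `sum_fxm` are illustrative and not part of the module.

```python
# (lower limit, upper limit, frequency) for each mark range.
classes = [
    (1, 5, 10), (6, 10, 41), (11, 15, 72), (16, 20, 83), (21, 25, 94),
    (26, 30, 81), (31, 35, 71), (36, 40, 27), (41, 45, 13), (46, 50, 8),
]

# Sum of column 2: the total frequency.
n = sum(freq for _, _, freq in classes)

# Sum of column 4: frequency times class mid-point.
sum_fxm = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)

grouped_mean = sum_fxm / n
print(n, sum_fxm, round(grouped_mean, 1))
```

The result is an estimate, because every value in a class is treated as if it were equal to the class mid-point.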
ADVANTAGES OF ARITHMETIC MEAN
1. It is easy to calculate.
2. It uses all the values.
3. It is used in computing other statistics such as variance.
DISADVANTAGES OF ARITHMETIC MEAN
1. It is affected by extremely high or low values.
2. It cannot be read from a graph.
MEDIAN
A statistic which is not affected by a few very unusual extreme scores is the median. The median
is the middle value when the values are arranged in order (either ascending or descending
order). When the data is ordered, it is called a data array.
The median is used when one must determine whether the data values fall into the upper
or lower half of the distribution.
The steps in computing the median are:
1. Arrange the data in order.
2. Select the middle point.
NOTE: If the number of values (n) is odd, then the median is the middle
value; if n is even, then the median is the arithmetic mean of the two middle
values.
In other words, if M is the value of the median, then:
(a) if n is odd,
M = the value of the $\frac{n+1}{2}$th observation
(b) if n is even, the middle observations are the $\frac{n}{2}$th and the $\left(\frac{n}{2}+1\right)$th
observations, and
M = the mean of these two observations
Example
Find the median of 4, 3, 5, 2, 11.
Solution
Arranging in ascending order, we have
2, 3, 4, 5, 11 and n = 5 (odd)
Median = the value of the $\frac{5+1}{2}$th observation
= 3rd observation
= 4.
Example
Six people take shoe sizes: 7, 9, 9, 8, 5, 6. What is the median?
Solution
Arranging in ascending order, we have 5, 6, 7, 8, 9, 9 and n = 6 (even)
So M = the mean of the $\frac{6}{2}$th and the $\left(\frac{6}{2}+1\right)$th observations
= the mean of the 3rd and 4th observations
= $\frac{7+8}{2}$
= 7.5
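The odd/even rule for the median can be written as a small Python function. The function name `median_by_rule` is hypothetical; the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_rule(values):
    """Return the median using the (n+1)/2 rule for odd n and the
    mean of the n/2-th and (n/2 + 1)-th observations for even n."""
    ordered = sorted(values)          # step 1: arrange the data in order
    n = len(ordered)
    if n % 2 == 1:
        # Positions are counted from 1, list indices from 0.
        return ordered[(n + 1) // 2 - 1]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_by_rule([4, 3, 5, 2, 11]))      # odd number of values
print(median_by_rule([7, 9, 9, 8, 5, 6]))    # even number of values
```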
ADVANTAGES OF MEDIAN
1. Its value is not distorted by extreme values.
2. All the observations are used to order the data even though only the middle one or two middle
observations are used in the calculation.
3. It can be illustrated graphically in a very simple way.
DISADVANTAGES OF MEDIAN
1. In a grouped frequency distribution, the value of the median within the median class
can only be an estimate.
2. It is of little use in calculating other statistical measures.
MODE
Mode is the most frequent data value or the value that occurs most often in a data set.
The mode is used when the most typical case is desired.

A set of observations with one mode is called unimodal, a set of observations with two
modes is called bimodal, while a set of observations with more than two modes is called
multimodal.
Example
The monthly salaries of a sample of doctors are: K35000, K58000, K50000, K49000,
K50000, K50000, K60000, K70400, K50000, K40000, K50000, K40000, K65000, K55000.
What is the modal (mode) monthly salary?
Solution
The value which occurs most often is K50000.
Therefore the modal salary is K50000.
Example
Find the modal class for the frequency distribution of miles that 20 runners ran in one week.
Class       Frequency
5.5-10.5    1
10.5-15.5   2
15.5-20.5   3
20.5-25.5   5
25.5-30.5   4
30.5-35.5   3
35.5-40.5   2

Solution
The modal class is 20.5-25.5 since it has the largest frequency.
ADVANTAGES OF THE MODE
1. It is not distorted by extreme values of the observations.
2. It is easy to calculate.
DISADVANTAGES OF THE MODE
1. It cannot be used to calculate any further statistics.
2. It may have more than one value.
THE WEIGHTED MEAN
Sometimes, one must find the mean of a data set in which not all values are equally represented.
This type of mean is called the weighted mean.
To find the weighted mean, multiply each value by its corresponding weight and divide
the sum of the products by the sum of the weights.
In other words,
$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wx}{\sum w}$$
where $w_1, w_2, \dots, w_n$ are the weights and $x_1, x_2, \dots, x_n$ are the values.
Example
A student received an A in English Composition 1 (3 credits), a C in Introduction to
Psychology (3 credits), a B in Biology 1 (4 credits) and a D in Physical Education (2
credits). Assume A = 4 grade points, B = 3 grade points, C = 2 grade points, D = 1 grade
point and F = 0 grade points. Find the student's grade-point average.
Solution
Course                       Credits (w)   Grade (x)      wx
English Composition 1        3             A (4 points)   12
Introduction to Psychology   3             C (2 points)   6
Biology 1                    4             B (3 points)   12
Physical Education           2             D (1 point)    2
Total                        12                           32

$$\bar{x} = \frac{\sum wx}{\sum w} = \frac{32}{12} = 2.7$$
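The weighted mean in this example can be sketched in Python. The names `credits`, `grade_points` and `weighted_mean` are illustrative only.

```python
# Weights (credits) and values (grade points) for the four courses.
credits = [3, 3, 4, 2]
grade_points = [4, 2, 3, 1]

# Sum of the products w * x divided by the sum of the weights.
sum_wx = sum(w * x for w, x in zip(credits, grade_points))
sum_w = sum(credits)
weighted_mean = sum_wx / sum_w

print(sum_wx, sum_w, round(weighted_mean, 1))
```

Courses with more credits pull the average towards their grade, which is why the result differs from the simple mean of the four grade points (2.5).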
MEASURES OF DISPERSION
In statistics, to describe the data set accurately, statisticians must know more than
measures of central tendency. We also need to know the spread of data. There are several
different measures of dispersion. The most important of these (which we will describe in
this section) are:
Range
Variance
Standard deviation.
RANGE
The range is the difference between the highest value and the lowest value.
Example
The weights of the contents of several small bottles are (in grams): 4, 3, 6, 5, 7, 2 and 4.
Find the range.
Solution
The lowest value is 2 and the highest value is 7.
Therefore range = 7-2
=5

ADVANTAGES OF THE RANGE
1. It is easy to understand.
2. It is simple to calculate.
3. It is a good measure for comparison as it spans the whole distribution.
DISADVANTAGES OF THE RANGE
1. It uses only two of the observations and so can be distorted by extreme values.
2. It cannot be used in calculating other functions of the observations.
STANDARD DEVIATION AND VARIANCE
These two measures of dispersion can be discussed in the same section because the standard
deviation is the square root of the variance.
The variance is the average of the squares of the distance each value is from the mean.
The symbol for the population variance is $\sigma^2$ ($\sigma$ is the Greek lowercase sigma).
The formula for the population variance is
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
where x = individual value
$\mu$ = population mean
N = population size
The standard deviation is the square root of the variance. The symbol for the population
standard deviation is $\sigma$ and its formula is
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

The standard deviation is one of the measures used to describe the variability of a
distribution. It has an additional use which makes it more important than other measures
of dispersions. It is used as a unit to measure the distance between any two observations.
STEPS TO BE FOLLOWED IN CALCULATING STANDARD DEVIATIONS

1. Find the mean ($\mu$) of the data.
2. Subtract the mean from each data value: $x - \mu$.
3. Square each result: $(x - \mu)^2$. The reason for squaring is to eliminate the negative signs.
4. Find the sum of the squares, $\sum (x - \mu)^2$.
5. Divide the sum by N to get the variance, $\frac{\sum (x - \mu)^2}{N}$.
6. Take the square root of the variance to get the standard deviation, $\sqrt{\frac{\sum (x - \mu)^2}{N}}$.
The reason for taking the square root is that, since the distances were squared, the units of the resulting
numbers are the squares of the units of the original raw data. Finding the square root
of the variance puts the standard deviation in the same units as the raw data.


Example
Find the variance and standard deviation for the following set:
10, 60, 50, 30, 40, 20
Solution
The mean, $\mu = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$

x       x - μ          (x - μ)²
10      10-35=-25      625
60      60-35=25       625
50      50-35=15       225
30      30-35=-5       25
40      40-35=5        25
20      20-35=-15      225
Total                  1750

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{1750}{6} = 291.7$$
Therefore, the variance is 291.7.
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{291.7} = 17.1$$
Therefore, the standard deviation is 17.1
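The six steps for the population variance and standard deviation can be followed line by line in Python. This is a sketch with illustrative names; it uses the data set with mean 35 (10, 60, 50, 30, 40, 20).

```python
import math

data = [10, 60, 50, 30, 40, 20]
N = len(data)

mu = sum(data) / N                            # step 1: the mean
deviations = [x - mu for x in data]           # step 2: subtract the mean
squares = [d ** 2 for d in deviations]        # step 3: square each result
sum_of_squares = sum(squares)                 # step 4: sum of the squares
variance = sum_of_squares / N                 # step 5: divide by N
std_dev = math.sqrt(variance)                 # step 6: take the square root

print(mu, sum_of_squares, round(variance, 1), round(std_dev, 1))
```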
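SAMPLE VARIANCE AND STANDARD DEVIATION
The formulas for finding the sample variance and standard deviation are as follows:
Variance,
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
Sample standard deviation,
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
where x = individual score
$\bar{x}$ = sample mean
n = sample size
NOTE: Dividing by n-1 gives a slightly larger value and an unbiased estimate of the population
variance.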
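Example
Find the sample variance and standard deviation for three students'
earnings. The data are in Kwacha.
3, 4, 5
Solution
(i) The sample variance is
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
$\sum x = 3 + 4 + 5 = 12$
$(\sum x)^2 = 12^2 = 144$
$\sum x^2 = 3^2 + 4^2 + 5^2 = 50$
$$s^2 = \frac{50 - \frac{144}{3}}{3 - 1} = \frac{50 - 48}{2} = 1$$
(ii) The standard deviation is $s = \sqrt{1} = 1$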
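Both forms of the sample variance formula give the same answer, which this short Python sketch illustrates. The variable names are illustrative.

```python
import math

earnings = [3, 4, 5]
n = len(earnings)

# Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
sum_x = sum(earnings)
sum_x2 = sum(x ** 2 for x in earnings)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definition form: sum of squared deviations from the sample mean / (n - 1).
x_bar = sum_x / n
s2_definition = sum((x - x_bar) ** 2 for x in earnings) / (n - 1)

s = math.sqrt(s2_shortcut)
print(sum_x, sum_x2, s2_shortcut, s2_definition, s)
```

The shortcut form avoids computing each deviation, which is convenient when working by hand from a table of x and x² values.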
For grouped data, we use the following steps:
1. Make a table as shown below and find the mid-point of each class.

Class   Frequency (f)   Mid-point (xm)   f.xm   f.xm²

2. Multiply the frequency by the mid-point of each class and place the product in column 4.
3. Multiply the frequency by the square of the mid-point and place the product in column 5.
4. Find the sums of columns 2 (n), 4 ($\sum f x_m$) and 5 ($\sum f x_m^2$).
5. Substitute in the formula and solve to get the variance:
$$s^2 = \frac{\sum f x_m^2 - \frac{(\sum f x_m)^2}{n}}{n - 1}$$
6. Take the square root to get the standard deviation.
Example
Find the sample variance and standard deviation for the following frequency distribution.
Class limits   Frequency
13-19          2
20-26          7
27-33          12
34-40          5
41-47          6
48-54          1
55-61          0
62-68          2
Solution
Class limits   Frequency (f)   Mid-point (xm)   f.xm   f.xm²
13-19          2               16               32     512
20-26          7               23               161    3703
27-33          12              30               360    10800
34-40          5               37               185    6845
41-47          6               44               264    11616
48-54          1               51               51     2601
55-61          0               58               0      0
62-68          2               65               130    8450
Total          n = 35                           1183   44527

$$s^2 = \frac{\sum f x_m^2 - \frac{(\sum f x_m)^2}{n}}{n - 1} = \frac{44527 - \frac{1183^2}{35}}{35 - 1} = \frac{44527 - 39985.4}{34} = 133.6$$
$$s = \sqrt{133.6} = 11.6$$
Therefore, the sample variance is 133.6 and the standard deviation is 11.6
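The grouped-data formula $s^2 = \left(\sum f x_m^2 - (\sum f x_m)^2 / n\right) / (n - 1)$ can be evaluated directly in Python, which is a useful check on the column totals. The names below are illustrative, and the mid-points are computed from the class limits.

```python
import math

# (lower limit, upper limit, frequency) for each class.
classes = [
    (13, 19, 2), (20, 26, 7), (27, 33, 12), (34, 40, 5),
    (41, 47, 6), (48, 54, 1), (55, 61, 0), (62, 68, 2),
]

n = 0           # sum of column 2 (frequencies)
sum_fxm = 0     # sum of column 4 (f * xm)
sum_fxm2 = 0    # sum of column 5 (f * xm^2)
for lower, upper, freq in classes:
    xm = (lower + upper) // 2   # mid-point; the limits here always sum to an even number
    n += freq
    sum_fxm += freq * xm
    sum_fxm2 += freq * xm ** 2

# Note that the sum of f * xm is squared before it is divided by n.
variance = (sum_fxm2 - sum_fxm ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(n, sum_fxm, sum_fxm2, round(variance, 1), round(std_dev, 1))
```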
TUTORIAL 3: DESCRIBING DATA
1. Find the mean, mode, median, variance and standard deviation of the following sets
of numbers:
(a) 2, 3, 5, 3, 3
(b) 3, 4, 5, 4, 5, 4, 5, 3, 4, 4
2. Find the weighted mean of 20, 25, 30, 35 if they are assigned weightings of
(a) 1, 2, 3, 4
(b) 1, 3, 7, 9 respectively.
3. An instructor weights grades as follows: exams, 20%; term paper, 30%; final exam, 50%. A student had
grades of 83, 72 and 90 respectively. Find the student's final average. Use the
weighted mean.
4. In a class of 29 students, this distribution of quiz scores was recorded.
Class limits   Frequency
0-2            1
3-5            3
6-8            5
9-11           14
12-14          6
Find the mean, variance and standard deviation of the quiz scores.
5. A survey was made of the monthly earnings of four agricultural assistants and the
results are recorded below:
K18,000, K19,000, K20,000 and K21,000
Calculate the following:
(a) Sample variance
(b) Sample standard deviation.


TOPIC 4: PROBABILITY THEORY


OBJECTIVES
By the end of this topic, you should be able to;
(a) Determine sample space.
(b) Find the probability of an event, using classical probability.
(c) Calculate probabilities, applying rules of addition and multiplication.
(d) Distinguish between a discrete random variable and a continuous random
variable.
(e) Construct a probability distribution for a random variable.
(f) Find the mean, variance and standard deviation for the variable of a
binomial distribution.
(g) Find probabilities for outcomes of variables using the Poisson distributions.
(h) Find the area under the standard normal distribution.
(i) Calculate probabilities for normally distributed variable by transforming it
into a standard normal variable.
PROBABILITY CONCEPTS
(a) EXPERIMENT/TRIAL
It is a repeatable procedure whose outcome is attributed to chance.
(b) SAMPLE SPACE/PROBABILITY SPACE (S)
It is the set of all possible outcomes of a trial. It is denoted by S.
Examples of trials and sample spaces
(a) Tossing a coin once
$S = \{H, T\}$
(b) Tossing a die once
$S = \{1, 2, 3, 4, 5, 6\}$
(c) SAMPLE POINT
It is any distinct element of the sample space.
(d) EVENT
It is any subset of a sample space. For example, let $S = \{1, 2, 3, 4, 5, 6\}$. One event is the
event of picking an even number. If A is the event of picking an even number then
$A = \{2, 4, 6\}$
Example
A die is tossed once. Find
(i) Sample space
(ii) Event of odd numbers.
(iii) Event of even numbers.
(iv) Event of prime numbers.
(v) Event of multiples of 3.


Solutions
(i) $S = \{1, 2, 3, 4, 5, 6\}$
(ii) $A = \{1, 3, 5\}$
(iii) $B = \{2, 4, 6\}$
(iv) $C = \{2, 3, 5\}$ (1 is not a prime number)
(v) $D = \{3, 6\}$
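Events are subsets of the sample space, so they map naturally onto Python sets. The sketch below rebuilds the events of the die example; the helper `is_prime` and the variable names are illustrative assumptions.

```python
# Sample space for one toss of a die.
S = {1, 2, 3, 4, 5, 6}

def is_prime(n):
    """Return True if n is a prime number (1 is not prime)."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

# Each event is built as a subset of S.
odd = {x for x in S if x % 2 == 1}
even = {x for x in S if x % 2 == 0}
primes = {x for x in S if is_prime(x)}
multiples_of_3 = {x for x in S if x % 3 == 0}

print(sorted(odd), sorted(even), sorted(primes), sorted(multiples_of_3))
```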

(e) MUTUALLY EXHAUSTIVE EVENTS
These are events which together constitute the sample space, i.e. if $A \cup B = S$ then A and B
are mutually exhaustive events.
(f) MUTUALLY EXCLUSIVE EVENTS
These are events which don't have any elements in common. That is, if $A \cap B = \emptyset$ then A
and B are mutually exclusive events.
(g) COMPLEMENT OF AN EVENT
The complement of an event A with respect to the sample space S is the set of
points in S not belonging to the
subset A. $\bar{A}$ or $A'$ is often used as a symbol for not A. A and not A are referred to
as complementary events.
(h) PROBABILITY OF AN EVENT
The probability of any event E is
(number of outcomes in E) / (total number of outcomes in the sample space)
This probability is denoted by
$P(E) = \frac{n(E)}{n(S)}$
Example
Suppose you toss a coin. Find the probability that the outcome is a head.
Solution
$S = \{H, T\}$ and $E = \{H\}$
But $P(H) = \frac{n(H)}{n(S)} = \frac{1}{2}$
Example
If a card is drawn from a standard pack of cards, find the probability of getting a
queen.
Solution
n(S) = 52 and n(E) = 4
but $P(E) = \frac{n(E)}{n(S)} = \frac{4}{52} = \frac{1}{13}$
Example
If a family has three children, find the probability that all the children are girls.
Solution
$S = \{BBB, BBG, BGB, GBB, BGG, GBG, GGB, GGG\}$
So n(S) = 8 and n(E) = 1
But $P(E) = \frac{n(E)}{n(S)} = \frac{1}{8}$
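The classical rule P(E) = n(E)/n(S) can be applied mechanically by listing the sample space. The following sketch enumerates the three-children example; the function name `classical_probability` is an illustrative assumption, and the rule only applies when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S), valid when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# All birth orders for three children: B = boy, G = girl.
S = ["".join(p) for p in product("BG", repeat=3)]

# Event: all three children are girls.
all_girls = [s for s in S if s == "GGG"]
p_all_girls = classical_probability(all_girls, S)

print(len(S), p_all_girls)
```

Using `Fraction` keeps the answer exact instead of a rounded decimal.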

The following are four basic probability rules
(a) The probability of any event is a number (either a fraction or decimal) between and
including 0 and 1. This is denoted by $0 \le P(E) \le 1$. NOTE: This rule states that
probabilities cannot be negative or greater than 1.
(b) If an event E cannot occur (i.e. the event contains no members of the sample
space) then its probability is 0.
(c) If an event E is certain, then the probability of E is 1.
(d) The sum of the probabilities of the outcomes in the sample space is 1.
RULE FOR COMPLEMENTARY EVENTS
$P(\bar{E}) = 1 - P(E)$ or $P(E) = 1 - P(\bar{E})$ or $P(E) + P(\bar{E}) = 1$
Example
The weather bureau estimates the probability of rain tomorrow to be 0.42. What is the
probability that it does not rain?
Solution
P(no rain) = 1 - P(rain)
= 1 - 0.42
= 0.58

TWO LAWS OF PROBABILITY
1. ADDITION LAW
$P(A \cup B) = P(A) + P(B)$ if A and B are mutually exclusive events.
Or
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ if A and B are not mutually exclusive events.
Example
A card is picked at random from a standard pack of cards. What is the probability it
is
(a) a red heart or a club?
(b) a red heart or a king?
Solution
(a) There is no red heart that is a club. So P(r or c) = P(r) + P(c)
= 13/52 + 13/52
= 1/2
(b) One red heart is also a king (the king of hearts).
So P(r or k) = P(r) + P(k) - P(r and k)
= 13/52 + 4/52 - 1/52
= 16/52
= 4/13
Example
In a hospital unit there are 8 nurses and 5 physicians; 7 nurses and 3 physicians are
females. If a staff person is selected, find the probability that the subject is a nurse
or a male.
Solution
The sample space is shown below
Staff        Females   Males   Total
Nurses       7         1       8
Physicians   3         2       5
Total        10        3       13
There is a nurse who is a male. So
P(nurse or male) = P(nurse) + P(male) - P(nurse and male)
= 8/13 + 3/13 - 1/13
= 10/13
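The addition law for events that are not mutually exclusive can be verified by counting directly from the staff table. This is a minimal sketch; the dictionary layout and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

# Counts from the staff table above: (role, sex) -> number of staff.
staff = {
    ("nurse", "female"): 7,
    ("nurse", "male"): 1,
    ("physician", "female"): 3,
    ("physician", "male"): 2,
}
total = sum(staff.values())

def prob(predicate):
    """Probability that a randomly selected staff member satisfies predicate."""
    return Fraction(sum(c for key, c in staff.items() if predicate(key)), total)

p_nurse = prob(lambda k: k[0] == "nurse")
p_male = prob(lambda k: k[1] == "male")
p_both = prob(lambda k: k == ("nurse", "male"))

# Addition law: subtract the overlap so the male nurse is not counted twice.
p_nurse_or_male = p_nurse + p_male - p_both

# Direct count of "nurse or male" for comparison.
p_direct = prob(lambda k: k[0] == "nurse" or k[1] == "male")
print(p_nurse_or_male, p_direct)
```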
MULTIPLICATION LAWS
1. When two events are independent, the probability of both occurring is
$P(A \cap B) = P(A) \cdot P(B)$
Example
A coin is tossed and a die is rolled. Find the probability of getting a 4 on the die and
a head on the coin.
Solution
P(4 and head) = P(4).P(head)
= 1/6 x 1/2
= 1/12
Example
A card is drawn from a deck and replaced. Then a second card is drawn. Find the probability
of getting a queen and then an ace.
Solution
P(queen and ace) = P(queen).P(ace)
= 4/52 x 4/52
= 1/169

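2. When two events are dependent, the probability of both occurring is
$P(A \cap B) = P(A) \cdot P(B/A) = P(B) \cdot P(A/B)$
where P(B/A) is the probability of B given that A has occurred.
But $P(A/B) = \frac{P(A \cap B)}{P(B)}$
Example
A card is drawn from a deck without replacement. Then a second card is
drawn. Find the probability of getting a queen and then an ace.
Solution
P(queen and ace) = P(queen).P(ace/queen)
= 4/52 x 4/51
= 4/663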

Example
The World Wide Insurance Company found that 53% of the residents of a city had
homeowner's insurance with the company. Of these clients, 27% also had
automobile insurance with the company. If a resident is selected, find the probability
that the resident has both homeowner's and automobile insurance with the World
Wide Insurance Company.
Solution
Let homeowner's insurance be H and automobile insurance be A.
So P(H) = 0.53 and P(A/H) = 0.27
P(H and A) = P(H).P(A/H)
= 0.53 x 0.27
= 0.1431
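The difference between the two multiplication laws is easiest to see side by side. The sketch below recomputes the card and insurance examples; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Independent events: the first card is replaced, so the deck is unchanged.
p_with_replacement = Fraction(4, 52) * Fraction(4, 52)

# Dependent events: the queen is not replaced, so 51 cards remain
# and P(ace / queen) = 4/51.
p_without_replacement = Fraction(4, 52) * Fraction(4, 51)

# Insurance example: P(H and A) = P(H) * P(A/H).
p_h = 0.53
p_a_given_h = 0.27
p_h_and_a = p_h * p_a_given_h

print(p_with_replacement, p_without_replacement, round(p_h_and_a, 4))
```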
DISCRETE PROBABILITY DISTRIBUTION
RANDOM VARIABLES
A random variable is the numerical outcome of a random experiment, denoted by X.
If the experiment is repeated, different values of X will be obtained and these values
are denoted by small x.

If a variable can assume only a specific number of values, such as the outcome for
the roll of a die or the outcome for the toss of a coin, then the variable is called a
discrete variable. Discrete variables have values that can be counted.
DISCRETE PROBABILITY DISTRIBUTION
It consists of the values a random variable can assume and the corresponding
probabilities of the values. The probabilities are determined theoretically or by
observation.
Example
Construct a probability distribution for rolling a single die.
Solution
$S = \{1, 2, 3, 4, 5, 6\}$
$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$
So its distribution is
Outcome x          1     2     3     4     5     6
Probability P(x)   1/6   1/6   1/6   1/6   1/6   1/6

A probability distribution can be shown graphically by representing the values of x
on the x-axis and the probabilities P(x) on the y-axis. For the above example, the graph
consists of six bars of equal height 1/6 at x = 1, 2, 3, 4, 5 and 6.
NOTE: In this probability distribution, the probabilities are between 0 and 1, i.e.
$0 \le P(X = x) \le 1$
REQUIREMENTS FOR A PROBABILITY DISTRIBUTION
1. The sum of the probabilities of all events in the sample space must equal 1; that is
$\sum P(X = x) = 1$
2. The probability of each event in the sample space must be between or equal to
0 and 1. That is $0 \le P(X = x) \le 1$
Example
Represent graphically the probability distribution for the sample space for tossing
three coins.
Number of heads x      0     1     2     3
Probability P(X = x)   1/8   3/8   3/8   1/8
Solution
The graph is a bar chart with bars of height 1/8, 3/8, 3/8 and 1/8 at x = 0, 1, 2 and 3.
Example
A random variable has the distribution shown in the following table
x          0     1     2
P(X = x)   1/4         1/4
(a) Find P(X = 1)
(b) Represent graphically the probability distribution.
Solution
(a) $P(X = 1) = 1 - \frac{1}{4} - \frac{1}{4}$ since $\sum P(X = x) = 1$
$= \frac{1}{2}$
(b) The probability distribution is a bar chart with bars of height 1/4, 1/2 and 1/4 at x = 0, 1 and 2.

BINOMIAL DISTRIBUTION
Many types of probability problems have only two outcomes or can be reduced to
two outcomes. Some examples of experiments where you have two outcomes are:
1. A win or loss in a football game.
2. A pass or fail in an examination.
3. A head or tail on a coin toss.
4. Effective or ineffective lecturer.
5. A correct or incorrect item.
Situations like these are called binomial experiments.
A binomial experiment is a probability experiment that satisfies the following four requirements:
(i) Each trial has only two outcomes, or outcomes that can be reduced to two, that can
be considered as either success or failure.
(ii) There must be a fixed number of trials.
(iii) The outcomes of the trials must be independent of each other.
(iv) The probability of success must remain the same for each trial.
The outcomes of a binomial experiment and the corresponding probabilities of these
outcomes are called a binomial distribution. In a binomial experiment, the
probability of exactly x successes in n trials is
$P(X = x) = \frac{n!}{(n - x)! \, x!} \, p^x q^{n - x}$
Where n = number of trials
x = the number of successes in n trials
NOTE: $0 \le x \le n$
p = the numerical probability of a success, i.e. P(S)
q = the numerical probability of a failure, i.e. P(F) = 1 - P(S)
NOTE: $n! = n(n - 1)(n - 2)(n - 3) \cdots 3 \cdot 2 \cdot 1$
0! = 1

Example
Find (a) 3! and (b) 5!
Solutions
(a) 3!= 3x2x1
=6
(b) 5!=5x4x3x2x1
=120
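Example
A survey found that one out of five Malawians say he or she has visited a doctor in
any given month. If 10 people are selected at random, find the probability that
exactly 3 will have visited a doctor last month.
Solution
n = 10, x = 3, p = 1/5 and q = 1 - 1/5 = 4/5
but $P(X = x) = \frac{n!}{(n - x)! \, x!} \, p^x q^{n - x}$
$P(X = 3) = \frac{10!}{(10 - 3)! \, 3!} \left(\frac{1}{5}\right)^3 \left(\frac{4}{5}\right)^{10 - 3}$
$= \frac{10 \cdot 9 \cdot 8 \cdot 7!}{7! \cdot 3 \cdot 2 \cdot 1} \left(\frac{1}{5}\right)^3 \left(\frac{4}{5}\right)^7$
= 0.201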
Example
It has been found that on average 5% of the eggs supplied at NRC market are
cracked. If you buy a box of 6 eggs, what is the probability that it contains 2 or
more cracked eggs?
Solution
p = P(cracked) = 0.05, q = P(not cracked) = 1 - 0.05 = 0.95 and n = 6
P(2 or more cracked) = 1 - P(less than 2 cracked)
= 1 - [P(0) + P(1)]
$P(0) = \frac{6!}{(6 - 0)! \, 0!} (0.05)^0 (0.95)^{6 - 0}$
$= 1 \times 1 \times (0.95)^6$
= 0.7351
$P(1) = \frac{6!}{(6 - 1)! \, 1!} (0.05)^1 (0.95)^{6 - 1}$
$= 6 \times 0.05 \times (0.95)^5$
= 0.2321
Therefore P(2 or more cracked) = 1 - (0.7351 + 0.2321)
= 1 - 0.9672
= 0.033 to 3 d.p.
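The binomial formula is straightforward to evaluate with the standard library. The sketch below recomputes the two worked examples; `binomial_pmf` is an illustrative name, and `math.comb(n, x)` is used for n!/((n - x)! x!).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Doctor-visit example: n = 10, x = 3, p = 1/5.
p_three_visits = binomial_pmf(3, 10, 0.2)

# Cracked-egg example: P(2 or more) = 1 - [P(0) + P(1)] with n = 6, p = 0.05.
p_two_or_more = 1 - (binomial_pmf(0, 6, 0.05) + binomial_pmf(1, 6, 0.05))

print(round(p_three_visits, 3), round(p_two_or_more, 3))
```

Working with the complement, as in the egg example, avoids summing P(2) through P(6) separately.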
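For a binomial distribution
(a) mean: $\mu = n \cdot p$
(b) variance: $\sigma^2 = n \cdot p \cdot q$ or $n \cdot p \cdot (1 - p)$
(c) standard deviation: $\sigma = \sqrt{n \cdot p \cdot q}$ or $\sqrt{n \cdot p \cdot (1 - p)}$
where n = number of trials
p = P(S)
q = P(F)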
Example
A die is rolled 480 times. Find the mean, variance and standard deviation of the
number of 2s that will be rolled.
Solution
Here p = P(2) = 1/6, so q = 1 - 1/6 = 5/6, and n = 480
(a) mean = n.p
= 480 x 1/6
= 80
(b) variance = n.p.q
= 480 x 1/6 x 5/6
= 66.7
(c) standard deviation = $\sqrt{n \cdot p \cdot q}$
= $\sqrt{66.7}$
= 8.2
Example
Let X be the number of correct responses out of n = 20 questions and let p be
the probability of a correct choice on a single question. A candidate in an
examination randomly selects one of the 5 possible answers for each question, and
hence p = 1/5. Find the mean, variance and standard deviation of X for the student.
Solution
Given: n = 20 and p = 1/5
So q = 1 - 1/5 = 4/5
(a) mean = n.p
= 20 x 1/5
= 4
(b) variance = n.p.q
= 20 x 1/5 x 4/5
= 3.2
(c) standard deviation = $\sqrt{n \cdot p \cdot q}$
= $\sqrt{3.2}$
= 1.8 to 1 decimal place
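Both examples use the same three formulas, so they can share one helper. This is a small sketch; the function name `binomial_summary` is an illustrative assumption.

```python
import math

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial variable."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)

# Die rolled 480 times, counting 2s: p = 1/6.
die_mean, die_var, die_sd = binomial_summary(480, 1 / 6)

# 20 multiple-choice questions answered at random: p = 1/5.
quiz_mean, quiz_var, quiz_sd = binomial_summary(20, 1 / 5)

print(round(die_mean, 1), round(die_var, 1), round(die_sd, 1))
print(round(quiz_mean, 1), round(quiz_var, 1), round(quiz_sd, 1))
```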
POISSON DISTRIBUTION
The binomial distribution is useful in cases where we take a fixed sample size and
count the number of successes. Sometimes, we don't have a definite sample size
and then the binomial distribution is of no use. In such cases we use another
theoretical distribution called the Poisson distribution.
In a Poisson distribution, the probability of exactly r successes is
$P(r) = \frac{e^{-m} m^r}{r!}$
Where m = mean
r = number of events (successes)
e = 2.7183
WHEN TO USE THE POISSON DISTRIBUTION
(1) When n is large, i.e. a very large number of trials ($n \ge 30$)
(2) p is small
(3) The independent variable occurs over a period of time, or a density of items is distributed over a given
area or volume.
Example
If there are 200 typographical errors randomly distributed in a 500 page manuscript, find
the probability that a given page contains exactly three errors.
Solution
Given: r = 3
But m = 200/500
= 0.4
and $P(r) = \frac{e^{-m} m^r}{r!}$
$P(3) = \frac{(2.7183)^{-0.4} (0.4)^3}{3!}$
= 0.0072

Example
The number of accidents per working week in a particular factory in Lilongwe is
known to follow a Poisson distribution with a mean of 0.5. Find the probability that
in a particular week there will be (i) 2 accidents (ii) less than 3 accidents
Solution
(i) $P(r) = \frac{e^{-m} m^r}{r!}$
$P(2) = \frac{(2.7183)^{-0.5} (0.5)^2}{2!}$
= 0.08
(ii) P(less than 3 accidents) = P(0) + P(1) + P(2)
$= \frac{(2.7183)^{-0.5} (0.5)^0}{0!} + \frac{(2.7183)^{-0.5} (0.5)^1}{1!} + \frac{(2.7183)^{-0.5} (0.5)^2}{2!}$
= 0.6065 + 0.3033 + 0.0758
= 0.9856
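The Poisson probabilities in these examples can be recomputed in a few lines. The sketch below uses `math.exp` rather than the rounded value 2.7183; the function name `poisson_pmf` is an illustrative assumption.

```python
import math

def poisson_pmf(r, m):
    """P(r) = e^(-m) * m^r / r! for a Poisson variable with mean m."""
    return math.exp(-m) * m ** r / math.factorial(r)

# Manuscript example: 200 errors over 500 pages gives m = 0.4 per page.
p_three_errors = poisson_pmf(3, 200 / 500)

# Factory example: mean of 0.5 accidents per week.
p_two_accidents = poisson_pmf(2, 0.5)
# "Less than 3" means r = 0, 1 or 2.
p_less_than_three = sum(poisson_pmf(r, 0.5) for r in range(3))

print(round(p_three_errors, 4), round(p_two_accidents, 2),
      round(p_less_than_three, 4))
```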
NORMAL PROBABILITY DISTRIBUTION
A normal distribution is a continuous, unimodal, symmetric, bell-shaped distribution
of a variable.
PROPERTIES OF THE THEORETICAL NORMAL DISTRIBUTION
(i) The curve is bell-shaped.
(ii) The mean, median and mode are equal and located at the center of the
distribution.
(iii) The curve is unimodal (has only one mode).
(iv) The curve is symmetrical about the mean. This means that if you cut the normal
curve vertically at the centre, the two halves so formed are mirror images of each other.
(v) The curve is continuous, that is, there are no gaps or holes. For each value of x,
there is a corresponding value of y.
(vi) All the values of y are greater than zero and approach zero as x approaches
$\pm\infty$.
(vii) The area between the curve and the x-axis is 1 unit or 100%.
The standard normal distribution is a normal distribution with a mean of 0 and a
standard deviation of 1. All normally distributed variables can be transformed into the
standard normally distributed variable by using the formula for the standard score.
$Z = \frac{X - \mu}{\sigma}$
Where Z = Z-score
X = value
$\mu$ = mean
$\sigma$ = standard deviation
The Z score is actually the number of standard deviations that a particular X value is
away from the mean.
Example
A student scored 65 on a Mathematics test that had a mean of 50 and a standard
deviation of 10. Calculate his Z score.
Solution
$Z = \frac{X - \mu}{\sigma}$
$= \frac{65 - 50}{10}$
= 1.5
AREA UNDER THE STANDARD NORMAL CURVE
NOTE: (i) The tabulated value for z1 is the area under the curve between the ordinate at z1 and the mean.
(ii) P(x1 < x < x2) = Area under the curve between the ordinates at x1 and x2.
Example
Find the area under the normal curve between Z = 0 and Z = 2.34
Solution
Using the normal tables
1. Look for 2.3 in the first column
2. Look for 0.04 in the top row
3. Look for the value where the row of 2.3 and the column of 0.04 meet. They meet at 0.4904.
Therefore the area is 0.4904
Example
Find the area between Z = 0 and Z = -1.75
Solution
Since area is always positive, and the curve is symmetrical, look for 1.75.
From the normal tables the area is 0.4599
Example
Find the area to the left of Z = -1.93
Solution
Look for 1.93. We get 0.4732
Area to the left of Z = -1.93 = 0.5000 - Area between Z = 0 and Z = -1.93
= 0.5000 - 0.4732
= 0.0268
NOTE: Area between Z = 0 and $+\infty$ is 0.5000 and area between Z = 0 and $-\infty$ is 0.5000
Example
Find the area between Z = 2.00 and Z = 2.47
Solution
First step: Find the area between Z = 0 and Z = 2.47 i.e. 0.4932
Second step: Find the area between Z = 0 and Z = 2.00 i.e. 0.4772
Third step: Find the difference of the two i.e. 0.4932 - 0.4772
Therefore, the area is 0.0160
NOTE: If the areas are on the same side of Z = 0, subtract the areas.
Example
Find the area between Z = 1.68 and Z = -1.37
Solution
First step: Find the area between Z = 0 and Z = 1.68 i.e. 0.4535
Second step: Find the area between Z = 0 and Z = -1.37 i.e. 0.4147
Third step: Find the sum of the two i.e. 0.4535 + 0.4147
Therefore, the area is 0.8682
NOTE: If the areas are on opposite sides of Z = 0, add the two areas.
NORMAL PROBABILITIES
$P(x_1 < x < x_2) = P(z_1 < z < z_2)$ = Area under the standard normal curve between the
ordinates at z1 and z2.
Example
Find the probability for each of the following
(a) P(0<z<2.32)
(b) P(z<1.65)
(c) P(z>1.91)
Solutions
(a) P(0<z<2.32) = area between z = 0 and z = 2.32
= 0.4898 (from normal tables)
(b) P(z<1.65) = area to the left of z = 1.65
= 0.5000 + 0.4505
= 0.9505
(c) P(z>1.91) = area to the right of z = 1.91
= 0.5000 - 0.4719
= 0.0281
Example
A washing machine breaks down on average after 300 days, the standard deviation being
50 days. Assume that the times taken for washing machines to break down are normally
distributed. What is the probability
(a) that a given washing machine will break down in under 320 days?
(b) that the washing machine breaks down after more than 363 days?
(c) that a given washing machine breaks down somewhere between the 200th and the
350th day?
Solutions
(a) $Z = \frac{X - \mu}{\sigma} = \frac{320 - 300}{50}$
= 0.4
P(z<0.4) = area to the left of z = 0.4
= 0.5000 + 0.1554
= 0.6554
(b) $Z = \frac{363 - 300}{50}$
= 1.26
P(z>1.26) = area to the right of z = 1.26
= 0.5000 - 0.3962
= 0.1038
(c) $Z_1 = \frac{200 - 300}{50}$
= -2
and
$Z_2 = \frac{350 - 300}{50}$
= 1
P(-2<z<1) = (area between z = 0 and z = -2) + (area between z = 0 and z = 1)
= 0.4772 + 0.3413
= 0.8185
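The table look-ups above can be cross-checked numerically. The sketch below computes the cumulative normal probability from the error function in the standard library; the function name `normal_cdf` is an illustrative assumption, and small differences in the fourth decimal place are expected because the tables round z to two decimals.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Area to the left of x under a normal curve with the given mean and sd."""
    z = (x - mean) / sd  # standard score
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_days, sd_days = 300, 50

# (a) Breaks down in under 320 days.
p_under_320 = normal_cdf(320, mean_days, sd_days)
# (b) Breaks down after more than 363 days.
p_over_363 = 1 - normal_cdf(363, mean_days, sd_days)
# (c) Breaks down between the 200th and the 350th day.
p_200_to_350 = (normal_cdf(350, mean_days, sd_days)
                - normal_cdf(200, mean_days, sd_days))

print(round(p_under_320, 4), round(p_over_363, 4), round(p_200_to_350, 4))
```

With the default mean of 0 and standard deviation of 1, the same function gives areas under the standard normal curve, for example `normal_cdf(2.47) - normal_cdf(2.00)` for the area between Z = 2.00 and Z = 2.47.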

TUTORIAL 4: PROBABILITY THEORY
1. A bag contains 3 pens, 7 biros and 2 pencils. What is the probability that one object
selected at random is a pen?
2. Find the probability that a number chosen at random from the integers between 10
and 20 inclusive is either a prime or a multiple of 5.
3. Three cards are drawn from an ordinary deck and not replaced. Find the
probability of these.
(a) Getting 3 jacks
(b) Getting an ace, a king and a queen, in order.
4. Determine whether each distribution is a probability distribution.
(a)
X      1     3     5     7     9     11
P(x)   1/6   1/6   1/6   1/6   1/6   1/6
(b)
X      6, 12, 15 (table incomplete in the source; the P(x) row is missing)
(c)
X      3     6     8
P(x)   0.3   0.6   0.7
(d)
X      5     10    15
P(x)   1.2   0.3   0.5
5. A die is tossed 100 times. Find the mean, variance and standard deviation of the
number of 2s that will be rolled.
6. The average number of phone inquiries per day at the poison control centre is four. Find
the probability it will receive five calls on a given day. Use the Poisson distribution.
7. Find the probabilities for each, using the standard normal distribution.
(a) P(0<z<1.69)
(b) P(-1.57<z<0)
(c) P(1.32<z<1.51)
(d) P(z>2.59)
(e) P(z<-1.77)
(f) P(-0.05<z<1.10)

TOPIC 5: SAMPLING METHODS AND SAMPLING DISTRIBUTIONS
OBJECTIVES
By the end of this topic, you should be able to;
(a) State the four basic sampling techniques.
(b) Define sampling distribution.
(c) Calculate the standard error.
(d) Use the central limit theorem to solve problems involving sample means for larger
samples.
SAMPLING METHODS
To obtain samples that are unbiased, i.e. that give each subject in the population an equally likely
chance of being selected, statisticians use four basic methods of sampling: random,
systematic, stratified and cluster sampling.


(a) RANDOM SAMPLING
The word random in statistics means a process by which every available item has an
equal chance of being chosen. Therefore, random sampling is a way of obtaining a sample
in which each member of a population has an equal chance
of being included in the sample.
PROPERTIES OF RANDOM SAMPLING
(i) Unbiased: each unit has the same chance of being chosen.
(ii) Independence: selection of one unit has no influence on the selection of other units.
ADVANTAGE OF RANDOM SAMPLING
The selection procedure is unbiased.
DISADVANTAGE OF RANDOM SAMPLING
Sampling units may be difficult or expensive to contact.
(b) SYSTEMATIC SAMPLING
It is a sampling procedure that involves starting with a randomly chosen unit and then
selecting every kth unit thereafter. For example, suppose there were 2000 subjects in
the population and a sample of 50 subjects were needed. Since 2000 ÷ 50 is 40, then
k = 40 and every 40th subject would be selected. However, the first subject (numbered
between 1 and 40) would be selected at random. Suppose subject 12 were first selected;
then the sample would consist of the subjects whose numbers were 12, 52, 92, etc.
until 50 subjects were obtained.
ADVANTAGE OF SYSTEMATIC METHOD
This method is not time consuming.
DISADVANTAGE OF SYSTEMATIC METHOD
A major disadvantage occurs if the sampling frame (a list of every unit in the population)
is arranged so that sampling units with a particular characteristic occur at regular
intervals, causing over or under representation of this characteristic in the sample.
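The systematic procedure described above (random start, then every kth subject) can be sketched as follows. The function name `systematic_sample` and its parameters are illustrative assumptions; the sketch also assumes the population size is an exact multiple of the sample size, as in the 2000 and 50 example.

```python
import random

def systematic_sample(population_size, sample_size, rng=random):
    """Pick a random start between 1 and k, then every k-th subject after it."""
    k = population_size // sample_size  # sampling interval, e.g. 2000 // 50 = 40
    start = rng.randint(1, k)           # first subject, chosen at random
    return [start + i * k for i in range(sample_size)]

# A seeded generator makes the example repeatable.
sample = systematic_sample(2000, 50, random.Random(0))
print(sample[:3], len(sample))

# With a start of 12 the sample would begin 12, 52, 92, as in the text.
example = [12 + i * 40 for i in range(50)]
print(example[:3])
```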
(c) STRATIFIED SAMPLING
It is a sampling method/procedure that involves dividing the population into
subgroups called strata according to various homogeneous characteristics and
then drawing a simple sample from each stratum.
There are two types of population;
(i) Homogenous population: sampling units are all of the same kind and
can reasonably be dealt with in one group.
(ii) Heterogeneous population: sampling units are different from one
another and should be placed in several separate groups.
ADVANTAGE OF STRATIFIED SAMPLING
A sample is not distorted or biased by undue emphasis on extreme observations.
DISADVANTAGES OF STRATIFIED SAMPLING
1. It is difficult to define the strata.
2. It can be time consuming.
3. It can be an expensive method.
(d) CLUSTER SAMPLING
It is a procedure that involves grouping the population into small groups called clusters
and then observing everything in the sampled clusters.
ADVANTAGE OF CLUSTER SAMPLING
1. It reduces cost.
2. It increases speed in carrying out the survey.
DISADVANTAGES OF CLUSTER SAMPLING
The units within the sample may be homogenous.
SAMPLING DISTRIBUTIONS
Suppose a researcher selects 100 samples of a specific size from a large population and
computes the mean of each of the 100 samples. These sample means
$\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_{100}$ constitute a sampling distribution of sample means.
A sampling distribution of sample means is a distribution obtained by using the means
computed from random samples of a specific size taken from a population.
Properties of sampling distribution of sample means
1. The mean of the sampling distribution will be the same as the population mean, i.e.
$\mu_{\bar{x}} = \mu$
2. The standard deviation of the sample means will be smaller than the standard
deviation of the population. This is so because averaging makes the extreme values of
$\bar{x}$ less extreme than the extreme values of x. The standard deviation of $\bar{x}$ depends
upon the size of the sample and is defined as:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
This standard deviation is called the standard error of the mean (SE), i.e. $SE = \frac{\sigma}{\sqrt{n}}$

Example
What is the standard error for a sample of 100 with a standard deviation of 5? If you
increase the sample size to 200, what change in the standard error do you notice?
Solution
$SE = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}}$
= 0.5
$SE = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{200}}$
= 0.35
Observation: The larger the sample the smaller the standard error.
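THE CENTRAL LIMIT THEOREM
It states that, for a sufficiently large sample (or for any sample size when the population
itself is normal), the sample mean $\bar{x}$ of n observations is approximately normally
distributed with the mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
The Z-score can be found by using the following formula;
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
NOTE: $\bar{x}$ is the sample mean and the denominator is the standard error of the mean.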
Example
The final examination scores for MAT at NRC are normally distributed with mean 60
and standard deviation 10. A Lecturer teaches one of the sets of MAT and his class has
22 students. What is the probability that the average final examination score for his
class is below 55?
Solution
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ (Central limit theorem)
$= \frac{55 - 60}{10 / \sqrt{22}}$
= -2.35
$P(\bar{x} < 55) = P(z < -2.35)$
= 0.5000 - 0.4906 (from normal tables)
= 0.0094
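The standard error and the Z-score for a sample mean can be recomputed as a check. In this sketch the names `standard_error` and `standard_normal_cdf` are illustrative assumptions; the exact probability differs slightly from the table value because the tables round z to two decimal places.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Standard error for samples of 100 and 200 with sigma = 5.
se_100 = standard_error(5, 100)
se_200 = standard_error(5, 200)

# Class of 22 students, population mean 60 and standard deviation 10.
z = (55 - 60) / standard_error(10, 22)
p_below_55 = standard_normal_cdf(z)

print(round(se_100, 2), round(se_200, 2), round(z, 2), round(p_below_55, 4))
```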

TUTORIAL 5: SAMPLING METHODS AND SAMPLING DISTRIBUTIONS


(1) List down four sampling methods, and give an explanation of each one.
(2) What is the mean of the sample means?
(3) What does the central limit theorem say about the distribution of the sample
means?
(4) In a sample of 400 people, 175 were males. Find the standard error of the sample
proportion.
(5) The average age of lawyers is 43.6 years, with a standard deviation of 5.1 years. If
a law firm employs 50 lawyers, find the probability that the average age of the
group is greater than 44.2 years.

TOPIC 6: HYPOTHESIS TESTING


OBJECTIVES
By the end of this topic, you should be able to;
(a) State the null and alternative hypotheses.
(b) State the five steps used in hypothesis testing.
(c) Explain the relationship between type I and type II errors.
(d) Test hypotheses using the normal distribution.
(e) Use the one-way ANOVA technique to determine if there is a significant difference among three or more
means.
A hypothesis is a statement about the value of a population parameter. An example of a hypothesis
is "fifty percent of eligible voters are below the age of twenty-five".
A statistical hypothesis is a conjecture about a population parameter. This conjecture may or
may not be true.
There are two types of statistical hypothesis for each situation: the null hypothesis and the
alternative hypothesis.
NULL HYPOTHESIS
It is designated $H_0$ and read "H sub zero".
It is a statistical hypothesis that states that there is no difference between a parameter and a
specific value.
H stands for hypothesis and the subscript zero stands for no difference.
$H_0: \mu = \mu_0$, where $\mu_0$ is the value of the population mean given in the problem; it states that the
sample belongs to the population.
ALTERNATIVE HYPOTHESIS
It is symbolized by $H_1$.
It is a statistical hypothesis that states the existence of a difference between a parameter and
a specific value.
The alternative hypothesis depends on the wording of the problem. The wording can
suggest one of the three possible meanings;
(a) The sample comes from a population the mean of which is not equal to $\mu_0$. In
other words, it may be smaller or larger. Then you take $H_1: \mu \ne \mu_0$.
For this alternative, you divide the critical region into two equal parts and put one in
each tail of the distribution.
This is called a two-tailed test.
(b) The sample comes from a population the mean of which is larger than $\mu_0$. Then
you take $H_1: \mu > \mu_0$ and put the whole of the critical region in the right-hand tail of
the distribution.
This is called a right-tailed test.
(c) The sample comes from a population with $\mu$ smaller than $\mu_0$, so that $H_1: \mu < \mu_0$, and you put
the whole of the critical region in the left-hand tail of the distribution.
This is called a left-tailed test.

Example
State the null and alternative hypotheses for each conjecture.
(a) A researcher thinks that if the expectant mothers use vitamin pills, the birth weight
of the babies will increase. The average birth weight of the population is 8.6 kg
(b) An engineer hypothesizes that the mean number of defects can be decreased in a
manufacturing process of compact disks by using robots instead of humans for
certain tasks. The mean number of defective disks per 1000 is 18.
(c) A Psychologist feels that playing soft music during a test will change the result of
the test. The Psychologist is not sure whether the grades will be higher or lower. In
the past, the mean of the score was 73.
Solutions
(a) $H_0: \mu = 8.6$
$H_1: \mu > 8.6$
(b) $H_0: \mu = 18$
$H_1: \mu < 18$
(c) $H_0: \mu = 73$
$H_1: \mu \ne 73$
POSSIBLE OUTCOMES OF A HYPOTHESIS TEST
There are two possibilities for a correct decision and two possibilities for an incorrect decision.
There are two types of errors namely: type I and type II.
A type I error occurs if one rejects the null hypothesis when it is true. A type II error occurs
if one does not reject the null hypothesis when it is false.
                   H0 true            H0 false
Reject H0          Type I error       Correct decision
Do not reject H0   Correct decision   Type II error
TESTS OF HYPOTHESIS USING THE NORMAL DISTRIBUTION
In hypothesis testing, the following steps are recommended;
1. State the hypotheses. Be sure to state both the null and alternative hypotheses.
2. State the level of significance.
The level of significance is the maximum probability of committing a type I error.
The critical region is the range of values of the test value that indicates that there
is a significant difference and that the null hypothesis should be rejected.
On a sketch of the distribution, the shaded part is called the critical region and the unshaded area is called the noncritical or
acceptance region.
3. Standardize x, i.e. $Z = \frac{X - \mu}{\sigma}$
4. Compare the standardized x with the critical value for the chosen significance level.
5. Make a decision. Reject the null hypothesis if the standardized x lies in the critical
region, or accept (do not reject) the null hypothesis if the standardized x lies in the non-critical region.
Example
Tomato plants, all of the same variety, have their heights measured after a period of
growth in a green house. The heights are found to be normally distributed with the
mean 83 cm and variance 36 cm2. A tomato plant found growing outside the green
house measures 73.1 cm, after the same period of growth. Does this plant belong to the
same variety of tomato plants in the green house?
Solution
1. Formulate hypothesis
$H_0: \mu = 83$
$H_1: \mu \ne 83$
2. The level of significance is 5% or 0.05.
3. Standardize x
$Z = \frac{X - \mu}{\sigma} = \frac{73.1 - 83}{6}$
= -1.65
4. Comparing the standardized x to the critical values
-1.65 is between -1.96 and 1.96
5. Make decision
Accept H0 i.e. Yes it belongs to the same variety.

Example
Over a long period, the weights of pots of jam made by a standard process have been
normally distributed with the mean 345g and standard deviation 2.8g. A pot produced just
before the process closed for the day weighs 338.5g. Is the process working correctly?
Solution
1. Formulate hypothesis
$H_0: \mu = 345$ (yes it is working correctly)
$H_1: \mu \ne 345$
2. Decision rule: 5% level of significance
3. Standardize x
$Z = \frac{X - \mu}{\sigma} = \frac{338.5 - 345}{2.8}$
= -2.32
4. Comparing the standardized x to the critical values
-2.32 < -1.96 (outside acceptance region)
5. Make decision
Reject H0. In other words, it is not working correctly.

Example
A researcher reports that the average salary of assistant professors is more than K42,000.
A sample of 30 assistant professors has a mean salary of K43,260. At
$\alpha = 0.05$ test the claim that assistant professors earn more than K42,000 a month. The
standard deviation of the population is K5230.
Solution
1. Formulate hypothesis
$H_0: \mu \le$ K42,000
$H_1: \mu >$ K42,000
2. Decision rule: 5% level of significance
3. Standardize x
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ (Central limit theorem since $n \ge 30$)
$= \frac{43,260 - 42,000}{5230 / \sqrt{30}}$
= 1.32
4. Comparing the standardized x to the critical value
1.32 < 1.65 (It is in the acceptance region)
5. Make decision
Do not reject H0. In other words, there is not enough evidence that assistant professors earn more than K42,000.
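The five steps for a right-tailed test on a sample mean can be sketched in code. This is only an illustration: the function name `z_statistic` is an assumption, and the critical value 1.65 is the rounded table value for a right-tailed test at the 5% level.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardize a sample mean: (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Salary example: x_bar = 43260, mu0 = 42000, sigma = 5230, n = 30.
z = z_statistic(43260, 42000, 5230, 30)

# Right-tailed test at the 5% level: critical value from the normal tables.
critical_value = 1.65

# Reject H0 only if the test value falls in the critical region.
reject_h0 = z > critical_value
print(round(z, 2), reject_h0)
```

Here the test value is below the critical value, so the sample does not give enough evidence to support the claim that the mean salary is more than K42,000.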

For comparison of two variances or standard deviations, an F test is used.
$F = \frac{s_1^2}{s_2^2}$. The sampling distribution of this ratio of two sample variances is called the F distribution.
CHARACTERISTICS OF THE F DISTRIBUTION
1. The value of F cannot be negative, because variances are never negative.
2. The distribution is positively skewed.
3. The mean value of F is approximately equal to 1.
Example
Find the critical value for an F test when $\alpha = 0.05$, the degrees of freedom for the
numerator are 15 and the degrees of freedom of the denominator are 21.
Solution
Since this test is right-tailed with $\alpha = 0.05$, use the 0.05 table. The d.f.N is listed
across the top, and the d.f.D is listed in the left column. The critical value is found where
the row and column intersect in the table. In this case, the d.f.N = 15 column and the
d.f.D = 21 row meet at 2.18.
ANALYSIS OF VARIANCE (ANOVA)
A statistical technique which helps in making inferences about whether three or more
samples might come from populations having the same mean; specifically, whether
the differences among the samples might be caused by chance variation.
The ANOVA technique uses two estimates of the population variance: the between-group variance,
which is based on the variance of the sample means, and the within-group variance, which is
based on the variances of the values within each group.

When there is no difference in the means, the between-group variance estimate will be
approximately equal to the within-group variance and the F test value will be approximately equal to
1. The null hypothesis will not be rejected.
For a test of the difference among three or more means, the following hypotheses should
be used.
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_1$: At least one mean is different from the others.
The degrees of freedom for this F test are
d.f.N = k-1 where k is the number of groups
d.f.D = N-k where N is the sum of the sample sizes of the groups.
The F-test to compare means is always right-tailed.
STEPS TO BE FOLLOWED
1. Find the mean and variance of each sample: $(\bar{x}_1, s_1^2), (\bar{x}_2, s_2^2), \ldots, (\bar{x}_k, s_k^2)$
2. Find the grand mean.
$\bar{x}_{GM} = \frac{\sum x}{N}$
3. Find the between-group variance.
$s_B^2 = \frac{\sum n_i (\bar{x}_i - \bar{x}_{GM})^2}{k - 1}$
4. Find the within-group variance.
$s_W^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)}$
5. Find the F-test value
$F = \frac{s_B^2}{s_W^2}$

Example
A researcher wishes to try three different techniques to lower the blood pressure of
individual diagnosed with high blood pressure. The subjects are randomly assigned to
three groups; the first group takes medication, the second group exercises and the third
group follows a special diet. After four weeks, the reduction in each persons blood
pressure is recorded. At 0.05, test the claim that there is no difference among the
means. The data are shown below.


                Medication   Exercise   Diet
                10           6          5
                12           8          9
                9            3          12
                15           0          8
                13           2          4
Mean (x̄)        11.8         3.8        7.6
Variance (s²)   5.7          10.2       10.3
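Solution
1. State the hypotheses.
H0: μ1 = μ2 = μ3
H1: At least one mean is different from the others.
2. Find the critical value.
k = 3 and N = 15
d.f.N = k − 1 = 3 − 1 = 2
d.f.D = N − k = 15 − 3 = 12
The critical value is 3.89 (from the F distribution table).
3. Compute the test value.
(a) The mean and variance of each sample are shown in the table.
(b) Find the grand mean.
$$\bar{x}_{GM} = \frac{\sum x}{N} = \frac{116}{15} = 7.73$$
(c) Find the between-group variance.
$$s_B^2 = \frac{\sum n_i(\bar{x}_i - \bar{x}_{GM})^2}{k - 1} = \frac{5(11.8 - 7.73)^2 + 5(3.8 - 7.73)^2 + 5(7.6 - 7.73)^2}{3 - 1} = \frac{160.13}{2} = 80.07$$
(d) Find the within-group variance.
$$s_W^2 = \frac{\sum (n_i - 1)s_i^2}{\sum (n_i - 1)} = \frac{(5 - 1)(5.7) + (5 - 1)(10.2) + (5 - 1)(10.3)}{(5 - 1) + (5 - 1) + (5 - 1)} = \frac{104.8}{12} = 8.73$$
(e) Find the F test value.
$$F = \frac{s_B^2}{s_W^2} = \frac{80.07}{8.73} = 9.17$$
4. Make the decision: since 9.17 > 3.89, reject the null hypothesis.
5. Summarize the results: there is enough evidence to reject the claim and conclude that at least one mean is different from the others.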
Example
A state employee wishes to see if there is a significant difference in the number of employees at the interchanges of three state toll roads. The data are shown below. At α = 0.05, can it be concluded that there is a significant difference in the average number of employees at each interchange?
                Pennsylvania   Green Bypass/             Beaker Valley
                Turnpike       Mon-Fayette Expressway    Expressway
                7              10                        1
                14             1                         12
                32             1                         1
                19             0                         9
                10             11                        1
                11             1                         11
Mean (x̄)        15.5           4.0                       5.8
Variance (s²)   81.9           25.6                      29.0

Solution
1. State the hypotheses.
H0: μ1 = μ2 = μ3
H1: At least one mean is different from the others.
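2. Find the critical value.
k = 3 and N = 18
d.f.N = k − 1 = 3 − 1 = 2
d.f.D = N − k = 18 − 3 = 15
The critical value is 3.68 (from the F distribution table).
3. Compute the test value.
(a) The mean and variance of each sample are shown in the table.
(b) Find the grand mean.
$$\bar{x}_{GM} = \frac{\sum x}{N} = \frac{152}{18} = 8.4$$
(c) Find the between-group variance.
$$s_B^2 = \frac{\sum n_i(\bar{x}_i - \bar{x}_{GM})^2}{k - 1} = \frac{6(15.5 - 8.4)^2 + 6(4.0 - 8.4)^2 + 6(5.8 - 8.4)^2}{3 - 1} = \frac{459.18}{2} = 229.59$$
(d) Find the within-group variance.
$$s_W^2 = \frac{\sum (n_i - 1)s_i^2}{\sum (n_i - 1)} = \frac{(6 - 1)(81.9) + (6 - 1)(25.6) + (6 - 1)(29.0)}{(6 - 1) + (6 - 1) + (6 - 1)} = \frac{682.5}{15} = 45.5$$
(e) Find the F test value.
$$F = \frac{s_B^2}{s_W^2} = \frac{229.59}{45.5} = 5.05$$
4. Make the decision: since 5.05 > 3.68, reject the null hypothesis.
5. Summarize the results: there is enough evidence to conclude that there is a difference among the means.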

TUTORIAL 6: HYPOTHESIS TESTING


1. Explain the difference between a one-tailed and a two-tailed test.
2. State the null and alternative hypotheses for each conjecture.
(a) The average age of taxi drivers in Lilongwe city is 36.6 years.
(b) The average pulse rate of a female forest assistant is greater than 72 beats per minute.
(c) The average bowling score of people who enrolled in a basic bowling class is less than 100.
3. List the five steps a researcher has to follow when testing a hypothesis.
4. Experience with a steel-belted radial tire produced by a rubber tire manufacturer indicates that, on average, a tire travels 50,000 km before it needs to be replaced. In an effort to increase the number of kilometers still further, the tire was redesigned, and other changes were made. Two hundred tires were tested using accelerated-life testing machines. It was found that the average number of kilometers traveled was 43,000 km. Using the 0.05 level of significance, ascertain whether there has been a significant increase in the mean number of km.
(a) State the null hypothesis and the alternative hypothesis.
(b) Is the test being used a one-tailed or a two-tailed test? Explain.
(c) Arrive at your decision. Explain the rationale underlying your decision.
5. A survey claims that the average cost of a hotel room in Salima is K69.21. To test the claim, a researcher selects a sample of 30 hotel rooms and finds that the average cost is K68.43. The standard deviation of the population is K3.72. At α = 0.05, is there enough evidence to reject the claim?
6. The lengths (in feet) of a random sample of suspension bridges in the United States, Europe and Asia are shown. At α = 0.05, is there sufficient evidence to conclude that there is a difference in mean lengths?
United States   Europe   Asia
4260            5238     6529
3500            4626     4543
2300            4347     3668
2000            3300     3379
1850                     2874
7. The numbers (in thousands) of farms per state found in three regions of Malawi are listed below. Test the claim at α = 0.05 that the mean number of farms is the same across these regions.
Northern Malawi   Central Malawi   Southern Malawi
48                95               29
57                52               40
24                64               40
10                64               68
38

TOPIC 7: TEST OF PROPORTIONS (CHI SQUARE TESTING)
OBJECTIVES
By the end of this topic, you should be able to;
(a) Use chi-squared tables
(b) Calculate the chi-squared statistic of a sample.
(c) State five steps in testing hypothesis using chi-squared test.
(d) Test proportions for homogeneity using chi-square.
This section explains how to conduct a chi-square test of homogeneity. The test is applied to a single categorical variable from two or more different populations. It is used to determine whether frequency counts are distributed identically across the different populations.
The hypotheses in this case would be
H0: p1 = p2 = p3 = ... = pn
H1: At least one proportion is different from the others.
Steps to be followed when using the chi-square test are as follows:
1. Formulate the hypotheses.
2. Construct a contingency table.
It is made up of R rows and C columns. It should be noted that row and column headings do not count in determining the number of rows and columns.
3. Determine the appropriate number of degrees of freedom (d.f.).
The degrees of freedom of any contingency table are (rows − 1) times (columns − 1); that is, d.f. = (R − 1)(C − 1).
4. Compute the test value. To compute the test value, first find the expected values. For each cell of the contingency table, use the formula
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$
To find the test value, use the formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where χ² = chi-square
O = observed frequency/value
E = expected frequency/value
5. Make the decision.
6. Summarize the results.
ASSUMPTIONS FOR THE CHI-SQUARE HOMOGENEITY TESTS
a. The data are obtained from a random sample
b. The expected value in each cell must be 5 or more.
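The expected values, the test value and the degrees of freedom described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `chi_square_homogeneity` is an assumption, and the counts are those of the high-school example that follows. The critical value is still read from the chi-square table.

```python
def chi_square_homogeneity(observed):
    """observed: contingency table as a list of rows of counts (headings excluded).

    Returns (chi-square test value, degrees of freedom, table of expected values).
    """
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    # E = (row sum)(column sum) / grand total, for every cell
    expected = [[r * c / grand_total for c in col_sums] for r in row_sums]
    # chi-square = sum over all cells of (O - E)^2 / E
    chi_square = sum((o - e) ** 2 / e
                     for obs_row, exp_row in zip(observed, expected)
                     for o, e in zip(obs_row, exp_row))
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return chi_square, df, expected


observed = [[18, 22, 16],   # Yes
            [32, 28, 34]]   # No
chi_square, df, expected = chi_square_homogeneity(observed)
# Assumption (b): every expected value must be 5 or more
all_cells_ok = all(e >= 5 for row in expected for e in row)
print(round(chi_square, 3), df, all_cells_ok)  # 1.596 2 True
```

Returning the table of expected values alongside the test value makes it easy to check assumption (b) before the test value is interpreted.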
Example
A researcher selected a sample of 50 seniors from each of three area high schools (150 seniors in total) and asked each senior, "Do you drive to school in a car owned by either you or your parents?" The data are shown in the table. At α = 0.05, test the claim that the proportion of students who drive their own or their parents' cars is the same at all three schools.
        School 1   School 2   School 3   Total
Yes     18         22         16         56
No      32         28         34         94
Total   50         50         50         150
Solution
Step 1 State the hypotheses.
H0: p1 = p2 = p3
H1: At least one proportion is different from the others.
Step 2 Construct a contingency table. It is given in the question.
Step 3 Degrees of freedom = (2 − 1)(3 − 1) = 2
The critical value is 5.991.
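Step 4 Compute the test value.
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$
$$E_{1,1} = E_{1,2} = E_{1,3} = \frac{(56)(50)}{150} = 18.67, \qquad E_{2,1} = E_{2,2} = E_{2,3} = \frac{(94)(50)}{150} = 31.33$$
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(18 - 18.67)^2}{18.67} + \frac{(22 - 18.67)^2}{18.67} + \frac{(16 - 18.67)^2}{18.67} + \frac{(32 - 31.33)^2}{31.33} + \frac{(28 - 31.33)^2}{31.33} + \frac{(34 - 31.33)^2}{31.33} = 1.596$$
Step 5 Make the decision.
The decision is not to reject the null hypothesis (1.596 < 5.991).
Step 6 Summarize the results.
There is not enough evidence to reject the claim that the proportions are equal.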
Example
A children's playground equipment manufacturer read in a survey that 55% of all playground injuries in Malawi occur on the monkey bars. The manufacturer wishes to investigate playground injuries in four different parts of the country to determine if the proportions of accidents on the monkey bars are equal. The results are shown here. At α = 0.05, test the claim that the proportions are equal.
Accidents            North   South   East   West
On monkey bars       15      18      13     16
Not on monkey bars   15      12      17     14
Total                30      30      30     30
Solution
Step 1 State the hypotheses.
H0: p1 = p2 = p3 = p4
H1: At least one proportion is different from the others.
Step 2 Construct the contingency table.
Accidents            North   South   East   West   Total
On monkey bars       15      18      13     16     62
Not on monkey bars   15      12      17     14     58
Total                30      30      30     30     120
Step 3 Degrees of freedom
d.f. = (R − 1)(C − 1) = (2 − 1)(4 − 1) = 3
Therefore, the critical value is 7.815.
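Step 4 Compute the test value.
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$
$$E_{1,1} = E_{1,2} = E_{1,3} = E_{1,4} = \frac{(62)(30)}{120} = 15.5, \qquad E_{2,1} = E_{2,2} = E_{2,3} = E_{2,4} = \frac{(58)(30)}{120} = 14.5$$
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(15 - 15.5)^2}{15.5} + \frac{(18 - 15.5)^2}{15.5} + \frac{(13 - 15.5)^2}{15.5} + \frac{(16 - 15.5)^2}{15.5} + \frac{(15 - 14.5)^2}{14.5} + \frac{(12 - 14.5)^2}{14.5} + \frac{(17 - 14.5)^2}{14.5} + \frac{(14 - 14.5)^2}{14.5} = 1.735$$
Step 5 Make the decision.
The decision is not to reject the null hypothesis (1.735 < 7.815).
Step 6 Summarize the results.
There is not enough evidence to reject the claim that the proportions are equal.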

TUTORIAL 7: TEST OF PROPORTIONS (CHI_SQUARE TESTING)


1. How are the null and alternative hypotheses stated for the test of homogeneity of proportions?
2. How are the expected values computed for each cell in the contingency table?
3. Calculate the value of χ² and give the number of degrees of freedom for these contingency tables:
(a)
        Columns
Rows    1     2     3
1       35    16    84
2       120   92    206
(b) Columns Rows
1 2
3
1
37 34 93
2
1
2

31
16

131
216

66 57 113
Category
Population
1 2
3 4
40 17 3
35 22 8
4. Test the hypothesis that the proportions are the same for all three age groups.
            Age groups
            25 and under   Over 25 and under 50   50 and over   Total
Claim       40             35                     60            135
No claim    60             65                     40            165
Total       100            100                    100           300
5. Test the hypothesis that the proportions of individuals in categories 1, 2, 3 and 4 are the same in populations 1 and 2.
Category
Population 1
2
3
4
40 17
3
35 22
8

TOPIC 8: LINEAR REGRESSION AND CORRELATION ANALYSIS


OBJECTIVES
By the end of this topic, you should be able to;
(a) Draw a scatter plot or diagram for a set of ordered pairs.
(b) Define the word correlation.
(c) State types of correlation.
(d) Compute correlation coefficient.
(e) Compute the equation of the regression line.
(f) Describe how correlation analysis is done.
Thus far, we have discussed statistical problems in which only one variable has been
measured. This topic deals with problems in which two variables are measured. Attempts can
then be made to discover whether or not there appears to be some form of connection
between the two variables.

SCATTER DIAGRAMS
Consider the following set of pairs of values:
x   1   2     3      4     5     6
y   2   3.5   3.75   4.0   4.5   5.5
These (x, y) pairs of values form an example of a bivariate distribution. When they are plotted on graph paper as shown below, the result is called a scatter diagram, scattergram, or scatter plot.

A scatter diagram is a visual way to describe the nature of the relationship between the independent and dependent variables.

Example
The marks of ten candidates in each of two examinations are given below.
Examination 1   8    10   18   23   29   32   35   38   42   48
Examination 2   10   12   20   20   25   30   29   31   36   35
Plot this information on a scatter diagram.
Solution

CORRELATION
Correlation is a statistical method used to determine whether a relationship between variables exists. There are two types of correlation, namely (a) linear correlation and (b) non-linear correlation.
LINEAR CORRELATION
The correlation is said to be linear when the relationship between the two variables is linear. In other words, a straight line can represent the trend of the points. There are two types of linear correlation, namely positive and negative correlation.
POSITIVE LINEAR CORRELATION
The points have the appearance of clustering about a line that slopes up to the right.

NEGATIVE LINEAR CORRELATION

The points have the appearance of clustering about a line that slopes down to the right.

NON-LINEAR CORRELATION
Here a straight line cannot represent the points.

REGRESSION

Once a scatter diagram has been produced, the next problem is to determine the line to which the points approximate. In order to determine the relationship between x and y, we need to know what straight line to draw through the collection of points on the scatter diagram. It will not go through all the points, but it will lie somewhat in the midst of the collection of points and slope in the direction suggested by the points. Such a line is called a regression line or line of best fit.
There are two methods for drawing a regression line: a graphical method and a mathematical method.
(a) GRAPHICAL METHOD
It can be proved that the regression line must pass through $(\bar{x}, \bar{y})$, i.e. the point whose coordinates are the means of the x and y values. This point is called the mean centre.
STEPS TO BE FOLLOWED WHEN USING GRAPHICAL METHOD
1. Calculate the means $\bar{x}$ and $\bar{y}$ of the two variables.
2. Plot the point corresponding to this pair of values on the scatter diagram.
3. Using a ruler, draw a straight line through the mean centre, lying as evenly as you can judge among the other points on the diagram.
Example
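The following table gives the marks of eight students in each of two examinations, in Mathematics and Statistics. Construct a scatter diagram and draw a line of best fit.
Mathematics   39   61   49   64   42   72   52   57
Statistics    44   62   54   70   46   76   60   64
Solution
$$\bar{x} = \frac{\sum x}{8} = \frac{436}{8} = 54.5, \qquad \bar{y} = \frac{\sum y}{8} = \frac{476}{8} = 59.5$$
The mean centre is (54.5, 59.5). The line of best fit is drawn through this point on the scatter diagram.
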
(b) MATHEMATICAL METHOD
The equation of a straight line is usually given as y = mx + c, where m is the gradient and c is the y-intercept. In statistics, this equation can also be written as y = a + bx, where a is the y-intercept and b is the slope (gradient); it is called the equation of the regression line.
If the equation of a regression line is y = mx + c, then
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad c = \bar{y} - m\bar{x} \quad \text{or} \quad c = \frac{\sum y - m\sum x}{n}$$
where n is the number of pairs of readings. m and c are called regression coefficients.
The main use of a regression line is to calculate values of the dependent variable not observed in the data set.
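As a rough illustration of the mathematical method, the sketch below computes m and c from the sums in the formulas above. The function name `regression_line` is an assumption made for this sketch, and the data are the Mathematics and Statistics marks of the eight students used in the graphical-method example.

```python
def regression_line(xs, ys):
    """Return the regression coefficients (m, c) of the line y = m*x + c."""
    n = len(xs)                                   # number of pairs of readings
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y - m * sum_x) / n                   # same as y-bar minus m times x-bar
    return m, c


mathematics = [39, 61, 49, 64, 42, 72, 52, 57]
statistics_marks = [44, 62, 54, 70, 46, 76, 60, 64]
m, c = regression_line(mathematics, statistics_marks)

# The fitted line passes through the mean centre (x-bar, y-bar)
x_bar = sum(mathematics) / len(mathematics)
y_bar = sum(statistics_marks) / len(statistics_marks)
print(round(m, 3), round(c, 2), round(m * x_bar + c, 2) == round(y_bar, 2))
```

A value of the dependent variable that was not observed is then estimated by substituting the chosen x into y = mx + c.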
Example
The following table shows the examination marks in Mathematics and Physics for 10 students.
Student       A    B    C    D    E    F    G    H    I    J
Mathematics   21   28   39   49   55   64   68   79   86   94
Physics       33   37   43   39   51   51   53   46   59   58
1. Plot these results on a graph.
2. Find a regression line.
3. Draw the regression line.
4. Estimate the Physics mark corresponding to a Mathematics mark of 50.
Solutions
$$\bar{x}_{\text{maths}} = \frac{\sum x}{n} = \frac{578}{10} = 57.8, \qquad \bar{y}_{\text{physics}} = \frac{\sum y}{n} = \frac{470}{10} = 47$$
Therefore, the mean centre is (57.8, 47).
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = 0.326$$
THE CORRELATION COEFFICIENT

To measure the strength, or intensity, of the correlation in a particular case, we calculate a linear correlation coefficient, which we indicate by the small letter r. The formula for the linear correlation coefficient r is
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
where n = number of data pairs.

STEPS TO BE FOLLOWED WHEN CALCULATING CORRELATION COEFFICIENT
1. Make a table as shown below.
x     y     x²     y²     xy

2. Find the values of x², y² and xy and place these values in the corresponding columns of the table.
3. Substitute in the formula and solve for r.
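The three steps can also be sketched in Python. The sketch below is illustrative only: the function name `correlation_coefficient` is an assumption, and the six (x, y) pairs are those of the first worked example that follows.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Linear correlation coefficient r, computed from the column totals."""
    n = len(xs)                                   # number of data pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)               # total of the x^2 column
    sum_y2 = sum(y * y for y in ys)               # total of the y^2 column
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the xy column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [10, 14, 7, 12, 5, 6]
y = [5, 3, 5, 2, 7, 8]
r = correlation_coefficient(x, y)
print(round(r, 2))  # -0.88
```

Working from the column totals mirrors the table used in the hand calculation, so each intermediate sum can be checked against the corresponding column.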
Example
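Suppose we are given the following pairs of x and y values:
x   10   14   7   12   5   6
y   5    3    5   2    7   8
Calculate r.
Solution
x         y         x²          y²          xy
10        5         100         25          50
14        3         196         9           42
7         5         49          25          35
12        2         144         4           24
5         7         25          49          35
6         8         36          64          48
Σx = 54   Σy = 30   Σx² = 550   Σy² = 176   Σxy = 234

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{6(234) - (54)(30)}{\sqrt{\left[6(550) - 54^2\right]\left[6(176) - 30^2\right]}}$$
$$r = \frac{1404 - 1620}{\sqrt{(3300 - 2916)(1056 - 900)}} = \frac{-216}{\sqrt{59904}} = \frac{-216}{244.75}$$
r = −0.88 to 2 d.p.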

Example
Compute r for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects. The data are shown in the table below.
Subject      A     B     C     D     E     F
Age x        43    48    56    61    67    70
Pressure y   128   120   135   143   141   152

Solution
x          y          x²            y²             xy
43         128        1849          16384          5504
48         120        2304          14400          5760
56         135        3136          18225          7560
61         143        3721          20449          8723
67         141        4489          19881          9447
70         152        4900          23104          10640
Σx = 345   Σy = 819   Σx² = 20399   Σy² = 112443   Σxy = 47634

Here n = 6 and
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{6(47634) - (345)(819)}{\sqrt{\left[6(20399) - 345^2\right]\left[6(112443) - 819^2\right]}}$$
r = 0.897
CHARACTERISTICS OF CORRELATION COEFFICIENT
a. The correlation coefficient is always between -1 and +1 inclusive
b. A correlation coefficient of -1.0 occurs when there is perfect negative correlation
i.e. all the points lie exactly on a straight line sloping down from left to right.
c. A correlation of 0 occurs when there is no correlation.
d. A correlation of 1.0 occurs when there is a perfect positive correlation i.e. all the points
lie exactly on a straight line sloping upwards from left to right.
e. A correlation of between 0 and +1 or 0 and -1.0 indicates that the variables are partially
correlated.
CORRELATION ANALYSIS
Correlation analysis is a method used to decide whether there is a statistically significant relationship between two variables.
In correlation analysis, you perform the following steps:
1. Draw the scatter plot for the variables.
2. Compute the value of the correlation coefficient.
3. State the hypotheses.
The hypotheses will be
H0: ρ = 0, which means that there is no correlation between the variables in the population.
H1: ρ ≠ 0, which means that there is a significant correlation between the variables in the population.
Note: ρ is called the population correlation coefficient.
4. Test the significance of the correlation coefficient at the given level of significance.
NOTES: (a) The t test for the correlation coefficient is
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with degrees of freedom equal to n − 2.
(b) The two-tailed critical values are used. These values are found in the t distribution tables.
5. Give a brief explanation of the type of relationship.
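Step 4 can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the function name `t_for_correlation` is hypothetical, r = −0.832 and n = 6 are taken from the age and exercise example that follows, and the two-tailed critical value 2.776 for 4 degrees of freedom at α = 0.05 is a value read from a t distribution table (the worked example rounds it to 2.78).

```python
from math import sqrt


def t_for_correlation(r, n):
    """t test value for the correlation coefficient, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


r = -0.832              # sample correlation coefficient
n = 6                   # number of data pairs
critical_value = 2.776  # two-tailed, alpha = 0.05, d.f. = n - 2 = 4 (from a t table)

t = t_for_correlation(r, n)
reject_h0 = abs(t) > critical_value   # two-tailed test, so compare |t|
print(round(t, 3), reject_h0)  # -2.999 True
```

Because the test is two-tailed, the absolute value of t is compared with the critical value; the sign of t only shows the direction of the relationship.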


Example
A researcher wishes to determine if a person's age is related to the number of hours he or she exercises per week. The data for the sample are shown here.
Age x     18   26   32   38   52    59
Hours y   10   5    2    3    1.5   1
(a) Draw the scatter diagram for the variables.
(b) Compute the value of the correlation coefficient.
(c) State the hypotheses.
(d) Test the significance of the correlation coefficient at α = 0.05.
(e) Give a brief explanation of the type of relationship.
Solution
(a) [Figure: scatter diagram of age x against hours of exercise y]

(b)
x          y           x²           y²            xy
18         10          324          100           180
26         5           676          25            130
32         2           1024         4             64
38         3           1444         9             114
52         1.5         2704         2.25          78
59         1           3481         1             59
Σx = 225   Σy = 22.5   Σx² = 9653   Σy² = 141.25  Σxy = 625

n = 6
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{6(625) - (225)(22.5)}{\sqrt{\left[6(9653) - 225^2\right]\left[6(141.25) - 22.5^2\right]}} = \frac{-1312.5}{\sqrt{2488736.25}} = -0.832$$
(c) Stating the hypotheses
H0: ρ = 0 and H1: ρ ≠ 0
(d)
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = -0.832\sqrt{\frac{6 - 2}{1 - (-0.832)^2}} = -2.999$$
Since there are 6 − 2 = 4 degrees of freedom, the critical values are ±2.78. Because |t| = 2.999 > 2.78, the null hypothesis is rejected.
(e) There is a significant negative relationship between a person's age and the number of hours he or she exercises per week.


TUTORIAL 8: SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS


1. The following table gives the marks of eight students in each of two examinations, in Mathematics and Statistics. Construct a scatter diagram.
Mathematics   39   61   49   64   42   72   52   57
Statistics    44   62   54   70   46   76   60   64
2. The I.Q. scores and the corresponding scores in an arithmetic test of ten boys in a class are shown in the following table:
I.Q.              96   98   104   110   117   123   126   128   130   132
Arithmetic test   27   20   50    38    57    68    54    63    78    89
(a) Plot these data in a scatter diagram.
(b) Calculate the mean I.Q. and the mean Arithmetic test score.
(c) Plot the mean centre in the scatter diagram and call it M.
(d) Draw the line of best fit passing through M.
(e) Calculate the equation of the line of best fit.
(f) Hence estimate the likely Arithmetic score of
(i) a boy with an I.Q. of 112.
(ii) a boy with an I.Q. of 140.
3. The number of calories and the number of milligrams of cholesterol for a random sample of fast-food chicken sandwiches from seven restaurants are shown here.
Calories x      390   535   720   300   430   500   440
Cholesterol y   43    45    80    50    55    52    60
(a) Draw the scatter plot for the variables.
(b) Compute the value of the correlation coefficient.
(c) State the hypotheses.
(d) Test the significance of the correlation coefficient at α = 0.05.
(e) Give a brief explanation of the type of relationship.


THE STANDARD NORMAL DISTRIBUTION

Area between 0 and z

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990


F DISTRIBUTION
F table for alpha = 0.05 (df1 = d.f.N across the top, df2 = d.f.D down the left column)

df2\df1  1        2        3        4        5        6        7        8        9        10       12       15       20       24       30       40       60       120      INF
1        161.4476 199.5000 215.7073 224.5832 230.1619 233.9860 236.7684 238.8827 240.5433 241.8817 243.9060 245.9499 248.0131 249.0518 250.0951 251.1432 252.1957 253.2529 254.3144
2        18.5128  19.0000  19.1643  19.2468  19.2964  19.3295  19.3532  19.3710  19.3848  19.3959  19.4125  19.4291  19.4458  19.4541  19.4624  19.4707  19.4791  19.4874  19.4957
3        10.1280  9.5521   9.2766   9.1172   9.0135   8.9406   8.8867   8.8452   8.8123   8.7855   8.7446   8.7029   8.6602   8.6385   8.6166   8.5944   8.5720   8.5494   8.5264
4        7.7086   6.9443   6.5914   6.3882   6.2561   6.1631   6.0942   6.0410   5.9988   5.9644   5.9117   5.8578   5.8025   5.7744   5.7459   5.7170   5.6877   5.6581   5.6281
5        6.6079   5.7861   5.4095   5.1922   5.0503   4.9503   4.8759   4.8183   4.7725   4.7351   4.6777   4.6188   4.5581   4.5272   4.4957   4.4638   4.4314   4.3985   4.3650
6        5.9874   5.1433   4.7571   4.5337   4.3874   4.2839   4.2067   4.1468   4.0990   4.0600   3.9999   3.9381   3.8742   3.8415   3.8082   3.7743   3.7398   3.7047   3.6689
7        5.5914   4.7374   4.3468   4.1203   3.9715   3.8660   3.7870   3.7257   3.6767   3.6365   3.5747   3.5107   3.4445   3.4105   3.3758   3.3404   3.3043   3.2674   3.2298
8        5.3177   4.4590   4.0662   3.8379   3.6875   3.5806   3.5005   3.4381   3.3881   3.3472   3.2839   3.2184   3.1503   3.1152   3.0794   3.0428   3.0053   2.9669   2.9276
9        5.1174   4.2565   3.8625   3.6331   3.4817   3.3738   3.2927   3.2296   3.1789   3.1373   3.0729   3.0061   2.9365   2.9005   2.8637   2.8259   2.7872   2.7475   2.7067
10       4.9646   4.1028   3.7083   3.4780   3.3258   3.2172   3.1355   3.0717   3.0204   2.9782   2.9130   2.8450   2.7740   2.7372   2.6996   2.6609   2.6211   2.5801   2.5379
11       4.8443   3.9823   3.5874   3.3567   3.2039   3.0946   3.0123   2.9480   2.8962   2.8536   2.7876   2.7186   2.6464   2.6090   2.5705   2.5309   2.4901   2.4480   2.4045
12       4.7472   3.8853   3.4903   3.2592   3.1059   2.9961   2.9134   2.8486   2.7964   2.7534   2.6866   2.6169   2.5436   2.5055   2.4663   2.4259   2.3842   2.3410   2.2962
13       4.6672   3.8056   3.4105   3.1791   3.0254   2.9153   2.8321   2.7669   2.7144   2.6710   2.6037   2.5331   2.4589   2.4202   2.3803   2.3392   2.2966   2.2524   2.2064
14       4.6001   3.7389   3.3439   3.1122   2.9582   2.8477   2.7642   2.6987   2.6458   2.6022   2.5342   2.4630   2.3879   2.3487   2.3082   2.2664   2.2229   2.1778   2.1307
15       4.5431   3.6823   3.2874   3.0556   2.9013   2.7905   2.7066   2.6408   2.5876   2.5437   2.4753   2.4034   2.3275   2.2878   2.2468   2.2043   2.1601   2.1141   2.0658
16       4.4940   3.6337   3.2389   3.0069   2.8524   2.7413   2.6572   2.5911   2.5377   2.4935   2.4247   2.3522   2.2756   2.2354   2.1938   2.1507   2.1058   2.0589   2.0096
17       4.4513   3.5915   3.1968   2.9647   2.8100   2.6987   2.6143   2.5480   2.4943   2.4499   2.3807   2.3077   2.2304   2.1898   2.1477   2.1040   2.0584   2.0107   1.9604
18       4.4139   3.5546   3.1599   2.9277   2.7729   2.6613   2.5767   2.5102   2.4563   2.4117   2.3421   2.2686   2.1906   2.1497   2.1071   2.0629   2.0166   1.9681   1.9168
19       4.3807   3.5219   3.1274   2.8951   2.7401   2.6283   2.5435   2.4768   2.4227   2.3779   2.3080   2.2341   2.1555   2.1141   2.0712   2.0264   1.9795   1.9302   1.8780
20       4.3512   3.4928   3.0984   2.8661   2.7109   2.5990   2.5140   2.4471   2.3928   2.3479   2.2776   2.2033   2.1242   2.0825   2.0391   1.9938   1.9464   1.8963   1.8432
21       4.3248   3.4668   3.0725   2.8401   2.6848   2.5727   2.4876   2.4205   2.3660   2.3210   2.2504   2.1757   2.0960   2.0540   2.0102   1.9645   1.9165   1.8657   1.8117
22       4.3009   3.4434   3.0491   2.8167   2.6613   2.5491   2.4638   2.3965   2.3419   2.2967   2.2258   2.1508   2.0707   2.0283   1.9842   1.9380   1.8894   1.8380   1.7831
23       4.2793   3.4221   3.0280   2.7955   2.6400   2.5277   2.4422   2.3748   2.3201   2.2747   2.2036   2.1282   2.0476   2.0050   1.9605   1.9139   1.8648   1.8128   1.7570
24       4.2597   3.4028   3.0088   2.7763   2.6207   2.5082   2.4226   2.3551   2.3002   2.2547   2.1834   2.1077   2.0267   1.9838   1.9390   1.8920   1.8424   1.7896   1.7330
25       4.2417   3.3852   2.9912   2.7587   2.6030   2.4904   2.4047   2.3371   2.2821   2.2365   2.1649   2.0889   2.0075   1.9643   1.9192   1.8718   1.8217   1.7684   1.7110
26       4.2252   3.3690   2.9752   2.7426   2.5868   2.4741   2.3883   2.3205   2.2655   2.2197   2.1479   2.0716   1.9898   1.9464   1.9010   1.8533   1.8027   1.7488   1.6906
27       4.2100   3.3541   2.9604   2.7278   2.5719   2.4591   2.3732   2.3053   2.2501   2.2043   2.1323   2.0558   1.9736   1.9299   1.8842   1.8361   1.7851   1.7306   1.6717
28       4.1960   3.3404   2.9467   2.7141   2.5581   2.4453   2.3593   2.2913   2.2360   2.1900   2.1179   2.0411   1.9586   1.9147   1.8687   1.8203   1.7689   1.7138   1.6541
29       4.1830   3.3277   2.9340   2.7014   2.5454   2.4324   2.3463   2.2783   2.2229   2.1768   2.1045   2.0275   1.9446   1.9005   1.8543   1.8055   1.7537   1.6981   1.6376
30       4.1709   3.3158   2.9223   2.6896   2.5336   2.4205   2.3343   2.2662   2.2107   2.1646   2.0921   2.0148   1.9317   1.8874   1.8409   1.7918   1.7396   1.6835   1.6223
40       4.0847   3.2317   2.8387   2.6060   2.4495   2.3359   2.2490   2.1802   2.1240   2.0772   2.0035   1.9245   1.8389   1.7929   1.7444   1.6928   1.6373   1.5766   1.5089
60       4.0012   3.1504   2.7581   2.5252   2.3683   2.2541   2.1665   2.0970   2.0401   1.9926   1.9174   1.8364   1.7480   1.7001   1.6491   1.5943   1.5343   1.4673   1.3893
120      3.9201   3.0718   2.6802   2.4472   2.2899   2.1750   2.0868   2.0164   1.9588   1.9105   1.8337   1.7505   1.6587   1.6084   1.5543   1.4952   1.4290   1.3519   1.2539
INF      3.8415   2.9957   2.6049   2.3719   2.2141   2.0986   2.0096   1.9384   1.8799   1.8307   1.7522   1.6664   1.5705   1.5173   1.4591   1.3940   1.3180   1.2214   1.0000

THE CHI-SQUARE DISTRIBUTION
Right tail areas for the chi-square distribution

df\area  .995      .990      .975      .950      .900      .750      .500      .250      .100      .050      .025      .010      .005
1        0.00004   0.00016   0.00098   0.00393   0.01579   0.10153   0.45494   1.32330   2.70554   3.84146   5.02389   6.63490   7.87944
2        0.01003   0.02010   0.05064   0.10259   0.21072   0.57536   1.38629   2.77259   4.60517   5.99146   7.37776   9.21034   10.59663
3        0.07172   0.11483   0.21580   0.35185   0.58437   1.21253   2.36597   4.10834   6.25139   7.81473   9.34840   11.34487  12.83816
4        0.20699   0.29711   0.48442   0.71072   1.06362   1.92256   3.35669   5.38527   7.77944   9.48773   11.14329  13.27670  14.86026
5        0.41174   0.55430   0.83121   1.14548   1.61031   2.67460   4.35146   6.62568   9.23636   11.07050  12.83250  15.08627  16.74960
6        0.67573   0.87209   1.23734   1.63538   2.20413   3.45460   5.34812   7.84080   10.64464  12.59159  14.44938  16.81189  18.54758
7        0.98926   1.23904   1.68987   2.16735   2.83311   4.25485   6.34581   9.03715   12.01704  14.06714  16.01276  18.47531  20.27774
8        1.34441   1.64650   2.17973   2.73264   3.48954   5.07064   7.34412   10.21885  13.36157  15.50731  17.53455  20.09024  21.95495
9        1.73493   2.08790   2.70039   3.32511   4.16816   5.89883   8.34283   11.38875  14.68366  16.91898  19.02277  21.66599  23.58935
10       2.15586   2.55821   3.24697   3.94030   4.86518   6.73720   9.34182   12.54886  15.98718  18.30704  20.48318  23.20925  25.18818
11       2.60322   3.05348   3.81575   4.57481   5.57778   7.58414   10.34100  13.70069  17.27501  19.67514  21.92005  24.72497  26.75685
12       3.07382   3.57057   4.40379   5.22603   6.30380   8.43842   11.34032  14.84540  18.54935  21.02607  23.33666  26.21697  28.29952
13       3.56503   4.10692   5.00875   5.89186   7.04150   9.29907   12.33976  15.98391  19.81193  22.36203  24.73560  27.68825  29.81947
14       4.07467   4.66043   5.62873   6.57063   7.78953   10.16531  13.33927  17.11693  21.06414  23.68479  26.11895  29.14124  31.31935
15       4.60092   5.22935   6.26214   7.26094   8.54676   11.03654  14.33886  18.24509  22.30713  24.99579  27.48839  30.57791  32.80132
16       5.14221   5.81221   6.90766   7.96165   9.31224   11.91222  15.33850  19.36886  23.54183  26.29623  28.84535  31.99993  34.26719
17       5.69722   6.40776   7.56419   8.67176   10.08519  12.79193  16.33818  20.48868  24.76904  27.58711  30.19101  33.40866  35.71847
18       6.26480   7.01491   8.23075   9.39046   10.86494  13.67529  17.33790  21.60489  25.98942  28.86930  31.52638  34.80531  37.15645
19       6.84397   7.63273   8.90652   10.11701  11.65091  14.56200  18.33765  22.71781  27.20357  30.14353  32.85233  36.19087  38.58226
20       7.43384   8.26040   9.59078   10.85081  12.44261  15.45177  19.33743  23.82769  28.41198  31.41043  34.16961  37.56623  39.99685
21       8.03365   8.89720   10.28290  11.59131  13.23960  16.34438  20.33723  24.93478  29.61509  32.67057  35.47888  38.93217  41.40106
22       8.64272   9.54249   10.98232  12.33801  14.04149  17.23962  21.33704  26.03927  30.81328  33.92444  36.78071  40.28936  42.79565
23       9.26042   10.19572  11.68855  13.09051  14.84796  18.13730  22.33688  27.14134  32.00690  35.17246  38.07563  41.63840  44.18128
24       9.88623   10.85636  12.40115  13.84843  15.65868  19.03725  23.33673  28.24115  33.19624  36.41503  39.36408  42.97982  45.55851
25       10.51965  11.52398  13.11972  14.61141  16.47341  19.93934  24.33659  29.33885  34.38159  37.65248  40.64647  44.31410  46.92789
26       11.16024  12.19815  13.84390  15.37916  17.29188  20.84343  25.33646  30.43457  35.56317  38.88514  41.92317  45.64168  48.28988
27       11.80759  12.87850  14.57338  16.15140  18.11390  21.74940  26.33634  31.52841  36.74122  40.11327  43.19451  46.96294  49.64492
28       12.46134  13.56471  15.30786  16.92788  18.93924  22.65716  27.33623  32.62049  37.91592  41.33714  44.46079  48.27824  50.99338
29       13.12115  14.25645  16.04707  17.70837  19.76774  23.56659  28.33613  33.71091  39.08747  42.55697  45.72229  49.58788  52.33562
30       13.78672  14.95346  16.79077  18.49266  20.59923  24.47761  29.33603  34.79974  40.25602  43.77297  46.97924  50.89218  53.67196
The table should include values for p=0.1 so that a one-tailed test can be conducted at the
p=0.05 level, but we never do such tests in my class, so why clutter up the table?


REFERENCES
Bluman, A.G. (2001). Elementary Statistics: A Step by Step Approach. New York: McGraw-Hill.
Clarke, G.M. and Cooke, D. (1991). A Basic Course in Statistics. London: Edward Arnold.
Mendenhall, W., Beaver, R.J. and Beaver, B.M. (2006). Introduction to Probability and Statistics. 12th Edition. Belmont: Thomson Brooks/Cole.
Saiti, F.G. (2003). Mathematics, Module 9: Statistics 11. Domasi College of Education.
