
Let Me Expose Statistics and

Probability
For Intermediate
Part-I
First Edition

By

Zafar Ali
M.Sc. Statistics
PREFACE

With the grace of Almighty Allah, I feel pleasure in bringing out the First Edition of my book for Intermediate (Part-I), "Let Me Expose Statistics and Probability". This book follows the new syllabus framed for all educational institutions in Khyber Pakhtunkhwa.

It is hoped that the book in its present form will be useful for students. Any suggestions for improvement of the book will be welcome.

First Edition: October 2013

Zafar Ali
0333-0314-9004086, 0345-9282215
Course Contents

Chapter # 01 The Basic Concepts of Statistics
Chapter # 02 Collection and Organization of Data
Chapter # 03 Measures of Central Tendency
Chapter # 04 Measures of Dispersion, Moments, Skewness and Kurtosis
Chapter # 05 Index Numbers
Chapter # 06 Set Theory and Basic Probability
Chapter # 07 Random Variables
Chapter # 08 Some Special Probability Distributions

CHAPTER 01
The Basic
Concepts of Statistics

Chapter Contents

You should read this chapter if you need to learn about:

• Information, Observation and Data
• Constant
• Variable and its Types
• Individual, Population and Sample
• Parameter and Statistic
• Sampling
• Sampling with replacement and without replacement
• Frequency and Frequency Distribution
• Origin of Statistics
• Meaning of the word Statistics
• Definition of Statistics
• Descriptive and Inferential Statistics
• Functions of Statistics
• Scope and Importance of Statistics in Different Fields
• Students study Statistics for several reasons
• Summation Notation
• Exercise

Chapter 01 The Basic Concepts of Statistics

Suppose your teacher asks some questions in the classroom:

• What is your name?
• What is your height?
• What is your age?
• What is your favorite color?
• What is your class number/roll number?

The students' replies to these questions are called information. The recording, listing or observing of a single piece of information by the teacher is called an observation, and the collected observations are collectively called data.

Information

“To know about something is known as information.”

Observation

“Any recording of information (numeric or non-numeric) is called an observation.”

Data

“Originally collected observations are collectively called data.”

• Data of selected students' names: Ajmal, Arif, Ali, etc.
• Data of selected students' class numbers: 39, 56, 47, etc.
• Data of selected students' heights: 60”, 65”, 66”, etc.
• Data of selected students' ages: 19, 20, 21, etc.
• Data of selected students' favorite colors: Red, Blue, Green, etc.

Name     Class No.   Height (inches)   Age   Color
Ajmal    39          60                19    Red
Arif     56          65                20    Blue
Ali      47          66                21    Green


Constant

“A fixed quantity is called a constant.”

• π ≈ 3.14
• e ≈ 2.718 (called Euler's number), etc.

A constant is usually denoted by the first letters of the alphabet, e.g. a, b or c.

Variable

“A characteristic that can vary from one person or object to another is called a variable.”

• Height and weight of a person
• Eye color of people
• Number of children in a family, etc.

Variables are usually denoted by the last letters of the alphabet, e.g. X, Y or Z.


Types of Variable

• Qualitative variable
• Quantitative variable, which is either:
  - a discrete variable, or
  - a continuous variable


Qualitative Variable

“A variable is qualitative if it can be expressed non-numerically.” Qualitative variables are also called attributes.

• Color
• Religion
• Gender (female and male)
• Education level
• Grades of students in a class, etc.

Quantitative Variable

“A variable is quantitative if it can be expressed numerically.”

• Age
• Weight
• Height
• Number of children in a family
• Number of deaths in an accident
• Speed
• Income, etc.

A quantitative variable is either a discrete variable or a continuous variable.


Discrete Variable

“A quantitative variable is called a discrete variable if it arises from counting and there can be a certain jump or gap between two possible values of the variable. Further, it is free from the unit of measurement.”

• Family sizes
• No. of pages in a book
• No. of apples in a basket
• No. of deaths in an accident
• No. of housing units in different blocks of a colony
• No. of passengers carried by PIA in the last ten years

If you count to get the value of a variable, it is discrete. If you measure to get the value of the variable, it is continuous. When deciding whether a variable is discrete or continuous, ask yourself if it is counted or measured! Further, a discrete variable takes on values that are usually integers or whole numbers, while a continuous variable takes on values that are real numbers.

Continuous Variable

“A quantitative variable is called a continuous variable if it arises from measuring and there can be an infinite number of values between two possible values of the variable. Further, it has a unit of measurement.”

• Students' heights, ages, weights
• Speed of a car
• Temperature of a place
• Income of a family
• The amount of milk given by a cow
• The lifetime of a TV tube
• Fortnightly petrol prices

The values of variables such as age, height, and weight are usually rounded to the nearest year, inch, or pound. However, these values represent measured data, so they are continuous variables.


Individual

“An element from which information may be collected is called an individual.”

Population

“An aggregate of individuals is called a population.”

A population can be either finite or infinite, depending upon whether it contains a countable or an uncountable number of units.

• All students in a college (finite)
• The population of all licensed motor drivers (finite)
• The population of all houses in a country (finite)
• The population of all points on a line (infinite)
• The population of stars in the sky (infinite), etc.

The term population does not mean only the human population; it refers to a collection of measurements on individuals or objects having some common characteristic. The objects may be concrete (physical) things, like the motor cars of a particular type produced by a company or the wheat produced on a large farm, or they may be abstract (theoretical) things, like the opinions of students about an examination system.

Sample

“A representative part which we select from a population is called a sample.”

• Runs scored by a batsman in tests in the last one year are a sample of his whole career scores.
• A few drops of blood are a sample of the blood contained in the whole body of a person.


Parameter

“Any numerical value (mean, variance, standard deviation, etc.) describing a characteristic of a population is called a parameter.” OR

“A numerical value such as the mean, variance or standard deviation computed from population data is called a parameter.”

Greek letters are used for parameters: μ (mu) for the mean, σ² for the variance and σ for the standard deviation.

Statistic

“Any numerical value (mean, variance, standard deviation, etc.) describing a characteristic of a sample is called a statistic.” OR

“A numerical value such as the mean, variance or standard deviation computed from sample data is called a statistic.”

Roman letters are used for statistics: X̄ for the mean, S² for the variance and S for the standard deviation.

The words Population and Parameter both start with the letter “P”, and the words Sample and Statistic both start with the letter “S”.

A parameter is a fixed value, while a statistic is a variable because it varies from sample to sample. It is also to be noted that a parameter is usually denoted by a Greek letter and a statistic is usually denoted by a Roman letter. For example, the population mean is denoted by μ while the sample mean is denoted by X̄. Similarly, the standard deviation of a population is denoted by σ while the sample standard deviation is denoted by S.

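The difference between a fixed parameter and a statistic that varies from sample to sample can be seen in a small experiment. The following is only a sketch in Python: the ten heights, the sample size of 4 and the seed are made-up values chosen for illustration and do not come from the book.

```python
import random
from statistics import mean

# Hypothetical population of 10 student heights in inches (assumed data).
population = [60, 62, 63, 64, 65, 65, 66, 67, 68, 70]

# Parameter: computed from every unit of the population, so it is fixed.
mu = mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
rng = random.Random(1)  # seed fixed only so that a rerun gives the same samples
sample_means = [mean(rng.sample(population, 4)) for _ in range(5)]

print("population mean (mu):", mu)
print("sample means (X-bar):", sample_means)
```

Each run of the list comprehension draws a fresh sample of 4 heights, so the five sample means generally differ from one another, while the population mean stays the same.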

Sampling

“The process of selecting a sample from a population such that the sample selected has the characteristics of the whole population is called sampling.”

• A teacher judges the performance of his students just by asking a few questions.
• Someone decides the taste of food by tasting a little bit of it.
• In medical science, a few drops of blood are taken and tested to know whether the blood contains some abnormality or not.

Sampling with Replacement


“If the sampling unit selected is returned to the population before drawing the next sampling unit, then sampling is said to be with replacement.”

In sampling with replacement:

• The sampling unit can be selected more than once.
• The population will be considered infinite.
• The number of samples of size n that could be drawn with replacement from a population of size N will be equal to $N^n$.
• The sampling units will be independent.

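The count of samples under sampling with replacement can be checked by listing every possible sample. This is a minimal sketch in Python; the three-unit population and the sample size n = 2 are assumed values for illustration, and the samples are treated as ordered draws.

```python
from itertools import product

population = ["A", "B", "C"]  # hypothetical population, N = 3
n = 2                         # sample size

# A unit goes back after each draw, so the same unit may appear again.
samples_with_replacement = list(product(population, repeat=n))

for sample in samples_with_replacement:
    print(sample)

print("number of samples:", len(samples_with_replacement))
print("N to the power n: ", len(population) ** n)
```

The listing contains samples such as ("A", "A"), which is only possible because the selected unit is returned before the next draw.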

Sampling without Replacement

“If the sampling unit selected is not returned to the population before drawing the next sampling unit, then sampling is said to be without replacement.”

In sampling without replacement:

• The sampling unit can be selected only once.
• The population will be considered finite.
• The number of samples of size n that could be drawn without replacement from a population of size N will be equal to ${}^{N}C_{n}$.
• The sampling units will be dependent.


A finite population from which sampling is done with replacement can theoretically be
considered infinite because any number of samples can be drawn without exhausting
(finishing) the population.
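For sampling without replacement, the possible samples can be listed in the same way. The sketch below is in Python and assumes a made-up population of four units and a sample size of 2; the order of selection is ignored, which matches the combination count NCn.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D"]  # hypothetical population, N = 4
n = 2                              # sample size

# A selected unit is not returned, so no unit repeats within a sample.
samples_without_replacement = list(combinations(population, n))

for sample in samples_without_replacement:
    print(sample)

print("number of samples:", len(samples_without_replacement))
print("NCn:              ", comb(len(population), n))
```

Here `comb` is the standard-library function for the number of combinations; no sample such as ("A", "A") appears in the listing.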

Frequency

“The number of occurrences of a particular observation in a data set is called frequency.”

OR

“The number of observations falling in a particular group (class) is called frequency.”

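Counting how often each observation occurs is easy to automate. The following Python sketch uses a made-up list of 13 marks (an assumption for illustration) and the standard-library `Counter` to obtain the frequency of each distinct value.

```python
from collections import Counter

# Hypothetical raw marks of 13 students (assumed data).
marks = [50, 60, 45, 60, 50, 60, 45, 50, 60, 60, 45, 50, 60]

# Frequency: number of occurrences of each particular observation.
frequency = Counter(marks)

for value, f in sorted(frequency.items()):
    print(f"marks {value}: frequency {f}")

# The frequencies always add up to the number of observations.
print("total:", sum(frequency.values()))
```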

Frequency Distribution

“The organization of raw data in table form, along with frequencies, is called a frequency distribution.”

• Data in the form of a frequency distribution are called grouped data.
• The purpose of a frequency distribution is to produce a meaningful pattern for the overall distribution of the data, from which conclusions can be drawn.

The types of frequency distributions that will be considered here are:

• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution

A categorical frequency distribution represents data that can be placed in different categories, such as gender, hair color, blood group, etc., along with their frequencies. The categorical frequency distribution is also called a frequency table.

Categorical frequency distribution (categorical frequency table)

Blood Group   No. of students (f)
A             5
B             8
O             4

An ungrouped frequency distribution simply lists the data values with the corresponding frequencies. The ungrouped frequency distribution is also called discrete grouped data.

Ungrouped frequency distribution (discrete grouped data)

Marks   No. of students (f)
50      4
60      6
45      3

A grouped frequency distribution is obtained by constructing classes (or intervals) for the data values along with corresponding frequencies. The grouped frequency distribution is also called continuous grouped data.

Grouped frequency distribution (continuous grouped data)

Height   No. of students (f)
3–4      7
5–6      30
7–8      6

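A grouped frequency distribution can be built by checking which class each raw value falls into. The Python sketch below uses made-up whole-number values and three classes with inclusive limits; both the data and the variable names are assumptions for illustration, not the book's data.

```python
# Hypothetical raw values (assumed data).
values = [3, 5, 6, 4, 7, 5, 6, 8, 5, 3, 6, 5]

# Classes given by inclusive lower and upper class limits.
classes = [(3, 4), (5, 6), (7, 8)]

# Start every class with frequency 0, then count the values falling in it.
grouped = {limits: 0 for limits in classes}
for v in values:
    for lower, upper in classes:
        if lower <= v <= upper:
            grouped[(lower, upper)] += 1
            break  # each value belongs to exactly one class

print("Class   f")
for (lower, upper), f in grouped.items():
    print(f"{lower}-{upper}     {f}")
```

The same loop works for any set of non-overlapping classes; only the `classes` list needs to change.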

Mid Points

The mid point of a class is the average of its lower and upper class limits. For the first class:

$\text{Mid point} = \dfrac{60 + 61}{2} = 60.5$

Class limits   f    Mid points
60 – 61        2    60.5
62 – 63        8    62.5
64 – 65        11   64.5
66 – 67        6    66.5
68 – 69        5    68.5
70 – 71        5    70.5
72 – 73        1    72.5
Total          38   --
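The mid points in the table above can be reproduced by averaging the two limits of each class. This is a short Python sketch; the variable names are chosen here for illustration.

```python
# Class limits taken from the table above as (lower, upper) pairs.
class_limits = [(60, 61), (62, 63), (64, 65), (66, 67),
                (68, 69), (70, 71), (72, 73)]

# Mid point = (lower limit + upper limit) / 2 for each class.
mid_points = [(lower + upper) / 2 for lower, upper in class_limits]

for (lower, upper), mid in zip(class_limits, mid_points):
    print(f"{lower} - {upper}: {mid}")
```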

Cumulative Frequency

• The cumulative frequency (C.F) of the first class is taken equal to the frequency of that class.
• For the other classes, the C.F values are obtained by adding each class's cumulative frequency to the frequency of the next class.

Class limits   f    C.F
60 – 61        2    2
62 – 63        8    8 + 2 = 10
64 – 65        11   11 + 10 = 21
66 – 67        6    6 + 21 = 27
68 – 69        5    5 + 27 = 32
70 – 71        5    5 + 32 = 37
72 – 73        1    1 + 37 = 38
Total          38   --
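The running totals in the cumulative frequency column can be checked with a few lines of Python. This sketch uses the standard-library `accumulate`, which carries each running total forward to the next class; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from the table (classes 60-61 up to 72-73).
frequencies = [2, 8, 11, 6, 5, 5, 1]

# Running total: the first C.F equals the first frequency, and each
# later C.F is the previous C.F plus the frequency of the current class.
cumulative = list(accumulate(frequencies))

print(cumulative)
# The last cumulative frequency equals the total number of observations.
print(cumulative[-1] == sum(frequencies))
```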

Class Boundaries

Subtract the upper limit of the first class from the lower limit of the second class and divide the difference by 2. Then subtract the resulting value from each lower limit and add it to each upper limit. Here the value is (62 − 61) / 2 = 0.5.

Class limits   f    Class boundaries
60 – 61        2    59.5 – 61.5
62 – 63        8    61.5 – 63.5
64 – 65        11   63.5 – 65.5
66 – 67        6    65.5 – 67.5
68 – 69        5    67.5 – 69.5
70 – 71        5    69.5 – 71.5
72 – 73        1    71.5 – 73.5
Total          38   --

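The rule for class boundaries translates directly into code. The following Python sketch applies it to the class limits 60–61 through 72–73 from the table; the names `half_gap` and `class_boundaries` are chosen here for illustration.

```python
# Class limits from the table as (lower, upper) pairs.
class_limits = [(60, 61), (62, 63), (64, 65), (66, 67),
                (68, 69), (70, 71), (72, 73)]

# Gap between the upper limit of the first class and the lower limit
# of the second class, divided by 2.
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2

# Subtract the value from each lower limit and add it to each upper limit.
class_boundaries = [(lower - half_gap, upper + half_gap)
                    for lower, upper in class_limits]

for lower_b, upper_b in class_boundaries:
    print(f"{lower_b} - {upper_b}")
```

With boundaries, the upper boundary of one class coincides with the lower boundary of the next, so no gap is left between classes.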

Origin of Statistics

The word statistics has been derived from the Latin word “status”, the Italian word “statista” or the German word “statistik”, each of which means an organized political state. It was born as the Science of Kings. It had its origin in the needs of administrators in ancient days for collecting and maintaining quantitative information about their population, wealth and armaments (weaponry used by the military). With the passage of time this word changed its shape and is now used as “statistics”.

The word “statistik” was first used by Gottfried Achenwall (1719-1772). Dr. Zimmerman (1787) introduced the word statistics into England. Its use was popularized by Sir John Sinclair (1754-1835) in the 1798 publication of his book on a statistical account of Scotland.

Over the last few centuries, considerable interest has developed in the collection and analysis of statistical data. Adolf Quetelet (1796-1874) applied statistical methods in the fields of education and sociology.

Outstanding contributions were also made by Pascal (1623-1662), Bernoulli (1654-1705), Gauss (1777-1855), Chebyshev (1821-1894), Francis Galton (1822-1911), Karl Pearson (1857-1936), William Sealy Gosset (1876-1937), R.A. Fisher (1890-1962), Jerzy Neyman (1894-1981), Wald (1902-1950), John Tukey (1915-2000) and many others.


Meaning of the word Statistics

The word statistics is generally used in three different meanings:

• Firstly, the word statistics refers to “numerical facts systematically arranged with a definite purpose in view”. In this sense, the word statistics is always used in the plural, e.g. statistics of prices, statistics of road accidents, statistics of crimes, statistics of births, statistics of educational institutions, etc.

• Secondly, the word statistics is defined as “the procedures and techniques used to collect, process and analyze numerical data to make inferences and to reach decisions in the face of uncertainty”. In this sense, the word statistics is used in the singular.

• Thirdly, the word statistics refers to “numerical quantities calculated from sample observations”. The word statistics is plural when used in this sense. The mean, median, mode, etc. calculated from sample observations are examples in this sense.

Definition of Statistics

• “Statistics is the study of the principles and methods applied in the collection, summarization and description of numerical data. Further, it deals with the procedures of making inferences about the characteristics of a population on the basis of a sample taken from the same population.”

• “The science which enables us to draw conclusions about various phenomena from real-life data (collected on a sample basis) is called statistics.”


Descriptive Statistics

“Descriptive statistics deals with the concepts and methods concerned with the collection, summarization and description of numerical data.”

By summarization we mean the classification of data, tabulation and their graphical displays, while description is the computation of a few numerical quantities, i.e. measures of central tendency, measures of dispersion, moments, skewness, kurtosis, etc.

In descriptive statistics we deal with:

• Collection
• Classification
• Tabulation
• Graphical displays
• Numerical quantities

Inferential Statistics

“Inferential statistics deals with the procedures of making inferences (conclusions) about the characteristics of a population on the basis of a sample taken from the same population.”

This category consists of estimation of population parameters and testing of hypotheses.

In inferential statistics we deal with:

• Estimation
• Hypothesis testing

Functions of Statistics

• The complex mass of data is made simple and understandable with the help of statistical methods.
• Statistical methods are used to study the relationship between two or more phenomena.
• Statistics helps in formulating policies in different fields.
• Statistical methods are highly useful tools for forecasting.
• Statistics helps in decision making in the face of uncertainty.
• One important function of statistics is to provide techniques for making comparisons.

Scope and Importance of Statistics in Different Fields

In ancient times the scope of statistics was limited. Censuses of population and wealth were conducted in those days to determine the strength of manpower and material wealth for the purpose of wars. That is why it was called the science of kings.


With the passage of time the scope of statistics became wider and wider. With the development of the theory of probability, insurance companies benefited. Thus statistical methods began to be used in other sciences.

• Statistics plays an important role in business.
• The whole structure of insurance is based on statistics.
• The banks make use of statistics while framing their policies.
• Statistical data are now widely used in taking all administrative decisions.
• Statistics has a vast use in Economics, Management, Industry, Transport, Communication, Physics, Chemistry, Zoology, Agriculture, Health, Atomic Energy, Petroleum, Medicine, Astronomy and many more.

Nowadays the science of statistics has shown its worth so much that there is hardly any field in which its need is not felt.

Students study statistics for several reasons

• Like professional people, you must be able to read and understand the various statistical studies performed in your field. To have this understanding, you must be knowledgeable about the vocabulary, symbols, concepts, and statistical procedures used in these studies.

• You may be called on to conduct research in your field, since statistical procedures are basic to research. To accomplish this, you must be able to design experiments; collect, organize, analyze, and summarize data; and possibly make reliable predictions or forecasts for future use. You must also be able to communicate the results of the study in your own words.

• You can also use the knowledge gained from studying statistics to become a better consumer and citizen. For example, you can make intelligent decisions about what products to purchase based on consumer studies, about government spending based on utilization studies, and so on.


Summation Notation

Suppose the heights of some students are 54”, 58”, 64”, …, 57”. We can denote the height of the:

• First student by $X_1$
• Second student by $X_2$
• Last or nth student by $X_n$

We can use the symbol $X_i$ to denote any of the heights, where $i = 1, 2, \dots, n$.

Now the sum of the values $X_1, X_2, \dots, X_n$, i.e. $X_1 + X_2 + \dots + X_n$, is denoted by $\sum_{i=1}^{n} X_i$, where the symbol $\Sigma$ (capital sigma) is a Greek letter and denotes sum.

Consider the following examples:

• $X_1 + X_2 + X_3 + X_4 = \sum_{i=1}^{4} X_i$
• $X_1Y_1 + X_2Y_2 + X_3Y_3 = \sum_{i=1}^{3} X_iY_i$
• $X_1^2 + X_2^2 + X_3^2 + X_4^2 = \sum_{i=1}^{4} X_i^2$
• $(X_1 + X_2 + X_3 + X_4)^2 = \left(\sum_{i=1}^{4} X_i\right)^2$
• $\sum_{i=1}^{n} a = a + a + \dots + a = na$
• $aX_1 + aX_2 + aX_3 + aX_4 = a\sum_{i=1}^{4} X_i$
• $(X_1 - a) + (X_2 - a) + \dots + (X_n - a) = \sum_{i=1}^{n} (X_i - a)$
• $(X_1 - a)^2 + (X_2 - a)^2 + \dots + (X_n - a)^2 = \sum_{i=1}^{n} (X_i - a)^2$
• $\left[(X_1 - a) + (X_2 - a) + \dots + (X_n - a)\right]^2 = \left[\sum_{i=1}^{n} (X_i - a)\right]^2$

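The summation rules can be tried out numerically. The Python sketch below uses two short made-up lists X and Y and a constant a = 3 (all assumed values) and evaluates several of the sums; note in particular that the sum of squares and the square of the sum are different quantities.

```python
# Hypothetical values for illustration (assumed data).
X = [2, 4, 6, 8]
Y = [1, 3, 5, 7]
a = 3
n = len(X)

sum_x = sum(X)                              # sum of X_i
sum_xy = sum(x * y for x, y in zip(X, Y))   # sum of X_i * Y_i
sum_x_sq = sum(x ** 2 for x in X)           # sum of squares of X_i
sq_of_sum = sum(X) ** 2                     # square of the sum of X_i

print("sum X:", sum_x)
print("sum XY:", sum_xy)
print("sum of squares:", sum_x_sq)
print("square of sum:", sq_of_sum)

# Summing a constant n times gives n * a.
print(sum(a for _ in range(n)) == n * a)
# A constant factor can be taken outside the summation sign.
print(sum(a * x for x in X) == a * sum(X))
```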

Sharpen your Pencil

MCQ’s

(1) To know about something is known as ______

(A) Information (B) Data (C) Observation (D) None of these

(2) A constant can assume ______ value.

(A) 0 (B) single (C) two (D) None of these

(3) Life of a T.V tube is a _____ variable.

(A) Discrete (B) Continuous (C) Qualitative (D) None of these

(4) Color of hair is a _____ variable.

(A) Qualitative (B) Quantitative (C) Continuous (D) None of these

(5) The number of flowers on a plant is a ______ variable.

(A) Discrete (B) Continuous (C) Qualitative (D) None of these

(6) The amount of milk given by a cow is a _____ variable.

(A) Discrete (B) Continuous (C) Qualitative (D) None of these

(7) The number of accidents is a _____ variable.

(A) Discrete (B) Continuous (C) Qualitative (D) None of these

(8) A discrete variable is _____

(A) Not unit free (B) unit free (C) both A & B (D) None of these

(9) The number of branches on different trees is a _____ variable.

(A) Discrete (B) Continuous (C) Qualitative (D) None of these

(10) A small part taken from a population is called a _____

(A) Population (B) Sample (C) Observation (D) None of these


(11) The word statistik was first used by _____

(A) Newton (B) Einstein (C) Achenwall (D) None of these

(12) The word statistik is a _____ word.

(A) Italian (B) Latin (C) German (D) None of these

(13) The quantity computed from a population is called a _____

(A) parameter (B) statistic (C) observation (D) None of these

(14) The quantity computed from a sample is called a _____

(A) parameter (B) statistic (C) observation (D) None of these

(15) The word statistics has _____ different meanings.

(A) one (B) two (C) three (D) None of these

(16) Atmospheric pressure at a certain place is a _____ variable.

(A) Discrete (B) Continuous (C) Qualitative (D) None of these


Exercise

Short Questions

Q.1.01. What is the difference between a parameter and a statistic?

Q.1.02. What is a variable? Differentiate between discrete and continuous variables. Give examples.

Q.1.03. What is the difference between a variable and a constant?

Q.1.04. Differentiate between descriptive and inferential statistics.

Q.1.05. Define population and sample. Give examples.

Q.1.06. Differentiate between quantitative and qualitative variables.

Q.1.07. Define sampling. Give examples.

Q.1.08. Differentiate between sampling with and without replacement.

Q.1.09. Why is a course in Statistics important to you as a student?


Hi!!!

• I am a scientific calculator.
• I will help you in each and every problem of statistics.
• But you need to learn most of my functions.
• So, let’s make friends!!!

CHAPTER 02
Collection and
Organization of Data

Chapter Contents

You should read this chapter if you need to learn about:

• Data
• Types of Data by Source
• Types of Data by Nature
• Organization of Data
• Classification of Data
• Tabulation
• Frequency and Frequency Distribution
• Construction of Frequency Distribution for Qualitative Data
• Construction of Ungrouped Frequency Distribution for Quantitative Data
• Some important points in Grouped Frequency Distribution
• Construction of Grouped Frequency Distribution for Quantitative Data
• Relative Frequency Distribution
• Percentage Relative Frequency Distribution
• Cumulative Frequency
• Cumulative Frequency Distribution and its Types
• Relative Cumulative Frequency Distribution
• Percentage Relative Cumulative Frequency Distribution
• Diagrams: Bar Chart, Pie Chart etc.
• Graphs: Histogram, Frequency Polygon etc.
• False Base line or the Broken line
• Exercise

Chapter 02 Collection and Organization of Data

Data

“Originally collected observations are collectively called data.”

• Data of selected students' names: Ajmal, Arif, Ali, etc.
• Data of selected students' class numbers: 39, 56, 47, etc.
• Data of selected students' heights: 60”, 65”, 66”, etc.
• Data of selected students' ages: 19, 20, 21, etc.
• Data of selected students' favorite colors: Red, Blue, Green, etc.

Name     Class No.   Height (inches)   Age   Color
Ajmal    39          60                19    Red
Arif     56          65                20    Blue
Ali      47          66                21    Green

Types of Data by Source

• Primary data
• Secondary data

Primary Data

“Data that have been originally collected and have not undergone any sort of statistical treatment are called primary data. In other words, fresh data are called primary data.”


Thus primary data are first-hand information collected for a certain purpose.

• For example, the data in the Population Census Reports are primary because they are originally collected by the Population Census Organization.

Methods of Collection of Primary Data

The following methods are used for the collection of primary data:

• Direct Personal Observation: In this method, the investigator collects the information directly from the source concerned. The investigator must be qualified and experienced in the related field of study and should put simple questions in simple language, which can be answered easily.

• Indirect Personal Investigation: Sometimes the informants would either not disclose the facts at all or would give wrong information. For example, businessmen do not disclose their true incomes to the income tax authorities. In such a situation, information is collected from a third party.

• Registration: In this method, the information is reported to the appropriate authority. For example, births and deaths are registered with the Municipal Committee or Corporation in urban areas and the Union Council in rural areas.

• Estimates through Local Correspondents: There is no formal collection of data in this method. Local agents or correspondents send the required information using their own judgment. It is a timesaving method and does not cost much. It is, however, a subjective method and gives only estimates.

• Investigation through Enumerators: In this method information is collected through trained enumerators. The investigators get the forms of inquiry (called schedules) filled in by the informants. They help the informants in filling in the schedules correctly. This method is considered to be very accurate and timesaving. It is used in large-scale government inquiries like the Population Census.

• Mailed Questionnaire Method: In this method, questionnaires along with a letter of request are sent by mail to the informants. The informants fill in the questionnaires and return them to the investigator. This method is very cheap and timesaving. But sometimes the questionnaires are returned incomplete or full of errors.


Secondary Data

“Data that have undergone any sort of treatment by statistical methods at least once, i.e. data that have been collected, classified, tabulated or presented in some form for a certain purpose, are called secondary data.”

• For example, the data in the Economic Survey of Pakistan are secondary because they are originally collected by the Federal Bureau of Statistics, the State Bank of Pakistan, the Central Board of Revenue, etc.

Methods of Collection of Secondary Data

Secondary data can be obtained from the following sources:

• International Publications: e.g. the publications of the World Bank, I.M.F, UNESCO, UNO, ILO, etc.
• Official (Government) Sources: e.g. publications of the Federal Bureau of Statistics, Ministries of Agriculture, Finance, Communications and Railways, Provincial Bureaus of Statistics and Provincial Departments of Agriculture, Health and Education.
• Semi-official (Semi-Government) Sources: e.g. publications of the State Bank of Pakistan, Central Cotton Committee, Economic Research Institutes, District Councils, Municipal Committees, WAPDA, P.I.D.C, etc.
• Private (Non-Government) Sources: e.g. publications of Trade Associations, Chambers of Commerce and Industry, Market Committees, etc.
• Publications of Research Organizations: e.g. Punjab University Institute of Education and Research, Irrigation Research Institute, Punjab University Social Sciences Research Center, Pakistan Institute of Development Economics, etc.

Types of Data by Nature

• Qualitative data
• Quantitative data


Qualitative Data

“Data collected on a qualitative variable are called qualitative data.”

• Color
• Religion
• Gender (female and male)
• Education level
• Grades of students in a class, etc.

Quantitative Data

“Data collected on a quantitative variable are called quantitative data.”

• Age
• Weight
• Height
• Speed
• Income
• Number of children in a family
• Number of deaths in an accident, etc.

When data were first analyzed statistically by Karl Pearson and Francis Galton, almost all were continuous data. In 1899, Pearson began to analyze discrete data. Pearson found that some data, such as eye color, could not be measured, so he termed such data qualitative data.

Quantitative data are either discrete data or continuous data.


Discrete Data

“Data collected on a discrete variable are called discrete data.”

• Family sizes
• No. of pages in a book
• No. of apples in a basket
• No. of deaths in an accident
• No. of shares sold every day in the stock market
• No. of housing units in different blocks of a colony
• No. of passengers carried by PIA in the last ten years

Continuous Data

“Data collected on a continuous variable are called continuous data.”

• Students' heights, ages, weights
• Speed of a car
• Temperature of a place
• Income of a family
• The amount of milk given by a cow
• The lifetime of a TV tube
• Fortnightly petrol prices

Organization of Data

To describe situations, draw conclusions, or make inferences about events, the researcher must organize
the data in some meaningful way. Thus “Organization of data means reformatting the collected data
in more understandable form” The most convenient method of organizing data is classification,
tabulation (frequency distributions) and constructing statistical diagrams and graphs.

Classification

“The process of arranging data into classes or categories according to some common characteristic
present in the data is called classification”.


Objectives of Classification

The main objectives of classification are:

 To bring out points of similarity and dissimilarity.
 To condense the details and so save one from mental strain.
 To enable one to make comparisons and draw inferences simply.
 To prepare the ground for the proper presentation of statistical facts.

Types of Classification

Classification is of two types: Descriptive and Numerical.

Descriptive Classification

“When the data are classified on the basis of qualities or attributes, which are incapable of
quantitative measurement, then the classification is said to be descriptive” e.g. gender, marital
status, educational standard, etc. Descriptive classification is also called classification according
to attributes.

Numerical Classification

“When the data are classified on the basis of quantitative measurements, then the classification is
said to be numerical” e.g. age, income, height, weights, etc.

Tabulation

“A table is a systematic arrangement of data into vertical columns and horizontal rows. Thus the
process of arranging data into rows and columns is called tabulation”.

The two separate headings “Classification” and “Tabulation” should not lead the readers to assume
that these are two distinct processes. In fact, they go together; classification is the first step in
tabulation. Before the data are put in tabular form they have to be classified into different classes
or groups having common characteristics. After this step the data are displayed under different
columns and rows so that their relationship can be easily understood.

Frequency

“The number of occurrences of a particular observation in a data set is called frequency”.

OR

“The number of observations falling in a particular group (class) is called frequency”.

[Figure: groups of cups with frequencies 4 and 3.]

Frequency Distribution

“The organization of raw data in table form, along with frequencies, is called frequency distribution”.

The types of frequency distributions that will be considered here are:

 Categorical frequency distribution
 Ungrouped frequency distribution
 Grouped frequency distribution

 A categorical frequency distribution represents data that can be placed in different
categories such as gender, hair color, blood group etc. along with their frequencies. The
categorical frequency distribution is also called a frequency table. A categorical frequency
distribution of the students' blood groups is given below:

Categorical Frequency Distribution
(Categorical Frequency Table)
Blood Group   No. of students (f)
A             5
B             8
O             4

 An ungrouped frequency distribution simply lists the data values with the corresponding
frequencies. The ungrouped frequency distribution is also called discrete grouped data. An
ungrouped frequency distribution of the students' marks is given below:

Ungrouped Frequency Distribution
(Discrete Grouped Data)
Marks   No. of students (f)
50      4
60      6
45      3

 A grouped frequency distribution is obtained by constructing classes (or intervals) for the
data values along with corresponding frequencies. The grouped frequency distribution is also
called continuous grouped data. A grouped frequency distribution of the heights of students is
given below:

Grouped Frequency Distribution
(Continuous Grouped Data)
Height   No. of students (f)
3 – 4    7
4 – 5    30
5 – 6    6

 The data in the form of a frequency distribution is called grouped data.
 The purpose of a frequency distribution is to produce a meaningful pattern for the overall
distribution of the data from which conclusions can be drawn.

Construction of frequency distribution (or frequency table) for
Categorical (or Qualitative) Data

Suppose that in all there are 625 students of first year in a large college. Suppose some of these students
have come from Urdu medium schools and the others have come from English medium schools. If we
interview the students about their schooling, we will get the observations as follows:

U, U, E, U, E, E, E, U …

(U: URDU MEDIUM) (E: ENGLISH MEDIUM)

Now the frequency table of the “medium of institution” is given as follows:

Medium of institution   No. of Students (f)
Urdu                    400
English                 225
Total                   625

This frequency table is called univariate frequency table because it is constructed for one categorical
variable i.e. the medium of institution.

We must decide how many categories or classes to use. These categories must be chosen so as to
accommodate all the data and so that no item is placed under more than one category. The concepts of
class limits, class boundaries, and class marks are of no concern when constructing a frequency
distribution using categorical data.

Now suppose that along with the Medium of Institution, you are also recording the gender of the
student i.e.

Student No.   Medium   Gender
1             E        F
2             E        M
3             U        F
.             .        .
.             .        .

Then the frequency table of the “medium of institution” and the “Gender of the students” is given as
follows:

Medium of Institute   Male   Female   Total
Urdu                  300    100      400
English               100    125      225
Total                 400    225      625

This frequency table is called bivariate frequency table because it is constructed for two categorical
variables i.e. the medium of institution and the gender of the students.
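The univariate and bivariate tables above are simply counts of categories. A minimal Python sketch is given below; the list `students` is hypothetical (only its first three records follow the text) and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (medium, gender) records; only the first three follow the text.
students = [("E", "F"), ("E", "M"), ("U", "F"), ("U", "M"), ("U", "M")]

# Univariate table: count the students by medium of institution only.
by_medium = Counter(medium for medium, _ in students)

# Bivariate table: count the students by (medium, gender) pairs.
by_medium_gender = Counter(students)

print("Medium  Male  Female  Total")
for medium in ("U", "E"):
    male = by_medium_gender[(medium, "M")]
    female = by_medium_gender[(medium, "F")]
    print(medium, male, female, by_medium[medium])
```

With the full list of 625 records in place of the five hypothetical ones, the same counting would give the two tables shown above.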

 When the data are sorted according to one criterion only, the classification is called one-way
classification e.g. the students classified by the medium of institution. Tabulation in this case
is called one-way tabulation.
 When the data are sorted according to two criteria, the classification is called two-way
classification e.g. the students classified by the medium of institution and their gender.
Tabulation in this case is called two-way tabulation.
 When the data are sorted according to three criteria, the classification is called three-way
classification e.g. the students classified by the medium of institution, their gender and their
residence. Tabulation in this case is called three-way tabulation.
 When the data are sorted according to many criteria, the classification is called manifold
classification. Tabulation in this case is called complex tabulation.

Construction of Ungrouped Frequency Distribution
(Discrete Grouped Data)

The following steps are used for constructing an ungrouped frequency distribution:

Step 1: The first step is to denote the variable by X and then make a column of the X
values that are in our data.
Step 2: The second step is to construct two more columns adjacent to the column of X.
The first of these two columns is for tally marks and the second for frequency.
Step 3: The third step is to sum the frequency column and check it against the total
number of observations.

EXAMPLE 2.01

The following are the number of flowers on different branches of a plant:

2 4 6 1 3 3 5 7 8 6 2 9
4 7 4 2 1 3 6 4 2 5 1 4
7 9 1 2 10 1 8 9 2 3 8 2
1 2 3 4 4 4 6 6 5 5 6 1
4 5 8 5 4 3 3 2 5 0 9 1
5 9 8 10 0 10 10

Solution

 The variable involved is “no. of flowers” which is discrete.
 Therefore the ungrouped frequency distribution for this data is:

X (no. of flowers)   Tally         f
0                    II            2
1                    IIII III      8
2                    IIII IIII     9
3                    IIII II       7
4                    IIII IIII     10
5                    IIII III      8
6                    IIII I        6
7                    III           3
8                    IIII          5
9                    IIII          5
10                   IIII          4
Total                --            67
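The three steps can be sketched in Python as follows. This is only an illustration: it tallies the 67 observations of Example 2.01 with `collections.Counter`, and the variable names are assumptions.

```python
from collections import Counter

flowers = [
    2, 4, 6, 1, 3, 3, 5, 7, 8, 6, 2, 9,
    4, 7, 4, 2, 1, 3, 6, 4, 2, 5, 1, 4,
    7, 9, 1, 2, 10, 1, 8, 9, 2, 3, 8, 2,
    1, 2, 3, 4, 4, 4, 6, 6, 5, 5, 6, 1,
    4, 5, 8, 5, 4, 3, 3, 2, 5, 0, 9, 1,
    5, 9, 8, 10, 0, 10, 10,
]

# Steps 1 and 2: list each distinct X value and count how often it occurs.
frequency = Counter(flowers)
for x in sorted(frequency):
    print(x, frequency[x])

# Step 3: the frequencies must add up to the total number of observations.
print("Total", sum(frequency.values()), "of", len(flowers))
```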

Test Yourself

The following are the number of flowers on different branches of a plant. Construct Frequency
Distribution.

12 14 16 11 13 13 15 17 18 16 12 19
14 17 14 12 11 13 16 14 12 15 11 14
17 19 11 12 20 11 18 19 12 13 18 12
11 12 13 14 14 14 16 16 15 15 16 11
14 15 18 15 14 13 13 12 15 10 19 11
15 19 18 20 10 20 20

Some important Points in a Grouped Frequency Distribution

Class Interval (Class)

In the following table each of the groups (110-119), (120-129) and (130-139) is called a class
interval (or class).

Classes     f
110 - 119   1
120 - 129   3
130 - 139   2

Class Limits

“The smaller and larger numbers, which describe the class interval, are called the class limits”

 The smaller number is the lower class limit and the larger number is the upper class limit.
Class limits should be well defined and there should be no overlapping. In other words the limits
should be inclusive i.e. the values corresponding exactly to the lower limit or the upper limit
should be included in that class.

Classes
110 - 119
120 - 129
130 - 139
140 - 149

 Sometimes classes are taken as given in the following table. In such a case, it is difficult to
decide where to place an item which is exactly 120, 130, 140, etc., because each one of them seems
to belong to two classes. Such overlapping class limits should, therefore, be avoided.

Classes
110 - 120
120 - 130
130 - 140
140 - 150

 Sometimes a class has either no lower class limit or no upper class limit; such a class is called
an open-end class, as in the following table:

Classes
Below (under) 15
15 - 19
20 - 24
25 - 29
30 - 34
35 - 39
40 and over (above)

It is clear from the above table that in the class “Below 15” there is no lower class limit and in
the class “40 and over” there is no upper class limit.

Arithmetic mean, harmonic mean and geometric mean cannot be computed from an open-end frequency
distribution, because the midpoints of the open-end classes cannot be determined. Therefore it is a
bad practice to use open-end classes.

Class Boundaries

“The precise (true) numbers, which remove the discontinuity between two classes, are called class
boundaries or true class limits”

 A class boundary is located halfway between the upper limit of a class and the lower limit of the
next higher class.

Classes Class boundaries


110 - 119 109.5 - 119.5
120 - 129 119.5 - 129.5
130 - 139 129.5 - 139.5
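The halfway rule can be sketched in Python as follows. The function name `class_boundaries` is an assumption for illustration, and the sketch assumes at least two classes with the same gap between every pair of consecutive classes.

```python
def class_boundaries(limits):
    """Return (lower, upper) class boundaries for a list of (lower, upper) class limits."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    # Each boundary lies halfway across the gap.
    return [(lower - half, upper + half) for lower, upper in limits]


classes = [(110, 119), (120, 129), (130, 139)]
for (lower, upper), (lb, ub) in zip(classes, class_boundaries(classes)):
    print(f"{lower} - {upper}    {lb} - {ub}")
```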

 If the classes are in the form:

Classes
110 - 120
120 - 130
130 - 140

Then in this case the class limits are the class boundaries because there is no discontinuity
between two classes.

 Sometimes:

Classes               Class boundaries
Below (under) 15      Up to 14.5
15 - 19               14.5 - 19.5
20 - 24               19.5 - 24.5
25 - 29               24.5 - 29.5
30 - 34               29.5 - 34.5
35 - 39               34.5 - 39.5
40 and over (above)   39.5 and over

Class Mark or Mid Point

“The number, which divides each class into two equal parts, is called class mark”

It can be obtained by dividing either the sum of the lower and upper limits of a class or the sum of
the lower and upper class boundaries of the class by 2.

Class Width (Class size)

“The difference between the lower class limits of two consecutive classes is called the class width”.

OR

“The difference between the upper class boundary and the lower class boundary of a particular class
is called the class width”.

The width (or size) of the class intervals is denoted by “h”.

The class width may or may not be equal for all the classes. If the class width is equal for all the
classes then it is called “common width”. In practice it is desirable to have equal class widths
whenever possible.
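As a small illustration, the class marks and the class width of the classes 110 - 119, 120 - 129 and 130 - 139 can be computed in Python as below. The variable names are assumptions; both definitions of the class width are evaluated so that they can be compared.

```python
classes = [(110, 119), (120, 129), (130, 139)]          # class limits
boundaries = [(109.5, 119.5), (119.5, 129.5), (129.5, 139.5)]

# Class mark: (lower limit + upper limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Class width from the lower limits of two consecutive classes.
h_from_limits = classes[1][0] - classes[0][0]

# Class width from the boundaries of one particular class.
h_from_boundaries = boundaries[0][1] - boundaries[0][0]

print(class_marks)
print(h_from_limits, h_from_boundaries)
```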

Construction of Grouped Frequency Distribution
(Continuous Grouped Data)

The following steps are used for constructing a grouped frequency distribution:

Step 1: The first step is to decide the number of classes. For this purpose there are no hard and
fast rules but statistical experience tells us that no less than 5 and no more than 20
classes are generally used.

Rule: If $2^k \geq N$ then we take “k” classes, where “N” is the total number of observations
and “k” is the number of classes.

H.A. Sturges has proposed an empirical rule for determining the number of classes into
which a set of observations should be grouped. The rule is:

$k = 1 + 3.3 \log N$

where k denotes the number of classes and N is the total number of observations.

Step 2: The second step is to determine the range of variation in the data i.e.

$R = X_m - X_0$

where R is the range, $X_m$ is the largest value and $X_0$ is the smallest value.

Step 3: The third step is to determine the approximate width (size) of the class by dividing the
range (R) of variation by the number of classes (k).

Step 4: The fourth step is to decide where to locate the lower class limit of the lowest class. The
lowest class usually starts with the smallest data value or a number less than it (it is
better if it is a multiple of the class width).

Step 5: The fifth step is to list all the classes and class boundaries.

Step 6: The sixth step is to distribute the data into the appropriate classes by using a
Tally-column.

Step 7: The seventh step is to complete the frequency column.

A frequency distribution should have a minimum of 5 and a maximum of 20 classes. For small data, use
between 5 and 10 classes. For large data, use up to 20 classes.
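The two rules of Step 1 can be compared with a short Python sketch. The function names are assumptions made for illustration, the logarithm in the rule of Sturges is taken to base 10, and rounding its result to the nearest whole number is also an assumption, since the text does not say how to round.

```python
import math


def classes_by_power_rule(n):
    """Smallest k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def classes_by_sturges(n):
    """k = 1 + 3.3 log10(n), rounded to the nearest whole number (an assumption)."""
    return round(1 + 3.3 * math.log10(n))


for n in (35, 38, 100):
    print(n, classes_by_power_rule(n), classes_by_sturges(n))
```

Both rules are only guides; as the steps above say, any number of classes between 5 and 20 may be chosen.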

EXAMPLE 2.02

The following data indicate the number of people in different localities:

20 50 60 70 35 45 39
61 74 80 25 30 39 40
58 60 67 71 81 82 85
86 80 94 89 56 58 40
45 56 63 72 79 40 18

Solution

 The variable of the given data is “number of people” which is discrete.
 Range = 94 – 18 = 76.
 Approximate no. of classes: $2^k \geq N \Rightarrow 2^6 \geq 35 \Rightarrow k = 6$ (N = total no. of observations)
 Class Width = Range/k = 76/6 = 12.67 ≈ 13
 Hence the grouped frequency distribution is:

Classes   Tally       f
18 – 30   IIII        4
31 – 43   IIII I      6
44 – 56   IIII        5
57 – 69   IIII II     7
70 – 82   IIII IIII   9
83 – 95   IIII        4
Total     --          35

In calculating the class-width of a frequency distribution, use the next whole number as the
class-width. Doing this ensures that you will have enough space in your frequency distribution for
all the data values.

Alternate Method

 We may take approximate desired no. of classes = 8 (assumed)
 Class Width = Range/k = 76/8 = 9.5 ≈ 10
 Hence the grouped frequency distribution is:

Classes   Tally      f
15 – 24   II         2
25 – 34   II         2
35 – 44   IIII I     6
45 – 54   III        3
55 – 64   IIII III   8
65 – 74   IIII       5
75 – 84   IIII       5
85 – 94   IIII       4
Total     --         35

There is no need of class boundaries because the variable is discrete in this case.
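The construction in Example 2.02 can be sketched in Python as follows. The helper `grouped_distribution` and its parameter `start` are assumptions made for illustration; the sketch handles whole-number data with inclusive class limits, and the class width is rounded up to the next whole number as the example does.

```python
import math

people = [
    20, 50, 60, 70, 35, 45, 39,
    61, 74, 80, 25, 30, 39, 40,
    58, 60, 67, 71, 81, 82, 85,
    86, 80, 94, 89, 56, 58, 40,
    45, 56, 63, 72, 79, 40, 18,
]


def grouped_distribution(data, k, start=None):
    """Return [((lower, upper), frequency), ...] for k classes of equal width."""
    lowest = min(data) if start is None else start
    width = math.ceil((max(data) - min(data)) / k)    # next whole number
    table = []
    for i in range(k):
        lower = lowest + i * width
        upper = lower + width - 1                      # inclusive upper limit
        f = sum(lower <= x <= upper for x in data)     # count values in the class
        table.append(((lower, upper), f))
    return table


first = grouped_distribution(people, k=6)              # classes start at 18
alternate = grouped_distribution(people, k=8, start=15)

for (lower, upper), f in first:
    print(f"{lower} - {upper}: {f}")

# The frequencies should add up to N; if they do not, some value fell outside the classes.
print(sum(f for _, f in first), len(people))
```

Calling the function with `k=8` and `start=15` gives the classes of the alternate method.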

EXAMPLE 2.03

The following data relate to heights of 1st year students (heights in inches):

62 67 65 64 70 70 66 64 63 65
66 68 71 60 64 63 62 64 63 65
66 70 71 72 69 68 62 65 64 62
68 67 65 60 69 64 66 63 -- --

Solution

 The involved variable is “height” which is a continuous variable.
 Range = 72 – 60 = 12
 No. of classes = 7 (assumed)
 Class Width = 12/7 = 1.714 ≈ 2
 Hence the grouped frequency distribution is (here a value equal to a shared limit is counted in
the higher class):

Classes   Tally         f
60 – 62   II            2
62 – 64   IIII III      8
64 – 66   IIII IIII I   11
66 – 68   IIII I        6
68 – 70   IIII          5
70 – 72   IIII          5
72 – 74   I             1
Total     --            38

Here the class limits and class boundaries are the same; but it is difficult to decide where to place
an item which is exactly 62, 64, 66 etc., because each one of them seems to belong to two classes.
Such overlapping class limits should therefore be avoided.

Alternate Method

Classes   Class boundaries   Tally         f
60 – 61   59.5 – 61.5        II            2
62 – 63   61.5 – 63.5        IIII III      8
64 – 65   63.5 – 65.5        IIII IIII I   11
66 – 67   65.5 – 67.5        IIII I        6
68 – 69   67.5 – 69.5        IIII          5
70 – 71   69.5 – 71.5        IIII          5
72 – 73   71.5 – 73.5        I             1
Total     --                 --            38

Test Yourself

Construct Frequency Distribution:

1) The following data indicate the number of people in different localities:

30 60 70 80 45 55 49
71 84 90 35 40 49 50
68 70 77 81 91 92 95
96 90 104 99 66 68 50
55 66 73 82 89 50 28

2) The following data relate to heights of 1st year students (heights in inches):

72 77 75 74 80 80 76 74 73 75
76 78 81 70 74 73 72 74 73 75
76 80 81 82 79 78 72 75 74 72
78 77 75 70 79 74 76 73

Relative Frequency Distribution

“The frequency of a class divided by the total of the frequencies is called the relative frequency of
that class and a table showing the relative frequencies is called a relative frequency distribution”.

$$\text{R.F} = \frac{\text{frequency of a class}}{\text{total of frequencies of all classes}}$$

Percentage Relative Frequency Distribution

“The frequency of a class divided by the total of the frequencies and multiplied by 100, is called
the percentage relative frequency of that class and a table showing the percentage relative
frequencies is called a percentage relative frequency distribution”.

$$\text{P.R.F} = \frac{\text{frequency of a class}}{\text{total of frequencies of all classes}} \times 100$$

Classes   Class boundaries   f    R.F     P.R.F
60 – 61   59.5 – 61.5        2    2/38    (2/38) x 100
62 – 63   61.5 – 63.5        8    8/38    (8/38) x 100
64 – 65   63.5 – 65.5        11   11/38   (11/38) x 100
66 – 67   65.5 – 67.5        6    6/38    (6/38) x 100
68 – 69   67.5 – 69.5        5    5/38    (5/38) x 100
70 – 71   69.5 – 71.5        5    5/38    (5/38) x 100
72 – 73   71.5 – 73.5        1    1/38    (1/38) x 100
Total     --                 38   1       100

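The two formulas can be evaluated for every class with a few lines of Python. This is a minimal sketch using the frequencies of the height distribution; the dictionary name and the rounding used for printing are assumptions.

```python
frequencies = {
    "60-61": 2, "62-63": 8, "64-65": 11, "66-67": 6,
    "68-69": 5, "70-71": 5, "72-73": 1,
}

total = sum(frequencies.values())       # total of frequencies of all classes

relative = {}
for label, f in frequencies.items():
    rf = f / total                      # relative frequency (R.F)
    relative[label] = rf
    print(f"{label}: R.F = {rf:.4f}, P.R.F = {rf * 100:.2f}")
```

The relative frequencies add up to 1 and the percentage relative frequencies add up to 100, which is a useful check on the table.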
Cumulative Frequency

 Cumulative frequency for an ungrouped frequency distribution is defined as “the total frequency
that is obtained by adding the frequency of each value to the frequencies of the preceding values”.
 Cumulative frequency for a grouped frequency distribution is defined as “the total frequency of
all classes less than the upper class boundary of a given class or the total frequency of all
classes greater than the lower class boundary of a given class”.

Cumulative Frequency Distribution

“A tabular form of the variable along with the cumulative frequencies is called cumulative
frequency distribution” e.g.

X       f    C.F
0       2    2
1       8    2 + 8 = 10
2       9    10 + 9 = 19
3       7    19 + 7 = 26
4       10   26 + 10 = 36
5       8    36 + 8 = 44
6       6    44 + 6 = 50
7       3    50 + 3 = 53
8       5    53 + 5 = 58
9       5    58 + 5 = 63
10      4    63 + 4 = 67
Total   67   --

Classes   Class boundaries   f    C.F
60 – 61   59.5 – 61.5        2    2
62 – 63   61.5 – 63.5        8    8 + 2 = 10
64 – 65   63.5 – 65.5        11   11 + 10 = 21
66 – 67   65.5 – 67.5        6    6 + 21 = 27
68 – 69   67.5 – 69.5        5    5 + 27 = 32
70 – 71   69.5 – 71.5        5    5 + 32 = 37
72 – 73   71.5 – 73.5        1    1 + 37 = 38
Total     --                 38   --
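A cumulative frequency column is a running total of the frequency column. A minimal Python sketch using `itertools.accumulate` on the class frequencies of the height distribution is given below; the variable names are assumptions.

```python
from itertools import accumulate

f = [2, 8, 11, 6, 5, 5, 1]        # class frequencies of the height distribution
cf = list(accumulate(f))          # running totals: each f added to all preceding ones

for freq, total_so_far in zip(f, cf):
    print(freq, total_so_far)
```

The last cumulative frequency always equals the total of the frequencies.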

A cumulative frequency distribution is of two types: the “less than” C.F distribution and the
“or more” C.F distribution.

“Less than” Cumulative Frequency Distribution

“A “less than” cumulative frequency distribution is one in which the cumulative frequency is
obtained as the total frequency of all classes less than the upper class boundary of a given class;
it starts with the lower class boundary of the first class, indicating that there is no frequency
below it”.

Classes   Class boundaries   f    Class boundaries in “less than” form   C.F
60 – 61   59.5 – 61.5        2    Less than 59.5                         0
62 – 63   61.5 – 63.5        8    Less than 61.5                         2
64 – 65   63.5 – 65.5        11   Less than 63.5                         2 + 8 = 10
66 – 67   65.5 – 67.5        6    Less than 65.5                         10 + 11 = 21
68 – 69   67.5 – 69.5        5    Less than 67.5                         21 + 6 = 27
70 – 71   69.5 – 71.5        5    Less than 69.5                         27 + 5 = 32
72 – 73   71.5 – 73.5        1    Less than 71.5                         32 + 5 = 37
--        --                 --   Less than 73.5                         37 + 1 = 38
Total     --                 38   --                                     --

“Or more” or “more than” Cumulative Frequency Distribution

“An “or more” cumulative frequency distribution is one in which the cumulative frequency is
obtained as the total frequency of all classes more than the lower class boundary of a given class;
it ends with the upper class boundary of the last class, indicating that there is no frequency
above it.”

Classes   Class boundaries   f    Class boundaries in “or more” form   C.F
60 – 61   59.5 – 61.5        2    59.5 or more                         38
62 – 63   61.5 – 63.5        8    61.5 or more                         38 – 2 = 36
64 – 65   63.5 – 65.5        11   63.5 or more                         36 – 8 = 28
66 – 67   65.5 – 67.5        6    65.5 or more                         28 – 11 = 17
68 – 69   67.5 – 69.5        5    67.5 or more                         17 – 6 = 11
70 – 71   69.5 – 71.5        5    69.5 or more                         11 – 5 = 6
72 – 73   71.5 – 73.5        1    71.5 or more                         6 – 5 = 1
--        --                 --   73.5 or more                         0
Total     --                 38   --                                   --

Whenever we refer to a cumulative frequency distribution without any qualification, we
always mean a “less than” type cumulative frequency distribution.
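Both types of cumulative frequency can be computed from the class boundaries and the class frequencies. The following Python sketch uses the height distribution; the variable names are assumptions made for illustration.

```python
from itertools import accumulate

boundaries = [59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5, 73.5]
f = [2, 8, 11, 6, 5, 5, 1]
total = sum(f)

# "Less than" C.F: nothing lies below the first boundary, so the column starts at 0.
less_than = [0] + list(accumulate(f))

# "Or more" C.F: whatever is not below a boundary lies at or above it.
or_more = [total - c for c in less_than]

for b, lt, om in zip(boundaries, less_than, or_more):
    print(f"Less than {b}: {lt:2d}    {b} or more: {om:2d}")
```

At every boundary the two cumulative frequencies add up to the total frequency, here 38.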

Relative Cumulative Frequency Distribution

“The cumulative frequency of a class divided by the total of frequencies is called the relative
cumulative frequency and a table showing relative cumulative frequencies is called the relative
cumulative frequency distribution”.

$$\text{R.C.F} = \frac{\text{cumulative frequency of a class}}{\text{total of frequencies of all classes}}$$

Percentage Relative Cumulative Frequency Distribution

“The cumulative frequency of a class divided by the total of frequencies and multiplied by 100, is
called the percentage relative cumulative frequency and a table showing percentage relative
cumulative frequencies is called the percentage relative cumulative frequency distribution”.

$$\text{P.R.C.F} = \frac{\text{cumulative frequency of a class}}{\text{total of frequencies of all classes}} \times 100$$

Classes   Class boundaries   f    C.F            R.C.F            P.R.C.F
60 – 61   59.5 – 61.5        2    2              2/38 = 0.0526    (2/38) x 100 = 5.2632
62 – 63   61.5 – 63.5        8    8 + 2 = 10     10/38 = 0.2632   (10/38) x 100 = 26.3158
64 – 65   63.5 – 65.5        11   11 + 10 = 21   21/38 = 0.5526   (21/38) x 100 = 55.2632
66 – 67   65.5 – 67.5        6    6 + 21 = 27    27/38 = 0.7105   (27/38) x 100 = 71.0526
68 – 69   67.5 – 69.5        5    5 + 27 = 32    32/38 = 0.8421   (32/38) x 100 = 84.2105
70 – 71   69.5 – 71.5        5    5 + 32 = 37    37/38 = 0.9737   (37/38) x 100 = 97.3684
72 – 73   71.5 – 73.5        1    1 + 37 = 38    38/38 = 1        (38/38) x 100 = 100
Total     --                 38   --             --               --

Diagrams (Charts)

Charts or diagrams give visual representations of qualitative data. Diagrams also show comparisons
between two or more sets of qualitative data. Diagrams should be clear and easy to read and understand.
Too much information should not be shown in the same diagram, otherwise it might become confusing.

 Bar Charts
 Pie Chart or Circle Diagram

Bar Charts

 Simple bar chart
 Multiple bar chart or cluster chart
 Component bar chart or sub-divided bar chart or stacked bar chart

Simple Bar Chart

This chart consists of vertical or horizontal bars of equal width. The length of the bars represents the
magnitude of the values of the variable i.e. the lengths of the bars vary depending on the size of data
values.

EXAMPLE 2.04

The following table gives the population of five different cities. Draw a simple bar chart:

Cities       A    B    C    D    E
Population   30   45   60   63   69

Solution

Step 1: Draw the X and Y axes and place the population on the Y axis.
Step 2: Draw the bars corresponding to the population.

[Figure: Bar chart of the Population. X-axis: Cities (A to E); Y-axis: Population (0 to 75).]

In 1786 William Playfair introduced the bar chart.

Test Yourself

The following table gives the population of six different cities. Draw a simple bar chart:

Cities       A    B    C    D    E    F
Population   50   60   70   80   90   100

Multiple Bars Chart

By multiple bars chart, two or more sets of inter-related data are represented. The technique of simple
bar chart is used to draw this chart but the difference is that we use different shades, colors or dots to
distinguish between different phenomena. Multiple bars chart facilitates comparison between more than
one phenomenon.

EXAMPLE 2.05

The following table gives the imports and exports of Pakistan for five months. Draw a
multiple bar chart:

Months     Imports   Exports
January    8         4
February   10        6
March      12        9
April      18        13
May        20        17

Solution

Step 1: Draw the X and Y axes and place the amount on the Y axis.
Step 2: For each month, draw two bars (one for the imports and one for the exports) side-by-side
corresponding to the amount.

[Figure: Multiple Bars Chart of Import and Export. X-axis: Months (Jan to May); Y-axis: Million (Rs.), 0 to 25.]

Test Yourself

The following table gives the imports and exports of Pakistan for five months.
Draw a multiple bar chart:

Months     Imports   Exports
January    9         5
February   13        9
March      15        8
April      14        16
May        23        12

EXAMPLE 2.06

Construct a Multiple Bar Chart to show the population of the cities given in the following table:

Population in thousand
City   May   Jun   July
A      70    110   200
B      80    90    160
C      90    100   120

Solution

Step 1: Draw the X and Y axes and place the population on the Y axis.
Step 2: For each month, draw three bars (for the three cities) side-by-side corresponding to
the populations.

[Figure: Multiple Bars Chart for city A, city B and city C. X-axis: Months (May, Jun, July); Y-axis: Population (0 to 250).]

Test Yourself

Construct a Multiple Bar Chart to show the population of the cities given in the following table:

Population in thousand
City   Jan   Feb   March
A      90    120   180
B      60    80    200
C      70    120   100

Sub-divided Bar Chart

A sub-divided bar chart is an effective technique in which each bar is sub-divided into two or more
parts. The component parts are shaded or colored differently to increase the overall effectiveness of the
diagram.

EXAMPLE 2.07

The following table represents the monthly development in the field of industry, transport and
agriculture of Pakistan. Construct a Sub-divided Bar Chart:

Months   Industry   Transport   Agriculture   Total
Jan      100        80          40            220
Feb      120        100         50            270
March    130        120         60            310

Solution

Step 1: Draw the X and Y axes and place the Development on the Y axis.
Step 2: For each month, draw one bar corresponding to the total development, then sub-divide
each bar for the agriculture, transport and industry by their corresponding
developments.

[Figure: Sub-divided Bars Chart for agriculture, transport and industry. X-axis: Months (Jan, Feb, March); Y-axis: Development (0 to 350).]

Test Yourself

The following table represents the monthly development in the field of industry, transport and
agriculture of Pakistan. Construct a Sub-divided Bar Chart:

Months   Industry   Transport   Agriculture   Total
May      120        90          50            260
June     140        110         70            320
July     150        100         40            290

Pie Chart or Circle Diagram

A pie-diagram, also known as sector or circle diagram, is a device consisting of a circle divided
into sectors or pie-shaped pieces whose areas are proportional to the various parts into which the
whole quantity is divided. The sectors are shaded or colored differently.

The procedure of constructing a pie chart is very simple:

 Draw a circle of some suitable radius.
 As a circle consists of 360°, the whole quantity to be displayed is equated to 360.
 Then divide the circle into different sectors by constructing angles at the center by means of a
protractor and draw the corresponding radii.
 The angles are calculated by the following formula:

$$\text{Angle} = \frac{\text{Component Part}}{\text{Whole Quantity}} \times 360^\circ$$

The earliest known pie chart, from 1801, is generally credited to William Playfair.

The following tools are used to draw the pie chart.

[Figure: tools used to draw a pie chart.]

How to draw a Pie chart?

[Figure: eight illustrated steps (Step 1 to Step 8) showing how to draw a pie chart.]

EXAMPLE 2.08

Draw a Pie-diagram for the following data:

Items      Expenditure in Rs.
Food       190
Clothing   64
Rent       100
Medical    46
Other      80

Solution

Step 1: To construct the Pie-diagram, first we find the angles, where
Angle = (Component Part / Whole Quantity) x 360°.

Items      Expenditure in Rs.   Angle
Food       190                  142.5
Clothing   64                   48
Rent       100                  75
Medical    46                   34.5
Other      80                   60
Total      480                  360

Step 2: Next, using a protractor and a compass, draw the graph using the appropriate
degrees found in Step 1, and label each section with the name.

[Figure: Pie Chart with sectors Food, Clothing, Rent, Medical and Others.]
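The angle calculation of Step 1 can be checked with a short Python sketch. The dictionary and variable names are assumptions made for illustration; the figures are those of Example 2.08.

```python
expenditure = {"Food": 190, "Clothing": 64, "Rent": 100, "Medical": 46, "Other": 80}

whole = sum(expenditure.values())           # whole quantity

# Angle = (component part / whole quantity) x 360 degrees.
angles = {item: part * 360 / whole for item, part in expenditure.items()}

for item, angle in angles.items():
    print(f"{item}: {angle} degrees")

# The angles of all sectors together must make the full circle.
print("Total:", sum(angles.values()))
```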

Test Yourself

Draw a Pie-diagram for the following data:

Items      Expenditure in Rs.
Food       160
Clothing   80
Rent       120
Medical    50
Other      90

Graphs

Graphs give visual representations of quantitative data. A graph consists of curves or straight lines.
Graphs provide a very good method of showing fluctuations and trends in statistical data. Graphs can
also be used to make predictions and forecasts.

 Histogram
 Frequency Polygon
 Frequency Curve
 Cumulative Frequency Polygon (Ogive)
 Graph of Ungrouped Frequency Distribution
 Graph of Time Series

The histogram, frequency polygon, frequency curve and cumulative frequency polygon are graphs of a
grouped frequency distribution.

Histogram

A histogram consists of a set of rectangles having bases on a horizontal axis i.e. X-axis (note that
these bases are marked off by class boundaries, not class limits) with centers at the class marks and
areas proportional to the class frequencies.

 If the widths of the classes are equal then the heights of the rectangles are also proportional
to the class frequencies and are taken numerically equal to class frequencies.
 If the widths of the classes are not equal then the heights of the rectangles have to be adjusted.

In 1891, Pearson introduced the “Histogram”.

First Method (Equal Class Width)

 Draw X-axis and Y-axis.
 Take class boundaries on X-axis and frequencies on Y-axis.
 Construct joint rectangles. The resulting figure is the required histogram.

EXAMPLE 2.09

Construct Histogram from the following frequency distribution:

Classes     40-49   50-59   60-69   70-79   80-89   90-99   100-109
Frequency   1       3       4       5       4       2       1

Solution To draw a Histogram we proceed with the following steps:

Step 1: Find class-boundaries.
Step 2: Mark class-boundaries along the x-axis and the frequencies along y-axis.
Step 3: Construct rectangles having width proportional to class widths and heights
proportional to class frequencies.
Step 4: The resulting graph will be the Histogram as given below.

Classes   f   Class-boundaries
40-49     1   39.5-49.5
50-59     3   49.5-59.5
60-69     4   59.5-69.5
70-79     5   69.5-79.5
80-89     4   79.5-89.5
90-99     2   89.5-99.5
100-109   1   99.5-109.5

[Figure: Histogram with equal class width. X-axis: class boundaries (39.5 to 109.5); Y-axis: Frequency (0 to 6).]

Test Yourself

Construct Histogram from the following frequency distribution:

Classes     30-39   40-49   50-59   60-69   70-79   80-89   90-99
Frequency   2       4       5       7       4       3       2

Second Method (Unequal Class Width)

 Draw X-axis and Y-axis.


 Take class boundaries on X-axis and adjusted frequencies on Y-axis.
(Frequencies are adjusted by dividing them by their respective class width)
 Construct joint rectangles. The resulting figure is the required histogram.

EXAMPLE 2.10

Construct Histogram from the following frequency distribution:

classes   40-49   50-53   54-64   65-79   80-89   90-99   100-109
f           10      12      44      75      40      20       10

Solution To draw a Histogram we proceed with the following steps:

Step 1: Find class-boundaries and adjusted frequencies.
Step 2: Mark class-boundaries along the x-axis and the adjusted frequencies along y-axis.
Step 3: Construct rectangles having width proportional to class-width and heights
proportional to class adjusted frequencies.
Step 4: The resulting graph will be the Histogram as given below.

Class-boundaries   Class width (h)   Adjusted frequency (f/h)
39.5-49.5                10                    1
49.5-53.5                 4                    3
53.5-64.5                11                    4
64.5-79.5                15                    5
79.5-89.5                10                    4
89.5-99.5                10                    2
99.5-109.5               10                    1

[Figure: Histogram with unequal class width. X-axis: class boundaries (39.5, 49.5, 53.5, 64.5, 79.5, 89.5, 99.5, 109.5); Y-axis: adjusted frequency.]
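The adjustment in Step 1 (dividing each frequency by its class width) can be sketched as follows. The function name `adjusted_frequencies` is hypothetical and the boundaries are typed in by hand from Example 2.10.

```python
def adjusted_frequencies(boundaries, freqs):
    """Divide each class frequency by its class width h = upper - lower boundary."""
    adjusted = []
    for (lower, upper), f in zip(boundaries, freqs):
        h = upper - lower          # class width
        adjusted.append(f / h)     # height of the rectangle for this class
    return adjusted


boundaries = [(39.5, 49.5), (49.5, 53.5), (53.5, 64.5), (64.5, 79.5),
              (79.5, 89.5), (89.5, 99.5), (99.5, 109.5)]
freqs = [10, 12, 44, 75, 40, 20, 10]
print(adjusted_frequencies(boundaries, freqs))
```

The adjusted frequencies come out as 1, 3, 4, 5, 4, 2, 1, matching the last column of the table in Example 2.10.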

Test Yourself

Construct Histogram from the following frequency distribution:

classes 20-29 30-33 34-44 45-59 60-69 70-79 80-89


f 12 15 48 80 30 25 15


Frequency Polygon

A frequency polygon is a many sided closed figure. It is constructed by plotting the class frequencies
against their corresponding class marks (mid-points) and then joining the resulting points by means of
straight lines. It can also be obtained by joining the mid-points of the tops of rectangles in the
histograms.

Method

• Draw X-axis and Y-axis.
• Take class marks on X-axis and frequencies on Y-axis.
• Join the points by means of straight lines. The resulting figure is the required
frequency polygon.

In this method the ends of the graph do not meet the X-axis, and we know that a polygon
is a many-sided closed figure. We may therefore add extra classes with zero frequencies
at both ends of the frequency distribution. By doing so the polygon forms a closed figure.

EXAMPLE 2.11
Construct Frequency Polygon from the following frequency distribution:

Classes      10-19   20-29   30-39   40-49   50-59
Frequency      5      15      40      20      10

Solution To draw a Frequency Polygon we proceed with the following steps:

Step 1: Find class-marks (mid-points).

Classes Frequency Mid-points


10-19 5 14.5
20-29 15 24.5
30-39 40 34.5
40-49 20 44.5
50-59 10 54.5
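As a hedged sketch of Step 1 and of the "extra classes with zero frequencies" idea described earlier, the points of the closed polygon can be computed as below. The function name `polygon_points` is hypothetical, and the sketch assumes equal class widths.

```python
def polygon_points(limits, freqs):
    """Return the (mid-point, frequency) vertices of a closed frequency polygon."""
    # Class mark = (lower limit + upper limit) / 2
    mids = [(lower + upper) / 2 for lower, upper in limits]
    width = mids[1] - mids[0]  # distance between class marks (equal widths assumed)
    # Add one extra class with zero frequency at each end so the polygon closes.
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))


limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [5, 15, 40, 20, 10]
for x, y in polygon_points(limits, freqs):
    print(x, y)
```

For Example 2.11 the polygon starts at (4.5, 0) and ends at (64.5, 0), which is why the mid-point axis of the figure runs from 4.5 to 64.5.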


Step 2: Mark mid-points along the x-axis and the frequencies along y-axis.
Step 3: Place a dot against each mid-point with respect to its class frequency.
Step 4: Join the dots by straight lines to get Frequency Polygon as given below.

[Figure: Frequency Polygon. X-axis: mid points (4.5 to 64.5); Y-axis: frequency.]

We can also draw a frequency polygon by the following method:

• Draw a histogram.
• Join the mid-points of the tops of the rectangles by straight lines. The resulting
figure is the required frequency polygon.

[Figure: Frequency Polygon constructed from the histogram. X-axis: mid points (4.5 to 64.5); Y-axis: frequency.]


Test Yourself

Construct Frequency Polygon from the following frequency distribution:

Classes 20-29 30-39 40-49 50-59 60-69


Frequency 7 13 42 23 6


Frequency Curve

When the frequency polygon is smoothed out as a curve, it becomes a frequency curve. In other words,
when the frequencies are plotted against the mid-points, a smooth curve passing through these points is
called a frequency curve.

Method

• Draw X-axis and Y-axis.
• Take class marks on X-axis and frequencies on Y-axis.
• Plot the frequencies against the class marks.
• Join the plotted points by a smooth curve, which gives the frequency curve.

EXAMPLE 2.12

Construct Frequency Curve from the following frequency distribution:


Classes      10-19   20-29   30-39   40-49   50-59
Frequency      5      15      40      20      10

Solution To draw a Frequency Curve we proceed with the following steps:

Step 1: Find class-marks (mid-points).

Classes   Frequency   Mid-points
10-19         5          14.5
20-29        15          24.5
30-39        40          34.5
40-49        20          44.5
50-59        10          54.5

Note: The smoothed curve should pass above the highest points of the polygon.

Step 2: Mark mid-points along the x-axis and the frequencies along y-axis.
Step 3: Place a dot against each mid-point with respect to its class frequency.
Step 4: Join the dots by smooth line to get Frequency Curve as given below.


[Figure: Frequency Curve. X-axis: mid points (4.5 to 64.5); Y-axis: frequency.]

We can also draw a frequency curve by the following method:


• Draw X-axis and Y-axis.
• Draw a histogram.
• Draw a smooth curve through the tops of the rectangles. The resulting figure is the
required frequency curve.

[Figure: Frequency Curve constructed from the histogram. X-axis: mid points (4.5 to 64.5); Y-axis: frequency.]


Test Yourself

Construct Frequency Curve from the following frequency distribution:

Classes 20-29 30-39 40-49 50-59 60-69


Frequency 6 16 45 18 8


Cumulative Frequency Polygon (Ogive)

When a curve is based on cumulative frequencies then it is called an ogive.

An ogive is of two types: Less than Type and More than Type.

Less than Type

Method

• First calculate the cumulative frequencies.
• Take upper class boundaries on X-axis and the cumulative frequencies on Y-axis.
• Plot the cumulative frequencies against the upper class boundaries.
• Join the plotted points by straight lines. The resulting figure is the required less
than cumulative frequency polygon or less than ogive.

EXAMPLE 2.13
Construct Less Than Cumulative Frequency Polygon (Ogive) from the following frequency
distribution:

Classes      10-19   20-29   30-39   40-49   50-59
Frequency      5      25      45      15      10

Solution To draw a Cumulative Frequency Polygon (Ogive) we proceed with the following steps:

Step 1: Find class-boundaries and cumulative frequencies.

Classes   Frequency   Cumulative Frequency   Class boundaries
10-19         5                5                9.5-19.5
20-29        25               30               19.5-29.5
30-39        45               75               29.5-39.5
40-49        15               90               39.5-49.5
50-59        10              100               49.5-59.5
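The table in Step 1 can be reproduced with a running total. This is a minimal sketch; the function name `less_than_ogive` is hypothetical and the class boundaries are entered by hand.

```python
from itertools import accumulate


def less_than_ogive(boundaries, freqs):
    """Pair each upper class boundary with the 'less than' cumulative frequency."""
    cumulative = list(accumulate(freqs))          # running total of the frequencies
    uppers = [upper for _, upper in boundaries]   # upper class boundaries
    return list(zip(uppers, cumulative))


boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5)]
freqs = [5, 25, 45, 15, 10]
for point in less_than_ogive(boundaries, freqs):
    print(point)
```

These pairs are the points that are plotted and joined by straight lines to obtain the less than ogive.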


Step 2: Mark upper class-boundaries along the x-axis and cumulative frequencies along
y-axis.
Step 3: Place a dot against each upper class-boundary with respect to its class cumulative
frequency.
Step 4: Join the dots by straight line to get Cumulative Frequency Polygon (Ogive) as
given below.

[Figure: Less than Ogive. X-axis: upper class boundaries (19.5 to 59.5); Y-axis: cumulative frequency (0 to 120).]

More than Type


Method

• First calculate the cumulative frequencies.
• Take lower class boundaries on X-axis and the cumulative frequencies on Y-axis.
• Plot the cumulative frequencies against the lower class boundaries.
• Join the plotted points by straight lines. The resulting figure is the required more
than cumulative frequency polygon or more than ogive.

Note: If we join the points in a cumulative frequency polygon by a smooth line, we get a
smoothed ogive.


EXAMPLE 2.14

Construct More Than Cumulative Frequency Polygon (Ogive) from the following frequency
distribution:
Classes 10-19 20-29 30-39 40-49 50-59
Frequency 5 25 45 15 10

Solution To draw a Cumulative Frequency Polygon (Ogive) we proceed with the following steps:

Step 1: Find class-boundaries and cumulative frequencies.

Classes   Frequency   Cumulative Frequency   Class boundaries
10-19         5              100                9.5-19.5
20-29        25               95               19.5-29.5
30-39        45               70               29.5-39.5
40-49        15               25               39.5-49.5
50-59        10               10               49.5-59.5
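For the more than type, the cumulative frequency of a class counts that class and every class above it. A minimal sketch, with the hypothetical function name `more_than_ogive` and hand-entered boundaries:

```python
def more_than_ogive(boundaries, freqs):
    """Pair each lower class boundary with the 'more than' cumulative frequency."""
    remaining = sum(freqs)   # all observations lie above the first lower boundary
    points = []
    for (lower, _), f in zip(boundaries, freqs):
        points.append((lower, remaining))
        remaining -= f       # drop this class before moving to the next boundary
    return points


boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5)]
freqs = [5, 25, 45, 15, 10]
for point in more_than_ogive(boundaries, freqs):
    print(point)
```

The cumulative frequencies 100, 95, 70, 25, 10 agree with the table in Example 2.14.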

Step 2: Mark lower class-boundaries along the x-axis and cumulative frequencies along
y-axis.
Step 3: Place a dot against each lower class-boundary with respect to its class cumulative
frequency.
Step 4: Join the dots by straight lines to get Cumulative Frequency Polygon (Ogive) as
given below.

[Figure: More than Ogive. X-axis: lower class boundaries; Y-axis: cumulative frequency (0 to 120).]


Test Yourself

Construct Both More Than and Less Than Cumulative Frequency Polygon (Ogive) from the
following frequency distribution:

Classes 30-39 40-49 50-59 60-69 70-79


Frequency 4 13 43 26 12


Graph of Ungrouped Frequency Distribution

A vertical lines graph is a visual representation of an ungrouped frequency distribution. It consists of a
set of vertical lines that are perpendicular to the X-axis and intersect the X-axis at the values of the
discrete variable. The height of each line is proportional to its frequency.

First Method

• Draw X-axis and Y-axis.
• Take the values of the discrete variable on X-axis and frequencies on Y-axis.
• Draw vertical lines for each value of the variable such that the height of each line
is proportional to its frequency.

EXAMPLE 2.15
Construct vertical lines graph for the following data:

X   3   4   5   6   7   8   9
f   2   3   7   9   8   3   5

Solution

Step 1: Draw X-axis and Y-axis.
Step 2: Take the variable "X" on X-axis and frequencies along Y-axis.
Step 3: Draw vertical lines for each value of "X" with height equal to its frequency.

Note: For a discrete variable, if we make a Histogram we first find class boundaries. These
class boundaries are called fictitious class boundaries because the discrete variable
cannot assume such values.

[Figure: Vertical Lines Graph and Histogram, side by side. X-axis: X (3 to 9); Y-axis: f (0 to 10).]


Test Yourself

Construct vertical lines graph and the histogram for the following data:

X 30 40 50 60 70 80 90
Frequency 4 6 7 13 12 5 1


Graph of Time Series

A curve showing changes in the value of one or more items from one period of time to the next is known
as the graph of time series. Thus a Graph of time series displays the variations in time series dealing
with prices, production, imports, population etc.

Method

• Draw X-axis and Y-axis.
• Take time (years, months, weeks, etc.) along X-axis and the corresponding values along Y-axis.
• Plot the various points. Join the plotted points by straight lines. The resulting figure
is the required graph of time series.

Note: Do not try to fit a smooth curve through the data points.

EXAMPLE 2.16
Construct Graph from the following Time Series:

Year     1995   1996   1997   1998   1999   2000   2001   2002
Values     5      7     10      8      2     11     12     10

Solution

Step 1: Draw X-axis and Y-axis.
Step 2: Take time along X-axis and the corresponding values along Y-axis.
Step 3: Plot the various points. Join the plotted points by straight lines. The resulting
figure is the required graph of time series.

[Figure: Graph of Time Series. X-axis: Years (1995 to 2002); Y-axis: Values (0 to 14).]

Note: In 1786 William Playfair invented the line graph.


Test Yourself

Construct Graph from the following Time Series:

Year 2000 2001 2002 2003 2004 2005 2006 2007


Values 8 9 15 12 10 9 11 7


False Base Line or the broken line

In all the above graphs and diagrams, if the horizontal scale is started from zero, it would be
difficult to accommodate the whole data on the graph paper, and the graph would be pushed to the
right of the paper. In order to avoid this, a false base line is used. With a false base line, instead
of showing the entire horizontal scale from zero to the highest value involved, only that portion of
the scale is shown which serves the purpose. Thus the portion of the scale from zero to the minimum
value is omitted.

[Figure: Graph of Time Series with a false base line. X-axis: Years (1995 to 2002); Y-axis: Values (0 to 14).]

Note: The broken line has been used along the horizontal line to indicate that we are not
showing the numbers between 0 and 1995.


The Difference between Bar Charts and Histograms

• Here is the main difference between bar charts and histograms. With bar charts,
each column represents a group defined by a categorical variable; and with
histograms, each column represents a group defined by a quantitative variable.

• It is always appropriate to talk about the skewness of a histogram; that is, the
tendency of the observations to fall more on the low end or the high end of the X
axis.

• With bar charts, however, the X axis does not have a low end or a high end, because
the labels on the X axis are categorical, not quantitative. As a result, it is less
appropriate to comment on the skewness of a bar chart.


Following differences may be noted between diagrams and graphs.

 In the construction of a graph, graph paper is used. A graph helps to study the
mathematical relation between two variables such as price and demand; income and
consumption, time and population etc. On the other hand, diagrams are generally
constructed on a plain paper. A diagram is used for sake of comparison but not for
studying the relation between two variables.

• Graphs are more precise and accurate than diagrams. They are more helpful to a
researcher for studying the relationship between two variables and for further
statistical analysis and interpretation. Diagrams furnish only approximate
information on the problem under study. These are not of much use to a researcher
for further analysis.
• Graphs are used to present time series data and frequency distributions. Diagrams
are useful in presenting qualitative data. Presentation of data through graphs is
easier than through diagrams.

Sharpen your Pencil


MCQ’s

(1) Class boundaries of 2.5 – 3.5 is ______

(A) 2.45 – 3.55 (B) 2.4 – 3.6 (C) 20 – 40 (D) None of these

(2) Class boundaries of 2 – 5 is ______

(A) 2.45 – 5.55 (B) 2.5 – 5.5 (C) 1.5 – 5.5 (D) None of these

(3) Class Mark of 2 – 5 is ______

(A) 4.5 (B) 3.5 (C) 5.5 (D) None of these

(4) Class width of 2 – 3, 4 – 5, 6 – 7 is ______

(A) 1 (B) 2 (C) 1.5 (D) None of these

(5) Class width of 2.45 – 5.45 is ______

(A) 4 (B) 5 (C) 3 (D) None of these

(6) Class boundaries of 2.05 – 3.05 is ______

(A) 2.045 – 3.055 (B) 2.045 – 3.065 (C) 2 – 3 (D) None of these

(7) For construction of an ogive we find _____ frequencies.

(A) Relative (B) Cumulative (C) Percentage (D) None of these

(8) For relative frequency we divide the class frequency by _____

(A) 100 (B) Sum of frequencies (C) Cumulative Frequency (D) None of these

(9) The Graph of adjacent rectangles is called _____

(A) Frequency curve (B) Histogram (C) Ogive (D) None of these


(10) For construction of more than ogive we draw _____ on x-axis.

(A) Class boundaries (B) Upper class boundaries (C) Lower class boundaries (D) None of these

(11) For relative frequency we divide the class frequency by _____

(A) 100 (B) Sum of frequencies (C) Cumulative Frequency (D) None of these

(12) Number of classes is equal to the range divided by _____

(A) 100 (B) class interval (C) Mid-point (D) None of these

(13) For construction of less than Ogive we draw _____ on x-axis

(A) Class boundaries (B) Upper class boundaries (C) Lower class boundaries (D) None of these

(14) The process of arranging data into rows and columns is called _____

(A) Classification (B) Tabulation (C) Both A & B (D) None of these

(15) A frequency curve is _____

(A) Horizontal line (B) Straight line (C) Smoothed graph (D) None of these

(16) An ogive is of _____

(A) 4 types (B) 3 types (C) 2 types (D) None of these

(17) Data classified by attributes is called _____

(A) Quantitative data (B) Qualitative data (C) Time series data (D) None of these


Short Questions
ExeRciSe

Q.2.01. Define Primary and Secondary data.

Q.2.02. Define Classification and Tabulation.

Q.2.03. What is frequency distribution?

Q.2.04. What are the methods for collecting primary data?

Q.2.05. Draw a Pie Chart and Bar Chart for the following data:

Items         Food   Clothing   Bills   Rent   Misc.
Expenditure    60       10        12     10      8

Q.2.06. What points should be considered in construction of frequency distribution?

Q.2.07. Represent the following frequency distribution by frequency polygon:

Classes   1-3   4-6   7-9   10-12   13-15
f          1     2    10      5       3

Q.2.08. Calculate cumulative frequencies and relative frequencies for the following data:

Classes   1-3   4-6   7-9   10-12   13-15
f          1     2    10      5       3

Q.2.09. Explain the use of Pie chart in presenting statistical data.

Q.2.10. Draw a sub divided bar chart for the following data:

Month   Wheat   Maize   Rice
March     80     110     90
April    100      80     90
May      120     130     60

Q.2.11. What is Frequency Curve? Draw a frequency curve from the following data:

Classes   10-19   20-29   30-39   40-49   50-59   60-99
f           8      15      30      21      12       5


Long Questions
ExeRciSe

Q.2.01. Construct a frequency distribution using class-intervals of 5. Indicate the
class-boundaries and class-limits clearly:

79.4 71.6 95.5 73.0 74.2 81.8 90.6 72.1
55.9 75.2 81.9 68.9 74.2 80.7 65.7 71.6
67.6 82.9 88.1 77.8 69.4 83.2 82.7 59.4
73.8 64.2 63.9 58.3 48.6 83.5 70.8 77.6

Q.2.02. Classify the data taking class intervals of size (Width) one:

1 3 4 5 6 2 3 4 6 3
5 4 2 8 4 2 4 5 3 5
4 5 1 0 2 3 4 5 3 4
3 2 5 5 3 5 4 6 5 6
3 4 6 1 2 4 4 3 4 5
3 7 4 6 3 4 5 7 7 4

Q.2.03. Construct a frequency table for the following data by using 0.02 as the width of
the class-intervals.

0.27 0.22 0.20 0.24 0.26 0.27 0.28 0.28 0.27 0.29
0.27 0.27 0.27 0.27 0.22 0.24 0.26 0.27 0.29 0.29
0.30 0.35 0.33 0.26 0.31 0.26 0.30 0.27 0.31 0.32
0.26 0.23 0.24 0.22 0.23 0.25 0.23 0.25 0.27 0.30

Q.2.04. The following responses were obtained when 49 randomly selected residents of a
small city were asked the question “How safe do you think your neighborhood is
for kids?”

very, very, not sure, very, not very, not sure, not at all
very, not sure, somewhat, very, not at all, not very, not very
very, very, very, not very, somewhat, somewhat, very
very, very, not sure, not at all, not very, very, not very
very, not sure, very, not very, very, very, very
not very, somewhat, somewhat, somewhat, very, very, very
not very, not at all, very, somewhat, very, somewhat, very

Construct a frequency distribution, Bar chart and Pie chart for this data.


Q.2.05. Construct a frequency table for the following data by using 10 as the width of the
class-intervals.

76 70 54 70 104 58 88 94 89 57
86 62 58 73 103 90 84 90 88 59
84 63 65 72 101 56 87 92 60 87
83 69 57 71 102 57 83 93 61 86

Also construct a Histogram, Frequency Polygon and Frequency Curve.

Q.2.06. Find Relative Frequencies and Percentage Relative Frequencies for the Data in
Question 2.04
CHAPTER 03
Measures of Central Tendency

Chapter Contents

You should read this chapter if you need to learn about:

• Types of Measures of Central Tendency: (P82)
• Arithmetic Mean: (P82-P86)
• Properties of Arithmetic Mean: (P87-P88)
• Change of Origin and Scale: (P89-P90)
• Weighted Arithmetic Mean: (P91-P92)
• Geometric Mean: (P93-P95)
• Harmonic Mean: (P96-P98)
• Relationship between Arithmetic Mean, Geometric Mean and Harmonic Mean: (P99)
• Mode: (P100-P103)
• Median: (P104-P109)
• Symmetrical Distribution: (P110)
• Empirical Relation between Mean, Median and Mode: (P111)
• Quartiles, Deciles and Percentiles: (P111-P118)
• Main Objects of Averages: (P118)
• Requisites (desirable qualities) of a Good Average: (P118)
• Uses of Averages in Different Situations: (P119)
• Prove that: $\sum (x_i - \bar{x})^2 < \sum (x_i - A)^2$: (P119)
• Exercise: (P120-P126)

Chapter 03 Measures of Central Tendency

Usually, when two or more data sets are to be compared, it is necessary to condense
the data. For comparison, however, condensing a data set into a frequency
distribution and presenting it visually are not enough. It is then necessary to
summarize the data set in a single value. Such a value usually lies somewhere in the
center and represents the entire data set, and hence it is called a measure of central
tendency or an average. Since a measure of central tendency (i.e. an average)
indicates the location or the general position of the distribution on the X-axis, it is
also known as a measure of location or position.

Types of Measure of Central Tendency

The types of measures of central tendency are:

• Arithmetic Mean
• Geometric Mean
• Harmonic Mean
• Median
• Mode

Arithmetic Mean or Simply Mean


“A value obtained by dividing the sum of all the observations by the number of
observations is called arithmetic mean”

$\text{Mean} = \dfrac{\text{Sum of All the Observations}}{\text{Number of Observations}}$

The mean is that central point where the sum of the negative deviations (absolute value)
from the mean and the sum of the positive deviations from the mean are equal. This is why
the mean is considered a measure of central tendency.


Methods of Finding Arithmetic Mean

The methods are: Direct Method; Short-cut Method; Step-deviation Method (also called
Coding Method or Short Method).

Method                   Ungrouped data                                Grouped data
Direct Method            $\bar{x} = \frac{\sum x_i}{n}$                $\bar{x} = \frac{\sum fx}{n}$
Short-cut Method         $\bar{x} = A + \frac{\sum D}{n}$              $\bar{x} = A + \frac{\sum fD}{n}$
Step-deviation Method    $\bar{x} = A + \frac{\sum u}{n} \times h$     $\bar{x} = A + \frac{\sum fu}{n} \times h$

For grouped data, $n = \sum f$.
In the Short-cut Method, $D = X_i - A$ and $A$ is the provisional or assumed mean.
In the Step-deviation Method, $u = \frac{X_i - A}{h}$ and $h$ is the common width of the class intervals.

EXAMPLE 3.01
Find A.M from the following data: (ungrouped data)

2, 4, 6, 8, 10

Solution

Direct Method:

X
2
4
6
8
10
Total 30

$\bar{x} = \frac{\sum x_i}{n} = \frac{30}{5} = 6.0$

Note: The Arithmetic mean is simply called Mean. We denote the Mean by $\bar{x}$ (read as “X Bar”).


Short-cut Method:

X        D = Xi - A
2           -2
4            0
6            2
8            4
10           6
Total 30    10

$\bar{x} = A + \frac{\sum D}{n} = 4 + \frac{10}{5} = 6.0$ (Let A = 4)

Step-deviation Method:

X        u = (Xi - A)/h
2           -3
4           -2
6           -1
8            0
10           1
Total 30    -5

$\bar{x} = A + \frac{\sum u}{n} \times h = 8 + \frac{(-5)}{5} \times 2 = 6.0$ (Here h = 2 and let A = 8)

Note: When computing the mean, round it off to one more decimal place than the original
data values. For example, if the data are given in whole numbers, then the mean should be
rounded off to the nearest tenth. If the data are given in tenths, then the mean should be
rounded off to the nearest hundredth, and so on.
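The three methods of Example 3.01 can be checked against each other with a short sketch. The function names below are hypothetical; the assumed mean A and the divisor h are passed in as arguments, using the same values chosen in the example.

```python
def mean_direct(xs):
    """Direct method: sum of observations divided by their number."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, A):
    """Short-cut method: A + (sum of D) / n, where D = x - A."""
    return A + sum(x - A for x in xs) / len(xs)


def mean_step_deviation(xs, A, h):
    """Step-deviation method: A + (sum of u) / n * h, where u = (x - A) / h."""
    return A + sum((x - A) / h for x in xs) / len(xs) * h


data = [2, 4, 6, 8, 10]
print(mean_direct(data))                 # direct method
print(mean_shortcut(data, A=4))          # short-cut method with A = 4
print(mean_step_deviation(data, 8, 2))   # step-deviation method with A = 8, h = 2
```

All three calls give 6.0: the choice of A and h changes the arithmetic, not the answer.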

EXAMPLE 3.02

Find A.M from the following data: (Discrete Grouped data)

X   10   15   20   25   30
f    1    2    3    2    1

Solution

Direct Method:

X       f     fX
10      1     10
15      2     30
20      3     60
25      2     50
30      1     30
Total   9    180

$\bar{x} = \frac{\sum fx}{n} = \frac{180}{9} = 20.0$ (Here $n = \sum f = 9$)

Note: In grouped data the number of observations “n” is equal to $\sum f$.


Short-cut Method:

X       f     D = Xi - A     fD
10      1        -10        -10
15      2         -5        -10
20      3          0          0
25      2          5         10
30      1         10         10
Total   9         --          0

$\bar{x} = A + \frac{\sum fD}{n} = 20 + \frac{0}{9} = 20.0$ (Here A = 20 and $n = \sum f = 9$)

Step-deviation Method:

X       f     u = (Xi - A)/h     fu
10      1          -2            -2
15      2          -1            -2
20      3           0             0
25      2           1             2
30      1           2             2
Total   9          --             0

$\bar{x} = A + \frac{\sum fu}{n} \times h = 20 + \frac{0}{9} \times 5 = 20.0$ (Here A = 20, h = 5 and $n = \sum f = 9$)

EXAMPLE 3.03
Find A.M from the following data: (Continuous Grouped data)
Weight   11-20   21-30   31-40   41-50   51-60
f          1       2       3       2       1

Solution

Direct Method:

Weight   f    X (mid points)     fX
11-20    1        15.5          15.5
21-30    2        25.5          51.0
31-40    3        35.5         106.5
41-50    2        45.5          91.0
51-60    1        55.5          55.5
Total    9         --          319.5

$\bar{x} = \frac{\sum fx}{n} = \frac{319.50}{9} = 35.50$ (Here $n = \sum f = 9$)


Short-cut Method:

Weight   f    X (mid points)   D = Xi - A    fD
11-20    1        15.5            -20        -20
21-30    2        25.5            -10        -20
31-40    3        35.5              0          0
41-50    2        45.5             10         20
51-60    1        55.5             20         20
Total    9         --              --          0

$\bar{x} = A + \frac{\sum fD}{n} = 35.5 + \frac{0}{9} = 35.50$ (Here A = 35.5 and $n = \sum f = 9$)

Step-deviation Method:

Weight   f    X (mid points)   u = (Xi - A)/h    fu
11-20    1        15.5              -2           -2
21-30    2        25.5              -1           -2
31-40    3        35.5               0            0
41-50    2        45.5               1            2
51-60    1        55.5               2            2
Total    9         --               --            0

$\bar{x} = A + \frac{\sum fu}{n} \times h = 35.5 + \frac{0}{9} \times 10 = 35.50$ (Here A = 35.5, h = 10 and $n = \sum f = 9$)

Note: The concept of Arithmetic Mean was first used by Greek astronomers in the third
century BC, but in 1755 Thomas Simpson officially proposed the use of the Arithmetic Mean.
Test Yourself

Find the A.M from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15

2)
X   20   25   30   35   40
f    2    4    9    3    1

3)
Weight   21-30   31-40   41-50   51-60   61-70
f          1       3       5       4       2

Note: To find the Mean of the population use the following formula:
$\mu = \frac{\sum x}{N}$ (here $\mu$ is read as “mu”)

Properties of Arithmetic Mean

The following are the properties of arithmetic mean:

• The mean of a constant is the same constant. For example, for the values 4, 4, 4, 4, 4:

$\bar{x} = \frac{\sum x_i}{n} = \frac{4 + 4 + 4 + 4 + 4}{5} = \frac{20}{5} = 4.0$

• The sum of deviations from the mean is equal to zero, i.e. $\sum (X_i - \bar{X}) = 0$.
For example, for the values 2, 4, 6, 8, 10:

X        (Xi - X̄)
2           -4
4           -2
6            0
8            2
10           4
Total 30     0

$\bar{x} = \frac{\sum x_i}{n} = \frac{30}{5} = 6.0$ and $\sum (X_i - \bar{X}) = 0$

• The sum of squared deviations from the mean is smaller than the sum of squared deviations
from any arbitrary value or provisional mean, i.e. $\sum (x_i - \bar{x})^2 < \sum (x_i - A)^2$.
For example, for the values 2, 4, 6, 8, 10 with $\bar{x} = \frac{30}{5} = 6.0$ and A = 4:

X        (Xi - X̄)   (Xi - X̄)²   (Xi - A)   (Xi - A)²
2           -4          16          -2          4
4           -2           4           0          0
6            0           0           2          4
8            2           4           4         16
10           4          16           6         36
Total 30    --          40          --         60

$\sum (X_i - \bar{X})^2 = 40 < \sum (X_i - A)^2 = 60$

• The arithmetic mean is affected by the change of origin and scale, i.e. when a constant is
added to or subtracted from each value of a variable, or if each value of a variable is
multiplied or divided by a constant, then the arithmetic mean is affected by these changes.

Variable        Mean
$X_i$           $\bar{X}$
$X_i \pm a$     $\bar{X} \pm a$
$a X_i$         $a \bar{X}$
$X_i / a$       $\bar{X} / a$

For example, let $Y_i = 2X_i + 3$ (a = 3, b = 2):

X        Y = 2X + 3
2            7
4           11
6           15
8           19
10          23
Total 30    75

$\bar{x} = \frac{\sum x_i}{n} = \frac{30}{5} = 6.0$

Now $\bar{Y} = \frac{\sum Y_i}{n} = \frac{75}{5} = 15.0$, therefore $\bar{Y} = b\bar{X} + a = (2)(6) + 3 = 15.0$

• If k subgroups consist of $n_1, n_2, \ldots, n_k$ observations having their respective means
$\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, then the mean of all the data, or combined mean, is
denoted by $\bar{x}_c$ and is defined by:

$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$

For example, if three sections of a statistics class containing 28, 32 and 35 students averaged 83,
80 and 76 respectively on the same final examination, then the combined mean for all 3
sections is:

$n_1 = 28$ ; $\bar{x}_1 = 83$
$n_2 = 32$ ; $\bar{x}_2 = 80$
$n_3 = 35$ ; $\bar{x}_3 = 76$

$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{(28)(83) + (32)(80) + (35)(76)}{28 + 32 + 35} = 79.4$
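The properties listed above can be checked numerically. The sketch below is only an illustration using the same figures as the worked examples; the helper names `mean` and `combined_mean` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def combined_mean(sizes, means):
    """Combined mean: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


x = [2, 4, 6, 8, 10]
x_bar = mean(x)

# Property: the deviations from the mean add up to zero.
deviation_sum = sum(xi - x_bar for xi in x)

# Property: squared deviations are smallest when taken from the mean (here A = 4).
ss_about_mean = sum((xi - x_bar) ** 2 for xi in x)
ss_about_A = sum((xi - 4) ** 2 for xi in x)

# Property: for Y = 2X + 3 the mean becomes 2 * x_bar + 3.
y_bar = mean([2 * xi + 3 for xi in x])

# Property: combined mean of the three class sections.
x_c = combined_mean([28, 32, 35], [83, 80, 76])

print(deviation_sum, ss_about_mean, ss_about_A, y_bar, round(x_c, 1))
```

The printed values correspond to 0, 40, 60, 15 and 79.4 in the worked examples.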


Change of Origin

If we add a constant to each value of a variable or subtract a constant from each value of a variable,
then this is called as change of origin. The arithmetic mean is affected by these changes but the
standard deviation (will be discussed in Chapter 04) is independent of these changes. For example:

Old Variable           New Variable
X        X²            Y = X + 3     Y²
0         0                3           9
3         9                6          36
6        36                9          81
9        81               12         144
12      144               15         225
Total 30  270             45         495

$\text{Mean}(x) = \frac{\sum x}{n} = \frac{30}{5} = 6$ and $\text{Mean}(y) = \frac{\sum y}{n} = \frac{45}{5} = 9$

$S.D(x) = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{270}{5} - \left(\frac{30}{5}\right)^2} = 4.2$

$S.D(y) = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{495}{5} - \left(\frac{45}{5}\right)^2} = 4.2$

The following figure illustrates the idea of change of origin:

[Figure: Two number lines. The original values 0, 3, 6, 9, 12 have Mean = 6 and S.D = 4.2; the shifted values 3, 6, 9, 12, 15 have New Mean = 9 and New S.D = 4.2. Caption: "I have just changed my position on the X-axis."]

It is now clear that if we change the origin by adding "3" to each value of the variable, then the
A.M will be affected by this change but the S.D will not be changed, i.e.

New Mean = (Old Mean + 3) = (6 + 3) = 9 and New S.D = Old S.D = 4.2


Change of Scale

If each value of a variable is multiplied or divided by a constant, then this is called a change of scale.
The arithmetic mean and standard deviation are both affected by these changes. For example:

Old Variable           New Variable
X        X²            Y = X/3       Y²
0         0                0           0
3         9                1           1
6        36                2           4
9        81                3           9
12      144                4          16
Total 30  270             10          30

$\text{Mean}(x) = \frac{\sum x}{n} = \frac{30}{5} = 6$ and $\text{Mean}(y) = \frac{\sum y}{n} = \frac{10}{5} = 2$

$S.D(x) = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{270}{5} - \left(\frac{30}{5}\right)^2} = 4.2$

$S.D(y) = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{30}{5} - \left(\frac{10}{5}\right)^2} = 1.4$

The following figure illustrates the idea of change of scale:

[Figure: Two number lines. The original values 0, 3, 6, 9, 12 have Mean = 6 and S.D = 4.2; the rescaled values 0, 1, 2, 3, 4 have New Mean = 2 and New S.D = 1.4. Caption: "I have changed the scale; the original is wider than me."]

It is now clear that if we change the scale by dividing each value of the variable by "3", then both
the A.M and S.D will be affected by this change, such that:

$\text{New Mean} = \frac{\text{Old Mean}}{3} = \frac{6}{3} = 2$ and $\text{New S.D} = \frac{\text{Old S.D}}{3} = \frac{4.2}{3} = 1.4$
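The two examples can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `pstdev` is used because the S.D formula in the examples divides by n.

```python
from statistics import mean, pstdev

x = [0, 3, 6, 9, 12]
shifted = [v + 3 for v in x]   # change of origin: add 3 to every value
scaled = [v / 3 for v in x]    # change of scale: divide every value by 3

for name, values in [("original", x), ("shifted", shifted), ("scaled", scaled)]:
    # pstdev divides by n, matching the formula used in the worked examples
    print(name, mean(values), round(pstdev(values), 1))
```

Shifting moves the mean from 6 to 9 and leaves the S.D at about 4.2, while dividing by 3 changes the mean to 2 and the S.D to about 1.4.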


Merits and Demerits of Arithmetic Mean

Merits

• The A.M is clearly defined by a mathematical formula.
• It is based on all the observations in the data and is easy to calculate.
• It is capable of further algebraic treatment.
• It is always unique, i.e. a set of data has only one mean.
• It is a relatively stable statistic with the fluctuations of sampling.
• It provides a basis for statistical inference.

Demerits

• It is greatly affected by extreme values in the data.
• It cannot be calculated for qualitative data.
• If the grouped data have "open-end" classes, the mean cannot be accurately computed.
• It is not an appropriate average for highly skewed distributions.


Weighted Arithmetic Mean


Up till now we have discussed the simple A.M, or in other words the un-weighted
A.M. In calculating the arithmetic mean we assume that the values of a variable
have equal importance. But it is not necessary that all the values have the same
relative importance. Thus whenever it is required to find the mean of certain
variables which are not of equal importance, we assign certain numerical
quantities to these variables, which express their relative importance. Such
numerical quantities are technically called the weights.

So it is obvious that we would modify the formula of the simple A.M and apply
the formula of the weighted A.M, i.e.

$\bar{X}_w = \frac{\sum wx}{\sum w}$


EXAMPLE 3.04

Calculate the weighted mean from the following data:

Item Expenditure (X) Weights (W)


Food 290 7.5
Rent 54 2.0
Clothing 98 1.5
Fuel & Light 75 1.0
Cosmetics 75 0.5

Solution

Since $\bar{X}_w = \frac{\sum wx}{\sum w}$

Item            Expenditure (x)   Weights (w)      wx
Food                 290             7.5         2175.0
Rent                  54             2.0          108.0
Clothing              98             1.5          147.0
Fuel & Light          75             1.0           75.0
Cosmetics             75             0.5           37.5
Total                 --            12.5         2542.5

Therefore $\bar{X}_w = \frac{\sum wx}{\sum w} = \frac{2542.5}{12.5} = 203.4$
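The weighted mean of Example 3.04 can be computed directly from the two columns. A minimal sketch, with the hypothetical function name `weighted_mean`:

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


expenditure = [290, 54, 98, 75, 75]    # Food, Rent, Clothing, Fuel & Light, Cosmetics
weights = [7.5, 2.0, 1.5, 1.0, 0.5]
print(round(weighted_mean(expenditure, weights), 1))
```

With equal weights the function reduces to the simple arithmetic mean, which is one way to see that the simple A.M is a special case of the weighted A.M.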

Test Yourself

Calculate the weighted mean from the following data:

Item Expenditure (X) Weights (W)


Food 390 9.5
Rent 44 3.0
Clothing 199 2.5
Fuel & Light 67 3.8
Other items 85 5.5


Geometric Mean

“The nth root of the product of “n” positive values is called geometric mean”

$\text{Geometric Mean} = \sqrt[n]{\text{Product of } n \text{ Positive Values}}$

The following are the formulae of geometric mean:

Ungrouped data: $G = \text{Antilog}\left(\frac{\sum \log x}{n}\right)$

Grouped data: $G = \text{Antilog}\left(\frac{\sum f \log x}{n}\right)$ ; Here $n = \sum f$

EXAMPLE 3.05
Find geometric mean from the following data: (ungrouped data)

5, 8, 10, 12, 15

Solution

X        log X
5        0.6990
8        0.9031
10       1.0000
12       1.0792
15       1.1761
Total    4.8573

$G = \text{Antilog}\left(\frac{\sum \log x}{n}\right) = \text{Antilog}\left(\frac{4.8573}{5}\right) = 9.4$

EXAMPLE 3.06

Find G.M from the following data: (Discrete Grouped data)


X 13 14 15 16 17
f 2 5 13 7 3

Solution

X        f     log X     f log X
13       2     1.1139     2.2279
14       5     1.1461     5.7306
15      13     1.1761    15.2892
16       7     1.2041     8.4288
17       3     1.2304     3.6913
Total   30       --      35.3679

$G = \text{Antilog}\left(\frac{\sum f \log x}{n}\right) = \text{Antilog}\left(\frac{35.3679}{30}\right) = 15.1$

EXAMPLE 3.07

Find G.M from the following data: (Continuous Grouped data)


Weights   65-84   85-104   105-124   125-144   145-164   165-184   185-204
f         9       10       17        10        5         4         5

Solution

Weights    f     X       log X     f log X
65-84      9     74.5    1.8722    16.8494
85-104     10    94.5    1.9754    19.7543
105-124    17    114.5   2.0588    34.9997
125-144    10    134.5   2.1287    21.2872
145-164    5     154.5   2.1889    10.9446
165-184    4     174.5   2.2418    8.9672
185-204    5     194.5   2.2889    11.4446
Total      60    --      --        124.2470

$G = \text{Antilog}\left(\dfrac{\sum f \log x}{n}\right) = \text{Antilog}\left(\dfrac{124.2470}{60}\right) = 117.7$
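The log-and-antilog procedure of Examples 3.05 to 3.07 can be sketched in Python as follows. The function name `geometric_mean` and the optional `freqs` argument are our own assumptions; for continuous grouped data the class midpoints are passed as the values, as in Example 3.07.

```python
import math


def geometric_mean(values, freqs=None):
    """G = Antilog(sum(f * log x) / n), with n = sum(f); f = 1 when ungrouped."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean needs strictly positive values")
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)  # the antilog step


# Example 3.05 (ungrouped data)
print(round(geometric_mean([5, 8, 10, 12, 15]), 1))  # 9.4

# Example 3.06 (discrete grouped data)
print(round(geometric_mean([13, 14, 15, 16, 17], [2, 5, 13, 7, 3]), 1))  # 15.1

# Example 3.07 (continuous grouped data, using class midpoints)
midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
print(round(geometric_mean(midpoints, [9, 10, 17, 10, 5, 4, 5]), 1))  # 117.7
```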


Test Yourself

Find the G.M from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15

2) X   20   25   30   35   40
   f   2    4    9    3    1

3) Weight   21-30   31-40   41-50   51-60   61-70
   f        1       3       5       4       2

Merits and Demerits of Geometric Mean

Merits

• The G.M is clearly defined by a mathematical formula.
• It is unique and based on all the observations.
• It is capable of further algebraic treatment.
• It is less affected by extreme values than the A.M.
• It gives equal weight to all the observations and is not much affected by
  fluctuations of sampling.

Demerits

• It is neither easy to calculate nor simple to understand.
• It vanishes if any observation is zero.
• It cannot be calculated for qualitative data.
• In case of negative values, it cannot be computed at all.
• If the grouped data have “open-end” classes, the geometric mean cannot be
  accurately computed.


Harmonic Mean

“The reciprocal of the arithmetic mean of the reciprocals of the values is called the harmonic mean”

$\text{Harmonic Mean} = \text{Reciprocal of } \left(\dfrac{\text{Sum of reciprocals of the values}}{\text{The number of values}}\right)$

The following are the formulae of harmonic mean:

Ungrouped data: $H = \dfrac{n}{\sum \left(\dfrac{1}{x}\right)}$

Grouped data: $H = \dfrac{n}{\sum \left(\dfrac{f}{x}\right)}$; here $n = \sum f$

EXAMPLE 3.08

Find Harmonic mean from the following data: (ungrouped data)

5, 8, 10, 12, 15

Solution

X       1/X
5       0.2000
8       0.1250
10      0.1000
12      0.0833
15      0.0667
Total   0.5750

$H = \dfrac{n}{\sum \left(\dfrac{1}{x}\right)} = \dfrac{5}{0.5750} = 8.7$

Hi Friends!!! In 1874, William Stanley Jevons introduced the Geometric Mean
and Harmonic Mean.


EXAMPLE 3.09
Find Harmonic mean from the following data: (Discrete Grouped data)
X 13 14 15 16 17
f 2 5 13 7 3

Solution

X       f     (f / X)
13      2     0.1538
14      5     0.3571
15      13    0.8667
16      7     0.4375
17      3     0.1765
Total   30    1.9916

$H = \dfrac{n}{\sum \left(\dfrac{f}{x}\right)} = \dfrac{30}{1.9916} = 15.1$

EXAMPLE 3.10

Find H.M from the following data: (Continuous Grouped data)


Weights   65-84   85-104   105-124   125-144   145-164   165-184   185-204
f         9       10       17        10        5         4         5

Solution

Weights    f     X       (f / X)
65-84      9     74.5    0.1208
85-104     10    94.5    0.1058
105-124    17    114.5   0.1485
125-144    10    134.5   0.0743
145-164    5     154.5   0.0324
165-184    4     174.5   0.0229
185-204    5     194.5   0.0257
Total      60    --      0.5304

$H = \dfrac{n}{\sum \left(\dfrac{f}{x}\right)} = \dfrac{60}{0.5304} = 113.1$
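Examples 3.08 to 3.10 can be reproduced with a short Python sketch. The function name `harmonic_mean` and its optional `freqs` argument are our own assumptions; for continuous grouped data the class midpoints are used as the values.

```python
def harmonic_mean(values, freqs=None):
    """H = n / sum(f / x), with n = sum(f); f = 1 when the data are ungrouped."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("the harmonic mean is undefined when a value is zero")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


# Example 3.08 (ungrouped data)
print(round(harmonic_mean([5, 8, 10, 12, 15]), 1))  # 8.7

# Example 3.09 (discrete grouped data)
print(round(harmonic_mean([13, 14, 15, 16, 17], [2, 5, 13, 7, 3]), 1))  # 15.1

# Example 3.10 (continuous grouped data, using class midpoints)
midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
print(round(harmonic_mean(midpoints, [9, 10, 17, 10, 5, 4, 5]), 1))  # 113.1
```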


Test Yourself

Find the H.M from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15

2) X   20   25   30   35   40
   f   2    4    9    3    1

3) Weight   21-30   31-40   41-50   51-60   61-70
   f        1       3       5       4       2


Merits and Demerits of Harmonic Mean

Merits

• The H.M is clearly defined by a mathematical formula.
• It is unique and based on all the observations.
• It is capable of further algebraic treatment.
• It is comparatively less affected by extreme values as compared to A.M and G.M.
• It is not much affected by fluctuations of sampling.

Demerits

• It is neither easy to calculate nor simple to understand.
• It cannot be determined if any value is zero.
• It cannot be calculated for qualitative data.
• If the grouped data have “open-end” classes, the harmonic mean cannot be
  accurately computed.

Note:

• If any value of the data is negative, then the G.M becomes ill-defined and the
  usual ordering of the remaining two averages may be reversed, i.e. H.M > A.M
  can occur.
• If any value of the data is zero, then the H.M becomes ill-defined and the G.M
  will be zero.
• A.M, G.M and H.M of two values “a” and “b” are:

  $\text{A.M} = \dfrac{a + b}{2}$, $\text{G.M} = (a \times b)^{1/2}$, $\text{H.M} = \dfrac{2ab}{a + b}$


Relationship between Arithmetic Mean, Geometric Mean and Harmonic Mean

• A.M > G.M > H.M (for positive values that are not all equal)
• The three averages are exactly equal if the data set is constant, i.e. A.M = G.M = H.M
• $(\text{G.M})^2 = (\text{A.M}) \times (\text{H.M})$ (this holds exactly for two positive values)

Consider the data: 2, 4, 6, 8, and 10

X       log X     1/X
2       0.3010    0.5000
4       0.6021    0.2500
6       0.7782    0.1667
8       0.9031    0.1250
10      1.0000    0.1000
30      3.5844    1.1417

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{30}{5} = 6$

$G = \text{Antilog}\left(\dfrac{\sum \log x}{n}\right) = \text{Antilog}\left(\dfrac{3.5844}{5}\right) = 5.2$

$H = \dfrac{n}{\sum \left(\dfrac{1}{x}\right)} = \dfrac{5}{1.1417} = 4.4$

Hence it is clear that: A.M > G.M > H.M

In 1970, the relationship between Arithmetic Mean, Geometric Mean and
Harmonic Mean was described by Mitrinovic, D.S.

Consider the data: 10, 10, 10, 10, and 10

X       log X     1/X
10      1         0.1
10      1         0.1
10      1         0.1
10      1         0.1
10      1         0.1
50      5         0.5

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{50}{5} = 10$

$G = \text{Antilog}\left(\dfrac{\sum \log x}{n}\right) = \text{Antilog}\left(\dfrac{5}{5}\right) = 10$

$H = \dfrac{n}{\sum \left(\dfrac{1}{x}\right)} = \dfrac{5}{0.5} = 10$

Hence it is clear that: A.M = G.M = H.M

The A.M of two observations is 127.5 and their G.M is 60; find their H.M.

A.M = 127.5
G.M = 60
H.M = ?

$(\text{G.M})^2 = (\text{A.M}) \times (\text{H.M}) \Rightarrow \text{H.M} = \dfrac{(\text{G.M})^2}{\text{A.M}} = \dfrac{(60)^2}{127.5} = 28.2$
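The ordering of the three averages and the product relation can be checked numerically. The sketch below uses Python's standard `statistics` module; the pair 15 and 240 is our own back-calculation of two values whose A.M is 127.5 and whose G.M is 60, and is not given in the text. It also illustrates that the product relation is exact for two values but not for the five-value data set above.

```python
import math
from statistics import geometric_mean, harmonic_mean, mean

# Five unequal positive values: A.M > G.M > H.M
data = [2, 4, 6, 8, 10]
a, g, h = mean(data), geometric_mean(data), harmonic_mean(data)
print(a > g > h)  # True

# Two values with A.M = 127.5 and G.M = 60 (assumed pair: 15 and 240)
pair = [15, 240]
a2, g2, h2 = mean(pair), geometric_mean(pair), harmonic_mean(pair)
print(math.isclose(g2 ** 2, a2 * h2))  # True: (G.M)^2 = A.M x H.M for two values
print(round(g2 ** 2 / a2, 1))          # 28.2

# For more than two values the product relation is only approximate
print(math.isclose(g ** 2, a * h, rel_tol=1e-3))  # False
```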


Mode

Mode in case of Ungrouped Data

“A value, that occurs most frequently in a data, is called mode”

e.g. 2, 3, 4, 2, 5, 6, 2, 7

Mode = 2

“If two or more values occur the same number of times but more frequently
than the other values, then there is more than one mode”

e.g. 2, 9, 11, 9, 2, 13, 14, 7, 18

Mode = 2, 9

If each value occurs the same number of times, then there is no mode.

e.g.
1, 2, 3, 4 (there is no mode)
5, 6, 5, 7, 6, 7 (there is no mode)

• The data having one mode is called uni-modal distribution.
• The data having two modes is called bi-modal distribution.
• The data having more than two modes is called multi-modal distribution.
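The rules above for ungrouped data can be sketched in Python with `collections.Counter`. The function name `modes` is our own; treating a data set with a single distinct value as having that value as its mode is an assumption, since the text does not cover that case.

```python
from collections import Counter


def modes(data):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(x for x, c in counts.items() if c == highest)


print(modes([2, 3, 4, 2, 5, 6, 2, 7]))          # [2]
print(modes([2, 9, 11, 9, 2, 13, 14, 7, 18]))   # [2, 9]
print(modes([1, 2, 3, 4]))                      # []
print(modes([5, 6, 5, 7, 6, 7]))                # []
```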

Mode in case of Discrete Grouped Data

“A value, which has the largest frequency in a set of data, is called mode”

e.g.

X   41   42   43   44   45
f   1    3    5    2    1

Mode = 43 (against the maximum frequency)


Mode in case of Continuous Grouped Data

In case of continuous grouped data, mode would lie in the class that carries
the highest frequency. This class is called the modal class. The formula
used to compute the value of mode, is given below:

$\text{Mode} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$

Where l = lower class boundary of the modal class
h = class-width of the modal class
fm = frequency of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class following the modal class

In 1894, Karl Pearson used the term “Mode”.
EXAMPLE 3.11

Find mode from the following data:

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Marks    No. of Students   Class boundaries
30-39    8                 29.5-39.5
40-49    87                39.5-49.5
50-59    190               49.5-59.5
60-69    304               59.5-69.5
70-79    211               69.5-79.5
80-89    85                79.5-89.5
90-99    20                89.5-99.5

Modal class: 59.5-69.5

l = 59.5, f1 = 190, f2 = 211, fm = 304, h = 69.5 - 59.5 = 10

$\text{Mode} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = 59.5 + \dfrac{304 - 190}{(304 - 190) + (304 - 211)} \times 10 = 65$
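The modal-class formula of Example 3.11 can be sketched in Python as below. The function name `grouped_mode` is our own. Two points are assumptions not stated in the text: a missing neighbouring class (when the modal class is the first or last) is treated as having frequency 0, and the first class with the highest frequency is taken as the modal class.

```python
def grouped_mode(boundaries, freqs):
    """Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class.

    boundaries: list of (lower, upper) class boundaries, one pair per class.
    freqs: list of class frequencies in the same order.
    """
    m = freqs.index(max(freqs))                      # index of the modal class
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                # preceding class (assumed 0 if absent)
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0   # following class (assumed 0 if absent)
    lower, upper = boundaries[m]
    h = upper - lower                                # class width
    return lower + (fm - f1) / ((fm - f1) + (fm - f2)) * h


# Data of Example 3.11
boundaries = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5), (59.5, 69.5),
              (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
students = [8, 87, 190, 304, 211, 85, 20]
print(round(grouped_mode(boundaries, students), 2))  # 65.01
```

The unrounded result is about 65.0072, which the worked example reports as 65.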


Mode Graphically

• Construct a histogram from the continuous grouped data.
• Locate the modal class, i.e. the class with the highest rectangle.
• Draw a line from the top right-hand corner of the modal class rectangle to the
  point where the top of the adjacent rectangle on the left touches it. Similarly,
  join the top left-hand corner of the modal class rectangle to the point where
  the top of the adjacent rectangle on the right touches it.
• From the intersection of these two lines draw a perpendicular on the X-axis.
• The mode is the point where the perpendicular meets the X-axis.

EXAMPLE 3.12

Find mode graphically from the following data:

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Marks    No. of Students   Class boundaries
30-39    8                 29.5-39.5
40-49    87                39.5-49.5
50-59    190               49.5-59.5
60-69    304               59.5-69.5
70-79    211               69.5-79.5
80-89    85                79.5-89.5
90-99    20                89.5-99.5

[Figure: histogram of the number of students (vertical axis, 0 to 350) against
the class boundaries 29.5 to 99.5 (horizontal axis), with the mode located
graphically]

Mode = 65.0072


Merits and Demerits of Mode

Merits

• It is simple to understand and easy to calculate.
• In some cases it may be obtained by inspection alone.
• It is not affected by extreme values.
• It is also useful for qualitative data.
• It can be located even in open-end classes.

Demerits

• It is not clearly defined by a mathematical formula.
• It may not exist in some cases.
• It may not be unique.
• It is not capable of further algebraic treatment.
• It is not based on all the observations.
• It is unsatisfactory for statistical inference.

Test Yourself

Find the Mode from the following data:

1) 1, 3, 5, 7, 7, 11, 13, 7

2) X   20   25   30   35   40
   f   2    4    9    3    1

3) Weight   21-30   31-40   41-50   51-60   61-70
   f        1       3       5       4       2


Median

“When the observations are arranged in ascending or descending order, then a
value, that divides a distribution into two equal parts, is called median”

Median in case of Ungrouped Data

In this case we first arrange the observations in increasing or decreasing
order, then we use the following formulae for Median:

If “n” is odd: $\text{Median} = \text{size of } \left(\dfrac{n + 1}{2}\right)\text{th observation}$

If “n” is even: $\text{Median} = \dfrac{\text{size of } \left[\left(\dfrac{n}{2}\right)\text{th} + \left(\dfrac{n}{2} + 1\right)\text{th}\right] \text{ observation}}{2}$

EXAMPLE 3.13

Find Median from the following data:

3, 4, 5, 8, 2, 9, 7, 6, 10

Solution

Ascending order: 2, 3, 4, 5, 6, 7, 8, 9, 10 (n = 9, odd)

$\text{Median} = \text{size of } \left(\dfrac{n + 1}{2}\right)\text{th observation} = \text{size of } \left(\dfrac{9 + 1}{2}\right)\text{th observation}$

= size of 5th observation = 6

The number of values above the median balances (equals) the number of values
below the median, i.e. 50% of the data falls above the median and 50% below it.


EXAMPLE 3.14

Find Median from the following data:

13,14,15,18,12,19,17,16,10,20

Solution

Ascending order: 10, 12, 13, 14, 15, 16, 17, 18, 19, 20 (n = 10, even)

$\text{Median} = \dfrac{\text{size of } \left[\left(\dfrac{n}{2}\right)\text{th} + \left(\dfrac{n}{2} + 1\right)\text{th}\right] \text{ observation}}{2}$

$= \dfrac{\text{size of } \left[\left(\dfrac{10}{2}\right)\text{th} + \left(\dfrac{10}{2} + 1\right)\text{th}\right] \text{ observation}}{2}$

$= \dfrac{\text{size of } [5\text{th} + 6\text{th}] \text{ observation}}{2} = \dfrac{15 + 16}{2} = 15.5$

The concept of Median was used by Gauss at the beginning of the 19th century.
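The odd and even cases of Examples 3.13 and 3.14 can be sketched as a small Python function. The name `median` is our own choice; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(data):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2 + 1)th


print(median([3, 4, 5, 8, 2, 9, 7, 6, 10]))                 # 6
print(median([13, 14, 15, 18, 12, 19, 17, 16, 10, 20]))     # 15.5
```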

Median in case of Discrete Grouped Data

In case of discrete grouped data, first we find the cumulative frequencies and
then use the following formula for Median:

$\text{Median} = \text{size of } \left(\dfrac{n + 1}{2}\right)\text{th observation}$; here $n = \sum f$

Around 1874, Francis Galton first introduced the Median as a statistical concept.


EXAMPLE 3.15

Find Median from the following data:

X 20 21 22 23 24 25
f 1 3 5 2 2 2

Solution

X       f     Cumulative Frequency
20      1     1
21      3     4
22      5     9
23      2     11
24      2     13
25      2     15
Total   15    --

$\text{Median} = \text{size of } \left(\dfrac{n + 1}{2}\right)\text{th observation} = \text{size of } \left(\dfrac{15 + 1}{2}\right)\text{th observation}$

= size of 8th observation = 22

EXAMPLE 3.16

Find Median from the following data:

X   41   42   43   44   45   46
f   2    4    4    2    1    3

Solution

X       f     Cumulative Frequency
41      2     2
42      4     6
43      4     10
44      2     12
45      1     13
46      3     16
Total   16    --

$\text{Median} = \text{size of } \left(\dfrac{n + 1}{2}\right)\text{th observation} = \text{size of } \left(\dfrac{16 + 1}{2}\right)\text{th observation}$

= size of 8.5th observation = 43 (the 8th and 9th observations are both 43)


Median in case of continuous Grouped Data

In continuous grouped data, to find the median we first construct the class
boundaries if the classes are discontinuous. Then we find the cumulative
frequencies and use the following two steps:

• First we determine the median class using n/2.
• Once the median class is determined, the following formula is used to find
  the value of the median, i.e.

$\text{Median} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$; here $n = \sum f$

Where l = lower class boundary of the median class
h = class-width of the median class
f = frequency of the median class
C = cumulative frequency of the class preceding the median class.

EXAMPLE 3.17

Find Median from the following data:

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Step 1:

Marks    No. of Students   C.F    Class boundaries
30-39    8                 8      29.5-39.5
40-49    87                95     39.5-49.5
50-59    190               285    49.5-59.5
60-69    304               589    59.5-69.5
70-79    211               800    69.5-79.5
80-89    85                885    79.5-89.5
90-99    20                905    89.5-99.5
Total    905               --     --

$\text{Median} = \text{size of } \left(\dfrac{n}{2}\right)\text{th observation} = \text{size of } \left(\dfrac{905}{2}\right)\text{th observation}$

= size of 452.5th observation

Since the 452.5th observation lies in the class (59.5-69.5), this is the median class.

Here l = 59.5, f = 304, C = 285, h = 10

Step 2:

$\text{Median} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right) = 59.5 + \dfrac{10}{304}(452.5 - 285) = 65$

Test Yourself

Find the Median from the following data:

1) 1, 3, 5, 7, 7, 11, 13, 7, 6

2) 30, 44, 34, 46, 55, 47, 20, 58

3) X   20   25   30   35   40
   f   2    4    9    3    1

4) Weight   21-30   31-40   41-50   51-60   61-70
   f        1       3       5       4       2

Graphic Representation of Median

• Draw an ogive of the “less than” or “more than” type.
• Compute (n/2) and locate this point on the vertical scale (y-axis).
• From the located point draw a line perpendicular to the y-axis until it meets
  the ogive.
• Now draw a perpendicular on the x-axis from the point where the first line
  cuts the ogive.
• The point at which this perpendicular intersects the x-axis is the Median of
  the distribution.

EXAMPLE 3.18

Find Median graphically from the following data:

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Marks    No. of Students   C.F    Class boundaries
30-39    8                 8      29.5-39.5
40-49    87                95     39.5-49.5
50-59    190               285    49.5-59.5
60-69    304               589    59.5-69.5
70-79    211               800    69.5-79.5
80-89    85                885    79.5-89.5
90-99    20                905    89.5-99.5
Total    905               --     --

Here we construct the “less than” cumulative frequency distribution:

Class boundaries    C.F
Less than 29.5      0
Less than 39.5      8
Less than 49.5      95
Less than 59.5      285
Less than 69.5      589
Less than 79.5      800
Less than 89.5      885
Less than 99.5      905

[Figure: “less than” ogive of the cumulative frequency (vertical axis, 0 to 1000)
against the class boundaries 29.5 to 99.5 (horizontal axis), with n/2 located on
the vertical axis and the median read off the horizontal axis]

Median = 65
s@

Merits and Demerits of Median

Merits

• It is simple to understand and easy to calculate.
• It is not affected by extreme values.
• It is also useful for qualitative data.
• It can be located even in open-end classes.
• Like the mean, it always exists and is unique for any set of data.
• It is the most appropriate average in a highly skewed distribution.

Demerits

• It is not clearly defined by a mathematical formula.
• It is not capable of further algebraic treatment.
• It is not based on all the observations.
• It is necessary to arrange the values in an array before finding the median,
  which is tedious (boring) work.
• It is unsatisfactory for statistical inference.

Note:

• The answer is incorrect if the value obtained for an average lies outside the
  range of the data.
• Whenever you hear the word “average”, be aware that the word may not always
  be referring to the mean. It may refer to the Median, the Mode, etc.


The averages that are obtained by using mathematical formulae are called mathematical
averages e.g.

• Arithmetic Mean
• Harmonic Mean
• Geometric Mean

The averages that are obtained by simple inspection of the data are called positional
averages e.g.

• Mode
• Median

All these averages are affected by the change of origin and scale.

Symmetrical Distribution

“A distribution is said to be symmetric if the values of mean, median and mode are equal” i.e.

Mean = Median = Mode

In a symmetrical distribution the sum of the deviations from the mean, mode or
median is zero. The shape of such a distribution is typically that of a bell, as
shown in the figure.

[Figure: bell-shaped curve with Mean = Median = Mode at the centre]

“For symmetric distribution, we know that the values of mean, median and mode are equal,
but if these values differ, then the distribution is said to be skewed or asymmetric”

The following figures show the skewed distribution:

[Figure: two skewed curves]

+ve Skewness: Mean > Median > Mode
–ve Skewness: Mean < Median < Mode


Empirical Relation between Mean, Median and Mode

“The difference between mean and mode is three times the difference between
mean and median” i.e.

Mean – Mode = 3 (Mean – Median)

OR

“The difference between median and mode is twice the difference between mean
and median”.

Median – Mode = 2 (Mean – Median)

If two averages are given then we can find the third one using:

• $\text{Mean} = \dfrac{3\,\text{Median} - \text{Mode}}{2}$
• $\text{Median} = \dfrac{\text{Mode} + 2\,\text{Mean}}{3}$
• $\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$

If Mean = 28.5 and Median = 30 then by Empirical Relation:

$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \Rightarrow \text{Mode} = 3(30) - 2(28.5) = 33$
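The three rearrangements of the empirical relation can be written as small Python helpers. The function names are hypothetical, chosen only for this illustration.

```python
def mode_from(mean, median):
    """Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


def mean_from(median, mode):
    """Mean = (3 Median - Mode) / 2."""
    return (3 * median - mode) / 2


def median_from(mean, mode):
    """Median = (Mode + 2 Mean) / 3."""
    return (mode + 2 * mean) / 3


print(mode_from(28.5, 30))    # 33.0
print(mean_from(30, 33))      # 28.5
print(median_from(28.5, 33))  # 30.0
```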

Quartiles

“When the observations are arranged in increasing order then the values, that
divide the whole data into four (4) equal parts, are called quartiles”

These values are denoted by Q1, Q2 and Q3.

It is to be noted that 25% of the data falls below Q1, 50% of the data falls
below Q2 and 75% of the data falls below Q3.

Quartiles, Deciles and Percentiles are also called Quantiles or Fractiles.

Deciles

“When the observations are arranged in increasing order then the values, that
divide the whole data into ten (10) equal parts, are called deciles”

These values are denoted by D1, D2,…,D9.

It is to be noted that 10% of the data falls below D1, 20% of the data falls
below D2,…, and 90% of the data falls below D9.


Percentiles

“When the observations are arranged in increasing order then the values, that
divide the whole data into hundred (100) equal parts, are called percentiles”

These values are denoted by P1, P2,…,P99.

It is to be noted that 1% of the data falls below P1, 2% of the data falls
below P2,…, and 99% of the data falls below P99.

For a data set, the 2nd quartile, 5th decile and 50th percentile are equal to
the Median, i.e.

Q2 = D5 = P50 = Median

Measures: Quartiles (j = 1, 2, 3)

• Ungrouped data: $Q_j = \text{size of } \left(\dfrac{j(n + 1)}{4}\right)\text{th observation}$
• Discrete grouped data: $Q_j = \text{size of } \left(\dfrac{j(n + 1)}{4}\right)\text{th observation}$; here $n = \sum f$

Measures: Deciles (j = 1, 2, ..., 9)

• Ungrouped data: $D_j = \text{size of } \left(\dfrac{j(n + 1)}{10}\right)\text{th observation}$
• Discrete grouped data: $D_j = \text{size of } \left(\dfrac{j(n + 1)}{10}\right)\text{th observation}$; here $n = \sum f$

Measures: Percentiles (j = 1, 2, ..., 99)

• Ungrouped data: $P_j = \text{size of } \left(\dfrac{j(n + 1)}{100}\right)\text{th observation}$
• Discrete grouped data: $P_j = \text{size of } \left(\dfrac{j(n + 1)}{100}\right)\text{th observation}$; here $n = \sum f$


Continuous Grouped Data

In continuous grouped data, we use the following two steps:

Quartiles

• First we determine the jth quartile class using jn/4.
• Once the jth quartile class is determined, the following formula is used to
  find the value of the jth quartile, i.e.

$Q_j = l + \dfrac{h}{f}\left(\dfrac{jn}{4} - C\right)$; here $n = \sum f$

l = lower class boundary of the jth quartile class
h = class-width of the jth quartile class
f = frequency of the jth quartile class
C = cumulative frequency of the class preceding the jth quartile class.

Deciles

• First we determine the jth decile class using jn/10.
• Once the jth decile class is determined, the following formula is used to
  find the value of the jth decile, i.e.

$D_j = l + \dfrac{h}{f}\left(\dfrac{jn}{10} - C\right)$; here $n = \sum f$

l = lower class boundary of the jth decile class
h = class-width of the jth decile class
f = frequency of the jth decile class
C = cumulative frequency of the class preceding the jth decile class.

Percentiles

• First we determine the jth percentile class using jn/100.
• Once the jth percentile class is determined, the following formula is used to
  find the value of the jth percentile, i.e.

$P_j = l + \dfrac{h}{f}\left(\dfrac{jn}{100} - C\right)$; here $n = \sum f$

l = lower class boundary of the jth percentile class
h = class-width of the jth percentile class
f = frequency of the jth percentile class
C = cumulative frequency of the class preceding the jth percentile class.


EXAMPLE 3.19

Find Q1, Q3, D5, and P50 from the following data:

50,51,52,53,54,55,56,57,58,59,60; (n = 11)

Solution

$Q_1 = \text{size of } \left(\dfrac{1(n + 1)}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{1(11 + 1)}{4}\right)\text{th observation}$

= size of 3rd observation = 52

$Q_3 = \text{size of } \left(\dfrac{3(n + 1)}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{3(11 + 1)}{4}\right)\text{th observation}$

= size of 9th observation = 58

$D_5 = \text{size of } \left(\dfrac{5(n + 1)}{10}\right)\text{th observation} = \text{size of } \left(\dfrac{5(11 + 1)}{10}\right)\text{th observation}$

= size of 6th observation = 55

$P_{50} = \text{size of } \left(\dfrac{50(n + 1)}{100}\right)\text{th observation} = \text{size of } \left(\dfrac{50(11 + 1)}{100}\right)\text{th observation}$

= size of 6th observation = 55


EXAMPLE 3.20

Find Q1, Q3, D6 and P80 from the following data:

150, 151, 152, 153, 154, 155, 156, 157, 158, 159 (n = 10)

Solution

$Q_1 = \text{size of } \left(\dfrac{1(n + 1)}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{1(10 + 1)}{4}\right)\text{th observation}$

= size of 2.75th observation
= size of [2nd + 0.75(3rd - 2nd)] observation
= 151 + 0.75(152 - 151) = 151.75

$Q_3 = \text{size of } \left(\dfrac{3(n + 1)}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{3(10 + 1)}{4}\right)\text{th observation}$

= size of 8.25th observation
= size of [8th + 0.25(9th - 8th)] observation
= 157 + 0.25(158 - 157) = 157.25

$D_6 = \text{size of } \left(\dfrac{6(n + 1)}{10}\right)\text{th observation} = \text{size of } \left(\dfrac{6(10 + 1)}{10}\right)\text{th observation}$

= size of 6.6th observation
= size of [6th + 0.6(7th - 6th)] observation
= 155 + 0.6(156 - 155) = 155.6

$P_{80} = \text{size of } \left(\dfrac{80(n + 1)}{100}\right)\text{th observation} = \text{size of } \left(\dfrac{80(10 + 1)}{100}\right)\text{th observation}$

= size of 8.8th observation
= size of [8th + 0.8(9th - 8th)] observation
= 157 + 0.8(158 - 157) = 157.8
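The position rule j(n + 1)/k with linear interpolation, as used in Examples 3.19 and 3.20, can be sketched in Python. The function name `quantile` and the parameter `parts` (4 for quartiles, 10 for deciles, 100 for percentiles) are our own. Clamping positions that fall before the first or after the last observation is an assumption; the text does not discuss that case.

```python
def quantile(data, j, parts):
    """Value at the j(n + 1)/parts-th position, interpolating between neighbours."""
    ordered = sorted(data)
    n = len(ordered)
    position = j * (n + 1) / parts        # 1-based position, may be fractional
    if position <= 1:                     # assumption: clamp to the first value
        return ordered[0]
    if position >= n:                     # assumption: clamp to the last value
        return ordered[-1]
    lower = int(position)                 # whole part of the position
    fraction = position - lower           # fractional part of the position
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


# Example 3.19 (n = 11): whole-number positions
data_a = list(range(50, 61))
print(quantile(data_a, 1, 4), quantile(data_a, 3, 4))   # 52 58

# Example 3.20 (n = 10): fractional positions
data_b = list(range(150, 160))
print(quantile(data_b, 1, 4), quantile(data_b, 3, 4))   # 151.75 157.25
print(round(quantile(data_b, 6, 10), 2))                # 155.6
print(round(quantile(data_b, 80, 100), 2))              # 157.8
```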

EXAMPLE 3.21

Find Q1, Q3, D4 and P60 from the following data:


X 20 21 22 23 24 25
f 1 3 5 2 2 2


Solution

X       f     Cumulative Frequency
20      1     1
21      3     4
22      5     9
23      2     11
24      2     13
25      2     15
Total   15    --

$Q_1 = \text{size of } \left(\dfrac{1(n + 1)}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{1(15 + 1)}{4}\right)\text{th observation}$

= size of 4th observation = 21

$Q_3 = \text{size of } \left(\dfrac{3(n + 1)}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{3(15 + 1)}{4}\right)\text{th observation}$

= size of 12th observation = 24

$D_4 = \text{size of } \left(\dfrac{4(n + 1)}{10}\right)\text{th observation} = \text{size of } \left(\dfrac{4(15 + 1)}{10}\right)\text{th observation}$

= size of 6.4th observation = 22

$P_{60} = \text{size of } \left(\dfrac{60(n + 1)}{100}\right)\text{th observation} = \text{size of } \left(\dfrac{60(15 + 1)}{100}\right)\text{th observation}$

= size of 9.6th observation = 23
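For discrete grouped data, the worked answers above are consistent with reading the value off the cumulative-frequency column: take the first X whose cumulative frequency reaches the required position. The Python sketch below follows that reading. It is our inference from the example, not a rule the text states explicitly, and the function name is hypothetical. The values are assumed to be listed in increasing order.

```python
from itertools import accumulate


def grouped_discrete_quantile(values, freqs, j, parts):
    """First value whose cumulative frequency reaches position j(n + 1)/parts."""
    n = sum(freqs)
    position = j * (n + 1) / parts
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return x
    return values[-1]  # position beyond the last observation


# Example 3.21
x = [20, 21, 22, 23, 24, 25]
f = [1, 3, 5, 2, 2, 2]
print(grouped_discrete_quantile(x, f, 1, 4))     # Q1  -> 21
print(grouped_discrete_quantile(x, f, 3, 4))     # Q3  -> 24
print(grouped_discrete_quantile(x, f, 4, 10))    # D4  -> 22
print(grouped_discrete_quantile(x, f, 60, 100))  # P60 -> 23
```

With `j = 1` and `parts = 2` the same function gives the median of Examples 3.15 and 3.16.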



EXAMPLE 3.22

Find Q1, Q3, D8 and P40 from the following data:

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Marks    No. of Students   C.F    Class boundaries
30-39    8                 8      29.5-39.5
40-49    87                95     39.5-49.5
50-59    190               285    49.5-59.5
60-69    304               589    59.5-69.5
70-79    211               800    69.5-79.5
80-89    85                885    79.5-89.5
90-99    20                905    89.5-99.5
Total    905               --     --

Step 1

$Q_1 = \text{size of } \left(\dfrac{1 \times n}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{1 \times 905}{4}\right)\text{th observation}$

= size of 226.25th observation

Since the 226.25th observation lies in the class (49.5-59.5), this is the lower quartile class.

Here l = 49.5, f = 190, C = 95, h = 10

Step 2

Now using the following formula:

$Q_1 = l + \dfrac{h}{f}\left(\dfrac{1 \times n}{4} - C\right) = 49.5 + \dfrac{10}{190}\left(\dfrac{1 \times 905}{4} - 95\right) = 49.5 + \dfrac{10}{190}(226.25 - 95) = 56.41$

Step 1

$Q_3 = \text{size of } \left(\dfrac{3 \times n}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{3 \times 905}{4}\right)\text{th observation} = \text{size of } \left(\dfrac{2715}{4}\right)\text{th observation}$

= size of 678.75th observation

Since the 678.75th observation lies in the class (69.5-79.5), this is the upper quartile class.

Here l = 69.5, f = 211, C = 589, h = 10

Step 2

Now using the following formula:

$Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - C\right) = 69.5 + \dfrac{10}{211}\left(\dfrac{3 \times 905}{4} - 589\right) = 69.5 + \dfrac{10}{211}(678.75 - 589) = 73.75$

Step 1

$D_8 = \text{size of } \left(\dfrac{8n}{10}\right)\text{th observation} = \text{size of } \left(\dfrac{8 \times 905}{10}\right)\text{th observation} = \text{size of } \left(\dfrac{7240}{10}\right)\text{th observation}$

= size of 724th observation

Since the 724th observation lies in the class (69.5-79.5), this is the 8th decile class.

Here l = 69.5, f = 211, C = 589, h = 10

Step 2

Now using the following formula:

$D_8 = l + \dfrac{h}{f}\left(\dfrac{8n}{10} - C\right) = 69.5 + \dfrac{10}{211}\left(\dfrac{8 \times 905}{10} - 589\right) = 69.5 + \dfrac{10}{211}(724 - 589) = 75.90$

Step 1

$P_{40} = \text{size of } \left(\dfrac{40n}{100}\right)\text{th observation} = \text{size of } \left(\dfrac{40 \times 905}{100}\right)\text{th observation} = \text{size of } \left(\dfrac{36200}{100}\right)\text{th observation}$

= size of 362nd observation

Since the 362nd observation lies in the class (59.5-69.5), this is the 40th percentile class.

Here l = 59.5, f = 304, C = 285, h = 10

Step 2

Now using the following formula:

$P_{40} = l + \dfrac{h}{f}\left(\dfrac{40n}{100} - C\right) = 59.5 + \dfrac{10}{304}\left(\dfrac{40 \times 905}{100} - 285\right) = 59.5 + \dfrac{10}{304}(362 - 285) = 62.03$
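The two-step procedure for continuous grouped data (find the class containing the jn/k-th observation, then interpolate inside it) can be sketched as one Python function. The name `grouped_quantile` and the parameter `parts` are our own assumptions. With `j = 1` and `parts = 2` it reduces to the grouped median formula.

```python
def grouped_quantile(boundaries, freqs, j, parts):
    """l + (h / f) * (j * n / parts - C) for the class holding the target position."""
    n = sum(freqs)
    target = j * n / parts
    cumulative = 0                        # C: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            h = upper - lower             # class width
            return lower + h / f * (target - cumulative)
        cumulative += f
    raise ValueError("target position lies beyond the data")


# Data of Example 3.22
boundaries = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5), (59.5, 69.5),
              (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
students = [8, 87, 190, 304, 211, 85, 20]

print(round(grouped_quantile(boundaries, students, 1, 4), 2))     # Q1  = 56.41
print(round(grouped_quantile(boundaries, students, 3, 4), 2))     # Q3  = 73.75
print(round(grouped_quantile(boundaries, students, 8, 10), 2))    # D8  = 75.9
print(round(grouped_quantile(boundaries, students, 40, 100), 2))  # P40 = 62.03
print(round(grouped_quantile(boundaries, students, 1, 2), 2))     # Median = 65.01
```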

Main Objects of Average

• The main object (purpose) of the average is to give a bird’s eye view (summary)
  of the statistical data. The average removes all the unnecessary details of the
  data and gives a concise (to the point or short) picture of the huge data under
  investigation.
• Average is also of great use for the purpose of comparison (i.e. the comparison
  of two or more groups in which the units of the variables are same) and for the
  further analysis of the data.
• Averages are very useful for computing various other statistical measures such
  as dispersion, skewness, kurtosis etc.

Requisites (desirable qualities) of a Good Average

An average will be considered as good if:

• It is mathematically defined.
• It utilizes all the values given in the data.
• It is not much affected by the extreme values.
• It can be calculated in almost all cases.
• It can be used in further statistical analysis of the data.
• It does not give misleading results.


Uses of Averages in Different Situations

• A.M is an appropriate average for all the situations where there are no extreme
  values in the data.
• G.M is an appropriate average for calculating average percent increase in sales,
  population, production, etc. It is one of the best averages for the construction
  of index numbers.
• H.M is an appropriate average for calculating the average rate of increase of
  profits of a firm or finding average speed of a journey or the average price at
  which articles are sold.
• Mode is an appropriate average in case of qualitative data e.g. the opinion of an
  average person; he is probably referring to the most frequently expressed
  opinion, which is the modal opinion.
• Median is an appropriate average in a highly skewed distribution e.g. in the
  distribution of wages, incomes etc.

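Prove that: $\sum (x_i - \bar{x})^2 < \sum (x_i - A)^2$, where A is any value other than $\bar{x}$.

Proof:

Taking $\sum (x_i - A)^2 = \sum (x_i - \bar{x} + \bar{x} - A)^2$

$= \sum [(x_i - \bar{x}) + (\bar{x} - A)]^2$

$= \sum [(x_i - \bar{x})^2 + (\bar{x} - A)^2 + 2(x_i - \bar{x})(\bar{x} - A)]$

$= \sum (x_i - \bar{x})^2 + \sum (\bar{x} - A)^2 + 2 \sum (x_i - \bar{x})(\bar{x} - A)$

$= \sum (x_i - \bar{x})^2 + n(\bar{x} - A)^2 + 2(\bar{x} - A) \sum (x_i - \bar{x})$

$= \sum (x_i - \bar{x})^2 + n(\bar{x} - A)^2$, since $\sum (x_i - \bar{x}) = 0$

$\therefore \sum (x_i - A)^2 > \sum (x_i - \bar{x})^2$, since $n(\bar{x} - A)^2 > 0$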


Sharpen your Pencil

MCQ’s

(1) Mean = ______

(A) Mean – Mode (B) $\dfrac{3\,\text{Median} - \text{Mode}}{2}$ (C) Median (D) None of these

(2) Mean – Mode = ______

(A) $3(\bar{x} - \text{Median})$ (B) $\bar{x} - \text{Median}$ (C) Median (D) None of these

(3) $\text{Mean} = \dfrac{1}{2}(3\,\text{Median} - \_\_\_\_\_\_)$

(A) 2Mean (B) Mode (C) G.M (D) None of these

(4) $\text{Mode} = (3\,\text{Median} - \_\_\_\_\_)$

(A) $\bar{x}$ (B) $2\bar{x}$ (C) H.M (D) None of these

(5) $\text{Median} = \dfrac{1}{3}(\text{Mode} + \_\_\_\_\_\_)$

(A) $\bar{x}$ (B) $2\bar{x}$ (C) G.M (D) None of these

(6) $\sum (x_i - \bar{x}) = \_\_\_\_\_$

(A) $\sum (x_i - a)$ (B) 0 (C) 1 (D) None of these

(7) $\sum (x_i - a) = \_\_\_\_\_$

(A) $n\bar{x} - na$ (B) $\bar{x} - a$ (C) $\sum x_i - a$ (D) None of these

(8) G.M of (X + A) is _____.

(A) G.M of (X) + A (B) G.M of (X) + nA (C) G.M of (X) (D) None of these

(9) If X = -2, -1, 20, 40 then _____ cannot be calculated.

(A) A.M (B) G.M (C) H.M (D) None of these

(10) If X = 22, 21, 0, 20 then _____ cannot be calculated.

(A) A.M (B) G.M (C) H.M (D) None of these

(11) For two positive integers G.M = _____

(A) $\sqrt{(\text{A.M})(\text{H.M})}$ (B) (A.M)(H.M) (C) A.M (D) None of these

(12) G.M of “a” and “b” is _____

(A) $\dfrac{ab}{2}$ (B) $(ab)^{1/2}$ (C) $\dfrac{ab}{3}$ (D) None of these

(13) H.M of “a” and “b” is $\dfrac{2ab}{\_\_\_\_\_}$

(A) ab (B) a b (C) ab (D) None of these

(14) $G^2$ = _____

(A) A×H (B) A+H (C) (A×H)/2 (D) None of these

(15) If G.M = 60 and A.M = 110.2 then H.M is _____

(A) 28 (B) 38 (C) 32.7 (D) None of these

(16) For a set of data A.M _____ G.M _____ H.M.

(A) > (B) < (C) = (D) None of these

(17) If $\bar{X} = 3.5$ and n = 10 then $\sum X = \_\_\_\_$

(A) 0.35 (B) 35 (C) 17.5 (D) None of these

(18) Median = Q2 = D5 = _____

(A) P2 (B) D10 (C) P50 (D) None of these

(19) If $y = a + bx$ then $\bar{y}$ = _____

(A) $a + b\bar{x}$ (B) $b\bar{x}$ (C) $\bar{x}$ (D) None of these

(20) For symmetric distribution mean, median and mode are _____

(A) Different (B) same (C) both A & B (D) None of these


Short Questions
ExeRciSe

Q.3.01. The A.M of two observation is 127.5 and their G.M is 60 find their H.M?

Q.3.02. What is mode? From the following data find out mode: 5, 6, 3, 4, 5, 9, 2, 7, 5.

om
Q.3.03. A group of 20 students obtained a mean score of 70 marks on an examination. A
second group of 30 students obtained a mean score of 80 marks on the same
examination. Find the mean score for the 50 students of the class?

l.c
Q.3.04. Calculate the A.M of the data: 25, 27, 28, 29, 30, 32, 34, and 36
ai
Q.3.05. Show that G.M lies between A.M and H.M of the two values 16 and 25?
gm

Q.3.06. Find Q1 and Q3 : 9, 9, 10, 12, 15, 15, 13, 8, 4, 7, and 8


s@

Q.3.07. Find mode in each case:

(i) 10, 6, 8, 0, 3, 2,
(ii) 120, 5, 4, 5, 2, 1, 0, 5, 4, 7, 8, 4
t

(iii) 1, 3, 3, 0, 5, 0, 9, 0, 10, 0
ta

Q.3.08. State the empirical relation between mean, median and mode?
es

Q.3.09. If  f  20 ,  fD  200 and D  Xi  20 the find mean?


ze

Q.3.10. If X  87 and Median  90 using empirical relation to find mode?

xi  15
Q.3.11. If ui 
10
,  fu  90 and n  100 find A.M?
Q.3.12. Given that

n1  10 n2  15 n3  20
x1  2.5 x2  4.9 x3  5.1

Find combined mean?

Q.3.13. What are the disadvantages of A.M?

123
Chapter 03 Measures of Central Tendency

Short Questions
ExeRciSe

Q.3.14. Describe partitioned values specially the use of median as a partitioned value.

Q.3.15. In a class of 10 students 4 students failed in a test. The marks of 6 students who
passed were 4, 6, 7, 8, 8, and 9. What is median of all the 10 students?

om
Q.3.16. Calculate A.M, G.M and H.M. Show that A.M > G.M > H.M for the values 4, 9.

l.c
Q.3.17. Prove that  (xi  x)2   (xi  A)2 .

Q.3.18.
ai
Calculate Harmonic mean of 5, 2, 10, 4
gm

Q.3.19. What are the advantages of Median?

Q.3.20. Write down the properties of A.M.


s@

Q.3.21. Find A.M, G.M and H.M from the following data if possible, if not possible give
reason, -1, 2, 3, 100, 89, 31, 0, 49, 50, 70

Q.3.22. The mean of 20 observations is 10 and the median is 15. If 5 is added to each observation, find the new mean and median.

Q.3.23. Find the geometric mean of the series $1, 3, 9, \ldots, 3^n$.



Q.3.24. The H.M, A.M and G.M of a set of 5 observations are 10.2, 16 and 14 respectively. Comment.

Q.3.25. Find Mean of 1, 2, 3… 20

Q.3.26. Define Arithmetic mean and Geometric mean

Q.3.27. Find Q3 and P25 from the given data:

Class marks 15 20 25 30 35 40
Frequency 1 1 2 3 2 1

Q.3.28. Define mean, median, mode and geometric mean.


Long Questions
Exercise

Q.3.01. Calculate the mean by direct method and step deviation method from the
following data:

Class marks   6   7   8   9   10   11   12
Frequency     3   6   9   13   8    5    3

Q.3.02. Calculate the mean by direct method, geometric mean and harmonic mean from the following data:

Hourly wages       4   5   6   7   8   9   10   11   12   13   14   15
No. of employees   3   18  23  42  62  78  118  200  198  82   14   5

Q.3.03. Calculate the mean, median and mode from the following data:

Daily wages        50-59.9  60-69.9  70-79.9  80-89.9  90-99.9  100-109.9  110-119.9  120-129.9
No. of employees   7        9        10       15       13       12         6          3

Q.3.04. Calculate the mean, median and mode from the following data:

Score in Quiz 6 7 8 9 10 11 12
No. of Students 3 6 9 13 8 5 3

Q.3.05. Calculate the mean, median, mode, H.M and G.M from the following data:

Marks 0-10 10-20 20-30 30-40 40-50 50-60


f 8 11 19 16 10 5

Q.3.06. Calculate Median, Mode and Quartiles from the following data:

classes 20- 24 25- 29 30- 34 35-39 40-44


No. of Students 4 8 11 9 2

Q.3.07. Calculate D5 , P50 and P75 from the following data:

classes 9.3- 9.7 9.8- 10.2 10.3- 11.2 11.3-11.7 11.8-12.2


f 2 5 12 17 10


Q.3.08. Who is better on the average? Use the following data:

Sales representative A 4 7 5 9
Sales representative B 2 12 4 8

Q.3.09. Given the following data: 32, 35, 36, 37, 39, 41, and 43:
Calculate A.M, G.M and H.M and show that A.M > G.M > H.M

Q.3.10. Calculate D8, P2 and Q3 from the following data:

Score in Quiz     6   7   8   9   10   11   12
No. of Students   3   6   9   13   8    5    5

Q.3.11. Find G.M and H.M for the data and show that G.M > H.M.

classes 3- 7 8- 10 11- 13 14-16 17-20


No. of Students 14 24 38 20 4

Q.3.12. Find Mean, Median, Q1, Q3 and Mode from the data: 137, 146, 145, 181, 132,
175, 160, 190, 164, 180, 176, 130, 125, 140 and 150.

Q.3.13. The Reciprocals of 11 values of X are given below, find A.M, G.M and H.M of X

0.015, 0.0454, 0.04, 0.0333, 0.0285, 0.0213, 0.02, 0.0182, 0.0151, 0.0143, 0.0232

Q.3.14. By taking x = -2, -1, 0, 1, 2, 3 prove or disprove the following relations:

(i) (xi  A.M )  0


(ii) (xi  2 )  A.M
 x
2

(iii)  (xi  A.M )   x


2 2

n
Q.3.15. Find Mean, Median and Mode?

classes 0- 10 10- 20 20- 30 30-40 40-50


Frequency 3 8 17 20 21

CHAPTER 04
Measures of Dispersion,
Moments, Skewness and
Kurtosis

Chapter Contents

You should read this chapter if you need to learn about:

• Dispersion: (P128)
• Measures of Dispersion: (P129-P130)
• Range and its Coefficient: (P130-P132)
• Quartile Deviation and its Coefficient: (P133-P137)
• Mean Deviation and its Coefficient: (P138-P142)
• Standard Deviation and its Coefficient: (P143-P150)
• Properties of Variance and S.D: (P151)
• Moments: (P151-P156)
• Symmetrical Distribution: (P157)
• Skewness and Coefficient of Skewness: (P158-P163)
• Kurtosis: (P164-P166)
• Exercise: (P167-P172)

Sometimes, when two or more data sets are compared using a measure of central tendency (an average), we get the same result for each of them.

Consider the runs scored by two batsmen in their last ten matches:

Batsman A: 30, 91, 0, 64, 42, 80, 30, 5, 117, 71
Batsman B: 53, 46, 48, 50, 53, 53, 58, 60, 57, 52

The mean of the runs scored by both batsmen is the same, i.e. 53 (each batsman scored 530 runs in 10 matches).

Can we say that the performance of the two players is the same? Clearly not, because the scores of batsman A vary from 0 to 117, whereas the scores of batsman B vary only from 46 to 60.
l.c
Let us now plot the above scores as dots on a number line. We find the following diagrams:

[Figure: dot plots of the runs scored by Batsman A and Batsman B, each on a number line marked from 0 to 120]

We can see that the dots corresponding to batsman B are close to each other and cluster around the measure of central tendency (the mean), while those corresponding to batsman A are scattered, or more spread out. Thus, the measures of central tendency are not sufficient to give complete information about a given data set. In such a situation the comparison becomes very difficult. We therefore need some additional information for comparison, concerning how the data are dispersed (spread out) about the average. This can be done by measuring the dispersion. As with 'measures of central tendency', we want a single number to describe variability. This single number is called a 'measure of dispersion'.
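The point of the batsmen example can be checked with a short Python sketch. It uses only the ten scores quoted above; the helper name `summarize` is an illustrative choice, not something from the text.

```python
from statistics import mean

# Runs scored by the two batsmen in their last ten matches (from the text).
batsman_a = [30, 91, 0, 64, 42, 80, 30, 5, 117, 71]
batsman_b = [53, 46, 48, 50, 53, 53, 58, 60, 57, 52]

def summarize(scores):
    """Return (mean, smallest score, largest score)."""
    return mean(scores), min(scores), max(scores)

# Both means agree, but the smallest and largest scores differ widely.
print("Batsman A:", summarize(batsman_a))
print("Batsman B:", summarize(batsman_b))
```

The two means coincide, so the average alone cannot distinguish the players; the gap between the smallest and largest score is what separates them.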

Dispersion

“The variability (spread) that exists between the values of a data set is called dispersion.”
OR
“The extent to which the observations are spread around an average is called dispersion or scatter.”


Measures of Dispersion

As we know, there are several ways of measuring the central tendency of a data set, i.e. A.M, G.M, H.M, mode and median. Similarly, there are different ways of measuring and comparing the dispersion of distributions. There are two important types of measures of dispersion, and each absolute measure has a corresponding relative measure:

Types of Measures of Dispersion

Absolute Measure of Dispersion    Relative Measure of Dispersion
Range                             Coefficient of Range
Quartile Deviation                Coefficient of Q.D
Mean Deviation                    Coefficient of M.D
Standard Deviation                Coefficient of Variation

Absolute Measure of Dispersion

“An absolute measure of dispersion measures the variability in the same units as the data.” For example, if the units of the data are Rs, meters, kg, etc., the units of the measure of dispersion will also be Rs, meters, kg, etc.

The common absolute measures of dispersion are:

• Range
• Quartile Deviation or Semi-Inter-Quartile Range
• Average Deviation or Mean Deviation
• Standard Deviation


Relative Measure of Dispersion

“A relative measure of dispersion compares the variability of two or more data sets independently of the units of measurement.”

In other words, “a relative measure of dispersion expresses the absolute measure of dispersion relative to the relevant average, and is often multiplied by 100”, i.e.

$\text{Relative Dispersion} = \frac{\text{Absolute Dispersion}}{\text{Average}}$

or, as a percentage,

$\text{Relative Dispersion} = \frac{\text{Absolute Dispersion}}{\text{Average}} \times 100$

This is a pure number, independent of the units in which the data are expressed. It is used to compare the dispersion of one data set with the dispersion of another.
gm

The common relative measures of dispersion are:

• Coefficient of Dispersion or Coefficient of Range
• Coefficient of Quartile Deviation
• Coefficient of Mean Deviation
• Coefficient of Standard Deviation or Coefficient of Variation (C.V)

The major difference between absolute and relative measures of dispersion is that an absolute measure only measures the variability of the data and carries the unit of measurement, whereas a relative measure is used to compare the variation of two or more distributions and is unit-less.

Range

• Ungrouped Data and Discrete Grouped Data

“The difference between the largest and the smallest value in a set of data is called the range”, i.e.

$R = X_m - X_0$

where R is the range, $X_m$ is the largest value and $X_0$ is the smallest value.


 Continuous Grouped Data

“In continuous grouped data the difference between the upper class
boundary of the highest class and lower class boundary of the lowest
class is called range”.

Coefficient of Range or Coefficient of Dispersion

The coefficient of range or coefficient of dispersion is a relative measure of dispersion and is given by:

$\text{Coefficient of Range} = \frac{X_m - X_0}{X_m + X_0}$

Note: In 1892, Pearson introduced the statistical concept of “range”.
EXAMPLE 4.01

Find the Range and the Coefficient of Range from the following data:

51, 50, 40, 90, 75, 60, 44, 30, 23, 20 (ungrouped data)

Solution

Here $X_m = 90$; $X_0 = 20$

$R = X_m - X_0 = 90 - 20 = 70$

$\text{Coefficient of Range} = \frac{X_m - X_0}{X_m + X_0} = \frac{90 - 20}{90 + 20} = 0.64$

EXAMPLE 4.02

Find Range and the Coefficient of Range from the following data: (Discrete Grouped data)

Marks (X) 13 14 15 16 17
No. of Students (f) 2 5 13 7 3

Solution

Here $X_m = 17$; $X_0 = 13$

$R = X_m - X_0 = 17 - 13 = 4$

$\text{Coefficient of Range} = \frac{X_m - X_0}{X_m + X_0} = \frac{17 - 13}{17 + 13} = 0.13$


EXAMPLE 4.03

Find Range and the Coefficient of Range from the following data: (Continuous Grouped data)

Weight   11-20   21-30   31-40   41-50   51-60
f        1       2       3       2       1

Solution

Weight   f    Class Boundaries
11-20    1    10.5-20.5
21-30    2    20.5-30.5
31-40    3    30.5-40.5
41-50    2    40.5-50.5
51-60    1    50.5-60.5
Total    9    --

Here $X_m = 60.5$ (upper class boundary of the highest class); $X_0 = 10.5$ (lower class boundary of the lowest class)

$R = X_m - X_0 = 60.5 - 10.5 = 50$

$\text{Coefficient of Range} = \frac{X_m - X_0}{X_m + X_0} = \frac{60.5 - 10.5}{60.5 + 10.5} = 0.70$
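The range calculations in Examples 4.01 to 4.03 can be reproduced with a few lines of Python. This is only a sketch: the function name `range_and_coefficient` is an illustrative choice, and for continuous grouped data the caller is assumed to pass the class boundaries rather than the class limits.

```python
def range_and_coefficient(largest, smallest):
    """Return (range, coefficient of range) for the given extreme values."""
    r = largest - smallest
    coefficient = r / (largest + smallest)
    return r, coefficient

# Example 4.01: ungrouped data, extremes read directly from the values.
data = [51, 50, 40, 90, 75, 60, 44, 30, 23, 20]
print(range_and_coefficient(max(data), min(data)))

# Example 4.02: discrete grouped data, extremes are the largest and smallest X.
print(range_and_coefficient(17, 13))

# Example 4.03: continuous grouped data, extremes are the outer class boundaries.
print(range_and_coefficient(60.5, 10.5))
```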

Merits and Demerits of Range

Merits

• It is the simplest measure of dispersion.
• It gives a quick picture of the variability.

Demerits

• It is not based on each and every value of the data.
• It cannot be computed in the case of open-end distributions.
• It is affected by extreme values.
• It is not capable of further algebraic treatment.
• It is affected by fluctuations of sampling.
• It is unsatisfactory for statistical inference.


Test Yourself

Find the Range and Coefficient of Range from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15

2)  X        20   25   30   35   40
    f        2    4    9    3    1

3)  Weight   21-30   31-40   41-50   51-60   61-70
    f        1       3       5       4       2
Quartile Deviation or Semi-inter-quartile Range

“Half of the difference between the upper quartile and the lower quartile is called the semi-inter-quartile range or quartile deviation”, i.e.

$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$

Note: The difference between the upper quartile and the lower quartile is called the inter-quartile range, i.e. Inter-quartile range = $Q_3 - Q_1$.

Coefficient of Quartile Deviation

The coefficient of quartile deviation is a relative measure of dispersion and is given by:

$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

EXAMPLE 4.04

Find Q.D and the Coefficient of Q.D from the following data:

50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60; (n = 11)

Solution

$Q_1$ = size of $\left(\frac{n + 1}{4}\right)$th observation
= size of $\left(\frac{11 + 1}{4}\right)$th observation
= size of 3rd observation = 52

$Q_3$ = size of $\left(\frac{3(n + 1)}{4}\right)$th observation
= size of $\left(\frac{3(11 + 1)}{4}\right)$th observation
= size of 9th observation = 58

Here $Q_1 = 52$; $Q_3 = 58$

$\text{Q.D} = \frac{Q_3 - Q_1}{2} = \frac{58 - 52}{2} = 3$

$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{58 - 52}{58 + 52} = 0.0545$
EXAMPLE 4.05

Find Q.D and the Coefficient of Q.D from the following data:

20, 21, 22, 23, 24, 25, 26, 27; (ungrouped data) (n = 8)

Solution

$Q_1$ = size of $\left(\frac{n + 1}{4}\right)$th observation
= size of $\left(\frac{8 + 1}{4}\right)$th observation
= size of 2.25th observation
= size of [2nd + 0.25(3rd - 2nd)] observation
= 21 + 0.25(22 - 21) = 21.25

$Q_3$ = size of $\left(\frac{3(n + 1)}{4}\right)$th observation
= size of $\left(\frac{3(8 + 1)}{4}\right)$th observation
= size of 6.75th observation
= size of [6th + 0.75(7th - 6th)] observation
= 25 + 0.75(26 - 25) = 25.75

Here $Q_1 = 21.25$; $Q_3 = 25.75$

$\text{Q.D} = \frac{Q_3 - Q_1}{2} = \frac{25.75 - 21.25}{2} = 2.25$

$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{25.75 - 21.25}{25.75 + 21.25} = 0.0957$
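The positional rule used in Examples 4.04 and 4.05 (take the (n + 1)/4-th and 3(n + 1)/4-th ordered observations, interpolating when the position is fractional) can be sketched in Python as below. The function names are illustrative assumptions, and the sketch assumes the data set is large enough that both positions fall inside the ordered list.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    whole = int(position)            # e.g. 2 for position 2.25
    fraction = position - whole      # e.g. 0.25 for position 2.25
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

def quartile_deviation(values):
    """Return (Q1, Q3, Q.D, coefficient of Q.D) using the (n + 1)/4 rule."""
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, (n + 1) / 4)
    q3 = value_at_position(data, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

print(quartile_deviation(range(50, 61)))   # data of Example 4.04
print(quartile_deviation(range(20, 28)))   # data of Example 4.05
```

Other textbooks and software packages use slightly different quartile conventions, so results from library routines may not match these hand calculations exactly.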

EXAMPLE 4.06

Find Q.D and the Coefficient of Q.D from the following data: (Discrete grouped data)

X   20   21   22   23   24   25
f   1    3    5    2    2    2

Solution

X       f    Cumulative Frequency
20      1    1
21      3    4
22      5    9
23      2    11
24      2    13
25      2    15
Total   15   --

$Q_1$ = size of $\left(\frac{n + 1}{4}\right)$th observation
= size of $\left(\frac{15 + 1}{4}\right)$th observation
= size of 4th observation = 21

$Q_3$ = size of $\left(\frac{3(n + 1)}{4}\right)$th observation
= size of $\left(\frac{3(15 + 1)}{4}\right)$th observation
= size of 12th observation = 24

Here $Q_1 = 21$; $Q_3 = 24$

$\text{Q.D} = \frac{Q_3 - Q_1}{2} = \frac{24 - 21}{2} = 1.5$

$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{24 - 21}{24 + 21} = 0.0667$


EXAMPLE 4.07

Find Q.D and the Coefficient of Q.D from the following data: (Continuous grouped data)

Marks   30-39   40-49   50-59   60-69   70-79   80-89   90-99
f       8       87      190     304     211     85      20

Solution

Marks   No. of Students   C.F   Class boundaries
30-39   8                 8     29.5-39.5
40-49   87                95    39.5-49.5
50-59   190               285   49.5-59.5
60-69   304               589   59.5-69.5
70-79   211               800   69.5-79.5
80-89   85                885   79.5-89.5
90-99   20                905   89.5-99.5
Total   905               --    --

Lower quartile

Step 1:

$Q_1$ = size of $\left(\frac{n}{4}\right)$th observation
= size of $\left(\frac{905}{4}\right)$th observation
= size of 226.25th observation

Since the 226.25th observation lies in the class (49.5-59.5), this is the lower quartile class.

Here l = 49.5, f = 190, C = 95, h = 10

Step 2:

$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right) = 49.5 + \frac{10}{190}(226.25 - 95) = 56.41$

Upper quartile

Step 1:

$Q_3$ = size of $\left(\frac{3n}{4}\right)$th observation
= size of $\left(\frac{3 \times 905}{4}\right)$th observation
= size of 678.75th observation

Since the 678.75th observation lies in the class (69.5-79.5), this is the upper quartile class.

Here l = 69.5, f = 211, C = 589, h = 10

Step 2:

$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right) = 69.5 + \frac{10}{211}(678.75 - 589) = 73.75$

Here $Q_1 = 56.41$; $Q_3 = 73.75$

$\text{Q.D} = \frac{Q_3 - Q_1}{2} = \frac{73.75 - 56.41}{2} = 8.67$

$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{73.75 - 56.41}{73.75 + 56.41} = 0.1332$
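For continuous grouped data, the two steps of Example 4.07 (locate the quartile class from the cumulative frequencies, then interpolate inside it) can be sketched as one Python function. The name `grouped_quantile` and its argument layout are assumptions made for illustration only.

```python
def grouped_quantile(boundaries, frequencies, fraction):
    """Quantile of grouped data: l + (h / f) * (fraction * n - C).

    boundaries  -- list of (lower, upper) class boundaries, in order
    frequencies -- class frequencies in the same order
    fraction    -- 0.25 for Q1, 0.5 for the median, 0.75 for Q3
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    n = sum(frequencies)
    target = fraction * n
    cumulative = 0                      # C: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= target:    # the target observation lies in this class
            h = upper - lower           # class width
            return lower + (h / f) * (target - cumulative)
        cumulative += f
    raise ValueError("no class contains the requested observation")

# Data of Example 4.07.
bounds = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5), (59.5, 69.5),
          (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [8, 87, 190, 304, 211, 85, 20]

q1 = grouped_quantile(bounds, freqs, 0.25)
q3 = grouped_quantile(bounds, freqs, 0.75)
print(round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))
```

Passing 0.5 as the fraction gives the grouped median with the same formula.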

Merits and Demerits of Quartile Deviation

Merits

• It is simple to understand and easy to calculate.
• It is a good measure for open-end distributions.

Demerits

• It is not based on each and every value of the data.
• It is not capable of further algebraic treatment.
• It is affected by fluctuations of sampling.
• It is unsatisfactory for statistical inference.

Test Yourself

Find the Q.D and Coefficient of Q.D from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15, 20, 19, 21

2) 30, 33, 23, 22, 34, 40, 41, 28, 35, 39

3)  X        20   25   30   35   40
    f        2    4    9    3    1

4)  Weight   21-30   31-40   41-50   51-60   61-70
    f        1       3       5       4       2


Mean Absolute Deviation or Mean Deviation (Average Deviation)

“The arithmetic mean of the absolute deviations from an average (mean, median, etc.) is called the mean deviation or average deviation.”

M.D from Mean:
    Ungrouped data: $\text{M.D} = \frac{\sum |x_i - \bar{x}|}{n}$
    Grouped data:   $\text{M.D} = \frac{\sum f|x_i - \bar{x}|}{n}$

M.D from Median:
    Ungrouped data: $\text{M.D} = \frac{\sum |x_i - \text{Med}|}{n}$
    Grouped data:   $\text{M.D} = \frac{\sum f|x_i - \text{Med}|}{n}$

Coefficient of Mean Deviation

The coefficient of mean deviation is a relative measure of dispersion and is given by:

$\text{Coefficient of M.D (from mean)} = \frac{\text{M.D (from mean)}}{\text{Mean}}$

$\text{Coefficient of M.D (from median)} = \frac{\text{M.D (from median)}}{\text{Median}}$

EXAMPLE 4.08

Find M.D and the Coefficient of M.D from mean, using the data:

50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60; (ungrouped data)

Solution

X     X - X̄   |X - X̄|
50    -5       5
51    -4       4
52    -3       3
53    -2       2
54    -1       1
55    0        0
56    1        1
57    2        2
58    3        3
59    4        4
60    5        5
605   --       30

Here $\bar{x} = \frac{\sum x_i}{n} = \frac{605}{11} = 55$

$\text{M.D} = \frac{\sum |x_i - \bar{x}|}{n} = \frac{30}{11} = 2.7273$

$\text{Coefficient of M.D} = \frac{\text{M.D}}{\bar{x}} = \frac{2.7273}{55} = 0.0496$


EXAMPLE 4.09

Find M.D and the Coefficient of M.D from median, using the data:

50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61; (ungrouped data)

Solution

Median = size of $\left(\frac{n + 1}{2}\right)$th observation
= size of $\left(\frac{11 + 1}{2}\right)$th observation
= size of 6th observation = 55

X     X - Med   |X - Med|
50    -5        5
51    -4        4
52    -3        3
53    -2        2
54    -1        1
55    0         0
56    1         1
57    2         2
58    3         3
59    4         4
61    6         6
--    --        31

$\text{M.D} = \frac{\sum |x_i - \text{Median}|}{n} = \frac{31}{11} = 2.8182$

$\text{Coefficient of M.D} = \frac{\text{M.D}}{\text{Median}} = \frac{2.8182}{55} = 0.0512$
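The ungrouped mean deviation in Examples 4.08 and 4.09 can be checked with a short Python sketch. The function name `mean_deviation` is an illustrative choice; the mean and median come from Python's standard `statistics` module.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations of values from center."""
    return sum(abs(x - center) for x in values) / len(values)

# Example 4.08: deviations taken from the mean.
data_a = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
md_mean = mean_deviation(data_a, mean(data_a))
print(round(md_mean, 4), round(md_mean / mean(data_a), 4))

# Example 4.09: deviations taken from the median.
data_b = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61]
md_median = mean_deviation(data_b, median(data_b))
print(round(md_median, 4), round(md_median / median(data_b), 4))
```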

EXAMPLE 4.10

Find M.D and the Coefficient of M.D from mean. (Discrete grouped data)

X   20   21   22   23   24   25
f   1    3    5    2    2    2

Solution

X       f    fX    |X - X̄|   f|X - X̄|
20      1    20    2.47       2.47
21      3    63    1.47       4.41
22      5    110   0.47       2.35
23      2    46    0.53       1.06
24      2    48    1.53       3.06
25      2    50    2.53       5.06
Total   15   337   --         18.41

Here $\bar{x} = \frac{\sum f x_i}{n} = \frac{337}{15} = 22.47$

$\text{M.D} = \frac{\sum f|x_i - \bar{x}|}{n} = \frac{18.41}{15} = 1.23$

$\text{Coefficient of M.D} = \frac{\text{M.D}}{\bar{x}} = \frac{1.23}{22.47} = 0.05$


EXAMPLE 4.11

Find M.D and the Coefficient of M.D from median. (Discrete grouped data)

X   20   21   22   23   24   25
f   1    3    5    2    2    2

Solution

Median = size of $\left(\frac{n + 1}{2}\right)$th observation
= size of $\left(\frac{15 + 1}{2}\right)$th observation
= size of 8th observation = 22

X       f    C.F   f|X - Med|
20      1    1     2
21      3    4     3
22      5    9     0
23      2    11    2
24      2    13    4
25      2    15    6
Total   15   --    17

$\text{M.D} = \frac{\sum f|x_i - \text{Median}|}{n} = \frac{17}{15} = 1.13$

$\text{Coefficient of M.D} = \frac{\text{M.D}}{\text{Median}} = \frac{1.13}{22} = 0.05$

EXAMPLE 4.12

Find M.D and the Coefficient of M.D from mean. (Continuous grouped data)

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Marks   X      f     fX        f|X - X̄|
30-39   34.5   8     276       244.69
40-49   44.5   87    3871.5    1790.95
50-59   54.5   190   10355     2011.27
60-69   64.5   304   19608     178.03
70-79   74.5   211   15719.5   1986.43
80-89   84.5   85    7182.5    1650.22
90-99   94.5   20    1890      588.29
Total   --     905   58902.5   8449.88

Here $\bar{x} = \frac{\sum f x_i}{n} = \frac{58902.5}{905} = 65.09$

$\text{M.D} = \frac{\sum f|x_i - \bar{x}|}{n} = \frac{8449.88}{905} = 9.34$

$\text{Coefficient of M.D} = \frac{\text{M.D}}{\bar{x}} = \frac{9.34}{65.09} = 0.14$


EXAMPLE 4.13

Find M.D and the Coefficient of M.D from median. (Continuous grouped data)

Marks             30-39   40-49   50-59   60-69   70-79   80-89   90-99
No. of Students   8       87      190     304     211     85      20

Solution

Step 1:

Median = size of $\left(\frac{n}{2}\right)$th observation
= size of $\left(\frac{905}{2}\right)$th observation
= size of 452.5th observation

Since the 452.5th observation lies in the class (59.5-69.5), this is the median class.

Here l = 59.5, f = 304, C = 285, h = 10

Step 2:

$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 59.5 + \frac{10}{304}(452.5 - 285) = 65$

Marks   X      No. of Students (f)   C.F   Class boundaries   f|X - Median|
30-39   34.5   8                     8     29.5-39.5          244
40-49   44.5   87                    95    39.5-49.5          1783.5
50-59   54.5   190                   285   49.5-59.5          1995
60-69   64.5   304                   589   59.5-69.5          152
70-79   74.5   211                   800   69.5-79.5          2004.5
80-89   84.5   85                    885   79.5-89.5          1657.5
90-99   94.5   20                    905   89.5-99.5          590
Total   --     905                   --    --                 8426.5

$\text{M.D} = \frac{\sum f|x_i - \text{Median}|}{n} = \frac{8426.5}{905} = 9.31$

$\text{Coefficient of M.D} = \frac{\text{M.D}}{\text{Median}} = \frac{9.31}{65} = 0.14$
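The grouped versions of the mean deviation (Examples 4.12 and 4.13) weight each absolute deviation by its class frequency. A minimal Python sketch follows; the function name is an illustrative assumption, class midpoints stand in for the observations, and the median is taken as the rounded value 65 used in the worked solution.

```python
def grouped_mean_deviation(midpoints, frequencies, center):
    """Frequency-weighted mean of |x - center| using class midpoints x."""
    n = sum(frequencies)
    total = sum(f * abs(x - center) for x, f in zip(midpoints, frequencies))
    return total / n

midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [8, 87, 190, 304, 211, 85, 20]

# Example 4.12: deviations from the grouped mean.
grouped_mean = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)
md_from_mean = grouped_mean_deviation(midpoints, freqs, grouped_mean)

# Example 4.13: deviations from the median, rounded to 65 as in the text.
md_from_median = grouped_mean_deviation(midpoints, freqs, 65)

print(round(grouped_mean, 2), round(md_from_mean, 2), round(md_from_median, 2))
```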


Merits and Demerits of Mean Deviation

Merits

• It is simple to understand.
• It is based on each and every value of the data.

Demerits

• It is not a good measure for open-end distributions.
• It is not capable of further algebraic treatment.
• It is difficult to handle mathematically, because there is an element of artificiality, i.e. the deviations are not taken with their proper signs.
• It is unsatisfactory for statistical inference.

Test Yourself

a) Find the M.D from Mean and Coefficient of M.D from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15, 20, 19, 21


t
ta

2)  X        20   25   30   35   40
    f        2    4    9    3    1

3)  Weight   21-30   31-40   41-50   51-60   61-70
    f        1       3       5       4       2

b) Find the M.D from Median and Coefficient of M.D from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15, 20, 19, 21

2) 30, 33, 23, 22, 34, 40, 41, 28, 35, 39

3)  X        20   25   30   35   40
    f        2    4    9    3    1

4)  Weight   21-30   31-40   41-50   51-60   61-70
    f        1       3       5       4       2


Standard Deviation

“The positive square root of the variance is called the standard deviation.”
OR
“The positive square root of the arithmetic mean of the squared deviations from the mean is called the standard deviation.”

Notes:
• The arithmetic mean of the squared deviations of the values measured from the mean is called the variance.
• For a set of data, usually: Range > S.D > M.D > Q.D
• The measures of dispersion are never negative.

S.D for Population:
    Ungrouped data: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
    Grouped data:   $\sigma = \sqrt{\frac{\sum f(x_i - \mu)^2}{N}}$

S.D for Sample:
    Ungrouped data: $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
    Grouped data:   $S = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n}}$

Methods of Calculating Variance and Standard Deviation

In every method below, the standard deviation is the positive square root of the variance, $S = \sqrt{S^2}$. Here $D = x_i - A$ and $u_i = \frac{x_i - A}{h}$, where A is an arbitrary value and h is the common step (class width).

Ungrouped Data

Direct Method:          $S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$
Short-cut Method:       $S^2 = \frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2$
Step-deviation Method:  $S^2 = h^2\left[\frac{\sum u_i^2}{n} - \left(\frac{\sum u_i}{n}\right)^2\right]$

Grouped Data

Direct Method:          $S^2 = \frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2$
Short-cut Method:       $S^2 = \frac{\sum f D^2}{n} - \left(\frac{\sum f D}{n}\right)^2$
Step-deviation Method:  $S^2 = h^2\left[\frac{\sum f u_i^2}{n} - \left(\frac{\sum f u_i}{n}\right)^2\right]$


Coefficient of Standard Deviation OR Coefficient of Variation

The coefficient of standard deviation is a relative measure of dispersion and is given by:

$\text{Coefficient of S.D} = \frac{\text{Standard Deviation}}{\text{Mean}}$

The coefficient of standard deviation, expressed as a percentage, is called the coefficient of variation, denoted by C.V, and is given by:

$\text{C.V} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$

The coefficient of variation was introduced by Karl Pearson. It is used to compare the variation, or the performance, of two sets of data. A large value of C.V indicates greater variability, and vice versa. Similarly, the smaller the C.V, the more consistent the performance, and vice versa.

Note: In 1893, Pearson introduced the statistical concept of “S.D”. He also introduced the coefficient of variation for the comparison between two different groups.

EXAMPLE 4.14

Find the Variance and Standard deviation from the following data: (ungrouped data)

2, 4, 6, 8, 10

Solution

Direct Method

X    X²
2    4
4    16
6    36
8    64
10   100
30   220

$S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 = \frac{220}{5} - \left(\frac{30}{5}\right)^2 = 8$

$S = \sqrt{\frac{220}{5} - \left(\frac{30}{5}\right)^2} = 2.8$

Short-cut Method (let A = 4)

X    D = X - A   D²
2    -2          4
4    0           0
6    2           4
8    4           16
10   6           36
--   10          60

$S^2 = \frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2 = \frac{60}{5} - \left(\frac{10}{5}\right)^2 = 8$

$S = \sqrt{\frac{60}{5} - \left(\frac{10}{5}\right)^2} = 2.8$

Step-deviation Method (here h = 2 and let A = 8)

X    u = (X - A)/h   u²
2    -3              9
4    -2              4
6    -1              1
8    0               0
10   1               1
30   -5              15

$S^2 = h^2\left[\frac{\sum u_i^2}{n} - \left(\frac{\sum u_i}{n}\right)^2\right] = 2^2\left[\frac{15}{5} - \left(\frac{-5}{5}\right)^2\right] = 8$

$S = h\sqrt{\frac{\sum u_i^2}{n} - \left(\frac{\sum u_i}{n}\right)^2} = 2\sqrt{\frac{15}{5} - \left(\frac{-5}{5}\right)^2} = 2.8$
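The three methods of Example 4.14 must give the same variance whatever values of A and h are chosen. The Python sketch below makes that easy to verify; the function names are illustrative, and the divisor is n, as in the formulas of this chapter.

```python
from math import sqrt

def variance_direct(values):
    """S^2 = (sum x^2)/n - ((sum x)/n)^2."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

def variance_shortcut(values, a):
    """Same formula applied to the deviations D = x - A."""
    n = len(values)
    d = [x - a for x in values]
    return sum(v * v for v in d) / n - (sum(d) / n) ** 2

def variance_step(values, a, h):
    """Formula applied to u = (x - A)/h, then scaled back by h^2."""
    n = len(values)
    u = [(x - a) / h for x in values]
    return h ** 2 * (sum(v * v for v in u) / n - (sum(u) / n) ** 2)

data = [2, 4, 6, 8, 10]
print(variance_direct(data))            # direct method
print(variance_shortcut(data, a=4))     # short-cut method with A = 4
print(variance_step(data, a=8, h=2))    # step-deviation method with A = 8, h = 2
print(round(sqrt(variance_direct(data)), 1))
```

Changing `a` (and `h`) changes the intermediate sums but not the final variance, which is why the short-cut and step-deviation methods are safe to use for hand calculation.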

EXAMPLE 4.15

Find the Variance and Standard deviation from the following data: (Discrete grouped data)

X   10   15   20   25   30
f   1    2    3    2    1

Solution

Direct Method

X       f   fX    fX²
10      1   10    100
15      2   30    450
20      3   60    1200
25      2   50    1250
30      1   30    900
Total   9   180   3900

$S^2 = \frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2 = \frac{3900}{9} - \left(\frac{180}{9}\right)^2 = 33.3$

$S = \sqrt{\frac{3900}{9} - \left(\frac{180}{9}\right)^2} = 5.8$

Short-cut Method (here A = 20)

X       f   D = X - A   fD    fD²
10      1   -10         -10   100
15      2   -5          -10   50
20      3   0           0     0
25      2   5           10    50
30      1   10          10    100
Total   9   --          0     300

$S^2 = \frac{\sum f D^2}{n} - \left(\frac{\sum f D}{n}\right)^2 = \frac{300}{9} - \left(\frac{0}{9}\right)^2 = 33.3$

$S = \sqrt{\frac{300}{9} - \left(\frac{0}{9}\right)^2} = 5.8$

Step-deviation Method (here A = 20, h = 5)

X       f   u = (X - A)/h   fu   fu²
10      1   -2              -2   4
15      2   -1              -2   2
20      3   0               0    0
25      2   1               2    2
30      1   2               2    4
Total   9   --              0    12

$S^2 = h^2\left[\frac{\sum f u_i^2}{n} - \left(\frac{\sum f u_i}{n}\right)^2\right] = 5^2\left[\frac{12}{9} - \left(\frac{0}{9}\right)^2\right] = 33.3$

$S = h\sqrt{\frac{\sum f u_i^2}{n} - \left(\frac{\sum f u_i}{n}\right)^2} = 5\sqrt{\frac{12}{9} - \left(\frac{0}{9}\right)^2} = 5.8$

EXAMPLE 4.16

Find the Variance and Standard deviation from the following data: (Continuous grouped data)

Weight (kg)   11-20   21-30   31-40   41-50   51-60
f             1       2       3       2       1

Solution

Direct Method

Weight (kg)   f   X      fX      fX²
11-20         1   15.5   15.5    240.25
21-30         2   25.5   51.0    1300.5
31-40         3   35.5   106.5   3780.75
41-50         2   45.5   91.0    4140.5
51-60         1   55.5   55.5    3080.25
Total         9   --     319.5   12542.25

$S^2 = \frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2 = \frac{12542.25}{9} - \left(\frac{319.5}{9}\right)^2 = 133.3\ \text{kg}^2$

$S = \sqrt{\frac{12542.25}{9} - \left(\frac{319.5}{9}\right)^2} = 11.5\ \text{kg}$

Short-cut Method (here A = 35.5)

Weight   f   X      D = X - A   fD    fD²
11-20    1   15.5   -20         -20   400
21-30    2   25.5   -10         -20   200
31-40    3   35.5   0           0     0
41-50    2   45.5   10          20    200
51-60    1   55.5   20          20    400
Total    9   --     --          0     1200

$S^2 = \frac{\sum f D^2}{n} - \left(\frac{\sum f D}{n}\right)^2 = \frac{1200}{9} - \left(\frac{0}{9}\right)^2 = 133.3\ \text{kg}^2$

$S = \sqrt{\frac{1200}{9} - \left(\frac{0}{9}\right)^2} = 11.5\ \text{kg}$

Step-deviation Method (here A = 35.5, h = 10)

Weight   f   X      u = (X - A)/h   fu   fu²
11-20    1   15.5   -2              -2   4
21-30    2   25.5   -1              -2   2
31-40    3   35.5   0               0    0
41-50    2   45.5   1               2    2
51-60    1   55.5   2               2    4
Total    9   --     --              0    12

$S^2 = h^2\left[\frac{\sum f u_i^2}{n} - \left(\frac{\sum f u_i}{n}\right)^2\right] = 10^2\left[\frac{12}{9} - \left(\frac{0}{9}\right)^2\right] = 133.3\ \text{kg}^2$

$S = h\sqrt{\frac{\sum f u_i^2}{n} - \left(\frac{\sum f u_i}{n}\right)^2} = 10\sqrt{\frac{12}{9} - \left(\frac{0}{9}\right)^2} = 11.5\ \text{kg}$

Notes:
• When computing the variance or S.D, round it off to one more decimal place than the original data values.
• The unit of the S.D is the same as that of the raw data, so it is preferable to use the S.D instead of the variance.
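The grouped direct method of Examples 4.15 and 4.16 can be sketched in Python as follows. The function name `grouped_variance` is an illustrative assumption; for continuous classes the caller passes the class midpoints as the X values.

```python
from math import sqrt

def grouped_variance(x_values, frequencies):
    """S^2 = (sum f x^2)/n - ((sum f x)/n)^2 with n = sum f."""
    n = sum(frequencies)
    mean_x = sum(f * x for x, f in zip(x_values, frequencies)) / n
    mean_x2 = sum(f * x * x for x, f in zip(x_values, frequencies)) / n
    return mean_x2 - mean_x ** 2

freqs = [1, 2, 3, 2, 1]

# Example 4.15: discrete grouped data.
var_discrete = grouped_variance([10, 15, 20, 25, 30], freqs)
print(round(var_discrete, 1), round(sqrt(var_discrete), 1))

# Example 4.16: continuous grouped data, class midpoints in kg.
var_continuous = grouped_variance([15.5, 25.5, 35.5, 45.5, 55.5], freqs)
print(round(var_continuous, 1), round(sqrt(var_continuous), 1))
```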

Test Yourself

Find the Variance and S.D from the following data:

1) 1, 3, 5, 7, 9, 11, 13, 15, 20, 19, 21

2)  X        20   25   30   35   40
    f        2    4    9    3    1

3)  Weight   21-30   31-40   41-50   51-60   61-70
    f        1       3       5       4       2

Note: A negative answer when calculating a measure of dispersion is always incorrect.


EXAMPLE 4.17

The number of runs scored by two cricketers A and B during a test series of 5 test matches is shown below for each of the 10 innings. Using the coefficient of variation, find which player is more consistent.

A   5    26   97   76   112   89   6    108   24   16
B   51   47   36   60   58    39   44   42    71   50

Solution

For cricketer A (runs x) and cricketer B (runs y):

$\text{C.V}(x) = \frac{\text{S.D}(x)}{\bar{x}} \times 100$,  $\text{C.V}(y) = \frac{\text{S.D}(y)}{\bar{y}} \times 100$

$\bar{x} = \frac{\sum x_i}{n}$,  $\bar{y} = \frac{\sum y_i}{n}$

$\text{S.D}(x) = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$,  $\text{S.D}(y) = \sqrt{\frac{\sum y_i^2}{n} - \left(\frac{\sum y_i}{n}\right)^2}$

x     x²      y     y²
5     25      51    2601
26    676     47    2209
97    9409    36    1296
76    5776    60    3600
112   12544   58    3364
89    7921    39    1521
6     36      44    1936
108   11664   42    1764
24    576     71    5041
16    256     50    2500
559   48883   498   25832

Cricketer A:

$\bar{x} = \frac{559}{10} = 55.9$

$\text{S.D}(x) = \sqrt{\frac{48883}{10} - \left(\frac{559}{10}\right)^2} = 41.993$

$\text{C.V}(x) = \frac{41.993}{55.9} \times 100 = 75.12\%$

Cricketer B:

$\bar{y} = \frac{498}{10} = 49.8$

$\text{S.D}(y) = \sqrt{\frac{25832}{10} - \left(\frac{498}{10}\right)^2} = 10.156$

$\text{C.V}(y) = \frac{10.156}{49.8} \times 100 = 20.39\%$

Since the C.V for player B is smaller than the C.V for player A, player B is more consistent.
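The comparison in Example 4.17 can be reproduced with Python's standard `statistics` module. In this sketch `pstdev` is used because it divides by n, matching the S.D formula of this chapter; the function name `coefficient_of_variation` is an illustrative choice.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V = (S.D / mean) * 100, with S.D computed using divisor n."""
    return pstdev(values) / mean(values) * 100

runs_a = [5, 26, 97, 76, 112, 89, 6, 108, 24, 16]
runs_b = [51, 47, 36, 60, 58, 39, 44, 42, 71, 50]

cv_a = coefficient_of_variation(runs_a)
cv_b = coefficient_of_variation(runs_b)
print(round(cv_a, 2), round(cv_b, 2))

# The player with the smaller C.V is the more consistent one.
print("More consistent:", "A" if cv_a < cv_b else "B")
```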


EXAMPLE 4.18

Goals scored by two teams A and B in a football season were as follows:

No. of goals scored      Number of Matches (frequencies)
in a match (xi)          A     B
0                        27    17
1                        9     9
2                        8     6
3                        5     5
4                        4     3

Using the coefficient of variation, find which team may be considered more consistent.

Solution

For each team:

$\text{C.V} = \frac{\text{S.D}}{\bar{x}} \times 100$, where $\bar{x} = \frac{\sum f x_i}{n}$ and $\text{S.D} = \sqrt{\frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2}$

xi      fA   fB   fA·xi   fA·xi²   fB·xi   fB·xi²
0       27   17   0       0        0       0
1       9    9    9       9        9       9
2       8    6    16      32       12      24
3       5    5    15      45       15      45
4       4    3    16      64       12      48
Total   53   40   56      150      48      126

Team A:

$\bar{x} = \frac{56}{53} = 1.06$

$\text{S.D} = \sqrt{\frac{150}{53} - \left(\frac{56}{53}\right)^2} = 1.309$

$\text{C.V} = \frac{1.309}{1.06} \times 100 = 123.5\%$

Team B:

$\bar{x} = \frac{48}{40} = 1.20$

$\text{S.D} = \sqrt{\frac{126}{40} - \left(\frac{48}{40}\right)^2} = 1.308$

$\text{C.V} = \frac{1.308}{1.20} \times 100 = 109.0\%$

Since the C.V for Team B is smaller than the C.V for Team A, team B is more consistent.


Test Yourself

1) The number of runs scored by two cricketers A and B during a test series of 5 test
matches is shown below for each of the 10 innings. Using coefficient of variation, find
who will be more consistent player?

A 15 34 27 55 0 0 6 4 123 34
B 5 67 36 55 89 33 37 89 88 111

2) Goals scored by two teams A and B in a football season were as follows:

No. of goals scored      Number of Matches (frequencies)
in a match (xi)          A     B
0                        16    20
1                        8     7
2                        4     8
3                        6     2
4                        3     1

Using the coefficient of variation, find which team may be considered more consistent.

Merits and Demerits of Standard Deviation

Merits

• It is simple to understand.
• It is clearly defined by a mathematical formula.
• It is based on each and every value of the data.
• It is capable of further algebraic treatment.
• It is less affected by the fluctuations of sampling.
• It provides a basis for statistical inference.

Demerits

• Its calculation is not very simple.
• It is affected by extreme values.
• It is not a good measure for open-end distributions.


Properties of Variance and Standard Deviation

• The variance and S.D of a constant are zero, i.e.

$\text{Var}(c) = 0$ and $\text{S.D}(c) = 0$, where “c” is any constant.

• The variance and S.D are unaffected by a change of origin, i.e. when a constant is added to or subtracted from each value of a variable, the variance and S.D remain unchanged:

$\text{Var}(X \pm c) = \text{Var}(X)$ and $\text{S.D}(X \pm c) = \text{S.D}(X)$

• The variance and S.D are affected by a change of scale, i.e. when each observation of a variable is multiplied or divided by a constant, the variance and S.D change as follows:

$\text{Var}(cX) = c^2\,\text{Var}(X)$ and $\text{S.D}(cX) = |c|\,\text{S.D}(X)$

$\text{Var}\left(\frac{X}{c}\right) = \frac{1}{c^2}\,\text{Var}(X)$ and $\text{S.D}\left(\frac{X}{c}\right) = \frac{1}{|c|}\,\text{S.D}(X)$

• The variance of the sum or difference of two independent variables is equal to the sum of their respective variances. The standard deviations do not add in the same way; the S.D of the sum or difference is the square root of the sum of the variances:

$\text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y)$ and $\text{S.D}(X \pm Y) = \sqrt{\text{Var}(X) + \text{Var}(Y)}$


Moments

“The arithmetic means of the rth powers of the deviations taken from the mean, from zero, or from any arbitrary origin (provisional mean) are called moments.”

• When the deviations are computed from the arithmetic mean, the moments are called moments about the mean (mean moments), or sometimes central moments. They are denoted by $m_r$ and given as follows:

Ungrouped data: $m_r = \frac{\sum (x_i - \bar{x})^r}{n}$
Grouped data:   $m_r = \frac{\sum f(x_i - \bar{x})^r}{n}$

where r = 1, 2, 3, 4, …

• When the deviations of the values are computed from the origin (zero), the moments are called moments about the origin. They are denoted by $m'_r$ and given by:

Ungrouped data: $m'_r = \frac{\sum x_i^r}{n}$
Grouped data:   $m'_r = \frac{\sum f x_i^r}{n}$

where r = 1, 2, 3, 4, …

• When the deviations of the values are computed from any arbitrary value, say “A” (provisional mean), the moments are called moments about the provisional mean, also denoted by $m'_r$:

Ungrouped data: $m'_r = \frac{\sum D^r}{n}$
Grouped data:   $m'_r = \frac{\sum f D^r}{n}$

where r = 1, 2, 3, 4, … and $D = x_i - A$.

Note: Moments about the provisional mean and moments about zero are called raw moments (denoted by $m'_r$).

EXAMPLE 4.19

Calculate the first four moments about the mean from the following data.

2, 4, 6, 8, 10

Solution

Here $\bar{x} = \frac{\sum x_i}{n} = \frac{30}{5} = 6$

xi   (xi - x̄)   (xi - x̄)²   (xi - x̄)³   (xi - x̄)⁴
2    -4          16           -64          256
4    -2          4            -8           16
6    0           0            0            0
8    2           4            8            16
10   4           16           64           256
30   0           40           0            544

$m_1 = \frac{\sum (x_i - \bar{x})}{n} = \frac{0}{5} = 0$,  $m_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{40}{5} = 8$

$m_3 = \frac{\sum (x_i - \bar{x})^3}{n} = \frac{0}{5} = 0$,  $m_4 = \frac{\sum (x_i - \bar{x})^4}{n} = \frac{544}{5} = 108.8$


EXAMPLE 4.20
Calculate the first four moments about zero from the following data.

2, 4, 6, 8, 10

Solution

xi        xi²     xi³     xi⁴
2         4       8       16
4         16      64      256
6         36      216     1296
8         64      512     4096
10        100     1000    10000
Total 30  220     1800    15664

$m'_1 = \frac{\sum x_i}{n} = \frac{30}{5} = 6$ , $m'_2 = \frac{\sum x_i^2}{n} = \frac{220}{5} = 44$

$m'_3 = \frac{\sum x_i^3}{n} = \frac{1800}{5} = 360$ , $m'_4 = \frac{\sum x_i^4}{n} = \frac{15664}{5} = 3132.8$

EXAMPLE 4.21

Calculate the first four moments about the provisional mean (P.M) from the following data.

2, 4, 6, 8, 10

Solution  Here $D = x_i - A$ (let A = 4)

xi        D = xi - A   D²    D³     D⁴
2         -2           4     -8     16
4         0            0     0      0
6         2            4     8      16
8         4            16    64     256
10        6            36    216    1296
Total 30  10           60    280    1584

$m'_1 = \frac{\sum D}{n} = \frac{10}{5} = 2$ , $m'_2 = \frac{\sum D^2}{n} = \frac{60}{5} = 12$

$m'_3 = \frac{\sum D^3}{n} = \frac{280}{5} = 56$ , $m'_4 = \frac{\sum D^4}{n} = \frac{1584}{5} = 316.8$


EXAMPLE 4.22
Calculate the first four moments about the mean from the following data:

xi   2   3   4   5   6
f    1   3   7   3   1

Solution  Here $\bar{x} = \frac{\sum f x_i}{n} = \frac{60}{15} = 4$, where $n = \sum f = 15$

xi     f    f·xi   (xi - x̄)   f(xi - x̄)   f(xi - x̄)²   f(xi - x̄)³   f(xi - x̄)⁴
2      1    2      -2          -2           4             -8            16
3      3    9      -1          -3           3             -3            3
4      7    28     0           0            0             0             0
5      3    15     1           3            3             3             3
6      1    6      2           2            4             8             16
Total  15   60     --          0            14            0             38

$m_1 = \frac{\sum f(x_i - \bar{x})}{n} = \frac{0}{15} = 0$ , $m_2 = \frac{\sum f(x_i - \bar{x})^2}{n} = \frac{14}{15} = 0.933$

$m_3 = \frac{\sum f(x_i - \bar{x})^3}{n} = \frac{0}{15} = 0$ , $m_4 = \frac{\sum f(x_i - \bar{x})^4}{n} = \frac{38}{15} = 2.533$

EXAMPLE 4.23

Calculate the first four moments about zero from the following data:

xi   2   3   4   5   6
f    1   3   7   3   1

Solution

xi     f    f·xi   f·xi²   f·xi³   f·xi⁴
2      1    2      4       8       16
3      3    9      27      81      243
4      7    28     112     448     1792
5      3    15     75      375     1875
6      1    6      36      216     1296
Total  15   60     254     1128    5222

$m'_1 = \frac{\sum f x_i}{n} = \frac{60}{15} = 4$ , $m'_2 = \frac{\sum f x_i^2}{n} = \frac{254}{15} = 16.93$

$m'_3 = \frac{\sum f x_i^3}{n} = \frac{1128}{15} = 75.2$ , $m'_4 = \frac{\sum f x_i^4}{n} = \frac{5222}{15} = 348.13$

EXAMPLE 4.24
Calculate the first four moments about the provisional mean (P.M) from the following data:

xi   2   3   4   5   6
f    1   3   7   3   1

Solution  Here $D = x_i - A$ (let A = 3)

xi     f    D = xi - A   fD    fD²   fD³   fD⁴
2      1    -1           -1    1     -1    1
3      3    0            0     0     0     0
4      7    1            7     7     7     7
5      3    2            6     12    24    48
6      1    3            3     9     27    81
Total  15   --           15    29    57    137

$m'_1 = \frac{\sum fD}{n} = \frac{15}{15} = 1$ , $m'_2 = \frac{\sum fD^2}{n} = \frac{29}{15} = 1.933$

$m'_3 = \frac{\sum fD^3}{n} = \frac{57}{15} = 3.8$ , $m'_4 = \frac{\sum fD^4}{n} = \frac{137}{15} = 9.133$

Test Yourself

Find the moments about the mean, about zero and about a provisional mean from the following data:

1) 11, 13, 15, 17, 19, 21, 23, 25, 30, 29, 31

2) xi   20   30   40   50   60
   f    1    5    9    4    1

All the raw moments can be converted into central moments (mean moments, or moments about the mean) by using the following relations:

• $m_1 = 0$
• $m_2 = m'_2 - (m'_1)^2$
• $m_3 = m'_3 - 3m'_1 m'_2 + 2(m'_1)^3$
• $m_4 = m'_4 - 4m'_1 m'_3 + 6(m'_1)^2 m'_2 - 3(m'_1)^4$
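The conversion from raw moments to central moments is pure arithmetic and can be sketched as a small function. The name `raw_to_central` is an illustrative assumption; the formulas are the four relations stated above.

```python
def raw_to_central(m1p, m2p, m3p, m4p):
    """Convert the first four raw moments (about zero or about any
    provisional mean) into the first four moments about the mean."""
    m1 = 0.0
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m1p * m2p + 2 * m1p ** 3
    m4 = m4p - 4 * m1p * m3p + 6 * m1p ** 2 * m2p - 3 * m1p ** 4
    return m1, m2, m3, m4

# Raw moments about X = 12: 2.40, 43.0, 337.50 and 5500
m1, m2, m3, m4 = raw_to_central(2.40, 43.0, 337.50, 5500)
print(round(m2, 2), round(m3, 3), round(m4, 4))
```

Because the central moments do not depend on the origin, the same function works whether the raw moments were taken about zero or about a provisional mean.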

EXAMPLE 4.25

The first four moments about the origin X = 0 are 4, 16.93, 75.2 and 348.13 respectively. Find the moments about the mean.

Solution  Given that

$m'_1 = 4$ , $m'_2 = 16.93$

$m'_3 = 75.2$ , $m'_4 = 348.13$

Now we use:

$m_1 = 0$

$m_2 = m'_2 - (m'_1)^2$
⇒ $m_2 = 16.93 - (4)^2 = 0.93$

$m_3 = m'_3 - 3m'_1 m'_2 + 2(m'_1)^3$
⇒ $m_3 = 75.2 - 3(4)(16.93) + 2(4)^3 = 0.04$

$m_4 = m'_4 - 4m'_1 m'_3 + 6(m'_1)^2 m'_2 - 3(m'_1)^4$
⇒ $m_4 = 348.13 - 4(4)(75.2) + 6(4)^2(16.93) - 3(4)^4 = 2.21$

EXAMPLE 4.26

The first four moments about X = 12 are 2.40, 43.0, 337.50 and 5500 respectively. Find the moments about the mean.

Solution  Given that

$m'_1 = 2.40$ , $m'_2 = 43.0$

$m'_3 = 337.50$ , $m'_4 = 5500$

Now we know that

$m_1 = 0$

$m_2 = m'_2 - (m'_1)^2$
⇒ $m_2 = 43.0 - (2.40)^2 = 37.24$

$m_3 = m'_3 - 3m'_1 m'_2 + 2(m'_1)^3$
⇒ $m_3 = 337.5 - 3(2.40)(43) + 2(2.40)^3 = 55.548$

$m_4 = m'_4 - 4m'_1 m'_3 + 6(m'_1)^2 m'_2 - 3(m'_1)^4$
⇒ $m_4 = 5500 - 4(2.40)(337.5) + 6(2.40)^2(43) - 3(2.40)^4 = 3646.5472$

Note: For population data we use “μ” (mu) instead of “m” in all the formulae of moments.


Test Yourself

1) The first four moments about origin X = 0 are 8, 83.71, 1019.43 and 123100 respectively.
Find moments about mean?

2) The first four moments about X = 25 are -1.9, 20.5, -96.3 and 906.1 respectively. Find
moments about mean?

Symmetrical Distribution

• A distribution in which the values of the mean, median and mode are equal is called a symmetrical distribution, i.e.

Mean = Median = Mode

• A distribution in which the two quartiles are equidistant from the median is called a symmetrical distribution, i.e.

Q3 – Median = Median – Q1

or Q3 + Q1 – 2 Median = 0

• A distribution is said to be symmetrical if $b_1 = 0$.

• A distribution in which the two tails are equal in length from the central value is called a symmetrical distribution. The curve of a symmetrical distribution is typically bell-shaped.

Moment-Ratios

             1st moment ratio                        2nd moment ratio
Sample       $b_1 = \frac{m_3^2}{m_2^3}$             $b_2 = \frac{m_4}{m_2^2}$
Population   $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$     $\beta_2 = \frac{\mu_4}{\mu_2^2}$

Moment-ratios are independent of the origin and units of measurement, i.e. they are dimensionless quantities.

(Figure: symmetrical curve with Mean = Median = Mode.)

Skewness

We know that for a symmetrical distribution the values of the mean, median and mode are equal and the two tails of the distribution are equal in length from the central value.

“Skewness is the degree of asymmetry”

OR

“Skewness is the lack (absence) of symmetry around the central value (average)”

The presence of skewness tells us that a particular distribution is not symmetrical, or in other words that it is skewed. In a skewed distribution the curve is turned more to one side than the other.
gm

Positive Skewness

• Skewness is said to be positive if the mean is greater than the median and the median is greater than the mode, i.e.

Mean > Median > Mode

• Skewness is said to be positive if:

Q3 + Q1 – 2 Median > 0

• In terms of moments, skewness is said to be positive if:

$\alpha_3 > 0$

• Skewness is said to be positive if the right tail of the distribution is longer than its left tail.

To measure the skewness we will use:

Sample: $\alpha_3 = \sqrt{b_1} = \frac{m_3}{\sqrt{m_2^3}}$

Population: $\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\sqrt{\mu_2^3}}$

(Figure: positively skewed curve, Mean > Median > Mode.)


Negative Skewness

• Skewness is said to be negative if the mean is smaller than the median and the median is smaller than the mode, i.e.

Mean < Median < Mode

• Skewness is said to be negative if:

Q3 + Q1 – 2 Median < 0

• In terms of moments, skewness is said to be negative if:

$\alpha_3 < 0$

• Skewness is said to be negative if the left tail of the distribution is longer than its right tail.

(Figure: negatively skewed curve, Mean < Median < Mode.)


Measures of Skewness

Absolute measures of skewness:

• A.S = Mean – Mode
• A.S = Mean – Median
• A.S = Q3 + Q1 – 2 Median

Relative measures of skewness:

• Karl Pearson's measure of skewness
• Bowley's measure of skewness
• Coefficient of skewness based on moments


EXAMPLE 4.27

Calculate absolute skewness if Mean = 13.25 and Median = 12.96.

Solution  Given that

Mean = 13.25
Median = 12.96

To calculate absolute skewness we use the formula:

Absolute skewness = Mean – Median = 13.25 – 12.96 = 0.29

Hence the distribution is positively skewed.

EXAMPLE 4.28

Calculate absolute skewness if Mean = 12.61 and Mode = 13.25.

Solution  Given that

Mean = 12.61
Mode = 13.25

To calculate absolute skewness we use the formula:

Absolute skewness = Mean – Mode = 12.61 – 13.25 = –0.64

Hence the distribution is negatively skewed.

EXAMPLE 4.29

Calculate absolute skewness if Q1 = 13.73, Q3 = 38.29 and Median = 26.01.

Solution  To calculate absolute skewness we use the formula:

Absolute skewness = Q3 + Q1 – 2 Median = 38.29 + 13.73 – 2(26.01) = 0

Hence the distribution is symmetrical.


Test Yourself

1) Calculate absolute skewness if Mean = 33.25 and Median = 32.96.
2) Calculate absolute skewness if Mean = 42.61 and Mode = 43.25.
3) Calculate absolute skewness if Q1 = 124.87, Q3 = 146.53 and Median = 135.7.

Karl Pearson's Measure of Skewness

It is defined as:

$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$

This measure was suggested by Karl Pearson (1857-1936) and is known as the Pearsonian coefficient of skewness. It measures the skewness of a distribution on the basis of the mean, median, mode and S.D.

Since in many cases the mode is ill-defined, we replace (Mean – Mode) by its equivalent from the empirical relation, i.e. 3(Mean – Median), and hence:

$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$

This coefficient usually varies between –3 and +3.

EXAMPLE 4.30

Calculate Pearson's coefficient of skewness if Mean = 13.25, Mode = 12.61 and S.D = 3.73.

Solution  Given that

Mean = 13.25, Mode = 12.61, S.D = 3.73

To calculate Pearson's coefficient of skewness we use the formula:

$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{13.25 - 12.61}{3.73} = 0.1716$

Hence the distribution is positively skewed.
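Both forms of Pearson's coefficient can be sketched in one function. The name `pearson_skewness` and its keyword parameters are illustrative assumptions; the mode-based form is used when a mode is supplied, otherwise the median-based form from the empirical relation.

```python
def pearson_skewness(mean, sd, mode=None, median=None):
    """Pearson's coefficient of skewness (hypothetical helper).

    Uses (Mean - Mode) / S.D when a mode is given, otherwise
    3 (Mean - Median) / S.D.
    """
    if mode is not None:
        return (mean - mode) / sd
    if median is not None:
        return 3 * (mean - median) / sd
    raise ValueError("supply either mode or median")

sk = pearson_skewness(13.25, 3.73, mode=12.61)
print(round(sk, 4))
```

A positive result indicates positive skewness, a negative result negative skewness, and zero a symmetrical distribution.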

Bowley's Measure of Skewness

It is defined as:

$S_k = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

This measure was suggested by Bowley (1869-1957) and is known as Bowley's coefficient of skewness. It measures the skewness of a distribution on the basis of the quartiles.

This coefficient varies between –1 and +1.


EXAMPLE 4.31
Calculate Bowley's coefficient of skewness if Q1 = 14.6, Q3 = 25.2 and Median = 18.8.

Solution  To calculate Bowley's coefficient of skewness we use the formula:

$S_k = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{25.2 + 14.6 - 2(18.8)}{25.2 - 14.6} = 0.21$

Hence the distribution is positively skewed.
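Bowley's coefficient needs only the three quartiles, as the following sketch shows. The function name `bowley_skewness` is an illustrative assumption.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's quartile coefficient of skewness (hypothetical helper)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

sk = bowley_skewness(14.6, 18.8, 25.2)
print(round(sk, 2))
```

Dividing by the interquartile range Q3 – Q1 is what turns the absolute measure Q3 + Q1 – 2 Median into a unit-free relative measure.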



Coefficient of Skewness based on Moments

It is defined by:

$\alpha_3 = \frac{m_3}{\sqrt{m_2^3}}$


EXAMPLE 4.32

The first four moments about the origin X = 0 are 4, 16.93, 75.2 and 348.13 respectively. Find the moments about the mean and also the coefficient of skewness based on moments.

Solution  Given that

$m'_1 = 4$ , $m'_2 = 16.93$

$m'_3 = 75.2$ , $m'_4 = 348.13$

Now we know that

$m_1 = 0$

$m_2 = m'_2 - (m'_1)^2$
⇒ $m_2 = 16.93 - (4)^2 = 0.93$

$m_3 = m'_3 - 3m'_1 m'_2 + 2(m'_1)^3$
⇒ $m_3 = 75.2 - 3(4)(16.93) + 2(4)^3 = 0.04$

$m_4 = m'_4 - 4m'_1 m'_3 + 6(m'_1)^2 m'_2 - 3(m'_1)^4$
⇒ $m_4 = 348.13 - 4(4)(75.2) + 6(4)^2(16.93) - 3(4)^4 = 2.21$

Now $\alpha_3 = \frac{m_3}{\sqrt{m_2^3}} = \frac{0.04}{\sqrt{0.93^3}} = 0.0446$

Hence the distribution is positively skewed.
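The moment-based coefficient of skewness can be sketched as below. The function name `moment_skewness` is an illustrative assumption; it takes the second and third moments about the mean.

```python
import math

def moment_skewness(m2, m3):
    """Coefficient of skewness based on moments: m3 / sqrt(m2 ** 3).

    Hypothetical helper; m2 and m3 are moments about the mean and m2 > 0.
    """
    return m3 / math.sqrt(m2 ** 3)

a3 = moment_skewness(0.93, 0.04)
print(round(a3, 4))
```

The sign of the result is the sign of the third central moment, so a positive m3 gives positive skewness and a negative m3 gives negative skewness.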

Test Yourself

1) Calculate Pearson's coefficient of skewness if Mean = 50, Mode = 55 and S.D = 12.5.
2) Calculate Bowley's coefficient of skewness if Q1 = 13.73, Q3 = 38.29 and Median = 26.01.
3) The first four moments about the origin X = 0 are 23.5, 297, 5299.6 and 110306.94 respectively. Find the moments about the mean and also the coefficient of skewness based on moments.


For a normal distribution, $b_1 = 0$ and $b_2 = 3$. The curve of the normal distribution is bell-shaped and symmetric.

• For a bell-shaped symmetric (normal) distribution:

  – 68.27% of the area under the normal curve lies within the range $\mu \pm \sigma$
  – 95.45% of the area under the normal curve lies within the range $\mu \pm 2\sigma$
  – 99.73% of the area under the normal curve lies within the range $\mu \pm 3\sigma$
  – Mean Deviation ≈ (4/5) (Standard Deviation)
  – Quartile Deviation ≈ (2/3) (Standard Deviation)
  – Quartile Deviation ≈ (5/6) (Mean Deviation)

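The three percentages for the normal curve can be reproduced with the error function from Python's standard library. This is only a sketch: it relies on the standard result that the area of a normal curve within k standard deviations of the mean is erf(k/√2), and the function name `area_within` is an illustrative assumption.

```python
import math

def area_within(k):
    """Fraction of the area under a normal curve within mean ± k·S.D."""
    return math.erf(k / math.sqrt(2))

percentages = [round(100 * area_within(k), 2) for k in (1, 2, 3)]
print(percentages)  # → [68.27, 95.45, 99.73]
```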

Kurtosis

“The degree of peakedness or flatness of a frequency distribution relative to the normal distribution is called kurtosis”.

OR

“The characteristic by which we compare the “hump” of a distribution with the normal distribution is called kurtosis”.

Kurtosis indicates whether a particular distribution is flatter or more peaked than the normal curve. To measure kurtosis we use $b_2$:

$b_2 = \frac{m_4}{m_2^2}$

• If $b_2 > 3$, then the distribution is known as leptokurtic
• If $b_2 = 3$, then the distribution is known as mesokurtic
• If $b_2 < 3$, then the distribution is known as platykurtic

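The kurtosis rule can be sketched as a pair of small functions. The names `kurtosis_b2` and `classify_kurtosis` are illustrative assumptions, and the tolerance used to decide that b2 "equals" 3 is also an assumption made for floating-point input.

```python
import math

def kurtosis_b2(m2, m4):
    """Moment ratio b2 = m4 / m2**2 from the 2nd and 4th central moments."""
    return m4 / m2 ** 2

def classify_kurtosis(b2, tol=1e-9):
    """Name the shape implied by b2 (tolerance `tol` is an assumption)."""
    if math.isclose(b2, 3.0, abs_tol=tol):
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

for m2, m4 in [(637, 1338558), (0.933, 2.533), (2.5, 18.75)]:
    b2 = kurtosis_b2(m2, m4)
    print(round(b2, 2), classify_kurtosis(b2))
```

In practice a sample value of b2 is rarely exactly 3, so values close to 3 are usually read as approximately mesokurtic.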

EXAMPLE 4.33

The first four moments about X = 170 are 5.2, 664, 10720 and 1456000 respectively. Calculate $b_2$ and the kurtosis of the distribution.

Solution  Given that

$m'_1 = 5.2$ , $m'_2 = 664$

$m'_3 = 10720$ , $m'_4 = 1456000$

Now

$m_1 = 0$

$m_2 = m'_2 - (m'_1)^2$
⇒ $m_2 = 664 - (5.2)^2 = 637$

$m_3 = m'_3 - 3m'_1 m'_2 + 2(m'_1)^3$
⇒ $m_3 = 10720 - 3(5.2)(664) + 2(5.2)^3 = 642.82$

$m_4 = m'_4 - 4m'_1 m'_3 + 6(m'_1)^2 m'_2 - 3(m'_1)^4$
⇒ $m_4 = 1456000 - 4(5.2)(10720) + 6(5.2)^2(664) - 3(5.2)^4 = 1338558$

$b_2 = \frac{m_4}{m_2^2} = \frac{1338558}{637^2} = 3.3$

Since $b_2$ is more than 3, the distribution is leptokurtic.


EXAMPLE 4.34

Find kurtosis using $m_4 = 2.533$ and $m_2 = 0.933$.

Solution  Given that $m_4 = 2.533$ and $m_2 = 0.933$

Now $b_2 = \frac{m_4}{m_2^2} = \frac{2.533}{0.933^2} = 2.91$

Since $b_2$ is less than 3, the distribution is platykurtic.


EXAMPLE 4.35

The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75 respectively. Test skewness and kurtosis.

Solution  Given that

$m_1 = 0$ , $m_2 = 2.5$

$m_3 = 0.7$ , $m_4 = 18.75$

Skewness:

$\alpha_3 = \frac{m_3}{\sqrt{m_2^3}} = \frac{0.7}{\sqrt{2.5^3}} = 0.18$

Therefore the distribution is positively skewed.

Kurtosis:

$b_2 = \frac{m_4}{m_2^2} = \frac{18.75}{2.5^2} = 3$

Therefore the distribution is mesokurtic.


Test Yourself

1) The first four moments about X = 34.5 are -11, 260, -5000 and 128000 respectively. Calculate $b_2$ and the kurtosis of the distribution.
2) Find kurtosis using $m_4 = 3646.54$ and $m_2 = 37.24$.
3) The first four central moments of a distribution are 0, 37.24, 55.55 and 3646.54 respectively. Test skewness and kurtosis.


Sharpen your Pencil

MCQ's

(1) var(3x + 2) = _____

(A) 9 var(x) (B) 9 var(x) + 2 (C) var(x) (D) None of these

(2) var(3x/2) = _____

(A) (9/2) var(x) (B) (9/4) var(x) (C) 9 var(x) (D) None of these

(3) S.D is always _____ than range.

(A) More (B) Less (C) Both A & B (D) None of these

(4) S.D is always _____ than M.D.

(A) More (B) Less (C) Both A & B (D) None of these

(5) M.D is always _____ S.D

(A) Less than (B) Greater than (C) Equal to (D) None of these

(6) If each observation is multiplied by 5 then S.D is _____

(A) √5 S.D (B) 5 S.D (C) 5² S.D (D) None of these

(7) If m2 = 4 and m4 = 16 then b2 = _____

(A) 8 (B) 4 (C) 1 (D) None of these

(8) S.D of a, a, a, a is _____

(A) 0 (B) a (C) 1 (D) None of these

(9) If m2 = 4 then S.D = _____

(A) 4 (B) 2 (C) 16 (D) None of these


(10) For symmetric distribution _____

(A) b1 = 0 (B) b1 = 3 (C) Both A & B (D) None of these

(11) Coefficient of variation is infinite if _____ is zero.

(A) S.D (B) Mean (C) G.M (D) None of these

(12) C.V is zero if _____ is zero.

(A) x̄ (B) σ² (C) S.D (D) None of these

(13) S.D = _____

(A) (5/4) M.D (B) (4/5) M.D (C) (1/2) Q.D (D) None of these

(14) If var(X) = 4 then var(2X+4) is _____

(A) 4 (B) 16 (C) none (D) None of these

(15) Measures of dispersion can _____ be negative.

(A) Always (B) Never (C) Sometimes (D) None of these

(16) m2 = _____

(A) Variance (B) S.D (C) Mean (D) None of these

(17) If x̄ = 20 and S = 5 then C.V = _____

(A) 25% (B) 125% (C) 80% (D) None of these


Exercise
Short Questions

Q.4.01. The first two moments of a distribution about x = 2 are 1 and 5; find its mean and S.D.

Q.4.02. Find Bowley's coefficient of skewness if Q1 = 8.88, Q3 = 13.42 and Median = 11.45.

Q.4.03. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75; test the kurtosis of the distribution.

Q.4.04. Find the range and its coefficient from the following data: 5, 19, 3, 6, 5, 8, 9, 40, 5, and 6.

Q.4.05. Find the Q.D: 2, 3, 5, 5, 5, 6, 6, 8, 9, 9, 30, 38, and 40.

Q.4.06. What is meant by coefficient of variation?

Q.4.07. What are the types of absolute and relative measures of dispersion?

Q.4.08. Define absolute and relative measures of dispersion.

Q.4.09. Define kurtosis. How is it measured?

Q.4.10. Distinguish between symmetry and skewness.

Q.4.11. If Mean = 42, Median = 42.2 and Mode = 42.3, find the skewness.

Q.4.12. If mean = 50.9 and variance = 16, then find the C.V.

Q.4.13. If a data set has Σf = 15, Σfx = 60, Σfx² = 254, Σfx³ = 1128 and Σfx⁴ = 5222, then find the 3rd moment about the mean.

Q.4.14. If the first four moments of a distribution about “0” are -1.5, 17, -30 and 108, then find the coefficient of skewness b1.

Q.4.15. If Mean = 10 and Median = 8, then calculate the mode. What will be the skewness of the distribution?


Q.4.16. If mean = 10 and m2 = 16, then find the C.V.

Q.4.17. What do you understand by the term dispersion? Name the methods of measuring dispersion.

Q.4.18. If n = 10, x̄ = 12 and Σx² = 1530, find the coefficient of variation.

Q.4.19. Calculate the variance of the following data: 6, 9, 12, 15 and 18.

Q.4.20. If a data set has Q1 = 10, Q2 = 18 and Q3 = 30, then find the coefficient of Q.D.

Q.4.21. In each of the cases below, determine whether the data set is skewed and, if so, whether it is negatively or positively skewed:

(i) Mean = 32.9, Median = 21.4, Mode = 3.5
(ii) Mean = 10.9, Median = 10.9, Mode = 10.9
(iii) Mean = 42, Median = 42.2, Mode = 42.3

Q.4.22. Give some merits and demerits of S.D.

Q.4.23. Define skewness. Explain positive and negative skewness.

Q.4.24. Sum of values = 129, sum of squares of values = 3371, number of values = 5. Find the mean, variance and standard deviation.

Q.4.25. Define mean deviation and variance.

Q.4.26. Define moments about the mean and write down their computing formula.


Exercise
Long Questions

Q.4.01. Compute the M.D from (i) the mean (ii) the median. Also calculate their coefficients.

Marks   0-10   11-21   22-32   33-43   44-54   55-65   66-76   77-87
f       3      7       21      17      10      9       4       1

Q.4.02. Find the range and quartile deviation; also find their coefficients.

Groups   25-50   50-75   75-100   100-125   125-150   150-175
f        10      12      16       17        20        18

Q.4.03. Find the coefficient of variation.

Groups   1.5-1.9   2.0-2.4   2.5-2.9   3.0-3.4   3.5-3.9   4.0-4.4   4.5-4.9
f        2         1         4         15        10        5         3

Q.4.04. After 10 weeks of tuition the results of the students are given below:

Teacher A   12   15   6    73   7   19   99   36   84   29
Teacher B   47   12   76   48   4   51   37   48   13   0

Who is the more consistent teacher?

Q.4.05. The first four moments of a distribution about the value X = 4 of the variable are -1.5, 17, -30 and 108. Find the moments about the mean.

Q.4.06. Find Bowley's and Pearson's coefficients of skewness. Also find the coefficient of skewness based on moments.

Groups                   20-24   25-29   30-34   35-39   40-44   45-49   50-54
Cumulative frequencies   22      50      268     495     730     946     1000

Q.4.07. Find the first four moments about the mean; also find b1 and b2.

Marks             0-10   11-21   22-32   33-43   44-54   55-65   66-76   77-87
No. of Students   3      7       21      17      10      9       4       1


Q.4.08. In the following data, are doctors' salaries more consistent than those of peons?

                   Mean    S.D
Doctors' Salaries  20000   6500
Peons' Salaries    900     250

Q.4.09. Find the shape of the data given below:

Σf = 100, ΣfD = 15, ΣfD² = 97, ΣfD³ = 33, ΣfD⁴ = 253 and D = Xi - 67

Q.4.10. Find the mean deviation and standard deviation for the data given below:

Groups   0-10   10-20   20-30   30-40   40-50   50-60
f        6      7       10      8       4       2

Q.4.11. Find the first four moments about the mean from the following data:

x   2   3   4   5   6
f   1   3   7   3   1

Q.4.12. Compute the coefficient of variation of (i) x (ii) y = 2x, where “x” has the values 2, 3, 3, 5, 5, 5, 8, 10, 12. Are the two results the same? If so, give reasons.

Q.4.13. Five temperature readings in °C are as follows:

X = 15.3, 21.3, 17.4, 20.1, 15.9

(i) Calculate the variance.
(ii) Make the transformation y = x - 12.5.
(iii) Calculate the variance of the transformed observations.
(iv) What is the effect of this transformation on the variance of the original observations?

Q.4.14. Find Bowley's coefficient of skewness.

Classes   22-25   25-28   28-31   31-34   34-37
f         2       5       9       6       1

CHAPTER 05
Index Numbers

Chapter Contents

You should read this chapter if you need to learn about:

• Definition of Index Number
• Types of Index Numbers by Nature
• The Base Period (Year)
• Fixed Base and Chain Base Methods
• Types of Index Numbers by Treatment
• Wholesale Price Index Numbers
• Steps involved in the construction of WPI
• Consumer Price Index Numbers
• Problems involved in the Construction of Index Numbers
• Exercise


Index Number

“A statistical measure of the average change in the price, quantity or value of a variable (commodity) or group of variables with respect to time or space is called an Index Number”.

Index numbers are used, for example:

• To compare the average price of milk in 2013 with that in 2010.
• To compare the retail price of rice in KPK with that in Lahore or Karachi.
• To know the increase in the yield of wheat in Pakistan during 2013 as compared to 2010, etc.

Types of Index Number by Nature

There are three types of index numbers by nature:

• Price Index Number
• Quantity Index Number
• Value Index Number

Price Index Number

“Price index number is a measure of the changes in the prices of certain commodities with respect to time or space”.


Quantity Index Number

“Quantity index number or volume index number measures the changes in the quantity or volume produced, consumed or sold of certain commodities with respect to time or space”.

Value Index Number

“Value index number measures the changes in the value of commodities in a given period with reference to the base period”:

$P_{on} = \frac{\sum p_n q_n}{\sum p_o q_o} \times 100$

where $p_n$ denotes the prices of the given year and $p_o$ denotes the prices of the base year; $q_n$ and $q_o$ are the corresponding quantities.

Note: When the price “p” of a commodity during a period is multiplied by its quantity “q” produced, purchased or sold, we get the value, denoted by v (= pq).
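The value index is the ratio of two totals of price times quantity. The sketch below uses made-up prices and quantities for two commodities (they are hypothetical, not taken from the text), and the function name `value_index` is also an illustrative assumption.

```python
def value_index(p_base, q_base, p_given, q_given):
    """Value index: (sum of p_n * q_n) / (sum of p_o * q_o) * 100."""
    base_value = sum(p * q for p, q in zip(p_base, q_base))
    given_value = sum(p * q for p, q in zip(p_given, q_given))
    return given_value / base_value * 100

# Hypothetical data for two commodities
index = value_index(p_base=[10, 20], q_base=[5, 2],
                    p_given=[12, 25], q_given=[6, 2])
print(round(index, 2))
```

Here the base-year value is 10·5 + 20·2 = 90 and the given-year value is 12·6 + 25·2 = 122, so the index is 122/90 × 100.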

The Base Period (Year)

“The period with which we like to compare the relative changes is known as the reference period or base period”. The base period may be a year, month, week or a day.

A base period (year) should be a normal year. By a normal year we mean a year of economic stability, free from crises caused by wars, strikes, earthquakes, floods etc. If a single year of normal conditions is not available, then the average of several years is used as the base period.

Note:

• The index for the base period is always taken as 100, and the index number of the current year is then compared to 100.
• An index number is a percentage, but the percent sign is usually omitted.

There are two methods of selecting a base period:

• Fixed base method
• Chain base method


Fixed Base Method

According to this method, a particular year is chosen as the base period, and it remains unchanged during the lifetime of the index.

To compute index numbers by the fixed base method, the value of the base year is taken as 100. Index numbers for other periods are computed by dividing the price of the given year by the base year price and multiplying the result by 100. The values so obtained are called price relatives, i.e.

$\text{Price Relative} = \frac{\text{Prices of Commodities in Given Year}}{\text{Prices of Commodities in Base Year}} \times 100$

⇒ $P_{on} = \frac{p_n}{p_o} \times 100$

where $p_n$ denotes the prices of the given year and
$p_o$ denotes the prices of the base year
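Price relatives on a fixed base can be sketched in a few lines. The function name `price_relatives` and the short price series are illustrative assumptions.

```python
def price_relatives(prices, base_price):
    """Fixed-base price relatives: each price divided by the base price, times 100."""
    return [p / base_price * 100 for p in prices]

prices = [20, 18, 23, 24, 25]                 # illustrative prices for five years
relatives = price_relatives(prices, prices[0])  # first year taken as base
print([round(r, 1) for r in relatives])
```

Passing the average of several years as `base_price`, instead of a single year's price, gives index numbers on an average base.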
s@

Chain Base Method

According to this method, the base year is not fixed but changes from year to year. Here the prices of the previous year are taken as the base and the relatives are computed from them. The price relatives computed by the chain base method are known as link relatives, i.e.

$\text{Link Relative} = \frac{\text{Prices of Commodities in Given Year}}{\text{Prices of Commodities in the Preceding Year}} \times 100$

⇒ $P_{n-1,n} = \frac{p_n}{p_{n-1}} \times 100$

where $p_n$ denotes the prices in the given year and
$p_{n-1}$ denotes the prices in the preceding year

The link relatives cannot be used directly to make comparisons. For the purpose of comparison the link relatives have to be converted to a fixed base. The indices so obtained are called chain indices. The chain index for a year is obtained by multiplying the average of the link relatives of that year by the chain index of the preceding year and then dividing the resulting product by 100.
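The two stages of the chain base method, link relatives and then chain indices, can be sketched as follows for a single commodity. The function names and the price series are illustrative assumptions, and the chain index of the first period is taken as 100.

```python
def link_relatives(prices):
    """Each price as a percentage of the preceding period's price.

    The first period has no predecessor, so its link relative is set to 100.
    """
    links = [100.0]
    for previous, current in zip(prices, prices[1:]):
        links.append(current / previous * 100)
    return links

def chain_indices(links):
    """Chain index = link relative * preceding chain index / 100."""
    chain = [100.0]
    for link in links[1:]:
        chain.append(link * chain[-1] / 100)
    return chain

prices = [40, 45, 48, 50, 52, 54, 56, 60]
links = link_relatives(prices)
chain = chain_indices(links)
print([round(v, 1) for v in links])
print([round(v, 2) for v in chain])
```

Because no intermediate rounding is done here, each chain index equals the fixed-base price relative of that period; hand calculations that round the link relatives to one decimal place give slightly different figures.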

Types of Index Number by Treatment

• Simple index number
• Composite index number

Simple Index Number

“An index number is called a simple index number when it is computed for one variable (commodity)”.

A simple index number can be computed by:

• Fixed Base Method
• Chain Base Method

Fixed Base Method

In this method the index number is computed as the price relative:

$\text{Price Relative} = \frac{p_n}{p_o} \times 100$

Chain Base Method

In this method the index number is computed in two steps.

Step 1: Calculate the link relatives by dividing the price of the current period by the price of the immediately preceding period and multiplying the ratio by 100, i.e.

$P_{n-1,n} = \frac{p_n}{p_{n-1}} \times 100$

Step 2: To get the chain indices, reverse the procedure of Step 1: multiply the link relative of the current period by the chain index of the immediately preceding period and divide the product by 100. The chain index of the base period is 100.


EXAMPLE 5.01
The price of wheat (per maund) is given for the years 1964 to 1973. Calculate simple index numbers using:

(i) 1964 as base
(ii) Average of the prices for the first five years as base
(iii) Average of the prices of all the ten years as base.

Years         1964   1965   1966   1967   1968   1969   1970   1971   1972   1973
Prices (Rs.)  20     18     23     24     25     27     28     30     32     33

Solution  Here the average price of the first five years is 110/5 = 22 and the average price of all ten years is 260/10 = 26. The necessary computations are given below:

                 Simple index numbers taking
Years   Prices   1964 as base           Average of first 5 years as base   Average of all ten years as base
1964    20       100                    (20/22) × 100 = 90.9               (20/26) × 100 = 76.9
1965    18       (18/20) × 100 = 90     (18/22) × 100 = 81.8               (18/26) × 100 = 69.2
1966    23       (23/20) × 100 = 115    104.5                              88.5
1967    24       120                    109.1                              92.3
1968    25       125                    113.6                              96.2
1969    27       135                    122.7                              103.8
1970    28       140                    127.3                              107.7
1971    30       150                    136.4                              115.4
1972    32       160                    145.5                              123.1
1973    33       165                    150.0                              126.9

Test Yourself

The price of wheat (per maund) is given for the years 2000 to 2009. Calculate simple index numbers using:

(i) 2000 as base
(ii) Average of the prices for the first five years as base
(iii) Average of the prices of all the ten years as base.

Years         2000   2001   2002   2003   2004   2005   2006   2007   2008   2009
Prices (Rs.)  120    118    123    124    125    127    128    130    132    133


EXAMPLE 5.02
The price of wheat (per maund) is given for the years 1960 to 1967. Calculate index numbers by the chain base method using 1960 as base:

Years         1960   1961   1962   1963   1964   1965   1966   1967
Prices (Rs.)  40     45     48     50     52     54     56     60

Solution  Here, the necessary computations are given below:

Years   Prices (Rs.)   Link Relatives           Chain Indices
1960    40             100                      100
1961    45             (45/40) × 100 = 112.5    (112.5 × 100)/100 = 112.5
1962    48             (48/45) × 100 = 106.7    (106.7 × 112.5)/100 = 120.04
1963    50             104.2                    125.08
1964    52             104.0                    130.08
1965    54             103.8                    135.02
1966    56             103.7                    140.02
1967    60             107.1                    149.96

Test Yourself

The price of wheat (per maund) is given for the years 2001 to 2008. Calculate index numbers by the chain base method using 2001 as base:

Years         2001   2002   2003   2004   2005   2006   2007   2008
Prices (Rs.)  140    145    148    150    152    154    156    160

Composite Index Number

“An index number is called a composite index number when it is computed for more than one variable (commodities)”.

A composite (price or quantity) index number may further be classified as:

• Un-weighted Index Number
• Weighted Index Number

Un-weighted Index Number or Un-weighted Indices

“An index number that measures the changes in the prices of a group of commodities when the relative importance, i.e. the weight, of the commodities is not taken into account is called an un-weighted index number or un-weighted index”.

Un-weighted index numbers may also be classified as:

• Simple Aggregative Index Number
• Simple Average of Relatives Index Number

Simple Aggregative Index

A simple aggregative index number can be computed by:

• Fixed Base Method
• Chain Base Method


Fixed Base Method

Under this method the index number is obtained by dividing the sum of the given year prices of all the commodities by the sum of the base year prices of the same commodities and multiplying the result by 100, i.e.

$P_{on} = \frac{\sum p_n}{\sum p_o} \times 100$

Note: In 1738, the French economist Dutot introduced the simple aggregative method of index numbers.

Chain Base Method

Under this method:

Step 1: First we compute the link relative for each year by dividing the current year's total of prices by the immediately preceding year's total of prices and expressing the result as a percentage.

Step 2: Second, to get the chain indices we reverse the procedure used for the link relatives, i.e. we multiply each year's link relative by the chain index of the preceding year and divide the product by 100.


EXAMPLE 5.03

The following table shows wholesale prices of wheat, rice and mutton for the years 1972, 1973 and 1974. Compute the simple aggregative price indices for 1973 and 1974 using 1972 as a base:

         Commodity (Prices in Rs.)
Years    Wheat   Rice   Mutton
1972     30      80     240
1973     32      100    300
1974     37      110    400

Solution  Here, the necessary computations are given below:

             Commodity (Prices in Rs.)
Years        Wheat   Rice   Mutton   Total
1972 (p0)    30      80     240      350
1973 (p1)    32      100    300      432
1974 (p2)    37      110    400      547

Therefore, for 1973: $P_{o1} = \frac{\sum p_1}{\sum p_o} \times 100 = \frac{432}{350} \times 100 = 123.43\%$

for 1974: $P_{o2} = \frac{\sum p_2}{\sum p_o} \times 100 = \frac{547}{350} \times 100 = 156.29\%$

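The simple aggregative index on a fixed base only needs the yearly totals of prices. The following sketch uses the wheat, rice and mutton prices for 1972 to 1974; the function name `simple_aggregative_index` and the dictionary layout are illustrative assumptions.

```python
def simple_aggregative_index(given_prices, base_prices):
    """Sum of given-year prices divided by sum of base-year prices, times 100."""
    return sum(given_prices) / sum(base_prices) * 100

prices = {
    1972: [30, 80, 240],    # wheat, rice, mutton
    1973: [32, 100, 300],
    1974: [37, 110, 400],
}
indices = {year: round(simple_aggregative_index(p, prices[1972]), 2)
           for year, p in prices.items()}
print(indices)  # → {1972: 100.0, 1973: 123.43, 1974: 156.29}
```

Note that the commodity with the largest price (here mutton) dominates the totals, which is a known limitation of this un-weighted method.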

Test Yourself

The following table shows wholesale prices of wheat, rice and mutton for the years 2005, 2006 and 2007. Compute the simple aggregative price indices for 2006 and 2007 using 2005 as a base:

         Commodity (Prices in Rs.)
Years    Wheat   Rice   Mutton
2005     130     180    340
2006     132     200    400
2007     137     210    500

EXAMPLE 5.04

The following table shows wholesale prices of wheat, rice and mutton for the years 1972, 1973 and 1974. Compute the chain indices for 1973 and 1974 using 1972 as a base:

         Commodity (Prices in Rs.)
Years    Wheat   Rice   Mutton
1972     30      80     240
1973     32      100    300
1974     37      110    400

Solution  Here, the necessary computations are given below:

         Commodity (Prices in Rs.)
Years    Wheat   Rice   Mutton   Total   Link Relative               Chain Indices
1972     30      80     240      350     100                         100
1973     32      100    300      432     (432/350) × 100 = 123.43    (123.43 × 100)/100 = 123.43
1974     37      110    400      547     (547/432) × 100 = 126.62    (126.62 × 123.43)/100 = 156.29

Test Yourself

The following table shows wholesale prices of wheat, rice and mutton for the years 2000, 2001
and 2002. Compute the chain indices for 2001 and 2002 using 2000 as a base:

         Commodity (Prices in Rs.)
Years    Wheat    Rice    Mutton
2000     40       90      340
2001     42       110     400
2002     47       120     500


Simple Average of Relatives

Simple average of relatives can be computed as under:

• Fixed Base Method
• Chain Base Method

Note: In 1764, Italian economist Carli introduced the arithmetic average of price relatives. In 1863,
English economist Jevons introduced the geometric average of price relatives.

Fixed Base Method

Under this method the simple average of price relatives can be obtained by:

Step 1: First, we find the price relatives for each commodity given.

Step 2: Then we average these relatives by using the arithmetic mean, median or geometric mean.
The averages so obtained are known as index numbers by simple average of relatives.

Chain Base Method

Under this method:

Step 1: First, we find the link relatives for the given commodities.

Step 2: Second, we take the average (arithmetic mean, median or geometric mean) of the link relatives.

Step 3: Third, we find the chain indices.

EXAMPLE 5.05

From the data given below, compute the index numbers of prices, taking 1962 as base. Use:

(i) Simple average of price relatives
(ii) Median of price relatives and
(iii) Geometric mean of price relatives.

         Commodity (Prices in Rs.)
Years    Firewood    Soft coke    Kerosene oil    Matches
1962     3.25        2.50         0.20            0.06
1963     3.44        2.80         0.22            0.06
1964     3.50        2.00         0.25            0.06
1965     3.75        2.50         0.25            0.06


Solution Here, the necessary computations are given below:

         Price Relative                                             Index Number By
Years    Firewood    Soft coke    Kerosene oil    Matches    Total    (i) A.M    (ii) Median    (iii) G.M
1962     100         100          100             100        400      100        100            100
1963     106         112          110             100        428      107        108            107
1964     108         80           125             100        413      103        104            102
1965     115         100          125             100        440      110        108            109

For example, the price relatives of firewood are (3.44/3.25) × 100 = 106 for 1963,
(3.50/3.25) × 100 = 108 for 1964 and (3.75/3.25) × 100 = 115 for 1965.

Where, for the four price relatives a, b, c and d of a year:

(i) $A.M = \frac{a + b + c + d}{4}$

(ii) Median = Exact central value after arranging the values a, b, c and d.

(iii) $G.M = \sqrt[4]{a \times b \times c \times d}$
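
The three averages of price relatives can be sketched in Python with the standard `statistics` module. The function name `price_relatives` and the variable names are assumptions made for this illustration; the prices are those of Example 5.05. Because the sketch does not round the price relatives to whole numbers before averaging, a result may occasionally differ by one unit from a hand computation that rounds at each step.

```python
from statistics import mean, median, geometric_mean

# Prices of firewood, soft coke, kerosene oil and matches (Example 5.05)
prices = {
    1962: [3.25, 2.50, 0.20, 0.06],
    1963: [3.44, 2.80, 0.22, 0.06],
    1964: [3.50, 2.00, 0.25, 0.06],
    1965: [3.75, 2.50, 0.25, 0.06],
}
base = prices[1962]


def price_relatives(base_prices, given_prices):
    """Price relative of each commodity: given price / base price * 100."""
    return [p / b * 100 for b, p in zip(base_prices, given_prices)]


results = {}
for year, given in prices.items():
    rel = price_relatives(base, given)
    results[year] = (mean(rel), median(rel), geometric_mean(rel))
    print(year, [round(v) for v in results[year]])
```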

Test Yourself

From the data given below, compute the index numbers of prices, taking 2000 as base. Use:

(i) Simple average of price relatives
(ii) Median of price relatives and
(iii) Geometric mean of price relatives.

         Commodity (Prices in Rs.)
Years    Firewood    Soft coke    Kerosene oil    Matches
2000     4.25        5.50         1.20            1.06
2001     4.44        5.80         1.22            0.07
2002     4.50        5.00         1.25            1.07
2003     4.75        5.50         1.25            1.03


EXAMPLE 5.06

Construct chain indices for the prices of the three commodities, taking 1940 as the base; using

(i) Simple average (A.M)
(ii) Median
(iii) Geometric Mean

         Commodity (Prices in Rs.)
Years    A       B        C
1940     2.80    10.50    2.70
1941     3.40    10.80    3.20
1942     3.60    10.60    3.50
1943     4.00    11.00    3.80
1944     4.20    11.50    4.00

Solution Here, the necessary computations are given below:

The link relatives of commodity A are (3.40/2.80) × 100 = 121 for 1941, (3.60/3.40) × 100 = 106 for 1942,
(4.00/3.60) × 100 = 111 for 1943 and (4.20/4.00) × 100 = 105 for 1944; those of B and C are found in the
same way.

(i) Using the simple average (A.M):

         Link Relative
Years    A      B      C      Total    Simple Average    Chain Indices
1940     100    100    100    300      100               100
1941     121    103    119    343      114               (114 × 100)/100 = 114
1942     106    98     109    313      104               (104 × 114)/100 = 118.6
1943     111    104    109    324      108               (108 × 118.6)/100 = 128.1
1944     105    105    105    315      105               (105 × 128.1)/100 = 134.5

(ii) Using the median:

         Link Relative
Years    A      B      C      Median    Chain Indices
1940     100    100    100    100       100
1941     121    103    119    119       (119 × 100)/100 = 119
1942     106    98     109    106       (106 × 119)/100 = 126.14
1943     111    104    109    109       (109 × 126.14)/100 = 137.49
1944     105    105    105    105       (105 × 137.49)/100 = 144.37

(iii) Using the geometric mean:

         Link Relative
Years    A      B      C      G.M       Chain Indices
1940     100    100    100    100       100
1941     121    103    119    114.04    (114.04 × 100)/100 = 114.04
1942     106    98     109    104.23    (104.23 × 114.04)/100 = 118.86
1943     111    104    109    107.96    (107.96 × 118.86)/100 = 128.33
1944     105    105    105    105       (105 × 128.33)/100 = 134.74


Where, for the three link relatives a, b and c of a year:

(i) $A.M = \frac{a + b + c}{3}$

(ii) Median = Exact central value after arranging the values a, b and c.

(iii) $G.M = \sqrt[3]{a \times b \times c}$
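
As a rough check on the chain base procedure with an average of link relatives, the following Python sketch may help. The function name `chain_indices` is an assumption made for illustration, and the prices are those of Example 5.06. The sketch averages the unrounded link relatives, so its results differ slightly from a hand computation in which the link relatives are first rounded to whole numbers.

```python
from statistics import mean, median, geometric_mean

# Prices of commodities A, B and C (Example 5.06)
prices = {
    1940: [2.80, 10.50, 2.70],
    1941: [3.40, 10.80, 3.20],
    1942: [3.60, 10.60, 3.50],
    1943: [4.00, 11.00, 3.80],
    1944: [4.20, 11.50, 4.00],
}


def chain_indices(prices, average):
    """Average the link relatives of each year, then chain them to the base year."""
    years = sorted(prices)
    chain = {years[0]: 100.0}
    for previous, current in zip(years, years[1:]):
        links = [c / p * 100 for p, c in zip(prices[previous], prices[current])]
        chain[current] = average(links) * chain[previous] / 100
    return chain


for name, average in [("A.M", mean), ("Median", median), ("G.M", geometric_mean)]:
    result = chain_indices(prices, average)
    print(name, {year: round(value, 2) for year, value in result.items()})
```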

Test Yourself

Construct chain indices for the prices of the three commodities, taking 2001 as the base; using

(i) Simple average (A.M)
(ii) Median
(iii) Geometric Mean

         Commodity (Prices in Rs.)
Years    A       B        C
2001     5.80    18.50    5.70
2002     6.40    12.80    2.20
2003     6.60    15.60    6.50
2004     7.00    13.00    8.80
2005     7.20    18.50    2.00

Weighted Index Numbers or Weighted Indices

“An index number that measures the changes in the prices of a group of commodities when the
relative importance, i.e. the weights, of the commodities are taken into account is called a weighted index
number or weighted index”.

Weighted index numbers may be classified as:

• Weighted Aggregative Index Number
• Weighted Average of Relative Index Number


Weighted Aggregative Index Numbers

There are various kinds of weighted aggregative index numbers;
some of them are discussed below:

Laspeyres Index Number

It is defined as:

$P_{on} = \frac{\sum p_n q_o}{\sum p_o q_o} \times 100$

Since Laspeyre’s formula uses the base year quantities as weights, it is called the base year
weighted index. It is to be noted that Laspeyre’s index is subject to upward bias (it is expected to
overestimate).

Paasche Index Number

It is defined as:

$P_{on} = \frac{\sum p_n q_n}{\sum p_o q_n} \times 100$

Since Paasche’s formula uses the current year quantities as weights, it is called the current year
weighted index. It is to be noted that Paasche’s index is subject to downward bias (it is expected to
underestimate).

Note: Around 1875, German economists Laspeyres and Paasche introduced their formulae for index
numbers.

Fisher (Ideal) Index Number

Fisher’s index number is the geometric mean of Laspeyre’s and Paasche’s index numbers, i.e.
$P_{on} = \sqrt{L \times P}$, or

$P_{on} = \sqrt{\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_n q_n}{\sum p_o q_n}} \times 100$

Note: Around 1921-1922, American economist and statistician Irving Fisher introduced his formula for
index number.


Fisher called this index as Ideal, because of the following reasons:

 It takes into account both base period as well as current period prices and
quantities.
 It is based on geometric mean, which is theoretically considered as the best
average for the construction of an index number.
 It is free from bias. The Laspeyres index is subject to upward bias (expected to
overestimate) and Paasche index to downward bias (expected to underestimate).
These two types of bias are crossed geometrically, i.e. by an averaging process
that of itself has no bias.

Marshall-Edgeworth Index Number

It is defined as:

$P_{on} = \frac{\sum p_n (q_o + q_n)}{\sum p_o (q_o + q_n)} \times 100$

This formula can also be written as:

$P_{on} = \frac{\sum p_n q_o + \sum p_n q_n}{\sum p_o q_o + \sum p_o q_n} \times 100$

Note: Around 1887, English economist Marshall and Irish economist Edgeworth both introduced a
formula for index number.

EXAMPLE 5.07

Construct the following weighted aggregative price index numbers for 1960 using 1956 as a base, from
the given data.

(i) Laspeyre’s index            (iii) Fisher’s “Ideal” index
(ii) Paasche’s index            (iv) Marshall-Edgeworth index

             Prices (Rs. per md)      Quantities (tons)
Commodity    1956 po    1960 p1       1956 qo    1960 q1
A            64         75            270        276
B            40         45            124        118
C            18         21            130        121
D            58         68            185        267


Solution The necessary computations are given below:

Commodity    p1 qo    po qo    p1 q1    po q1
A            20250    17280    20700    17664
B            5580     4960     5310     4720
C            2730     2340     2541     2178
D            12580    10730    18156    15486
Total        41140    35310    46707    40048

(i) Laspeyre’s index

$P_{01} = \frac{\sum p_1 q_o}{\sum p_o q_o} \times 100 = \frac{41140}{35310} \times 100 = 116.5$

(ii) Paasche’s index

$P_{01} = \frac{\sum p_1 q_1}{\sum p_o q_1} \times 100 = \frac{46707}{40048} \times 100 = 116.6$

(iii) Fisher’s “Ideal” index

$P_{01} = \sqrt{\frac{\sum p_1 q_o}{\sum p_o q_o} \times \frac{\sum p_1 q_1}{\sum p_o q_1}} \times 100 = \sqrt{\frac{41140}{35310} \times \frac{46707}{40048}} \times 100 = 116.57$

(iv) Marshall-Edgeworth index

$P_{01} = \frac{\sum p_1 q_o + \sum p_1 q_1}{\sum p_o q_o + \sum p_o q_1} \times 100 = \frac{41140 + 46707}{35310 + 40048} \times 100 = 116.57$
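
The four weighted aggregative formulae can be verified with a short Python sketch. The helper name `sum_products` and the variable names are assumptions made for illustration; the data are those of Example 5.07.

```python
from math import sqrt

# Example 5.07 data for commodities A, B, C, D
p0 = [64, 40, 18, 58]        # 1956 prices
p1 = [75, 45, 21, 68]        # 1960 prices
q0 = [270, 124, 130, 185]    # 1956 quantities
q1 = [276, 118, 121, 267]    # 1960 quantities


def sum_products(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


p1q0 = sum_products(p1, q0)
p0q0 = sum_products(p0, q0)
p1q1 = sum_products(p1, q1)
p0q1 = sum_products(p0, q1)

laspeyres = p1q0 / p0q0 * 100           # base year quantities as weights
paasche = p1q1 / p0q1 * 100             # current year quantities as weights
fisher = sqrt(laspeyres * paasche)      # geometric mean of the two
marshall_edgeworth = (p1q0 + p1q1) / (p0q0 + p0q1) * 100

print(round(laspeyres, 2), round(paasche, 2),
      round(fisher, 2), round(marshall_edgeworth, 2))
```

Since Fisher's index is a geometric mean of the other two, it always lies between Laspeyre's and Paasche's indices, which gives a quick sanity check on any hand computation.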
Test Yourself

Construct the following weighted aggregative price index numbers for 2003 using 2000 as a
base, from the given data.

(i) Laspeyre’s index            (iii) Fisher’s “Ideal” index
(ii) Paasche’s index            (iv) Marshall-Edgeworth index

             Prices (Rs. per md)      Quantities (tons)
Commodity    2000 po    2003 p1       2000 qo    2003 q1
A            74         80            370        376
B            77         56            224        218
C            67         78            230        221
D            76         77            285        367


EXAMPLE 5.08
Construct the following weighted aggregative price index numbers for 1960 and 1961, using
1956 as a base, from the given data.

(i) Laspeyre’s index            (iii) Fisher’s “Ideal” index
(ii) Paasche’s index            (iv) Marshall-Edgeworth index

             Prices (Rs. per md)                 Quantities (tons)
Commodity    1956 po    1960 p1    1961 p2       1956 qo    1960 q1    1961 q2
A            64         75         80            270        276        290
B            40         45         41            124        118        144
C            18         21         20            130        121        137
D            58         68         56            185        267        355

Solution The necessary computations are given below:

Commodity    p1 qo    po qo    p1 q1    po q1    p2 qo    p2 q2    po q2
A            20250    17280    20700    17664    21600    23200    18560
B            5580     4960     5310     4720     5084     5904     5760
C            2730     2340     2541     2178     2600     2740     2466
D            12580    10730    18156    15486    10360    19880    20590
Total        41140    35310    46707    40048    39644    51724    47376

(i) Laspeyre’s index

For 1960: $P_{01} = \frac{\sum p_1 q_o}{\sum p_o q_o} \times 100 = \frac{41140}{35310} \times 100 = 116.5$

For 1961: $P_{02} = \frac{\sum p_2 q_o}{\sum p_o q_o} \times 100 = \frac{39644}{35310} \times 100 = 112.3$

(ii) Paasche’s index

For 1960: $P_{01} = \frac{\sum p_1 q_1}{\sum p_o q_1} \times 100 = \frac{46707}{40048} \times 100 = 116.6$

For 1961: $P_{02} = \frac{\sum p_2 q_2}{\sum p_o q_2} \times 100 = \frac{51724}{47376} \times 100 = 109.2$


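(iii) Fisher’s “Ideal” index

For 1960: $P_{01} = \sqrt{\frac{\sum p_1 q_o}{\sum p_o q_o} \times \frac{\sum p_1 q_1}{\sum p_o q_1}} \times 100 = \sqrt{\frac{41140}{35310} \times \frac{46707}{40048}} \times 100 = 116.57$

For 1961: $P_{02} = \sqrt{\frac{\sum p_2 q_o}{\sum p_o q_o} \times \frac{\sum p_2 q_2}{\sum p_o q_2}} \times 100 = \sqrt{\frac{39644}{35310} \times \frac{51724}{47376}} \times 100 = 110.7$

(iv) Marshall-Edgeworth index

For 1960: $P_{01} = \frac{\sum p_1 q_o + \sum p_1 q_1}{\sum p_o q_o + \sum p_o q_1} \times 100 = \frac{41140 + 46707}{35310 + 40048} \times 100 = 116.57$

For 1961: $P_{02} = \frac{\sum p_2 q_o + \sum p_2 q_2}{\sum p_o q_o + \sum p_o q_2} \times 100 = \frac{39644 + 51724}{35310 + 47376} \times 100 = 110.50$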

It is to be noted that the formulae used to obtain the price index number can also be used
to obtain the quantity index number simply by interchanging the p’s and q’s in the price
index number formula.
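
To illustrate the interchange of the p’s and q’s, the sketch below computes Laspeyre's and Paasche's quantity index numbers for 1960 with 1956 as base. The variable names are assumptions made for illustration; the data are the 1956 and 1960 columns of Example 5.08.

```python
# 1956 (base) and 1960 prices and quantities of commodities A, B, C, D
p0 = [64, 40, 18, 58]
p1 = [75, 45, 21, 68]
q0 = [270, 124, 130, 185]
q1 = [276, 118, 121, 267]


def sum_products(first, second):
    """Sum of pairwise products over all commodities."""
    return sum(a * b for a, b in zip(first, second))


# Price index:    sum(p1*q0) / sum(p0*q0) * 100
# Quantity index: the same formula with p and q interchanged
laspeyres_quantity = sum_products(q1, p0) / sum_products(q0, p0) * 100
paasche_quantity = sum_products(q1, p1) / sum_products(q0, p1) * 100

print(round(laspeyres_quantity, 2), round(paasche_quantity, 2))
```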
Test Yourself

Construct the following weighted aggregative price index numbers for 2010 and 2011 using
2009 as a base, from the given data.

(i) Laspeyre’s index            (iii) Fisher’s “Ideal” index
(ii) Paasche’s index            (iv) Marshall-Edgeworth index

             Prices (Rs. per md)                 Quantities (tons)
Commodity    2009 po    2010 p1    2011 p2       2009 qo    2010 q1    2011 q2
A            84         55         70            670        376        790
B            60         35         31            524        518        544
C            48         61         70            530        621        737
D            68         48         46            585        467        355


Wholesale Price Index Number (WPI)

“Wholesale price index number measures the changes in prices of goods in wholesale markets”.

These goods include food grains (wheat, rice, etc.), raw materials (sugarcane, cotton, etc.),
manufactured goods (textiles, fabrics, sugar, vegetable ghee, soap, etc.), electricity, gas, petrol,
building material, etc.

The wholesale price index numbers measure the variation of prices in general. They do not measure the
effects of rise and fall of prices on the general standard of living of the various groups of people in a
society.

Steps Involved in Construction of Wholesale Price Index Number

The following steps are involved in the construction of a wholesale price index number:

Step 1: Purpose and Scope of Index Number

It is the most important step in the construction of an index number because most of the other points are
decided in the light of this point. It must be decided initially what the index number is to measure
and why. There is no single index number which can serve all purposes. Every index is limited to
a particular use. The scope, i.e. the field covered by the index, is also decided in the light of the purpose
and the data available.

Step 2: Selection of Commodities and Collection of Prices

There is no hard and fast rule for the inclusion of commodities, but a reasonable number of commodities
should be included on the basis of their importance. According to Dr. Irving Fisher more than 20
commodities should be included and 50 is a better number.

Step 3: Selection of the Base Period

The period with which we like to compare the relative changes is known as the reference period or base
period. The base period may be a year, month, week or a day. It is to be noted that the index for the base
period is always taken as 100. A base period (year) should be a normal year. By a normal year we mean a
year of economic stability, free from crises caused by wars, strikes, earthquakes, floods etc. If a single
year of normal conditions is not available then the average of several years is used as the base period.


There are two methods of selecting a base period:

• Fixed base method
• Chain base method

Step 4: Selection of Averages

Theoretically any average, i.e. mean, median, mode, geometric mean, harmonic mean etc., can be used in
the construction of an index number. Practically, however, the geometric mean is the most suitable average,
because in the construction of index numbers we are concerned with relative changes and the geometric
mean is the appropriate measure of relative changes.

Step 5: Selection of Appropriate Weights

We know that all the commodities selected are not equally important, e.g. eggs and coffee cannot be
given the same importance as wheat and rice. Wheat is much more important than coffee; it is therefore
desirable that wheat be given more importance. Thus weights are assigned to the commodities
depending upon their relative importance.

Consumer Price Index Number (CPI), Cost of Living Index Number, OR Retail Price Index Number

“Consumer price index numbers (or cost of living index numbers) measure the changes in the cost of
living, i.e. the cost of goods of daily use, purchased by a particular class of people in a town or area”.

These goods (called the market basket) consist of food, house rent, clothing, fuel and lighting,
education and miscellaneous items, e.g. medicine, transport, washing, haircut, newspaper, etc.

Note: The prices in the CPI are the average retail prices paid by the consumer for the purchase of goods;
that is why the consumer price index is also known as the retail price index number.

There are two methods for the computation of the cost of living index number.

• Aggregative expenditure method
• Household budget method


Aggregative Expenditure Method

In this method, we use Laspeyre’s formula, i.e.

$P_{on} = \frac{\sum p_n q_o}{\sum p_o q_o} \times 100$

Household (Family) Budget Method

In this method, we use the following formula, i.e.

$P_{on} = \frac{\sum IW}{\sum W}$

Where $I = \frac{p_n}{p_o} \times 100$ and $W = p_o q_o$

Note: To study the changes in prices of different commodities in wholesale markets we construct a
wholesale price index number. On the other hand, to study the effects of changes in the prices of different
commodities on the public we construct a cost of living index number.

EXAMPLE 5.09

Construct the Cost of Living index number, from the given data using both Aggregative Expenditure and
Family Budget Methods.

                                          Prices
Commodities    Quantity     Unit          1990 po    2000 p1
A              50 kg        25 kg         5          9
B              5 kg         1 kg          15         20
C              10 kg        5 kg          10         15
D              20 dozen     1 dozen       10         20
E              3 Liter      1 Liter       15         25

Solution The necessary computations are given below:

                       Prices
Commodities    qo      1990 po    2000 p1    p1 qo    W = po qo    I         WI
A              50      5          9          450      250          180       45000
B              5       15         20         100      75           133.33    10000
C              10      10         15         150      100          150       15000
D              20      10         20         400      200          200       40000
E              3       15         25         75       45           166.67    7500
Total          --      --         --         1175     670          --        117500


Aggregative Expenditure Method

$P_{01} = \frac{\sum p_1 q_o}{\sum p_o q_o} \times 100 = \frac{1175}{670} \times 100 = 175.37$

Household (Family) Budget Method

$P_{01} = \frac{\sum IW}{\sum W} = \frac{117500}{670} = 175.37$
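
Both methods give the same answer because $IW = \frac{p_1}{p_o} \times 100 \times p_o q_o = 100\, p_1 q_o$. The following Python sketch illustrates this with the data of Example 5.09; the variable names are assumptions made for illustration.

```python
# Example 5.09 data for commodities A, B, C, D, E
q0 = [50, 5, 10, 20, 3]       # base year quantities
p0 = [5, 15, 10, 10, 15]      # 1990 prices
p1 = [9, 20, 15, 20, 25]      # 2000 prices

# Aggregative expenditure method (Laspeyre's formula)
aggregative = (sum(p * q for p, q in zip(p1, q0))
               / sum(p * q for p, q in zip(p0, q0)) * 100)

# Household (family) budget method: weighted average of price relatives
weights = [p * q for p, q in zip(p0, q0)]            # W = p0 * q0
relatives = [n / o * 100 for o, n in zip(p0, p1)]    # I = p1 / p0 * 100
family_budget = (sum(i * w for i, w in zip(relatives, weights))
                 / sum(weights))

print(round(aggregative, 2), round(family_budget, 2))
```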

Test Yourself

Construct the Cost of Living index number for 2011, from the given data using both Aggregative
Expenditure and Family Budget Methods.

                                          Prices
Commodities    Quantity     Unit          2010    2011
A              50 kg        25 kg         5       9
B              5 kg         1 kg          15      20
C              10 kg        5 kg          10      15
D              20 dozen     1 dozen       10      20
E              3 Liter      1 Liter       15      25
Problems involved in the Construction of Index Numbers

The problems involved in the construction of index numbers are described below:

• The first problem is to understand the purpose for which an index number is to be
constructed, because no single index number serves all purposes, e.g. if it is
required to study the changes in the consumer prices of a specified group of people,
the cost of living index number should be constructed. On the other hand, to measure
the general price level in the country, the wholesale price index number should be
constructed.
• The next problem is to decide what data should be included. The data to be included
depend on the purpose for which the index number is to be constructed, e.g. if a cost of living
index number is to be constructed, then retail prices should be collected; if a
wholesale price index number is to be constructed, then wholesale prices should be
collected.
• The next problem is to decide what period should be chosen as the base period.


Sharpen your Pencil

MCQ’s

(1) The index number for the base period is always equal to ______

(A) 100    (B) 1000    (C) one    (D) None of these

(2) A chain index number provides comparison between ______

(A) year to year    (B) first and last year
(C) both A & B    (D) None of these

(3) CPI measures ______

(A) Consumer Price Index    (B) Wholesale Price Index
(C) Both A & B    (D) None of these

(4) Fisher’s index number is _____

(A) Best Index Number    (B) Ideal Index Number
(C) Simple Index Number    (D) None of these

(5) WPI measures _____

(A) Consumer Price Index    (B) Wholesale Price Index
(C) Both A & B    (D) None of these

(6) Fisher’s index number is ______ of Laspeyre’s and Paasche’s index numbers.

(A) A.M    (B) G.M    (C) H.M    (D) None of these

(7) Base year weighted index number is _____

(A) Laspeyre’s    (B) Paasche’s
(C) Fisher’s    (D) None of these

(8) The data for each month are expressed as a percentage of the data for the previous month;
these percentages are called _____

(A) Price relatives    (B) Link relatives
(C) both A & B    (D) None of these


(9) Laspeyre’s index number has _____ bias.

(A) Upward    (B) Downward
(C) Both A & B    (D) None of these

(10) Paasche’s index number has _____ bias.

(A) Upward    (B) Downward
(C) Both A & B    (D) None of these

(11) An index number computed for two or more variables is called _____

(A) Simple index number    (B) Composite index number
(C) both A & B    (D) None of these

(12) Steps involved in the construction of index number are _____

(A) 3    (B) 4    (C) 5    (D) None of these

(13) Consumer price index numbers are computed by _____ formula.

(A) Laspeyre’s    (B) Paasche’s
(C) Fisher’s    (D) None of these

(14) Price Relative = _____

(A) $\frac{p_n}{p_o} \times 100$    (B) $\frac{p_n}{p_{n-1}} \times 100$
(C) $\frac{\sum p_n q_n}{\sum p_o q_o} \times 100$    (D) None of these

(15) Link Relative = _____

(A) $\frac{p_n}{p_o} \times 100$    (B) $\frac{p_n}{p_{n-1}} \times 100$
(C) $\frac{\sum p_n q_n}{\sum p_o q_o} \times 100$    (D) None of these


Short Questions
Exercise

Q.5.01. What are the steps involved in the construction of a price index number?

Q.5.02. Differentiate between the fixed base and chain base methods.

Q.5.03. Explain the terms simple and composite index numbers.

Q.5.04. Differentiate between weighted and un-weighted index numbers.

Q.5.05. What are the problems in the construction of index numbers?

Q.5.06. If $\sum p_n q_o = 1650$, $\sum p_n q_n = 2240$, $\sum p_o q_o = 1640$ and $\sum p_o q_n = 2160$ then find
Paasche’s and Marshall-Edgeworth index numbers.

Q.5.07. If $\sum p_n q_o = 1650$, $\sum p_n q_n = 2240$, $\sum p_o q_o = 1640$ and $\sum p_o q_n = 2160$ then find
Laspeyre’s and Fisher’s index numbers.

Q.5.08. Define index number and give examples.

Q.5.09. Calculate the simple aggregative index number:

Commodities    Rice    Wheat    Maize    Tea    Tobacco    Sugar
po             30      8        20       100    15         50
pn             35      10       22       120    10         50

Q.5.10. Give the formulae of the four basic weighted index numbers.

Q.5.11. Laspeyre’s price index number = 254.17 and Fisher’s price index number = 252.37.
Find Paasche’s price index number.

Q.5.12. If $\sum p_n q_o = 1650$, $\sum p_n q_n = 2240$, $\sum p_o q_o = 1640$ and $\sum p_o q_n = 2160$ then find the
base year weighted index number and the current year weighted index number.


Long Questions
Exercise

Q.5.01. Compute the price relatives taking 1988 as base and the link relatives from the
following data:

Year      1988    1989    1990    1991    1992    1993    1994    1995    1996
Prices    5       5.5     6       6.5     7       7.5     8       8.5     9

Q.5.02. Compute chain indices from the following data:

Year      1980    1981    1982    1983    1984    1985    1986    1987
Prices    15      17      20      18      22      24      23      24

Q.5.03. From the data given below, compute the index numbers of prices, taking 1990 as
base. Use (i) simple average (ii) the median and (iii) the geometric mean.

         Commodity (Prices in Rs.)
Years    A       B       C       D
1990     2.10    0.75    1.25    1.30
1991     2.15    1.05    1.30    1.30
1992     2.25    0.80    1.35    1.32
1993     2.75    1.15    2.05    1.35
1994     3.25    1.75    3.00    1.70

Q.5.04. From the data of Q.5.03, compute chain indices, taking 1990 as base. Use (i) simple
average (ii) the median and (iii) the geometric mean.

Q.5.05. The price of wheat (per maund) is given for the years 1984 to 1993. Calculate the
simple index numbers using

(i) 1984 as base
(ii) Average of the prices for the first five years as base
(iii) Average of the prices of all the ten years as base.

Years    Prices (Rs.)    Years    Prices (Rs.)
1984     120             1989     127
1985     118             1990     128
1986     123             1991     130
1987     124             1992     132
1988     125             1993     133


Q.5.06. Construct the following weighted aggregative price index numbers for 1991 and
1992, using 1990 as a base, from the given data.

(i) Laspeyre’s index            (ii) Paasche’s index
(iii) Fisher’s “Ideal” index    (iv) Marshall-Edgeworth index

             Prices                       Quantities
Food item    1990     1991     1992      1990    1991    1992
Meat         38       38       50        10      13      12
Bread        11.25    11.50    11.75     3       4       6
Milk         13.75    13.75    13.75     7       7       9
Butter       16.00    16.50    17.00     2       4       5

Q.5.07. Construct the following weighted aggregative price index numbers for 2000 using
1987 as a base, from the given data.

(i) Laspeyre’s index            (ii) Paasche’s index
(iii) Fisher’s “Ideal” index    (iv) Marshall-Edgeworth index

               1987               2000
Commodities    Price    Value     Price    Value
A              2        148       3        246
B              5        625       4        560
C              7        280       6        198

Q.5.08. Construct the Cost of Living index number, from the given data using both
Aggregative Expenditure and Family Budget Methods.

                                          Prices
Commodities    Quantity     Unit          1990    2000
A              50 kg        25 kg         5       9
B              5 kg         1 kg          15      20
C              10 kg        5 kg          10      15
D              20 dozen     1 dozen       10      20
E              3 Liter      1 Liter       15      25

CHAPTER 06
Set Theory and Basic Probability

Chapter Contents

You should read this chapter if you need to learn about:

• Set and its Types: (P202-P205)
• Tree diagram and Venn diagram: (P205-P206)
• Operations on Sets: (P207-P208)
• Experiment and Random Experiment: (P209)
• Trial, Outcome and Sample Space: (P209-P211)
• Event and its Types: (P212-P214)
• Counting Techniques: (P214-P225)
• Origin of Probability: (P226)
• Probability: (P227)
• Definition of Probability: (P227-P235)
• Addition Rules of Probability For Mutually Exclusive Events: (P236-P239)
• Addition Rules of Probability For Not Mutually Exclusive Events: (P240-P242)
• Understanding the meaning of the words “AND” and “OR”: (P243)
• Rule of Complementation: (P244)
• Conditional Probability: (P244-P246)
• Independent and Dependent Events: (P247)
• Multiplication rule of Probability for independent Events: (P248-P249)
• Multiplication rule of Probability for dependent Events: (P250-P252)
• Interesting in Playing: (P255)
• Exercise: (P258-P262)


Set

“A well-defined collection of distinct objects is called a set”

The objects in a set may be numbers, people, letters, books, rivers etc.
Sets are usually denoted by capital letters such as A, B, C etc.

• Set of vowels in the English alphabet
• Set of books in a library
• Set of students in a college etc.

Note: The modern study of set theory was initiated by two German mathematicians, Georg Cantor and
Richard Dedekind, in the 1870s.

Finite and Infinite Sets

“A set consisting of a finite number of elements is called a finite set”

• Set of vowels
• Set of months of a year
• Set of days in a week etc.

On the other hand “a set consisting of an infinite number of elements is called an infinite set”

• Set of points on a line
• Set of stars in the sky
• Set of odd numbers
• Set of even numbers etc.

Null Set or Empty Set

“A set that contains no elements is called an empty set or null set”

A null set is denoted by the symbol $\phi$ (phi) or by $\{\ \}$.

• Set of male students in a girls’ college
• Set of first year statistics students older than 200 years etc.


Sub-Set

“If each element of a set A is also an element of set B then A is said to be a subset of B, written
as: $A \subseteq B$”

If $A = \{1, 2, 3\}$
And $B = \{1, 2, 3, 4, 5\}$
Then $A \subseteq B$

(Venn diagram: A, containing 1, 2 and 3, drawn inside B, which also contains 4 and 5.)

Proper Sub-Set

We call “A” a proper sub-set of “B” if:

• “A” is a sub-set of “B”
• $A \neq B$

Written as $A \subset B$

If $A = \{1, 2, 3\}$
And $B = \{1, 2, 3, 4, 5\}$
Then $A \subset B$

Note: Every set is a subset of itself and the null set is a subset of every set.

Improper Sub-Set

We call “A” an improper sub-set of “B” if:

• “A” is a sub-set of “B”
• $A = B$

If $A = \{1, 2, 3\}$
And $B = \{1, 2, 3\}$
Then A is an improper sub-set of B.
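
The three kinds of sub-set can be tried out with Python's built-in `set` type, whose comparison operators `<=` (subset) and `<` (proper subset) correspond to the definitions above. The variable names are assumptions made for illustration.

```python
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {1, 2, 3}

print(A <= B)             # A is a subset of B
print(A < B)              # A is a proper subset of B (subset and A != B)
print(A <= C and A == C)  # A is an improper subset of C (subset and A == C)
print(set() <= A)         # the null set is a subset of every set
```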


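Equal Sets

“Two sets “A” and “B” are said to be equal, if they contain exactly the same elements”

In other words

If $A \subseteq B$ and $B \subseteq A$ then $A = B$

If $A = \{1, 2, 3\}$
And $B = \{3, 1, 2\}$
Then $A = B$

(Venn diagram: A and B drawn as two circles, each containing the elements 1, 2 and 3.)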

Power Set

“The set of all possible sub-sets of a set is called the power set and is denoted by P(A)”

The number of subsets in the power set of a set with n elements is $2^n$.

If $A = \{1, 2, 3\}$ then the power set contains $2^3 = 8$ subsets, i.e.

$P(A) = \{\phi, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$
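
As an illustration of the $2^n$ rule, the power set of a small set can be listed with Python's `itertools.combinations`. The function name `power_set` is an assumption made for this sketch.

```python
from itertools import combinations


def power_set(elements):
    """Return all subsets of `elements`, from the empty set up to the full set."""
    items = sorted(elements)
    return [set(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


subsets = power_set({1, 2, 3})
print(len(subsets))   # number of subsets, 2**3
for s in subsets:
    print(s if s else "{}")
```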

Disjoint Sets

“If there is no element common to the two sets “A” and “B”, then they are called disjoint
sets”

Disjoint sets are also called mutually exclusive sets.

If $A = \{1, 2, 3\}$ and $B = \{a, b, c\}$

Then A and B are disjoint sets.

(Venn diagram: two separate circles, A containing 1, 2, 3 and B containing a, b, c.)


Overlapping sets

“If at least one element is common to two sets such that they are not subsets of each other then they
are called overlapping sets”

If $A = \{1, 2, 3, 4\}$ and $B = \{4, 5, 6, 7\}$

Then A and B are overlapping sets.

(Venn diagram: two overlapping circles A and B with the element 4 in the common region.)

Universal Set

“The set which consists of all the elements specified for some discussion is called the universal set”.

It is denoted by U or S.

Product Set OR Cartesian product of Sets

The Cartesian product of sets “A” and “B”, denoted by $A \times B$ (read as “A” cross “B”), is the set
of all the ordered pairs (x, y) where $x \in A$ and $y \in B$.

If $A = \{H, T\}$ and $B = \{1, 2\}$

$A \times B = \{(H, 1), (H, 2), (T, 1), (T, 2)\}$

Note: The Cartesian product is named after Rene Descartes, a French mathematician.

Tree diagram

“A systematic method of finding the Cartesian product through a diagram is called a tree diagram”


If for a coin, $A = \{H, T\}$ and for a die, $B = \{1, 2, 3, 4, 5, 6\}$

$A \times B = \{(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6),
(T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6)\}$

(Tree diagram: each branch of A, i.e. H and T, splits into the six branches 1 to 6 of B.)
Venn diagram

“The simple and effective way of representing the relationships between sets diagrammatically is
called a Venn diagram”.

In a Venn diagram the universal set U (or S) is represented by a rectangle and the sub-sets are
represented by circles inside the rectangle, e.g.

If $A = \{1, 2, 3, 4\}$ and $B = \{4, 5, 6, 7\}$ then they can be represented by the Venn diagram as:

(Venn diagram: a rectangle S containing two overlapping circles; 1, 2, 3 lie in A only, 4 lies in the
common region and 5, 6, 7 lie in B only.)

Note: In 1880, British philosopher John Venn introduced the Venn diagrams.


Operations on Sets

Like the algebraic operations of addition, subtraction, multiplication and division in mathematics, we
have basic operations on sets, i.e.:

• Union of two sets
• Intersection of two sets
• Difference of two sets
• Complement of a set

Union of sets

The union of two sets “A” and “B” is the set of all elements that belong to “A” or to “B” or to
both “A” and “B”. The union of two sets “A” and “B” is denoted by $A \cup B$.

If $A = \{1, 2, 3, 4\}$ and $B = \{5, 6, 7\}$

Then $A \cup B = \{1, 2, 3, 4, 5, 6, 7\}$

(Venn diagram: two separate circles A and B; $A \cup B$ is shaded.)

If $A = \{1, 2, 3, 4, 5, 6, 7\}$ and $B = \{4, 5, 6, 7, 8, 9\}$

Then $A \cup B = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$

(Venn diagram: two overlapping circles A and B; $A \cup B$ is shaded.)

Intersection of sets

The intersection of two sets “A” and “B” is the set of elements that belong to both “A” and “B”.
The intersection of two sets “A” and “B” is denoted by $A \cap B$.

If $A = \{1, 2, 3, 4, 5, 6, 7\}$
And $B = \{4, 5, 6, 7, 8, 9\}$

Then $A \cap B = \{4, 5, 6, 7\}$

(Venn diagram: two overlapping circles A and B; $A \cap B$ is shaded.)


Difference of sets

The difference of sets “A” and “B” is the set of elements that belong to “A” but do not belong to
“B”. The difference of two sets “A” and “B” is denoted by $A - B$ or $A - (A \cap B)$ or $A \cap \bar{B}$.

If $A = \{1, 2, 3, 4, 5, 6, 7\}$
And $B = \{4, 5, 6, 7, 8, 9\}$
Then $A - B = \{1, 2, 3\}$

(Venn diagram: two overlapping circles A and B; $A - B$ is shaded.)

If $A = \{1, 2, 3, 4, 5\}$
And $B = \{1, 2, 3\}$
Then $A - B = \{4, 5\}$

(Venn diagram: B drawn inside A; $A - B$ is shaded.)

If $A = \{1, 2, 3\}$
And $B = \{a, b, c\}$
Then $A - B = \{1, 2, 3\}$

(Venn diagram: two separate circles A and B; $A - B$ is shaded.)

Complement of a Set

The complement of a set “A” is the set of elements that belong to “S” but do not belong to “A”.
The complement of set “A” is denoted by $\bar{A}$ or $A^c$.

If $S = \{1, 2, 3, 4, 5, 6\}$
And $A = \{2, 4, 6\}$
Then $\bar{A} = S - A = \{1, 3, 5\}$

(Venn diagram: circle A inside the rectangle S; $\bar{A}$ is shaded.)
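
The four operations on sets map directly onto Python's built-in `set` operators, as the following sketch shows. The universal set `S = {1, ..., 9}` is an assumption made only so that a complement can be illustrated; A and B are the sets used in the union and intersection examples.

```python
S = set(range(1, 10))            # assumed universal set {1, 2, ..., 9}
A = {1, 2, 3, 4, 5, 6, 7}
B = {4, 5, 6, 7, 8, 9}

union = A | B                    # elements in A or in B or in both
intersection = A & B             # elements in both A and B
difference = A - B               # elements in A but not in B
complement = S - A               # elements in S but not in A

print(union, intersection, difference, complement)
print(difference == A - (A & B)) # the alternative form A - (A ∩ B)
```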


Experiment

“An experiment is a process in which we obtain results”

Random Experiment

In our daily life, we perform many activities which have a fixed result no matter how many times
they are repeated. For example, given any triangle, without knowing the three angles, we can definitely
say that the sum of the measures of the angles is 180°. We also perform many experimental activities where
the result may not be the same when they are repeated under identical conditions. For example, when a coin
is tossed it may turn up a head or a tail, but we are not sure which one of these results will actually be
obtained. Such experiments are called random experiments.

A random experiment satisfies the following three properties:

• It can be repeated any number of times.
• It has more than one possible outcome.
• It is not possible to predict the outcome in advance.

Hence we may define the random experiment as “An experiment that generates uncertain results
under similar conditions is called a random experiment”

• Tossing of a coin
• Rolling of a die
• Drawing a card from a pack of 52 playing cards etc.

Trial

“A single performance of an experiment is called a trial”

Outcome

“A possible result of a random experiment is called an outcome”

If we toss a coin then “H” or “T” may be the outcomes, where “H” denotes a Head and “T” denotes a Tail
on the coin.

Sample space

“A set consisting of all possible outcomes of a random experiment is called a sample space”. It is
denoted by “S”, and each element of a sample space is called a sample point.

Note: The number of sample points in the sample space of a coin-tossing experiment is 2^n, where “n”
is the number of coins. For a die-rolling experiment it is 6^n, where “n” is the number of dice.

• If a coin is tossed
Then S = {H, T}

• If two coins are tossed
Then S = {HH, HT, TH, TT}

• If three coins are tossed
Then S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

• If a die is rolled
Then S = {1, 2, 3, 4, 5, 6}

• If two dice are rolled, then

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

 If we draw a card from a deck of 52 playing cards then the sample points in the sample space are:


[Figure: the 52 playing cards arranged in four suits: Hearts, Diamonds, Spades and Clubs]


 If we draw a ball from a basket having 3 different color balls then the sample points in the
sample space may be as follows:

[Figure: three balls of different colors]

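The sample spaces for coins and dice can be listed mechanically, which also confirms the 2^n and 6^n counts. The sketch below uses `itertools.product` from the standard library; the function names are illustrative and not part of the text.

```python
from itertools import product

def coin_sample_space(n):
    """All outcomes of tossing n coins, e.g. 'HT' for a head then a tail."""
    return ["".join(outcome) for outcome in product("HT", repeat=n)]

def dice_sample_space(n):
    """All outcomes of rolling n dice, as tuples such as (1, 6)."""
    return list(product(range(1, 7), repeat=n))

print(coin_sample_space(2))       # ['HH', 'HT', 'TH', 'TT']
print(len(coin_sample_space(3)))  # 8  = 2^3
print(len(dice_sample_space(2)))  # 36 = 6^2
```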

Event

“Any subset of a sample space is called an event”

Events are usually denoted by A, B, C, etc.

If we toss two coins
Then S = {HH, HT, TH, TT}
Now if A = {HH}, then “A” is called an event.

Note:
• Each element of a sample space “S” is called a sample point.
• The total number of sample points in a sample space is denoted by n(S).
• The number of cases favorable to an event “A” is denoted by n(A).

Simple Event

“If an event contains only one sample point from the sample space, then it is called a simple event”

If we toss two coins then S = {HH, HT, TH, TT}

If A = {HH}, then “A” is called a simple event.

Compound Event

“If an event contains two or more sample points from the sample space, then it is called a
compound event”

If we toss two coins then S = {HH, HT, TH, TT}

If A = {HT, TH}, then “A” is called a compound event.

The Certain or Sure Event

“An event consisting of the sample space itself is called the sure event”

Impossible event

“An event consisting of the null set is called the impossible event”


Mutually Exclusive (Disjoint) Events

“Events in the same sample space are said to be mutually exclusive if they cannot occur together”

For two mutually exclusive events “A” and “B”: A ∩ B = ∅

If we toss a coin, then “H” and “T” are mutually exclusive, because if “H” occurs
then “T” cannot take place; similarly, the outcomes 1, 2, 3, 4, 5 and 6 are mutually exclusive
when a die is rolled. In other words, they exclude each other.

[Figure: Venn diagrams of the coin sample space (H, T) and the die sample space (1 to 6); no outcomes overlap]

Not Mutually Exclusive (Overlapping) Events

“Events in the same sample space are said to be not mutually exclusive if they can occur together”

For two not mutually exclusive events “A” and “B”: A ∩ B ≠ ∅

If a card is drawn at random from a pack of 52 playing cards, then it may be at the
same time an “Ace” and a “Diamond”; therefore “Ace” and “Diamond” are not
mutually exclusive.

[Figure: Venn diagram of the overlapping events A (Ace) and D (Diamond)]
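Whether two events are mutually exclusive can be tested by checking that their intersection is empty. The following is a minimal sketch; the deck representation (rank, suit) and the function name are assumptions made for illustration.

```python
def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no sample point."""
    return a.isdisjoint(b)

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = {(rank, suit) for rank in ranks for suit in suits}

aces = {card for card in deck if card[0] == "A"}
diamonds = {card for card in deck if card[1] == "Diamonds"}

print(mutually_exclusive({"H"}, {"T"}))    # True
print(mutually_exclusive(aces, diamonds))  # False
print(aces & diamonds)                     # {('A', 'Diamonds')}
```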


Equally likely Events

“Events are said to be equally likely if they have the same chance of occurrence”

If we toss a fair coin, then “H” and “T” are equally likely, because they have the
same chance of occurrence.

Exhaustive Events

“Two or more events defined in the same sample space are said
to be exhaustive if their union is equal to the sample space”

If S = {1, 2, 3, 4, 5, 6}

Let A = {1, 3, 5} and B = {2, 4, 6}

Then A ∪ B = {1, 2, 3, 4, 5, 6} = S

Therefore “A” and “B” are exhaustive events.

Note: An event “A” and its complement “Ā” are always exhaustive, i.e. A ∪ Ā = S.

Counting Techniques

Sometimes it is very difficult to list all the sample points of a sample space; therefore we use some
mathematical techniques for finding the number of sample points in the sample space. These techniques
are called counting techniques, i.e.

• Factorial
• Rule of Multiplication
• Permutation
• Combination

Factorial

“The product of the first “n” natural numbers is called the factorial of “n” and is denoted by n!”

2! = 2 × 1 = 2
5! = 5 × 4 × 3 × 2 × 1 = 120

In general, n! = n(n - 1)(n - 2)(n - 3) ... 3 · 2 · 1

Note: 1! = 1 and, by convention, 0! = 1
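As a quick check of the definition, a factorial can be computed with a simple loop and compared with `math.factorial` from the Python standard library. The function name `factorial` is illustrative.

```python
import math

def factorial(n):
    """Product of the first n natural numbers; the empty product gives 0! = 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

for n in (0, 1, 2, 5):
    # The hand-written loop and the library function agree.
    print(n, factorial(n), math.factorial(n))
```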


Rule of Multiplication

“If a selection operation can be performed in “m” ways and a second selection operation can be
performed in “n” ways, then the two operations can be performed together in “m × n” ways”

• A coin is tossed and a die is rolled; here operation one, i.e. the coin, gives
{H, T} and the second operation, i.e. the die, gives {1, 2, 3, 4, 5, 6}; hence the two
operations can be performed in 2 × 6 = 12 ways.

• If a man has 3 suits and 5 ties, then he can wear a suit and a tie in
3 × 5 = 15 ways.

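The rule of multiplication can be seen by listing every pairing explicitly. The sketch below pairs each coin face with each die face using `itertools.product`; the suit and tie labels are hypothetical placeholders.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is one way of performing both operations together.
outcomes = list(product(coin, die))
print(len(outcomes))  # 12 = 2 x 6

# Hypothetical labels for 3 suits and 5 ties.
suits = ["suit1", "suit2", "suit3"]
ties = ["tie1", "tie2", "tie3", "tie4", "tie5"]
outfits = list(product(suits, ties))
print(len(outfits))   # 15 = 3 x 5
```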
Permutation

“A permutation is an arrangement of “r” objects taken
from “n” distinct objects in a particular order”

It is denoted by nPr and is given by:

nPr = n! / (n - r)!

Instead of nPr we can also use ⁿPᵣ or P(n,r)

Note: The first book on permutations and combinations was written by the Swiss
mathematician Jacob Bernoulli in 1713 A.D.


EXAMPLE 6.01

How many different permutations can be formed from the letters A, B, C when two letters are
taken at a time?

Solution Here n = 3 and r = 2

Therefore nPr = n! / (n - r)!, so 3P2 = 3! / (3 - 2)! = 6

The six permutations are: AB, BA, AC, CA, BC, CB
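The permutations of A, B, C taken two at a time can be listed with `itertools.permutations`, and the formula n!/(n - r)! can be coded directly. This is a sketch; `n_p_r` is an illustrative helper name.

```python
import math
from itertools import permutations

def n_p_r(n, r):
    """Number of ordered arrangements of r objects chosen from n distinct objects."""
    return math.factorial(n) // math.factorial(n - r)

arrangements = ["".join(p) for p in permutations("ABC", 2)]
print(arrangements)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(n_p_r(3, 2), n_p_r(9, 4), n_p_r(16, 2))  # 6 3024 240
```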


EXAMPLE 6.02

In how many ways can “3” persons be seated on “4” chairs?

Solution Here n = 4 and r = 3

Therefore nPr = n! / (n - r)!, so 4P3 = 4! / (4 - 3)! = 24

EXAMPLE 6.03

In how many ways can a president, vice-president, secretary and treasurer be selected from nine
members of a committee?

Solution Here n = 9 and r = 4

Therefore nPr = n! / (n - r)!, so 9P4 = 9! / (9 - 4)! = 3024

EXAMPLE 6.04

In how many ways can 2 lottery tickets be drawn from 16 for the 1st and 2nd prizes?

Solution Here n = 16 and r = 2

Therefore nPr = n! / (n - r)!, so 16P2 = 16! / (16 - 2)! = 240

EXAMPLE 6.05

In how many ways can two different books out of 5 books be arranged on a shelf?

Solution Here n = 5 and r = 2

Therefore nPr = n! / (n - r)!, so 5P2 = 5! / (5 - 2)! = 20


EXAMPLE 6.06

In how many ways can 5 different books be arranged on a shelf?

Solution Here n = 5

Therefore Number of permutations = n! = 5! = 120

Note: The total number of permutations of “n” distinct objects taking all “n” at a time is equal to “n!”

EXAMPLE 6.07

In how many ways can four people be lined up to get on a bus?

Solution Here n = 4

Therefore Number of permutations = n! = 4! = 24

EXAMPLE 6.08

How many different words can be formed from the letters of the word “BOXER” if:

1) All the letters are taken at a time
2) Three letters are taken at a time

Solution

1) All the letters are taken at a time:

Here n = 5

Therefore Number of permutations = n! = 5! = 120

2) Three letters are taken at a time:

Here n = 5 and r = 3

Therefore nPr = n! / (n - r)!, so 5P3 = 5! / (5 - 3)! = 60


EXAMPLE 6.09

Find the number of arrangements of 8 distinct books on a shelf if:

1) All the books are taken at a time
2) Three books are taken at a time

Solution

1) All the books are taken at a time:

Here n = 8
Therefore Number of permutations = n! = 8! = 40320

2) Three books are taken at a time:

Here n = 8 and r = 3

Therefore nPr = n! / (n - r)!, so 8P3 = 8! / (8 - 3)! = 336

Note: If we arrange objects in a circle, then there is no starting point; therefore we fix one
object and arrange the remaining objects as in a linear permutation. The formula for
arranging “n” objects in a circle is (n - 1)!

EXAMPLE 6.10

In how many ways can 4 people be seated at a round table?

Solution Here n = 4

Therefore
Number of circular permutations = (n - 1)!
= (4 - 1)! = 3! = 6

Group Permutation

The number of distinct permutations of “n” things when “n1” are alike, “n2” are alike but different from
the first group, “n3” are alike but different from the first and second groups, and so on for “k” groups, is:

P = n! / (n1! × n2! × ... × nk!), where n = n1 + n2 + ... + nk


EXAMPLE 6.11

How many possible permutations can be formed from the letters of the word “STATISTICS”?

Solution Here n = 10

n1 = number of “S” = 3
n2 = number of “T” = 3
n3 = number of “A” = 1
n4 = number of “I” = 2
n5 = number of “C” = 1

Therefore P = n! / (n1! × n2! × n3! × n4! × n5!) = 10! / (3! × 3! × 1! × 2! × 1!) = 50400

EXAMPLE 6.12

In how many different ways can 3 red, 3 yellow and 3 blue balls be arranged in a string with 9
sockets?

Solution Here n = 9

n1 = number of red balls = 3
n2 = number of yellow balls = 3
n3 = number of blue balls = 3

Therefore P = n! / (n1! × n2! × n3!) = 9! / (3! × 3! × 3!) = 1680

EXAMPLE 6.13

In how many possible orders can two boys and three girls be born to a family having five
children?

Solution Here n = 5

n1 = number of boys = 2
n2 = number of girls = 3

Therefore P = n! / (n1! × n2!) = 5! / (2! × 3!) = 10
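The group permutation formula can be applied to any word by counting how often each letter occurs. The sketch below uses `collections.Counter` for the counts; the function name and the letter strings standing in for the balls and children are illustrative.

```python
import math
from collections import Counter

def distinct_permutations(items):
    """n! / (n1! n2! ... nk!) where n1..nk are the sizes of the groups of alike items."""
    total = math.factorial(len(items))
    for group_size in Counter(items).values():
        total //= math.factorial(group_size)
    return total

print(distinct_permutations("STATISTICS"))  # 50400
print(distinct_permutations("RRRYYYBBB"))   # 1680 (3 red, 3 yellow, 3 blue)
print(distinct_permutations("BBGGG"))       # 10   (2 boys, 3 girls)
```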


Test Yourself

1) How many permutations can be formed out of the letters of the word “MISSISSIPPI”?
2) Make all permutations of A, B, C, D.
3) In how many ways can 4 people be seated at a round table?
4) Find 7P3, 4P2, 12P5, 10P8
5) Find the number of arrangements of 6 distinct books on a shelf if:
(i) All books are taken at a time (ii) Three books are taken at a time
6) In how many ways can 8 people be lined up to get on a bus?

The Order is important in Permutation!!!

There are six different ways in which three horses can finish a race, as shown in the figure:
(Assume that there are no ties and that every horse finishes)

[Figure: the six finishing orders of three horses A, B and C: ABC, ACB, BAC, BCA, CAB, CBA]


Combination

“A combination is a selection of “r” objects taken from “n” distinct objects without regard
to order”

It is denoted by nCr and is given by:

nCr = n! / (r! (n - r)!)

Instead of nCr we can also use ⁿCᵣ, C(n,r) or the binomial symbol with “n” written above “r” in parentheses.

EXAMPLE 6.14

How many combinations of the letters A, B, C can be made if two letters are taken at a time?

Solution Here n = 3 and r = 2

Therefore nCr = n! / (r! (n - r)!), so 3C2 = 3! / (2! (3 - 2)!) = 3

The three combinations are: AB, BC, CA
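Combinations can be listed with `itertools.combinations` and counted with `math.comb` (available in Python 3.8 and later). This is a small sketch for checking the worked examples; note that each selection appears once, whatever the order of its letters.

```python
import math
from itertools import combinations

selections = ["".join(c) for c in combinations("ABC", 2)]
print(selections)  # ['AB', 'AC', 'BC']

# nCr = n! / (r! (n - r)!)
print(math.comb(3, 2), math.comb(15, 11), math.comb(10, 4))  # 3 1365 210
```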

EXAMPLE 6.15

In how many ways can a team of 11 players be chosen from a total of 15 players?

Solution Here n = 15 and r = 11

Therefore nCr = n! / (r! (n - r)!), so 15C11 = 15! / (11! (15 - 11)!) = 1365


EXAMPLE 6.16

In how many ways can we select a committee of 4 people from a group of 10 people?

Solution Here n = 10 and r = 4

Therefore nCr = n! / (r! (n - r)!), so 10C4 = 10! / (4! (10 - 4)!) = 210

EXAMPLE 6.17

In how many ways can we select a set of 6 books from 10 different books?

Solution Here n = 10 and r = 6

Therefore nCr = n! / (r! (n - r)!), so 10C6 = 10! / (6! (10 - 6)!) = 210

EXAMPLE 6.18

In how many ways can we select a card from a pack of 52 playing cards?

Solution

Here n = 52 and r = 1

Therefore nCr = n! / (r! (n - r)!), so 52C1 = 52! / (1! (52 - 1)!) = 52

The 52 ways are shown in the following figure:

[Figure: the 52 playing cards]


EXAMPLE 6.19

A bag contains 7 balls; in how many ways can we select 3 balls?

Solution Here n = 7 and r = 3

Therefore nCr = n! / (r! (n - r)!), so 7C3 = 7! / (3! (7 - 3)!) = 35

EXAMPLE 6.20

A basket contains 5 white and 4 black balls; in how many ways can we select 3 white and 2
black balls?

Solution Here

White Black Total
5 4 9

“3” white balls can be selected out of “5” in 5C3 = 5! / (3! (5 - 3)!) = 10 ways

“2” black balls can be selected out of “4” in 4C2 = 4! / (2! (4 - 2)!) = 6 ways

Hence the number of ways in which “3” white and “2” black balls are selected = 10 × 6 = 60

EXAMPLE 6.21

In how many ways can a consonant and a vowel be chosen out of the letters of the word
SCHOLAR?

Solution Here

Consonants Vowels Total
5 2 7

A consonant can be selected out of “5” in 5C1 = 5! / (1! (5 - 1)!) = 5 ways

A vowel can be selected out of “2” in 2C1 = 2! / (1! (2 - 1)!) = 2 ways

Hence the number of ways in which a consonant and a vowel are selected = 5 × 2 = 10


EXAMPLE 6.22

From 4 black, 5 white and 6 gray balls, how many selections of 9 balls are possible if 3 balls of
each color are to be selected?

Solution Here

Black White Gray Total
4 5 6 15

“3” black balls can be selected out of “4” in 4C3 = 4! / (3! (4 - 3)!) = 4 ways

“3” white balls can be selected out of “5” in 5C3 = 5! / (3! (5 - 3)!) = 10 ways

“3” gray balls can be selected out of “6” in 6C3 = 6! / (3! (6 - 3)!) = 20 ways

Hence the number of ways in which 3 balls of each color are selected = 4 × 10 × 20 = 800

EXAMPLE 6.23

A committee of 5 persons is to be selected out of 6 men and 2 women. Find the number of ways
in which more men are selected than women.

Solution Here

Committee Men Women Total
5 6 2 8

More men than women can be selected in “3” mutually exclusive ways, i.e.

(5 men, 0 women) or (4 men, 1 woman) or (3 men, 2 women)

C(6,5) × C(2,0) or C(6,4) × C(2,1) or C(6,3) × C(2,2)

Since the three ways are mutually exclusive, the number of ways in which more
men than women can be chosen is

C(6,5) × C(2,0) + C(6,4) × C(2,1) + C(6,3) × C(2,2) = 6 + 30 + 20 = 56

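The committee count in Example 6.23 can be reproduced by looping over the possible numbers of women and adding the products of combinations for the cases with more men than women. This is a sketch with illustrative variable names.

```python
import math

men, women, committee_size = 6, 2, 5

ways = 0
for w in range(women + 1):
    m = committee_size - w
    # Keep only the splits that are feasible and have more men than women.
    if m <= men and m > w:
        ways += math.comb(men, m) * math.comb(women, w)

print(ways)  # 56
```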

The Order Doesn’t matter in Combination!!!

There are three different combinations in which three horses can finish a race as shown in the figure:
(Assume that there are no ties and that every horse finishes)

[Figure: the 1st, 2nd and 3rd combinations of the horses A, B and C]


Test Yourself

1) In how many ways can we select a set of 3 tables from 9 different tables?
2) A bag contains 6 balls; in how many ways can we select 4 balls?
3) A bag contains 9 white and 8 black balls; in how many ways can we select 6 white and 4
black balls?
4) Find 7C3, 4C2, 12C5, 10C8
5) In how many ways can a consonant and a vowel be chosen out of the letters of the word
CHOSEN?
6) In how many ways can a team of 11 players be chosen from a total of 13 players?


Origin of Probability!!!

Probability theory had its origin in the 16th century, when the Italian physician and
mathematician J. Cardan (Cardano) wrote the first book on the subject, “The Book on Games of
Chance”. Cardan was an astrologer, philosopher, physician, mathematician and gambler. This
book was published in 1663, after his death. It contained techniques on how to cheat and how
to catch others at cheating.

In 1654, a professional gambler named Chevalier de Mere approached the well-known French
philosopher and mathematician Blaise Pascal about certain dice problems. Pascal became
interested in these problems, studied them and discussed them with another French
mathematician, Pierre de Fermat. Both Pascal and Fermat solved the problems independently.
This work was the beginning of Probability Theory.

Outstanding contributions to probability theory were also made by J. Bernoulli, De Moivre,
Pierre Laplace, Lagrange, Chebyshev, Markov, Bayes, Huygens and Kolmogorov.

The equation of the normal curve was first published in 1733 by De Moivre. The same
result was later developed by two mathematical astronomers, Laplace and Gauss.

[Figure: portraits of De Moivre, J. Cardano, B. Pascal, Laplace, Chebyshev, Fermat, Bayes,
J. Bernoulli, Gauss, Huygens, Markov, Lagrange and Kolmogorov]


Probability

In our daily life we often make statements such as:

• It will probably rain today
• I will probably go abroad this year
• He is almost certain that he will win this game

All these statements involve uncertainty, which can be measured numerically by means of
“probability”. Thus we may simply define probability as “the numeric measure of uncertainty”.

Though probability started with gambling, it has been used extensively in the fields of
Physical Sciences, Commerce, Biological Sciences, Medical Sciences, Weather Forecasting, etc.
gm

Definition of Probability

Usually the probability of an event is defined by adopting either of the following two approaches:

1) Subjective approach
2) Objective approach

[Diagram: Probability divides into the Subjective Approach and the Objective Approach]

Subjective Approach

In the subjective approach, the probability of an event is defined as “the measure of belief in the
occurrence of an event by a particular person”. Probability in this sense is purely subjective, and is
based on whatever evidence is available to the person.


For example:

 A sports-writer may say that there is a 70% probability that Australia will
win the world cup.
 A physician might say that, there is a 30% chance the patient will need an
operation etc.

Objective Approach

In the objective approach, the probability of an event is defined in the following three ways:

• Classical or A Priori or Theoretical Definition of Probability
• Relative Frequency or Empirical or Experimental Definition of Probability
• The Axiomatic Definition of Probability

[Diagram: the Objective Approach divides into the Classical Definition, the Relative Frequency
Definition and the Axiomatic Definition]

Classical Definition

“If a random experiment can produce “n” mutually exclusive and equally likely outcomes, and if
“m” of these outcomes are favorable to the occurrence of an event “A”, then the probability of
the event “A” is equal to the ratio m/n”. If we take P(A) as “the probability of A” then:

P(A) = m / n = (No. of favourable outcomes) / (No. of possible outcomes)

• For example, when a fair coin is tossed, we know in advance that the possible
outcomes are Head and Tail. Since Head and Tail are equally likely, the
probability of each is 1/2 or 0.5.

Note: The classical definition was formulated by the French mathematician P.S. Laplace.


Relative Frequency Definition

“If “m” is the number of occurrences of an event “A” in a large number of trials “n”, then the
probability of “A” is the limit of the relative frequency m/n as the number of trials grows infinitely
large”. If we take P(A) as “the probability of A” then:

P(A) = lim (m / n) as n → ∞

For example, if a coin has been loaded (unfair), then the probabilities of
Head and Tail will not be equal to 0.5, i.e. Head and Tail are not
equally likely. Thus, for experiments not having equally likely outcomes, if
we flip the coin 10 times, say, and observe 4 heads, then, based on this
information, we say that the chance of observing a head is 4/10 or
0.4, which is not the same as 0.5. If, however, we flip a fair coin a large
number of times, we would expect about 50 percent of the flips to result in a
head.
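The relative frequency idea can be illustrated with a small simulation: as the number of flips grows, the observed fraction of heads settles near the true probability of a head. The sketch below is only an illustration; the function name, the `p_head` parameter and the fixed seed are assumptions made for reproducibility, not part of the text.

```python
import random

def relative_frequency_of_heads(n_trials, p_head=0.5, seed=0):
    """Flip a (possibly loaded) coin n_trials times and return m / n."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same flips
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_head)
    return heads / n_trials

for n in (10, 1000, 100000):
    # Fair coin versus a coin loaded so that P(Head) = 0.3.
    print(n, relative_frequency_of_heads(n), relative_frequency_of_heads(n, p_head=0.3))
```

With few trials the relative frequency can be far from the true value; with many trials it is usually close to it.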
gm

The Axiomatic Definition



Let S be a sample space with the sample points A1, A2, …, Ai, …, An. To
each sample point we assign a real number, denoted by P(Ai) and
called the probability of Ai, that must satisfy the following basic
axioms:

• Axiom 1: For any event Ai, 0 ≤ P(Ai) ≤ 1
• Axiom 2: P(S) = 1
• Axiom 3: If Ai and Aj are mutually exclusive events,
then P(Ai ∪ Aj) = P(Ai) + P(Aj)

In this case P(Ai) is defined by the formula:

P(Ai) = n(Ai) / n(S) = (No. of sample points in the event Ai) / (No. of sample points in the sample space)

Note: The axiomatic definition was introduced in 1933 by the Russian mathematician
A.N. Kolmogorov.

Subjective probability is purely subjective, i.e. two or more persons faced with the same
evidence may arrive at different probabilities. On the other hand, objective probability
relates to those situations where everyone will arrive at the same conclusion.


Range of Probability

If the probability of an event is 1, the event is certain to occur. If the
probability of an event is 0, the event is impossible. A probability of 0.5
indicates that an event has an even chance of occurring. The following
graph shows the possible range of probabilities and their meanings.

[Figure: probability scale from 0 (impossible) through 0.5 (even chance) to 1 (certain)]

The closer the probability is to 1, the more likely the event is to occur.
Similarly, the closer the probability is to 0, the less likely the event is to occur.

EXAMPLE 6.24

A fair coin is tossed only once; what is the probability that a Head
will appear?

Solution Since a coin is tossed

Therefore S = {H, T} ⇒ n(S) = 2

Let “A” denote the event of getting “a Head”

Then A = {H} ⇒ n(A) = 1

Hence P(A) = n(A) / n(S) = 1/2 = 0.50

Note: Probabilities should be expressed as reduced fractions or rounded to
two or three decimal places. When the probability of an event is
an extremely small decimal, it is permissible to round the decimal to
the first nonzero digit after the point. For example, 0.0000587
would be 0.00006.

EXAMPLE 6.25

Two fair coins are tossed simultaneously; what is the probability
that at least one head will appear?

Solution Since two coins are tossed

Therefore S = {HH, HT, TH, TT} ⇒ n(S) = 4

Let “A” denote the event of getting “at least one Head”

Then A = {HH, HT, TH} ⇒ n(A) = 3

Hence P(A) = n(A) / n(S) = 3/4


EXAMPLE 6.26

A die is rolled; find the probability of getting a six.

Solution Since a die is rolled

Therefore S = {1, 2, 3, 4, 5, 6} ⇒ n(S) = 6

Let “A” denote the event of getting “a six”

Then A = {6} ⇒ n(A) = 1

Hence P(A) = n(A) / n(S) = 1/6

Note: Probabilities can be expressed as fractions, decimals, or percentages.
If you ask, “What is the probability of getting a head when a coin is
tossed?” typical responses can be any of the following three:
“1/2”, “0.5”, “50%”. These answers are all equivalent.

EXAMPLE 6.27

Two dice are rolled; find the probability that the sum is:

(1) Exactly “5” (2) At least “9” (3) At most “4”
(4) Even (5) Less than “3”

Solution Since two dice are rolled, therefore:

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) } ⇒ n(S) = 36

1) The sum is “Exactly 5”

Let “A” be the event of getting “sum is exactly 5”

Then A = {(1,4), (2,3), (3,2), (4,1)} ⇒ n(A) = 4

Hence P(A) = n(A) / n(S) = 4/36 = 1/9


2) The sum is “At least 9”

Let “B” be the event of getting “sum is at least 9”, then

B = {(3,6), (4,5), (5,4), (6,3), (4,6), (5,5), (6,4), (5,6), (6,5), (6,6)} ⇒ n(B) = 10

Hence P(B) = n(B) / n(S) = 10/36 = 5/18

3) The sum is “At most 4”

Let “C” be the event of getting “sum is at most 4”, then

C = {(1,1), (1,2), (1,3), (2,1), (2,2), (3,1)} ⇒ n(C) = 6

Hence P(C) = n(C) / n(S) = 6/36 = 1/6

4) The sum is “Even”

Let “D” be the event of getting “sum is even”, then

D = {(1,1), (1,3), (2,2), (3,1), (1,5), (2,4), (3,3), (4,2), (5,1),
     (2,6), (3,5), (4,4), (5,3), (6,2), (4,6), (5,5), (6,4), (6,6)} ⇒ n(D) = 18

Hence P(D) = n(D) / n(S) = 18/36 = 1/2


5) The sum is “Less than 3”

Let “E” be the event of getting “sum is less than 3”, then

E = {(1,1)} ⇒ n(E) = 1

Hence P(E) = n(E) / n(S) = 1/36
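The five answers of Example 6.27 can be verified by enumerating the 36 sample points and counting those that satisfy each condition. The following sketch uses `fractions.Fraction` so that results print as reduced fractions; the helper name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all 36 ordered pairs.
S = list(product(range(1, 7), repeat=2))

def probability(condition):
    """P(event) = n(event) / n(S) under equally likely sample points."""
    favorable = [point for point in S if condition(point)]
    return Fraction(len(favorable), len(S))

print(probability(lambda pt: sum(pt) == 5))      # 1/9
print(probability(lambda pt: sum(pt) >= 9))      # 5/18
print(probability(lambda pt: sum(pt) <= 4))      # 1/6
print(probability(lambda pt: sum(pt) % 2 == 0))  # 1/2
print(probability(lambda pt: sum(pt) < 3))       # 1/36
```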

EXAMPLE 6.28

A card is drawn at random from an ordinary pack of 52 playing cards. Find the probability that
the card drawn is an “8”.

Solution Since a card is drawn, therefore

S = {the pack of 52 cards} ⇒ n(S) = C(52,1) = 52

Eights Others Total
4 48 52

[Venn diagram: the event A (the four eights) inside the sample space S]

Let “A” be the event that “the card is an eight”. Then n(A) = C(4,1) = 4

Hence P(A) = n(A) / n(S) = 4/52 = 1/13


EXAMPLE 6.29

A basket contains 5 white and 4 black balls; what is the probability of selecting 3 white balls?

White Black Total
5 4 9

Solution Since “3” balls are selected out of “9”

Therefore n(S) = C(9,3) = 84

Let “W” be the event of “selecting 3 white balls”

Then n(W) = C(5,3) = 10

[Venn diagram: the event W inside the sample space S]

Hence P(W) = n(W) / n(S) = 10/84 = 5/42
s@

EXAMPLE 6.30

A box contains 3 gray and 5 black balls. If 4 balls are drawn together from the box then find the
t

probability of getting:
ta

(i) At least 2 black balls (ii) At most 2 gray balls.


es

Solution Since 4 balls are drawn from 8 balls: Gray Black Total
ze

3 5 8
8
Therefore n( S )=    70
 4

Let “A” be an event of getting “at least 2 black balls i.e. two or more black balls:

Now “A” can occur in the following mutually exclusive ways:

 2 black   3 black   4 black 


  OR   OR  
 2 gray   1 gray   0 gray 

234
Chapter 06 Set Theory and Basic Probability

 5  3   5  3   5  3   5  3   5  3   5  3 
 n( A)     or    or     n( A)     +    +    = 65
 2  2   3  1   4  0   2  2   3  1   4  0 

n( A) 65 13
Hence P( A)   
n( S ) 70 14

Let “B” be an event of getting “at most 2 black balls i.e. two or less black balls:

Now “B” can occur in the following mutually exclusive ways:

om
 2 black  1 black 

l.c
  OR  
 2 gray  ai  3 gray 

 5  3   5  3   5  3   5  3 
 n( B)     or     n( B)     +    = 60
gm
 2  2   1  3   2  2   1  3 

n( B) 60 6
Hence P( B)   
s@

n( S ) 70 7
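Example 6.30 can be checked by brute force: list every possible draw of 4 balls from the 8 and count the draws that satisfy each condition. In this sketch the balls are labeled by color and distinguished by index, which is an illustrative modeling choice; with 4 balls drawn, “at most 2 gray” and “at least 2 black” describe the same draws.

```python
from fractions import Fraction
from itertools import combinations

balls = ["G"] * 3 + ["B"] * 5  # 3 gray and 5 black balls

# Draw by index so that balls of the same color are still distinct objects.
draws = list(combinations(range(len(balls)), 4))

def probability(condition):
    favorable = sum(1 for d in draws if condition([balls[i] for i in d]))
    return Fraction(favorable, len(draws))

print(len(draws))                                      # 70
print(probability(lambda colors: colors.count("B") >= 2))  # 13/14
print(probability(lambda colors: colors.count("G") <= 2))  # 13/14
```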
t

Test Yourself

1) A fair coin is tossed only once; what is the probability that a Tail will appear?
2) Two fair coins are tossed; what is the probability that at least two heads will appear?
3) A die is rolled; find the probability of getting a four.
4) Two dice are rolled; find the probability that the sum is:
(i) Exactly “4” (ii) At least “10” (iii) At most “5”
(iv) Odd (v) Less than “2”
5) A card is drawn at random from an ordinary pack of 52 playing cards. Find the
probability that the card drawn is a “picture” card.
6) A basket contains 6 white and 3 black balls; what is the probability of selecting 4 white
balls?
7) A box contains 4 gray and 6 black balls. If 4 balls are drawn together from the box, then
find the probability of getting:
(i) At least 2 black balls (ii) At most 3 gray balls.


Addition Rule of Probability for Mutually Exclusive Events

Statement: Let “A” and “B” be two mutually exclusive events; then the probability that “A” or
“B” occurs is equal to the probability that “A” occurs plus the probability that “B”
occurs, i.e.

P(A or B) = P(A) + P(B)

OR P(A ∪ B) = P(A) + P(B)

Proof: To prove the theorem, consider the two mutually exclusive events “A” and “B” in the
Venn diagram:

[Venn diagram: disjoint events A (p points) and B (q points) in a sample space S of m points;
A ∪ B, containing p + q points, is shaded]

It is clear from the Venn diagram that:

n(S) = m
n(A) = p
n(B) = q
n(A ∪ B) = p + q

Now

P(A) = n(A) / n(S) = p / m
P(B) = n(B) / n(S) = q / m

Therefore

P(A ∪ B) = n(A ∪ B) / n(S) = (p + q) / m = p/m + q/m = P(A) + P(B)

∴ P(A ∪ B) = P(A) + P(B) Hence proved


EXAMPLE 6.31

Suppose that we roll a pair of dice; what is the probability of getting a sum of 5 or a sum of 11?

Solution Since a pair of dice is rolled, therefore:

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) } ⇒ n(S) = 36

Let “A” be the event of getting “sum is exactly 5”, then

A = {(1,4), (2,3), (3,2), (4,1)} ⇒ n(A) = 4

∴ P(A) = n(A) / n(S) = 4/36

Let “B” be the event of getting “sum is 11”

Then B = {(5,6), (6,5)} ⇒ n(B) = 2

∴ P(B) = n(B) / n(S) = 2/36

Now we have to find P(A or B), and since the two events “A” and “B” are mutually exclusive
(because they cannot occur together)

∴ P(A or B) = P(A ∪ B) = P(A) + P(B) = 4/36 + 2/36 = 6/36 = 1/6
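The addition rule for mutually exclusive events can be confirmed on Example 6.31 by building the two events as sets and comparing P(A ∪ B) with P(A) + P(B). This is a sketch with illustrative names; `P` simply counts sample points.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # sample space for two dice

def P(event):
    """Probability of an event given as a set of equally likely sample points."""
    return Fraction(len(event), len(S))

A = {pt for pt in S if sum(pt) == 5}   # sum is exactly 5
B = {pt for pt in S if sum(pt) == 11}  # sum is 11

print(A.isdisjoint(B))       # True: the events are mutually exclusive
print(P(A | B), P(A) + P(B))  # 1/6 1/6
```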


EXAMPLE 6.32

A card is drawn from a well-shuffled deck of 52 cards; find the probability that the card is a red
queen or a black queen.

Solution Since a card is drawn, therefore

S = {the pack of 52 cards} ⇒ n(S) = C(52,1) = 52

Red Queens Black Queens Others Total
2 2 48 52

[Venn diagram: two disjoint events, the red queens and the black queens, in the sample space S]

Let “R” be the event “red queen”. Then n(R) = C(2,1) = 2 and P(R) = n(R) / n(S) = 2/52

Let “B” be the event “black queen”. Then n(B) = C(2,1) = 2 and P(B) = n(B) / n(S) = 2/52

Now we have to find P(R or B), and since the two events “R” and “B” are mutually exclusive
(because they cannot occur together)

∴ P(R or B) = P(R ∪ B) = P(R) + P(B) = 2/52 + 2/52 = 4/52 = 1/13


EXAMPLE 6.33

A basket contains 5 white and 4 black balls; what is the probability that a ball drawn at random is
white or black?

Solution Since a ball is drawn out of “9”

Therefore n(S) = C(9,1) = 9

White Black Total
5 4 9

Let “W” be the event of “drawing a white ball”

Then n(W) = C(5,1) = 5

Therefore P(W) = n(W) / n(S) = 5/9

Let “B” be the event of “drawing a black ball”

Then n(B) = C(4,1) = 4

Therefore P(B) = n(B) / n(S) = 4/9

[Venn diagram: the disjoint events W and B together fill the sample space S]

Now we have to find P(W or B), and since the two events “W” and “B” are mutually exclusive
(because they cannot occur together)

∴ P(W or B) = P(W ∪ B) = P(W) + P(B) = 5/9 + 4/9 = 1

239
Chapter 06 Set Theory and Basic Probability

Addition Rule of Probability for Not Mutually Exclusive Events

Statement: If “A” and “B” are two not mutually exclusive events, then the probability that “A” or “B” or “both” occur is equal to the probability that “A” occurs plus the probability that “B” occurs minus the probability that both events “A” and “B” occur together, i.e.

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

OR $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Proof: To prove the theorem, consider the two not mutually exclusive events “A” and “B” in a Venn diagram. The part of “A” outside “B” contains $p - t$ sample points, the shaded overlap $A \cap B$ contains $t$ sample points, and the part of “B” outside “A” contains $q - t$ sample points.

It is clear from the Venn diagram that:

$n(S) = m, \quad n(A) = p, \quad n(B) = q, \quad n(A \cap B) = t$

$n(A \cup B) = (p - t) + t + (q - t) = p + q - t$

Now

$P(A) = \frac{n(A)}{n(S)} = \frac{p}{m}$

$P(B) = \frac{n(B)}{n(S)} = \frac{q}{m}$

$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{t}{m}$

Therefore

$P(A \cup B) = \frac{n(A \cup B)}{n(S)} = \frac{p + q - t}{m} = \frac{p}{m} + \frac{q}{m} - \frac{t}{m} = P(A) + P(B) - P(A \cap B)$

$\therefore P(A \cup B) = P(A) + P(B) - P(A \cap B)$ Hence proved.

EXAMPLE 6.34

If a card is selected at random from a deck of 52 playing cards, what is the probability that the card is a diamond or a picture card or both?

Solution Since one card is drawn, therefore

$S = \{\text{the pack of 52 cards}\} \Rightarrow n(S) = \binom{52}{1} = 52$

Diamonds (A)   Picture cards (B)   Both (A ∩ B)   Neither   Total
13             12                  3              30        52

Let “A” be the event “a diamond card”. Then $n(A) = \binom{13}{1} = 13$ and

$P(A) = \frac{n(A)}{n(S)} = \frac{13}{52}$

Let “B” be the event “a picture card”. Then $n(B) = \binom{12}{1} = 12$ and

$P(B) = \frac{n(B)}{n(S)} = \frac{12}{52}$

The two events “A” and “B” are not mutually exclusive (they can occur together): the jack, queen and king of diamonds belong to both, therefore $n(A \cap B) = 3$.

Now the probability that both “A” and “B” occur together is:

$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{3}{52}$

Hence $P(A \text{ or } B \text{ or both}) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}$
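The addition rule for not mutually exclusive events can be checked on a deck of cards by counting set members. This is a small sketch, assuming a standard 52-card deck in which the picture cards are the jacks, queens and kings; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))            # 52 (rank, suit) pairs

diamonds = {card for card in deck if card[1] == "diamonds"}          # event A
pictures = {card for card in deck if card[0] in {"J", "Q", "K"}}     # event B

def prob(event):
    """Classical probability n(E)/n(S) over the deck."""
    return Fraction(len(event), len(deck))

direct = prob(diamonds | pictures)                                   # P(A ∪ B) by counting
by_rule = prob(diamonds) + prob(pictures) - prob(diamonds & pictures)

print(direct, by_rule)
```

Subtracting P(A ∩ B) removes the three picture diamonds that would otherwise be counted twice.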

EXAMPLE 6.35

In a certain college 25% of the students failed math, 15% of the students failed stats and 10% of the students failed both math and stats. A student is selected at random; what is the probability that he/she failed math or stats?

Solution Given that

25% of the students failed Math $\Rightarrow P(\text{Math}) = 0.25$

15% of the students failed Stats $\Rightarrow P(\text{Stats}) = 0.15$

10% of the students failed both Math and Stats $\Rightarrow P(\text{Math} \cap \text{Stats}) = 0.10$

Now since the two events are not mutually exclusive, therefore

$P(\text{a student failed Math or Stats}) = P(\text{Math} \cup \text{Stats})$

$= P(\text{Math}) + P(\text{Stats}) - P(\text{Math} \cap \text{Stats})$

$= 0.25 + 0.15 - 0.10 = 0.30$

Test Yourself

1) Suppose that we roll a pair of dice. What is the probability of getting a sum of 5 or a sum of 11?
2) A card is drawn from a well-shuffled deck of 52 cards. Find the probability that the card is a red King or a black King.
3) A basket contains 7 white and 3 black balls. What is the probability that a ball drawn at random is white or black?
4) If a card is selected at random from a deck of 52 playing cards, what is the probability that the card is a Heart or a picture card or both?
5) A customer enters a food store. The probability that the customer buys bread is 0.60, milk is 0.50 and both bread and milk is 0.30. What is the probability that the customer would buy either bread or milk or both?

Understand the meaning of the words “AND” and “OR”!!!

The word “AND” has a single meaning.

• For example, if you were asked to find the probability of getting a queen and a heart when drawing a single card from a deck, you would be looking for the queen of hearts. Here the word “and” means “at the same time.”

The word “OR” has two meanings.

• For example, if you were asked to find the probability of selecting a queen or a heart when one card is selected from a deck, you would be looking for one of the 4 queens or one of the 13 hearts. In this case, the queen of hearts would be included in both groups and counted twice. Both events can occur at the same time; we say that this is an example of the inclusive or.

• On the other hand, if you were asked to find the probability of getting a queen or a king, you would be looking for one of the 4 queens or one of the 4 kings. In this case, both events cannot occur at the same time, and we say that this is an example of the exclusive or.

The Rule of Complementation

The probability that an event “A” will not occur, denoted by $P(\bar{A})$, is equal to one minus the probability that “A” will occur, i.e.

$P(A) + P(\bar{A}) = 1$

$P(\bar{A}) = 1 - P(A)$

In other words, “If the probability of an event or the probability of its complement is known, then the other can be found by subtracting that probability from 1”.

EXAMPLE 6.36

A coin is tossed 5 times. What is the probability that at least one tail occurs?

Solution Since a coin is tossed 5 times, therefore $n(S) = 2^5 = 32$.

Let “A” be the event of getting at least one tail (i.e. one, two, three, four or five tails).

So “$\bar{A}$” is the event of getting no tail (i.e. HHHHH)

$\therefore n(\bar{A}) = 1$

Now $P(\bar{A}) = \frac{n(\bar{A})}{n(S)} = \frac{1}{32}$

Hence $P(A) = 1 - P(\bar{A}) = 1 - \frac{1}{32} = \frac{31}{32}$
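
The complement rule in the coin example can be confirmed by listing all 32 outcomes. The sketch below assumes a fair coin tossed five times and compares the "one minus the complement" route with a direct count; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of five tosses, e.g. ('H', 'T', 'H', 'H', 'T').
outcomes = list(product("HT", repeat=5))

# Complement event: no tail at all (only HHHHH).
no_tail = [o for o in outcomes if "T" not in o]
p_no_tail = Fraction(len(no_tail), len(outcomes))

# Rule of complementation: P(A) = 1 - P(not A).
p_at_least_one_tail = 1 - p_no_tail

# Direct count of the same event, for comparison.
p_direct = Fraction(sum(1 for o in outcomes if "T" in o), len(outcomes))

print(p_at_least_one_tail, p_direct)
```

Counting the single outcome HHHHH is far less work than counting the 31 outcomes that contain a tail, which is why the complement is used here.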

Conditional Probability

The probability that event “A” will occur, given that event “B” has already occurred, is called the conditional probability of “A” given “B”. It is denoted by P(A/B) and is given as:

$P(A/B) = \frac{P(A \cap B)}{P(B)}; \quad P(B) > 0$

Similarly $P(B/A) = \frac{P(A \cap B)}{P(A)}; \quad P(A) > 0$

Note: The conditional probability was first introduced by Fermat, a French Mathematician.

EXAMPLE 6.37

Two fair dice are thrown. Let “A” denote “the sum of dots is 10” and “B” denote “the two dice show the same number”; then find

(i) P(A/B) (ii) P(B/A)

Solution Since two fair dice are rolled, therefore:

$$S = \left\{ \begin{matrix}
(1,1) & (1,2) & (1,3) & (1,4) & (1,5) & (1,6) \\
(2,1) & (2,2) & (2,3) & (2,4) & (2,5) & (2,6) \\
(3,1) & (3,2) & (3,3) & (3,4) & (3,5) & (3,6) \\
(4,1) & (4,2) & (4,3) & (4,4) & (4,5) & (4,6) \\
(5,1) & (5,2) & (5,3) & (5,4) & (5,5) & (5,6) \\
(6,1) & (6,2) & (6,3) & (6,4) & (6,5) & (6,6)
\end{matrix} \right\} \Rightarrow n(S) = 36$$

Let “A” be the event of getting “sum is 10”; then

$A = \{(4,6), (5,5), (6,4)\} \Rightarrow n(A) = 3$

$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{3}{36}$

Let “B” be the event of getting “same numbers”; then

$B = \{(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)\} \Rightarrow n(B) = 6$

$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{6}{36}$

Also $A \cap B = \{(5,5)\} \Rightarrow n(A \cap B) = 1$

$\therefore P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{36}$

(i) $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{6/36} = \frac{1}{6}$

(ii) $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/36}{3/36} = \frac{1}{3}$
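The two conditional probabilities in the dice example can be reproduced by applying the definition P(A/B) = P(A ∩ B) / P(B) to sets of outcomes. This is a minimal sketch, assuming two fair dice; the function name `conditional` is illustrative.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

sum_is_10 = {o for o in sample_space if sum(o) == 10}     # event A
same_number = {o for o in sample_space if o[0] == o[1]}   # event B

def prob(event):
    """Classical probability n(E)/n(S)."""
    return Fraction(len(event), len(sample_space))

def conditional(event, given):
    """P(event / given) = P(event ∩ given) / P(given), defined only if P(given) > 0."""
    if prob(given) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(event & given) / prob(given)

p_a_given_b = conditional(sum_is_10, same_number)
p_b_given_a = conditional(same_number, sum_is_10)

print(p_a_given_b, p_b_given_a)
```

Note that P(A/B) and P(B/A) differ even though both use the same intersection, because each divides by a different conditioning event.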

EXAMPLE 6.38

A card is selected at random from a pack. What is the probability that the card is a King, given that it is a picture card?

Solution

$S = \{\text{the pack of 52 cards}\} \Rightarrow n(S) = \binom{52}{1} = 52$

Kings (A)   Picture cards (B)   Non-picture cards   Total
4           12                  40                  52

(The 4 kings are included among the 12 picture cards.)

Let “A” be the event “a king”. Then $n(A) = \binom{4}{1} = 4$ and

$P(A) = \frac{n(A)}{n(S)} = \frac{4}{52}$

Let “B” be the event “a picture card”. Then $n(B) = \binom{12}{1} = 12$ and

$P(B) = \frac{n(B)}{n(S)} = \frac{12}{52}$

The two events “A” and “B” are not mutually exclusive (they can occur together). Every king is a picture card, therefore $n(A \cap B) = 4$.

Now the probability that both “A” and “B” occur together is:

$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{4}{52}$

Hence the probability that the card is a King given that it is a picture card is:

$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{4/52}{12/52} = \frac{1}{3}$

Test Yourself

1) In a certain college 25% of the students failed math, 15% of the students failed stats and 10% of the students failed both math and stats. A student is selected at random:
(i) If he failed statistics, what is the probability that he failed math?
(ii) If he failed math, what is the probability that he failed statistics?

2) Two fair dice are thrown. Let “A” denote “the sum of dots is 9” and “B” denote “the two dice show odd numbers”; then find
(i) P(A/B) (ii) P(B/A)

3) A card is selected at random from a pack. What is the probability that the card is a queen, given that it is a picture card?

Independent Events

“Two events are independent if the occurrence of one of the events does not affect the probability of the occurrence of the other event”.

The following are some examples of independent events:

• Rolling a die and getting a 6, and then rolling a second die and getting a 3.
• Drawing a card from a deck and getting a queen, replacing it, and drawing a second card and getting a queen.

Dependent Events

“Two events are dependent if the occurrence of one of the events affects the probability of the occurrence of the other event”.

The following are some examples of dependent events:

• Drawing a card from a deck, not replacing it, and then drawing a second card.
• Selecting a ball from an urn, not replacing it, and then selecting a second ball.

Multiplication Rule of Probability for Independent Events

“If “A” and “B” are two independent events, then the probability that both of them occur is equal to the probability that “A” occurs multiplied by the probability that “B” occurs”, i.e.

$P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B)$

Note: To find the probability of two events occurring in sequence, you can use the Multiplication Laws.
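As an illustration of the multiplication rule, the product P(A) · P(B) can be compared with P(A ∩ B) counted directly. The sketch below assumes two fair dice and uses "first die shows 6" and "second die shows 3" as the independent pair; for contrast it also tests "sum is 10" against "both dice show the same number". The helper `are_independent` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability n(E)/n(S)."""
    return Fraction(len(event), len(sample_space))

def are_independent(a, b):
    """True when P(A ∩ B) equals P(A) · P(B)."""
    return prob(a & b) == prob(a) * prob(b)

first_is_6 = {o for o in sample_space if o[0] == 6}
second_is_3 = {o for o in sample_space if o[1] == 3}
sum_is_10 = {o for o in sample_space if sum(o) == 10}
same_number = {o for o in sample_space if o[0] == o[1]}

print(are_independent(first_is_6, second_is_3))   # separate dice
print(are_independent(sum_is_10, same_number))    # events tied to the same roll
```

The first pair satisfies the rule because each event concerns a different die; the second pair does not, so the simple product would give the wrong answer for it.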
EXAMPLE 6.39

A pair of dice is thrown twice. What is the probability of getting a total of 6 on the first throw and a total of 9 on the second?

Solution Let “A1” be the event of getting a “total of 6” with a pair of dice in the first throw:

$A_1 = \{(1,5), (2,4), (3,3), (4,2), (5,1)\} \Rightarrow n(A_1) = 5$

$\therefore P(A_1) = \frac{n(A_1)}{n(S)} = \frac{5}{36}$

And let “A2” be the event of getting a “total of 9” in the second throw:

$A_2 = \{(3,6), (4,5), (5,4), (6,3)\} \Rightarrow n(A_2) = 4$

$\therefore P(A_2) = \frac{n(A_2)}{n(S)} = \frac{4}{36}$

Now we have to find $P(A_1 \text{ and } A_2)$. The two events “A1” and “A2” are independent, because they belong to two different throws:

$\therefore P(A_1 \text{ and } A_2) = P(A_1) \cdot P(A_2) = \left(\frac{5}{36}\right)\left(\frac{4}{36}\right) = \frac{20}{1296} = \frac{5}{324}$

EXAMPLE 6.40

Two cards are drawn in succession from a pack of playing cards, and the card drawn in the first attempt is replaced in the pack before the second attempt. Find the probability that both the drawn cards are queens.

Solution Let “Q1” be the event of getting “a queen” on the first draw:

Attempt No.   Queen   Others   Total
1st           4       48       52

$\therefore P(Q_1) = \frac{\binom{4}{1}}{\binom{52}{1}} = \frac{4}{52}$

And let “Q2” be the event of getting “a queen” on the second draw, the first card having been replaced:

Attempt No.   Queen   Others   Total
2nd           4       48       52

$\therefore P(Q_2) = \frac{\binom{4}{1}}{\binom{52}{1}} = \frac{4}{52}$

Now we have to find $P(Q_1 \text{ and } Q_2)$. The two events “Q1” and “Q2” are independent, because the first card is replaced:

$\therefore P(Q_1 \text{ and } Q_2) = P(Q_1) \cdot P(Q_2) = \left(\frac{4}{52}\right)\left(\frac{4}{52}\right) = \frac{16}{2704} = \frac{1}{169}$

Note: With replacement means that the events are independent (the probabilities don’t change).

Multiplication Rule of Probability for Dependent Events

“If “A” and “B” are two dependent events, then the probability that both of them occur is equal to the probability that “A” occurs multiplied by the conditional probability of “B” given that “A” has already occurred”, i.e.

$P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B/A)$

Similarly $P(A \text{ and } B) = P(A \cap B) = P(B) \cdot P(A/B)$

EXAMPLE 6.41

Two cards are drawn in succession from a pack of playing cards, and the card drawn in the first attempt is not replaced in the pack before the second attempt. Find the probability that both the drawn cards are queens.

Solution Let “Q1” be the event of getting “a queen” on the first draw:

Attempt No.   Queen   Others   Total
1st           4       48       52

$\therefore P(Q_1) = \frac{\binom{4}{1}}{\binom{52}{1}} = \frac{4}{52}$

And let “Q2” be the event of getting “a queen” on the second draw, the first card not having been replaced:

Attempt No.   Queen   Others   Total
2nd           3       48       51

$\therefore P(Q_2/Q_1) = \frac{\binom{3}{1}}{\binom{51}{1}} = \frac{3}{51}$ (because “Q1” has already occurred)

Now we have to find $P(Q_1 \text{ and } Q_2)$. The two events “Q1” and “Q2” are dependent, because the first card is not replaced:

$\therefore P(Q_1 \text{ and } Q_2) = P(Q_1) \cdot P(Q_2/Q_1) = \left(\frac{4}{52}\right)\left(\frac{3}{51}\right) = \frac{12}{2652} = \frac{1}{221}$
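The effect of replacement on the two-queens problem can be seen by listing every ordered pair of draws. The following sketch assumes a standard 52-card deck; drawing with replacement is modelled with `itertools.product` and drawing without replacement with `itertools.permutations`. The function name `p_two_queens` is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

def p_two_queens(ordered_pairs):
    """Fraction of equally likely (first, second) draws in which both cards are queens."""
    pairs = list(ordered_pairs)
    hits = sum(1 for first, second in pairs if first[0] == "Q" and second[0] == "Q")
    return Fraction(hits, len(pairs))

# With replacement: the same card may appear twice, 52 * 52 ordered pairs.
with_replacement = p_two_queens(product(deck, repeat=2))

# Without replacement: the two cards must differ, 52 * 51 ordered pairs.
without_replacement = p_two_queens(permutations(deck, 2))

print(with_replacement, without_replacement)
```

The second probability is smaller because removing a queen leaves only 3 queens among the remaining 51 cards.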

EXAMPLE 6.42

A box contains 3 gray and 2 black balls. Two balls are drawn in succession. Find the probability that both balls drawn are black when the balls are not replaced after being drawn.

Solution Let “B1” be the event of getting “a black ball” on the first draw:

Draw No.   Gray   Black   Total
1st        3      2       5

$\therefore P(B_1) = \frac{\binom{2}{1}}{\binom{5}{1}} = \frac{2}{5}$

And let “B2” be the event of getting “a black ball” on the second draw, the first ball not having been replaced:

Draw No.   Gray   Black   Total
2nd        3      1       4

$\therefore P(B_2/B_1) = \frac{\binom{1}{1}}{\binom{4}{1}} = \frac{1}{4}$ (because “B1” has already occurred)

Now we have to find $P(B_1 \text{ and } B_2)$. The two events “B1” and “B2” are dependent, because the first ball is not replaced:

$\therefore P(B_1 \text{ and } B_2) = P(B_1) \cdot P(B_2/B_1) = \left(\frac{2}{5}\right)\left(\frac{1}{4}\right) = \frac{2}{20} = \frac{1}{10}$

EXAMPLE 6.43

Two drawings, each of 3 balls, are made from a box containing 4 gray and 7 black balls; the balls are not replaced before the second draw. Find the probability that the first drawing gives 3 black balls and the second gives 3 gray balls.

Note: Without replacement means that the events are dependent (the probability changes).

Solution Let “B” be the event of getting “3 black balls” on the first draw:

Draw No.   Gray   Black   Total
1st        4      7       11

$\therefore P(B) = \frac{\binom{7}{3}}{\binom{11}{3}} = \frac{35}{165}$

And let “G” be the event of getting “3 gray balls” on the second draw, the first three balls not having been replaced:

Draw No.   Gray   Black   Total
2nd        4      4       8

$\therefore P(G/B) = \frac{\binom{4}{3}}{\binom{8}{3}} = \frac{4}{56}$ (because “B” has already occurred)

Now we have to find $P(B \text{ and } G)$. The two events “B” and “G” are dependent, because the first three balls are not replaced:

$\therefore P(B \text{ and } G) = P(B) \cdot P(G/B) = \left(\frac{35}{165}\right)\left(\frac{4}{56}\right) = \frac{140}{9240} = \frac{1}{66}$
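
The combination counts in the example above can be evaluated with Python's `math.comb` (available from Python 3.8). This is a sketch of the same two-step calculation for a box of 4 gray and 7 black balls; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

gray, black = 4, 7

# First drawing: 3 black balls out of all 11 balls.
p_first = Fraction(comb(black, 3), comb(gray + black, 3))

# After removing 3 black balls, 4 gray and 4 black balls remain.
black_left = black - 3

# Second drawing: 3 gray balls out of the remaining 8 balls.
p_second_given_first = Fraction(comb(gray, 3), comb(gray + black_left, 3))

# Multiplication rule for dependent events.
p_both = p_first * p_second_given_first

print(p_first, p_second_given_first, p_both)
```

`Fraction` reduces each ratio automatically, so 35/165 is shown as 7/33 and 4/56 as 1/14, while the product is the same 1/66.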

Test Yourself

1) Find the probability of drawing a picture card on each of two consecutive draws from a standard pack, with replacement of the first card.
2) Two drawings, each of 4 balls, are made from a box containing 5 white and 8 black balls; the balls are not replaced before the second draw. Find the probability that the first drawing gives 4 black balls and the second gives 4 white balls.

Note: If two events are independent, it doesn’t mean that they can’t occur at the same time. Many people make the mistake of thinking of independent events as being totally separate from each other. In probability, two independent events can occur at the same time; they just don’t affect each other in terms of probabilities, as discussed in Examples 6.39 and 6.40.

EXAMPLE 6.44

Let “A” and “B” be the two possible outcomes of an experiment and suppose:

$P(A) = 0.33, \quad P(A \cap B) = 0.25, \quad P(B) = p \quad \text{and} \quad P(A \cup B) = 0.75$

1) Find “p”, if “A” and “B” are not mutually exclusive.
2) Find “p”, if “A” and “B” are independent.

Solution

1) Find “p”, if “A” and “B” are not mutually exclusive.

If “A” and “B” are not mutually exclusive then:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$\Rightarrow 0.75 = 0.33 + p - 0.25$

$\Rightarrow p = 0.67$

2) Find “p”, if “A” and “B” are independent.

If “A” and “B” are independent then:

$P(A \cap B) = P(A) \cdot P(B)$

$\Rightarrow P(B) = \frac{P(A \cap B)}{P(A)} = \frac{0.25}{0.33} \Rightarrow p \approx 0.76$

Note: Sometimes there is confusion between independent events and mutually exclusive events. The term ‘independent’ is defined in terms of the probabilities of events, whereas ‘mutually exclusive’ is defined in terms of the events themselves (subsets of the sample space). Moreover, mutually exclusive events never have an outcome in common, but independent events may have common outcomes. Clearly, ‘independent’ and ‘mutually exclusive’ do not have the same meaning. In other words, two independent events having non-zero probabilities of occurrence cannot be mutually exclusive and, conversely, two mutually exclusive events having non-zero probabilities of occurrence cannot be independent.

EXAMPLE 6.45

Let “A” and “B” be the two possible outcomes of an experiment and suppose:

$P(A) = 0.60, \quad P(B) = p \quad \text{and} \quad P(A \cup B) = 0.92$

1) Find “p”, if “A” and “B” are mutually exclusive.
2) Find “p”, if “A” and “B” are independent.

Solution

1) Find “p”, if “A” and “B” are mutually exclusive.

If “A” and “B” are mutually exclusive then:

$P(A \cup B) = P(A) + P(B)$

$\Rightarrow 0.92 = 0.60 + p$

$\Rightarrow p = 0.32$

2) Find “p”, if “A” and “B” are independent.

Independent events are not mutually exclusive, and for them $P(A \cap B) = P(A) \cdot P(B)$. So if “A” and “B” are independent then:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$\Rightarrow P(A \cup B) = P(A) + P(B) - P(A) \cdot P(B)$

$\Rightarrow 0.92 = 0.60 + p - 0.60\,p$

$\Rightarrow 0.32 = 0.40\,p$

$\Rightarrow p = 0.80$

• Two events A and B are independent if:

$P(A/B) = P(A)$ or $P(B/A) = P(B)$

• Two events A and B are dependent if:

$P(A/B) \neq P(A)$ or $P(B/A) \neq P(B)$

• Mutually exclusive events (with non-zero probabilities) are always dependent.

• Two dependent events A and B cannot be mutually exclusive, unless $P(A/B) = 0$.
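The two ways of solving for p in the example with P(A) = 0.60 and P(A ∪ B) = 0.92 can be written as small functions. This is a sketch under the stated assumptions of each case; exact fractions are used to avoid rounding, and the function names are illustrative.

```python
from fractions import Fraction

def p_if_mutually_exclusive(p_a, p_union):
    """Solve P(A ∪ B) = P(A) + p for p."""
    return p_union - p_a

def p_if_independent(p_a, p_union):
    """Solve P(A ∪ B) = P(A) + p - P(A)·p for p, i.e. p = (P(A ∪ B) - P(A)) / (1 - P(A))."""
    return (p_union - p_a) / (1 - p_a)

p_a = Fraction(60, 100)
p_union = Fraction(92, 100)

p_me = p_if_mutually_exclusive(p_a, p_union)
p_ind = p_if_independent(p_a, p_union)

print(float(p_me), float(p_ind))
```

The same data lead to different values of p because each assumption fixes P(A ∩ B) differently: zero in the mutually exclusive case and P(A) · P(B) in the independent case.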

Interesting Facts about Playing Cards

• There are 52 cards in a deck of playing cards. There are four suits (clubs, diamonds, hearts and spades), each having 13 cards. The clubs and spades are black while the hearts and diamonds are red. There are 26 black cards and 26 red cards in total.

• There are four aces.

• The number of picture cards is 12: 4 jacks, 4 queens and 4 kings, one of each from every suit.

• The number of face cards is 16: 4 aces, 4 jacks, 4 queens and 4 kings, one of each from every suit.

Quantity                 Value   Compare with
No. of spots on cards    365     days in a year
Cards in pack            52      weeks in a year
No. of suits             4       weeks in a month
No. of picture cards     12      months in a year

Results: Consider the Venn diagram of two overlapping events “A” and “B” in the sample space “S”, whose three regions are $A \cap \bar{B}$, $A \cap B$ and $B \cap \bar{A}$:

Result #01: $A \cup B = A \cup (B \cap \bar{A}) \Rightarrow P(A \cup B) = P(A) + P(B \cap \bar{A})$

Result #02: $B = (A \cap B) \cup (B \cap \bar{A}) \Rightarrow P(B) = P(A \cap B) + P(B \cap \bar{A})$

Result #03: $B \cap \bar{A} = B - (A \cap B) \Rightarrow P(B \cap \bar{A}) = P(B) - P(A \cap B)$

Result #04: $A \cap \bar{B} = A - (A \cap B) \Rightarrow P(A \cap \bar{B}) = P(A) - P(A \cap B)$

Important!!!

While reading probability problems, pay special attention to key phrases that translate into mathematical symbols. The following table lists various phrases and their corresponding mathematical equivalents:

Math Symbol   Phrases
>             “greater than” or “more than” or “exceed” or “better than” or “taller than” or “above”
<             “less than” or “smaller than” or “below” or “under” or “fewer than”
≥             “at least” or “greater than or equal to” or “no less than”
≤             “at most” or “less than or equal to” or “no more than”
=             “exactly” or “equal” or “is”

Sharpen your Pencil

MCQ’s

(1) The number of permutations of the letters of the word STATISTICS is _____

(A) 50400 (B) 10 (C) 100 (D) None of these

(2) $^{n}C_{r} =$ _____

(A) $\frac{n!}{n!\,(n-r)!}$ (B) $\frac{n!}{(n-r)!}$ (C) $\frac{r!}{n!\,(n-r)!}$ (D) None of these

(3) Probability of a king from a pack of 52 cards is _____

(A) 4/52 (B) 1/4 (C) 1/52 (D) None of these

(4) $P(A) + P(\bar{A}) =$ _____

(A) $P(A)$ (B) $P(\bar{A})$ (C) 1 (D) None of these

(5) Total possible cases with two dice _____

(A) $2^6$ (B) $6^2$ (C) $^{6}C_{2}$ (D) None of these

(6) P(S) = _____

(A) 1 (B) 0 (C)  (D) None of these

(7) $P(\phi) =$ _____

(A) 1 (B) 0 (C) -1 (D) None of these

(8) For Not Mutually Exclusive events $P(A \cup B) = P(A) + P(B) -$ _____

(A) $P(A/B)$ (B) $P(B/A)$ (C) $P(A \cap B)$ (D) None of these

(9) For Mutually Exclusive events $P(A \cup B) = P(A) +$ _____

(A) $P(B)$ (B) $P(\bar{A})$ (C) $P(A \cap B)$ (D) None of these

(10) Two mutually exclusive events are always _____

(A) Independent (B) Dependent (C) Nothing can be said in terms of independence (D) None of these

(11) If “A” and “B” are two independent events with P(A) = 0.5 and P(B) = 0.3 then $P(A \cap B) =$ _____

(A) 0 (B) 0.15 (C) 0.3 (D) None of these

(12) Two dice are thrown; the probability of obtaining a sum of “2” is _____

(A) 1/6 (B) 1/36 (C) 1/18 (D) None of these

(13) An event that contains only one sample point is called _____

(A) Exhaustive (B) Compound (C) Simple (D) None of these

(14) If two dice are thrown then the total number of sample points is _____

(A) 6 (B) 36 (C) 216 (D) None of these

(15) $^{8}C_{5} =$ _____

(A) 56 (B) 65 (C) 100 (D) None of these

(16) The range of probability is _____

(A) (0, 1) (B) (-1, 1) (C) (-1, 0) (D) None of these

(17) If three fair dice are rolled, the total number of sample points is _____

(A) 108 (B) 36 (C) 216 (D) None of these

(18) If 4 coins are tossed, the total number of sample points is _____

(A) 16 (B) 8 (C) 32 (D) None of these

Short Questions
Exercise

Q.6.01. State the addition rule of probability.

Q.6.02. How many permutations can be formed out of the letters of the word “MISSISSIPPI”?

Q.6.03. A pair of dice is rolled. Make the sample space and calculate the probability that the sum of dots is at least 9.

Q.6.04. Make permutations of A, B, C, D.

Q.6.05. Define the terms:

(i) Event (ii) Mutually Exclusive Events (iii) Equally likely Events (iv) Sample Space

Q.6.06. Evaluate the following:

(i) $\binom{52}{13}$ (ii) $\binom{48}{10}\binom{4}{3}$ (iii) $\binom{39}{13}\binom{13}{4}$ (iv) $\binom{7}{2}\binom{3}{2}\binom{4}{1}$

Q.6.07. Distinguish between independent and dependent events.

Q.6.08. Evaluate the following:

(i) $^{10}P_{3}$ (ii) $^{52}P_{13}$ (iii) $^{10}P_{4,2,3,1}$

Q.6.09. A pair of dice is rolled. Make the sample space and calculate the probability that the sum of dots is at least 8.

Q.6.10. Define with examples:

(i) Set (ii) Null set (iii) Sub-set (iv) Universal set (v) Equal sets

Q.6.11. What is an experiment and a random experiment?

Q.6.12. A card is drawn at random from a pack of 52 cards. Find the probability of obtaining: (i) Red Card (ii) King of Spade

Long Questions
Exercise

Q.6.01. Two dice are rolled. Find the probability that the sum of dots is:

(i) Exactly 3
(ii) Odd
(iii) More than 9
(iv) More than 5 but less than or equal to 10
(v) At least 7
(vi) At most 6

Q.6.02. State and prove the addition rule of probability for not mutually exclusive events.

Q.6.03. A card is drawn at random from an ordinary pack of 52 playing cards. Find the probability that the card:

(a) Is a seven
(b) Is not a seven

Q.6.04. A card is drawn at random from an ordinary pack of 52 cards. Find the probability that the card is:

1) A club or a diamond
2) A club or a king

Q.6.05. Two dice are rolled. Find the conditional probability that the sum of dots is “7”, given that:

(i) The sum is 6 or more
(ii) The sum is less than 9
(iii) The sum is more than 5

Q.6.06. In a college 30% of the students failed English, 40% failed Urdu and 10% failed both. A student is selected at random. What is the probability that he failed English or Urdu?

Q.6.07. What is the probability of getting a sum of dots of 14 when 3 fair dice are rolled?

Q.6.08. A bag contains 16 balls, of which 5 are marked. If 8 balls are drawn out together, what is the probability that all the marked balls are among the 8 balls?

Q.6.09. A card is drawn from a well-shuffled pack of 52 playing cards. Find the probability that the drawn card is:

(i) Spade (ii) Jack of Clubs (iii) King
(iv) Queen, King of Diamond, Ace of Hearts or Jack

Q.6.10. If $U = \{5, 6, 7, \ldots, 15\}$, $A = \{4, 6, 8\}$ and $B = \{11, 12, 13, 14, 15\}$, then show that
A B  A B

Q.6.11. The probability that a person will be alive in the next 20 years is 2/3. What is the probability that he will not be alive in the next 20 years?

Q.6.12. Two coins are tossed. What is the probability that two heads result, given that there is at least one head?

Q.6.13. In how many ways can the letters of the following words be rearranged:

(i) Mathematics (ii) Manufacturer (iii) Convocation (iv) Sociology

Q.6.14. How many possible permutations can be formed from the letters of each word?

(i) Infinity (ii) Unusual (iii) Statistics (iv) Hyperbola

Q.6.15. State and prove the addition rule of probability for mutually exclusive events.

Q.6.16. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems and a dictionary, what is the probability that:

(i) The dictionary is selected?
(ii) 2 novels and one book of poems are selected?
CHAPTER 07
Random Variables

Chapter Contents

You should read this chapter if you need to learn about:

• Random Variable
• Discrete Random Variables
• Continuous Random Variables
• Discrete Probability Distribution
• Graph of Discrete Probability Distribution
• How to find Probabilities using Discrete Probability Distributions
• Continuous Probability Distribution
• Graph of Continuous Probability Distribution
• How to find Probabilities using Continuous Probability Distributions
• Mathematical Expectation of a Discrete Random Variable
• Variance and S.D of a Discrete Random Variable
• Properties of Expectation
• Properties of Variance and S.D
• Amazing Histogram
• Exercise
Chapter 07 Random Variables

Suppose your teacher asked you to write down your names on slips distributed by him. You returned your slips; the teacher folded the slips in a uniform pattern and mixed them well. Then he asked one of the students to draw ten slips, one by one or together. The teacher opened the drawn slips one by one, read the names of the selected students and asked them the following questions:

• What is your age?
• How many brothers and sisters do you have?
• How many living rooms are available to your family?
• What is your height?

In this example, the selection of ten students by the method explained above is a random experiment and the procedure of selection is a random process. The students selected in this way are the outcomes of the experiment, and the questions asked of the selected students are the characteristics in which we are interested. Since each characteristic can assume different values from outcome to outcome of the random experiment, these characteristics are not merely variables but are known as random variables, chance variables or stochastic variables.
ta

Random Variable

“A variable whose values are determined by the outcomes of a random experiment is called a random variable”

If we toss two coins then the sample space is:

$S = \{HH, HT, TH, TT\}$

Let “X” be a variable denoting the “number of heads”; then “X” takes the values 0, 1, 2. Since these values are determined by the results (outcomes) of the random experiment, “X” is called a random variable.

Outcome in S   X
HH             2
HT             1
TH             1
TT             0

The following are some examples of random variables:

• The number of deaths in an accident
• The number of heads in tossing two coins
• The temperature of a place
• The lifetime of a TV tube
• The number of daily admissions in a hospital
• The amount of rain that falls at a certain place, etc.

Random variables are usually denoted by the last letters of the alphabet, e.g. X, Y or Z.

Types of Random Variable

There are two types of random variable:

• Discrete Random Variable
• Continuous Random Variable

Discrete Random Variable

“A random variable is called a discrete random variable if its values arise from counting and there is a definite jump or gap between two possible values of the random variable. Further, it is free from the unit of measurement”.

For example:

• The number of heads in tossing two coins
• No. of deaths in an accident
• No. of apples in a basket
• No. of passengers carried by PIA in the last ten years
• The number of daily admissions in a hospital, etc.

Continuous Random Variable

“A random variable is called a continuous random variable if its values arise from measuring and there can be an infinite number of values between two possible values of the variable. Further, it has a unit of measurement”.

For example:

• Students’ heights, ages, weights
• The temperature of a place
• The amount of milk given by a cow
• The lifetime of a TV tube
• The amount of rain that falls at a certain place, etc.

A discrete random variable has either a finite or a countably infinite number of values, which are usually integers or whole numbers. The values of a discrete random variable can be plotted on a number line with space between each point.

[Figure: number line for X = No. of calls in one day, with separate points at 0, 1, 2, 3, 4]

On the other hand, a continuous random variable has infinitely many values, which are real numbers. The values of a continuous random variable can be plotted on a line in an uninterrupted fashion.

[Figure: number line for X = Time spent making calls in one day, an unbroken line from 0 to 24 marked at 0, 6, 12, 18, 24]

Discrete Probability Distribution

“A table listing all possible values that a discrete random variable can take on, together with the associated probabilities, is called a discrete probability distribution”

Let “X” be a discrete random variable which can take the values $x_1, x_2, \ldots, x_n$ and let the associated probabilities be $f(x_1), f(x_2), \ldots, f(x_n)$ respectively; then the discrete probability distribution is given as:

x              x1      x2      ...   xn
f(x) or P(x)   f(x1)   f(x2)   ...   f(xn)

The function f(x) or P(x) that is used to assign the probabilities to


different values of the random variable “X” is called probability function
or probability mass function (p.m.f).

The discrete probability mass function may be defined as:


Some writers do not
 Function of "x" ; x = x1 , x2 , ... ,xn make any distinction
f(x)= P(X = x)= 
0 ; otherwise between the terms
probability function and
A p.m.f has the following two properties: probability distribution

om
but they use it
(i) f(x)  0 for all “x” interchangeably.
(ii)  f(x)  1

l.c
all x

Graph of Discrete probability distribution

The discrete probability distribution is usually displayed by a vertical lines graph or a probability histogram. In both types of graph we take the values of X on the X-axis and the probabilities on the Y-axis, as shown in the following figure:

x                   0            1            2            3            Total
f(x)                1/8          3/8          3/8          1/8          1
Class Boundaries    -0.5 – 0.5   0.5 – 1.5    1.5 – 2.5    2.5 – 3.5    --

[Figure: Vertical Lines Graph and Probability Histogram of f(x) against x = 0, 1, 2, 3, with heights 1/8, 3/8, 3/8, 1/8]

Note: To make a probability histogram we first find class boundaries. These class boundaries are called fictitious class boundaries because the discrete random variable cannot assume such values.


EXAMPLE 7.01

Find the probability distribution of the number of heads when two coins are tossed.

Solution Since two coins are tossed, the sample space is:

$$S = \{HH, HT, TH, TT\}$$

Let X be a random variable denoting the number of heads; then x = 0, 1, 2.

Now the probabilities are:

If X = 0 then $f(0) = \frac{1}{4}$, if X = 1 then $f(1) = \frac{2}{4}$, if X = 2 then $f(2) = \frac{1}{4}$

Hence the probability distribution of the number of heads (X) becomes:

x       0      1      2      Total
f(x)    1/4    2/4    1/4    1

EXAMPLE 7.02

Find the probability distribution of the number of heads when three coins are tossed.

Solution Since three coins are tossed, the sample space is:

$$S = \{HHH, HHT, HTH, THH, TTH, THT, HTT, TTT\}$$

Let X be a random variable denoting the number of heads; then x = 0, 1, 2, 3.

Now the probabilities are:

If X = 0 then $f(0) = \frac{1}{8}$, if X = 1 then $f(1) = \frac{3}{8}$
If X = 2 then $f(2) = \frac{3}{8}$, if X = 3 then $f(3) = \frac{1}{8}$

Hence the probability distribution of the number of heads (X) becomes:

x       0      1      2      3      Total
f(x)    1/8    3/8    3/8    1/8    1
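
The counting used in the two coin examples can be sketched in a few lines of Python. This is only an illustration: the function name `heads_distribution` is made up for this sketch, and the coins are assumed to be fair so that all outcomes in S are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins):
    """Return {number of heads: probability} for n_coins fair coins."""
    # Sample space: every sequence of H and T of length n_coins.
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many outcomes give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    # Each outcome is equally likely, so f(x) = count / size of S.
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


for x, p in heads_distribution(3).items():
    print(x, p)
```

Calling `heads_distribution(2)` gives the table of Example 7.01 in the same way (fractions are shown in lowest terms, so 2/4 appears as 1/2).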


EXAMPLE 7.03

Find the probability distribution of the sum of dots when two dice are rolled.

Solution Since two dice are rolled, the sample space is:

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

For the sum of dots we may write the sample space as:

S = { 2   3   4   5   6   7
      3   4   5   6   7   8
      4   5   6   7   8   9
      5   6   7   8   9   10
      6   7   8   9   10  11
      7   8   9   10  11  12 }

Let X be a random variable denoting the sum of dots; then x = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Now the probabilities are:

If X = 2 then $f(2) = \frac{1}{36}$, if X = 3 then $f(3) = \frac{2}{36}$, if X = 4 then $f(4) = \frac{3}{36}$
If X = 5 then $f(5) = \frac{4}{36}$, if X = 6 then $f(6) = \frac{5}{36}$, if X = 7 then $f(7) = \frac{6}{36}$
If X = 8 then $f(8) = \frac{5}{36}$, if X = 9 then $f(9) = \frac{4}{36}$, if X = 10 then $f(10) = \frac{3}{36}$
If X = 11 then $f(11) = \frac{2}{36}$, if X = 12 then $f(12) = \frac{1}{36}$

Hence the probability distribution of the sum of dots (X) becomes:

x       2      3      4      5      6      7      8      9      10     11     12     Total
f(x)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36   1
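
As a rough check on the table above, the 36 pairs can be listed and their sums counted by a short Python sketch. The variable names are illustrative, and two fair six-sided dice are assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely ordered pairs (first die, second die), reduced to sums.
sum_counts = Counter(a + b for a, b in product(faces, repeat=2))

# f(x) = (number of pairs whose sum is x) / 36
pmf = {x: Fraction(count, 36) for x, count in sorted(sum_counts.items())}

for x, p in pmf.items():
    print(x, p)
```

The printed fractions are in lowest terms (for example 6/36 is shown as 1/6), but they are the same probabilities as in the table.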


EXAMPLE 7.04

A basket contains 6 balls: 2 white balls and 4 black balls. If three balls are selected at random, find the probability distribution of the number of black balls.

Solution Since 3 balls are selected out of 6:

Black    White    Total
4        2        6

Therefore $n(S) = \binom{6}{3} = 20$

Let X be a random variable denoting the number of black balls; then x = 1, 2, 3 (because 3 balls are selected). The value x = 0 is impossible, as there are only 2 white balls in total.

Now the probabilities are:

If X = 1 then $f(1) = \dfrac{\binom{4}{1}\binom{2}{2}}{20} = \dfrac{4}{20}$,

If X = 2 then $f(2) = \dfrac{\binom{4}{2}\binom{2}{1}}{20} = \dfrac{12}{20}$,

If X = 3 then $f(3) = \dfrac{\binom{4}{3}\binom{2}{0}}{20} = \dfrac{4}{20}$

Hence the probability distribution of the number of black balls (X) becomes:

x       1       2        3       Total
f(x)    4/20    12/20    4/20    1
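
The combinations in this example can be evaluated with Python's `math.comb`. The sketch below is a hedged illustration: the function name `black_ball_pmf` and its parameters are invented here, and it assumes the balls are drawn without replacement.

```python
from fractions import Fraction
from math import comb


def black_ball_pmf(black=4, white=2, drawn=3):
    """Probability of x black balls among `drawn` balls taken without replacement."""
    total_ways = comb(black + white, drawn)      # n(S)
    pmf = {}
    for x in range(drawn + 1):
        whites_needed = drawn - x
        # Skip values of x that would need more balls of a colour than exist.
        if x <= black and whites_needed <= white:
            ways = comb(black, x) * comb(white, whites_needed)
            pmf[x] = Fraction(ways, total_ways)
    return pmf


print(black_ball_pmf())
```

The value x = 0 does not appear in the result because it would need 3 white balls while only 2 are available.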

Test Yourself

1) Find the probability distribution of the number of tails when two coins are tossed.
2) Find the probability distribution of the number of tails when three coins are tossed.
3) Find the probability distribution of the difference of dots when two dice are rolled.
4) A basket contains 6 balls: 2 white balls and 4 black balls. If three balls are selected at random, find the probability distribution of the number of white balls.

How to find the probabilities using a discrete probability distribution or a discrete probability mass function

Let X be a discrete random variable; then the discrete probability distribution is given as:

x       x1      x2      ...     xn
f(x)    f(x1)   f(x2)   ...     f(xn)

Similarly the discrete probability mass function f(x) is given as:

$$f(x) = P(X = x) = \begin{cases} \text{a function of } x, & x = x_1, x_2, \dots, x_n \\ 0, & \text{otherwise} \end{cases}$$

Now to find the probabilities we have:
• $P(X = x_1) = f(x_1)$
• $P(x_1 \le X \le x_3) = f(x_1) + f(x_2) + f(x_3)$
• $P(x_1 < X \le x_3) = f(x_2) + f(x_3)$
• $P(X \le x_2) = f(x_1) + f(x_2)$
• $P(X < x_2) = f(x_1)$

EXAMPLE 7.05

Find the probability distribution of the sum of dots when two dice are rolled.

Also find the probability that:

(i) The sum of dots is exactly 4
(ii) The sum of dots is less than 6
(iii) The sum of dots is greater than 10
(iv) The sum of dots is at least 9

Solution Since two dice are rolled, the sample space is:

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

For the sum of dots we may write the sample space as:

S = { 2   3   4   5   6   7
      3   4   5   6   7   8
      4   5   6   7   8   9
      5   6   7   8   9   10
      6   7   8   9   10  11
      7   8   9   10  11  12 }

Let X be a random variable denoting the sum of dots; then x = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Now the probabilities are:

If X = 2 then $f(2) = \frac{1}{36}$, if X = 3 then $f(3) = \frac{2}{36}$, if X = 4 then $f(4) = \frac{3}{36}$
If X = 5 then $f(5) = \frac{4}{36}$, if X = 6 then $f(6) = \frac{5}{36}$, if X = 7 then $f(7) = \frac{6}{36}$
If X = 8 then $f(8) = \frac{5}{36}$, if X = 9 then $f(9) = \frac{4}{36}$, if X = 10 then $f(10) = \frac{3}{36}$
If X = 11 then $f(11) = \frac{2}{36}$, if X = 12 then $f(12) = \frac{1}{36}$

Hence the probability distribution of the sum of dots (X) becomes:

x       2      3      4      5      6      7      8      9      10     11     12     Total
f(x)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36   1

(i) The sum of dots is exactly 4

P(The sum of dots is exactly 4) = P(X = 4)
= f(4)
= 3/36

(ii) The sum of dots is less than 6

P(The sum of dots is less than 6) = P(X < 6)
= P(X = 2 or X = 3 or X = 4 or X = 5)
= P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)
= f(2) + f(3) + f(4) + f(5)
= 1/36 + 2/36 + 3/36 + 4/36
= 10/36

(iii) The sum of dots is greater than 10

P(The sum of dots is greater than 10) = P(X > 10)
= P(X = 11 or X = 12)
= P(X = 11) + P(X = 12)
= f(11) + f(12)
= 2/36 + 1/36
= 3/36

(iv) The sum of dots is at least 9

P(The sum of dots is at least 9) = P(X ≥ 9)
= P(X = 9 or X = 10 or X = 11 or X = 12)
= P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
= f(9) + f(10) + f(11) + f(12)
= 4/36 + 3/36 + 2/36 + 1/36
= 10/36
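
The four probabilities of this example can be sketched in Python by adding f(x) over the values of x that satisfy each condition. Two things here are assumptions of the sketch and not part of the text: the helper name `prob`, and the shortcut formula f(x) = (6 − |x − 7|)/36, which is used only because it reproduces the table for the sum of two fair dice.

```python
from fractions import Fraction

# p.m.f of the sum of dots; the closed form reproduces the table 1/36, 2/36, ..., 1/36.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}


def prob(condition):
    """Add f(x) over every x for which condition(x) is true."""
    return sum((p for x, p in pmf.items() if condition(x)), Fraction(0))


exactly_4 = prob(lambda x: x == 4)        # P(X = 4)
less_than_6 = prob(lambda x: x < 6)       # P(X < 6)
greater_than_10 = prob(lambda x: x > 10)  # P(X > 10)
at_least_9 = prob(lambda x: x >= 9)       # P(X >= 9)

print(exactly_4, less_than_6, greater_than_10, at_least_9)
```

The only part that changes from one question to the next is the condition on x, which mirrors the way the phrases "less than", "greater than" and "at least" are translated into symbols.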

EXAMPLE 7.06

Given that: $f(x) = \dfrac{x^4}{98}$, x = 0, 1, 2, 3

Find (i) $P(1 \le X \le 3)$ (ii) $P(X < 2)$ (iii) $P(X = 3)$

Solution Since $f(x) = \dfrac{x^4}{98}$, we have:

$f(0) = \dfrac{0^4}{98} = \dfrac{0}{98}$, $f(1) = \dfrac{1^4}{98} = \dfrac{1}{98}$, $f(2) = \dfrac{2^4}{98} = \dfrac{16}{98}$, $f(3) = \dfrac{3^4}{98} = \dfrac{81}{98}$

(i) $P(1 \le X \le 3)$

P(1 ≤ X ≤ 3) = P(X = 1 or X = 2 or X = 3)
= P(X = 1) + P(X = 2) + P(X = 3)
= f(1) + f(2) + f(3)
= 1/98 + 16/98 + 81/98 = 1

(ii) $P(X < 2)$

P(X < 2) = P(X = 0 or X = 1)
= P(X = 0) + P(X = 1)
= f(0) + f(1)
= 0/98 + 1/98 = 1/98

(iii) $P(X = 3)$

P(X = 3) = f(3) = 81/98

EXAMPLE 7.07

What value of “k” makes the following function a probability mass function?

$f(x) = kx^4$, x = 0, 1, 2, 3

Solution To find the value of “k” we use:

$$\sum_{x=0}^{3} f(x) = 1$$

$$\Rightarrow \sum_{x=0}^{3} kx^4 = 1$$

$$\Rightarrow k\left[(0)^4 + (1)^4 + (2)^4 + (3)^4\right] = 1$$

$$\Rightarrow k\left[0 + 1 + 16 + 81\right] = 1$$

$$\Rightarrow 98k = 1$$

$$\Rightarrow k = \frac{1}{98}$$

EXAMPLE 7.08

Find “K” for the probability distribution given below:

x       0      1      2      3
f(x)    1/8    K      3/8    1/8

Solution To find the value of “K” we use:

$$\sum_{x=0}^{3} f(x) = 1$$

$$\Rightarrow 1/8 + K + 3/8 + 1/8 = 1$$

$$\Rightarrow K + 5/8 = 1$$

$$\Rightarrow K = 1 - 5/8$$

$$\Rightarrow K = 3/8$$
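
Both examples rest on the rule that the probabilities must add up to 1. A minimal Python sketch of the two calculations follows; the helper names `normalizing_constant` and `missing_probability` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def normalizing_constant(weights):
    """Return k such that k * (sum of the weights) equals 1."""
    return 1 / Fraction(sum(weights))


def missing_probability(known_probabilities):
    """Return the single unknown probability that makes the total equal 1."""
    return 1 - sum(known_probabilities)


# f(x) = k * x**4 for x = 0, 1, 2, 3
k = normalizing_constant(x**4 for x in range(4))

# f(x) = 1/8, K, 3/8, 1/8 for x = 0, 1, 2, 3
K = missing_probability([Fraction(1, 8), Fraction(3, 8), Fraction(1, 8)])

print(k, K)
```

After finding k or K it is worth checking that every probability is non-negative, since both properties of a p.m.f must hold.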

Test Yourself

1) Find the probability distribution of the sum of dots when two dice are rolled. Also find the probability that the sum of dots is exactly 6.

2) Given that: $f(x) = \dfrac{x^4}{98}$, x = 0, 1, 2, 3
Find (i) $P(1 \le X \le 3)$ (ii) $P(X > 1)$ (iii) $P(X = 0)$

3) What value of “k” makes the following function a probability mass function?

$f(x) = k\binom{4}{x}$, x = 0, 1, 2, 3, 4

4) Find the value of “k” from the following probability distribution:

x       -2     -1     0      1      2      3
f(x)    0.1    0.1    0.2    2k     0.3    k

Continuous probability distribution



“Since a continuous random variable takes all possible values in a given range, we cannot obtain the probability of a continuous random variable at a particular point, and we cannot express its probability distribution in tabular form. Hence a continuous probability distribution can only be expressed in the form of a mathematical equation, which is known as a probability function or probability density function.”

Let X be a continuous random variable which can take values in the interval (a, b) or $(-\infty, \infty)$; then the function f(x) is called the probability function or probability density function (p.d.f) of the random variable X.

The continuous probability density function may be defined as:

$$f(x) = \begin{cases} \text{a function of } x, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

A p.d.f has the following two properties:

(i) $f(x) \ge 0$ for all x

(ii) $P(a \le X \le b) = \left[\dfrac{f(a) + f(b)}{2}\right](b - a) = 1$

Property (ii) says that the total area under the curve is 1. The trapezium formula used here gives the exact area only when f(x) is a straight line between a and b, as in the examples that follow.

Graph of Continuous probability distribution

The continuous probability distribution is usually displayed by a continuous probability curve. In this type of graph we take the range of X on the X-axis and the density f(x) on the Y-axis, as shown in the following figure:

[Figure: continuous probability curve]

How to find the probabilities using a continuous probability distribution OR a continuous probability density function

Let X be a continuous random variable; then the continuous probability density function f(x) is given as:

$$f(x) = \begin{cases} \text{a function of } x, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

Now to find the probabilities we have:

• $P(X = x_1) = 0$ (because there is no area over a single point)
• $P(x_1 \le X \le x_2) = \left[\dfrac{f(x_1) + f(x_2)}{2}\right](x_2 - x_1)$

[Figure: area under the curve between x1 and x2, inside the interval from a to b]

Note: In the discrete case the probabilities $P(x_1 \le X \le x_3)$ and $P(x_1 < X < x_3)$ have different meanings, but in the continuous case they are the same.

EXAMPLE 7.09

Given that:

$$f(x) = \frac{x + 1}{8}, \quad 2 \le x \le 4$$

(i) Show that the area under the curve is equal to unity
(ii) Find $P(X < 3)$ (iii) Find $P(3 \le X \le 4)$ (iv) Find $P(X = 3)$

Solution Since $f(x) = \dfrac{x + 1}{8}$, we have:

$f(2) = \dfrac{2 + 1}{8} = \dfrac{3}{8}$, $f(3) = \dfrac{3 + 1}{8} = \dfrac{4}{8}$, $f(4) = \dfrac{4 + 1}{8} = \dfrac{5}{8}$

(i) Show that the area under the curve is equal to unity.

Here we use:

$$P(a \le X \le b) = \left[\frac{f(a) + f(b)}{2}\right](b - a)$$

$$P(2 \le X \le 4) = \left[\frac{f(2) + f(4)}{2}\right](4 - 2) = \left[\frac{f(2) + f(4)}{2}\right](2) = f(2) + f(4) = 3/8 + 5/8 = 1$$

Hence the area under the curve is equal to unity.

(ii) $P(X < 3)$

$$P(X < 3) = P(2 \le X \le 3) = \left[\frac{f(2) + f(3)}{2}\right](3 - 2) = \left[\frac{3/8 + 4/8}{2}\right](1) = 7/16$$

(iii) $P(3 \le X \le 4)$

$$P(3 \le X \le 4) = \left[\frac{f(3) + f(4)}{2}\right](4 - 3) = \left[\frac{4/8 + 5/8}{2}\right](1) = 9/16$$

(iv) $P(X = 3)$

P(X = 3) = 0 (for a continuous random variable the probability at a single point is zero)
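
The area formula of this example can be tried out with a small Python sketch. The names `f` and `area` are illustrative only, and the trapezium rule is exact here only because f(x) = (x + 1)/8 is a straight line.

```python
from fractions import Fraction


def f(x):
    """Density f(x) = (x + 1)/8, meant for whole-number x between 2 and 4."""
    return Fraction(x + 1, 8)


def area(lo, hi):
    """Trapezium area under f between lo and hi: [(f(lo) + f(hi)) / 2] * (hi - lo)."""
    return (f(lo) + f(hi)) / 2 * (hi - lo)


total = area(2, 4)       # whole area under the curve
below_3 = area(2, 3)     # P(X < 3)
from_3_to_4 = area(3, 4) # P(3 <= X <= 4)
at_3 = area(3, 3)        # P(X = 3): zero width, so zero area

print(total, below_3, from_3_to_4, at_3)
```

Note that `below_3 + from_3_to_4` equals the total area, which is a quick way to catch arithmetic slips.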
EXAMPLE 7.10

Given that $f(x) = kx$, $0 \le x \le 2$

(i) Find the value of “k” (ii) Find $P(0.5 \le X \le 1.5)$ (iii) Find $P(X \ge 1)$

Solution

(i) Since $f(x) = kx$, we have $f(0) = k(0) = 0$ and $f(2) = k(2) = 2k$. Here we use:

$$P(a \le X \le b) = 1$$

$$\Rightarrow \left[\frac{f(a) + f(b)}{2}\right](b - a) = 1$$

$$\Rightarrow \left[\frac{f(0) + f(2)}{2}\right](2 - 0) = 1$$

$$\Rightarrow f(0) + f(2) = 1$$

$$\Rightarrow 0 + 2k = 1$$

$$\Rightarrow k = 1/2$$

Hence the p.d.f can be written as: $f(x) = \dfrac{x}{2}$, $0 \le x \le 2$

(ii) $P(0.5 \le X \le 1.5)$

Since $f(x) = \dfrac{x}{2}$, we have $f(0.5) = \dfrac{0.5}{2} = 0.25$ and $f(1.5) = \dfrac{1.5}{2} = 0.75$.

$$P(0.5 \le X \le 1.5) = \left[\frac{f(0.5) + f(1.5)}{2}\right](1.5 - 0.5) = \left[\frac{0.25 + 0.75}{2}\right](1) = 0.5$$

(iii) $P(X \ge 1)$

Since $f(x) = \dfrac{x}{2}$, we have $f(1) = \dfrac{1}{2} = 0.5$ and $f(2) = \dfrac{2}{2} = 1$.

$$P(X \ge 1) = P(1 \le X \le 2) = \left[\frac{f(1) + f(2)}{2}\right](2 - 1) = \left[\frac{0.5 + 1}{2}\right](1) = 0.75$$
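
The steps of this example can be sketched in Python as follows. The function names `solve_k`, `f` and `prob` are invented for this sketch, which assumes a density of the form f(x) = kx on an interval from a to b.

```python
def solve_k(a, b):
    """Find k so that the area under f(x) = k*x from a to b equals 1.

    Area = [(k*a + k*b) / 2] * (b - a), so k = 1 / ([(a + b) / 2] * (b - a)).
    """
    return 1 / ((a + b) / 2 * (b - a))


k = solve_k(0, 2)


def f(x):
    """Density f(x) = k*x with the value of k found above."""
    return k * x


def prob(lo, hi):
    """Trapezium area under f between lo and hi."""
    return (f(lo) + f(hi)) / 2 * (hi - lo)


middle = prob(0.5, 1.5)   # P(0.5 <= X <= 1.5)
upper = prob(1, 2)        # P(X >= 1), i.e. P(1 <= X <= 2)

print(k, middle, upper)
```

Changing the arguments of `solve_k` and `prob` gives a way to check answers to similar exercises on straight-line densities.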

Test Yourself

1) If $f(x) = \dfrac{1}{30}(5 + 2x)$, $1 \le x \le 4$

(i) Show that f(x) is a density function
(ii) Find $P(X \ge 3)$ (iii) Find $P(X \le 3)$

2) Given that $f(x) = kx$, $0 \le x \le 2$

(i) Find the value of “k” (ii) Find $P(1 \le X \le 1.5)$ (iii) Find $P(X \ge 0.5)$

The function that assigns probabilities to a discrete random variable is called a probability mass function, because it shows how much probability (or mass) is given to each value of the random variable. Mass is thought of as weight; in this case the total mass (or weight) of a probability distribution equals one. The function for a continuous random variable does not actually assign probability or mass; it assigns density, which tells you how dense the probability is around x for any value x of X. When X is continuous, you find probabilities for intervals of X, not for particular values of X. A continuous random variable has no probability at any single point because there is no area over a single point.

Mathematical Expectation OR
Expected Value of a Discrete Random Variable

A very important concept in probability is the idea of expected value. The expected value is the long-term mean or average value of a random variable. If the random variable is observed over a long period of time, we would expect the average value of the observations generated by the random process to be close to the expected value. The larger the number of observations, the closer the average of the observations will be to the expected value. Thus we define expected value as: “The theoretical average of a random variable is called its expected value.”

Note: In 1657, Christiaan Huygens published the first book on probability theory. In that text, he introduced the idea of expected value.

Let X be a discrete random variable which can take the values $x_1, x_2, \dots, x_n$ with associated probabilities $f(x_1), f(x_2), \dots, f(x_n)$ respectively; then the expectation of X (denoted by $E(X)$, $\mu_x$ or $\mu$) is defined as:

$$\mu_x = E(X) = \sum_{\text{all } x} x f(x) \quad \text{OR} \quad \mu_x = E(X) = \sum_{\text{all } x} x P(x)$$

The expected value of X doesn’t have to be equal to a possible value of X because it represents a long-term average value. It does, however, have to lie between the smallest and largest possible values of X, which is something to check after you have calculated E(X). Also note that E(X) is not a probability, so it is not restricted to the range from zero to one.

A Practical Example!!!

If we toss three coins and let X represent the number of heads, so that x = 0, 1, 2, 3, then the probability distribution of the number of heads is:

$$S = \{HHH, HHT, HTH, THH, TTH, THT, HTT, TTT\}$$

x       0      1      2      3      Total
f(x)    1/8    3/8    3/8    1/8    1

Thus $E(X) = \sum x f(x) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 1.5$

If we toss three coins 50 times and the number of heads is recorded as given in the following table:

0 2 3 1 1 2 2 2 0 3
2 1 1 1 2 3 1 1 1 2
0 1 1 3 0 1 2 1 3 2
1 1 1 2 1 1 0 1 1 2
2 1 2 1 2 1 3 2 1 3

Now the mean of this data is:

$$\text{Mean} = \frac{0 + 2 + 3 + 1 + \dots + 1 + 3}{50} = \frac{74}{50} = 1.48$$

Now if we toss three coins 100 times and the number of heads is recorded as given in the following table:

0 3 0 1 0 2 2 2 0 3
2 2 1 2 2 3 2 1 1 2
0 1 1 3 0 2 2 1 3 2
1 2 0 2 1 2 0 1 1 2
1 2 2 3 1 1 1 2 1 3
2 1 1 2 0 2 3 1 1 2
1 2 2 1 3 1 1 2 1 2
1 2 1 2 1 0 2 3 2 1
2 1 2 1 2 1 2 1 2 1
1 0 1 1 2 1 3 2 1 2

Now the mean of this data is:

$$\text{Mean} = \frac{0 + 3 + 0 + 1 + \dots + 1 + 2}{100} = \frac{150}{100} = 1.50$$

It is clear that this mean agrees with the expected value 1.5.

Hence, “as the number of repetitions of the experiment increases, we expect the actual mean to get closer to the expected (theoretical) mean.”
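
The same experiment can be imitated on a computer. The following Python sketch is only an illustration: the function name `average_heads` and the use of a fixed seed are choices made here, and the results will not match the hand-recorded tables above because the tosses are generated at random.

```python
import random


def average_heads(repetitions, seed=0):
    """Toss three fair coins `repetitions` times and return the mean number of heads."""
    rng = random.Random(seed)   # fixed seed so that a run can be repeated
    total_heads = 0
    for _ in range(repetitions):
        # Each coin shows a head with probability 1/2.
        total_heads += sum(rng.random() < 0.5 for _ in range(3))
    return total_heads / repetitions


for n in (50, 100, 10_000, 100_000):
    print(n, average_heads(n))
```

For small numbers of repetitions the mean can be noticeably above or below 1.5; for very large numbers it should settle close to the expected value E(X) = 1.5.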


Variance and Standard Deviation of a Discrete Random Variable

Let X be a discrete random variable which can take the values $x_1, x_2, \dots, x_n$ with associated probabilities $f(x_1), f(x_2), \dots, f(x_n)$ respectively; then the variance and S.D of X are defined as:

$$Var(X) = \sigma_x^2 = E(X^2) - \left[E(X)\right]^2$$

$$S.D(X) = \sigma_x = \sqrt{E(X^2) - \left[E(X)\right]^2}$$

Here $E(X^2) = \sum_{\text{all } x} x^2 f(x)$
EXAMPLE 7.11

A random variable X has a probability distribution:

x       0      1      2
f(x)    1/4    2/4    1/4

Find the expected value, variance and S.D of the random variable X.

Solution

x     f(x)    xf(x)    x²f(x)
0     1/4     0        0
1     2/4     2/4      2/4
2     1/4     2/4      4/4
--    1       4/4      6/4

$$E(X) = \sum x f(x) = 4/4 = 1.0$$

And $E(X^2) = \sum x^2 f(x) = 6/4 = 1.5$

Therefore

$$Var(X) = E(X^2) - \left[E(X)\right]^2 = 1.5 - (1.0)^2 = 1.5 - 1 = 0.5$$

$$S.D(X) = \sqrt{E(X^2) - \left[E(X)\right]^2} = \sqrt{1.5 - (1.0)^2} = \sqrt{1.5 - 1} = \sqrt{0.5} = 0.7$$

Note: To compute E(X), round it off to one more decimal place than the values of the random variable x. This round-off rule is also used for the variance and S.D of a probability distribution.
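
The table method used for the expected value, variance and S.D can be sketched in Python. The function name `moments` is made up for this illustration; the distribution is the one with x = 0, 1, 2 and f(x) = 1/4, 2/4, 1/4.

```python
from fractions import Fraction
from math import sqrt


def moments(pmf):
    """Return (E(X), Var(X), S.D(X)) for a distribution given as {x: f(x)}."""
    mean = sum(x * p for x, p in pmf.items())           # sum of x f(x)
    mean_sq = sum(x * x * p for x, p in pmf.items())    # sum of x^2 f(x)
    variance = mean_sq - mean**2                        # E(X^2) - [E(X)]^2
    return mean, variance, sqrt(variance)


pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}
mean, variance, sd = moments(pmf)

print(mean, variance, round(sd, 1))
```

The two sums inside `moments` correspond to the totals of the xf(x) and x²f(x) columns of the table.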

EXAMPLE 7.12
Find “K” for the probability distribution given below:

x       0      1      2      3
f(x)    1/8    K      3/8    1/8

Also find the value of the Mean and Variance of the random variable X.

Solution
To find the value of “K” we use:

$$\sum_{x=0}^{3} f(x) = 1$$

$$\Rightarrow 1/8 + K + 3/8 + 1/8 = 1$$

$$\Rightarrow K + 5/8 = 1$$

$$\Rightarrow K = 1 - 5/8$$

$$\Rightarrow K = 3/8$$

x     f(x)       xf(x)    x²f(x)
0     1/8        0        0
1     K = 3/8    3/8      3/8
2     3/8        6/8      12/8
3     1/8        3/8      9/8
--    1          12/8     24/8

$$\text{Mean} = E(X) = \sum x f(x) = 12/8 = 1.5$$

And $E(X^2) = \sum x^2 f(x) = 24/8 = 3$

Therefore

$$Var(X) = E(X^2) - \left[E(X)\right]^2 = 3 - (1.5)^2 = 3 - 2.25 = 0.75$$

Test Yourself

1) Find E(X), Var(X) and S.D(X) from the following Probability Distribution:

x 0 1 2 3
f(x) 1/4 1/6 2/6 1/4

2) Find the value of “K”, E(X), Var(X) and S.D(X) from the following Probability Distribution:

x 1 2 3 4
f(x) 2/12 K 4/12 3/12

283
Chapter 07 Random Variables

Properties of Expectation
• $E(a) = a$
• $E(aX) = aE(X)$
• $E(X \pm a) = E(X) \pm a$
• $E(aX \pm b) = aE(X) \pm b$
• $E(X \pm Y) = E(X) \pm E(Y)$
• $E(XY) = E(X)E(Y)$ (if X and Y are independent)

EXAMPLE 7.13

Given the following Probability Distribution:

x       1      2      3      4
f(x)    1/8    1/4    1/2    1/8

Find (1) $E(X)$ (2) $E(X + 10)$ (3) $E(2 - X)$ (4) $E(3X)$ (5) $E(4X + 100)$ (6) $E(20 - 5X)$

Solution

x     f(x)    xf(x)
1     1/8     1/8
2     1/4     2/4
3     1/2     3/2
4     1/8     4/8
--    1       21/8

1) $E(X) = \sum x f(x) = 21/8 = 2.6$

2) $E(X + 10) = E(X) + 10 = 2.6 + 10 = 12.6$

3) $E(2 - X) = 2 - E(X) = 2 - 2.6 = -0.6$

4) $E(3X) = 3E(X) = 3(2.6) = 7.8$

5) $E(4X + 100) = 4E(X) + 100 = 4(2.6) + 100 = 110.4$

6) $E(20 - 5X) = 20 - 5E(X) = 20 - 5(2.6) = 7.0$
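
The property E(aX + b) = aE(X) + b can be checked numerically with a short Python sketch. The helper name `expect` is hypothetical. Exact fractions are used, so E(X) is 21/8 = 2.625 rather than the rounded 2.6, and the results differ slightly from values worked out with 2.6.

```python
from fractions import Fraction

pmf = {1: Fraction(1, 8), 2: Fraction(1, 4), 3: Fraction(1, 2), 4: Fraction(1, 8)}


def expect(g):
    """Expected value of g(X): add g(x) * f(x) over all x."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expect(lambda x: x)

# Left side computed directly from the distribution, right side from the property.
direct = expect(lambda x: 4 * x + 100)
via_property = 4 * mean + 100

print(mean, direct, via_property)
```

The same comparison can be repeated for any constants a and b, for example with `lambda x: 20 - 5 * x` against `20 - 5 * mean`.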

Test Yourself

Given the following Probability Distribution:

x 4 5 6 7
f(x) 1/8 1/4 1/2 1/8

Find (1) $E(X)$ (2) $E(X + 9)$ (3) $E(12 - X)$ (4) $E(8X)$ (5) $E(3X + 70)$ (6) $E(17 - 3X)$

Properties of Variance and Standard Deviation

Variance                                  Standard Deviation
• $Var(c) = 0$                            • $S.D(c) = 0$
• $Var(X \pm c) = Var(X)$                 • $S.D(X \pm c) = S.D(X)$
• $Var(cX) = c^2 Var(X)$                  • $S.D(cX) = |c| \, S.D(X)$
• $Var(X/c) = (1/c^2) Var(X)$             • $S.D(X/c) = (1/|c|) \, S.D(X)$
• If X and Y are independent then         • If X and Y are independent then
  $Var(X \pm Y) = Var(X) + Var(Y)$          $S.D(X \pm Y) \ne S.D(X) \pm S.D(Y)$

For independent X and Y the standard deviation of a sum or difference is found from the variances: $S.D(X \pm Y) = \sqrt{Var(X) + Var(Y)}$.
EXAMPLE 7.14

Given the following Probability Distribution:

x       -50    -100    150
f(x)    1/5    3/10    1/2

Find (1) $E(X)$ (2) $E(X^2)$ (3) $Var(X)$ (4) $S.D(X)$ (5) $Var(X + 3)$ (6) $S.D(2 - 3X)$ (7) $S.D(3X)$ (8) $Var(X/5)$ (9) $S.D(X/5)$

Solution

x       f(x)    xf(x)    x²f(x)
-50     1/5     -10      500
-100    3/10    -30      3000
150     1/2     75       11250
--      1       35       14750

1) $E(X) = \sum x f(x) = 35.0$

2) $E(X^2) = \sum x^2 f(x) = 14750$

3) $Var(X) = E(X^2) - \left[E(X)\right]^2 = 14750 - (35)^2 = 13525.0$

4) $S.D(X) = \sqrt{E(X^2) - \left[E(X)\right]^2} = \sqrt{14750 - (35)^2} = \sqrt{13525} = 116.3$

5) $Var(X + 3) = Var(X) = 13525.0$

6) $S.D(2 - 3X) = 3 \, S.D(X) = 3(116.3) = 348.9$

7) $S.D(3X) = 3 \, S.D(X) = 3(116.3) = 348.9$

8) $Var(X/5) = (1/25) Var(X) = (1/25)(13525) = 541.0$

9) $S.D(X/5) = (1/5) \, S.D(X) = (1/5)(116.3) = 23.3$
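
The scaling rules for variance and S.D can be verified with a short Python sketch. It assumes that the third value of x is 150, which is what the xf(x) and x²f(x) columns (75 and 11250) imply; the function name `variance` and its `transform` argument are inventions of this sketch.

```python
from fractions import Fraction
from math import isclose, sqrt

# Assumed distribution: x = -50, -100, 150 with f(x) = 1/5, 3/10, 1/2.
pmf = {-50: Fraction(1, 5), -100: Fraction(3, 10), 150: Fraction(1, 2)}


def variance(transform=lambda x: x):
    """Variance of transform(X), computed as E(T^2) - [E(T)]^2."""
    mean = sum(transform(x) * p for x, p in pmf.items())
    mean_sq = sum(transform(x) ** 2 * p for x, p in pmf.items())
    return mean_sq - mean**2


var_x = variance()
sd_x = sqrt(var_x)

var_shifted = variance(lambda x: x + 3)           # Var(X + 3)
sd_scaled = sqrt(variance(lambda x: 2 - 3 * x))   # S.D(2 - 3X)
var_fifth = variance(lambda x: Fraction(x, 5))    # Var(X/5)
sd_fifth = sqrt(var_fifth)                        # S.D(X/5)

print(var_x, round(sd_x, 1), var_shifted, round(sd_scaled, 1), var_fifth, round(sd_fifth, 1))
```

Adding a constant leaves the variance unchanged, multiplying by c multiplies the variance by c² and the S.D by |c|, and S.D(X/5) is one fifth of S.D(X), not one fifth of Var(X).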
5  5 5

Test Yourself

Given the following Probability Distribution:

x       -40    -900    1400
f(x)    1/5    3/10    1/2

Find (1) $E(X)$ (2) $E(X^2)$ (3) $Var(X)$ (4) $S.D(X)$ (5) $Var(X - 4)$ (6) $S.D(5 - 2X)$ (7) $S.D(6X)$ (8) $Var(2X/7)$ (9) $S.D(3X/8)$

Important!!!

While reading probability problems, pay special attention to key phrases that translate into mathematical symbols. The following table lists various phrases and their corresponding mathematical equivalents:

Math Symbol    Phrases
>              “greater than” or “more than” or “exceed” or “better than” or “taller than” or “above”
<              “less than” or “smaller than” or “below” or “under” or “fewer than”
≥              “at least” or “greater than or equal to” or “no less than”
≤              “at most” or “less than or equal to” or “no more than”
=              “exactly” or “equal” or “is”

Amazing Histogram!!!

If two dice are rolled and X represents the sum of dots then:

S = { 2   3   4   5   6   7
      3   4   5   6   7   8
      4   5   6   7   8   9
      5   6   7   8   9   10
      6   7   8   9   10  11
      7   8   9   10  11  12 }

x       2      3      4      5      6      7      8      9      10     11     12     Total
f(x)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36   1

[Figure: probability histogram of f(x) against X = 2, 3, ..., 12, rising from 1/36 at X = 2 to 6/36 at X = 7 and falling back to 1/36 at X = 12]

Sharpen your Pencil


MCQ’s

(1) Expected value of a random variable is called _____

(A) Mean (B) Median (C) Mode (D) None of these

(2) If “X” and “Y” are independent random variables then E(XY) = _____

(A) XY (B) E(X)+E(Y) (C) E(X)E(Y) (D) None of these

(3) If “X” is a random variable and “a” and “b” are constants, then E(aX+b) = _____

(A) a²E(X)+b (B) aE(X)+b (C) aE(X) (D) None of these

(4) If “X” is a random variable then E(2X+3) = _____

(A) 2²E(X)+3 (B) 2E(X)+3 (C) 2E(X) (D) None of these

(5) If “a” is a constant then E(aX) = _____

(A) a (B) aE(X) (C) E(X) (D) None of these

(6) If “X” and “Y” are two random variables then E(X+Y) = _____

(A) X+Y (B) E(X)+E(Y) (C) E(X)-E(Y) (D) None of these

(7) A random variable is also called a _____ variable.

(A) Chance (B) Qualitative (C) Discrete (D) None of these

(8) For a discrete random variable “X”, E(X) = _____

(A) $\sum x P(x)$ (B) $\sum x^2 P(x)$ (C) $\sum P(x)$ (D) None of these

(9) The probability for a continuous random variable at a particular point is _____

(A) 0 (B) 1 (C) -1 (D) None of these

(10) If X is a random variable with E(X) = 7, then E(2X-3) = _____

(A) 17 (B) 11 (C) 7 (D) None of these

288
Chapter 07 Random Variables

Sharpen your Pencil


MCQ’s

(11) If “a” is a constant then E(a) = _____

(A) 0 (B) a (C) 1 (D) None of these

om
(12) If Var(X) = 10, Var(Y) = 15 and if X and Y are independent then, Var(X-Y) = _____

(A) -5 (B) 5 (C) 150 (D) None of these

l.c
(13) The sum of probabilities in a probability distribution is _____
ai
(A) Negative (B) 0 (C) 1 (D) None of these
gm

(14) If f(x) is a p.d.f of a random variable X then f(x) _____

(A) f ( x)  (B) f ( x)  0 (C) f ( x)  0 (D) None of these


s@

(15) Var(-5X) = _____ when Var(X) = 25


t

(A) 625 (B) -625 (C) 25 (D) None of these


ta

(16) If X and Y are independent then, Var(X-Y) = _____


es

(A) Var(X) + Var(Y) (B) 0


ze

(C) Var(X) - Var(Y) (D) None of these

289
Chapter 07 Random Variables

Short Questions
Exercise

Q.7.01. If E(X) = 3 then find E(2X-1) and E(X+1).

Q.7.02. Find “K” for the probability distribution given below and find E(X):

x       0         1         2     3         4
f(x)    12/210    80/210    K     24/210    1/210

Q.7.03. Given that P(x) = 0.2 for x = 2, 3, 4, 5, 6. Calculate E(X).

Q.7.04. What value of “k” makes the following function a mass function?

$f(x) = kx^6$, x = 0, 1, 2, 3

Q.7.05. What value of “k” makes the following function a mass function?

$f(x) = k\binom{7}{x}$, x = 0, 1, 2, 3, 4

Q.7.06. Find the value of “k” from the following probability distribution:

x       -2     -1     0      1      2      3
f(x)    0.1    0.1    0.2    2k     0.3    k

Also find the probability distribution of X by substituting the calculated value of “k”.

Q.7.07. What value of “k” makes the following function a density function?

$f(x) = k(4 - x)$, $1 \le x \le 3$

Q.7.08. What value of “A” makes the following function a density function?

$f(x) = A(2x + 3)$, $1 \le x \le 2$

Q.7.09. Define random variable. Write down the properties of a probability distribution.

Long Questions
Exercise

Q.7.01. Consider the following probability distribution:

x       0      1      2      3
P(x)    0.1    0.4    0.3    0.2

(i) Calculate Mean and Variance
(ii) Calculate E(3X-1)
(iii) Calculate Variance of (3X-1)

Q.7.02. Find E(X), E(X²) and V(X) from the following table:

x       0      1      2      3
f(x)    1/4    1/6    2/6    1/4

Q.7.03. If $f(x) = \dfrac{1}{30}(5 + 2x)$, $1 \le x \le 4$

(i) Show that f(x) is a density function
(ii) Find $P(X \ge 3)$ (iii) Find $P(X \le 3)$

Q.7.04. A random variable X that can assume values between x = 2 and x = 5 has a density function given by: $f(x) = \dfrac{2(1 + x)}{27}$

(i) Find $P(X < 4)$ (ii) Find $P(2 < X < 3)$

Q.7.05. Given that $f(x) = 0.25x$, $1 \le x \le 3$

(i) Show that $P(1 \le X \le 3) = 1$
(ii) Find $P(X \ge 2)$ and $P(2 \le X \le 3)$

Q.7.06. Given that $f(x) = k(x + 1)$, $2 \le x \le 5$

(i) Find “k” (ii) Find $P(2 \le X \le 3)$
(iii) Find $P(X > 4)$ (iv) Find $P(X = 4)$

Q.7.07. Find the probability distribution of the No. of tails when two coins are tossed.

Q.7.08. Find E(X), E(X²), V(X) and S.D(X) from the following table:

x       -2     3      1
f(x)    1/3    1/2    1/6

Q.7.09. Find Mean and S.D if f(-1) = 3/8, f(0) = 2/8 and f(1) = 3/8

Q.7.10. Check whether the following is a density function:

$f(x) = \dfrac{5 + 2x}{30}$, $0 \le x \le 4$

(i) Find $P(X \le 2)$ (ii) Find $P(2 \le X \le 3)$

Q.7.11. A random variable X has a probability distribution:

x       0      1      2
f(x)    1/4    2/4    1/4

Find the value of Mean and Variance of the random variable X.

Q.7.12. Given that $f(x) = \dfrac{1}{6}(5 - 2x)$, $0 \le x \le 2$

(i) Find $P(X \ge 1)$ (ii) Find $P(0.25 \le X \le 1.25)$
(iii) Find $P(X = 0.75)$

Q.7.13. Given that: $f(x) = \dfrac{2(5 + x)}{25}$, $0 \le x \le 5$

(i) Find $P(0 \le X \le 5)$ (ii) Find $P(X \le 3)$ (iii) Find $P(X \ge 2)$

Q.7.14. From the following table find E(X), E(X²) and E(X+3)

x       1       2       3       4
f(x)    2/12    3/12    4/12    3/12
CHAPTER 08
Some Special
Probability Distributions

Chapter Contents

You should read this chapter if you need to learn about:

• Bernoulli Trials
• Bernoulli Distribution
• Mean, Variance and S.D of Bernoulli Distribution
• Binomial Experiment
• Binomial Distribution
• Mean, Variance and S.D of Binomial Distribution
• Properties of Binomial Distribution
• Pascal’s Triangle
• Hypergeometric Experiment
• Hypergeometric Distribution
• Mean, Variance and S.D of Hypergeometric Distribution
• Properties of Hypergeometric Distribution
• Discrete Uniform Distribution
• Mean, Variance and S.D of Discrete Uniform Distribution
• Continuous Uniform Distribution
• Mean, Variance and S.D of Continuous Uniform Distribution
• Exercise
Chapter 08 Some Special Probability Distributions

Bernoulli Trial

• A trial is said to be a Bernoulli trial if it can result only in a success or a failure.

Here is a simple example of a Bernoulli trial. From a standard deck of cards, you pick a card and note whether it is a club or not. So the outcome of the trial can be classified in two categories: selecting a club (success) and selecting another suit (failure).

The probabilities of success and failure are denoted by “p” and “q” respectively, so that q = 1 − p. If the random variable X represents the number of clubs selected, then the possible values of the random variable are 0 and 1.

The random variable “X” representing the number of successes in a Bernoulli trial is called a Bernoulli random variable.

Note: The Bernoulli random variable is named in honor of the mathematician Jacob Bernoulli (1654-1705).

Bernoulli distribution

The probability distribution of the Bernoulli variable “X” is called the Bernoulli distribution.

The probability mass function of the Bernoulli distribution is given below:

$$P(X = x) = f(x) = \begin{cases} p^x q^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}$$

Where
• x = number of successes
• p = probability of success
• q = probability of failure

Note: “p” is the parameter of the Bernoulli distribution.

Mean, Variance and S.D of Bernoulli distribution

Measure               Formula
Mean                  $\mu = p$
Variance              $\sigma^2 = pq$
Standard Deviation    $\sigma = \sqrt{pq}$

Binomial experiment

An experiment that has the following properties is called a binomial experiment:

• Every trial results in a success or a failure.
• The successive trials are independent.
• The probability of success remains constant from trial to trial.
• The number of trials is fixed in advance.

Note: The prefix “bi” means “two”. This should help to remind you that binomial experiments deal with situations in which there are only two outcomes, i.e. success and failure.

Here is a simple example of a binomial experiment. From a standard deck of cards, you draw 5 cards in succession; each time you note whether the card is a club or not, and replace the card. So the outcome of each trial can be classified in two categories: selecting a club (success) and selecting another suit (failure).

The probabilities of success and failure are denoted by “p” and “q” respectively. The probability of success remains the same because each card drawn is replaced before the next draw. If the random variable X represents the number of clubs selected, then the possible values of the random variable are 0, 1, 2, 3, 4, and 5. Note that X is a discrete random variable because its possible values can be listed.

The random variable “X” representing the number of successes in a binomial experiment is called a binomial random variable.

Binomial Distribution

The probability distribution of the binomial variable “X” is called the binomial distribution.

The probability mass function of the binomial distribution is given below:

$$P(X = x) = f(x) = \begin{cases} \binom{n}{x} p^{x} q^{n-x} ; & x = 0, 1, 2, \ldots, n \\ 0 ; & \text{otherwise} \end{cases}$$

Where

• x = number of successes
• p = probability of success
• q = probability of failure
• n = number of trials, fixed in advance

The binomial distribution is a very important discrete probability distribution. It was discovered by James Bernoulli about the year 1700.

Note: “n” and “p” are the parameters of the binomial distribution.
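
The probability mass function above translates almost directly into code. The following is a minimal sketch, not part of the original text: the function name `binomial_pmf` is an illustrative choice, and the values n = 5 and p = 13/52 are taken from the club-drawing example of a binomial experiment.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    if not (isinstance(x, int) and 0 <= x <= n):
        return 0.0  # the "otherwise" branch of the pmf
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q ** (n - x)

# Club example: 5 draws with replacement, success = drawing a club.
n, p = 5, 13 / 52
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(distribution):
    print(f"P(X = {x}) = {prob:.4f}")
print("Total probability =", round(sum(distribution), 10))
```

With n = 1 the same function reduces to the Bernoulli pmf, since the only possible values are then x = 0 and x = 1.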

The terms “success” and “failure” in a binomial experiment do not necessarily mean that a success is good and a failure is bad. Success means that you get the outcome you want to count, and failure means that you get an outcome you do not want to count. For example, if you select ten 18-year-old male drivers, then a success may be an 18-year-old driver who was involved in an accident.

Mean, Variance and S.D of Binomial Distribution

Measure | Formula
Mean | $\mu = np$
Variance | $\sigma^2 = npq$
Standard Deviation | $\sigma = \sqrt{npq}$


EXAMPLE 8.01

Find the complete binomial distribution having n = 5 and p = 1/2.

Solution: Here $n = 5 \Rightarrow x = 0, 1, 2, 3, 4, 5$

And $p = 1/2 \Rightarrow q = 1 - p = 1 - 1/2 = 1/2$

Now $f(x) = \binom{n}{x} p^{x} q^{n-x}$

$\Rightarrow f(x) = \binom{5}{x} (1/2)^{x} (1/2)^{5-x}$

Hence the complete binomial distribution becomes:

$f(0) = \binom{5}{0} (1/2)^{0} (1/2)^{5-0} = 1 \, (1/2)^{5} = 1/32$

$f(1) = \binom{5}{1} (1/2)^{1} (1/2)^{5-1} = 5 \, (1/2)^{5} = 5/32$

$f(2) = \binom{5}{2} (1/2)^{2} (1/2)^{5-2} = 10 \, (1/2)^{5} = 10/32$

$f(3) = \binom{5}{3} (1/2)^{3} (1/2)^{5-3} = 10 \, (1/2)^{5} = 10/32$

$f(4) = \binom{5}{4} (1/2)^{4} (1/2)^{5-4} = 5 \, (1/2)^{5} = 5/32$

$f(5) = \binom{5}{5} (1/2)^{5} (1/2)^{5-5} = 1 \, (1/2)^{5} = 1/32$

A binomial distribution having n = 5 and p = 1/2 can also be written as $\left(\frac{1}{2} + \frac{1}{2}\right)^{5}$.
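
The table of Example 8.01 can be reproduced with exact fractions. This is a small sketch for checking the arithmetic, assuming Python's standard `fractions` module; the variable name `table` is illustrative only.

```python
from fractions import Fraction
from math import comb

n = 5
p = Fraction(1, 2)
q = 1 - p

# f(x) = C(n, x) p^x q^(n - x) for x = 0, 1, ..., n
table = {x: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}

for x, prob in table.items():
    print(f"f({x}) = {prob}")
print("Sum of probabilities =", sum(table.values()))
```

Note that `Fraction` prints values in lowest terms, so 10/32 appears as 5/16; the two are equal.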

EXAMPLE 8.02

An event has p = 3/8 and n = 5. Find the probability of:

(i) $P(X = 3)$ (ii) $P(X > 3)$ (iii) $P(X \ge 3)$ (iv) $P(X \le 3)$

Solution: Here $n = 5 \Rightarrow x = 0, 1, 2, 3, 4, 5$

And $p = 3/8 \Rightarrow q = 1 - p = 1 - 3/8 = 5/8$

Now $P(X = x) = \binom{n}{x} p^{x} q^{n-x}$

$\Rightarrow P(X = x) = \binom{5}{x} (3/8)^{x} (5/8)^{5-x}$

(i) $P(X = 3)$

$P(X = 3) = \binom{5}{3} (3/8)^{3} (5/8)^{5-3} = 10 \, (3/8)^{3} (5/8)^{2} = 0.21$

(ii) $P(X > 3)$

$P(X > 3) = P(X = 4) + P(X = 5)$

$= \binom{5}{4} (3/8)^{4} (5/8)^{5-4} + \binom{5}{5} (3/8)^{5} (5/8)^{5-5}$

$= 5 \, (3/8)^{4} (5/8) + 1 \, (3/8)^{5} = 0.07$

(iii) $P(X \ge 3)$

$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$

$= \binom{5}{3} (3/8)^{3} (5/8)^{5-3} + \binom{5}{4} (3/8)^{4} (5/8)^{5-4} + \binom{5}{5} (3/8)^{5} (5/8)^{5-5}$

$= 10 \, (3/8)^{3} (5/8)^{2} + 5 \, (3/8)^{4} (5/8) + 1 \, (3/8)^{5} = 0.28$

(iv) $P(X \le 3)$

$P(X \le 3) = 1 - P(X > 3) = 1 - 0.07 = 0.93$, since $P(X \le a) + P(X > a) = 1$.

In a binomial distribution the p.m.f. gives no positive probability to a non-integer value such as x = 2.3, because the binomial r.v. X can take only integer values.

EXAMPLE 8.03

Find the mean, variance and S.D for a binomial distribution having n = 20 and p = 0.3.

Solution: We know that for a binomial distribution

Mean $= np = (20)(0.3) = 6$

Variance $= npq = (20)(0.3)(0.7) = 4.2$ (since $q = 1 - p = 1 - 0.3 = 0.7$)

S.D $= \sqrt{npq} = \sqrt{(20)(0.3)(0.7)} = 2.05$

For a binomial distribution Mean > Variance.

EXAMPLE 8.04

The mean and variance of a binomial distribution are 42 and 12.6. Find the values of the parameters “n” and “p”.

Solution: Given that

Mean $= 42 \Rightarrow np = 42$ ----- (a)

And Variance $= 12.6 \Rightarrow npq = 12.6$ ----- (b)

Putting $np = 42$ in equation (b) we have:

$42q = 12.6 \Rightarrow q = \frac{12.6}{42} = 0.3$

Now since $p = 1 - q \Rightarrow p = 1 - 0.3 = 0.7$

Putting $p = 0.7$ in equation (a) we have:

$n(0.7) = 42 \Rightarrow n = \frac{42}{0.7} = 60$

Hence the values of the parameters are: n = 60 and p = 0.7
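
The calculations in Examples 8.02 to 8.04 can be checked with a few lines of code. This is a hedged sketch rather than a prescribed method: the helper names `binomial_pmf`, `binomial_summary` and `binomial_parameters` are illustrative and do not come from the text.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x q^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) = (np, npq, sqrt(npq))."""
    variance = n * p * (1 - p)
    return n * p, variance, sqrt(variance)

def binomial_parameters(mean, variance):
    """Recover (n, p) from np = mean and npq = variance."""
    if not 0 < variance < mean:
        raise ValueError("this sketch expects 0 < variance < mean")
    q = variance / mean  # npq divided by np
    p = 1 - q
    return mean / p, p   # n = mean / p

# Example 8.02: n = 5, p = 3/8
n, p = 5, 3 / 8
p_eq_3 = binomial_pmf(3, n, p)
p_gt_3 = binomial_pmf(4, n, p) + binomial_pmf(5, n, p)
p_ge_3 = p_eq_3 + p_gt_3
p_le_3 = 1 - p_gt_3
print(round(p_eq_3, 2), round(p_gt_3, 2), round(p_ge_3, 2), round(p_le_3, 2))

# Example 8.03: n = 20, p = 0.3
mean, variance, sd = binomial_summary(20, 0.3)
print(round(mean, 2), round(variance, 2), round(sd, 2))

# Example 8.04: mean = 42, variance = 12.6
n_hat, p_hat = binomial_parameters(42, 12.6)
print(round(n_hat), round(p_hat, 2))
```

The check inside `binomial_parameters` reflects the property that the mean of a binomial distribution exceeds its variance, so a pair such as mean = 5 and variance = 9 is rejected.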


Prove that the mean of the Binomial distribution is “np”

Proof:

We know that Mean $= \mu = E(X) = \sum_{x=0}^{n} x f(x)$

$\Rightarrow \mu = \sum_{x=0}^{n} x \binom{n}{x} p^{x} q^{n-x}$, since $f(x) = \binom{n}{x} p^{x} q^{n-x}$

$= 0 \binom{n}{0} p^{0} q^{n-0} + 1 \binom{n}{1} p^{1} q^{n-1} + 2 \binom{n}{2} p^{2} q^{n-2} + \cdots + n \binom{n}{n} p^{n} q^{n-n}$

$= np\, q^{n-1} + n(n-1) p^{2} q^{n-2} + \cdots + n p^{n}$

$= np \left[ q^{n-1} + (n-1) p\, q^{n-2} + \cdots + p^{n-1} \right]$

$= np\, (q + p)^{n-1}$

$= np$, since $q + p = 1$. Hence proved.

(The symbol $\mu$ is read as “mu”.)
s@

Properties of Binomial Distribution

• The mean and variance of the binomial distribution are “np” and “npq” respectively.
• For the binomial distribution Mean > Variance.
• The shape of the binomial distribution depends on the values of “n” and “p”.
• The binomial distribution is
   o Symmetric if p = q = 1/2
   o Positively skewed if p < q
   o Negatively skewed if p > q
• The binomial distribution approaches the normal distribution as $n \to \infty$, such that np > 5 and nq > 5.
• The moments about the mean of the binomial distribution are:
   o $\mu_1 = 0$
   o $\mu_2 = npq$
   o $\mu_3 = npq(1 - 2p)$
   o $\mu_4 = 3n^{2}p^{2}q^{2} + npq(1 - 6pq)$
• For the binomial distribution:
   o coefficient of variation $= \sqrt{\dfrac{q}{np}} \times 100$
   o coefficient of skewness $= \dfrac{q - p}{\sqrt{npq}}$
   o coefficient of kurtosis $= 3 + \dfrac{1 - 6pq}{npq}$
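
The three coefficients listed above are easy to evaluate for any n and p. The following is a minimal sketch; the function name `binomial_shape` and the dictionary keys are illustrative choices, and n = 20, p = 0.5 are simply sample values.

```python
from math import sqrt

def binomial_shape(n, p):
    """Coefficient of variation (%), skewness and kurtosis of a binomial distribution."""
    q = 1 - p
    npq = n * p * q
    return {
        "cv_percent": sqrt(q / (n * p)) * 100,
        "skewness": (q - p) / sqrt(npq),
        "kurtosis": 3 + (1 - 6 * p * q) / npq,
    }

shape = binomial_shape(20, 0.5)
for name, value in shape.items():
    print(f"{name} = {value:.2f}")
```

A skewness of zero indicates a symmetric distribution, and a kurtosis below 3 indicates a platykurtic one.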

EXAMPLE 8.05

Is it possible to have a binomial distribution with mean = 5 and S.D = 3?

Solution: Given that

Mean $= 5$

S.D $= 3 \Rightarrow$ Variance $= 3^{2} = 9$

But we know that for any binomial distribution Mean > Variance, whereas here Mean = 5 is less than Variance = 9.

Hence it is not possible to have a binomial distribution with mean = 5 and S.D = 3.

EXAMPLE 8.06

If X is a binomial r.v. with n = 20 and p = 0.5, then find:

(i) Coefficient of variation
(ii) Coefficient of skewness
(iii) Coefficient of kurtosis

Solution: Here $q = 1 - p = 1 - 0.5 = 0.5$

(i) Coefficient of variation

Coefficient of variation $= \sqrt{\dfrac{q}{np}} \times 100$

$\Rightarrow$ Coefficient of variation $= \sqrt{\dfrac{0.5}{(20)(0.5)}} \times 100 = 22.4\%$

(ii) Coefficient of skewness

Coefficient of skewness $= \dfrac{q - p}{\sqrt{npq}}$

$\Rightarrow$ Coefficient of skewness $= \dfrac{0.5 - 0.5}{\sqrt{(20)(0.5)(0.5)}} = 0$

Hence the distribution is symmetric.

(iii) Coefficient of kurtosis

Coefficient of kurtosis $= 3 + \dfrac{1 - 6pq}{npq}$

$\Rightarrow$ Coefficient of kurtosis $= 3 + \dfrac{1 - 6(0.5)(0.5)}{(20)(0.5)(0.5)} = 2.9$

Hence the distribution is platykurtic.

EXAMPLE 8.07

If X is a binomial r.v. with n = 20 and p = 0.3, then find E(2X - 3).

Solution: We know that for a binomial distribution

Mean $= E(X) = np = (20)(0.3) = 6$

$E(2X - 3) = 2E(X) - 3 = 2(6) - 3 = 9$


Pascal’s Triangle

Consider the binomial expansion:

$$(p + q)^{n} = \binom{n}{0} p^{n} q^{0} + \binom{n}{1} p^{n-1} q^{1} + \binom{n}{2} p^{n-2} q^{2} + \cdots + \binom{n}{n} p^{0} q^{n}$$

The coefficients $\binom{n}{0}, \binom{n}{1}, \binom{n}{2}, \ldots, \binom{n}{n}$ are called the binomial coefficients. These coefficients can be easily written down by using an arrangement of numbers, called Pascal’s triangle, given below:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1
and so on . . .

Each number inside the triangle is the sum of the two numbers directly above it.

In 1653, Blaise Pascal created a triangle of numbers called Pascal’s triangle that can be used in the binomial distribution.

Power | Binomial Expansion | Coefficients
2 | $(p + q)^{2} = p^{2} + 2pq + q^{2}$ | 1 2 1
3 | $(p + q)^{3} = p^{3} + 3p^{2}q + 3pq^{2} + q^{3}$ | 1 3 3 1
4 | $(p + q)^{4} = p^{4} + 4p^{3}q + 6p^{2}q^{2} + 4pq^{3} + q^{4}$ | 1 4 6 4 1
and so on…


EXAMPLE 8.08

Expand the binomial distribution $\left(\frac{1}{3} + \frac{2}{3}\right)^{4}$

Solution: Here p = 1/3, q = 2/3 and n = 4

Now since $(p + q)^{4} = p^{4} + 4p^{3}q + 6p^{2}q^{2} + 4pq^{3} + q^{4}$

$\Rightarrow \left(\frac{1}{3} + \frac{2}{3}\right)^{4} = \left(\frac{1}{3}\right)^{4} + 4\left(\frac{1}{3}\right)^{3}\left(\frac{2}{3}\right) + 6\left(\frac{1}{3}\right)^{2}\left(\frac{2}{3}\right)^{2} + 4\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)^{3} + \left(\frac{2}{3}\right)^{4}$

$\Rightarrow \left(\frac{1}{3} + \frac{2}{3}\right)^{4} = \frac{1}{81} + \frac{8}{81} + \frac{24}{81} + \frac{32}{81} + \frac{16}{81}$

This is the required expansion.
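
Pascal’s triangle and the expansion of $(p + q)^{n}$ can be generated programmatically. The sketch below is illustrative only: the function names `pascal_row` and `expand` are assumptions, and the values p = 1/3, q = 2/3, n = 4 are those of Example 8.08.

```python
from fractions import Fraction

def pascal_row(n):
    """Return row n of Pascal's triangle, i.e. the coefficients of (p + q)^n."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

def expand(p, q, n):
    """Return the terms C(n, i) p^(n - i) q^i of (p + q)^n, for i = 0, 1, ..., n."""
    return [c * p ** (n - i) * q**i for i, c in enumerate(pascal_row(n))]

for power in range(5):
    print(power, pascal_row(power))

terms = expand(Fraction(1, 3), Fraction(2, 3), 4)
print(terms, "sum =", sum(terms))
```

Because the terms are the probabilities of a binomial distribution, they always add up to 1.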

Test Yourself

1) Find the complete binomial distribution having n = 6 and p = 1/4.
2) Find the mean, variance and S.D for a binomial distribution having n = 15 and p = 0.7.
3) The mean and variance of a binomial distribution are 1.2 and 0.84. Find the values of the parameters “n” and “p”.
4) If X is a binomial r.v. with n = 14 and p = 0.8, then find:
(i) Coefficient of variation
(ii) Coefficient of skewness
(iii) Coefficient of kurtosis
5) An event has p = 1/8 and n = 5. Find the probability of:
(i) P( X  2) (ii) P( X  2) (iii) P( X  4) (iv) P( X  2)
6) Expand the binomial distribution $\left(\frac{1}{4} + \frac{3}{4}\right)^{5}$


Hypergeometric experiment

An experiment that has the following properties is called a hypergeometric experiment:

• Every trial results in a success or a failure.
• The successive trials are dependent.
• The probability of success changes from trial to trial.
• The number of trials is fixed in advance.

Here is a simple example of a hypergeometric experiment. Suppose 5 cards are drawn at random without replacement from a deck of 52 cards, and we are interested in selecting red cards. For instance, the probability of getting 3 red cards in the first sample of 5 cards is:

$$P(3 \text{ red}) = \frac{\binom{26}{3}\binom{26}{2}}{\binom{52}{5}} = 0.325$$

First Draw
Red | Black | Total
26 | 26 | 52

If these 5 cards (3 red and 2 black) are not replaced, then for a second sample of 5 cards the probability becomes:

$$P(3 \text{ red}) = \frac{\binom{23}{3}\binom{24}{2}}{\binom{47}{5}} = 0.319$$

and so on.

Second Draw
Red | Black | Total
23 | 24 | 47

Thus the probability of success changes from trial to trial because the cards are drawn without replacement. If the random variable X represents the number of red cards selected, then the possible values of the random variable are 0, 1, 2, 3, 4, and 5. Note that X is a discrete random variable because its possible values can be listed.

The random variable “X” representing the number of successes in a hypergeometric experiment is called a hypergeometric random variable.


Hypergeometric Distribution

The probability distribution of the hypergeometric variable “X” is called the hypergeometric distribution.

The probability mass function of the hypergeometric distribution is given below:

$$P(X = x) = f(x) = \begin{cases} \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} ; & x = 0, 1, 2, \ldots, n \ \text{ if } n \le k \\ & x = 0, 1, 2, \ldots, k \ \text{ if } n > k \\ 0 ; & \text{otherwise} \end{cases}$$

Where

• N = number of units in the population
• n = number of units in the sample (also “n” is the number of trials, fixed in advance)
• k = number of successes in the population
• x = number of successes in the sample

Note: “N”, “n” and “k” are the parameters of the hypergeometric distribution.
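
The hypergeometric probability mass function can be sketched in code as follows. The function name `hypergeometric_pmf` is an illustrative choice, and the two calls use the red-card numbers from the example of a hypergeometric experiment (26 red cards in 52, then 23 red cards in the remaining 47).

```python
from math import comb

def hypergeometric_pmf(x, N, n, k):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N units that contains k successes."""
    if not (isinstance(x, int) and 0 <= x <= min(n, k)):
        return 0.0  # the "otherwise" branch of the pmf
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

first = hypergeometric_pmf(3, N=52, n=5, k=26)   # 3 red cards in the first 5
second = hypergeometric_pmf(3, N=47, n=5, k=23)  # 3 red cards in the next 5
print(round(first, 3), round(second, 3))

total = sum(hypergeometric_pmf(x, 52, 5, 26) for x in range(6))
print("Total probability =", round(total, 10))
```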
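
Mean, Variance and S.D of Hypergeometric Distribution

Measure | Formula | If $\frac{k}{N} = p$ and $q = 1 - p$ | If $N \to \infty$
Mean | $\mu = \dfrac{nk}{N}$ | $\mu = np$ | $\mu = np$
Variance | $\sigma^{2} = \dfrac{nk}{N}\left(\dfrac{N-k}{N}\right)\left(\dfrac{N-n}{N-1}\right)$ | $\sigma^{2} = npq\left(\dfrac{N-n}{N-1}\right)$ | $\sigma^{2} = npq$
Standard Deviation | $\sigma = \sqrt{\dfrac{nk}{N}\left(\dfrac{N-k}{N}\right)\left(\dfrac{N-n}{N-1}\right)}$ | $\sigma = \sqrt{npq\left(\dfrac{N-n}{N-1}\right)}$ | $\sigma = \sqrt{npq}$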
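
The mean and variance formulas can be checked against the definitions $E(X) = \sum x f(x)$ and $\sigma^{2} = E(X^{2}) - \mu^{2}$. The sketch below uses exact fractions and illustrative values N = 52, n = 5, k = 26 (red cards in a 5-card sample); the variable names are assumptions made for this check.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, n, k):
    """Exact hypergeometric probability as a Fraction."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

N, n, k = 52, 5, 26
support = range(0, min(n, k) + 1)

# Moments computed directly from the pmf
mean_from_pmf = sum(x * hypergeometric_pmf(x, N, n, k) for x in support)
second_moment = sum(x**2 * hypergeometric_pmf(x, N, n, k) for x in support)
var_from_pmf = second_moment - mean_from_pmf**2

# Moments from the closed-form formulas
mean_formula = Fraction(n * k, N)
var_formula = mean_formula * Fraction(N - k, N) * Fraction(N - n, N - 1)

print(mean_from_pmf, mean_formula)
print(var_from_pmf, var_formula)
```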


Properties of Hypergeometric Distribution

• The mean and variance of the hypergeometric distribution are “$\dfrac{nk}{N}$” and “$\dfrac{nk}{N}\left(\dfrac{N-k}{N}\right)\left(\dfrac{N-n}{N-1}\right)$” respectively.
• The hypergeometric distribution approaches the binomial distribution as $N \to \infty$.

EXAMPLE 8.09

If 6 cards are drawn from a deck of 52 playing cards, what is the probability that 2 will be hearts?

Solution: By using the hypergeometric distribution with:

N = 52 : Total cards in a deck
k = 13 : There are 13 hearts, i.e. successes in the population
n = 6 : Sample of 6 cards is selected
x = 2 : Exactly 2 hearts, i.e. successes in the sample

Cards
Hearts | Others | Total
13 | 39 | 52

We have:

$$P(2 \text{ hearts}) = \frac{\binom{13}{2}\binom{39}{4}}{\binom{52}{6}} = 0.315$$

EXAMPLE 8.10

A committee of size 5 is to be selected at random from 4 women and 5 men. Find the probability distribution for the number of women on the committee.

Solution: Let X be a random variable for the number of women on the committee. Then x = 0, 1, 2, 3, 4.

Women | Men | Total
4 | 5 | 9

For X = 0: $P(0 \text{ women}) = \dfrac{\binom{4}{0}\binom{5}{5}}{\binom{9}{5}} = \dfrac{1}{126}$

For X = 1: $P(1 \text{ woman}) = \dfrac{\binom{4}{1}\binom{5}{4}}{\binom{9}{5}} = \dfrac{20}{126}$

For X = 2: $P(2 \text{ women}) = \dfrac{\binom{4}{2}\binom{5}{3}}{\binom{9}{5}} = \dfrac{60}{126}$

For X = 3: $P(3 \text{ women}) = \dfrac{\binom{4}{3}\binom{5}{2}}{\binom{9}{5}} = \dfrac{40}{126}$

For X = 4: $P(4 \text{ women}) = \dfrac{\binom{4}{4}\binom{5}{1}}{\binom{9}{5}} = \dfrac{5}{126}$

Hence the probability distribution for “X” is given as follows:

x | 0 | 1 | 2 | 3 | 4 | Total
P(x) | 1/126 | 20/126 | 60/126 | 40/126 | 5/126 | 1
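
The committee distribution of Example 8.10 can be reproduced with exact fractions. This is a minimal sketch under the example's values (4 women, 5 men, committee of 5); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

women, men, size = 4, 5, 5
total_committees = comb(women + men, size)  # C(9, 5) = 126

# P(X = x): choose x women and (size - x) men
distribution = {
    x: Fraction(comb(women, x) * comb(men, size - x), total_committees)
    for x in range(0, min(women, size) + 1)
}

for x, prob in distribution.items():
    print(f"P({x} women) = {prob}")
print("Total =", sum(distribution.values()))
```

`Fraction` reduces each value to lowest terms, so 60/126 is printed as 10/21, for example.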



Test Yourself

1) If 8 cards are drawn from a deck of 52 playing cards, what is the probability that 3 will be
hearts?
2) A committee of size 6 is to be selected at random from 4 women and 5 men. Find the
probability distribution for the number of men on the committee?


Discrete Uniform Distribution

A discrete random variable “X” is said to have a uniform distribution if its p.m.f is defined as:

$$f(x) = \begin{cases} \dfrac{1}{N} ; & x = 1, 2, \ldots, N \\ 0 ; & \text{otherwise} \end{cases}$$

Note: “N” is the parameter of the discrete uniform distribution.

Mean, Variance and S.D of Discrete Uniform Distribution

Measure | Formula
Mean | $\mu = \dfrac{N + 1}{2}$
Variance | $\sigma^{2} = \dfrac{N^{2} - 1}{12}$
Standard Deviation | $\sigma = \sqrt{\dfrac{N^{2} - 1}{12}}$

Prove that the mean of the discrete uniform distribution is $\dfrac{N + 1}{2}$

Proof: We know that Mean $= \mu = E(X) = \sum_{x=1}^{N} x f(x)$

$\Rightarrow \mu = \sum_{x=1}^{N} x \cdot \dfrac{1}{N}$, since $f(x) = \dfrac{1}{N}$

$= \dfrac{1}{N} \sum_{x=1}^{N} x$

$= \dfrac{1}{N} \left[ 1 + 2 + \cdots + N \right]$

$= \dfrac{1}{N} \left[ \dfrac{N(N + 1)}{2} \right]$, since $1 + 2 + \cdots + N = \dfrac{N(N + 1)}{2}$

$= \dfrac{N + 1}{2}$


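EXAMPLE 8.11

From the following series find the mean and variance:

1001, 1002, 1003, …, 1009

Solution: Subtracting 1000 from each value gives the consecutive natural numbers 1, 2, …, 9, which follow a discrete uniform distribution with N = 9.

Now the mean of 1, 2, …, 9 is $\dfrac{N + 1}{2} = \dfrac{9 + 1}{2} = \dfrac{10}{2} = 5$

Adding back the origin, the mean of the series is $1000 + 5 = 1005$

And Variance $= \dfrac{N^{2} - 1}{12} = \dfrac{9^{2} - 1}{12} = \dfrac{81 - 1}{12} = \dfrac{80}{12} = 6.67$ (the variance is not affected by a change of origin)

For consecutive natural numbers we may use the discrete uniform distribution to find the mean and variance.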
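
A quick check of the series 1001, 1002, …, 1009 compares the discrete uniform formulas with a direct calculation. This sketch assumes Python's standard `statistics` module; the shift of origin (here 1000) is added back to the mean, while the variance needs no adjustment.

```python
from statistics import mean, pvariance

series = list(range(1001, 1010))  # 1001, 1002, ..., 1009
N = len(series)
origin = series[0] - 1            # 1000, so that series - origin = 1, 2, ..., N

formula_mean = origin + (N + 1) / 2
formula_variance = (N**2 - 1) / 12

print("Mean:", mean(series), "formula:", formula_mean)
print("Variance:", round(pvariance(series), 2), "formula:", round(formula_variance, 2))
```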

Continuous Uniform Distribution

A continuous random variable “X” is said to have a uniform distribution over the interval (a, b) if its p.d.f is defined as:

$$f(x) = \begin{cases} \dfrac{1}{b - a} ; & a < x < b \\ 0 ; & \text{otherwise} \end{cases}$$

Note: “a” and “b” are the parameters of the continuous uniform distribution.

Mean, Variance and S.D of Continuous Uniform Distribution

Measure | Formula
Mean | $\mu = \dfrac{a + b}{2}$
Variance | $\sigma^{2} = \dfrac{(b - a)^{2}}{12}$
Standard Deviation | $\sigma = \dfrac{b - a}{\sqrt{12}}$


EXAMPLE 8.12

Let X have a continuous uniform distribution over the interval (2, 5). Find its mean and variance.

Solution: Since the interval is (2, 5), therefore $2 < x < 5$, so here a = 2 and b = 5.

Now Mean $= \dfrac{a + b}{2} = \dfrac{2 + 5}{2} = \dfrac{7}{2} = 3.5$

And Variance $= \dfrac{(b - a)^{2}}{12} = \dfrac{(5 - 2)^{2}}{12} = \dfrac{3^{2}}{12} = \dfrac{9}{12} = 0.75$
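
The continuous uniform formulas can be compared with a numerical approximation of $\int x f(x)\,dx$ and $\int x^{2} f(x)\,dx$. The sketch below is only an illustration: the function names and the midpoint-rule step count are assumptions, and a = 2, b = 5 are the values of Example 8.12.

```python
def uniform_mean_variance(a, b):
    """Mean (a + b) / 2 and variance (b - a)^2 / 12 of the uniform distribution on (a, b)."""
    return (a + b) / 2, (b - a) ** 2 / 12

def numerical_mean_variance(a, b, steps=10_000):
    """Approximate the same quantities with the midpoint rule applied to the p.d.f."""
    width = (b - a) / steps
    density = 1 / (b - a)
    first_moment = 0.0
    second_moment = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width  # midpoint of the i-th strip
        first_moment += x * density * width
        second_moment += x**2 * density * width
    return first_moment, second_moment - first_moment**2

exact = uniform_mean_variance(2, 5)
approx = numerical_mean_variance(2, 5)
print("Formula:", exact)
print("Numerical:", tuple(round(v, 4) for v in approx))
```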

Important!!!

While reading probability problems, pay special attention to key phrases that translate into mathematical symbols. The following table lists various phrases and their corresponding mathematical equivalents:

Math Symbol | Phrases
$>$ | “greater than” or “more than” or “exceed” or “better than” or “taller than” or “above”
$<$ | “less than” or “smaller than” or “below” or “under” or “fewer than”
$\ge$ | “at least” or “greater than or equal to” or “no less than”
$\le$ | “at most” or “less than or equal to” or “no more than”
$=$ | “exactly” or “equal” or “is”


Sharpen your Pencil

MCQ’s

(1) The probability of success is denoted by _____

(A) p (B) q (C) n (D) None of these

(2) The sum of “p” and “q” is = _____

(A) 0 (B) 1 (C) 2 (D) None of these

(3) Binomial distribution has _____ parameters.

(A) 2 (B) 3 (C) 1 (D) None of these

(4) Discrete uniform distribution has _____ parameter.

(A) 1 (B) 2 (C) 3 (D) None of these

(5) Continuous uniform distribution has _____ parameters.

(A) 3 (B) 2 (C) 1 (D) None of these

(6) Hypergeometric distribution has _____ parameters.

(A) 3 (B) 1 (C) 4 (D) None of these

(7) The binomial distribution is symmetrical if p = q = _____

(A) 1/2 (B) 1/4 (C) 1 (D) None of these

(8) The binomial experiment becomes the Bernoulli experiment if n = _____

(A) 0 (B) 1 (C) 3 (D) None of these

(9) The shape of binomial distribution depends on the values of _____

(A) n and p (B) n and q (C) p and q (D) None of these

(10) When p = 0.4 and n = 6 the Mean of binomial distribution is _____

(A) 2.4 (B) 1.44 (C) 3.4 (D) None of these


(11) For binomial distribution Mean is ______ than its Variance.

(A) smaller (B) greater (C) equal (D) None of these

(12) Coefficient of variation of binomial distribution is _____

(A) $\sqrt{\dfrac{q}{np}}$ (B) $\sqrt{\dfrac{q}{np}} \times 100$ (C) $\sqrt{npq} \times 100$ (D) None of these

(13) In a Binomial experiment the repeated trials are _____

(A) Independent (B) Dependent (C) Mixed (D) None of these

(14) The Binomial distribution is negatively skewed if _____

(A) pq (B) pq (C) pq (D) None of these



(15) The Binomial distribution is positively skewed if _____

(A) pq (B) pq (C) pq (D) None of these



(16) For Binomial distribution S.D = _____



(A) npq (B) np (C) npq (D) None of these



(17) In Binomial distribution n = 20 and p = 3/5 then its S.D = _____

(A) 9.6 (B) 4.8 (C) 2.4 (D) None of these

(18) In Binomial distribution n = 10 and p = 3/5 then its Variance = _____

(A) 9.6 (B) 4.8 (C) 2.4 (D) None of these

(19) The probability of failure is denoted by _____

(A) p (B) q (C) 0 (D) None of these

(20) The parameters of Hypergeometric distribution are _____

(A) N, n, k (B) N, k (C) n, k (D) None of these


Short Questions
Exercise

Q.8.01. What are the properties of the Binomial distribution?

Q.8.02. Prove that the mean of the Uniform distribution is $\dfrac{N + 1}{2}$

Q.8.03. If X is a binomial random variable with n = 5 and p = 0.6, then find E(2X-3) and Var(2X-3).

Q.8.04. If X is a binomial random variable with n = 20 and p = 0.5, find its variance and coefficient of variation.

Q.8.05. From the following series find the mean and variance (using the uniform distribution):

5001, 5002, 5003, …, 5007

Q.8.06. Let X have a continuous uniform distribution over the interval (3, 7). Find its mean and variance.

Q.8.07. Is it possible to have a binomial distribution with mean = 8 and S.D = 9?

Q.8.08. If 4 cards are drawn from a deck of 52 cards, what is the probability that 2 cards are spades?

Q.8.09. Expand the following binomial distributions:

(i) $\left(\frac{1}{4} + \frac{3}{4}\right)^{5}$ (ii) $(0.65 + 0.35)^{4}$

Q.8.10. If 5 cards are drawn from a deck of 52 cards, what is the probability that 3 cards will be clubs?


Long Questions
Exercise

Q.8.01. An event has p = 3/8 and n = 5. Find the probability of:

(i) P( x  3) (ii) P( x  3)
(iii) P( x  3) (iv) P( x  3)

Q.8.02. If n = 5 and p = 1/3, find the complete Binomial distribution.

Q.8.03. If n = 4 and p = 1/2, find the complete Binomial distribution.

Q.8.04. If X is a binomial random variable with n = 20 and p = 0.5, then find:

(i) Coefficient of variation
(ii) Coefficient of skewness
(iii) Coefficient of kurtosis

Q.8.05. The mean and variance of a binomial distribution are 42 and 12.6. Find “n” and “p”.

Q.8.06. The mean and variance of a binomial distribution are 3 and 1.5. Find its parameters.

Q.8.07. A committee of 3 members is to be selected from 3 men and 4 women. Find the probability distribution for the number of men on the committee.

Prepared by
Zafar Ali (M.Sc. Statistics)
Cell No. 0333-9004086, 0345-9282215

[Chart: Types of Index Number. Labels shown: Simple Index Numbers; Composite Index Numbers; Unweighted Index Numbers; Weighted Index Numbers; Simple Aggregative Method; Simple Average of Relative Method; Weighted Aggregative Method; Weighted Average of Relative Method; Fixed Base Method; Chain Base Method.]

8 Study Tips for Statistics Students

• Do homework daily

Doing homework on a daily basis is essential to succeeding in Statistics class. Take notes in class and use them daily. Make a file where you can keep all your notes handy for future reference.

• Don't be afraid to ask for help

Don't be afraid or too proud to ask for help. Your teacher will gladly help you and it will give you the tools to conquer the problem.

• Do sample tests and use a calculator

Take time to do sample tests. This way you can identify problem areas quickly and eliminate them before it's time for the real test. Retest yourself regularly. This way you will be able to establish your weak points. Do make use of a calculator to simplify the computations.

• Form a study group

Form a study group that can meet at least once a week, where you can discuss problems or any difficulties and help each other. Compare answers with one another.

• Take your time until you understand a problem

Don't rush through problems. Take your time and make sure you understand each one. What you don't understand today will become a problem tomorrow.

• Don't rush through problems

Take your time and make sure you understand each one. What you don't understand today will become a problem tomorrow.

• Practice makes perfect

Statistics is something you need to practice. Repetition is what will give you the skill to overcome any problem.

• Relaxation techniques

When you feel all flustered or fear grips your heart, try to relax.
